VC here stands for Vapnik and Chervonenkis, after the two authors who first introduced this concept and used it in probability theory [VaCh].

The following shifting operation is very useful: when we shift a family of sets by $x$ we remove $x$ from every set that we can, as long as this doesn't collapse two sets into one. Precisely, if $A\in\mathcal{F}$ then the shift of $A$ by $x$ is defined to be \[ S_x(A) = \begin{cases} A\backslash\{x\} &\textrm{if }x\in A\textrm{ and }A\backslash\{x\}\not\in \mathcal{F}\textrm{, and}\\ A&\textrm{otherwise.}\end{cases}\] We say that $A$ is shifted if $S_x(A)\neq A$. Extending the notation in the obvious fashion, $S_x(\mathcal{F})$ is the family that results from shifting every set down by $x$. We note the following facts:

- Shifting preserves the size of the family: $\lvert S_x(\mathcal{F})\rvert=\lvert \mathcal{F}\rvert$ for every $x$.
- If $S_x(\mathcal{F})$ shatters $X$ then $\mathcal{F}$ shatters $X$: let $X'\subset X$, and suppose $A\cap X=X'$ for some $A\in S_x(\mathcal{F})$. We claim that there is some $B\in \mathcal{F}$ such that $B\cap X=X'$. If $A\in\mathcal{F}$ then we're done; otherwise, it must be the case that $x\not\in A$ and $A\cup \{x\}\in \mathcal{F}$. If $x\not\in X$ then $(A\cup\{x\})\cap X=A\cap X=X'$. If $x\in X$ then let $B\in S_x(\mathcal{F})$ be such that $B\cap X=X'\cup \{x\}$. Since $x\in B$ and $B\in S_x(\mathcal{F})$ we must have $B\backslash\{x\}\in \mathcal{F}$ (or else $B$ would have been shifted away), and $(B\backslash\{x\})\cap X = X'$.
- If $S_x(\mathcal{F})=\mathcal{F}$ for every $x$ then $\mathcal{F}$ is closed under subsets. That is, if $A\in \mathcal{F}$ and $A'\subset A$ then $A'\in\mathcal{F}$.

Every family $\mathcal{F}$ shatters at least $\lvert \mathcal{F}\rvert$ different sets. In particular, if $\mathcal{F}\subset 2^{[n]}$ has VC-dimension at most $d$ then
\[\lvert \mathcal{F}\rvert \leq \sum_{i=0}^d \binom{n}{i}\ll (cn/d)^d\]
for some absolute constant $c>0$.

In the rest of this note I'll give an exposition of a proof of the following result of Haussler, giving an improved upper bound on the size of families with small VC-dimension under the additional assumption that this family is well-separated. My interest in this result comes from a recent application in arithmetic combinatorics by Alon, Fox, and Zhao [AlFoZh].
Haussler [Ha]

If $\mathcal{F}\subset 2^{[n]}$ has VC-dimension at most $d$ and $\lvert A\Delta B\rvert \geq k$ for all $A,B\in\mathcal{F}$ then
\[\lvert \mathcal{F}\rvert \leq (Cn/k)^d\]
for some absolute constant $C>0$.
This shows that, for fixed $A$, the vector $(A\Delta B)_{B\in \mathcal{F}}$ does not change after shifting, except that some coordinates may be transposed and $x$ may be lost from some elements of this vector, which will happen only if $A$ is shifted and $x\not\in B$.

In particular, the number of $B$ such that $\lvert A\Delta B\rvert=1$ can only increase, since $A$ shifted and $S_x(A)\Delta S_x(B)=\emptyset$ implies $S_x(A)=A\backslash \{x\}=S_x(B)=B$, which contradicts the fact that $A\backslash\{x\}\not\in\mathcal{F}$. It follows that if we let \[ F(A) = F(A;\mathcal{F}) = \#\{ B\in\mathcal{F} : \lvert A\Delta B\rvert =1\} \] then $F(A;\mathcal{F})\leq F(A;S_x(\mathcal{F}))$ for every $A\in\mathcal{F}$ and $x$. In particular, by repeated shifts, we can assume that if $\mathcal{F}$ has VC-dimension at most $d$ then $F(A)\leq d$ for every $A\in\mathcal{F}$.

Lemma 1

If the VC-dimension of $\mathcal{F}$ is at most $d$ then, for any probability distribution on $\mathcal{F}$, and any set $I$,
\[\sum_{x\in I} \textrm{Var}(1_{x\in A} \vert \cup_{y\in I\backslash \{x\}}1_{y\in A})\leq d.\]
Lemma 2

If $\mathcal{F}\subset 2^{[n]}$ is a family of sets such that $\lvert A\Delta B\rvert \geq k$ for any $A,B\in \mathcal{F}$ then, if we choose a set $A\in\mathcal{F}$ uniformly, and from a fixed set $I\subset [n]$ of size $m+1$ choose a random $x\in I$ uniformly, then
\[\mathbb{E}_{x\in I} \textrm{Var}( 1_{x\in A} \vert \cup_{y\in I\backslash\{x\}} 1_{y\in A})\gg \frac{k}{n-m}\left(1-\frac{\lvert \mathcal{F}_{I\backslash\{x\}}\rvert }{\lvert \mathcal{F}\rvert}\right).\]
Here $\mathcal{F_{J}}$ denotes the set of equivalence classes in $\mathcal{F}$, with two sets equivalent if $A\cap J=B\cap J$.
It remains to prove Lemma 2. Fix some equivalence class $C\in\mathcal{F}_{I\backslash \{x\}}$. Note that the random variable $1_{x\in A}$, conditioned on $A\in C$, is a Bernoulli random variable, with probability $p_x$, say. The variance in question is therefore $p(1-p)$. Drawing two independent samples, say $A$ and $B$, we have \[\mathbb{E}_{x\in I}2p_x(1-p_x)=\mathbb{E}_{x\in I}\mathbb{P}(1_{x\in A}\neq 1_{x\in B})\geq \frac{k}{n-m}\mathbb{P}(A\neq B)=\frac{k}{n-m}(1-1/\lvert C\rvert).\] The inequality here is true because if $A\neq B$ then they differ on at least $k$ of the $n-m$ elements they can possibly disagree on. The actual quantity we need to estimate is \[\sum_{C\in \mathcal{F}_{I\backslash\{x\}}}\frac{\lvert C\rvert}{\lvert \mathcal{F}\rvert}\mathbb{E}\textrm{Var}(1_{x\in A}\vert A\in C)\gg \frac{k}{(n-m)\lvert \mathcal{F}\rvert}\sum_{C\in\mathcal{F}_{I\backslash \{x\}}}(\lvert C\rvert-1),\] and Lemma 2 follows.

**[AlFoZh]**N. Alon, J. Fox, and Y. Zhao, 2018**[KaVe]**V. Kaimanovich and A. Vershik, Efficient arithmetic regularity and removal lemmas for induced bipartite patterns, 2018, arXiv:1801.04675**[Ha]**D. Haussler, Sphere packing numbers for subsets of the Boolean $n$-cube with bounded Vapnik-Chervonenkis dimension, J. Combin. Theory Ser. A (1995), 217-232.**[Pa]**A. Pajor, Sous-espaces $l_1^n$ des espaces de Banaach, Travaux en Cours (1985), 16.**[Sa]**N. Sauer, On the density of families of sets, J. Combin. Theory Ser. A (1972), 145-147.**[Sh]**S. Shelah, A combinatorial problem; stability and order for models and theories in infinitary languages, Pacific Journal of Mathematics (1972), 247-261.**[VaCh]**V. Vapnik and A. Chervonenkis, The uniform convergence of frequencies of the appearance of events to their probabilities, Akademija Nauk SSSR (1971), 264-279.