Almost-Periodicity [incomplete]

A period of a function $f$ is some $t$ such that $f(x+t)=f(x)$ for all $x$. An almost-period is a $t$ for which this is almost true, so that $f(x)\approx f(x+t)$. There are a number of ways one could make this rigorous. In this note I'll just talk about $L^p$ almost-periods, where we ask for $\| \tau_t f - f\|_p$ to be small, $\tau_t$ being the translation operator $\tau_tf(x)=f(x+t)$. Knowing that a function has many almost-periods is extremely useful information, and in applications to arithmetic combinatorics the function we care about is often a convolution, so we seek almost-periods of $f\ast g$. Historically, such almost-periods were found using Fourier analysis - this was the method used by Bourgain to find large arithmetic progressions in sumsets, for example. Fourier methods have the disadvantage, however, that they naturally work best with $L^2$ norms, and so their dependence on $p$ as $p\to\infty$ is quite poor.

In this note I'm going to explain a more recent advance, by Croot and Sisask, which works entirely in physical space, and uses no Fourier analysis at all. This leads to a much better dependence on $p$, which in turn has led to some spectacular advances in the quantitative side of arithmetic combinatorics, particularly in the work of Sanders.

Croot-Sisask almost-periodicity [CrSi] Let $G$ be a finite abelian group and let $f:G\to\mathbb{R}$. If $\epsilon>0$, $p\geq 2$ is an even integer, and $S\subset G$ has density $\sigma=\lvert S\rvert/\lvert G\rvert$, then there is a set $T$ of size $\lvert T\rvert \geq \sigma^{O(p\epsilon^{-2})}\lvert G\rvert$ such that, for all $t\in T-T$, \[ \| f\ast \mu_{S-t} - f\ast \mu_S\|_p\leq \epsilon \| f\|_p.\]
The set $T$ of almost-periods does depend on $f$ and $S$, although the lower bound for its size does not. This theorem says, roughly, that relative to convolutions with $f$, and in an $L^p$ sense, $S$ does not change under shifts by $T-T$. That such a result can hold for arbitrary sets $S$, and with $T$ not particularly sparse, is a miracle.
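
To make the statement concrete, here is a small numerical illustration (it is not a proof, and it is not the Croot-Sisask argument): working in $\mathbb{Z}/512\mathbb{Z}$ with a randomly chosen set $S$ and a random $\{0,1\}$-valued $f$, we simply count how many shifts $t$ satisfy the displayed inequality. The group, the densities, the parameters $p$ and $\epsilon$, and the helper names `conv_with_mu` and `Lp` are all choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 512                                      # the group Z/NZ (illustrative choice)
S = np.flatnonzero(rng.random(N) < 0.1)      # a random set S of density ~0.1
f = (rng.random(N) < 0.2).astype(float)      # f = indicator of a random set
p, eps = 4, 0.1                              # illustrative parameters

def conv_with_mu(f, T):
    """(f * mu_T)(x) = (1/|T|) sum_{t in T} f(x - t) on Z/NZ."""
    out = np.zeros_like(f)
    for t in T:
        out += np.roll(f, t)                 # np.roll(f, t)[x] = f[(x - t) mod N]
    return out / len(T)

def Lp(g, p):
    """Normalised L^p norm on Z/NZ."""
    return np.mean(np.abs(g) ** p) ** (1.0 / p)

base = conv_with_mu(f, S)
fnorm = Lp(f, p)

# count the shifts t for which f * mu_{S-t} is eps-close to f * mu_S in L^p
good = sum(Lp(conv_with_mu(f, (S - t) % N) - base, p) <= eps * fnorm
           for t in range(N))
print(f"{good} of {N} shifts t satisfy ||f*mu_(S-t) - f*mu_S||_{p} <= eps*||f||_{p}")
```

Since $f\ast\mu_{S-t}=\tau_t(f\ast\mu_S)$, the loop could equally just translate the precomputed `base` array; recomputing the convolution keeps the sketch closer to the statement.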

$L^p$ norms of random variables

It is natural to control the norm of a sum of random variables by the sum of their norms. Indeed, if we have $n$ random variables, say, then the triangle inequality and Hölder's inequality trivially give \[\mathbb{E} \left\lvert \tfrac{1}{n}\sum_i X_i\right\rvert^{2m}\leq \tfrac{1}{n}\sum_i \mathbb{E} \lvert X_i\rvert^{2m}.\] It is a very useful fact that, when the random variables are balanced by subtracting their means, this estimate can be improved by a factor of around $(m/n)^m$, which for fixed $m$ and $n\to\infty$ becomes very significant.

Let $X_1,\ldots,X_n$ be independent random variables, with means $\mu_i$. We want to understand, for $m\geq 1$, the average \[S= \mathbb{E} \left\lvert \sum_i (X_i - \mu_i) \right\rvert^{2m}.\] We first remove the $\mu_i$ by writing $\mu_i = \mathbb{E} Y_i$, where $Y_i$ is a new random variable distributed identically to $X_i$ (but independent of everything else). Taking the inner expectation over the $Y_i$ alone and applying Jensen's inequality, \[S=\mathbb{E}\left\lvert \sum_i (X_i - \mathbb{E} Y_i)\right\rvert^{2m} = \mathbb{E} \left\lvert \mathbb{E}_Y \sum_i (X_i-Y_i)\right\rvert^{2m}\leq \mathbb{E} \left\lvert \sum_i (X_i-Y_i)\right\rvert^{2m}.\]

Next, we make the crucial observation: since $X_i$ and $Y_i$ are independent and identically distributed, $X_i-Y_i$ has the same distribution as $Y_i-X_i$. This means we can introduce a new source of randomness for free, by multiplying each $X_i-Y_i$ by a random sign $\epsilon_i\in \{-1,+1\}$. Writing $X_i-Y_i=(X_i-\mu_i)-(Y_i-\mu_i)$ and using the triangle inequality in $L^{2m}$ (the $X$ and $Y$ terms contribute identically), we get \[ S \leq \mathbb{E} \left\lvert \sum_i \epsilon_i(X_i-Y_i)\right\rvert^{2m}\leq 2^{2m} \mathbb{E}\left\lvert \sum_i \epsilon_i(X_i-\mu_i)\right\rvert^{2m}.\]

We now take the expectation over the $\epsilon_i$ alone, viewing the $X_i-\mu_i=x_i$ as fixed quantities. Expanding, \[\mathbb{E}\left\lvert \sum_i \epsilon_ix_i\right\rvert^{2m} = \sum_{k_1+\cdots+k_n=2m}\binom{2m}{k_1,\ldots,k_n}x_1^{k_1}\cdots x_n^{k_n}\,\mathbb{E}\epsilon_1^{k_1}\cdots \epsilon_n^{k_n}.\] The inner expectation vanishes unless every $k_i$ is even, in which case it is one. Therefore the above quantity is exactly \[\sum_{l_1+\cdots+l_n=m}\binom{2m}{2l_1,\ldots,2l_n}x_1^{2l_1}\cdots x_n^{2l_n}\leq \frac{(2m)!}{2^mm!}\left(\sum_{i=1}^nx_i^2\right)^{m},\] using $(2l)!\geq 2^l\,l!$ for each coordinate. In particular, \[S\leq \frac{2^m(2m)!}{m!}\mathbb{E}\left(\sum_i (X_i-\mu_i)^2\right)^m.\] Applying Hölder's inequality once more yields the crucial inequality: \[ \mathbb{E} \left\lvert \tfrac{1}{n}\sum_i (X_i - \mu_i) \right\rvert^{2m} \leq \left(\frac{Cm}{n}\right)^m \tfrac{1}{n}\sum_i \mathbb{E}\left\lvert X_i-\mu_i\right\rvert^{2m}.\] This saving of $(Cm/n)^m$ over the trivial bound is the source of all the non-trivial savings in this area.
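
As a sanity check on the final inequality, the sketch below estimates the left-hand side by Monte Carlo for independent Bernoulli variables and compares it with the right-hand side (and with the trivial bound that has no $(Cm/n)^m$ factor). The value $C=4$ is one constant that the computation above supports; the distributions, the values of $n$ and $m$, and the number of trials are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

n, m, trials = 50, 3, 100_000
C = 4.0                          # one admissible constant from the derivation above

# test case: independent X_i ~ Bernoulli(q_i) with assorted means q_i
q = rng.uniform(0.1, 0.9, size=n)
X = (rng.random((trials, n)) < q).astype(float)

# left-hand side: E |(1/n) sum_i (X_i - q_i)|^{2m}, estimated by Monte Carlo
lhs = np.mean(np.abs((X - q).mean(axis=1)) ** (2 * m))

# right-hand side: (Cm/n)^m * (1/n) sum_i E|X_i - q_i|^{2m}, computed exactly
central_moment = q * (1 - q) ** (2 * m) + (1 - q) * q ** (2 * m)
rhs = (C * m / n) ** m * central_moment.mean()

print(f"LHS (Monte Carlo) = {lhs:.3e}")
print(f"RHS (bound)       = {rhs:.3e}")
print(f"trivial bound without the (Cm/n)^m factor = {central_moment.mean():.3e}")
```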

Applications to addition

Let $f:G\to\mathbb{R}$ be any function. We'll apply the ideas of the previous section to the random variables $f(x-y)$, where $x$ is fixed and $y$ is chosen from $G$ with distribution $\mu$, say. The expected value of this is $f\ast \mu(x)$. Choose $y_1,\ldots,y_n$ independently at random according to $\mu$, and write $\mu_y$ for the uniform measure on the multiset $\{y_1,\ldots,y_n\}$, so that $f\ast\mu_y(x)=\tfrac{1}{n}\sum_i f(x-y_i)$. Applying the inequality of the previous section to the random variables $f(x-y_i)$ for each fixed $x$, averaging over all $x\in G$, and absorbing constants into $C$, we find that \[\mathbb{E} \| f\ast \mu_{y} - f\ast \mu\|_{2m}^{2m} \leq \left(\frac{Cm}{n}\right)^{m} \| f\|_{2m}^{2m}.\] If we choose $n\approx m\epsilon^{-2}$ then the right-hand side is at most $(\epsilon/2)^{2m}\|f\|_{2m}^{2m}$, say. Therefore, by Markov's inequality (adjusting the implied constant in $n$), with 99% probability a random sample $y=(y_1,\ldots,y_n)$ drawn according to $\mu$ is good, in the sense that convolving with just these $n$ elements rather than with all of $\mu$ gives an $\epsilon$-good $L^{2m}$-approximation: \[ \| f\ast \mu_{y} - f\ast \mu\|_{2m}\leq \epsilon \| f\|_{2m}.\]
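
The following sketch plays this game on $\mathbb{Z}/512\mathbb{Z}$ with $\mu=\mu_S$ for a random set $S$: it draws $n\approx m\epsilon^{-2}$ elements of $S$ at random, many times over, and reports how often the sampled convolution is an $\epsilon$-good approximation of $f\ast\mu_S$ in $L^{2m}$. All parameters, and the helpers `conv_with_mu` and `Lp`, are illustrative choices, not part of the statement.

```python
import numpy as np

rng = np.random.default_rng(2)

N = 512
S = np.flatnonzero(rng.random(N) < 0.1)      # a random set S of density ~0.1
f = (rng.random(N) < 0.2).astype(float)      # f = indicator of a random set
m, eps = 2, 0.25
n = int(np.ceil(m / eps ** 2))               # n ~ m * eps^{-2} samples

def conv_with_mu(f, T):
    out = np.zeros_like(f)
    for t in T:
        out += np.roll(f, t)
    return out / len(T)

def Lp(g, p):
    return np.mean(np.abs(g) ** p) ** (1.0 / p)

base = conv_with_mu(f, S)                    # f * mu_S, the thing we are approximating
threshold = eps * Lp(f, 2 * m)

trials, good = 500, 0
for _ in range(trials):
    y = rng.choice(S, size=n, replace=True)  # y_1,...,y_n drawn independently from mu_S
    if Lp(conv_with_mu(f, y) - base, 2 * m) <= threshold:
        good += 1

print(f"{good}/{trials} random samples y give an eps-good L^{2*m} approximation of f*mu_S")
```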

Almost-periods

We want to find a set of $t$ such that $\tau_t(f\ast \mu)$ is close to $f\ast \mu$ in $L^p$-norm. The random sampling ideas used thus far have given us many vectors $y\in G^n$ such that $f\ast \mu_y$ is very close to $f\ast \mu$ in $L^p$-norm. An important observation is that translation preserves this: if $y$ is good then, applying $\tau_t$ to both functions, $\tau_t(f\ast \mu)$ is close to $\tau_t(f\ast\mu_y)=f\ast \mu_{y-t}$ in $L^p$-norm, where $y-t$ denotes the tuple $(y_1-t,\ldots,y_n-t)$. In particular, if both $y$ and $y-t$ are good then, by the triangle inequality, $\tau_t(f\ast\mu)$ is within $2\epsilon\|f\|_p$ of $f\ast\mu$, and so $t$ is an almost-period.

Now take $\mu=\mu_S$, so that $99\%$ of the $n$-tuples in $S^n$ are good, and suppose $A$ is any set with $\lvert S+A\rvert\leq K\lvert S\rvert$ (for the theorem as stated, take $A=G$ and $K=\sigma^{-1}$). For every good tuple $y\in S^n$ and every $t\in A$, the diagonally shifted tuple $y+(t,\ldots,t)$ lies in $(S+A)^n$, which has size at most $\lvert S+A\rvert^n\leq K^n\lvert S\rvert^n$. Counting the at least $0.99\lvert S\rvert^n\lvert A\rvert$ pairs $(y,t)$, the pigeonhole principle gives a single tuple $z$ and a set $T\subseteq A$ of size at least $0.99K^{-n}\lvert A\rvert$ such that $z-(t,\ldots,t)$ is good for every $t\in T$. For any $t_1,t_2\in T$ the good tuples $z-(t_2,\ldots,t_2)$ and $z-(t_1,\ldots,t_1)$ differ by a diagonal shift by $t_1-t_2$, so by the observation above every $t\in T-T$ is an almost-period (with $2\epsilon$ in place of $\epsilon$, which is harmless), as required.
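
To see this last step in action, the sketch below (with the same illustrative setup as the earlier ones) finds one good tuple $z$, collects into $T$ the diagonal shifts $t$ for which $z-(t,\ldots,t)$ is still good, and then checks directly that every $t\in T-T$ is a $2\epsilon$-almost-period of $f\ast\mu_S$. The helper `is_good` and all of the parameters are again choices made for this demonstration.

```python
import numpy as np

rng = np.random.default_rng(3)

N = 512
S = np.flatnonzero(rng.random(N) < 0.1)
f = (rng.random(N) < 0.2).astype(float)
m, eps = 2, 0.25
p = 2 * m
n = int(np.ceil(m / eps ** 2))

def conv_with_mu(f, T):
    out = np.zeros_like(f)
    for t in T:
        out += np.roll(f, t)
    return out / len(T)

def Lp(g):
    return np.mean(np.abs(g) ** p) ** (1.0 / p)

base = conv_with_mu(f, S)                    # f * mu_S
thr = eps * Lp(f)

def is_good(y):
    """Is f * mu_y within eps * ||f||_p of f * mu_S in L^p?"""
    return Lp(conv_with_mu(f, np.mod(y, N)) - base) <= thr

# find one good tuple z in S^n (by the previous section, most tuples are good)
z = rng.choice(S, size=n, replace=True)
while not is_good(z):
    z = rng.choice(S, size=n, replace=True)

# T = diagonal shifts t for which z - (t,...,t) is still good
T = [t for t in range(N) if is_good(z - t)]

# by the triangle inequality, every t in T - T should be a 2*eps almost-period of f * mu_S
diffs = {(t1 - t2) % N for t1 in T for t2 in T}
worst = max(Lp(conv_with_mu(f, (S - t) % N) - base) for t in diffs)
print(f"|T| = {len(T)}; worst ||f*mu_(S-t) - f*mu_S||_p / ||f||_p over T-T = {worst / Lp(f):.3f}")
print(f"2 * eps = {2 * eps}")
```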