## The Discrepancy function in two dimensions.

Tomorrow I am traveling to Heraklion, Crete, where among other things I will give a talk on Discrepancy in two Dimensions. The talk is based on a joint paper with D. Bilyk, M. Lacey and A. Vagharshakyan. Here follows an outline of the talk, which can also be used as an easier first reading of the paper (which is admittedly quite technical).

1. Introduction.

Everything will take place in the unit cube ${Q:= [0,1]^d \subset \mathbb R^d}$. For ${\vec x = (x_1,\ldots,x_d)\in Q}$ we write

$\displaystyle \begin{array}{rcl} [0,\vec x) := [0,x_1)\times \cdots \times [0,x_d), \end{array}$

for the rectangle anchored at ${0}$ and at ${\vec x}$. Let ${\mathcal P_N}$ be an ${N}$-point distribution in ${[0,1]^d}$:

$\displaystyle \begin{array}{rcl} \mathcal P_N = \{p_1,p_2,\ldots,p_N\} \subset [0,1]^d. \end{array}$

Definition 1 The discrepancy function of ${\mathcal P _N}$ is

$\displaystyle D_N(x):=\sharp \mathcal P_N\cap [0,\vec x) - N |[0,\vec x)|.$

The first term in the previous definition is sometimes referred to as the counting part:

$\displaystyle \mathcal C _N(x):=\sharp \mathcal P_N\cap [0,\vec x) =\sum_{p\in\mathcal P_N} \mathbf 1 _{[\vec p,\vec 1)}(x)$

Obviously this term counts the number of points of ${\mathcal P_N}$ inside ${[0,\vec x)}$. We call the second term in the definition of the discrepancy function the 'linear part':

$\displaystyle \mathcal L_N(x):= N|[0,\vec x)|.$

Here ${|E| }$ denotes the Lebesgue measure of a set ${E}$. The linear part expresses the expected number of points in ${[0,\vec x)}$ if we pick the points uniformly and independently at random.
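
To make the definition concrete, here is a minimal sketch (my own illustration, not from the paper) that evaluates the counting part, the linear part, and hence ${D_N}$ at a single anchor point; the function name and the two-point sample set are hypothetical.

```python
from math import prod

def discrepancy_at(points, x):
    """D_N(x) = #(P_N ∩ [0, x)) - N * |[0, x)| at a single anchor point x."""
    N = len(points)
    counting = sum(all(p_i < x_i for p_i, x_i in zip(p, x)) for p in points)
    linear = N * prod(x)              # N times the volume of the box [0, x)
    return counting - linear

# two sample points in the unit square, queried with the box [0,1/2) x [0,1/2)
P = [(0.25, 0.25), (0.75, 0.75)]
print(discrepancy_at(P, (0.5, 0.5)))  # 1 point observed, 0.5 expected -> 0.5
```

The counting term is exactly the sum of indicator evaluations in the formula for ${\mathcal C_N}$ above.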

It turns out that the 'size' of this function (in many different senses) must necessarily grow to infinity as ${N\rightarrow \infty.}$

Question 2 Is there a function ${f_{\|\cdot\|}}$ such that

$\displaystyle \|D_N\| \gtrsim f_{\|\cdot\|}(N),$

where ${f_{\|\cdot\|}(N)\rightarrow\infty }$ as ${N\rightarrow \infty}$? Some typical choices for the norm ${\|\cdot\|}$ are ${\|\cdot\|_{L^p}}$, ${1\leq p \leq \infty}$, ${\|\cdot\|_{\mathrm{BMO}}}$, ${\|\cdot \|_{\mathrm{exp}(L^\alpha)}}$.

2. Motivation.

2.1. Koksma-Hlawka inequality

Let ${f:[0,1]^d\rightarrow\mathbb R}$ be a function of bounded variation ${V(f)}$ (in the sense of Hardy and Krause). Then

$\displaystyle \bigg | \int_{[0,1]^d} f(x)dx - \frac{1}{N}\sum_{j=1} ^N f(p_j)\bigg |\leq V(f)\frac{\|D_N\|_\infty}{N}.$

In dimension ${d=1}$ the variation of a ${C^1}$ function can be expressed as ${V(f)=\int_{(0,1)}|f'(t)|dt}$. In higher dimensions there is an expression for ${V(f)}$ involving the partial derivatives of ${f}$, assuming sufficient smoothness. Observe that if we replace ${f}$ by ${\mathbf 1 _{[0,\vec x)}}$ in the Koksma-Hlawka inequality, we recover the discrepancy function (divided by ${N}$) on the left-hand side.

Roughly speaking, the Monte-Carlo numerical integration scheme consists of picking ${N}$ i.i.d. random points inside ${[0,1)^d }$ and approximating the integral of ${f}$ over the unit cube by the average of ${f}$ over ${\mathcal P_N}$. As we shall see shortly, the discrepancy function for ${N}$ points selected independently and at random inside ${Q}$ satisfies ${\|D_N\|_\infty \simeq \sqrt{N}}$, which by the Koksma-Hlawka inequality implies an error in the integration of ${f}$ of the order ${O({N^{-\frac{1}{2}}})}$.

We shall see that there exist point distributions that satisfy ${\|D_N\|_\infty=O(\log N)}$. By the way, these point distributions are extremal for discrepancy; one can never do better than that. However, using this selection of points, one integrates with error of the order ${O(\log N/N)}$, which is of course a huge improvement over the i.i.d. selection of points. Again, roughly speaking, this is the main idea behind the quasi Monte-Carlo integration scheme. Here the terminology is a bit deceiving as we will see that there is nothing random about point distributions that are extremal for discrepancy.
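
The comparison between Monte-Carlo and quasi-Monte-Carlo integration is easy to observe numerically. The sketch below is mine, not from the talk: the one-dimensional van der Corput sequence (binary bit reversal) stands in for a genuine low-discrepancy set, and the test integrand ${f(x)=x^2}$ on ${[0,1]}$ is an arbitrary choice.

```python
import random

def radical_inverse(j, base=2):
    """Bit-reversal of j in the given base, scaled into [0, 1)."""
    x, f = 0.0, 1.0 / base
    while j > 0:
        j, digit = divmod(j, base)
        x += digit * f
        f /= base
    return x

def average(points, f):
    return sum(map(f, points)) / len(points)

f, exact, N = (lambda t: t * t), 1.0 / 3.0, 2 ** 12
random.seed(0)
mc_err = abs(average([random.random() for _ in range(N)], f) - exact)
qmc_err = abs(average([radical_inverse(j) for j in range(1, N + 1)], f) - exact)
print(mc_err, qmc_err)  # the quasi-Monte-Carlo error is typically much smaller
```

The quasi-Monte-Carlo error is controlled deterministically by the Koksma-Hlawka inequality, while the Monte-Carlo error is only of size ${N^{-1/2}}$ on average.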

2.2. Uniformly distributed sequences in ${[0,1]}$.

Another application of two dimensional discrepancy relates to the notion of uniform distribution.

Definition 3 A sequence ${( a_n)_{n=1} ^\infty \subset [0,1]}$ is said to be uniformly distributed in ${[0,1)}$ if

$\displaystyle \frac{\sharp \{1\leq i \leq n : a_i \in [0,x )\}-nx}{n} \rightarrow 0 \quad \mbox{as}\quad n\rightarrow \infty,$

for all ${x\in[0,1)}$.

We can define the discrepancy function ${\Delta (a,n)}$ associated with a sequence ${a=(a_n)}$ as

$\displaystyle \Delta (a,n)=\sup_{x\in[0,1)} |\sharp \{a_1,a_2,\ldots,a_n \}\cap [0,x) - nx|.$

Thus, the sequence ${(a_n)}$ is uniformly distributed in ${[0,1)}$ if ${\Delta (a,n) =o(n),}$ as ${n\rightarrow\infty.}$ Of course this is just a weak, qualitative statement. It is often desirable to replace the ${o(n)}$ with an explicit estimate in ${n}$.

Example 1 For a badly approximable ${\theta \in \mathbb R\setminus \mathbb Q}$ (one whose continued fraction expansion has bounded partial quotients, such as the golden ratio), let us consider the sequence ${u=(\{\theta n \})_{n=1} ^\infty}$, where ${\{\cdot\}}$ is the fractional part. Then we have the estimate

$\displaystyle | \Delta(u,n)| \lesssim \log n.$
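
This estimate is easy to observe numerically. Here is a quick check of my own, using the golden ratio as the badly approximable ${\theta}$; for a sorted sample the supremum defining ${\Delta(u,n)}$ is attained at the jump points, which is what the helper below exploits.

```python
import math

def delta(u, n):
    """Delta(a, n) = sup_x |#({a_1..a_n} ∩ [0, x)) - n x|, computed exactly."""
    pts = sorted(u[:n])
    # the supremum is attained just below/above the ordered sample points
    return max(max(abs(i - n * p), abs(i + 1 - n * p)) for i, p in enumerate(pts))

theta = (math.sqrt(5) - 1) / 2           # golden-ratio rotation
u = [(theta * j) % 1.0 for j in range(1, 2 ** 12 + 1)]
for n in (2 ** 6, 2 ** 9, 2 ** 12):
    print(n, delta(u, n), math.log(n))   # Delta stays of size ~ log n, not n
```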

The consideration of uniformly distributed sequences leads naturally to the following question:

Question 4 Is there a function ${g(n)\rightarrow\infty}$ (as ${n\rightarrow \infty}$) such that

$\displaystyle |\Delta(a,n) | \gtrsim g(n),$

for infinitely many ${n}$?

Questions 2 and 4 are equivalent in the sense that finding a function ${f_{\|\cdot\|_\infty}}$ in Question 2 gives us the existence of a function ${g}$ in Question 4. This is due to Roth (’54). For example, to see that Question 2 implies Question 4 for a sequence ${(a_j)}$, just consider the point distribution ${\mathcal P_N =\{(j/N,a_j),\quad j=1,2,\ldots,N \}}$. That was the initial motivation for studying the discrepancy function in two dimensions, and it turns out that both Questions 2 and 4 have an affirmative answer, with the lower bound function being ${\log N}$ and ${\log n}$ respectively.

There is a subtle distinction between discrepancy for point distributions in the unit square and uniformly distributed sequences in ${[0,1]}$, that is, between the quantities ${\| D_N\|_\infty}$ and ${\Delta(a,n)}$. The difference is not so much the finite/infinite setting but rather the distinction between a 'dynamic' setting in the case of ${\Delta(a,n)}$ and a 'static' one in ${\|D_N\|_\infty}$. Indeed, in the case of sequences in ${[0,1)}$, the definition of ${\Delta(a,n)}$ depends on all the 'initial segments' ${\{a_1\},\{a_1,a_2\}, \ldots,\{a_1,a_2,\ldots, a_n\}}$ simultaneously, and the discrepancy changes drastically if we rearrange the terms of the sequence. No such phenomenon occurs in the static two dimensional setting, where we look at the discrepancy of the whole point distribution ${\mathcal P_N}$ at once and the ordering of the points plays absolutely no role. Thus a 'dynamic' problem in ${\mathbb R}$ is equivalent to a 'static' problem in ${\mathbb R^2}$. More generally, a 'dynamic' discrepancy problem in ${d}$ dimensions is equivalent to a 'static' problem in ${d+1}$ dimensions for ${d\geq 1}$.

3. 'Bad' and 'good' sets for discrepancy.

3.1. Highly structured sets are bad.

Here it is implied that the 'structure' is consistent with the axis-parallel nature of the discrepancy function. Consider for example a ${\sqrt{N} \times \sqrt{N} }$ lattice in ${[0,1)^2}$, and a very thin axis-parallel rectangle of the form ${[j/\sqrt{N}-\epsilon, j/\sqrt{N}+\epsilon]\times [0,1)}$ for some ${j<\sqrt {N}}$. This rectangle has area ${\sim \epsilon}$ and contains ${\sqrt{N}}$ points of the set ${\mathcal P_N}$. In discrepancy language this means that ${\|D_N\|_\infty \gtrsim \sqrt{N}}$. As we have already mentioned, one can achieve an ${L^\infty}$ norm of the order ${\log N}$, which shows that lattices are very far from being extremal for discrepancy. Any other configuration of points in which a very thin axis-parallel rectangle contains 'many' points of ${\mathcal P_N}$ will also be 'bad' for discrepancy.

3.2. Random sets are also bad.

Let ${X_1,\ldots,X_N}$ be chosen independently and uniformly in ${[0,1]^d}$ and set ${\mathcal P_N=\{X_1,\ldots,X_N\}}$. Consider the rectangle ${R=[0,\frac{1}{2})\times [0,1)}$. Then ${ \sharp (R \cap \mathcal P_N) \sim B(N,\frac{1}{2})}$, which implies that

$\displaystyle \mathop{\mathbb P} (\{ |\sharp R\cap \mathcal P _N -{N}/{2}|<\lambda \sqrt{N} \} )=O(\lambda).$

This in turn means that ${|\sharp R \cap \mathcal P_N -N|R| | \gtrsim \sqrt{N}}$ with high probability. Another way to express this is to observe that by the Central limit theorem we get that

$\displaystyle \frac{1}{N}\sum_{j=1} ^N f(X_j) = \int_{[0,1]^d}f(x)dx +O(N^{-\frac{1}{2}}).$
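
The binomial fluctuation above is easy to simulate. The following sketch is my own (the parameters are arbitrary); it estimates ${\mathbb E|\sharp (R\cap\mathcal P_N) - N/2|}$, which for a Binomial${(N,\frac{1}{2})}$ count is ${\approx\sqrt{N/2\pi}}$.

```python
import random, statistics

random.seed(1)
N, trials = 10_000, 200
devs = []
for _ in range(trials):
    # a point lands in R = [0, 1/2) x [0, 1) iff its x-coordinate is < 1/2
    count = sum(1 for _ in range(N) if random.random() < 0.5)
    devs.append(abs(count - N / 2))
print(statistics.mean(devs) / N ** 0.5)   # about 1/sqrt(2*pi) ≈ 0.40
```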

3.3. The Roth heuristic.

This expresses the empirical observation that extremal sets for discrepancy are such that each dyadic box of volume ${\sim 1/N}$ contains exactly one point of the distribution ${\mathcal P_N}$. A typical example that satisfies the Roth heuristic (and is extremal for discrepancy) is the van der Corput set of ${N=2^n}$ points: every dyadic box of volume ${2^{-n}}$ contains exactly one point of the van der Corput set. We will go into a detailed analysis of the van der Corput set in the subsequent paragraphs.

4. Lower bounds for discrepancy

As we have mentioned in the introduction, no matter how clever one is when choosing the point distribution ${\mathcal P_N}$, the discrepancy must always grow to infinity. The most typical instance of this principle is the following ${L^p}$ lower bound, which is essentially optimal:

Theorem 5 (Roth ’54) For all dimensions ${ d\geq 2}$ and all point distributions ${\mathcal P_N\subset [0,1)^d}$, we have that

$\displaystyle \|D_N\|_2 \gtrsim_d (\log N)^\frac{d-1}{2}.$

We will refer to the bound of Theorem 5 as the 'Roth bound'. We have the corresponding theorem in ${L^p}$, originally due to Schmidt:

Theorem 6 (Schmidt ’77) For all dimensions ${ d\geq 2}$ and all point distributions ${\mathcal P_N\subset [0,1)^d}$, we have that

$\displaystyle \|D_N\|_p \gtrsim_{d,p} (\log N)^\frac{d-1}{2},$

for all ${1<p<\infty}$.

These theorems are sharp as to the order of magnitude. In particular we have the following theorem:

Theorem 7 (Davenport ’56) Let ${0<p<\infty}$ and ${d\geq 2}$. For any positive integer ${N}$, there exists a point distribution ${\mathcal A_N}$ such that

$\displaystyle \| D_N \|_p \lesssim_{d,p} (\log N)^\frac{d-1}{2}.$

Constructing point distributions that are extremal for discrepancy is a delicate matter, especially in high dimensions. For more extremal constructions see [Roth ’74, ’80], [Chen ’81].

Proof: We will work with dyadic rectangles ${R\subseteq [0,1]^d}$. Each dyadic rectangle ${R}$ is a tensor product

$\displaystyle R=R_1\times \cdots \times R_d ,$

where each side ${R_j}$, ${j=1,2,\ldots , d}$, is a dyadic interval in ${[0,1]}$. Each dyadic interval is divided by its midpoint into a left interval ${R_j ^{\mathrm{left}}}$ and a right interval ${R_j ^{\mathrm{right}}}$, which are also dyadic. The Haar function associated with the interval ${R_j}$ is

$\displaystyle h_{R_j}(x_j)= - \mathbf 1 _{R_j ^{\mathrm{left}}} (x_j)+ \mathbf 1_{R_j ^{\mathrm{right}}}(x_j),$

where ${x_j\in [0,1]}$. The Haar function associated with ${R}$ is just the corresponding tensor product

$\displaystyle h_R(x)=h_{R_1}(x_1)\times \cdots \times h_{R_d}(x_d),$

where ${x=(x_1,\ldots,x_d)\in[0,1]^d}$. Observe that we adopt an ${L^\infty}$ normalization of the Haar functions instead of the usual ${L^2}$ normalization so our formulas will look a bit different than the standard ones. For a non-negative integer ${n}$, we define

$\displaystyle \mathbb H_n :=\{\vec r \in \mathbb N_0 ^d: |\vec r |:=r_1+\cdots+r_d =n \}.$

Now a vector ${\vec r \in \mathbb H_n}$ defines a 'geometry' of dyadic rectangles inside ${[0,1]^d}$ as follows. Define

$\displaystyle \mathcal R_{\vec r} :=\{R=R_1\times \cdots\times R_d \ \ \mbox{dyadic in}\ \ [0,1]^d: |R_j|=2^{-r_j},\ 1\leq j \leq d\}.$

Thus for fixed ${\vec r\in\mathbb H_n}$, ${\mathcal R_{\vec r}}$ is a collection of disjoint dyadic rectangles in ${[0,1]^d}$, whose corresponding sides have the same lengths. The volume of each ${R\in\mathcal R_{\vec r}}$ is also fixed since ${|R|=2^{-|\vec r |}}$.

For the proof of Roth’s bound, let us now fix a nonnegative integer ${n}$ such that ${2N< 2^n \leq 4N}$, so that ${n\simeq \log N}$. We have that

$\displaystyle \begin{array}{rcl} \|D_N\|_2 ^2& \geq& \sum_{R \ \mathrm{dyadic} } \frac{\langle D_N,h_R \rangle^2}{|R|} \\ \\ &\geq &\sum_{ \substack {R \ \mathrm{dyadic} \\ |R|=2^{-n} \\ R\cap\mathcal P_N=\emptyset } } \frac{\langle D_N,h_R \rangle^2}{|R|} \\ \\ &=&\sum_{\vec r \in \mathbb H_n} \sum_{\substack{R\in\mathcal R_{\vec r}\\ R\cap\mathcal P_N=\emptyset}} \frac{\langle D_N,h_R \rangle^2}{|R|}. \end{array}$

Now let us call 'good' the collection of rectangles ${\mathcal R_{\vec r} ^{\mathrm{good}}:=\{R \in \mathcal R_{\vec r}: \ R\cap \mathcal P_N=\emptyset\}}$. Thus good rectangles are dyadic rectangles of fixed geometry defined by ${\vec r}$ which do not contain points of ${\mathcal P_N}$. It is not hard to see that if a rectangle ${R}$ contains no points of ${\mathcal P_N}$ then

$\displaystyle \langle \mathcal C_N, h_R\rangle = 0.$

Furthermore, a simple calculation shows that we always have

$\displaystyle \langle \mathcal L_N, h_R\rangle = N\frac{|R|^2}{4^d}.$

Thus if ${R\in\mathcal R_{\vec r} ^{\mathrm {good}}}$ we conclude that

$\displaystyle |\langle D_N,h_R \rangle| \simeq_d N |R|^2\simeq _d\frac{1}{N}.$
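
The identity ${\langle \mathcal L_N, h_R\rangle = N|R|^2/4^d}$ can be sanity-checked numerically. Below is a small sketch of mine for ${d=2}$ (all names are hypothetical): the midpoint rule is exact here, because on each quadrature cell the integrand is a bilinear function times a constant sign of ${h_R}$ (with ${M}$ even no cell straddles a midpoint of ${R}$).

```python
def haar_1d(a, h, t):
    """L^infinity-normalized Haar function of the interval [a, a + h)."""
    if not (a <= t < a + h):
        return 0.0
    return -1.0 if t < a + h / 2 else 1.0

def inner_product_linear_part(N, a, hx, b, hy, M=400):
    """Midpoint-rule quadrature of <L_N, h_R> = ∫∫ N x y h_R(x, y) dx dy."""
    s = 0.0
    for i in range(M):
        x = a + hx * (i + 0.5) / M
        for j in range(M):
            y = b + hy * (j + 0.5) / M
            s += N * x * y * haar_1d(a, hx, x) * haar_1d(b, hy, y)
    return s * hx * hy / (M * M)

N, (a, hx), (b, hy) = 64, (0.25, 0.25), (0.5, 0.125)
lhs = inner_product_linear_part(N, a, hx, b, hy)
rhs = N * (hx * hy) ** 2 / 16       # N |R|^2 / 4^d with d = 2
print(lhs, rhs)                     # the two values agree
```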

Furthermore, the pigeonhole principle gives

$\displaystyle (|\mathcal R_{\vec r} ^{\mathrm{good}}|+ |\mathcal R_{\vec r} ^{\mathrm{bad}}|)2^{-n}=1\Rightarrow |\mathcal R_{\vec r} ^{\mathrm{good}}|= 2^n -|\mathcal R _{\vec r} ^{\mathrm{bad}}| >2^n-N \gtrsim N.$

We thus get

$\displaystyle \begin{array}{rcl} \|D_N\|_2 ^2 &\gtrsim_{d} &\sum_{\vec r \in \mathbb H_n} \sum_ {R\in \mathcal R_{\vec r} ^{\mathrm{good}} }\frac{1}{N} \\ \\ &\simeq_d& \frac{1}{N}\sum_{\vec r \in\mathbb H_n} | \mathcal R_{\vec r} ^{\mathrm{good}}| \gtrsim_d \frac{1}{N} \sum_{\vec r \in\mathbb H_n } N =|\mathbb H_n|\simeq_d n^{d-1}.\end{array}$

For the last estimate, observe that there are roughly ${n}$ choices for each of the first ${d-1}$ entries of ${\vec r \in\mathbb H_n}$, while the last entry is then determined. $\Box$

The proof of Theorem 6, that is, of the ${L^p}$ bound, is essentially the same if one knows the Littlewood-Paley inequality.

Proof: First we need a simple Lemma:

Lemma 8 Suppose that ${G_1,\ldots,G_N}$ are measurable subsets of a probability space ${(\Omega,\mathop{\mathbb P})}$ with ${\mathop{\mathbb P}(G_j)>\frac{1}{2}}$ for all ${j=1,2,\ldots, N}$. Then

$\displaystyle \mathop{\mathbb P}(\{ \sum_{j=1} ^N \mathbf 1_{G_j} >\frac{N}{4} \} )\geq \frac{1}{4}.$

We conclude that

$\displaystyle \big\| \sum_{j=1} ^N \mathbf 1_{G_j}\big\|_p \gtrsim N .$

Proof: Suppose, for the sake of contradiction, that the first conclusion fails, that is ${\mathop{\mathbb P}(\{\sum_{j=1} ^N \mathbf 1_{G_j}>\frac{N}{4}\})<\frac{1}{4}}$. Then we have

$\displaystyle \begin{array}{rcl} \frac{N}{2}<\sum_{j=1} ^N \mathop{\mathbb P}(G_j)=\int_\Omega \sum_{j=1} ^N \mathbf 1 _{G_j} \leq \frac{N}{4}+N\mathop{\mathbb P}(\{\sum_{j=1} ^N \mathbf 1_{G_j} >\frac{N}{4} \})<\frac{N}{2}, \end{array}$

which proves the first conclusion of the lemma. Now we immediately get that

$\displaystyle \begin{array}{rcl} \bigg\| \sum_{j=1} ^N \mathbf 1_{G_j} \bigg\|_p ^p \geq \frac{N^p}{4^p}\mathop{\mathbb P}(\{\sum_{j=1} ^N\mathbf 1_{G_j} >\frac{N}{4} \})\gtrsim_p N^p, \end{array}$

which is the second conclusion in the statement of the Lemma. $\Box$

Now let’s go back to the proof of the ${L^p }$-lower bound for the discrepancy function. Again, we fix a positive integer ${n}$ such that ${2N<2^n \leq 4N}$, so that ${n\simeq \log N}$. For ${\vec r \in \mathbb H_n}$ we define ${G_{\vec r} =\cup _{R\in\mathcal R_{\vec r} ^{\mathrm{good}}} R}$. Let us write ${\mathop{\mathbb P}}$ for the Lebesgue measure on ${[0,1]^d}$. We have that

$\displaystyle \begin{array}{rcl} \mathop{\mathbb P}(G_{\vec r }) &=&\sum _{R\in\mathcal R_{\vec r} ^{\mathrm{good}}}\mathop{\mathbb P}(R) \\ \\ &=& 1-\sum _{R\in\mathcal R_{\vec r} ^{\mathrm{bad}}}\mathop{\mathbb P}(R) \geq 1 - 2^{-n} |\mathcal R_{\vec r} ^\mathrm{bad}| \\ \\ &\geq& 1 - 2^{-n}N \geq 1-\frac{1}{2}=\frac{1}{2}, \end{array}$

by our choice of ${n}$. Using the Littlewood-Paley inequalities we have, for every ${1<p<\infty}$, that

$\displaystyle \begin{array}{rcl} \|D_N\|_p ^p \simeq_{d,p} \|S(D_N)\|_p ^p \geq \int_{[0,1]^d} \big| \sum_{\vec r \in\mathbb H_n} \sum_{R\in \mathcal R_{\vec r} ^\mathrm{good}}\frac{\langle D_N, h_R \rangle ^2}{|R|^2} \mathbf 1_{R}(x) \big|^\frac{p}{2}dx. \end{array}$

Here ${S(D_N)}$ is the Littlewood-Paley square function associated with ${D_N}$. As observed before, for the 'good' rectangles ${R}$ we have the estimate ${ \langle D_N,h_R\rangle \simeq_d N |R|^2}$, which means that ${{\langle D_N, h_R \rangle ^2}/{|R|^2}\gtrsim 1}$. Since, for fixed ${\vec r}$, all the rectangles ${R}$ in ${\mathcal R_{\vec r} ^\mathrm{good}}$ are disjoint, we can rewrite this as

$\displaystyle \begin{array}{rcl} \|D_N\|_p ^p \gtrsim_{d,p} \int_{[0,1]^d} \big| \sum_{\vec r \in\mathbb H_n} \mathbf 1_{G_{\vec r}}(x) \big|^\frac{p}{2}dx. \end{array}$

Finally, since ${\mathop{\mathbb P}(G_{\vec r})\geq \frac{1}{2}}$, we can use Lemma 8 to get

$\displaystyle \begin{array}{rcl} \|D_N\|_p ^p \gtrsim_{d,p} |\mathbb H_n|^\frac{p}{2}\simeq_{d,p} n^{(d-1)\frac{p}{2}}, \end{array}$

which is the desired estimate. $\Box$

While the ${L^p }$-bounds for ${1<p<\infty}$ are very well understood and are known to be sharp, the endpoints ${p=1}$ and ${p=\infty}$ are much harder, with definitive information available only in two dimensions. Even in this case, the techniques required to prove the corresponding theorems are more involved than the ${L^p}$ bounds we just described. We have:

Theorem 9 (Schmidt ’72) In dimension ${d=2}$ we have

$\displaystyle \|D_N\|_\infty \gtrsim \log N.$

That is, there is a ${\sqrt{\log N}}$ jump over the average case bound.

Theorem 10 (Halász ’81) In dimension ${d=2}$ we have

$\displaystyle \|D_N\|_1 \gtrsim (\log N) ^\frac{1}{2}.$

Because of Davenport’s construction in Theorem 7 (as well as other constructions due to several different people), Theorem 10 is known to be sharp. For the ${L^\infty}$ case we have constructions due to Halton:

Theorem 11 (Halton ’60) For any dimension ${d\geq 2}$ there exist point distributions with

$\displaystyle \|D_N\|_\infty \lesssim_d (\log N)^{d-1}.$

For dimensions ${d\geq 3}$ there is a long-standing conjecture:

Conjecture 12 (Beck, Chen) For any dimension ${d\geq 2}$ we have that

$\displaystyle \|D_N\|_1 \gtrsim_d (\log N)^\frac{d-1}{2}.$

Observe that, because of Davenport’s construction (Theorem 7), Conjecture 12 is the best possible result one can hope for; that is, if true, it will be sharp. For the ${L^\infty}$ bound, however, it is not even clear what the right lower bound should be. We just mention one of the conjectures, due to Michael Lacey:

Conjecture 13 (Lacey) For any dimension ${d\geq 2}$ we have that

$\displaystyle \|D_N\|_\infty \gtrsim_d (\log N) ^\frac{d}{2},$

that is, we have a ${(\log N)^\frac{1}{2}}$ jump over the average case bound.

At the moment, we do not have any matching upper bounds for Conjecture 13. The only result pointing in that direction states that we have at least some improvement over the average case bound:

Theorem 14 (Bilyk, Lacey, Vagharshakyan) For all dimensions ${d\geq 3 }$ we have that

$\displaystyle \|D_N\|_\infty \gtrsim_d (\log N)^{\frac{d-1}{2}+\eta(d) },$

where ${\eta(d)}$ is a function of ${d}$ tending to ${0}$ as ${d\rightarrow\infty.}$

In our recent work with Bilyk, Lacey and Vagharshakyan, we focus on the two dimensional case and prove results that smoothly interpolate between the ${L^p}$ and the ${L^\infty}$ norms. The lower bounds we prove are more or less consequences of the Roth bound; the main point is that these results are shown to be sharp:

Theorem 15 (Bilyk, Lacey, P, Vagharshakyan) (i) For dimension ${d=2}$ we have that

$\displaystyle \|D_N\|_{\mathrm{exp} (L^\alpha)} \gtrsim (\log N)^{1-\frac{1}{\alpha}} ,$

for all ${\alpha\geq 2}$, and this is sharp.

(ii) For dimension ${d=2}$ we have that

$\displaystyle \|D_N\|_{\mathrm{BMO}} \gtrsim (\log N)^\frac{1}{2} ,$

and this is sharp.

The space ${\mathrm{exp}(L^\alpha)}$ is the exponential Orlicz space. Without going into a lot of details about its formal definition let’s just note here that we have

$\displaystyle L^\infty \subset \mathrm{exp}(L^\alpha) \subset L^p,$

for all ${\alpha \geq 2 }$ and ${1\leq p <\infty}$, so the first part of the theorem interpolates between the bounds of Roth (Theorem 5) and Schmidt (Theorem 9).

The space BMO is the dyadic BMO space defined by Chang and Fefferman. This space consists of all square integrable functions ${f}$ in the linear span of ${\{h_R:R \mbox{ dyadic in } \ [0,1]^2\}}$ for which we have

$\displaystyle \|f\|_{\mathrm{BMO}}:= \sup_{U\subset[0,1]^2} \bigg[ \frac{1}{|U|}\sum_{ \substack {R\subset U \\ R \ \mathrm{ dyadic}} } \frac{\langle f,h_R \rangle^2}{|R|} \bigg]^\frac{1}{2}<\infty.$

Here the supremum is over all measurable subsets of ${[0,1]^2}$, not just rectangles. It is useful to recall the simple observation that the BMO norm is insensitive to functions that are constant in either the vertical or the horizontal direction (or both), since the Haar functions have zero mean in each direction.

In this talk we will focus on the BMO result which is technically easier to analyze. In fact, the lower bound for the BMO norm of the Discrepancy function is just a repetition of the proof that gave us the Roth bound. The main interest is that this is sharp. In order to describe the point distribution that satisfies

$\displaystyle \|D_N\|_{\mathrm {BMO}} \lesssim (\log N)^\frac{1}{2},$

we need to take a closer look at the van der Corput set.

5. The van der Corput set (vdC).

The van der Corput set of ${N=2^n}$ points is defined as

$\displaystyle \mathcal P_N=\{(\frac{t_1}{2}+\cdots+\frac{t_n}{2^n}+\frac{1}{2^{n+1}},\frac{t_n}{2}+\cdots+\frac{t_1}{2^n}+\frac{1}{2^{n+1}} ):t_j\in\{0,1\}, \ j=1,2,\ldots, n\}.$

Abusing notation, we can rewrite the points of the van der Corput set in their binary expansion as

$\displaystyle \mathcal P_N =\{(0.t_1t_2\ldots t_n 1,0.t_nt_{n-1}\ldots t_1 1): t_j\in\{0,1\}, \ j=1,2,\ldots,n\}.$

If we order the ${x}$-coordinates of the set in increasing order,

$\displaystyle X_1<X_2<\cdots<X_{2^n},$

we can also write

$\displaystyle X_j=\frac{j-1}{2^n}+\frac{1}{2^{n+1}}, \ j=1,2,\ldots,2^n.$

In order to find the ${Y}$-coordinate that corresponds to a given ${X}$, write ${X}$ in its binary expansion and then reverse its first ${n}$ binary digits. Note that the ${(n+1)}$-st digit is always ${1}$, the way we have set things up.
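
The bit-reversal description translates directly into code. Here is a short sketch of mine (with hypothetical names) that generates the van der Corput set from the digits ${t_1,\ldots,t_n}$ and checks the formula for the ordered ${x}$-coordinates:

```python
def van_der_corput(n):
    """The van der Corput set of N = 2^n points via bit reversal."""
    points = []
    for j in range(2 ** n):                 # j encodes the digits t_1 ... t_n
        bits = [(j >> (n - 1 - i)) & 1 for i in range(n)]
        x = sum(t / 2 ** (i + 1) for i, t in enumerate(bits)) + 1 / 2 ** (n + 1)
        y = sum(t / 2 ** (i + 1) for i, t in enumerate(reversed(bits))) + 1 / 2 ** (n + 1)
        points.append((x, y))
    return points

P = van_der_corput(3)              # 8 points
xs = sorted(p[0] for p in P)
print(xs)                          # X_j = (j - 1)/2^n + 1/2^{n+1}, j = 1..2^n
```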

Claim 1 The van der Corput set is a binary net. Each dyadic rectangle of volume ${2^{-n}}$ contains exactly one point of the van der Corput set.

Indeed, let us consider a generic dyadic rectangle of volume ${2^{-n}}$:

$\displaystyle R=\bigg[\frac{j}{2^k},\frac{j+1}{2^k}\bigg]\times\bigg[\frac{m}{2^l},\frac{m+1}{2^l}\bigg],$

where ${j\leq 2^k -1 }$, ${m\leq 2^l -1}$ and ${k+l=n}$. Let ${p=(u,v)}$ be a point of the van der Corput set, ${u=0.u_1\ldots u_n1}$, ${v=0.u_nu_{n-1}\ldots u_1 1}$. If ${(u,v)}$ belongs to ${R}$, the first ${k}$ binary digits of ${u}$ must be the same as the corresponding binary digits of ${j/2^k}$, while the first ${l}$ binary digits of ${v}$ must coincide with the first ${l}$ binary digits of ${m/2^l}$. Since ${k+l=n}$, this uniquely determines a point of the van der Corput set.
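
The net property of Claim 1 can also be verified by brute force for small ${n}$. The following sketch is my own; it regenerates the set and tests every dyadic rectangle of volume ${2^{-n}}$.

```python
def vdc(n):
    """The van der Corput set with N = 2^n points."""
    pts = []
    for j in range(2 ** n):
        bits = [(j >> (n - 1 - i)) & 1 for i in range(n)]
        x = sum(t / 2 ** (i + 1) for i, t in enumerate(bits)) + 1 / 2 ** (n + 1)
        y = sum(t / 2 ** (i + 1) for i, t in enumerate(reversed(bits))) + 1 / 2 ** (n + 1)
        pts.append((x, y))
    return pts

def is_net(pts, n):
    """Does every dyadic box of volume 2^{-n} contain exactly one point?"""
    for k in range(n + 1):            # side lengths 2^{-k} by 2^{-(n-k)}
        l = n - k
        for j in range(2 ** k):
            for m in range(2 ** l):
                inside = sum(1 for (x, y) in pts
                             if j / 2 ** k <= x < (j + 1) / 2 ** k
                             and m / 2 ** l <= y < (m + 1) / 2 ** l)
                if inside != 1:
                    return False
    return True

print(is_net(vdc(4), 4))   # True: the vdC set is a binary net
```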

The van der Corput set thus agrees with the Roth heuristic (one point per box); it turns out that the vdC set is extremal for discrepancy:

Theorem 16 For the vdC set ${\mathcal V_n}$, consisting of ${N=2^n}$ points, we have that

$\displaystyle \|D_N\|_\infty \lesssim \log N.$

Proof: First observe that we have the equivalent definition of ${L^\infty}$-discrepancy:

$\displaystyle \|D_N\|_\infty \simeq \sup_{R} |\sharp R\cap\mathcal V_n - N|R||,$

where the supremum is over all rectangles ${R}$ in ${[0,1]^2}$.

For any ${x\in[0,1)}$ consider a 'strip' ${L_x}$ of the form ${L_x=[0,x)\times[\frac{j}{2^q},\frac{j+1}{2^q}]}$. Points of the vdC set inside ${L_x}$ have ${x}$-coordinates with their last ${q}$ binary digits fixed, so they can be written in the form

$\displaystyle 0.x_1x_2\ldots x_{n-q} X_{n-q+1}\ldots X_n1 .$

As a consequence, there are ${2^{n-q}}$ such points inside ${L_x}$, equally spaced with step ${2^{q-n}}$. Subdividing the strip ${L_x}$ into boxes ${B}$ of length ${2^{q-n}}$ in the ${x}$-direction, we see that each box ${B}$ has volume ${2^{-n}}$ and contains exactly one point of the vdC set. Thus these boxes have discrepancy zero. There is (possibly) one last box that might not contain a point, or might have smaller volume (depending on ${x}$), but for this last box we can bound its discrepancy by ${1}$. Overall we get that the strip ${L_x}$ has discrepancy at most ${1}$.

Now for any ${(x,y)\in[0,1)^2}$, consider ${k}$ such that ${2^{-n}k\leq y < 2^{-n}(k+1)}$. Then write the box ${[0,x)\times [0,y)}$ as ${[0,x)\times [0,2^{-n}k)\cup M}$, where ${M=[0,x)\times[2^{-n}k,y)}$. The discrepancy of the box ${M}$ is at most ${1}$, since ${M}$ has volume at most ${2^{-n}}$ and contains at most one point of the vdC set (points of the vdC set have ${y}$-coordinates which differ by at least ${2^{-n}}$). Since any interval ${[0,2^{-n}k)}$ can be written as a disjoint union of at most ${n}$ dyadic intervals, the box ${[0,x)\times [0,2^{-n}k)}$ can be written as a union of at most ${n}$ strips of the form ${L_x}$. Summing up we get that

$\displaystyle \|D_N\|_\infty \lesssim \sharp \{\mathrm {strips}\ L_x \}+1\lesssim n+1 \simeq \log N .$

The claim that any interval ${[0,k2^{-n}]}$ can be written as a disjoint union of at most ${n}$ dyadic intervals can be proved by induction on ${n}$. Indeed for ${n=1}$ this is obvious. Now suppose we know the claim for ${{n-1}}$ and consider any interval of the form ${[0,k2^{-n}]}$. This can be written in the form ${[0,\frac{k}{2}2^{-(n-1)}]}$ so if ${k}$ is even, by the induction hypothesis we are done. Suppose now that ${k}$ is odd. Then we can write

$\displaystyle [0,k2^{-n}]=[0,(k-1)2^{-n}]\cup[(k-1)2^{-n},k2^{-n}],$

and since now ${k-1}$ is even we can use the inductive hypothesis again to get a total of (at most) ${n}$ dyadic intervals. $\Box$
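
To illustrate Theorem 16 numerically, the sketch below (mine; the grid-based supremum only approximates the true ${L^\infty}$ norm from below) evaluates the discrepancy of the vdC set at a grid of anchor points:

```python
def vdc(n):
    """The van der Corput set with N = 2^n points."""
    pts = []
    for j in range(2 ** n):
        bits = [(j >> (n - 1 - i)) & 1 for i in range(n)]
        x = sum(t / 2 ** (i + 1) for i, t in enumerate(bits)) + 1 / 2 ** (n + 1)
        y = sum(t / 2 ** (i + 1) for i, t in enumerate(reversed(bits))) + 1 / 2 ** (n + 1)
        pts.append((x, y))
    return pts

def sup_discrepancy(pts, grid):
    """max over grid anchors x of |#(P ∩ [0, x)) - N |[0, x)||."""
    N, worst = len(pts), 0.0
    for i in range(1, grid + 1):
        for k in range(1, grid + 1):
            x, y = i / grid, k / grid
            count = sum(1 for (px, py) in pts if px < x and py < y)
            worst = max(worst, abs(count - N * x * y))
    return worst

for n in (4, 6, 8):
    print(n, sup_discrepancy(vdc(n), 32))   # stays of size comparable to n = log2 N
```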

6. The BMO norm of the discrepancy of the van der Corput set.

In this section, we set ourselves the task to prove the following theorem:

Theorem 17 (B,L,P,V) For the vdC set we have

$\displaystyle \|D_N\|_{\mathrm{BMO}}\lesssim \sqrt{\log N}.$

The heart of the proof is the following Lemma that gives estimates for the Haar coefficients of the van der Corput set:

Lemma 18 (Basic Lemma) For the van der Corput set, we have the basic estimate

$\displaystyle |\langle D_N,h_R\rangle| \lesssim \frac{1}{N}.$

Let us postpone the proof of the Lemma for now because it is quite technical and see how we can use it in order to prove Theorem 17.

Proof: Recall that the BMO norm of the discrepancy function is given by

$\displaystyle \|D_N\|_{\mathrm{BMO}} = \sup_{U} \bigg[ \frac{1}{|U|}\sum_{R\subset U} \frac{|\langle D_N,h_R\rangle|^2}{|R|} \bigg]^\frac{1}{2}.$

Here, ${U\subset [0,1]^2}$ is any measurable set. We consider two cases:

Big Volume Rectangles ${|R|\geq 2^{-n}}$: We have in this case that

$\displaystyle \begin{array}{rcl} \|D_N\|_{\mathrm{BMO}}^2 \lesssim \frac{1}{|U|}\sum_{k=0} ^n \sum_{\vec r \in\mathbb H_k} \sum _{\substack{R\in\mathcal R_{\vec r} \\ R\subset U}} \frac{1}{N^2} 2^k, \end{array}$

where we have already used the basic Lemma to estimate the Haar coefficients of the discrepancy function. Now for fixed ${\vec r}$, the rectangles in ${\mathcal R_{\vec r}}$ are disjoint, and we are considering only the ones contained in ${U}$. Since each rectangle has fixed volume ${2^{-k}}$, there are at most ${2^k|U|}$ of them. Overall we get

$\displaystyle \begin{array}{rcl} \|D_N\|_{\mathrm{BMO}}^2 & \lesssim \frac{1}{|U|N^2}\sum_{k=0} ^n \sum_{\vec r \in\mathbb H_k} (2^k)^2|U| \\ \\ & \lesssim \frac{1}{N^2}\sum_{k=0} ^n k (2^k)^2 \\ \\ & \lesssim \frac{1}{N^2} n(2^n)^2=n\simeq \log N. \end{array}$

Small Volume Rectangles ${|R|< 2^{-n}}$: Here we treat the 'linear part' and the 'counting part' separately, using just the triangle inequality. We can get away with this because in the small volume case there is no significant cancellation between the linear and the counting parts.

For the linear part we have the estimate

$\displaystyle \begin{array}{rcl} &\lesssim& \frac{1}{|U|}\sum_{k=n+1} ^\infty \sum_{\vec r \in \mathbb H_k} \sum _{\substack{R\in\mathcal R_{\vec r} \\ R\subset U}} N^2 |R|^3\\ \\ & \lesssim& \frac{1}{|U|} \sum_{k=n+1} ^\infty \sum_{\vec r \in \mathbb H_k} N^2 (2^{-k})^2 |U| \\ \\ &\lesssim& N^2 \sum_{k=n+1} ^\infty k (2^{-k})^2\simeq N^2 n (2^{-n})^2=n\simeq \log N. \end{array}$

The counting part is a bit more involved. We let ${\mathcal M}$ be the family of maximal dyadic rectangles ${R}$ in ${U}$ which satisfy ${\langle \mathcal C_N,h_R\rangle \neq 0}$. Remember we are always in the small volume case, so these rectangles ${R}$ also satisfy ${|R|<2^{-n}}$. Also, the non-zero inner product with the Haar function corresponding to ${R}$ implies that ${R}$ must contain at least one point of our set in its interior. However, the van der Corput set is a dyadic net at the scale ${2^{-n}}$, so each such ${R}$ contains at most one point. This means that every ${R\in\mathcal M}$ contains exactly one point of the van der Corput set. Thus the counting part contribution to the BMO norm can be estimated as

$\displaystyle \begin{array}{rcl} \lesssim \frac{1}{|U|}\sum_{R\in\mathcal M} \sum_{R'\subset R}\frac{\langle \mathbf 1_{[\vec p,\vec 1]},h_{R'}\rangle^2 }{|R'|}, \end{array}$

where ${\vec p}$ is the unique point of the vdC set contained in ${R\in\mathcal M}$. Using Bessel's inequality in the space ${L^2(R)}$ we get

$\displaystyle \begin{array}{rcl} \sum_{R'\subset R}\frac{\langle \mathbf 1_{[\vec p,\vec 1]},h_{R'}\rangle^2 }{|R'|}\leq \|\mathbf 1_{[\vec p,\vec 1]}\|_{L^2(R)} ^2 \leq |R|. \end{array}$

Combining these estimates we get that the counting part contribution is bounded by

$\displaystyle \begin{array}{rcl} \lesssim \frac{1}{|U|}\sum_{R\in\mathcal M} |R|=\frac{1}{|U|}\sum_{\ell =0} ^n \sum_{\substack{R=R_x\times R_y \in\mathcal M\\ |R_x|=2^{-\ell}}}|R|. \end{array}$

In the inner sum we only consider rectangles ${R}$ with side length ${|R_x| \geq 2^{-n}}$. This is justified by the fact that these rectangles are dyadic and contain one point of the vdC set in their interior: the points of the vdC set have ${x}$-coordinates which are odd multiples of ${2^{-(n+1)}}$, and such a point lies in the interior of a dyadic interval only if that interval has length at least ${2^{-n}}$.

For ${|R_x|}$ fixed, the rectangles ${R}$ in the family ${\mathcal M}$ with side ${R_x}$ of that length have to be disjoint, since they are dyadic and maximal. Since they are all contained in ${U}$, the inner sum can be estimated by

$\displaystyle \begin{array}{rcl} \sum_{\substack{R=R_x\times R_y \in\mathcal M\\ |R_x|=2^{-l}}}|R|\lesssim |U|. \end{array}$

Overall we get that the counting part can be estimated by

$\displaystyle \frac{1}{|U|} \sum_{\ell=0} ^n |U| \simeq n\simeq \log N,$

which is the desired estimate (squared).$\Box$

For the proof of our basic lemma we need some auxiliary definitions that actually come in handy in several calculations and considerations of the van der Corput set. First of all, for ${x\in[0,1]}$ we define ${{\mathrm d}_i(x)}$ to be the ${i}$-th digit in the binary expansion of ${x}$. Abusing notation again, if ${x\in[0,1)}$ has binary expansion ${0.x_1\ldots x_n 1}$ this means for example that ${{\mathrm d}_1(x)=x_1}$ and so on. Now for a ${\sigma\in[0,1]}$ which has ${n}$ binary digits, we define the digit scrambling or digital shift as

$\displaystyle \begin{array}{rcl} {\mathrm d}_i(x\oplus \sigma):= {\mathrm d}_i(x)+{\mathrm d}_i(\sigma) \ \ \mathrm{mod}\ 2, \quad i=1,2,\ldots,n. \end{array}$

That is, the ${i}$-th digit of ${x}$ changes if ${{\mathrm d}_i(\sigma)=1}$ and stays the same if ${{\mathrm d}_i(\sigma)=0}$. We also need the auxiliary function ${\phi:\mathbb R \rightarrow \mathbb R}$ defined as

$\displaystyle \begin{array}{rcl} \phi(x)=\begin{cases} \{x\},& \ \mbox{if} \ \{x\}\leq \frac{1}{2},\\ 1-\{x\},& \ \mbox{if} \ \frac{1}{2}\leq\{x\}<1,\end{cases} \end{array}$

where ${\{x\}}$ is the fractional part of ${x}$. Observe that ${\phi}$ is a ${1}$-periodic function. We have the following proposition which contains the basic properties of the function ${\phi}$ that we will use in what follows.
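As a small computational aside (not part of the original argument), the digital shift and the function ${\phi}$ can be implemented exactly with rational arithmetic; the names `digits`, `digital_shift` and `phi` below are mine, and the convention that digits beyond the ${n}$-th are left untouched is an assumption consistent with the definition above.

```python
from fractions import Fraction

def digits(x, n):
    """First n binary digits of x in [0, 1)."""
    x = Fraction(x) % 1
    out = []
    for _ in range(n):
        x *= 2
        out.append(int(x >= 1))
        x %= 1
    return out

def digital_shift(x, sigma, n):
    """x XOR sigma on the first n binary digits; digits beyond n are unchanged."""
    dx, ds = digits(x, n), digits(sigma, n)
    tail = Fraction(x) % 1 - sum(Fraction(d, 2 ** (i + 1)) for i, d in enumerate(dx))
    return tail + sum(Fraction(dx[i] ^ ds[i], 2 ** (i + 1)) for i in range(n))

def phi(x):
    """Distance from x to the nearest integer: the 1-periodic tent function above."""
    f = Fraction(x) % 1
    return min(f, 1 - f)

# 0.011 XOR 0.100 = 0.111 in binary, i.e. 3/8 shifted by 1/2 gives 7/8
assert digital_shift(Fraction(3, 8), Fraction(1, 2), 3) == Fraction(7, 8)
assert phi(Fraction(5, 8)) == Fraction(3, 8)
```

Working over `Fraction` rather than floats keeps every dyadic rational exact, which matters below when we check identities that hold with equality.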

Proposition 19 Let ${R}$ be a dyadic rectangle of the form

$\displaystyle R=[\frac{j}{2^k},\frac{j+1}{2^k})\times[\frac{m}{2^\ell},\frac{m+1}{2^\ell}),$

where ${j\leq 2^k-1}$ and ${m\leq 2^\ell -1 }$. Let ${p\in[0,1)^2}$. Then

$\displaystyle \begin{array}{rcl} \langle \mathbf 1_{[\vec p, \vec 1)},h_R\rangle=\begin{cases} |R|\phi(2^kp_x)\phi(2^\ell p_y), \ &p\in R \\ 0, \ &\mbox{otherwise}.\end{cases} \end{array}$

Furthermore, for every ${x\in{\mathbb R}}$ we have that

$\displaystyle \begin{array}{rcl} \phi(x)+\phi(x\oplus\frac{1}{2})=\frac{1}{2}. \end{array}$

Finally, ${\phi}$ is a Lipschitz function in the sense that for every ${x,y\in{\mathbb R}}$,

$\displaystyle \begin{array}{rcl} |\phi(y)-\phi(x)|\leq |\{x\}-\{y\}|. \end{array}$
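Both properties in Proposition 19 are easy to verify numerically; here is a quick sanity check (again, not part of the original proof), using the fact that shifting by ${\sigma=\frac12}$ flips the first binary digit, which is the same as adding ${\frac12}$ modulo ${1}$.

```python
from fractions import Fraction

def phi(x):
    """phi(x) = {x} if {x} <= 1/2, else 1 - {x}."""
    f = Fraction(x) % 1
    return min(f, 1 - f)

# phi(x) + phi(x XOR 1/2) == 1/2: XOR with 1/2 flips the first binary digit,
# which equals adding 1/2 mod 1.
for k in range(256):
    x = Fraction(k, 256)
    assert phi(x) + phi((x + Fraction(1, 2)) % 1) == Fraction(1, 2)

# Lipschitz property: |phi(y) - phi(x)| <= |{x} - {y}|.
for j in range(64):
    for k in range(64):
        x, y = Fraction(j, 64), Fraction(k, 64)
        assert abs(phi(y) - phi(x)) <= abs(x - y)
```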

We can now give the proof of Lemma 18:

Proof: Let ${R}$ be a dyadic rectangle of the form

$\displaystyle R=[\frac{j}{2^k},\frac{j+1}{2^k})\times[\frac{m}{2^\ell},\frac{m+1}{2^\ell}).$

We write ${\mathcal V_n}$ for the van der Corput set of ${N=2^n}$ points. We consider again two cases depending on the volume of the rectangles:

Small volume rectangles ${|R|<4/N}$: Here we have that ${k+\ell \geq n-1}$. This means that the rectangle ${R}$ can contain at most ${2}$ points of the van der Corput set in its interior. That is because there is at most one free digit in the binary expansion of the points of the vdC set contained in ${R}$, and for this digit there are two choices. Thus we have

$\displaystyle \begin{array}{rcl} |\langle D_N,h_R \rangle| \leq N|R|^2 + \sum_{\substack{p \in\mathcal V_n \\p\in R}} |R| \phi(2^kp_x)\phi(2^\ell p_y)\lesssim\frac{1}{N}. \end{array}$

Large volume rectangles ${|R|\geq 4/N}$: Here we have that ${n-(k+\ell) \geq 2}$, which means that there are at least ${4}$ points of ${\mathcal V_n}$ in the rectangle ${R}$; in fact there are exactly ${2^{n-(k+\ell)}=N|R|}$ such points. In more detail, let us consider ${(x,y)\in R\cap \mathcal V_n}$. Then

$\displaystyle x=0.x_1x_2\ldots x_k *\cdots * x_{n-\ell+1}\ldots x_n 1,$

for the ${x}$-coordinate of a point in ${R\cap\mathcal V_n}$. The first ${k}$ and the last ${\ell}$ binary digits of ${x}$ are fixed because ${(x,y)\in R}$. Now we group the points in quadruples ${Q}$ according to their ${x}$-coordinates:

$\displaystyle \begin{array}{rcl} &0.x_1\ldots x_k \ 0 \ x_{k+2} \ldots x_{n-\ell-1} \ 0\ x_{n-\ell+1}\ldots x_n 1,\\ &0.x_1\ldots x_k \ 0 \ x_{k+2} \ldots x_{n-\ell-1} \ 1 \ x_{n-\ell+1}\ldots x_n 1,\\ &0.x_1\ldots x_k \ 1 \ x_{k+2} \ldots x_{n-\ell-1} \ 0 \ x_{n-\ell+1}\ldots x_n 1,\\ &0.x_1\ldots x_k \ 1\ x_{k+2} \ldots x_{n-\ell-1} \ 1\ x_{n-\ell+1}\ldots x_n 1. \end{array}$

There are ${M=2^{n-(k+\ell)-2}= \frac{N|R|}{4}}$ such quadruples ${Q_1,\ldots, Q_M}$. We estimate

$\displaystyle \begin{array}{rcl} \langle D_N,h_R\rangle&=&\sum_{r=1} ^M |R| \sum_{p\in Q_r} \phi(2^k p_x)\phi(2^\ell p_y)-\frac{N|R|^2}{16}\\ &=&\sum_{r=1} ^M |R|\bigg[ \sum_{p\in Q_r} \phi(2^k p_x)\phi(2^\ell p_y) -\frac{1}{4}\bigg]=:\sum_{r=1} ^M|R|\Big(\Sigma_r - \frac{1}{4}\Big). \end{array}$

Observe that if ${(u,v)}$ is any one of the four points in a quadruple ${Q_r}$, then the quadruple can be described by the points

$\displaystyle \begin{array}{rcl} &(u,v),\\& (u\oplus\frac{1}{2^{k+1}},v\oplus\frac{1}{2^{n-k}}),\\ & (u\oplus\frac{1}{2^{n-\ell}},v\oplus \frac{1}{2^{\ell +1}}),\\& (u\oplus\frac{1}{2^{k+1}}\oplus\frac{1}{2^{n-\ell}},v\oplus\frac{1}{2^{n-k}}\oplus\frac{1}{2^{\ell+1}}). \end{array}$

Using this representation and the properties of the function ${\phi}$ we can calculate

$\displaystyle \begin{array}{rcl} \Sigma_r=\frac{1}{4}+[\phi(2^k u)-\phi(2^k(u\oplus 2^{-n+\ell }))][\phi(2^\ell v)-\phi(2^\ell( v\oplus 2^{-n+k}))], \end{array}$

which by the Lipschitz property of ${\phi}$ implies that

$\displaystyle |\Sigma_r-\frac{1}{4}|\leq (2^{-n+k+\ell})^2 =\frac{1}{N^2|R|^2}.$

We conclude that

$\displaystyle |\langle D_N,h_R\rangle|\leq \sum_{r=1} ^M|R|\big|\Sigma_r - \frac{1}{4}\big| \leq \frac{M}{|R|N^2}\lesssim \frac{1}{N}.$

This completes the proof of the lemma and thus the proof of the BMO estimate. $\Box$
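As an end-to-end sanity check of the lemma (a numerical experiment of mine, under the same conventions as before: bit-reversed van der Corput set with trailing digit ${1}$, and the Haar coefficient computed via the formulas of Proposition 19 together with ${\langle \mathcal L_N,h_R\rangle = N|R|^2/16}$), one can evaluate ${\langle D_N,h_R\rangle}$ exactly for every large-volume dyadic rectangle and confirm the ${\lesssim 1/N}$ bound:

```python
from fractions import Fraction

n = 6
N = 2 ** n

def bit_reverse(m, width):
    return int(format(m, f'0{width}b')[::-1], 2)

tail = Fraction(1, 2 ** (n + 1))
V = [(Fraction(m, N) + tail, Fraction(bit_reverse(m, n), N) + tail)
     for m in range(N)]

def phi(x):
    f = Fraction(x) % 1
    return min(f, 1 - f)

def haar_coeff(k, l, j, m):
    """<D_N, h_R>: each point p in R contributes |R| phi(2^k p_x) phi(2^l p_y)
    (Proposition 19); the linear part contributes N |R|^2 / 16."""
    vol = Fraction(1, 2 ** (k + l))
    counting = sum(vol * phi(2 ** k * x) * phi(2 ** l * y) for x, y in V
                   if Fraction(j, 2 ** k) <= x < Fraction(j + 1, 2 ** k)
                   and Fraction(m, 2 ** l) <= y < Fraction(m + 1, 2 ** l))
    return counting - N * vol ** 2 / 16

# All large-volume rectangles: k + l <= n - 2; the proof gives |<D_N,h_R>| <= 1/(4N).
worst = max(abs(haar_coeff(k, l, j, m))
            for k in range(n - 1) for l in range(n - 1 - k)
            for j in range(2 ** k) for m in range(2 ** l))
assert worst <= Fraction(1, N)
```

The exact rational arithmetic makes the bound a genuine verification for this ${n}$ rather than a floating-point approximation.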