Tomorrow I am traveling to Heraklion, Crete, where among other things I will give a talk on Discrepancy in two Dimensions. The talk is based on a recent paper with D. Bilyk, M. Lacey and A. Vaghasrhakyan. Here follows an outline of the talk which can also be used as an easier first reading of the paper (which is admittedly quite technical).
Everything will take place in the unit cube . For we write
for the rectangle anchored at and at . Let be an -point distribution in :
Definition 1 The discrepancy function of is
The first term in the previous definition is sometimes referred to as the counting part:
Obviously this term counts the number of points of inside . We call the second term in the definition of the discrepancy function `the linear part’:
Here denotes the Lebesgue measure of a set . The linear part expresses the expected number of points in if we pick the points uniformly and independently at random.
It turns out that the `size’ of this function (in many different senses) must necessarily grow to infinity with
where as ? Some typical choices for the norm are , , , .
2.1. Koksma-Hlawka inequality
Let be a function of bounded variation (in the sense of Hardy and Krause). Then
In dimension the variation of a function can be expressed as . In higher dimensions there is an expression for involving the partial derivatives of , assuming sufficient smoothness. Observe that if we replace by in the Koksma-Hlawka inequality, we recover the definition of the discrepancy function on the left hand-side.
Roughly speaking, the Monte-Carlo numerical integration scheme consists of picking random i.i.d points inside and approximating the integral of over the unit cube by the average of over . As we shall see shortly, the discrepancy function for points selected independently and at random inside satisfies which by the Koksma-Hlawka inequality implies an error in the integration of of the order .
We shall see that there exist point distributions that satisfy . By the way, these point distributions are extremal for discrepancy; one can never do better than that. However, using this selection of points, one integrates with error of the order , which is of course a huge improvement over the i.i.d. selection of points. Again, roughly speaking, this is the main idea behind the quasi Monte-Carlo integration scheme. Here the terminology is a bit deceiving as we will see that there is nothing random about point distributions that are extremal for discrepancy.
2.2. Uniformly distributed sequences in .
Another application of two dimensional discrepancy relates to the notion of uniform distribution.
Definition 3 A sequence is said to be uniformly distributed in if
for all .
We can define the discrepancy function associated with a sequence as
Thus, the sequence is uniformly distributed in if as Of course this is just a weak, qualitative statement. It is many times desirable to replace the with an explicit estimate in .
Example 1 For , let us consider the sequence , where is the fractional part. Then we have the estimate
The consideration of uniformly distributed sequences leads naturally to the following question:
for infinitely many ?
Questions 2 and 4 are equivalent in the sense that finding a function in Question 2 gives us the existence of a function in Question 4. This is due to Roth (’54). For example, to see that Question 2 implies Question 4 for a sequence justs consider the point distribution . That was the initial motivation for studying the discrepancy function in two dimensions and it turns out that both Questions 2 and 4 have an affirmative answer with the lower bound function being and respectively.
There is a subtle distinction between discrepancy for point distributions in the unit square, and uniformly distributed sequences in , that is between the functions and . The difference is not so much in the finite/infinite setting but rather on a distinction between a `dynamic’ setting in the case of and a `static’ one in . Indeed, in the case of sequences in , the definition of depends on all the `initial segments’ simultaneously and the discrepancy changes drastically if we rearrange the terms of the sequence. No such phenomenon occurs in the static two dimensional setting where we look at the discrepancy of the point distribution for the whole set and the ordering of the points plays absolutely no role in the definition of the discrepancy. Thus a `dynamic’ problem in is equivalent to a `static’ problem in . More generally, there is always a `dynamic’ discrepancy problem in dimensions which is equivalent to a `static’ problem in dimensions for . This rectangle has area and contains points of the set . In discrepancy language this means that . As we have already mentioned, one can achieve an norm of the order which shows that lattices are very far from being extremal for discrepancy. Any other consideration of points where a very thin axis-parallel rectangle can contain `many’ points of will be also `bad ‘ for discrepancy.
3. `Bad’ and `Good’ sets for discrepancy.
3.1. Highly structured sets are bad.
Here it is implied that the `structure’ is consistent with the axis-parallel nature of the discrepancy function. Let us for example consider for example a lattice in . Consider a very thin axis parallel rectangle of the form for some .
3.2. Random sets are bad.
Let be chosen independently and uniformly in . Consider the rectangle . Then which implies that
This in turn means that with high probability. Another way to express this is to observe that by the Central limit theorem we get that
3.3. The Roth heuristic.
This expresses the empirical observation that extremal sets for discrepancy are such that each dyadic box of volume contains exactly one point of the distribution . A typical example that satisfies the Roth heuristic (and is extremal for discrepancy) is the van der Corput set of points: every dyadic box of volume contains exactly one point of the van der Corput set. We will go into a detailed analysis of the van der Corput set in the subsequent paragraphs.
4. Lower bounds for discrepancy
As we have mentioned in the introduction, no matter how clever one is when choosing the point distribution , the discrepancy must always grow to infinity. The most typical instance of this principle is the following lower bounds which are essentially optimal:
We will refer to the bound of Theorem 5 as `Roth bound’. We have the corresponding theorem in , originally due to Schmidt:
for all .
These theorems are sharp as to the order of magnitude. In particular we have the following theorem:
Constructing point distribution that are extremal for discrepancy is a delicate matter, especially in high dimensions. For more extremal constructions see [Roth '74, '80], [Chen '81].
Proof: We will work with dyadic rectangles . Each dyadic rectangle is a tensor product
where each side , , is a dyadic interval in . Each dyadic interval is divided by its midpoint to a left interval and a right interval which are also dyadic. The Haar function associated with the interval is
where . The Haar function associated with is just the corresponding tensor product
where . Observe that we adopt an normalization of the Haar functions instead of the usual normalization so our formulas will look a bit different than the standard ones. For a non-negative integer , we define
. Now a vector defines a `geometry’ of dyadic rectangles inside as follows. Define
Thus for fixed , is a collection of disjoint dyadic rectangles in , whose corresponding sides have the same lengths. The volume of each is also fixed since .
For the proof of Roth’s bound, let us now fix a non negative integer such that so that . We have that
Now let us call `good’ the collection of rectangles . Thus good rectangles are dyadic rectangles of fixed geometry defined by , which do not contain points of . It is not hard to see that if a rectangle is empty of points of then
Furthermore, a simple calculation shows that we always have
Thus if we conclude that
Furthermore, pigeonholing principle gives
We thus get
For the last equality observe that that are more or less choices for the first entries of while the last one is fixed.
The proof of Theorem 6, that is the proof of the bound is essentially the same if one knows the Littlewood-Paley inequality: Proof: } First we need a simple Lemma:
We conclude that
Proof: Suppose, for the sake of contradiction, that the conclusion is not true. Then we have
which proves the first conclusion of the lemma. Now we immediately get that
which is the second conclusion in the statement of the Lemma.
Now let’s go back to the proof of the -lower bound for the discrepancy function. Again, we fix a positive integer such that so that . For we define . Let is write for the Lebesgue measure in . We have that
by our choice of . Using the Littlewood-Paley inequalities we have, for every , that
Here is the Littlewood-Paley square function associated with . As observed before, for the `good’ rectangles R we have the estimate which means that . Since, for fixed , all the rectangles in are disjoint, we can rewrite this as
Finally, since , we can use Lemma 8 to get
which is the desired estimate.
While the -bounds for are very well understood and are known to be sharp, the endpoints and are much harder with definitive information available only in two dimensions. Even in this case, the techniques required to prove the corresponding theorems are more involved than the bounds we just described. We have:
That is, there is a jump over the average case bound.
Theorem 11 (Halton ’60) For any dimension there exist point distributions with
For any dimension there is a long-standing conjecture
Observe that because of Davenport’s construction, Theorem 7, Conjecture 12 is the best possible result one can hope for, that is if true, it will be sharp. For the bound however, it is not even clear what the right lower bound should be. We just mention one of the conjecture due to Michael Lacey:
that is, we have a jump over the average case bound.
At the moment, we do not have any matching upper bounds for Conjecture 13. The only result point towards that direction, states that we have at least some improvement over the average case bound:
where is a function of tending to as
In our recent work with Bilyk, Lacey and Vagharshakyan, we focus on the two dimensional case and prove results that smoothly interpolate between the and the norms. The lower bounds we prove are more or less consequences of the Roth bound; the main point is that these results are shown to be sharp:
for all , and this is sharp.
(ii) For dimension we have that
for all , and this is sharp.
The space is the exponential Orlicz space. Without going into a lot of details about its formal definition let’s just note here that we have
The space BMO is the dyadic BMO space defined by Chang and Fefferman. This space consists of all square integrable functions in the linear span of for which we have
Here the supremum is over all measurable subsets of , not just rectangles. Here i is useful to recall the simple observation that the BMO norm is insensitive to functions that are constant in either the vertical or horizontal direction (or both) since then the Haar functions have zero mean in each direction.
In this talk we will focus on the BMO result which is technically easier to analyze. In fact, the lower bound for the BMO norm of the Discrepancy function is just a repetition of the proof that gave us the Roth bound. The main interest is that this is sharp. In order to describe the point distribution that satisfies
we need to take a closer look at the van der Corput set.
5. The van der Corput set (vdC).
The van der Corput set of points is defined as
Abusing notation, we can rewrite the points of the van der Corput set in their binary expansion as
If we order the -coordinates of the set in increasing order,
we can also write
In order to find the that corresponds to a given , write the in its binary expansion and then reverse the first binary digits. Note that the -st digit is always the way we have set up things.
Claim 1 The van der Corput set is a binary net. Each dyadic rectangle of volume contains exactly one point of the van der Corput set.
Indeed, let us consider a generic dyadic rectangle of volume :
where , and . Let be a point of the van der Corput set, . If belongs to , the first binary digits of must be the same as the corresponding binary digits of while the first binary digits of must coincide with the firt binary digits of . Since , this uniquely defines a point of the van der Corput set.
The van der Corput set thus agrees with the Roth heuristic (one point per box); it turns out that vdC is extremal for discrepancy
Theorem 16 For the vdC set , consisting of points, we have that
Proof: First observe that we have the equivalent definition of -discrepancy:
where the supremum is over all rectangles in .
For any consider a `strip’ of the form . Points of the vdC set inside have -coordinates with their last binary digits fixed, so they can be written in the form
As a consequence, there are such points inside , equally spaces with step Subdividing the strip into boxes of length in the -direction, we see that each box has volume and contains exactly one point of the vdC set. Thus these boxes hace discrepancy zero. There is (possibly) one last box that might not contain a point, or might have smaller volume (depending on ) but for this last box we can bound its discrepancy by . Overall we get that the strip has discrepancy at most .
Now for any , consider such that . Then write the box as , where . The discrepancy of the box is at most since has volume at most and contains at most one point of the vdC set since points of the vdC set have -coordinates which differ by at least . Since any interval can be written as a disjoint union of at most dyadic intervals, we get that can be written as a union of at most strips . Summing up we get that
The claim that any interval can be written as a disjoint union of at most dyadic intervals can be proved by induction on . Indeed for this is obvious. Now suppose we know the claim for and consider any interval of the form . This can be written in the form so if is even, by the induction hypothesis we are done. Suppose now that is odd. Then we can write
and since now is even we can use the inductive hypothesis again to get a total of (at most) dyadic interval.
6. The BMO norm of the discrepancy of the van der Corput set.
In this section, we set ourselves the task to prove the following theorem:
The heart of the proof is the following Lemma that gives estimates for the Haar coefficients of the van der Corput set:
Let us postpone the proof of the Lemma for now because it is quite technical and see how we can use it in order to prove Theorem 17.
Proof: Recall that the BMO norm of the discrepancy function is given by
Here, is any measurable set. We consider two cases:
Big Volume Rectangles : We have in this case that
where we have already used the basic Lemma to estimate the Haar coefficients of the Discrepancy function. Now for fixed , the rectangles in are disjoint we are considering only the ones contained in . Since each rectangle has fixed volume , there are at most of them. Overall we get
Small Volume Rectangles : Here we treat the `linear part’ and the `counting part’ separately and just use triangle inequality. Observe that this means we can get away with that because in the small volume case there is no significant cancellation between the linear and the counting part.
For the linear part we have the estimate
The counting part is a bit more involved. We let be the family of maximal dyadic rectangles in which satisfy . Remember we are always in the small volume case so these rectangles also satisfy . Also the non-zero inner product with the haar function corresponding to implies that must contain at least one point of our set in its interior. However, the van der Corput set is a dyadic net at the scale so each such contains at most one point. This means that every contains exactly one point of the van der Corput set. Thus the counting part contribution to the BMO norm can be estimated as
where is the unique point of the vdC set contained in . Using Bessel inequality for the space we get
Combining these estimates we get that the counting part contribution is bounded by
In the inner sum we only consider rectangles with a side length . But this is justified by the fact these rectangles contain one point of the vdC set in their interior and they are dyadic. Since the points of the vdC set have -coordinates that differ by , this is only possible if the side length of the maximal rectangles is at least .
For fixed, the rectangles in the family with side have to be disjoint since they are dyadic and maximal. Since they are all contained in , the inner sum can be estimated by
Overall we get that the counting part can be estimated by
which is the desired estimate (squared).
For the proof of our basic lemma we need some auxiliary definitions that actually come in handy in several calculations and considerations of the van der Corput set. First of all, for we define to be the -th digit in the binary expansion of . Abusing notation again, if has binary expansion this means for example that and so on. Now for a which has binary digits, we define the digit scrambling or digital shift as
That is, the -th digit of changes if and stays the same if . We also need the auxiliary function defined as
where is the fractional part of . Observe that is a -periodic function. We have the following proposition which contains the basic properties of the function that we will use in what follows.
Proposition 19 Let be a dyadic rectangle of the form
where and . Let . Then
Furthermore, for every we have that
Finally, is a `Lipschitz’ function in the sense that for every ,
We can now give the proof of Lemma 18:
Proof: Let be a dyadic rectangle of the form
We write for the van der Corput set of points. We consider again two cases depending on the volume of the rectangles:
Small volume rectangles : Here we have that . This means that the rectangle can contain at most points of the van der Corput set in its interior. That is because there is only one free digit in the binary expansion of the points of the vdC set contained in and for this digit there are two choices. Thus we have
Large volume rectangles : Here we have that which means there are at least points of in the rectangle . In more detail, let us consider . Then
for the -coordinate of a point in . The first and the last binary digits of are fixed because . Now we group the points in quadruples according to their -coordinates:
There are such quadruples . We estimate
Observe that if is any one of the four points in a quadruple , then the quadruple can be described by the points
Using this representation and the properties of the function we can calculate
which by the Lipschitz property of implies that
We conclude that
This completes the proof of the lemma and thus the proof of the BMO estimate.