~~https://noppa.aalto.fi/noppa/kurssi/mat-1.c/harmonic_analysis~~

the notes can now be found here

https://drive.google.com/open?id=0B7t_mQHDlsRsdnZkSE9LbndfOHM

and they’re slightly updated.


**1. Notions of Discrepancy**

I will begin this post by briefly recalling some notions of discrepancy that have been extensively studied in the literature, namely geometric and combinatorial discrepancy. I will close this introduction by describing the set-up for our current topic, that is, the discrepancy of sets on checkerboards. This notion was introduced by Kolountzakis in [K]. Although it has some characteristics in common with the two notions of discrepancy just mentioned, it cannot be placed in the context of either one. Thus some new definitions and investigations are necessary.

**1.1. Geometric Discrepancy**

This refers to the following set-up. Consider the unit cube and let

be an -point set in . Now consider to be a family of sets such as rectangles, circles, convex polygons and so on. We are interested in studying the *discrepancy* of the point set with respect to the family . For we set

Here denotes the Lebesgue measure of a set . We are interested in the absolute value of the quantity above which describes the difference of the actual number of points of in minus the expected number of points in . Thus only the part of the set that lies inside is relevant. The -discrepancy is just the supremum over all sets :

A point distribution with `small’ discrepancy can be used to approximate the volume of any by the average . Let us now give a more concrete example by specializing the family to be the family of all rectangles contained in with one corner anchored at the origin and another corner in the interior of . For we write

for the rectangle anchored at and at . Let be an -point distribution in . The *star* or *corner* discrepancy is then defined as:

It turns out that the `size’ of this function (in many different senses) must necessarily grow to infinity with the number of points. The most typical manifestation of this claim is the following lower bounds, which are essentially optimal:

Theorem 1 (Roth ’54) For all dimensions and all point distributions , we have that

We have the corresponding theorem in , originally due to Schmidt:

Theorem 2 (Schmidt ’77) For all dimensions and all point distributions , we have that for all .

These theorems are sharp as to the order of magnitude. In particular we have the following theorem:

Theorem 3 (Davenport ’56) Let and . For any positive integer , there exists a point distribution such that

Constructing point distributions that are extremal for discrepancy is a delicate matter, especially in high dimensions. For more extremal constructions see [Roth ’74, ’80], [Chen ’81].

A nice motivating application of geometric discrepancy is the *Koksma-Hlawka inequality*. Let be a function of *bounded variation* (in the sense of Hardy and Krause). Then

In dimension the variation of a function can be expressed as . In higher dimensions there is an expression for involving the partial derivatives of , assuming sufficient smoothness. Observe that if we replace by in the Koksma-Hlawka inequality, we recover the definition of the discrepancy function on the left-hand side.

Remark 1 (Highly structured sets are bad for discrepancy)

Here it is implied that the `structure’ is consistent with the axis-parallel nature of the discrepancy function. Let us consider, for example, a lattice in . Considering a very thin axis-parallel rectangle of the form for some , we see that , much worse than the bound from Davenport’s theorem.

Remark 2 (Random sets are bad) Let be chosen independently and uniformly in . Consider the rectangle . Then

which implies that . This in turn means that with high probability. Another way to express this is to observe that by the central limit theorem we get that

The discrepancy of a random set is thus of the order , again much larger than the optimal point distribution discrepancy .
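Both remarks are easy to test numerically. The following is only a minimal sketch under stated assumptions: the helper names `local_discrepancy`, `grid_points` and `random_points` are hypothetical, and the discrepancy is taken in its unnormalized form, that is, the number of points in a corner rectangle minus the number of points times the area of the rectangle.

```python
import random

def local_discrepancy(points, x, y):
    """Number of points in the corner rectangle [0, x) x [0, y) minus N * area."""
    inside = sum(1 for (px, py) in points if px < x and py < y)
    return inside - len(points) * x * y

def grid_points(m):
    """The m x m lattice {(i/m, j/m)}: an N = m*m point set in the unit square."""
    return [(i / m, j / m) for i in range(m) for j in range(m)]

def random_points(n, seed=0):
    """n independent uniform points in the unit square (seeded for repeatability)."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]

if __name__ == "__main__":
    m = 30
    n = m * m
    # A very thin rectangle along the bottom edge catches a whole lattice row.
    print("lattice:", local_discrepancy(grid_points(m), 1.0, 1e-3))
    # For random points the count in half the square fluctuates on the scale sqrt(N).
    print("random :", local_discrepancy(random_points(n), 0.5, 1.0))
    print("sqrt(N):", n ** 0.5)
```

For the lattice the thin rectangle contains one full row of points while its area is negligible, so the printed value is close to the square root of the number of points, in line with Remark 1.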

For a much more detailed discussion on star discrepancy you can also check this older post on my blog.

**1.2. Combinatorial discrepancy**

Let be a set of elements and be a family consisting of subsets of . A *coloring* of is a function and let us agree that corresponds to white colored points and corresponds to black colored points of . Given a family the goal here is to color the elements of in such a way that each of the sets in contains roughly the same number of `black points’ and `white points’. As in the geometric set-up, it turns out that in most of the interesting cases this is not possible.

The relevant *combinatorial discrepancy* is defined as

The corresponding discrepancy is then

Of course it makes sense to consider discrepancies in the obvious way. To give an example, if one considers as the family of *all* subsets of then the corresponding discrepancy is large, at least , no matter how clever we are in coloring the elements of . That is, there exists a subset of with

uniformly for all colorings . On the other hand, a good upper bound is obtained by coloring the set randomly, that is, each is colored black or white with probability and independently of other points of . Then for any system of subsets of we have

for all sets simultaneously with probability at least .
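For very small ground sets the combinatorial discrepancy can be computed by brute force, which makes the example of the family of all subsets concrete. This is a sketch only: the function names are hypothetical, the ground set is assumed to be {0, ..., n-1}, colorings take values in {-1, +1}, and the discrepancy is the minimum over colorings of the maximum over sets of the absolute value of the sum of the colors.

```python
from itertools import combinations, product

def coloring_discrepancy(coloring, family):
    """Largest imbalance |sum of colors| over the sets of the family."""
    return max(abs(sum(coloring[i] for i in s)) for s in family)

def combinatorial_discrepancy(n, family):
    """Minimum over all 2**n colorings of the largest imbalance (brute force)."""
    return min(
        coloring_discrepancy(coloring, family)
        for coloring in product((-1, 1), repeat=n)
    )

def all_subsets(n):
    """Every subset of {0, ..., n-1}, as tuples of indices."""
    return [s for k in range(n + 1) for s in combinations(range(n), k)]

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, combinatorial_discrepancy(n, all_subsets(n)))
```

For the family of all subsets the larger color class is itself a member of the family, so no coloring can do better than half the number of elements, rounded up; the brute-force search is exponential in n and is only meant for toy cases.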

**1.3. Discrepancy for checkerboard colorings**

Consider the Euclidean plane as a union of congruent cells

where . Now color each square in the union, black or white, in an arbitrary manner. This amounts to defining a function , where is constant on each one of the cells . We will call such a coloring a *checkerboard coloring* of the (infinite) Euclidean plane. Now let be a family of sets and for simplicity let us assume that consists of line segments. The question is whether one can choose a checkerboard coloring such that, for any line segment , the `white length’ of minus its `black length’ is bounded uniformly by an absolute constant. Note that the corresponding discrepancy here can be defined as

where is the arc-length measure on the segment , while the discrepancy

As in many other similar questions in discrepancy theory, it turns out that this is not possible. In particular there exist arbitrarily long line segments such that the discrepancy is at least for all checkerboard colorings of the plane. This is essentially best possible (up to an -loss): a random coloring has discrepancy about for every segment .

We will take up these issues in detail in the rest of this post. A general remark is the following. Although in most cases we want to prove lower bounds for a supremum like quantity, the -discrepancy, it is often more convenient to bound a corresponding average discrepancy. For this one has to place an appropriate measure on the family of sets . For example the set of lines can be parametrized by where is the one dimensional unit circle and there is the natural product measure on this space. The same principle holds for other types of geometric discrepancy.

**2. The discrepancy of a line segment on a checkerboard**

As before we fix some checkerboard coloring where is constant on each cell . The family of sets consists of (finite) line segments of arbitrary position and length. We now translate the problem to a corresponding problem on a finite checkerboard as follows. Let where will be a large positive integer. Consider as a union of congruent cells as before:

A finite checkerboard coloring of is a function which is constant and identically equal to either or on each cell and identically zero on the complement of . We define the discrepancy

The main theorem proved in [K] says that for every large , there exists a line segment, contained in with discrepancy

Observe that this `finite checkerboard’ result implies the corresponding statement for the infinite coloring since can be arbitrarily large.
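The discrepancy of a segment with respect to a finite coloring is simple to approximate numerically. The sketch below is an illustration under assumptions of my own choosing: the cells are taken to be unit squares with integer corners, the coloring is stored as a list of lists `coloring[i][j]` with values +1 or -1, the function names are hypothetical, and the line integral is approximated by the midpoint rule.

```python
import math

def segment_discrepancy(coloring, p, q, samples=10_000):
    """Midpoint-rule approximation of the integral of the coloring along [p, q].

    coloring[i][j] is +1 or -1 on the unit cell [i, i+1) x [j, j+1);
    the coloring is taken to be 0 outside the board.
    """
    n = len(coloring)
    length = math.dist(p, q)
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) / samples
        x = p[0] + t * (q[0] - p[0])
        y = p[1] + t * (q[1] - p[1])
        i, j = math.floor(x), math.floor(y)
        if 0 <= i < n and 0 <= j < n:
            total += coloring[i][j]
    return total * length / samples

def striped_coloring(n):
    """Horizontal stripes: each row of cells has one color, adjacent rows differ."""
    return [[1 if j % 2 == 0 else -1 for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    board = striped_coloring(4)
    print(segment_discrepancy(board, (0.0, 0.5), (4.0, 0.5)))  # stays in one stripe
    print(segment_discrepancy(board, (0.5, 0.0), (0.5, 4.0)))  # crosses all stripes
```

On the striped board a segment running along a stripe sees a single color, so its discrepancy equals its length, while a segment crossing the stripes at right angles has discrepancy close to zero.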

Let us parametrize the set of lines in the plane by their distance from the origin and the unit vector along the line. Thus any line can be described as

We set

This is the discrepancy of the line segment with respect to the coloring . At the same time it is the Radon transform of , hence the notation. In order to consider all possible line segments one has to vary and . For any consider the quantities

where denotes the normalized Lebesgue measure on . Finally observe that the measure of the support of is about so we define the normalized discrepancies

Now observe that Plancherel’s theorem and the Fourier slice theorem imply that

An easy calculation shows that

where is the (bi-variate) trigonometric polynomial

and is the color corresponding to the cell . Observing the elementary bound

the previous calculation shows

which in turn implies that

so that .

**2.1. Some remarks on discrepancies for .**

Kolountzakis proves in [K] that for random colorings we have with high probability so, ignoring -losses, for the -discrepancies we have

For one can use the convexity of the norms to get the bound

On the other hand consider a coloring where each one of the lines of the checkerboard has a single color and adjacent lines have different colors. It is then easy to check that

Thus, while the previous bounds are sharp for , the discrepancy can be much smaller, of the order . See the end of this post for some related open questions.

**3. Discrepancy of a circle on a checkerboard**

Here we consider the family consisting initially of all circles in the Euclidean plane, of any center and radius. For the infinite checkerboard problem this is an appropriate set-up. Translating the problem to the finite checkerboard requires some discussion in this case. Fix some large and a coloring of . In this set-up, it makes sense to restrict the radii of the circles considered to be at most of the order ; otherwise one can consider radius tending to infinity so that the circle essentially becomes a line with respect to the checkerboard scale, which is . If is a circle the relevant discrepancy now is

where we integrate with respect to the arc-length measure on the circle . Remember that we have assumed that is identically zero outside . This means that the previous integral ignores circles that do not `hit’ the finite checkerboard . This is of no consequence when translating to the infinite set-up. However, the previous integral takes into account arcs as well as circles if no restriction is imposed on the centers and radii of the circles. Thus if we manage to find a circle of radius say such that

is large, this only amounts to an *arc* in the infinite checkerboard with large discrepancy. This is a technical difficulty to which we will come back later on.

Let us now write down a convenient formula for the circle discrepancy. Fixing we set

where denotes the circle of center and radius and denotes the arc length measure on the circle . Note that is *not* normalized so that

In our considerations below we will only consider radii which implies that the measure of the support of is about . Thus we define the normalized discrepancy for any :

By Plancherel’s theorem we have as usual

The main tool for the study of the discrepancy is the classical asymptotic description of the Fourier transform of the arc-length measure on the circle. We have

and

Studying the asymptotic description of one easily sees that the main obstruction in proving a lower bound for are the zeros of which, as , become almost periodic. This problem was dealt with in [IK] by also averaging in the radial variable, in an interval of the form . Since the zeros of are isolated this provides a lower bound for which is of the order . However the bound in [IK] only implies a bound for *arcs* in the infinite checkerboard. Furthermore, as a result of the averaging in the radial variable, the radius of the arc is not specified exactly; it lies in some interval of the form .

Our first result answers the problem of finding a full circle with large discrepancy.

Theorem 4 Let be a real number and a large positive integer. We have that

*Proof:* A sketch of the proof of this result is the following. A simple calculation as before shows that

Here is a positive constant to be chosen appropriately. Now assume, as we may, that behaves like when is large, which corresponds to the term . The cosine term above is zero exactly when

From the previous expression of the (approximate) zeros of one readily sees that if some real number is a zero, say for some non-negative integer , then . That is, if is a zero then stays bounded away from the zeros of ; we get that

The estimate for is thus:

For one needs to note that for an appropriate choice of , the disk has a positive distance from the first zero of so that in that region. We get the better estimate

since . Summing the estimates for and we get

by another use of the basic estimate (1).

Remark 3 To make the previous argument rigorous one needs to make an explicit choice of the constant and prove that

One way to do that is the following. From classical estimates we have that where is the -th order Bessel function and is an error term that satisfies . A direct computation then shows that (2) is valid whenever (say). For one can check directly the validity of (2) either by looking up the zeros of in a table or by drawing the graph of the function. Finally, by looking up the zeros of one checks again that the first zero of is so that has no zero for and the estimate for is also valid. Thus the previous argument works for .
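Instead of looking up a table, the first zero of the zeroth order Bessel function can be located numerically. The sketch below is an illustration and not part of the argument: it evaluates the Bessel function of order zero through Bessel's integral and finds the sign change on the interval from 2 to 3 by bisection. The function names are hypothetical, and the rescaling that turns this zero into the first zero of the Fourier transform of the arc-length measure depends on the normalization of the Fourier transform, which is not fixed here.

```python
import math

def bessel_j0(x, n=200):
    """J_0(x) via Bessel's integral (1/pi) * int_0^pi cos(x sin t) dt, midpoint rule."""
    h = math.pi / n
    return sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(n)) / n

def first_zero(f, a, b, iterations=60):
    """Bisection for a sign change of f on [a, b]; assumes f(a) and f(b) differ in sign."""
    fa = f(a)
    for _ in range(iterations):
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fa * fm <= 0:
            b = mid
        else:
            a, fa = mid, fm
    return 0.5 * (a + b)

if __name__ == "__main__":
    # J_0 is positive at 2 and negative at 3, so the zero is bracketed.
    print(first_zero(bessel_j0, 2.0, 3.0))
```

The integrand is smooth and periodic, so the midpoint rule is extremely accurate here even with a modest number of nodes.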

Using Theorem 4 we have the following easy corollary:

Corollary 5 Let and be any checkerboard coloring of the infinite Euclidean plane. There exists a constant and a circle of radius , where is equal to either or , such that

*Proof:* Given consider the finite checkerboard with and the corresponding finite coloring . Theorem 4 then shows that

where has a fixed value equal to either or . Consider the cube . We have

since .

**4. Single radius discrepancy for arcs**

The previous discussion answers our main question by showing that for every there exists *a full circle* of radius *either or*, with discrepancy . If there is an unsatisfying element in this result it is exactly that we cannot prove the corresponding statement *for every* radius . Observe that we did not completely avoid the averaging in the radial direction, although we only averaged between two discrete values and . An attempt to deal with this problem is contained in the following discussion. In order to avoid the ambiguity in the choice of the radius of the circle with large discrepancy we replace the radial averaging with pointwise estimates. Our methods however only work under the restriction for the finite checkerboard. As a result, the simple argument of Corollary 5, which required , does not work any longer and our current methods only prove the following:

Theorem 6 For every radius and every coloring of the infinite checkerboard there exists a circular arc, of radius , with where is an absolute constant, independent of , and .

Once again we actually prove the corresponding lower bound for the discrepancy of the finite checkerboard. In particular we show that for we have

Here, the restriction is essential and results in the existence of an arc, instead of a full circle, in the infinite checkerboard with discrepancy . The idea is the following. We write

Assuming arbitrarily large values of we can assume that the asymptotic description

is valid. We denote by

the roots of the cosine term as before. It will also be convenient to introduce a notation for annular neighbourhoods around the roots by setting

where is a small real parameter. Now using the asymptotic expansion of it is easy to see that

where is a large constant, which may depend on , which is chosen so that for the asymptotic expansion of is valid. On the other hand for we have no asymptotic description of but we know that its zeros are of the form where is a finite sequence without accumulation points. This can be seen by arguing via the -th order Bessel function as before, or, alternatively, by noting that is radial and can only have isolated zeros in the radial direction since it is the Fourier transform of a compactly supported measure. We conclude by compactness that

Using (4) and (3) we can write for every small parameter the estimate

Setting the previous considerations imply that

Note that the implicit constant in the estimate above depends on the parameter which is yet to be chosen. The main idea to complete the proof is that for small enough , the integral

captures a positive proportion of the integral

This can be formulated more generally for any reasonable function as follows:

Proposition 7 For consider any ball and let . Fix a finite sequence

and set . Define as before the annuli

Then for we have

Before giving the proof of this proposition let us see how we can use it to complete the proof of Theorem 6. We have

where by (4) and (3) we see that is either or and that in every case we have

so for , where is some numerical constant, the hypotheses of the proposition are satisfied. We thus get

since . Thus for small enough, but not depending on any of the parameters of the problem. We now give the proof of Proposition 7.

*Proof:* Fix and for consider the annulus with . For and we have

Using and Cauchy-Schwarz we get

Multiplying both sides by and integrating for and we have

Now for and we have

and

so that . Thus the last estimate can be written in the form

We integrate the previous inequality for . For the left hand side we have

For the first term on the right hand side we estimate

since . Finally, for the last term on the right hand side of (5) we have

Summing up the previous estimates we get for every :

Observe that

Summing the previous estimates for we get

Now adding the term to both sides of the estimate above gives the claim of the proposition.

**5. Some open questions**

We close the discussion on the discrepancy for checkerboard colorings by summarizing some of the open questions that we alluded to earlier:

Question 8 Is it true that for every there is a full circle with discrepancy ? In particular, is it true that for all ?

We believe that the answer is yes, but the methods discussed here do not seem to be sufficient to prove such a result.

Consider the discrepancies of circles defined in the obvious way

As we just saw we have for all , at least whenever .

Question 9Construct a coloring with , such that . A simple interpolation argument shows that if then we have

Is this best possible? (probably yes). What is the correct order of magnitude of ?

Consider the discrepancy of a segment on a checkerboard defined above:

and the corresponding discrepancies

As we saw in the introduction we have the bounds (up to -losses)

The following question is open:

Question 10 Is it true that for any coloring of the finite checkerboard? Failing that, one could try to show that we have

for any coloring .

[IK] Alex Iosevich and Mihail N. Kolountzakis, The discrepancy of a needle on a checkerboard. II, Unif. Distrib. Theory 5 (2010), no. 2.

[K] Mihail N. Kolountzakis, The discrepancy of a needle on a checkerboard, Online J. Anal. Comb. 3 (2008), Art.7, 5.

[KP] Mihail N. Kolountzakis and Ioannis Parissis, Circle Discrepancy for Checkerboard measures, (2012), preprint, arXiv:1201.5544.

**1. The Littlewood-Paley decomposition**

We start our analysis by forming a smooth Littlewood-Paley decomposition as follows. Let be a smooth real radial function supported on the closed ball of the frequency plane, which is identically equal to on . We then form the function as

Observing that if and also that if we see that is supported on the annulus .

Now the sequence of functions forms a *partition of unity*:

To see this first observe that each function is supported on the annulus . Thus for each given there are only finitely many terms in the previous sum. In particular if , then

Note that we miss the origin in our decomposition of the frequency space as each piece is supported away from . Some attention is needed concerning this point but usually it creates no real difficulty.
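The telescoping behind the partition of unity can be checked numerically on radial profiles. The sketch below rests on assumptions that are mine and not taken from the text: the cut-off is equal to one on the unit ball and vanishes outside the ball of radius two, the bump is the difference of the cut-off and its dilate by a factor of two, and the pieces are the dyadic dilates of the bump. All function names are hypothetical, and the particular smooth cut-off is just one convenient choice.

```python
import math

def _h(t):
    """exp(-1/t) for t > 0 and 0 otherwise: the standard smooth building block."""
    return math.exp(-1.0 / t) if t > 0 else 0.0

def smooth_step(t):
    """Smooth function equal to 0 for t <= 0 and to 1 for t >= 1."""
    return _h(t) / (_h(t) + _h(1.0 - t))

def phi(r):
    """Radial cut-off profile: 1 for r <= 1, 0 for r >= 2, smooth in between."""
    return smooth_step(2.0 - r)

def psi(r):
    """Bump profile phi(r) - phi(2r); it vanishes unless 1/2 < r < 2."""
    return phi(r) - phi(2.0 * r)

def partial_sum(r, K=20):
    """Sum of psi(r / 2**k) over -K <= k <= K."""
    return sum(psi(r / 2.0 ** k) for k in range(-K, K + 1))

if __name__ == "__main__":
    for r in (1e-3, 0.37, 1.0, 3.7, 1e3):
        print(r, partial_sum(r))
```

For every radius strictly between the smallest and largest dyadic scales used, the partial sum telescopes to one, while at the origin every term vanishes, which is the point missed by the decomposition.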

Thus we partition the unity in the form and each is smooth and has frequency support on an annulus of the form . Now for let us define the multiplier operators

and

initially defined for or . The frequency cut-off operator is almost a projection to the corresponding frequency annulus . It is not exactly a projection since the function is a smooth approximation of the indicator function , introducing a small tail in the region which is mostly harmless. Similarly, the operator is almost a projection on the ball .

We have the following simple properties of the Littlewood-Paley decomposition:

Proposition 1 (i) For every we have that is in . (ii) For every we have and where the limits are taken in the -sense. (iii) For every we have that

in .

Remark 1 Property (iii) above holds in a more general sense and for a wider class of functions, for example functions and more generally locally integrable functions that have some decay at infinity. The decomposition fails however if has no decay. Indeed, the function satisfies for all . Observe here that the function has frequency support on which is the point missed in our partition of unity.

Thus, with a Littlewood-Paley decomposition we managed to write any function (and thus any Schwartz function) as a sum of pieces , each piece being well localized in frequency on the annulus .

It is pretty obvious how the operators act on the frequency variable so let us take a look at what the pieces look like in the *physical space*. From the general facts about the Fourier transform (see for example Exercise 2 of Notes 3) we know already that cannot have compact spatial support. Since

and , we have

Here note that . From the discussion that followed the definition of convolutions in Notes 2 we thus see that is an average of around the point at scale . Remembering that is supported on the ball this is also consistent with the uncertainty principle which also implies that the function is essentially constant at scales . Now since a piece has frequency support contained in we get that

Thus is almost constant on scales . On the other hand, since has frequency support on the annulus we have that

As before we can rewrite this as

The previous identity roughly says that the function has zero mean on every ball around of radius .

Remark 2 We have mentioned in passing that the operators can be seen as smooth approximations of the exact projection operators . Similarly, can be viewed as a smooth approximation of the frequency projection

There are however important differences between the rough and smooth versions of these projections. For example, since is a Schwartz function the function is also Schwartz and Young’s inequality shows that

thus is bounded on . Now, consider the rough version given as

Of course is still bounded on because of Plancherel’s theorem. However, the function is no longer in and Young’s inequality cannot be used. In fact, is not bounded on whenever and . This is a deep result of C. Fefferman.

**2. Littlewood-Paley Projections and derivatives**

Recall the basic relation describing the interaction of derivatives with the Fourier transform:

In particular

If has support on some annulus we immediately get

and thus for any function that

In fact the same approximate identity extends to all spaces for .

Proposition 2 For all we have that

We won’t prove this proposition here since it will be covered by a lecture in the students’ seminar.

**3. The Littlewood-Paley inequalities**

The Littlewood-Paley inequalities quantify the heuristic principle that the pieces , having well separated frequency supports, behave independently of each other, meaning that

in some appropriate sense (for example in ). In this is already an easy consequence of the Plancherel identities. Indeed, note that

As before, observe that for every there are only two terms which don’t vanish, and these add up to . Thus

and

We can equivalently write this identity in the form

The following theorem extends this approximate identity to all spaces for .

Theorem 3 Define the Littlewood-Paley square function as . Then for all we have

*Proof:* Consider the vector valued singular integral operator

and observe that

Thus the statement of the Theorem is equivalent to

Observe that is a bounded linear operator from to . Indeed the strong type of follows from the remarks before the theorem. Furthermore, defining

we can verify that is a singular kernel:

Lemma 4 The kernel defined above is a singular kernel from to .

Postponing the proof of this lemma for now, we use the vector valued version of the Calderón-Zygmund theorem to show that is bounded from to :

which is one of the estimates in (1). To prove the lower estimate we argue as follows. Let . Then

By vector valued duality and the estimate we conclude that the adjoint operator satisfies

Now we repeat the Littlewood-Paley decomposition but starting with the function

and setting

or equivalently

Using exactly the same arguments as before we can show that we also have that

Observe that for we have that and thus for any function with we have that .

Now choose

and observe that since we already have that . We get

However on the left hand side we have the pointwise identity which shows that

as we wanted to show.

We now go back to the proof of Lemma 4.

*Proof of Lemma 4:* Remember that the kernel is given as

Let so that

First of all we prove the estimates

For (2) we write

On the one hand we have that

On the other hand for any positive integer we have

by integrating by parts times and passing the derivatives to . Applying this estimate for gives the second estimate in (2). The proof of (3) is very similar by observing that

Now the same analysis as in (2) applies (with an extra factor) and gives (3). Estimates (2) and (3) now imply the size and regularity conditions for the singular kernel in .

**3.1. A rough version for -dimensional dyadic intervals**

So far we carried out the Littlewood-Paley decomposition based on a smooth partition of unity. The use of smooth functions to form the Littlewood-Paley decomposition has many advantages since then the projections are bounded multiplier operators. On the other hand, Remark 2 shows that in dimensions , the multiplier associated with a Euclidean ball is not bounded on . This means that the Littlewood-Paley inequalities based on the projections

The previous discussion leaves the one-dimensional case open. In fact we will see now that one can form the Littlewood-Paley decomposition in one dimension based on the rough partition of unity

and still have the Littlewood-Paley inequalities. So let us define to be the exact frequency projection defined by (4). We have the following.

Theorem 5 Let , . Then we have the one dimensional Littlewood-Paley inequalities for the rough projections :

*Proof:* Writing in the form

we have the following representation in terms of the Hilbert transform

For let us define the vector valued analogue

Using the fact that is a CZO and the representation (5) of in terms of we can see that is a vector valued Calderón-Zygmund operator, thus is bounded from to . Applying this property to the function

we get

Now observe that since we have the identity . Thus the previous estimate implies that

By Theorem 3 we get one of the inequalities in the statement of the theorem:

To prove the opposite inequality, we write the dual estimate that was obtained in the proof of Theorem 3:

Now take and observe that

so the previous estimate implies

which gives the other inequality in the theorem.

Exercise 1 Let be a scalar valued CZO and . Show that

Hint: Consider the vector valued operator . The problem reduces to showing that is bounded from to . Observe that is associated with the kernel

where is the identity from to and is the (scalar) kernel associated with . You can assume a Banach space version of the vector valued Calderón-Zygmund theorem.

Exercise 2 Let be a sequence of bounded or unbounded intervals on the real line, where is a finite or countably infinite index set. Define the frequency projections . Show that

Hint: As in the proof of Theorem 5, use the representation of the projections in terms of the Hilbert transform and Exercise 1.

We have already remarked (see Remark 2) that Theorem 5 does not generalize to annuli in the -dimensional Euclidean space if we insist on using the rough projections . However, there is a generalization of the `rough’ Littlewood-Paley theorem to dimensions . This is based on decomposing the frequency space into a union of disjoint dyadic `intervals’, that is, -dimensional rectangles with axes parallel to the coordinate axes, where every side of the rectangle is an interval of the form or . This allows for `tensoring’ Theorem 5 to several dimensions without great difficulty. This is done as follows. For we set

where each is the one-dimensional projection previously defined acting only on the -th variable. For we have

The corresponding square function is defined as

This leads to

Theorem 6 For we have

We omit the proof of this theorem as it is mostly technical, based on induction and starting from the one dimensional version of the theorem already proved. You can find the proof for example in [D] or [S].

**4. Two theorems on multipliers**

We now go back to multiplier operators and reconsider them from the point of view of Calderón-Zygmund theory. We have already seen that a multiplier operator is the linear operator with for some . This definition automatically implies that is bounded on with norm . Alternatively, the discussion from Paragraph 8.1 of Notes 4 reveals that these are all the bounded linear operators on that commute with translations and can be realized in the form

where is the unique tempered distribution such that .

If the operator extends to a bounded linear operator on we say that is an -multiplier and write . We set

The previous remarks then show that . It turns out that the space is a Banach space but we will not dwell on this issue here. We also have the following easy proposition:

Proposition 7 (i) Let and be the conjugate exponent of . Then and in this case we have that

(ii) For all we have

*Proof:* This is a consequence of the following obvious identity; for we have

That is, is the adjoint of . Thus

since and have the same norm. To prove the second assertion assume that , otherwise there is nothing to prove. By (i), the linear operator is of strong type and with the same operator norm. By the Riesz-Thorin interpolation theorem we get that

which proves .

Remark 3 Observation (ii) above shows that multipliers are necessarily bounded functions. The converse however is not true. Another easy consequence of the discussion above is the following. We always have where and as observed above. The problem with this representation is that we don’t know whether is actually a function that can give meaning to the formula

If however it happens that then Young’s inequality readily applies to yield that

so that

The main problem in the theory of multipliers is to get away from the case and place suitable conditions on so that we can conclude that . The previous generalities easily imply that if then since in this case. A similar result with a weaker hypothesis is the following.

Proposition 8 For we define the Sobolev space to be the space of tempered distributions such that agrees with a function that satisfies

Suppose that for some . Then and .

Remark 4 Observe that for any tempered distribution we have that . If is an even integer we can write

Thus, at least when is an even integer, the Sobolev space is the space of tempered distributions such that

where makes sense as a partial differentiable operator since is an integer. Similarly one can define the Sobolev spaces to be the space of tempered distributions such that

In fact one can take one step further and define the space for any real number . In the case this presents no difficulty since one has a direct interpretation of as a Fourier integral operator. In particular, is apseudo-differentialoperator. Although this sounds a bit cryptic at the moment, we want to make the point here that for example is a condition that imposes decay on derivatives of .

Exercise 3. Prove Proposition 8 above.

The general flavor of the previous results is that if a function has no local singularities and, together with its derivatives, decays fast enough at infinity, then is an multiplier for all . Besides a (controllable) singularity at infinity, one can also allow for a singularity at the origin.

We present two instances of this principle, usually referred to as the Hörmander multiplier theorem. We start with an `easy’ version where the function is bounded, to assure the hypothesis is satisfied, away from the origin and its derivatives decay at least as fast as their order.

Theorem 9 (Hörmander-Mikhlin multiplier theorem version I). Let be a bounded function which belongs to the class and satisfies for all multi-indices . Then agrees with a function away from the origin and satisfies

for all multi-indices . In particular, is an multiplier for all with .

*Proof:* Using the Littlewood-Paley decomposition we can write

whenever . Each piece is supported on the annulus and as a product of smooth functions so it makes sense to define

Furthermore, from our hypotheses on we can get some good estimates on each together with its derivatives. Indeed since by our hypothesis (with the zero multi-index ) we have

Likewise

On the other hand for every multi-index we have

for every non-negative integer . Integrating by parts times to pass the derivatives to the term , using Leibniz’s rule and the hypothesis on the derivatives we get the estimate

for all multi-indices and non-negative integers . We summarize these estimates in the form

for all multi-indices and non-negative integers . Using (6) for we have

On the other hand, using (6) for we get

Now since the series converges absolutely and uniformly in (when ) for every multi-index we conclude that the series converges in to some function which also satisfies the estimate

for all multi-indices . On the other hand, since converges to in , we conclude that when . In particular,

whenever has compact support and since then . However, satisfies

by taking the zero multi-index and furthermore

by considering multi-indices with . These estimates are enough to assure that and thus is a singular kernel so is a CZO associated with . However this means that and we are done.

Observe that what we really used in order to show that are the estimates with of the derivatives of , which in turn required control of the derivatives of up to order . Thus we have the following corollary.

Corollary 10. Let be a function such that

for all multi-indices with . Then for all .

Remark 5. The hypothesis of the previous theorem is not optimal as one can get away with fewer derivatives of . However it already applies to many practical cases. For example, for any multi-index of order , consider the operator with symbol Observe that falls into the scope of Theorem 9 since

for all multi-indices . So for all . Now observe that for (say) we have

which shows in particular that

for all multi-indices of order , whenever . Thus all partial derivatives of order are controlled by the Laplacian in .

Now consider the space to be the space of functions such that all the partial derivatives of order up to are in and equip this space with the norm

By the remarks above this norm is equivalent to

Similar conclusions hold for any even integer and the space . Thus the two definitions of the Sobolev space , the one given here and the one given in Remark 3, coincide whenever is an even integer:

We now give a sharper form of the multiplier theorem which requires control only on derivatives of .

Theorem 11 (Hörmander-Mikhlin multiplier theorem version II). (i) Let be the smallest integer and suppose that the multiplier is of class with for all multi-indices with . Then agrees with a function away from the origin which is locally integrable away from the origin and satisfies

for all .

(ii) Under the assumptions of (i) we have that .

*Proof:* As in the proof of Theorem 9 it will be enough to control the pieces . For this, let be a multi-index. We have

For this implies that

Now for any we have

where . Choosing and these estimates imply that

We will now prove a similar estimate for the derivatives of using a very similar approach. Indeed, we start from the identity

Now for and using the Leibniz rule we get

Thus we have

Also

Choosing and combining the last two estimates we conclude

This estimate for together with the mean value theorem implies that

We now have for all

On the other hand

by (7). Using now that converges in to some locally integrable function for every compact set that doesn’t contain we conclude that coincides with a locally integrable function away from and satisfies

Now since away from the origin we have that

whenever in and has compact support and . Furthermore, by the assumption we automatically get that is bounded on . Here condition (8) is enough to substitute for the conditions given in the definition of a singular kernel and show that is a CZO with playing the role of the kernel. Indeed, the type of can be used to treat the good part in the Calderón-Zygmund decomposition of a function . On the other hand, if is a bad piece supported on a dyadic cube with center and is the cube with the same center and twice the side-length, we have

Now if and we have that . Thus for we have from (8) that

so that

This treats the bad part of the Calderón-Zygmund decomposition of so we conclude the proof that is of weak type as in the general case of a CZO. Interpolating between this bound and the strong bound we get that for . By Proposition 7 or using the symmetry of in and , we also get the range with .

Exercise 4. The purpose of this exercise is to clarify some of the calculations in the proofs of the two versions of Hörmander’s theorem. (i) Prove the identity for any positive integer . Here the meaning of the symbol is

Let and be two multi-indices in . We write if for all . With this notation the *Leibniz rule* says that for any multi-index and functions which are, say, smooth, we have Here the *generalized binomial coefficients* are defined as Alternatively we use the notation

(ii) For any two multi-indices show that

(iii) Let satisfy the estimate

for some and , be as in the Littlewood-Paley decomposition. Show that satisfies the same estimates, that is,

with different implied constants of course. Remember that and thus is supported on .

(iv) Let satisfy the estimate

for some and , be as in the Littlewood-Paley decomposition. Set . Show that for any multi-index of order we have

(v) Let be a smooth function which is supported on . Show that

and by iterating that

for all positive integers .
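The multi-index conventions used in this exercise can be tested on a small case. The sketch below assumes the standard conventions, since the displayed formulas are not reproduced here: the factorial of a multi-index is the product of the factorials of its entries, and the generalized binomial coefficient is the product of the one-dimensional binomial coefficients. The helper names are hypothetical.

```python
from itertools import product
from math import comb, factorial, prod

def multi_factorial(alpha):
    """alpha! = alpha_1! * ... * alpha_n! (assumed standard convention)."""
    return prod(factorial(a) for a in alpha)

def multi_binomial(alpha, beta):
    """Product of one-dimensional binomial coefficients; zero unless beta <= alpha entrywise."""
    if any(b > a for a, b in zip(alpha, beta)):
        return 0
    return prod(comb(a, b) for a, b in zip(alpha, beta))

def multi_indices_below(alpha):
    """All multi-indices beta with beta <= alpha entrywise."""
    return product(*(range(a + 1) for a in alpha))

alpha = (3, 1, 2)
beta = (2, 0, 1)
difference = tuple(a - b for a, b in zip(alpha, beta))

# The product form agrees with alpha! / (beta! (alpha - beta)!).
product_form = multi_binomial(alpha, beta)
factorial_form = multi_factorial(alpha) // (multi_factorial(beta) * multi_factorial(difference))

# Summing the coefficients over beta <= alpha gives 2**|alpha|, which is what the
# Leibniz rule predicts when both factors are exp(x_1 + ... + x_n).
total = sum(multi_binomial(alpha, b) for b in multi_indices_below(alpha))
```

For this choice both forms of the coefficient give 6 and the sum of all coefficients is 64, the sixth power of 2, matching the order of `alpha`.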

Exercise 5. Let be such that . Furthermore suppose that satisfies the mean regularity condition Show that .

Hint: Briefly describe the key elements of the proof showing that is of weak type . Argue why this implies that for . You get the complementary interval for free (why?).


for an appropriate kernel . Let us quickly review what we used in order to show that the Hilbert transform is of weak type and strong type . First of all we essentially used the fact that the linear operator is defined on and bounded, that is, that it is of strong type . This information was used in two different ways. First of all, the fact that is defined on means that it is defined on a dense subspace of for every . Furthermore, the boundedness of the Hilbert transform on allowed us to treat the set where is the `good part’ in the Calderón-Zygmund decomposition of a function . Secondly, we used the fact that there is a specific representation of the operator of the form

whenever and has compact support and . For the Hilbert transform we had that the kernel is given as

We used the previous representation and the formula of to prove a sort of restricted boundedness of on functions which are localized and have mean zero, which is the content of Lemma 7 of Notes 6. This, in turn, allowed us to treat the `bad part’ of the Calderón-Zygmund decomposition of . From the proof of that Lemma it is obvious that what we really need for is a Hölder type condition. Note as well that for the Hilbert transform we first proved the bounds for , and then the corresponding boundedness for followed from the fact that is essentially self-adjoint.
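The Calderón-Zygmund decomposition referred to above can be made concrete on a discrete dyadic model. The following is a minimal sketch, not the construction of the notes: it assumes a non-negative sequence of length a power of two, viewed as a step function on unit cells, and the function name `calderon_zygmund` is hypothetical. It selects the maximal dyadic intervals on which the average exceeds a given height, replaces the function by its average there to obtain the good part, and leaves a bad part with mean zero on each selected interval.

```python
def calderon_zygmund(values, height):
    """Dyadic Calderon-Zygmund decomposition of a non-negative sequence.

    `values` has length 2**k and is treated as a step function on 2**k unit cells.
    Returns (bad_intervals, good, bad) with values[i] == good[i] + bad[i].
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    assert all(v >= 0 for v in values), "values must be non-negative"
    assert sum(values) / n <= height, "the global average must not exceed the height"

    bad_intervals = []

    def visit(start, length):
        average = sum(values[start:start + length]) / length
        if average > height:
            # Maximal bad interval: every dyadic ancestor had average <= height.
            bad_intervals.append((start, length))
            return
        if length > 1:
            half = length // 2
            visit(start, half)
            visit(start + half, half)

    visit(0, n)

    good = list(values)
    for start, length in bad_intervals:
        average = sum(values[start:start + length]) / length
        for i in range(start, start + length):
            good[i] = average          # good part: the average on each bad interval
    bad = [v - g for v, g in zip(values, good)]   # bad part: mean zero on each bad interval
    return bad_intervals, good, bad

values = [0, 0, 0, 6, 0, 0, 0, 0, 1, 3, 0, 0, 0, 0, 0, 0]
height = 1.0
intervals, good, bad = calderon_zygmund(values, height)
```

In this example the selected intervals are the cells 0 to 3 and 8 to 9. The averages there are 1.5 and 2.0, which lie between the height and twice the height because the parent interval was not selected, and the good part is therefore bounded by twice the height.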

**1. Singular kernels and Calderón-Zygmund operators **

We will now define the class of Calderón-Zygmund operators in such a way that we will be able to repeat the scheme used for the Hilbert transform. We begin by defining an appropriate class of kernels , namely the singular (or standard) kernels.

Definition 1 (Singular or Standard kernels). A *singular* (or *standard*) kernel is a function , defined away from the diagonal , which satisfies the decay estimate

Example 1. Let be given as for with . Then is a singular kernel. Observe that is the singular kernel associated with the Hilbert transform.

where is a Hölder-continuous function:

for some . Then is a singular kernel.

Exercise 1. Prove that the kernel of Example 2 is a singular kernel.

Example 3. Let satisfy the size estimate and the regularity estimates

away from the diagonal . Then is a singular kernel. In particular, the kernel given as

is a singular kernel since the gradient of is of the order . Thus the estimates (2) and (3) are consistent with (1) but of course do not follow from it.

Remark 1. The constant appearing in (2), (3) is inessential. The conditions are equivalent to the corresponding conditions where is replaced by any constant between zero and one.

We are now ready to define Calderón-Zygmund operators.

Definition 2 (Calderón-Zygmund operators). A *Calderón-Zygmund operator* (in short *CZO*) is a linear operator which is bounded on : and such that there exists a singular kernel for which we have

for all with compact support and .

Remark 2. Note that the integral converges absolutely whenever has compact support and lies outside the support of . Indeed, by (1), for some . Observe that the integral in the last estimate converges.

Remark 3. For any singular kernel one can define by means of for with compact support and . It is not necessarily true, however, that is a CZO, since it might fail to be bounded on .

Remark 4. It is not hard to see that *uniquely* determines the kernel . That is, if for all with compact support, then almost everywhere (why?). The converse is not true. Indeed, for any bounded function the operator defined as is a Calderón-Zygmund operator with kernel zero. A more specific example is the identity operator, which also falls in the previous class and is a CZO with kernel 0. However, this is the only ambiguity. See Exercise 2.

Exercise 2. Let be two CZOs with the same singular kernel . Show that there exists a bounded function such that for all .

If is a CZO, the definition already contains the fact that is defined and bounded on , so we don’t need to worry about that. The next step is to establish the restricted boundedness for functions with mean zero. The following lemma is the analogue of Lemma 7 of Notes 6.

Lemma 3. Let be a Euclidean ball in and denote by the ball with the same center and twice the radius, that is . Let have mean zero, that is . Then we have that for all . We conclude that

*Proof:* Using the fact that has zero mean on , for we can estimate

Integrating throughout we also get the second estimate in the lemma.

The only thing missing in order to conclude the proof of the bounds for CZOs is the fact that they are self-adjoint *as a class*. In particular, we need the following.

Lemma 4. Let be a CZO. Consider the adjoint defined by means of

*Proof:* It is immediate from (4) and the fact that is bounded on that is also bounded on with the same norm. Now let have disjoint compact supports. We have

Now let and have support inside with . For , the functions are supported in so, for small enough, the support of is disjoint from the support of . By (5) we conclude that

Letting we get

for almost every . Since the conditions defining singular kernels are symmetric in the variables , the kernel is again a singular kernel so we are done.

The discussion above leads to the main theorem for CZOs:

Theorem 5. Let be a Calderón-Zygmund operator. Then extends to a linear operator which is of weak type and of strong type for all where the corresponding norms depend only on and and .

**2. Pointwise convergence and maximal truncations **

Let be a CZO. The example of the Hilbert transform suggests that we should have the almost everywhere convergence

at least for nice functions . The truncated operators

certainly make sense for because of (1). However, the limit need not even exist in general or may exist and be different from . Here we can use the trivial example of the operator . As we have already observed this is a CZO with kernel . Thus for all but clearly in general.

The following lemma clarifies the situation as far as the existence of the limit is concerned:

Lemma 6. The limit exists almost everywhere for all if and only if the limit

exists almost everywhere.

*Proof:* First suppose that the limit exists for all and let with on . Then

Observe that by (1) the second integral on the right hand side converges absolutely. Since the limit on the left hand side exists we conclude that the limit on the right hand side exists as well. Conversely, suppose that the limit

exists and let . We have that

By the same considerations as before is a positive number that does not depend on . By the hypothesis we also have that . Finally for observe that we have

by (1). Since

dominated convergence implies that exists as well.

Thus, for specific kernels one has an easy criterion to establish whether the limit exists a.e. for `nice’ functions . For example, for the kernel of the Hilbert transform, the existence of the limit

is obvious. In order to extend the almost everywhere convergence to the class we need to consider the corresponding maximal function.

Definition 7. Let be a CZO and define the truncations of as before The *maximal truncation* of is the sublinear operator defined as

The maximal truncation of a CZO has the same continuity properties as itself.

Theorem 8. Let be a CZO and denote its maximal truncation. Then is of weak type and strong type for .

The proof of Theorem 8 depends on the following two results.

Lemma 9. Let be an operator of weak type and . Then for every set with we have that

The proof of this lemma is a simple application of the representation of the norm in terms of level sets and is left as an exercise.

Exercise 3. Prove Lemma 9 above.

The second result we need is the following lemma that gives a pointwise control of the maximal truncations of the CZO by an expression that involves the maximal function of and the maximal function of .

*Proof:* Let us fix a function and and consider the ball and its double . We decompose in the form

Since and obviously has compact support we can write

Also every is not contained in the support of thus

by (3), since for in the area of integration above. By this estimate we get that

Combining the previous estimates we conclude that for any

for some constant depending only on and .

If then we are done. If then there is such that . Let

and

Let . Then either or or . In the last case so in every case we conclude that thus . However we have that

Also, by the type of we get

Finally, if then . Otherwise so

Thus in every case we get that

Since the previous estimate is true for any we conclude that

which gives the desired estimate in the case .

For estimate (7) implies that

and integrating in we get

and thus

Note that

and by Lemma 9 the last term is controlled by

since is of weak type . Gathering these estimates we get

as we wanted to show.

We can now give the proof of the fact that the maximal truncation of a CZO is of weak type and strong type for .

*Proof of Theorem 8:* By Lemma 10 for we immediately get that is of strong type for since both and are. In order to show that is of weak type we argue as follows. By Lemma 10 we have that

Thus the proof will be complete if we show that

As we have seen in Corollary 18 of Notes 5 we have that

where is the dyadic maximal function. Furthermore, using the Calderón-Zygmund decomposition it is not hard to see (see Exercise 4) that

Applying the last estimate to we get

For the set has finite measure. Thus by Lemma 9 we conclude that

and thus by (8) that

This concludes the proof.

**3. Singular integral operators on and . **

The theory of Calderón-Zygmund operators developed so far is pretty satisfactory except for one point, the action of a CZO on . Exercise 4 from Notes 6 shows for example that in general a CZO cannot be bounded on . Furthermore, it is at the moment unclear how to define the action of on a general bounded function or even on a dense subset of . With a little effort however this can be achieved.

Let us first fix a function and look at the formula

As we have already mentioned several times, such a formula is not meaningful throughout . Indeed the integral above need not converge, both close to the diagonal , since is singular, as well as at infinity since only decays like , not fast enough to make the integral above absolutely convergent. The first problem we have dealt with so far by considering functions with compact support and requiring the validity of (9) only for . A similar solution could work now but we still have a problem at infinity. Note that we didn’t run into this problem yet since we only considered functions in which necessarily possess decay at infinity. This is not necessarily the case for bounded functions. However, looking at the difference of the values of at two points with , we can formally write

Using the regularity condition (3) we see that

when . This is enough to assure integrability in the previous integral, as long as . Motivated by this heuristic discussion we define for :

for some Euclidean ball so that . First of all it is easy to see that the integrals above make sense. Indeed, is well defined since is in . On the other hand, the integral in the second summand converges absolutely since we integrate away from , is bounded and behaves like for . However, (10) only defines up to a constant. Indeed it is easy to see that if are two different balls containing the difference in the two definitions is equal to

which is a constant independent of . Thus we only define modulo constants. This definition of gives a linear operator which extends our previous definitions on or . To deal with the ambiguity in the definition, we have to define the appropriate space.

Definition 11. We say that two functions are *equivalent modulo a constant* if there exists a constant such that almost everywhere on . This is an equivalence relation. By abuse of language and notation we will oftentimes identify an equivalence class with a representative of the class, much like we do with measurable functions.

Definition 12 (Bounded Mean Oscillation). Let be a locally integrable function , defined modulo a constant. We set to be the average of on the Euclidean ball . The norm of is the quantity

where the supremum varies over all Euclidean balls . The space is the set of all locally integrable functions , defined modulo a constant, such that . Thus, an element of is only defined up to a constant.

First of all observe that this is a good definition since replacing a function by for any constant does not affect its BMO norm. Thus, all elements in the equivalence class of have the same BMO norm. The previous quantity actually defines a norm, always keeping in mind that we identify functions that differ by a constant. For example any constant is equivalent to the function in BMO and thus if and only if almost everywhere for some .

It is not hard to give the following alternative description of the BMO norm, which is maybe a bit more revealing:

Proposition 13. (i) Let . We have that (ii) For any locally integrable function and a cube set . We set

where the supremum is taken over all cubes Then

as in . Moreover

*Proof:* For (i) observe that for any ball we have

On the other hand for any we have

which gives the opposite inequality as well by taking the infimum over . The proof of the first claim in is identical. For the second claim in let and be a cube. Consider the smallest ball with the same center as . Then

Thus,

for any cube . Taking also the supremum over cubes proves the one direction of the inequality. The proof of the opposite inequality is similar.

Thus a function in BMO has the property that for any ball there is a constant such that . That is, the values of oscillate around by at most on average. Locally, and in the *mean*, the function has bounded oscillation.

The space BMO contains but also contains unbounded functions.

Proposition 14. (i) For every we have that thus .

(ii) The function is in . Thus is a proper subset of .
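A quick numerical experiment illustrates part (ii). The sketch below assumes that the function in question is log|x| in one dimension, the standard example of an unbounded function of bounded mean oscillation; the helper `mean_oscillation` and the choice of intervals are hypothetical. It approximates the mean oscillation over intervals (0, r) shrinking to the singularity, where the function itself is unbounded.

```python
import math

def mean_oscillation(f, a, b, cells=20000):
    """Midpoint-rule approximation of the mean oscillation of f over (a, b)."""
    h = (b - a) / cells
    samples = [f(a + (i + 0.5) * h) for i in range(cells)]
    average = sum(samples) / cells
    return sum(abs(s - average) for s in samples) / cells

def log_abs(x):
    return math.log(abs(x))

# Intervals (0, r) approach the singularity at the origin as r decreases.
oscillations = {r: mean_oscillation(log_abs, 0.0, r) for r in (1.0, 1e-3, 1e-6)}

# Exact value on (0, 1): the average of log x is -1 and the integral of |log x + 1| is 2/e.
exact = 2.0 / math.e
```

The three values agree with `exact` to several digits, reflecting the fact that rescaling the interval only adds a constant to the logarithm, while the function values themselves tend to minus infinity near the origin.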

Our interest in the space BMO mainly lies in the fact that it serves as a substitute endpoint for the boundedness of CZOs, namely a CZO is bounded from to BMO, where should be defined as in (10). Note here that even though (10) only defines `up to constants’, this is the only possible definition of a BMO function.

Theorem 15. Let be a CZO. Then for every we have that

*Proof:* Let be some ball in . We need to show that

and denote . We set

Since is of strong type we have

Thus by Cauchy-Schwarz we have

On the other hand for , the ball certainly contains both and so

Remembering that (10) only defines up to a constant we get

By Proposition 13 this proves the theorem.

** 3.1. The John-Nirenberg Inequality **

We will now see that although the space BMO contains unbounded functions like , this is in a sense the maximum possible growth for a BMO function. Although such a claim is not precise in a pointwise sense, it can be rigorously proved in the sense of level sets. Indeed, assuming then

for all balls . Using Chebyshev’s inequality this implies

This estimate is interesting for large, and states that on any ball the function exceeds its average by only on a small fraction of the ball . In fact, this can be improved.

Theorem 16 (John-Nirenberg inequality). Let . Then for any Euclidean cube we have that for all , where the constant depends only on the dimension .

Remark 5. Obviously it doesn’t make any difference to work with balls instead of cubes, so the previous theorem remains valid with balls replacing cubes .
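The gain of the John-Nirenberg inequality over Chebyshev’s inequality can be seen on an explicit example. The sketch below is an illustration under assumptions not taken from the notes: it uses the function log(1/x) on the interval Q = (0, 1), whose average over Q is 1 and whose mean oscillation over Q is 2/e, and the helper names are hypothetical. The level sets are computed in closed form and compared with the Chebyshev bound.

```python
import math

def level_set_measure(lam):
    """Measure of {x in (0,1): |log(1/x) - 1| > lam}, computed in closed form."""
    upper = math.exp(-1.0 - lam)                   # where log(1/x) - 1 > lam, i.e. x < e^(-1-lam)
    lower = max(0.0, 1.0 - math.exp(lam - 1.0))    # where log(1/x) - 1 < -lam, i.e. x > e^(lam-1)
    return upper + lower

# Mean oscillation of log(1/x) over Q = (0, 1), with |Q| = 1.
MEAN_OSCILLATION = 2.0 / math.e

def chebyshev_bound(lam):
    """Bound on the level set obtained from Chebyshev's inequality alone."""
    return MEAN_OSCILLATION / lam

rows = [(lam, level_set_measure(lam), chebyshev_bound(lam)) for lam in (1, 2, 4, 8, 16)]
for lam, measure, bound in rows:
    print(f"lambda={lam:2d}  level set={measure:.3e}  Chebyshev bound={bound:.3e}")
```

For levels at least 1 the level set has measure exactly e^(-1-λ), so it shrinks by a factor of e for each unit increase of the level, whereas the Chebyshev bound only decays like 1/λ.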

*Proof:* For let us denote by the best constant in the inequality

valid for any cube and with . By Chebyshev’s inequality combined with the trivial bound we get

which is of course quite far from the desired estimate

This will be achieved by iterating a local Calderón-Zygmund decomposition as follows.

Let us fix a cube and consider the family of cubes inside which are formed by bisecting each side of . Then define the second generation by bisecting the sides of each cube in and so on. The family of all cubes in all generations will be denoted by . For a level to be chosen later let be the `bad’ cubes in , that is the cubes such that

where .

Finally let be the family of maximal bad cubes. Since for the original cube , every bad cube is contained in a maximal bad cube. As in the global Calderón-Zygmund decomposition we conclude that

for each cube where the constant depends only on the dimension . We also conclude that

if by the dyadic maximal theorem. Remembering the initial normalization we get

and for

Now consider . We have

However this means that

whenever . Suppose that . Since is non-increasing and the trivial estimate we get

for (say) and . On the other hand, for we have

so the proof is complete.

Corollary 17. Consider the version of the BMO norm Then

Exercise 5. Use the John-Nirenberg inequality and the description of norms in terms of level sets to prove Corollary 17.

Finally, we show how we can use the space as a different endpoint in the log-convexity estimates for the norms.

Lemma 18. Let and . Then and

*Proof:* Obviously it is enough to assume that , otherwise there is nothing to prove. Also by homogeneity we can normalize so that . Now form the Calderón-Zygmund decomposition of at level and denote by the family of bad cubes as usual. For each cube we then have

From the John-Nirenberg inequality we conclude that

for all the bad cubes . Since we have that for we get

for all . On the other hand, since we have

We conclude the proof by using the description of the norm in terms of level sets and using (12) for and (11) for .

Exercise 6 (The sharp maximal function). For define the sharp maximal function Observe that if and only if and, in particular,

Show that for every we have

**4. Vector valued Calderón-Zygmund Singular integral operators **

We close this chapter on CZOs by describing a vector valued setup in which all our results on CZOs go through almost verbatim. We will see an application of these vector valued results in our study of *Littlewood-Paley* inequalities.

So let be a separable Hilbert space with inner product and norm and consider a function . All the well known facts about spaces of measurable scalar functions have almost obvious generalizations in this setup once we fix some analogies. For example, the function will be called measurable if for every the function is a measurable function of . If is measurable then is also measurable. We then denote the space of all measurable functions such that

and the usual corresponding definition for

It is not hard to check the duality relations for these spaces; for example

for all . Also our interpolation theorems, the Marcinkiewicz interpolation theorem and the Riesz-Thorin interpolation theorem, go through in this setup as well.

Moreover, if a function is absolutely integrable, we can define its integral as an element of by defining the functional

Note here that is uniquely defined as a functional in . Indeed, is obviously linear and by the Cauchy-Schwarz inequality we have

By the Riesz representation theorem on Hilbert spaces, there is a unique element of , which we denote by , such that , that is

Finally, if are separable Hilbert spaces we denote by the space of bounded linear operators , equipped with the usual operator norm:

Again, a function will be called measurable if for every the function

is a measurable -valued function.

We are now ready to give the description of vector valued CZOs. We start with the definition of a singular kernel.

Definition 19 (Vector valued singular kernel). Let be two separable Hilbert spaces and be a function defined away from the diagonal . Then will be called a (vector-valued) *singular kernel* if it obeys the size estimate

Definition 20. Let be separable Hilbert spaces. A linear operator is called a (vector valued) *Calderón-Zygmund operator* (vector valued CZO) from to if it is bounded from to for all , and there exists a vector valued singular kernel such that

whenever has compact support and .

Adjusting the proof of the scalar case to this vector valued setup we get the corresponding statement of Theorem 5.


Theorem 21. Let be separable Hilbert spaces and be a vector valued Calderón-Zygmund operator from to . (i) The operator is of weak type

for all .

(ii) For all , is of strong type

for all .

defined initially for `nice’ functions . Here we typically want to include the case where has a singularity close to the diagonal

which is not locally integrable. Typical examples are

and in one dimension

and so on. Observe that these kernels have a non integrable singularity both at infinity as well as on the diagonal . It is however the local singularity close to the diagonal that is important and will lead us to characterize a kernel as a singular kernel. For example, the kernel

*is not* a singular kernel since its singularity is locally integrable. Observe that for Schwartz functions it makes perfect sense to define

and in fact the previous integral operator was already considered in the Hardy-Littlewood-Sobolev inequality of Exercise 12 in Notes 5 and can be treated via the standard tools we have seen so far.

Thus, if one insists on writing the representation formula (1) throughout then will not be a function in general. Indeed, the discussion in Notes 4 reveals that if the operator is translation invariant then the kernel must necessarily be of the form for an appropriate tempered distribution :

Bearing in mind that there are tempered distributions which do not arise from functions or measures we see that (1) does not make sense in general and it should be understood in a different way. To give a more concrete example, think of the principal value distribution and write

Here we would like to rewrite this in the form

but this does not make sense even for since the function is not locally integrable on the diagonal .

In fact, the representation (1) of the operator will not be true in general but we will satisfy ourselves with its validity for functions , of compact support, and whenever does not lie in the support of . Indeed, if has compact support and then in (1) and thus we are away from the diagonal. Indeed, returning to the principal value example, observe that the integral

makes perfect sense when has compact support and .

Eventually the theory of singular integral operators does not depend on translation invariance; singular kernels of the type can be viewed as a special case of the more general class of singular kernels which satisfy appropriate growth and regularity assumptions. It is however instructive to consider the translation invariant case first. In the Calderón-Zygmund theory of singular integral operators we will start with more or less assuming that the operator is well defined and bounded on and that its kernel satisfies certain growth and regularity conditions. Alternatively, assumptions on will allow us to show the -boundedness. We will see that under these conditions will extend to a bounded operator on for and of weak type .

**1. The Hilbert transform **

In order to illustrate the general ideas let us consider what is probably the primordial example of a singular integral operator, the *Hilbert transform*, given in the form

Remembering the principal value distribution we can rewrite this in the form

at least whenever . The previous formula makes sense just because the principal value of is a well defined tempered distribution. Alternatively, we can repeat the argument we used for to write for any and a Schwartz function

Observe that we heavily rely on the fact that the kernel has zero mean on symmetric intervals around (and away from) the origin:

The mean value theorem now shows that is uniformly bounded by thus the limit of the first summand as exists and we have that

Remark 1. Trying to write the Hilbert transform as an integral operator with respect to a kernel ,

we immediately run into the problem that the principal value distribution does not arise from a function. The previous discussion allows us however to write

whenever is a compactly supported function in or and . This is essentially equivalent to the fact that the integrals

are absolutely convergent whenever and is fixed.

Thus we see that the Hilbert transform is a linear operator which is at least well defined on the Schwartz class . This is quite promising since we know that is dense in for . Of course, in order to extend the action of to say we need to exhibit the continuity of on the dense subclass . In our general theory this will be a `given’, that is that our operator is bounded on . To make this general assumption meaningful we have to exhibit that it is indeed satisfied in the model case of the Hilbert transform. We begin this investigation by first showing a simple asymptotic relationship.

Before giving the proof of this Lemma let us discuss its consequences. Already the expression (2) shows that is a bounded function whenever . Indeed, using the mean value theorem for the first term in (2) and Hölder’s inequality for the second term we have that

As a result, the integrability of for solely depends on the behavior of at infinity. Now the lemma just stated shows that

whenever with . Thus for a general with non-zero mean, fails to be in since it doesn’t decay fast enough at infinity. It is however in for any . As we shall see the failure of continuity of on has a weak substitute, namely that is of weak type and this is the typical behavior of all singular integral operators we want to consider.

*Proof of Lemma 1*: The proof is a variation of the idea used in (2). For any and large we can write

For observe that whenever thus we have that

as since is a Schwartz function. On the other hand, for we have that whenever . We get

as since is integrable, being a Schwartz function. Now consider the expression

thus

as .
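The asymptotic relationship can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes the normalization $Hf(x)=\frac{1}{\pi}\,\mathrm{p.v.}\int f(x-y)\,\frac{dy}{y}$ and that the lemma asserts $x\,Hf(x)\to\frac{1}{\pi}\int f$ as $|x|\to\infty$. The test function (a Gaussian), the truncation `upper` and the midpoint rule are illustrative choices.

```python
import math

def gaussian(t):
    # Schwartz test function with integral sqrt(pi)
    return math.exp(-t * t)

def hilbert_transform(f, x, upper, steps):
    # Symmetrized principal value:
    # p.v. int f(x - y) dy / y = int_0^infty (f(x - y) - f(x + y)) dy / y,
    # approximated by the midpoint rule on (0, upper).
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += (f(x - y) - f(x + y)) / y
    return total * h / math.pi

x = 40.0
value = hilbert_transform(gaussian, x, upper=x + 12.0, steps=52000)
scaled = x * value                       # x * Hf(x)
limit = math.sqrt(math.pi) / math.pi     # (1/pi) * integral of the Gaussian
```

For this choice of $f$ the quantity `scaled` agrees with `limit` up to an error of order $x^{-2}$, which is consistent with the decay $Hf(x)\simeq 1/x$ at infinity for functions with non-zero mean.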

Exercise 1 Let . Show that if and only if

Hint: Examine the decay of for by using the identity .

** 1.1. The Hilbert transform on **

Having exhibited that whenever our next task is to show that is bounded as an operator , that is to show that

for all . Remember that since is dense in such an estimate will allow us to extend to a bounded linear operator on . There are several different approaches to such a theorem, most of them connected to the significance of the Hilbert transform in complex analysis and in the theory of holomorphic functions. First we exhibit the connection with Cauchy integrals.

Proposition 2 Let be a function on such that is well defined, say and for large. Then

for every .

*Proof:* By translation invariance of and taking complex conjugate in both sides of the identity it suffices to show that

Changing variables this is equivalent to

Now let

For we have that

while for we can calculate

The previous estimates obviously imply that is absolutely integrable on . Furthermore

as can be seen by a direct calculation. Thus by the previous calculations it suffices to show that

which follows by dominated convergence since and is bounded.

Exercise 2 Show that for satisfying for the Hilbert transform is indeed well defined. Furthermore, show that it suffices to prove (3) in the previous proposition. In particular, exhibit how the full statement of the previous proposition follows from (3).

*Proof:* Let us define the Cauchy-type integral

Then Proposition 2 shows that

Observe by the proof of the proposition applied to the function that

for all . Thus by Minkowski’s integral inequality we get that

By dominated convergence we conclude that converges to in as well. By Plancherel’s theorem we get that we must also have that

in , as . Note here that the Fourier transform is well defined since and in this case we have exhibited that . The problem now reduces to calculating the Fourier transform of for and seeing what happens in the limit. Consider the truncations

Let us write

Then as in by dominated convergence and thus

as . We now have that

However we have that

Now Cauchy’s theorem from complex analysis shows that whenever .

The previous definitions allow us to conclude that the Fourier transform

whenever and thus that

whenever . We conclude that

Now note that the Hilbert transform satisfies

where remember that . So for we can write

In other words for we get that .

The previous theorem shows in particular that for all . This allows us to extend the Hilbert transform to a bounded linear operator on . In fact is an isometry by Plancherel’s theorem and the fact that . Furthermore, although at the current stage it is not clear that our original definition makes sense on , we can directly define the Hilbert transform on by means of

which is a good definition whenever . In fact, recalling the discussion on multiplier transformations it is clear that the operator on is the multiplier transformation associated with the multiplier which is obviously a bounded function. This is automatic from the definition

and the fact that . We also have that which is also obvious from the fact that is an isometry.
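The multiplier description lends itself to a quick numerical experiment. The sketch below is an illustration only: it assumes the normalization $\widehat{Hf}(\xi) = -i\,\mathrm{sgn}(\xi)\,\hat f(\xi)$ and replaces the real line by a periodic grid of `N` samples, so that the Fourier transform becomes a discrete Fourier transform. The helper names (`dft`, `idft`, `sgn_freq`, `discrete_hilbert`) are hypothetical.

```python
import cmath
import math

def dft(x):
    # Plain O(n^2) discrete Fourier transform
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    # Inverse discrete Fourier transform
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def sgn_freq(k, n):
    # Sign of the frequency represented by DFT bin k
    # (bins above n/2 stand for negative frequencies; bins 0 and n/2 get sign 0)
    if k == 0 or 2 * k == n:
        return 0
    return 1 if k < n / 2 else -1

def discrete_hilbert(x):
    # Multiply the DFT by the assumed symbol -i * sgn(frequency)
    n = len(x)
    X = dft(x)
    return idft([-1j * sgn_freq(k, n) * X[k] for k in range(n)])

N = 64
cos_wave = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]
sin_wave = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
h = discrete_hilbert(cos_wave)    # should reproduce the sine wave
hh = discrete_hilbert(h)          # should reproduce minus the cosine wave
```

In this discrete model the transform sends the cosine to the sine, applying it twice gives minus the original signal, and the $\ell^2$ norm of a mean-zero signal is preserved. These are the discrete counterparts of the isometry property and of the identity for the square of the Hilbert transform.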

Corollary 4 The Hilbert transform extends to an isometry on . We have that

for all . Furthermore, for the Hilbert transform can be defined as

Corollary 5 Consider the Hilbert transform . Then we have the following properties:

(i) The Hilbert transform commutes with translations and dilations (but not modulations).

(ii) The Hilbert transform is skew-adjoint on

(iii) We have the identity on :

Exercise 3 Prove Corollary 5 above. Hint: Use the formula of Theorem 3.

Exercise 4 Let . Show that

Conclude that the Hilbert transform is neither of strong type $(1,1)$ nor of strong type $(\infty,\infty)$.

** 1.2. The Hilbert transform on **

So far we have defined our first singular integral operator, the Hilbert transform. This is an operator that is bounded on and that has the representation

whenever has compact support and . The function

is the singular kernel associated with the Hilbert transform. Although we have seen that the Hilbert transform can be described for all , at least for nice functions , the restricted representation just described is all we really need to execute our program. Furthermore, this approach will serve as a good introduction to the general case of Calderón-Zygmund operators. From the previous discussion we know that the Hilbert transform is not of type nor of type . The following theorem is the main result of the theory.

Theorem 6 (i) The Hilbert transform is of weak type ; for we have that

(ii) For , the Hilbert transform is of strong type ; for we have

*Proof:* We will divide the proof into several steps. The most important one, however, is the proof of the weak type $(1,1)$ bound. All the rest relies on exploiting the symmetries of the Hilbert transform, interpolation and duality.

**step 1; the weak bound:** We fix a level and a function and write the Calderón-Zygmund decomposition of the function at level in the form

Recall that the `bad part’ is described as

where is a collection of disjoint dyadic intervals (since ) and each is supported on . Furthermore we have that

and

Recall also that

by the maximal theorem. On the other hand the `good part’ is bounded

and its norm is controlled by the norm of :

Observe that thus and by the log-convexity of the norm we have

Remark 2 Since it follows that as well. Also, by the definition of the pieces it is easy to see that as well. However, we will not use the bounds on nor on , the fact that they belong to being merely a technical assumption that allows us to define their Hilbert transforms. Overall, the hypothesis that cannot be used in any quantitative way if we ever want to extend our results to for .
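The properties of the Calderón-Zygmund decomposition recalled above can be seen on a toy example. The following sketch is a discrete stand-in, under stated assumptions: the function lives on $[0,1)$ and is constant on $2^K$ equal cells, the level `lam` is at least the average of $|f|$ over $[0,1)$, and the bad intervals are found by the usual dyadic stopping-time argument. The name `cz_decompose` and the sample data are hypothetical.

```python
def cz_decompose(values, lam):
    # values: samples of f on 2^K equal cells of [0, 1); lam: the level.
    # Assumes the average of |f| over the whole interval is at most lam.
    n = len(values)
    bad = []  # selected dyadic intervals, stored as (start index, number of cells)

    def visit(start, length):
        avg = sum(abs(v) for v in values[start:start + length]) / length
        if avg > lam:
            bad.append((start, length))      # stop: first time the average exceeds lam
        elif length > 1:
            half = length // 2
            visit(start, half)               # otherwise bisect and continue
            visit(start + half, half)

    visit(0, n)
    good = list(values)
    for start, length in bad:
        mean = sum(values[start:start + length]) / length
        for i in range(start, start + length):
            good[i] = mean                   # g equals the average of f on each bad interval
    bad_part = [v - g for v, g in zip(values, good)]
    return good, bad_part, bad

samples = [0, 0, 8, 0, 1, 1, 0, 0]
lam = 1.5
good, bad_part, bad = cz_decompose(samples, lam)
l1_norm = sum(abs(v) for v in samples) / len(samples)
bad_measure = sum(length for _, length in bad) / len(samples)
```

For these samples the only bad interval is $[0,\tfrac12)$, the good part is bounded by $2\lambda$, the bad part has mean zero on the bad interval, and the total length of the bad intervals is at most $\|f\|_1/\lambda$.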

Since and is linear, we have the following basic estimate

The part that corresponds to is the easy one to estimate. This is not surprising since is the good part. Since we already know that is of strong type it’s certainly of weak type thus we have

by (5). Thus this estimate for the good part is exactly what we want. Let’s move now to the estimate for the bad part. The main ingredient for the estimate of the bad part is the following statement which we formulate as a lemma for future reference.

Lemma 7 Let be any interval in and denote by the interval with the same center as and twice its length. For supported in and with zero mean on , , we have

for all . We conclude that

Remark 3 Here we require that is also in just in order to make sure that is well defined. Note that in the case of the Hilbert transform it can be verified directly that is well defined for and . However we prefer this formulation since for more general Calderón-Zygmund operators we will only have a formula available to us for with compact support and .

*Proof:* Using the zero mean value hypothesis for we can write for

Now since we have that

so we can write

as we wanted to show. The second claim of the lemma follows easily by integrating this estimate.

We now go back to the estimate of . First of all note that

for almost every . Indeed, if we enumerate the cubes in as then we have that for every thus in . Since is an isometry on it follows that converges to in as well. Taking subsequences we then have that almost everywhere. Thus

almost everywhere and we get the claim by letting .

For each let denote the cube with the same center and twice the side-length. We now estimate the `bad part’ as follows

By the Calderón-Zygmund decomposition we have that

which takes care of the first summand. For the second we use Lemma 7 to write

again by the Calderón-Zygmund decomposition. Observe that each and has mean zero on so the appeal to Lemma 7 is legitimate. Summing up the estimates for all the bad cubes in we get

By Chebyshev’s inequality we thus get

Summing up the estimates for the bad part we conclude that

By (6) now we conclude that

whenever .

We have a priori assumed that in order to have a good definition of . However, the weak inequality on allows us to extend the Hilbert transform to a linear operator on which is also of weak type . The details are left as an exercise.

Exercise 5 Let be a linear operator which is of weak type . Show that extends to a linear operator on which is of weak type , with the same constant.

**step 2; the strong bound:** As promised, the difficult part of the proof was the weak bound. The rest is routine. First of all observe that since is of weak type and strong type , the Marcinkiewicz interpolation theorem allows us to show that is of strong type for any . To treat the interval we argue by duality, exploiting the fact that is almost self-adjoint (in fact it is skew-adjoint as we have seen in Corollary 5). Indeed, let and . Now for any we have

using the fact that is of strong type since . Taking the supremum over all with we get

for as well, whenever . Using standard arguments again this shows that extends to a bounded linear operator on , .

Remark 4 In fact, tracking the constants in the previous argument we see that

and

Overall we have proved that is of strong type with a norm bound of the order

Remark 5 We have exhibited that extends to a bounded linear operator on for and that it is of weak type . However, for a general , , there is no reason why should be given by the same formula by which it was initially defined; remember that

Thus the question whether a.e., for , is very natural. Since we know this convergence is true for the dense subset , the study of the pointwise convergence amounts to studying the boundedness properties of the corresponding maximal operator

Thus if one can show that is of weak type for example, the pointwise convergence of to would follow by Proposition 1 of Notes 5. Such an estimate is actually true and thus this formula extends to all functions for . We will however see this in the general theory of Calderón-Zygmund operators of which the Hilbert transform is a special case and so we postpone the proof until then.

** 1.3. The Hilbert transform and the boundary values of holomorphic functions **

In this section we briefly discuss the connection of the Hilbert transform with the boundary values of holomorphic functions in the upper half plane. Let us write

for the upper half plane. Two functions on are called *conjugate harmonic functions* if they are the real and imaginary parts, respectively, of a holomorphic function in the upper half plane, where . Thus we have that

By definition both are real and harmonic. Moreover, they satisfy the *Cauchy-Riemann* equations (since is holomorphic). Now assume that has a boundary value on the real line . Then

Of course, some technical assumptions are needed to make all these claims rigorous as for example assuming that the holomorphic function F has some decay of the form in the upper half plane.

Conversely, let be a real function and let be the Poisson kernel for the upper half plane

As we have seen, the convolution is a harmonic function in the upper half plane . Observe that

Consider now the *conjugate Poisson kernel*

The name comes from the fact that both are real harmonic functions and, writing , we have

which is holomorphic in the upper half plane. Thus , are conjugate harmonic functions which is what makes the functions conjugate harmonic functions as well. We conclude that the function

is harmonic in the upper half plane and that

is holomorphic in the upper half plane.

Finally observe that according to the previous formulae we have

In this language, Proposition 2 just states that converges to its boundary value as . We also see that the imaginary part of converges to the Hilbert transform:

both in and almost everywhere.

** 1.4. Frequency cut-off multipliers and partial Fourier integrals **

Remember that for a bounded function the operator

is a multiplier operator (associated to the multiplier ) and that . We also say that is a multiplier on if extends to a bounded linear operator . Thus we see that the Hilbert transform is a multiplier operator on associated with the multiplier

which is obviously a bounded function with . A very closely related multiplier is the *frequency cutoff multiplier*. Given an interval in the frequency space, where , we define the operator by means of the formula

Thus the operator applied to , localizes the function in frequency, in the interval . Such operators as well as their multidimensional analogues turn out to be very important in harmonic analysis as well as in the theory of partial differential operators. Obviously is bounded on , since . However, the corresponding estimate in is far from obvious. After all the work we have done for the Hilbert transform though, we can get the bounds for as a simple corollary. This is based on the observation that

where the equality should be understood as an equality of operators in . Here recall that

The verification of this formula is left as an exercise. Formula (7) is also true when or with obvious modifications.

Exercise 6 Prove formula (7).

A simple corollary of the boundedness of the Hilbert transform is the corresponding statement for .

Lemma 8 The operator is of strong type for :

Note that the operator norm of does not depend on .
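At the level of Fourier multipliers, the reduction of the frequency cutoff to the Hilbert transform rests on an elementary identity between symbols. Assuming that the Hilbert transform has symbol $-i\,\mathrm{sgn}(\xi)$ and that conjugating with modulations shifts this symbol to $-i\,\mathrm{sgn}(\xi-a)$, the indicator of the interval $(a,b)$ is, away from the endpoints, equal to $\tfrac12\big(\mathrm{sgn}(\xi-a)-\mathrm{sgn}(\xi-b)\big)$. The snippet below only checks this scalar identity; the names `sgn` and `cutoff_symbol` are hypothetical and the endpoints `a`, `b` are sample values.

```python
def sgn(t):
    # Sign function with sgn(0) = 0
    return (t > 0) - (t < 0)

def cutoff_symbol(xi, a, b):
    # Candidate expression for the indicator of (a, b) in terms of shifted signs
    return 0.5 * (sgn(xi - a) - sgn(xi - b))

a, b = -1.0, 2.0
inside = [cutoff_symbol(xi, a, b) for xi in (-0.5, 0.0, 1.9)]
outside = [cutoff_symbol(xi, a, b) for xi in (-3.0, -1.5, 2.5, 10.0)]
endpoints = [cutoff_symbol(a, a, b), cutoff_symbol(b, a, b)]
```

The expression equals $1$ inside the interval and $0$ outside it. At the two endpoints it takes the value $\tfrac12$, which is irrelevant for multiplier operators since the endpoints form a set of measure zero. Each shifted sign is a bounded multiplier with norm independent of the shift, which is why the resulting bound does not depend on the interval.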

Now for and define the partial Fourier integral operator

Observe that these integrals are the -means of the integral . We have seen that the Gauss-Weierstrass or Abel means of this integral converge to , both almost everywhere as well as in the sense. However the function is much rougher. We still have the following theorem as a consequence of the bound for the Hilbert transform.

Theorem 9 For the operator has a unique extension to a bounded linear operator on for .

However, the boundedness of controls the convergence of the partial Fourier integrals.

Lemma 10 The partial Fourier integrals converge to in the norm for if and only if is of strong type uniformly in .

Now Theorem 9 and Lemma 10 immediately imply:

Corollary 11 For the partial Fourier integrals converge to in the norm.

The question whether converges to almost everywhere is much harder. For the answer is positive and this is the content of the famous Carleson-Hunt theorem. This theorem was first proved by Carleson for and then extended to by Hunt. A counterexample by Kolmogorov shows that both the and the almost everywhere convergence of the partial Fourier integrals fail for .

Exercise 7 Show that extends to an operator of weak type on and that the partial Fourier integrals converge to *in measure* for . Conclude that for almost every there is a subsequence such that .

*[Update 15th May 2011: Equation (7) moved to the right place, Exercise 1 slightly changed.]*

This week we will be discussing the Hardy-Littlewood maximal function and some closely related maximal type operators. In order to have something concrete let us first of all define the averages of a locally integrable function around the point :

where is the Euclidean ball with center and radius and denotes its Lebesgue measure. Note that since Lebesgue measure is translation invariant we have

where denotes the Lebesgue measure (or volume in this case) of the -dimensional unit ball . Denoting by the indicator function of the normalized unit ball

and noting that the balls centered at zero are -symmetric, we can write

Thus

and of course is an approximation to the identity since and are just the *dilations* of the function :

Remembering the discussion that followed the definition of the convolution in Notes 2, the convolution of a locally integrable function with the dilations of an function was viewed as an averaging process. We see now that when this is exact, that is, is the average of with respect to a ball around of radius where the implied constant only depends on the dimension. A similar conclusion follows if we start with any set that is say -symmetric and convex and normalized to volume . We then have that

that is, are the averages of with respect to the dilations of the fixed convex body at every point . Here we denote by the dilations of

It is an easy exercise to show that all these averages are uniformly bounded in size. For all we have

One of course could consider more general sets instead of convex sets which are -symmetric and in fact this leads to one of the most interesting family of problems in harmonic analysis. This however falls outside the scope of this course and we will mostly focus on the case of the normalized unit ball which in some sense is the prototypical example.

The Hardy-Littlewood maximal operator (with respect to Euclidean balls) is defined as

$$Mf(x) = \sup_{r>0} \frac{1}{|B(x,r)|}\int_{B(x,r)} |f(y)|\,dy.$$

Observe that this is a sublinear operator that is well defined at least when is locally integrable. Although maximal operators are interesting in their own right, there are some very specific applications we have in mind. The first has to do with pointwise convergence of averages of a function and is a consequence of the following simple proposition.

Proposition 1 Let be a family of sub-linear operators on and define the maximal operator

If is of weak type then for any the set

is closed in

*Proof:* In order to show that the set

is closed, consider a sequence of functions with in . We need to show that . To see this observe that for almost every we have

Thus for any we can write

Since the right hand side tends to as and the left hand side does not depend on we conclude that for every

Now we have that

Thus for almost every so that .

Remark 1 We have indexed the family in for the sake of definiteness, but one can of course consider more general index sets and the previous proposition remains valid. Whenever the index set is uncountable, some care is needed to ensure the measurability of .

Remark 2 To get a clearer picture of what this proposition says consider the family of operators

for some with integral . As we have seen already many times, these averages of converge to in many different senses for different classes of functions . In particular if then converges to even uniformly as . Thus we have

Since is dense in , Proposition 1 implies that if is of weak type then

for almost every . Thus in order to show that approximations to the identity converge to the function almost everywhere it is enough to show that the corresponding maximal operator is of weak type . In what follows we will show that the Hardy-Littlewood maximal operator is of weak type and this already implies the corresponding statement for a wide class of `nice’ approximations to the identity.

To avoid confusion, remember that in Theorem 15 of Notes 3 we have already exhibited that

for every Lebesgue point of . However this is only interesting if we already know that has `many’ Lebesgue points (in particular almost every point in ). In Theorem 15 of Notes 3 we took for granted that the integral of a locally integrable function is almost everywhere differentiable and this in turn implied that almost every point in is a Lebesgue point of . In this part of the course we will fill in this gap by showing that the integral of a locally integrable function is almost everywhere differentiable.

Exercise 1 Let be of weak type . Show that for every the set

is closed in .

Hint: The proof is very similar to that of Proposition 1. Observe that it suffices to show that

for every .

**2. The Hardy-Littlewood maximal theorem **

We focus our attention on the Hardy-Littlewood maximal operator; for

The discussion in the previous section suggests that one should try to prove weak bounds for the operator . In fact we will prove the following theorem which summarizes the boundedness properties of .

Theorem 2 (Hardy-Littlewood maximal theorem) (i) The Hardy-Littlewood maximal operator is of strong type for :

for all and .

(ii) The Hardy-Littlewood maximal operator is of weak type :

for all .

Remark 3 The Hardy-Littlewood maximal operator is *not* of strong type . To see this, note that for any we have that

which shows in particular that is never integrable whenever is not identically . Moreover, no strong estimates of type are possible whenever as can be seen by examining the dilations of and .
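A one-dimensional example makes the failure of integrability concrete. For the indicator function of $[-1,1]$ and $|x|\geq 1$, the centered averages over $(x-r,x+r)$ are maximized at $r=|x|+1$, which gives the value $1/(|x|+1)$ for the maximal function; this decays like $1/|x|$ and is therefore not integrable at infinity. The sketch below compares this closed form with a brute-force supremum over a grid of radii. The grid and the sample points are illustrative choices.

```python
def average_on_ball(x, r):
    # Average of the indicator of [-1, 1] over the interval (x - r, x + r)
    overlap = max(0.0, min(x + r, 1.0) - max(x - r, -1.0))
    return overlap / (2.0 * r)

def maximal_indicator(x, radii):
    # Brute-force approximation of the centered maximal function at x
    return max(average_on_ball(x, r) for r in radii)

radii = [0.01 * k for k in range(1, 20001)]   # radii in (0, 200]
sample_points = [2.0, 5.0, 10.0, 50.0]
computed = [maximal_indicator(x, radii) for x in sample_points]
predicted = [1.0 / (x + 1.0) for x in sample_points]
```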

Exercise 2 Prove the assertions in the previous remark.

Exercise 3 Let and let be a ball such that for every . Let be the ball with the same center and twice the radius of . Show that for every .

*Proof of Theorem 2:* First of all let us observe that is of strong type . This is just a consequence of the general fact that an average never exceeds a `maximum’. In view of the Marcinkiewicz interpolation theorem it then suffices to show the assertion (ii) of the theorem, namely that is of weak type . Furthermore, by homogeneity, it suffices to show that

We now fix some and set

and let be any compact subset of and our task is to obtain an estimate of the form , uniformly in .

For every there is a ball (of some radius) such that

The family clearly covers the compact set so we can extract a finite subcollection of balls which we denote by which still cover . Since we get that

Observe on the other hand that

so if we manage to show that

we would be done. The main obstruction to such an estimate is that the balls may overlap a lot. On the other hand, if the balls were disjoint (or `almost’ disjoint) then there would be no problem. Although we can’t directly claim that the family is non-overlapping, the following lemma will allow us to extract a subcollection of balls which has this property, without losing too much of the measure of the union of balls in the collection.

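Lemma 3 (Vitali-type covering lemma) Let $B_1,\ldots,B_N$ be a finite collection of balls in $\mathbb{R}^n$. Then there exists a subcollection $B_{k_1},\ldots,B_{k_M}$ of *disjoint balls* such that

$$\Big|\bigcup_{j=1}^{M} B_{k_j}\Big| \geq 3^{-n}\,\Big|\bigcup_{k=1}^{N} B_k\Big|.$$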

Before giving the proof of this covering lemma let us see how we can use it to conclude the proof of Theorem 2. Recall that we have extracted a finite collection of balls which cover the set and which satisfy

Now applying the covering lemma we can extract a subcollection of disjoint balls so that the measure of their union exceeds a multiple of the measure of the union of the original family of balls. Thus, we can write

Observe that this estimate is uniform over all compact sets so taking the supremum over such sets and using the inner regularity of the Lebesgue measure we conclude that

which concludes the proof.

*Proof of the Covering Lemma 3:* First of all let us assume that the balls are arranged in decreasing order of size (thus is the largest ball). We will choose the subcollection by the *greedy algorithm*. The first ball we choose in the subcollection is the largest ball, thus . Now assume we have chosen the balls for some . We choose the ball to be the largest ball which doesn’t intersect any of the balls already chosen. Observe that this amounts to choosing

We continue this process until we run out of balls. It is clear that the resulting subcollection consists of disjoint balls. On the other hand, every ball of the original collection is either selected or it intersects one of the selected balls, say, in the subcollection of greater or equal radius (otherwise the ball would be selected). Then it is not hard to see that

where is the ball with the same center as and three times its radius. Thus we have that

Taking the Lebesgue measure of both unions we conclude

and we are done.
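The greedy selection in the proof is easy to run by hand or by machine. Here is a minimal sketch in dimension one, where an open ball is an interval described by a hypothetical pair `(center, radius)`; the sample family `balls` is made up for illustration.

```python
def greedy_disjoint_balls(balls):
    # balls: list of (center, radius) describing open intervals in R.
    # Go through the balls in decreasing order of radius and keep a ball
    # whenever it is disjoint from all balls kept so far.
    ordered = sorted(balls, key=lambda ball: ball[1], reverse=True)
    chosen = []
    for center, radius in ordered:
        if all(abs(center - c) >= radius + r for c, r in chosen):
            chosen.append((center, radius))
    return chosen

def union_length(balls):
    # Lebesgue measure of the union of the intervals (c - r, c + r)
    intervals = sorted((c - r, c + r) for c, r in balls)
    total = 0.0
    current_lo, current_hi = None, None
    for lo, hi in intervals:
        if current_hi is None or lo > current_hi:
            if current_hi is not None:
                total += current_hi - current_lo
            current_lo, current_hi = lo, hi
        else:
            current_hi = max(current_hi, hi)
    if current_hi is not None:
        total += current_hi - current_lo
    return total

balls = [(0.0, 1.0), (1.5, 1.0), (0.5, 0.25), (4.0, 0.5), (4.3, 0.4), (-2.0, 0.75)]
chosen = greedy_disjoint_balls(balls)
chosen_length = sum(2 * r for _, r in chosen)
```

Every discarded ball meets a selected ball of at least the same radius, so it sits inside the threefold dilate of that ball. In dimension one this gives that three times the total length of the selected balls dominates the measure of the union of the whole family.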

Exercise 4 (The maximal function on the class ) We saw that if is a non-trivial integrable function then is never integrable. Suppose however that is supported in a finite ball and that it is a `bit better’ than being integrable, namely it satisfies

where . We say in this case that . Then we have that and

Hints: (a) For show that

It will help you to split the function as

and observe that .

(b) Show that

From this, (a) and Fubini’s theorem you can conclude the proof.

**3. Consequences of the maximal theorem **

Our first application of the maximal theorem has to do with the differentiability of the integral of a locally integrable function. Indeed, using Theorem 2 and Proposition 1 we immediately get the following.

Corollary 4 (Lebesgue differentiation theorem) Let be a locally integrable function. Then, for almost every we have that

For the proof just observe that and that the claimed convergence property is a local property thus one can confine any locally integrable function in a ball around the point which turns into an function. As we have already seen in Notes 3, the previous statement also implies the following:

Corollary 5 Let . Then almost every point in is a *Lebesgue point* of , that is, we have that

for almost every .

Lebesgue’s differentiation theorem generalizes to more general averages. A manifestation of this is already presented in Theorem 15 of Notes 2 which asserts that for `nice’ approximations to the identity , the means converge to at every Lebesgue point of . Here we will give an alternative proof of this theorem by controlling the maximal operator by the Hardy-Littlewood maximal function.

Proposition 6 Let be a positive and radially decreasing function with . Then we have that

*Proof:* First suppose that is of the form where and are Euclidean balls centered at for all . Then we have

However, any function which is positive and radially decreasing can be approximated monotonically from below by a sequence of simple functions of the form so we are done.

As an immediate corollary we get the same control for approximations to the identity which are controlled by positive radially decreasing functions.

Corollary 7 Let almost everywhere where is positive, radially decreasing and integrable. Then we have that

In particular is of weak type and strong type for all . We conclude that

for almost every .

Remark 4 The qualitative conclusion of the previous corollaries is that maximal averages of with radially decreasing integrable kernels are controlled by the Hardy-Littlewood maximal function. A typical radially decreasing integrable kernel is the Gaussian kernel

By dilating by we get

The function can be viewed as smooth approximation of the indicator function of a ball of radius (up to constants). Indeed, for say, we have that , while for the function decays very fast. Thus the kernel is not so different from .

** 3.1. Points of density and the Marcinkiewicz Integral **

A direct consequence of Lebesgue’s differentiation theorem is that almost every point of a measurable set is `completely’ surrounded by other points of the set. To make this precise, let us give a definition.

Definition 8 Let be a measurable set in and let . We say that is a *point of density* of the set , if

Of course the limit in the previous definition might not exist in general or not be equal to . Observe however that if the previous limit is equal to then is a point of density of the set the complement of . On the other hand, applying Lebesgue’s differentiation theorem to the function which is obviously locally integrable we get

for almost every . Thus we immediately get the following

Proposition 9 Let be a measurable set. Then almost every point of is a point of density of . Likewise, almost every point is a point of density of .

Thus a point of density is in a measure theoretic sense completely surrounded by other points of . The measure of the set in the ball is proportional to the measure of the ball as and is a point of density.

Another way to describe this notion is the following. Let be a closed set and define . Of course if . Now think of in a neighborhood of zero so that the vector is in a neighborhood of . If then the distance of the point from is at most since and . Thus we have that whenever . That is, when the point approaches , the distance , that is the distance of from , approaches zero. In fact the estimate above can be improved.

Proposition 10 Let be a closed set. Then for almost every , as . This is true in particular if is a point of density of the set .

Exercise 5 Prove Proposition 10 above. The is interpreted as follows: For every there exists some such that whenever .

We will be mostly interested in another instance of this principle that is reflected in the Marcinkiewicz integral. This will also come in handy in our study of oscillatory integrals in the next chapter.

For a closed set as before we define the *Marcinkiewicz integral associated to *, , as

Remark 5 The previous theorem shows that, on average, is small enough whenever to make the integral converge locally. This can be seen as a variation of Proposition 10 though no direct quantitative connection is claimed.

Part (i) is obvious and is left as an exercise. For (ii) it will be enough to show the following:

Lemma 12 Let be a closed set whose complement has finite measure. We set

Then for almost every . In particular we have

*Proof:* It is enough to show

since then is finite for almost every . To that end we write

Now fix a . As we obviously have that thus . Since all the quantities under the integral signs are positive the previous estimate implies

whenever . Integrating for we get

To get the proof of Theorem 11 we now use the previous lemma as follows. Let be a closed set and let be a ball of radius centered at . Let . Then is closed and so that . Thus the previous lemma applies to and we get that

for almost every where we denote by the distance from the set . Now observe that for and we have that ; indeed and thus . We conclude that

for almost every . Since every eventually belongs to some for some large we get the conclusion of the theorem.

Exercise 6 (i) Show the following strengthened form of Lemma 12: for and locally integrable we have

whenever is closed and .

(ii) Use (i) and the maximal theorem to conclude that for all .

**4. The dyadic maximal function **

We now come to a different approach to the maximal function theorem. On the one hand the `dyadic’ approach we will follow here already implies the maximal theorem presented in the previous paragraph. It is however interesting in its own right and it will give us the chance to present a dyadic structure on the Euclidean space which will come in handy in many different cases.

Consider the basic cube . A dyadic dilation of this cube is the cube where . Now we also consider integer translations of this cube of the form for some integer vector . We have the following definition:

Definition 13 A dyadic cube of *generation* is a cube of the form

where and . The family of disjoint cubes

defines the -th generation of dyadic cubes.

The dyadic cubes have the following basic properties.

(d1) The dyadic cubes in the generation are disjoint and their union is . Thus any point belongs to a unique dyadic cube in the -th generation.

(d2) Two (different) dyadic cubes are either disjoint or one contains the other.

(d3) A dyadic cube in consists of exactly dyadic cubes of the generation . On the other hand, for any dyadic cube and any there is a unique dyadic cube in the generation that contains .

As a first instance of how things simplify and get sharper in the dyadic world, let us see the analogue of the Vitali covering lemma in the dyadic case.

Lemma 14 (Dyadic Vitali-type covering lemma) Let $Q_1,\ldots,Q_N$ be a finite collection of dyadic cubes. There exists a subcollection $Q_{k_1},\ldots,Q_{k_M}$ of disjoint dyadic cubes such that

$$\bigcup_{j=1}^{M} Q_{k_j} = \bigcup_{k=1}^{N} Q_k.$$

*Proof:* Let be the maximal cubes among , that is, the cubes that are not contained in any other cube of the collection . Then the cubes are disjoint (otherwise they wouldn’t be maximal). Also any cube that is not maximal is contained in the union .
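The selection of maximal cubes can be sketched in a few lines in dimension one. The encoding below is an assumption made for this example only and need not match the indexing of generations used in Definition 13: a pair `(j, m)` of integers stands for the dyadic interval $[m2^{j},(m+1)2^{j})$. The names `contains` and `maximal_cubes` are hypothetical.

```python
def contains(big, small):
    # Does the dyadic interval `big` contain the dyadic interval `small`?
    # An interval (j, m) stands for [m * 2**j, (m + 1) * 2**j).
    (jb, mb), (js, ms) = big, small
    return js <= jb and ms // (2 ** (jb - js)) == mb

def maximal_cubes(cubes):
    # Keep the cubes that are not contained in any other cube of the collection
    distinct = set(cubes)
    return sorted(q for q in distinct
                  if not any(p != q and contains(p, q) for p in distinct))

cubes = [(0, 0), (-1, 0), (-1, 1), (-2, 5), (1, 3), (0, 6), (-1, 4)]
selected = maximal_cubes(cubes)
```

By property (d2) two maximal cubes cannot overlap, and every cube of the collection lies inside one of the maximal ones, so the union is preserved exactly. This is the sense in which the dyadic covering lemma is sharper than its counterpart for balls.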

Given a function and we set

Observe that given there is a unique cube that contains and then the value of at equals the average of the function over the cube . In fact, is the *conditional expectation* of with respect to the -algebra generated by the family . Observe that for every generation , if is a union of cubes in then

The operator is the discrete dyadic analogue of an approximation to the identity dilated at level . A difference however is that the averages here are not `centered’. Indeed, is the average of with respect to the cube whenever for some . However, is not the `center’ of the cube .
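To make the averaging operator concrete, here is a minimal discrete sketch in dimension one. It assumes that the function lives on `[0,1)` and is constant on each of the `2**N` intervals of the finest grid, so that it is described by a list of samples, and that generation `k` (with `0 <= k <= N`) means intervals of length `2**(-k)`. The name `dyadic_average` is illustrative.

```python
from fractions import Fraction


def dyadic_average(samples, k):
    """Replace each sample by the average over its dyadic interval of generation k.

    `samples` has length 2^N and represents a function on [0,1) which is
    constant on the intervals of length 2^{-N}; we assume 0 <= k <= N.
    """
    n = len(samples)
    N = n.bit_length() - 1
    assert n == 1 << N and 0 <= k <= N
    block = 1 << (N - k)  # number of samples in one interval of generation k
    out = []
    for start in range(0, n, block):
        chunk = [Fraction(s) for s in samples[start:start + block]]
        out.extend([sum(chunk) / block] * block)
    return out
```

In this model the integral over any union of intervals of generation `k` is unchanged by the averaging, and averaging at a fine generation followed by a coarser one is the same as averaging at the coarser one directly.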

The *dyadic maximal function* is defined as

Thus the supremum is taken over all dyadic cubes that contain or, equivalently, over all generations of dyadic cubes. We have the analogue of the maximal theorem:

Theorem 15 (Dyadic Maximal Theorem) (i) The dyadic maximal function is of weak type with weak type norm at most :

for all .

(ii) The dyadic maximal function is of strong type , for all ; for all we have

where the implied constant depends only on .

We conclude using Proposition 1 that

(iii) For every we have that

Exercise 7 Give the proof of Theorem 15 above. Observe that the proof is essentially identical to that of Theorem 2 using the dyadic version of the Vitali covering Lemma instead of the non-dyadic one. For (iii) you need to observe that the statement is true for continuous functions (for example) and use Proposition 1.
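Before proving the theorem, it can be instructive to test the weak type bound numerically. The sketch below works in dimension one with a function supported on `[0,1)` and constant on a grid of `2**N` intervals. For such a function only the dyadic intervals inside `[0,1)` matter on `[0,1)`, since larger dyadic intervals give smaller averages. The check is restricted to level sets inside `[0,1)` and the weak type constant is taken to be `1`. Both function names are hypothetical.

```python
from fractions import Fraction


def dyadic_maximal(samples):
    """Dyadic maximal function on [0,1) of a grid function with 2^N samples."""
    n = len(samples)
    N = n.bit_length() - 1
    assert n == 1 << N
    abs_vals = [abs(Fraction(s)) for s in samples]
    best = [Fraction(0)] * n
    for k in range(N + 1):  # generations 0, ..., N
        block = 1 << (N - k)
        for start in range(0, n, block):
            avg = sum(abs_vals[start:start + block]) / block
            for i in range(start, start + block):
                if avg > best[i]:
                    best[i] = avg
    return best


def weak_type_holds(samples, lam):
    """Check |{x in [0,1): M f(x) > lam}| <= ||f||_1 / lam for the grid function."""
    n = len(samples)
    maximal = dyadic_maximal(samples)
    level_set_measure = Fraction(sum(1 for v in maximal if v > lam), n)
    l1_norm = sum(abs(Fraction(s)) for s in samples) / n
    return level_set_measure <= l1_norm / lam
```

Of course a numerical check on examples is no substitute for the covering argument; it only illustrates what the inequality says.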

Exercise 8 (The maximal function with respect to cubes) Let denote the maximal function with respect to cubes, that is,

where is the indicator function of the cube . Show that

where the implied constants depend only on the dimension .

Exercise 9 Show the pointwise estimate

where the implied constant depends only on the dimension . On the other hand, show that the opposite estimate cannot be true. For example when test against the function . Conclude that the dyadic maximal theorem follows from the non-dyadic one (with a different constant though). Hint: Observe that if and is a dyadic cube, there exists a ball which contains and .

Exercise 10 Consider the *non-centered maximal function* with respect to cubes, or balls

where the supremum is taken over all Euclidean balls containing . Likewise

where the supremum is taken over all cubes (with sides parallel to the coordinate axes) that contain . Show that and are all pointwise equivalent, that is

**5. The Calderón-Zygmund decomposition **

Let be a measure space and be a measurable function (say) in . For a level we have often used the decomposition of at level :

The function is the `good’ part of ; indeed we have that

Thus the good part inherits the -integrability of and furthermore it is bounded. On the other hand the `bad’ part satisfies

Thus the bad part also inherits the -integrability of but it also has `small’ support.
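For concreteness, here is one standard way of writing this elementary decomposition together with the estimates just described. The symbols $g$, $b$, $\lambda$ and $\mu$ are the notation assumed for this sketch, and we take $f\in L^1(X,\mu)$.

```latex
\begin{align*}
  f &= g + b, \qquad
  g := f\,\mathbf{1}_{\{|f|\le \lambda\}}, \qquad
  b := f\,\mathbf{1}_{\{|f|> \lambda\}},\\
  \|g\|_{L^1} &\le \|f\|_{L^1}, \qquad \|g\|_{L^\infty}\le \lambda,\\
  \|b\|_{L^1} &\le \|f\|_{L^1}, \qquad
  \mu(\{b\neq 0\}) = \mu(\{|f|>\lambda\}) \le \frac{\|f\|_{L^1}}{\lambda}.
\end{align*}
```

The last inequality is just Chebyshev's inequality.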

In a general measure space one cannot do much more than that in terms of decomposing into a good part and a bad part. If however there is also a metric structure in the space which is compatible with the measure, one can do a bit better and also get some control on the local oscillation of the bad part . Various forms of this decomposition are usually referred to as Calderón-Zygmund decompositions. We present here the basic example in the dyadic Euclidean setup.

Proposition 16 (Dyadic Calderón-Zygmund decomposition) Let and . There exists a decomposition of of the form

where is a collection of disjoint dyadic cubes and the sum is taken over all the cubes . This decomposition satisfies the following properties:

(i) The `good part’ satisfies the bound

(ii) The `bad part’ is ; each function is supported on and

(iii) For each we have

Furthermore we have that

In particular, from the dyadic maximal theorem we have

*Proof:* The proof is very similar to the proof of the dyadic covering lemma. We fix some level and let us call a dyadic cube *bad* if

If a dyadic cube is not bad we call it *good*. A bad cube will be called *maximal* if no dyadic cube strictly containing it is bad. Let us denote by the collection of maximal bad cubes. Since the cubes in the collection are dyadic and maximal, they are disjoint. Also, for any bad cube , let . We have that

Also, since , every bad cube is contained in some maximal bad cube. Indeed, if is a bad cube then as so monotone convergence implies that . It follows that there is a large enough such that

for all . Thus the dyadic cube is maximal and bad.

Now let be a maximal bad cube and consider the parent of , , that is the unique dyadic cube with double the side-length that contains . Since is maximal, has to be good so we have

and thus

for all maximal bad cubes . We set

whenever is a maximal bad cube. We also set

It is not hard to verify all the required properties of except maybe that . It is easy to see that

whenever is a bad cube. If and , then necessarily is good. We thus have that

since is good. Now, by the dyadic maximal theorem, we have that as with . Since we conclude that and we are done in this case as well.
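The stopping-time argument in the proof translates directly into a short algorithm. The following is a sketch in dimension one, under assumptions that are ours and not part of the proposition: the function is supported on `[0,1)` and constant on a grid of `2**N` intervals, and the level `lam` is at least the average of `|f|` over `[0,1)`, so that `[0,1)` itself is a good cube (on the whole space this is automatic for large cubes since the function is integrable). Cubes are encoded as `(start, length)` in grid units.

```python
from fractions import Fraction


def calderon_zygmund(samples, lam):
    """Dyadic Calderon-Zygmund decomposition of a grid function on [0,1).

    Returns (cubes, g, b): the maximal bad cubes as (start, length) pairs in
    grid units, the good part g and the bad part b, with f = g + b.
    """
    n = len(samples)
    N = n.bit_length() - 1
    assert n == 1 << N
    f = [Fraction(s) for s in samples]
    lam = Fraction(lam)
    assert sum(abs(v) for v in f) / n <= lam, "the cube [0,1) must be good"

    cubes = []
    stack = [(0, n)]  # search from the top, so the first bad cube met is maximal
    while stack:
        start, length = stack.pop()
        average = sum(abs(v) for v in f[start:start + length]) / length
        if average > lam:  # bad cube whose ancestors are all good
            cubes.append((start, length))
        elif length > 1:  # good cube: look at its two children
            half = length // 2
            stack.append((start, half))
            stack.append((start + half, half))

    g = list(f)
    for start, length in cubes:
        mean = sum(f[start:start + length]) / length
        g[start:start + length] = [mean] * length  # g is the average of f on each cube
    b = [fv - gv for fv, gv in zip(f, g)]  # mean zero on each selected cube
    return sorted(cubes), g, b
```

In this one-dimensional model the good part is bounded by `2 * lam`, since the parent of a maximal bad cube is good and has twice its length; in dimension `n` the factor becomes `2**n`.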

Observe that in the previous decomposition of , the `bad set’, that is the set where lives, is given in the form

One could prove the Calderón-Zygmund decomposition starting from the set and decomposing it as a union of disjoint dyadic cubes. This sort of decomposition is interesting in its own right. Let us see how this can be done.

Proposition 17 (Dyadic Whitney decomposition) Let be an open set which is not all of . Then there exists a decomposition

where is a collection of disjoint dyadic cubes. For each we have

*Proof:* Let denote the dyadic cubes inside such that

Obviously but the opposite inclusion is also true. Indeed, if note that is contained in some dyadic cube since is open. Now for a dyadic cube let be its `parent’, that is the unique dyadic cube of side twice the side-length of , containing . Considering successive parents of there will be a dyadic cube containing with diameter greater than and less than . Thus and . The collection of dyadic cubes is not necessarily disjoint so we only choose the cubes in which are maximal with respect to set inclusion and call this collection again . Now maximal and dyadic means disjoint so we are done.
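The construction in the proof can be sketched in dimension one as follows. The assumptions here are ours: the open set is an interval `(a, b)` inside `[0,1]`, a dyadic interval is selected when its length is at most its distance to the complement of `(a, b)`, and the search stops at a finite depth `max_depth`, so a small neighborhood of the endpoints is left uncovered. The exact constants in the proposition may be normalized differently.

```python
from fractions import Fraction


def whitney_intervals(a, b, max_depth=12):
    """Maximal dyadic intervals Q in (a, b) with length(Q) <= dist(Q, complement).

    Returns a sorted list of (left_endpoint, length) pairs with exact fractions.
    Assumes 0 <= a < b <= 1 and searches generations 0, ..., max_depth only.
    """
    a, b = Fraction(a), Fraction(b)
    assert 0 <= a < b <= 1
    selected = []
    stack = [(Fraction(0), Fraction(1), 0)]  # start from the interval [0,1)
    while stack:
        left, length, depth = stack.pop()
        right = left + length
        if right <= a or left >= b:
            continue  # this dyadic interval does not meet (a, b)
        dist = min(left - a, b - right)  # negative if Q is not inside (a, b)
        if dist >= length:
            selected.append((left, length))  # all ancestors failed, so Q is maximal
        elif depth < max_depth:
            half = length / 2
            stack.append((left, half, depth + 1))
            stack.append((left + half, half, depth + 1))
    return sorted(selected)
```

Searching from the top and stopping at the first admissible interval is the same as passing to the maximal elements in the proof, so the selected intervals are automatically disjoint, and the fact that the parent of a selected interval was rejected bounds the distance to the complement from above by a fixed multiple of the length.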

Using the Whitney decomposition lemma one can give an alternative proof of the Calderón-Zygmund decomposition by taking

and noting that the latter set is open.

As a corollary we get a control of the level sets of the usual (non-dyadic) maximal function by the level sets of the dyadic maximal function.

Corollary 18 For all we have that

*Proof:* Let be the collection of dyadic cubes obtained by the Calderón-Zygmund decomposition at level . We have that

We write for the cube with the same center as and twice its side-length.

Indeed, let and be any cube centered at . Denoting by the side-length of , we choose so that . Then intersects cubes in the -th generation , and let us call them . Observe that none of these cubes can be contained in any of the because otherwise we would have that . Thus the average of on each is at most so

This proves the claim (2) and thus the corollary.

Exercise 11 Using the dyadic maximal theorem only, conclude that the operators are of weak type .

** 5.1. The Fefferman-Stein weighted inequality. **

We give a first application of the Calderón-Zygmund decomposition which in some sense is the prototype of a weighted norm inequality. It is a variation of the maximal theorem where the Lebesgue measure is replaced by a measure of the form for some non-negative measurable function . It then turns out that the maximal function maps to boundedly for all and that it also satisfies a weak endpoint analogue for . In particular we have

Theorem 19 (Fefferman-Stein inequality) Let be a non-negative locally integrable function (a `weight’).

(i) We have that

for all with .

(ii) In the endpoint case we get the weak analogue

for all .

*Proof:* We will show that and that the weak inequality in (ii) holds. Then the Marcinkiewicz interpolation theorem will give (i) as well.

The bound

is trivial and is left as an exercise. We turn our attention to the -bound. Let be the collection of the dyadic cubes obtained from the Calderón-Zygmund decomposition at level . By the proof of Corollary 18 we have that

where is the cube with the same center as and twice its side-length. We have

Again, from the Calderón-Zygmund decomposition (at level ) we have that

for all of the decomposition. Combining the last two estimates we can write

For fixed the term is non-zero if and only if . Thus the previous estimate implies

where is the non-centered maximal function associated to cubes; see Exercise 10. Since this concludes the proof.

Exercise 12 (Hedberg’s inequality and Hardy-Littlewood-Sobolev theorem) Let , and .

(i) Show Hedberg’s inequality: If then

(ii) Use the Hardy-Littlewood maximal theorem and (i) to conclude the Hardy-Littlewood-Sobolev theorem: For every we have that

Hint: In order to show (i) split the integral

where is a parameter to be chosen later on. For observe that

Observe that is decreasing, radial, non-negative and integrable (since ). Use Proposition 6 and the calculation in its proof to show the bound

For use Hölder’s inequality to show

Choose the parameter to minimize the sum . Part (ii) is a trivial consequence of (i).

*[Update 4 Apr 2011: Section 3.1 concerning the Marcinkiewicz integral added; numbering changed.*

*Update 9th May 2011: Typo in the hint of Exercise 1 corrected.]*

**1. The space of Schwartz functions as a Fréchet space **

We recall that the space of Schwartz functions consists of all smooth (i.e. infinitely differentiable) functions such that the function itself together with all its derivatives decay faster than any polynomial at infinity. To make this more precise it is useful to introduce the *seminorms* defined for any non-negative integer as

where are multi-indices and as usual we write . Thus if and only if and for .

It is clear that is a vector space. We have already seen that a basic example of a function in is the Gaussian and it is not hard to check that the more general Gaussian function , where is a positive definite real matrix, is also in . Furthermore, the product of two Schwartz functions is again a Schwartz function and the space is closed under taking partial derivatives or multiplying by complex polynomials of any degree. As we have already seen (and it’s obvious by the definitions) the space of infinitely differentiable functions with compact support is contained in , , and each one of these spaces is a dense subspace of for any and also in , in the corresponding topologies.

The seminorms defined above define a topology in . In order to study this topology we need the following definition:

Definition 1 A Fréchet space is a locally convex topological vector space whose topology is induced by a complete translation invariant metric.

**A translation invariant metric on .** It is not hard to actually define a metric on which induces the topology. Indeed for two functions we set

The function is translation invariant, symmetric and it separates the elements of . The metric induces a topology in ; a set is open if and only if there exists and such that

**Convergence in .** By definition, a sequence converges to if as . A more handy description of converging sequences in is given by the following lemma.

Lemma 2 A sequence converges to if and only if

for all .

*Proof:* First assume that as . Then, since

converges to zero as and all summands are non-negative, we conclude that for every we have that

as . However, this easily implies that as , for every .

Assume now that as for every and let . We choose a positive integer such that .

Thus,

Now, every term in the finite sum of the first summand converges to as and we get that as .
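The two halves of this proof can be seen in a small numerical sketch. We assume here that the metric has the usual form, a sum over `N >= 0` of `2**(-N) * rho_N / (1 + rho_N)`, where `rho_N` denotes the value of the `N`-th seminorm of the difference of the two functions; the functions below only manipulate lists of such seminorm values and their names are illustrative.

```python
def partial_distance(seminorm_values):
    """Sum of 2^{-N} rho_N / (1 + rho_N) over the given values rho_0, rho_1, ...

    Assumed form of the metric; the list holds finitely many seminorm values.
    """
    return sum(2.0 ** (-N) * r / (1.0 + r)
               for N, r in enumerate(seminorm_values))


def tail_bound(K):
    """Upper bound for the terms with index N > K: sum of 2^{-N} is 2^{-K}."""
    return 2.0 ** (-K)
```

Each term is at most `2**(-N)`, so the terms with index larger than `K` contribute at most `tail_bound(K)` no matter how large the seminorms are, while the first `K + 1` terms are small as soon as the first `K + 1` seminorms are small. This is exactly the splitting used in the second half of the proof.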

** is a topological vector space.** The topology induced by turns into a topological vector space. To see this we need to check that addition of elements in and multiplication by complex constants are continuous with respect to . This is very easy to check.

**Local convexity.** For and consider the family of sets

We claim that is a neighborhood basis of the point for the topology induced by . Indeed, the system defines a neighborhood basis of . On the other hand it is implicit in the proof of Lemma 2 that for every there is some and some such that . This proves the claim.

Now, in order to show that endowed with the topology induced by is locally convex it suffices (by translation invariance) to show that the point has a neighborhood basis which consists of convex sets. This is clear for the neighborhood basis defined above since the seminorms are positive homogeneous. Observe however that the balls are not convex.

Exercise 1 Show that the balls , , are *not* convex sets.

**Completeness.** The space is a complete topological vector space with the topology induced by . If is a Cauchy sequence in then for every , the sequence

is a Cauchy sequence in the space , with the topology induced by the supremum norm. Since this space is complete we conclude that converges uniformly to some . A standard uniform convergence argument shows now that .

Remark 1 In general, a sequence in a topological vector space is called a *Cauchy sequence* if for every open neighborhood of zero , there exists some positive integer so that for all . If the topology is induced by a translation invariant metric, this definition coincides with the more familiar one, that is: for every there exists such that whenever .

The discussion above gives the following:

Theorem 3 The space , endowed with the metric and the topology induced by , is a Fréchet space.

We now give a general lemma which provides a simple description of the continuity of linear operators acting on .

Lemma 4 (i) Let be a Banach space and be a linear operator. Then is continuous if and only if there exists and such that

for all .

(ii) Let be a linear operator. Then is continuous if and only if for each there exists and such that

for all .

*Proof:* For *(i)* it is clear that is continuous if (1) holds. On the other hand, suppose that is continuous and let be the open ball of center and radius in . Then is a neighborhood of in and hence it contains some . Thus implies that . Now we have that

Similarly, if is continuous then for every there is so that

This implies (2) using the same trick we used to deduce (1).

It is obvious that for every , . Let us show however that this embedding is also continuous:

Proposition 5 Let . Then the identity map is continuous, that is, there exists so that

for all .

*Proof:* Let . For and we have that

If observe that so there is nothing to prove.

**2. The Fourier transform on the Schwartz class **

Since there is no difficulty in defining the Fourier transform on by means of the formula

All the properties of that we have seen in the previous week’s notes are of course valid for the Fourier transform on . As we shall now see, there is much more we can say for the Fourier transform on .

For and every polynomial we have that . Using the commutation relations

we see that . Furthermore, since we can use the inversion formula to write

This shows that is onto and of course it is a one to one operator as we have already seen. Finally let us see that it is also a continuous map. To see this observe that

for every , by Proposition 5. However, so we get that

for every which shows that is continuous.

We have thus proved the following:

Theorem 6 The Fourier transform is a *homeomorphism* of onto itself. The operator

is the continuous inverse of on :

on .

We immediately get Plancherel’s identities:

Corollary 7 Let . We have that

In particular, for every we have that

*Proof:* The multiplication formula of the previous week’s notes reads

for and thus for . Now let and apply this formula to the functions where . Observing that we get the first of the identities in the corollary. Applying this identity to the functions and we also get the second.
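Plancherel's identity has an exact finite-dimensional analogue for the discrete Fourier transform, which is easy to test. The sketch below is only this discrete analogue, not the statement of the corollary: for vectors of length `n` and the unnormalized transform used here, the inner product of the transforms equals `n` times the inner product of the vectors. The helper names are ours.

```python
import cmath


def dft(x):
    """Unnormalized discrete Fourier transform of a finite sequence."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def inner(u, v):
    """Hermitian inner product: sum of u_j times the complex conjugate of v_j."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


def parseval_defect(x, y):
    """Absolute difference between <x, y> and <dft(x), dft(y)> / n."""
    n = len(x)
    return abs(inner(x, y) - inner(dft(x), dft(y)) / n)
```

Taking `y = x` gives the discrete version of the second identity, the equality of the two norms.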

We also get a nice proof of the fact that convolution of Schwartz functions is again a Schwartz function.

Corollary 8 Let . Then .

*Proof:* For we have that . Since we conclude that and thus that .

**3. The Fourier transform on **

We have already seen that the Fourier transform is defined for functions by means of the formula

While this integral converges absolutely for , this is not the case in general for . However, Corollary 7 says that the Fourier transform is a bounded linear operator on which is a dense subset of and in fact we have that

for every . As we have seen several times already, this means that the Fourier transform has a unique bounded extension, which we will still denote by , throughout . In fact the Fourier transform is an *isometry* on as identity (3) shows.

Definition 9 A linear operator which is an isometry and maps onto is called a *unitary operator*.

Corollary 10 The Fourier transform is a unitary operator on .

The definition of the Fourier transform on given above suggests that given , one should find a sequence such that in and define

This, however, is a bit too abstract. The following lemma gives us an alternative way to calculate the Fourier transform on .

Lemma 11 Let . The following formulas are valid

where the notation above means that the limits are considered in the norm.

*Proof:* Given let us define the functions

Then on the one hand we have that in . On the other hand the functions belong to for all so we can write

Since the Fourier transform is an isometry on we also have that as in . The proof of the second formula is similar.

**4. The Fourier transform on and Hausdorff-Young **

Having defined the Fourier transform on and on we are now in position to interpolate between these two spaces. Indeed, we have established that

and that is of strong type and of strong type both with norm . We have also seen that it is well defined on the simple functions with finite measure support and on the Schwartz class, both dense subsets of all spaces for . Setting we get where is the dual exponent of . This shows that . The Riesz-Thorin interpolation theorem now applies to show the following:

Theorem 12 (Hausdorff-Young Theorem) For the Fourier transform extends to a bounded linear operator

of norm at most , that is we have

Remark 2 This is one instance where the Riesz-Thorin interpolation theorem fails to give the sharp norm, although the endpoint norms are sharp. Indeed, the actual norm of the Fourier transform is

This is a deep theorem that was first proved by K.I. Babenko in the special case where is an even integer and then by W. Beckner in the general case.
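To get a feeling for how much smaller than `1` the sharp norm is, one can simply evaluate it. The sketch below assumes the usual form of the Babenko-Beckner constant, namely `(p**(1/p) / q**(1/q)) ** (n/2)` with `q` the dual exponent of `p` and `n` the dimension, valid for the normalization of the Fourier transform with `2*pi` in the exponent. The function names are illustrative.

```python
import math


def dual_exponent(p):
    """The exponent q with 1/p + 1/q = 1, for p >= 1."""
    return math.inf if p == 1 else p / (p - 1)


def babenko_beckner_constant(p, n=1):
    """Assumed sharp Hausdorff-Young constant (p^{1/p} / q^{1/q})^{n/2}, 1 <= p <= 2."""
    assert 1 <= p <= 2
    q = dual_exponent(p)
    # q^{1/q} tends to 1 as q tends to infinity, which covers the case p = 1.
    q_term = 1.0 if math.isinf(q) else q ** (1.0 / q)
    return (p ** (1.0 / p) / q_term) ** (n / 2.0)
```

At the endpoints `p = 1` and `p = 2` the value is `1`, in agreement with the endpoint norms, while for `p` strictly between them it is strictly smaller than `1` and decays exponentially with the dimension.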

Exercise 2 Let be a general Gaussian function of the form

for some positive definite real matrix . Show that

Observe that this gives a lower bound on the norm .

Hint: Write as a composition of translations, modulations and generalized dilations of the basic Gaussian function .

Remark 3 The inversion problem for , has a similar solution to the case. One can easily see that the means of converge to in as well as for every Lebesgue point of if is appropriately chosen. In particular this is the case for the Abel or Gauss means of .

We also have the following extension on the action of the Fourier transform on convolutions.

Proposition 13 Let and for some . Then, as we know, the function belongs to . We have that

for almost every .

We close this section by discussing the possibility of other mapping properties of the Fourier transform, besides the ones given by the Hausdorff-Young theorem. In particular we have seen that the Fourier transform is of strong type for all . But are there any other pairs for which the Fourier transform is of strong, or even weak type ?

The easiest thing to see is that whenever is of type we must have that .

Exercise 3 Suppose that is of weak type . Show that we must necessarily have .

Hint: Exploit the scale invariance of the Fourier transform; in particular remember the symmetry .

The previous exercise thus shows that the only possible type for is of the form . The Hausdorff-Young theorem shows that this is actually true whenever . It turns out however that the bound fails for . The following exercise describes one way to prove this.

Exercise 4 Show that cannot be of strong type when . (i) Let be a large positive integer and . For consider the function

Show that

(ii) For any show that

if and are large enough. For this show first the endpoint bounds for and . This will also give you the intermediate upper bounds by log-convexity. For the lower bounds, consider the values of close to integer multiples of .

(iii) The previous steps show that

which allows you to conclude the proof.

**5. The space of tempered distributions **

The purpose of this paragraph is to introduce a space of `generalized functions’ that is much larger than all the spaces we have seen so far, namely *the space of tempered distributions*. Let us begin with an informal discussion, drawing some analogies with some more classical (though not so classical) function spaces.

We have seen already that whenever and the underlying measure is -finite, then the space can be identified with the dual , by means of the pairing:

This is already quite interesting. A function in is already a generalized object in the sense that it is only defined up to sets of measure zero; so in fact it represents an equivalence class. Furthermore, it can be identified with a linear functional acting on another function space.

We have seen that the space is contained in every space and furthermore that it is dense in for all . Restricting our attention to a smaller class of functions, the space , we get a larger dual space:

We thus obtain a space of generalized functions that contains the `classical’ spaces. As we shall see, this space is much bigger and in particular it allows us to differentiate (in the appropriate sense) and remain in this class of generalized functions and, most notably, consider the Fourier transform of these objects and still remain in the class. These operations are often not even available on spaces; for example we cannot even define the Fourier transform on for . Furthermore, even when there is a way to define these operations on functions we don’t necessarily stay in the given class of functions. For example, while it is perfectly legitimate to define the Fourier transform of an function, the resulting function is not in general an integrable function. We shall see that, since is closed under taking partial derivatives, multiplying by polynomials and taking the Fourier transform of its elements, its dual space is also closed under the corresponding operations.

In what follows we will often write for the dual and for the pairing .

Definition 14 A linear functional will be called a *tempered distribution* if it is continuous on with respect to the topology on described in the previous sections.

That is, the linear functional is a tempered distribution if and only if there exists some