## DMat0101, Notes 5: The Hardy-Littlewood maximal function

1. Averages and maximal operators

This week we will be discussing the Hardy-Littlewood maximal function and some closely related maximal type operators. In order to have something concrete let us first of all define the averages of a locally integrable function ${f\in L^1 _{\textnormal{loc}}({\mathbb R}^n)}$ around the point ${x\in{\mathbb R}^n}$:

$\displaystyle A_r(f)(x)=\frac{1}{|B(x,r)|}\int_{B(x,r)}f(y)dy,$

where ${B(x,r)}$ is the Euclidean ball with center ${x\in{\mathbb R}^n}$ and radius ${r>0}$ and ${|B(x,r)|}$ denotes its Lebesgue measure. Note that since Lebesgue measure is translation invariant we have

$\displaystyle |B(x,r)|=|B(0,r)|=r^n |B(0,1)|=\Omega_n r^n,$

where ${\Omega_n}$ denotes the Lebesgue measure (or volume in this case) of the ${n}$-dimensional unit ball ${B(0,1)\subset{\mathbb R}^n}$. Denoting by ${\chi}$ the normalized indicator function of the unit ball

$\displaystyle \chi(x)=\frac{1}{|B(0,1)|}\chi_{B(0,1)}(x),$

and noting that the balls centered at zero are ${0}$-symmetric, we can write

$\displaystyle \begin{array}{rcl} A_r(f)(x)&=&\frac{1}{|B(0,1)|r^n }\int_{B(0,r)}f(x-y)dy\\ \\ &=&\int_{{\mathbb R}^n} f(x-y)\frac{1}{|B(0,1)|r^n}\chi_{B(0,1)}({y}/{r})dy \\ \\ &=& (f*\chi_r)(x). \end{array}$

Thus

$\displaystyle A_r(f)(x)=(f*\chi_r)(x),$

and of course ${\chi_r}$ is an approximation to the identity since ${\int_{{\mathbb R}^n}|\chi|=\int_{{\mathbb R}^n}\chi=1}$ and ${\chi_r}$ are just the dilations of the function ${\chi}$:

$\displaystyle \chi_r(x)=\frac{1}{r^n}\chi(\frac{x}{r}).$

Remembering the discussion that followed the definition of the convolution in Notes 2, the convolution of a locally integrable function ${f}$ with the dilations of an ${L^1}$ function ${\phi}$ was viewed as an averaging process. We see now that when ${\phi=\chi}$ this is exact, that is, ${(f*\chi_r)(x)}$ is precisely the average of ${f}$ over the ball of radius ${r}$ around ${x}$. A similar conclusion follows if we start with any set ${K}$ that is, say, ${0}$-symmetric, convex and normalized to volume ${|K|=1}$. We then have that

$\displaystyle (f*(\chi_K)_r)(x)=\frac{1}{|rK|}\int_{x+rK}f(y)dy=A_r ^{K}(f)(x),$

that is, ${(f*(\chi_K)_r)(x)}$ are the averages of ${f}$ with respect to the dilations of the fixed convex body ${K}$ at every point ${x\in{\mathbb R}^n}$. Here we denote by ${rK}$ the dilations of ${K}$

$\displaystyle rK:=\{rx:x\in K\}.$

It is an easy exercise to show that all these averages are uniformly bounded in size. For all ${1\leq p \leq \infty}$ we have

$\displaystyle \|A_r ^K(f)\|_{L^p({\mathbb R}^n)}\leq \|f\|_{L^p({\mathbb R}^n)}.$

One could of course consider more general sets ${K}$ instead of convex sets which are ${0}$-symmetric and in fact this leads to one of the most interesting families of problems in harmonic analysis. This however falls outside the scope of this course and we will mostly focus on the case of the normalized unit ball which in some sense is the prototypical example.

The Hardy-Littlewood maximal operator (with respect to Euclidean balls) is defined as

$\displaystyle M(f)=\sup_{r>0} A_r(|f|)=\sup_{r>0} |f|*\chi_r.$

Observe that this is a sublinear operator that is well defined at least when ${f}$ is locally integrable. Although maximal operators are interesting in their own right, there are some very specific applications we have in mind. The first has to do with pointwise convergence of averages of a function and is a consequence of the following simple proposition.

Proposition 1 Let ${\{T_t\}_{t>0}}$ be a family of sub-linear operators on ${L^p(X,\mu)}$ and define the maximal operator

$\displaystyle T^*(f)(x)=\sup_{t>0} |T_t(f)(x)|.$

If ${T^*}$ is of weak type ${(p,q)}$ then for any ${t_o>0}$ the set

$\displaystyle \{f\in L^p(X,\mu): \lim_{t\rightarrow t_o}T_t f(x)=f(x)\ \ \mbox{a.e.}\}$

is closed in ${L^p(X,\mu)}$.

Proof: In order to show that the set

$\displaystyle E_{T^*}:=\{f\in L^p(X,\mu): \lim_{t\rightarrow t_o}T_tf(x)=f(x)\ \ \mbox{a.e.}\}$

is closed, consider a sequence of functions ${\{f_n\}\subset E_{T^*}}$ with ${f_n\rightarrow f}$ in ${L^p(X,\mu)}$. We need to show that ${f\in E_{T^*}}$. To see this observe that, since ${T_tf_n(x)\rightarrow f_n(x)}$ as ${t\rightarrow t_o}$ for almost every ${x}$, for almost every ${x\in X}$ we have

$\displaystyle \begin{array}{rcl} \limsup_{t\rightarrow t_o} |T_tf(x)-f(x)|&\leq& \limsup_{t\rightarrow t_o}|T_t(f-f_n)(x)-(f-f_n)(x)| \\ \\ &\leq& \sup_{t>0}|T_t(f-f_n)(x)| +|(f-f_n)(x)|\\ \\ &=&T^*(f-f_n)(x)+|(f-f_n)(x)|. \end{array}$

Thus for any ${\lambda>0}$ we can write

$\displaystyle \begin{array}{rcl} && \mu(\{x\in X: \limsup_{t\rightarrow t_o} |T_tf(x)-f(x)|>\lambda\}) \\ \\ &\leq&\mu(\{x\in X:T^*(f-f_n)(x)>\lambda/2 \}) +\mu(\{x\in X:|(f-f_n)(x)|>\lambda/2 \})\\ \\ &\lesssim_{T^*} &\frac{\|f-f_n\|_p ^q}{\lambda ^q}+\frac{\|f_n-f\|_p ^ p}{\lambda^p}. \end{array}$

Since the right hand side tends to ${0}$ as ${n\rightarrow \infty}$ and the left hand side does not depend on ${n}$ we conclude that for every ${\lambda>0}$

$\displaystyle \mu(\{x\in X:\limsup_{t\rightarrow t_o} |T_tf(x)-f(x)|>\lambda \})=0.$

Now we have that

$\displaystyle \begin{array}{rcl} && \mu(\{x\in X:\limsup_{t\rightarrow t_o} |T_tf(x)-f(x)|>0 \})\\ \\ &\leq & \sum_{k=1} ^\infty \mu(\{x\in X:\limsup_{t\rightarrow t_o} |T_tf(x)-f(x)|>\frac{1}{k}\})=0. \end{array}$

Thus ${\lim_{t\rightarrow t_o} T_t(f)(x)=f(x)}$ for almost every ${x\in X}$ so that ${f\in E_{T^*}}$. $\Box$

Remark 1 We have indexed the family ${T_t}$ in ${t\in{\mathbb R}_+}$ for the sake of definiteness but one can of course consider more general index sets and the previous proposition remains valid. Whenever the index set is uncountable some attention should be given to ensuring the measurability of ${T^*(f)}$.

Remark 2 To get a clearer picture of what this proposition says consider the family of operators

$\displaystyle T_t(f)(x)=(f*\phi_t)(x),$

for some ${\phi\in L^1({\mathbb R}^n)}$ with integral ${\int \phi=1}$. As we have seen already many times, these averages of ${f}$ converge to ${f}$ in many different senses for different classes of functions ${f}$. In particular if ${f\in C^\infty _c({\mathbb R}^n)}$ then ${f*\phi_t}$ converges to ${f}$ even uniformly as ${t\rightarrow 0}$. Thus we have

$\displaystyle C^\infty _c({\mathbb R}^n)\subset \{f\in L^p({\mathbb R}^n): \lim_{t\rightarrow 0 }T_tf(x)=f(x)\ \ \mbox{a.e.}\}.$

Since ${C^\infty _c({\mathbb R}^n)}$ is dense in ${L^p({\mathbb R}^n)}$ for ${1\leq p<\infty}$, Proposition 1 implies that if ${T^*}$ is of weak type ${(p,q)}$ then, for every ${f\in L^p({\mathbb R}^n)}$,

$\displaystyle \lim_{t\rightarrow 0 } (f*\phi_t)(x)=f(x),$

for almost every ${x\in {\mathbb R}^n}$. Thus in order to show that approximations to the identity converge to the function almost everywhere it is enough to show that the corresponding maximal operator is of weak type ${(p,q)}$. In what follows we will show that the Hardy-Littlewood maximal operator is of weak type ${(1,1)}$ and this already implies the corresponding statement for a wide class of ‘nice’ approximations to the identity.

To avoid confusion, remember that in Theorem 15 of Notes 3 we have already exhibited that

$\displaystyle \lim_{t\rightarrow 0 }(f*\phi_t)(x)= f(x)$

for every Lebesgue point ${x}$ of ${f}$. However this is only interesting if we already know that ${f}$ has ‘many’ Lebesgue points (in particular almost every point in ${{\mathbb R}^n}$). In Theorem 15 of Notes 3 we took for granted that the integral of a locally integrable function is almost everywhere differentiable and this in turn implied that almost every point in ${{\mathbb R}^n}$ is a Lebesgue point of ${f}$. In this part of the course we will fill in this gap by showing that the integral of a locally integrable function is almost everywhere differentiable.

Exercise 1 Let ${T^*(f)(x)=\sup_{t>0}|T_tf(x)|}$ be of weak type ${(p,q)}$. Show that for every ${t_o>0}$ the set

$\displaystyle \{ f\in L^p(X,\mu): \lim_{t\rightarrow t_o} T_t f(x) \mbox{ exists a.e. } \}$

is closed in ${L^p(X,\mu)}$.

Hint: The proof is very similar to that of Proposition 1. Observe that it suffices to show that

$\displaystyle \mu(\{x\in X: \limsup_{t\rightarrow t_o}T_tf(x)-\liminf_{t\rightarrow t_o}T_t f(x)>\lambda \})=0,$

for every ${\lambda>0}$.

2. The Hardy-Littlewood maximal theorem

We now turn our attention to the Hardy-Littlewood maximal operator; for ${f\in L^1 _{\textnormal{loc}}}$

$\displaystyle M(f)(x)=\sup_{r>0}\frac{1}{|B(x,r)|}\int_{B(x,r)}|f(y)| dy,\quad x\in {\mathbb R}^n.$

The discussion in the previous section suggests that one should try to prove weak ${(p,q)}$ bounds for the operator ${M}$. In fact we will prove the following theorem which summarizes the boundedness properties of ${M}$.

Theorem 2 (Hardy-Littlewood maximal theorem) (i) The Hardy-Littlewood maximal operator is of strong type ${(p,p)}$ for ${1< p \leq\infty}$:

$\displaystyle \|M(f)\|_{L^p({\mathbb R}^n)}\lesssim_{p,n} \|f\|_{L^p({\mathbb R}^n)},$

for all ${1<p\leq \infty}$ and ${f\in L^p({\mathbb R}^n)}$.

(ii) The Hardy-Littlewood maximal operator is of weak type ${(1,1)}$:

$\displaystyle |\{x\in{\mathbb R}^n: M(f)(x)>\lambda \}|\lesssim_n \frac{\|f\|_{L^1({\mathbb R}^n)}}{\lambda},\quad \lambda>0,$

for all ${f\in L^1({\mathbb R}^n)}$.

Remark 3 The Hardy-Littlewood maximal operator is not of strong type ${(1,1)}$. To see this note that for any ${f\in L^1({\mathbb R}^n)}$ which is not identically ${0}$ we have that

$\displaystyle M(f)(x)\gtrsim_{f} \frac{1}{|x|^n},\quad |x|\rightarrow \infty,$

which shows in particular that ${M(f)}$ is never integrable whenever ${f\in L^1({\mathbb R}^n)}$ is not identically ${0}$. Moreover, no strong estimates of type ${(p,q)}$ are possible whenever ${p\neq q}$ as can be seen by examining the dilations of ${f}$ and ${Mf}$.

Exercise 2 Prove the assertions in the previous remark.
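
As a numerical illustration of the lower bound in Remark 3, the following Python sketch takes ${n=1}$ and ${f=\chi_{[0,1]}}$, for which a direct computation gives ${M(f)(x)=\frac{1}{2x}}$ when ${x\geq 1}$. The function names and the finite grid of radii are assumptions made for this example only; in particular the supremum over ${r>0}$ is replaced by a maximum over that grid.

```python
def overlap(a, b, c, d):
    """Length of the intersection of the intervals [a, b] and [c, d]."""
    return max(0.0, min(b, d) - max(a, c))


def average(x, r):
    """A_r(f)(x) for f = indicator function of [0, 1] on the real line."""
    return overlap(x - r, x + r, 0.0, 1.0) / (2.0 * r)


def maximal_function(x, radii):
    """Approximate M(f)(x) by a maximum over a finite list of radii."""
    return max(average(x, r) for r in radii)


# Assumed grid of radii r = 0.01, 0.02, ..., 50 (a stand-in for sup over r > 0).
radii = [k / 100.0 for k in range(1, 5001)]

# Compare the approximate maximal function with 1 / (2x) at a few points x >= 1.
for x in [2.0, 5.0, 10.0, 20.0]:
    print(x, maximal_function(x, radii), 1.0 / (2.0 * x))
```

Since ${\int_1 ^R \frac{dx}{2x}=\frac{1}{2}\log R}$ is unbounded in ${R}$, this particular ${M(f)}$ is not integrable, in agreement with the remark.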

Exercise 3 Let ${f\in L^1({\mathbb R}^n)}$ and let ${B}$ be a ball such that ${M(f)(x)>\lambda}$ for every ${x\in B}$. Let ${B^*}$ be the ball with the same center and twice the radius of ${B}$. Show that ${M(f)(x)\gtrsim_n \lambda }$ for every ${x\in B^*}$.

Proof of Theorem 2: First of all let us observe that ${M}$ is of strong type ${(\infty,\infty)}$. This is just a consequence of the general fact that ‘an average never exceeds a maximum’. In view of the Marcinkiewicz interpolation theorem it then suffices to show the assertion (ii) of the theorem, namely that ${M}$ is of weak type ${(1,1)}$. Furthermore, by homogeneity, it suffices to show that

$\displaystyle |\{x\in {\mathbb R}^n: Mf(x) >1 \}|\lesssim_{n} \|f\|_{L^1({\mathbb R}^n)}.$

We now fix some ${f\in L^1({\mathbb R}^n)}$ and set

$\displaystyle E =\{x\in{\mathbb R}^n: |Mf(x)|>1\},$

and let ${K\subset E }$ be any compact subset of ${E}$. Our task is to obtain an estimate of the form ${|K|\lesssim_n \|f\|_{L^1({\mathbb R}^n)}}$, uniformly in ${K\subset E}$.

For every ${x\in K}$ there is an open ball ${B_x}$ centered at ${x}$ (of some radius) such that

$\displaystyle \int_{B_x} |f(y)|dy > |B_x| .$

The family ${\{B_x\}_{x\in K}}$ clearly covers the compact set ${K}$ so we can extract a finite subcollection of balls which we denote by ${\{B_m\}_{m=1} ^N}$ which still cover ${K}$. Since ${K\subset \cup_{m=1} ^N B_m}$ we get that

$\displaystyle |K|\leq \sum_{m=1} ^N |B_m| < \sum_{m=1} ^N \int_{B_m} |f(x)|dx.$

Observe on the other hand that

$\displaystyle \int_{\cup_m B_m}|f(x)|dx\leq \|f\|_{L^1({\mathbb R}^n)},$

so if we manage to show that

$\displaystyle \sum_{m=1} ^N \int_{B_m}|f(x)|dx\simeq_n \int_{\cup_{m=1} ^N B_m}|f(x)|dx,$

we would be done. The main obstruction to such an estimate is that the balls ${B_m}$ may overlap a lot. On the other hand, if the balls ${B_m}$ were disjoint (or ‘almost’ disjoint) then there would be no problem. Although we can’t directly claim that the family ${\{B_m\}}$ is non-overlapping, the following lemma will allow us to extract a subcollection of balls which has this property, without losing too much of the measure of the union of balls in the collection.

Lemma 3 (Vitali-type covering lemma) Let ${B_1,\ldots , B_N}$ be a finite collection of balls. Then there exists a subcollection ${B_{n_1},\ldots,B_{n_M}}$ of disjoint balls such that

$\displaystyle \sum_{j=1} ^M |B_{n_j}|=|\cup_{j=1} ^M B_{n_j}|\geq 3^{-n}|\cup_{i=1} ^N B_i|.$

Before giving the proof of this covering lemma let us see how we can use it to conclude the proof of Theorem 2. Recall that we have extracted a finite collection of balls ${\{B_m\}_{m=1} ^N}$ which cover the set ${K}$ and which satisfy

$\displaystyle \int_{B_m}|f(x)|dx >|B_m|,\quad m=1,2,\ldots,N.$

Now applying the covering lemma we can extract a subcollection of disjoint balls ${\{B_{n_j}\}_{j=1} ^M}$ so that the measure of their union is at least ${3^{-n}}$ times the measure of the union of the original family of balls. Thus, we can write

$\displaystyle \begin{array}{rcl} |K|\leq |\cup_{m=1} ^N B_m|&\leq & 3^n |\cup_{j=1} ^M B_{n_j}|= 3^n \sum_{j=1} ^M|B_{n_j}|\\ \\ &<&3^n \sum_{j=1} ^M \int_{B_{n_j}}|f(x)|dx =3^n \int_{\cup_{j=1} ^M B_{n_j}} |f(x)|dx \\ \\ &\leq & 3^n \int_{{\mathbb R}^n}|f(x)|dx = 3^n \|f\|_{L^1({\mathbb R}^n)}. \end{array}$

Observe that this estimate is uniform over all compact sets ${K\subset E}$ so taking the supremum over such sets and using the inner regularity of the Lebesgue measure we conclude that

$\displaystyle |E|\leq 3^n \|f\|_{L^1({\mathbb R}^n)},$

which concludes the proof. $\Box$

Proof of the Covering Lemma 3: First of all let us assume that the balls ${B_1,\ldots,B_N}$ are arranged in decreasing order of size (thus ${B_1}$ is the largest ball). We will choose the subcollection ${B_{n_1},\ldots,B_{n_M}}$ by the greedy algorithm. The first ball we choose in the subcollection is the largest ball, thus ${B_{n_1}=B_1}$. Now assume we have chosen the balls ${B_{n_1},B_{n_2},\ldots,B_{n_i}}$ for some ${i\geq 1}$. We choose the ball ${B_{n_{i+1}}}$ to be the largest ball which doesn’t intersect any of the balls already chosen. Observe that this amounts to choosing

$\displaystyle n_{i+1}:=\min\{j: 1\leq j \leq N, B_j\cap B_{n_\ell}=\emptyset \quad\mbox{for all}\quad \ell=1,2,\ldots,i\}.$

We continue this process until we run out of balls. It is clear that the resulting subcollection ${\{B_{n_j}\} }$ consists of disjoint balls. On the other hand, every ball ${B}$ of the original collection is either selected or it intersects one of the selected balls, say, ${B_{n_j}}$ in the subcollection of greater or equal radius (otherwise the ball ${B}$ would be selected). Then it is not hard to see that

$\displaystyle B\subset B^*_{n_j},$

where ${B^* _{n_j}}$ is the ball with the same center as ${B_{n_j}}$ and three times its radius. Thus we have that

$\displaystyle B_1\cup\cdots\cup B_N \subset B^* _{n_1}\cup\cdots\cup B^* _{n_M}.$

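Taking the Lebesgue measure of both unions, and using that ${|B^* _{n_j}|=3^n|B_{n_j}|}$ and that the selected balls are disjoint, we conclude

$\displaystyle |B_1\cup\cdots\cup B_N| \leq \sum_{j=1} ^M |B^* _{n_j}|=3^n\sum_{j=1} ^M |B_{n_j}|= 3^n |B_{n_1}\cup\cdots\cup B_{n_M}|,$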

and we are done. $\Box$
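
The greedy selection in the proof above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: balls are taken to be open and are represented as pairs (center, radius), and the function names and the sample data are hypothetical. Scanning the balls in decreasing order of radius and keeping every ball that misses all previously kept ones is the same as repeatedly choosing the largest ball disjoint from those already chosen.

```python
import math


def intersects(ball_a, ball_b):
    """True if two open balls, given as (center, radius), intersect."""
    (center_a, radius_a), (center_b, radius_b) = ball_a, ball_b
    return math.dist(center_a, center_b) < radius_a + radius_b


def greedy_disjoint_subcollection(balls):
    """Scan the balls in decreasing order of radius and keep every ball
    that is disjoint from all balls kept so far."""
    selected = []
    for ball in sorted(balls, key=lambda b: b[1], reverse=True):
        if not any(intersects(ball, kept) for kept in selected):
            selected.append(ball)
    return selected


def inside_tripled(ball, kept):
    """True if `ball` lies in the ball with the center of `kept`
    and three times its radius."""
    (center, radius), (kept_center, kept_radius) = ball, kept
    return math.dist(center, kept_center) + radius <= 3.0 * kept_radius


# Hypothetical example data: four balls in the plane.
balls = [((0.0, 0.0), 1.0), ((1.5, 0.0), 1.0), ((3.0, 0.0), 0.5), ((10.0, 0.0), 0.2)]
selected = greedy_disjoint_subcollection(balls)
print(selected)
```

In this example the second ball is discarded because it meets the first one, and it is indeed contained in the tripled first ball, which is the inclusion used at the end of the proof.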

Exercise 4 (The maximal function on the class ${L\log L}$) We saw that if ${f}$ is a non-trivial integrable function then ${M(f)}$ is never integrable. Suppose however that ${f}$ is supported in a ball ${B\subset {\mathbb R}^n}$ and that it is a ‘bit better’ than being integrable, namely it satisfies

$\displaystyle \|f\|_{L\log L (B)}:=\int_B |f(x)|\log^+|f(x)|dx<+\infty,$

where ${\log^+x=\max(\log x,0)}$. We say in this case that ${f\in L\log L(B)}$. Show that ${M(f)\in L^1(B)}$ and

$\displaystyle \|Mf\|_{L^1(B)}\lesssim |B|+\|f\|_{L\log L(B)}.$

Hints: (a) For ${\lambda>0}$ show that

$\displaystyle |\{x\in B: M(f)(x)>2\lambda\}| \lesssim \frac{1}{\lambda}\int_{\{x\in B:|f(x)|>\lambda\}}|f(x)|.$

It will help you to split the function ${f}$ as

$\displaystyle f=f\chi_{\{|f|>\lambda\}}+f\chi_{\{|f|\leq\lambda\}}=:f_2+f_1,$

and observe that ${\|M(f_1)\|_{L^\infty(B)}\leq\lambda}$.

(b) Show that

$\displaystyle \int_B M(f)(x)dx \leq 2 |B|+2\int_1 ^\infty |\{x\in B:M(f)(x)>2\lambda\}|d\lambda.$

From this, (a) and Fubini’s theorem you can conclude the proof.

3. Consequences of the maximal theorem

Our first application of the maximal theorem has to do with the differentiability of the integral of a locally integrable function. Indeed, using Theorem 2 and Proposition 1 we immediately get the following.

Corollary 4 (Lebesgue differentiation theorem) Let ${f\in L^1 _{\textnormal{loc}}({\mathbb R}^n)}$ be a locally integrable function. Then, for almost every ${x\in{\mathbb R}^n}$ we have that

$\displaystyle \lim_{r\rightarrow 0 }\frac{1}{|B(x,r)|}\int_{B(x,r)}f(y)dy =f(x).$

For the proof just observe that ${|A_r(f)(x)|\leq M(f)(x)}$ and that the claimed convergence is a local property: one can multiply any locally integrable function by the indicator function of a ball around the point ${x}$, which turns ${f}$ into an ${L^1}$ function. As we have already seen in Notes 3, the previous statement also implies the following:

Corollary 5 Let ${f\in L^1 _{\textnormal{loc}}({\mathbb R}^n)}$. Then almost every point in ${{\mathbb R}^n}$ is a Lebesgue point of ${f}$, that is, we have that

$\displaystyle \lim _{r\rightarrow 0}\frac{1}{|B(x,r)|}\int_{B(x,r)}|f(x)-f(y)|dy=0,$

for almost every ${x\in{\mathbb R}^n}$.

Lebesgue’s differentiation theorem generalizes to more general averages. A manifestation of this is already presented in Theorem 15 of Notes 2 which asserts that for ‘nice’ approximations to the identity ${\phi}$, the means ${f*\phi_t}$ converge to ${f}$ at every Lebesgue point of ${f}$. Here we will give an alternative proof of this theorem by controlling the maximal operator ${\sup_{t>0}|f*\phi_t|}$ by the Hardy-Littlewood maximal function.

Proposition 6 Let ${\phi\in L^1({\mathbb R}^n)}$ be a positive and radially decreasing function with ${\int_{{\mathbb R}^n}\phi(x)dx=1}$. Then we have that

$\displaystyle \sup_{t>0} |(f*\phi_t)(x)|\leq \|\phi\|_{L^1({\mathbb R}^n)} M(f)(x).$

Proof: Since the dilations ${\phi_t}$ are again positive and radially decreasing with ${\|\phi_t\|_{L^1({\mathbb R}^n)}=\|\phi\|_{L^1({\mathbb R}^n)}}$, it suffices to bound ${|\phi*f|}$. First suppose that ${\phi}$ is of the form ${\phi=\sum_{j=1} ^N a_j \chi_{B_j}}$ where ${a_j>0}$ and ${B_j}$ are Euclidean balls centered at ${0}$ for all ${j=1,2,\ldots,N}$. Then we have

$\displaystyle \begin{array}{rcl} |\phi*f(x)|&\leq&\sum_{j=1} ^N a_j (|f|*\chi_{B_j})(x)=\sum_{j=1} ^N a_j |B_j| \frac{1}{|B_j|} (|f|*\chi_{B_j})(x)\\ \\ &\leq& \sum_{j=1} ^N a_j|B_j|\ M(f)(x)= \int_{{\mathbb R}^n}\phi(y)dy\ M(f)(x)\\ \\ &=&\|\phi\|_{L^1({\mathbb R}^n)} M(f)(x). \end{array}$

However, any function ${\phi}$ which is positive and radially decreasing can be approximated monotonically from below by a sequence of simple functions of the form ${\sum a_j \chi_{B_j}}$ so we are done. $\Box$

As an immediate corollary we get the same control for approximations to the identity which are controlled by positive radially decreasing functions.

Corollary 7 Let ${|\phi(x)|\leq \psi(x) }$ almost everywhere where ${\psi(x)}$ is positive, radially decreasing and integrable. Then we have that

$\displaystyle T^*(f)(x):=\sup_{t>0} |(f*\phi_t)(x)|\leq \int_{{\mathbb R}^n} \psi(y)dy\ M(f)(x).$

In particular ${T^*}$ is of weak type ${(1,1)}$ and strong type ${(p,p)}$ for all ${1<p\leq \infty}$. We conclude that

$\displaystyle \lim_{t\rightarrow 0} (f*\phi_t)(x)=\int_{{\mathbb R}^n} \phi(y)dy \ f(x),$

for almost every ${x\in{\mathbb R}^n}$.

Remark 4 The qualitative conclusion of the previous corollaries is that maximal averages of ${f}$ with radially decreasing integrable kernels are controlled by the Hardy-Littlewood maximal function. A typical radially decreasing integrable kernel is the Gaussian kernel

$\displaystyle W(x)=e^{-\pi|x|^2}.$

By dilating ${W}$ by ${\sqrt{4\pi t}}$ we get

$\displaystyle W_t(x)=\frac{1}{(4\pi t)^{\frac{n}{2}}}e^{- \frac{|x|^2}{4t}}.$

The function ${e^{-|x|^2/4t}}$ can be viewed as a smooth approximation of the indicator function of a ball of radius ${\sim \sqrt t}$ (up to constants). Indeed, for ${|x|<\sqrt t}$ say, we have that ${e^{-|x|^2/4t}\simeq 1}$, while for ${|x|\gtrsim \sqrt{t}}$ the function ${e^{-|x|^2/4t}}$ decays very fast. Thus the kernel ${W_t}$ is not so different from ${\chi_{\sqrt{t}}=\Omega_n ^{-1}t^{-\frac{n}{2}}\chi_{B(0,\sqrt t)}}$.

3.1. Points of density and the Marcinkiewicz Integral

A direct consequence of Lebesgue’s differentiation theorem is that almost every point of a measurable set is ‘completely’ surrounded by other points of the set. To make this precise, let us give a definition.

Definition 8 Let ${E}$ be a measurable set in ${{\mathbb R}^n}$ and let ${x\in{\mathbb R}^n}$. We say that ${x}$ is a point of density of the set ${E}$, if

$\displaystyle \lim_{r\rightarrow 0}\frac{|E\cap B(x,r)|}{|B(x,r)|}=1.$

Of course the limit in the previous definition might not exist in general or not be equal to ${1}$. Observe however that if the previous limit is equal to ${0}$ then ${x}$ is a point of density of the set ${E^C,}$ the complement of ${E}$. On the other hand, applying Lebesgue’s differentiation theorem to the function ${\chi_E}$ which is obviously locally integrable we get

$\displaystyle \lim_{r\rightarrow 0}\frac{1}{|B(x,r)|}\int_{B(x,r)}\chi_E(y)dy=\lim_{r\rightarrow 0}\frac{|E\cap B(x,r)|}{|B(x,r)|}=\chi_E(x),$

for almost every ${x\in {\mathbb R}^n}$. Thus we immediately get the following

Proposition 9 Let ${E\subset {\mathbb R}^n}$ be a measurable set. Then almost every point of ${E}$ is a point of density of ${E}$. Likewise, almost every point ${x\in E^C}$ is a point of density of ${E^C}$.

Thus a point of density is in a measure theoretic sense completely surrounded by other points of ${E}$: if ${x}$ is a point of density of ${E}$, the measure of the set ${E}$ in the ball ${B(x,r)}$ is asymptotically equal to the measure of the ball as ${r\rightarrow 0}$.
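
To see the definition at work in the simplest case, the following Python sketch computes the ratio ${|E\cap B(x,r)|/|B(x,r)|}$ for ${E=[0,1]\subset{\mathbb R}}$ at an interior point, at a boundary point and at an exterior point. The function name and the chosen radii are assumptions made for this illustration.

```python
def density_ratio(x, r, a=0.0, b=1.0):
    """Measure of E intersected with B(x, r), divided by the measure of
    B(x, r), for the interval E = [a, b] on the real line."""
    left, right = max(x - r, a), min(x + r, b)
    return max(0.0, right - left) / (2.0 * r)


# Ratios for shrinking radii r = 10^-1, ..., 10^-4 at three points.
for x in [0.5, 0.0, -1.0]:
    print(x, [density_ratio(x, 10.0 ** (-j)) for j in range(1, 5)])
```

At the boundary point ${x=0}$ the ratio stays equal to ${\frac{1}{2}}$, so ${0}$ is a point of density neither of ${E}$ nor of ${E^C}$; this does not contradict Proposition 9 since the boundary of ${E}$ has measure zero.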

Another way to describe this notion is the following. Let ${F}$ be a closed set and define ${\delta(x)=\textnormal{dist}(x,F)}$. Of course ${\delta(x)=0}$ if ${x\in F}$. Now think of ${y}$ in a neighborhood of zero so that the vector ${x+y}$ is in a neighborhood of ${x}$. If ${x\in F}$ then the distance of the point ${x+y}$ from ${F}$ is at most ${|y|}$ since ${x\in F}$ and ${|(x+y)-x|=|y|}$. Thus we have that ${\delta(x+y)\leq |y|}$ whenever ${x\in F}$. That is, when the point ${x+y}$ approaches ${x\in F}$, the distance ${\delta(x+y)}$ of ${x+y}$ from ${F}$ approaches zero. In fact the estimate above can be improved.

Proposition 10 Let ${F}$ be a closed set. Then for almost every ${x\in F}$, ${\delta(x+y)=o(|y|)}$ as ${|y|\rightarrow 0}$. This is true in particular if ${x}$ is a point of density of the set ${F}$.

Exercise 5 Prove Proposition 10 above. The ${o(|y|)}$ is interpreted as follows: For every ${\epsilon>0}$ there exists some ${\eta>0}$ such that ${\delta(x+y)\leq \epsilon |y|}$ whenever ${|y|\leq \eta}$.

We will be mostly interested in another instance of this principle that is reflected in the Marcinkiewicz integral. This will also come in handy in our study of oscillatory integrals in the next chapter.

For ${F}$ a closed set as before we define the Marcinkiewicz integral associated to ${F}$, ${I(x)}$, as

$\displaystyle I(x)=\int_{|y|\leq 1 }\frac{\delta(x+y)}{|y|^{n+1}}dy,\quad x\in {\mathbb R}^n .$

Theorem 11 (i) When ${x\in F^C}$ then ${I(x)=+\infty.}$ (ii) For almost every ${x\in F}$ we have that ${I(x)<+\infty.}$

Remark 5 The previous theorem shows that, on average, ${\delta(x+y)}$ is small enough whenever ${x\in F}$ to make the integral converge locally. This can be seen as a variation of Proposition 10 though no direct quantitative connection is claimed.

Part (i) is obvious and is left as an exercise. For (ii) it will be enough to show the following:

Lemma 12 Let ${F}$ be a closed set whose complement ${F^C}$ has finite measure and set

$\displaystyle I_*(x)=\int_{{\mathbb R}^n}\frac{\delta(x+y)}{|y|^{n+1}}dy.$

Then ${I_*(x)<+\infty}$ for almost every ${x\in F}$. In fact we have

$\displaystyle \int_F I_*(x)dx\lesssim_n | F^C|.$

Proof: It is enough to show

$\displaystyle \int_F I_*(x)\lesssim | F^C|,$

since then ${I_*(x)}$ is finite for almost every ${x\in F}$. To that end we change variables ${y\mapsto y-x}$ in the inner integral, use that ${\delta(y)=0}$ for ${y\in F}$ and apply Fubini’s theorem for positive functions:

$\displaystyle \begin{array}{rcl} \int_F I_*(x)dx &=& \int_F \int_{{\mathbb R}^n}\frac{\delta(x+y)}{|y|^{n+1}}dy \ dx=\int_F \int_{{\mathbb R}^n}\frac{\delta(y)}{|x-y|^{n+1}}dy\ dx \\ \\ &= & \int_F \int_{F^C} \frac{\delta(y)}{|x-y|^{n+1}} dy\ dx = \int_{F^C} \bigg( \int_F \frac{1}{|x-y|^{n+1}} dx\bigg) \delta(y) dy. \end{array}$

Now fix a ${y\in F^C}$. As ${x\in F}$ we obviously have that ${|x-y|\geq \delta(y)}$ thus ${F\subset \{x\in {\mathbb R}^n:|x-y|\geq \delta(y)\}}$. Since all the quantities under the integral signs are positive the previous inclusion implies

$\displaystyle \int_{F}\frac{1}{|x-y|^{n+1}}dx\leq \int_{\{x\in {\mathbb R}^n:|x-y|\geq \delta(y)\}}\frac{1}{|x-y|^{n+1}}dx\lesssim_n \frac{1}{\delta(y)},$

whenever ${y\in F^C}$. Integrating over ${y\in F^C}$ we get

$\displaystyle \int_F I_*(x)dx \lesssim_n \int_{F^C} \delta(y)\delta(y)^{-1}dy= |F^C|.$

$\Box$

To get the proof of Theorem 11 we now use the previous lemma as follows. Let ${F}$ be a closed set and let ${B_m}$ be the open ball of radius ${m}$ centered at ${0}$. Let ${F_m=F\cup B_m ^C}$. Then ${F_m}$ is closed and ${F_m ^C\subset B_m}$ so that ${|F_m ^C|<\infty}$. Thus the previous lemma applies to ${F_m}$ and we get that

$\displaystyle \int_{|y|\leq 1}\frac{\delta_m(x+y)}{|y|^{n+1}}dy<+\infty,$

for almost every ${x\in F_m}$ where we denote by ${\delta_m}$ the distance from the set ${F_m}$, so that ${\delta_m=\min(\delta,\textnormal{dist}(\cdot, B_m ^C))}$. Now observe that for ${x\in F\cap B_{m-2}}$ and ${|y|\leq 1}$ we have that ${\delta_m(x+y)=\delta(x+y)}$; indeed ${\delta(x+y)\leq |y|\leq 1}$ since ${x\in F}$, while ${|x+y|\leq m-1}$ and thus ${\textnormal{dist}(x+y, B_m ^C)\geq 1}$. We conclude that

$\displaystyle \int_{|y|\leq 1}\frac{ \delta(x+y)}{|y|^{n+1}}dy<\infty,$

for almost every ${x\in F\cap B_{m-2}}$. Since every ${x\in {\mathbb R}^n}$ belongs to ${B_{m-2} }$ for all large ${m}$ we get conclusion (ii) of the theorem.

Exercise 6 (i) Show the following strengthened form of Lemma 12: If ${\psi\geq 0}$ is locally integrable then

$\displaystyle \int_F I_*(x)\psi(x)dx\lesssim_n \int_{F^C} (M\psi)(x) dx,$

whenever ${F}$ is closed and ${|F^C|<+\infty}$.

(ii) Use (i) and the maximal theorem to conclude that ${I_*(x)\in L^p(F)}$ for all ${1\leq p <\infty}$.

4. The dyadic maximal function

We now come to a different approach to the maximal function theorem. On the one hand the ‘dyadic’ maximal theorem we will prove here is already implied by the maximal theorem presented in Section 2 (see Exercise 9 below). On the other hand it is interesting in its own right and it will give us the chance to present a dyadic structure on the Euclidean space which will come in handy in many different cases.

Consider the basic cube ${Q_{0,0}=[0,1)^n\subset {\mathbb R}^n}$. A dyadic dilation of this cube is the cube ${Q_{m,0}:=2^m Q_{0,0}}$ where ${m\in {\mathbb Z}}$. Now we also consider translations of this cube of the form ${Q_{m,k}:=2^mk+Q_{m,0}}$ for some integer vector ${k\in{\mathbb Z}^n}$. We have the following definition:

Definition 13 A dyadic cube of generation ${m}$ is a cube of the form

$\displaystyle Q_{m,k}=2^m(k+[0,1)^n)=\{2^m(k+x):x\in [0,1)^n\},$

where ${m\in {\mathbb Z}}$ and ${k\in{\mathbb Z}^n}$. The family of disjoint cubes

$\displaystyle \mathcal {Q} _m:=\{Q_{m,k}\}_{k\in{\mathbb Z}^n}$

defines the ${m}$-th generation of dyadic cubes.

The dyadic cubes have the following basic properties.

(d1) The dyadic cubes in the generation ${m}$ are disjoint and their union is ${{\mathbb R}^n}$. Thus any point ${x\in{\mathbb R}^n}$ belongs to a unique dyadic cube in the ${m}$-th generation.

(d2) Two (different) dyadic cubes are either disjoint or one contains the other.

(d3) A dyadic cube in ${\mathcal Q_m}$ consists of exactly ${2^n}$ dyadic cubes of the generation ${\mathcal Q_{m-1}}$. On the other hand, for any dyadic cube ${Q\in \mathcal Q_m}$ and any ${j>m}$ there is a unique dyadic cube in the generation ${\mathcal Q_j}$ that contains ${Q}$.

As a first instance of how things simplify and get sharper in the dyadic world, let us see the analogue of the Vitali covering lemma in the dyadic case.

Lemma 14 (Dyadic Vitali-type covering lemma) Let ${Q_1,\ldots,Q_N}$ be a finite collection of dyadic cubes. There exists a subcollection ${Q_{n_1},\ldots,Q_{n_M}}$ of disjoint dyadic cubes such that

$\displaystyle Q_1\cup\cdots \cup Q_N=Q_{n_1}\cup\cdots\cup Q_{n_M}.$

Proof: Let ${Q_{n_1},\ldots,Q_{n_M}}$ be the maximal cubes among ${Q_1,\ldots,Q_N}$, that is, the cubes that are not contained in any other cube of the collection ${Q_1,\ldots,Q_N}$. By property (d2) the cubes ${\{Q_{n_j}\}_{j=1} ^M}$ are disjoint (otherwise they wouldn’t be maximal). Also, since the collection is finite, any cube that is not maximal is contained in a maximal one and thus in the union ${Q_{n_1}\cup\cdots\cup Q_{n_M}}$. $\Box$
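
The selection of maximal cubes can be made concrete as follows. In this Python sketch a dyadic cube ${Q_{m,k}}$ is encoded by the pair ${(m,k)}$ with ${k}$ a tuple of integers; this encoding, the function names and the sample collection are assumptions made for the illustration. The containment test uses that ${Q_{m',k'}\subset Q_{m,k}}$ with ${m'\leq m}$ exactly when ${\lfloor k'_i/2^{m-m'}\rfloor=k_i}$ for every coordinate ${i}$.

```python
def contains(big, small):
    """True if the dyadic cube `small` is contained in the dyadic cube `big`.

    A cube is a pair (m, k): generation m and integer vector k (a tuple),
    standing for 2^m (k + [0, 1)^n)."""
    (m_big, k_big), (m_small, k_small) = big, small
    if m_small > m_big:
        return False
    shift = m_big - m_small
    # For Python integers, k >> shift equals floor(k / 2^shift), also for negative k.
    return all((ks >> shift) == kb for ks, kb in zip(k_small, k_big))


def maximal_cubes(cubes):
    """Cubes of the collection that are not contained in a different cube of it."""
    distinct = set(cubes)
    return sorted(q for q in distinct
                  if not any(p != q and contains(p, q) for p in distinct))


# Hypothetical collection of dyadic intervals on the real line:
# [0,1), [0,1/2), [1/2,1), [5/4,3/2), [6,8).
cubes = [(0, (0,)), (-1, (0,)), (-1, (1,)), (-2, (5,)), (1, (3,))]
print(maximal_cubes(cubes))
```

Here the two intervals of generation ${-1}$ are absorbed by ${[0,1)}$, and the three remaining intervals are disjoint with the same union as the original collection.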

Given a function ${f\in L^1 _{\textnormal{loc}}({\mathbb R}^n)}$ and ${x\in{\mathbb R}^n}$ we set

$\displaystyle \textbf E_mf(x) =\sum_{Q\in\mathcal Q_m}\bigg( \frac{1}{|Q|}\int_Q f\bigg) \chi_Q(x).$

Observe that given ${x}$ there is a unique cube ${Q_x\in\mathcal Q_m}$ that contains ${x}$ and then the value of ${\textbf E_m f}$ at ${x}$ equals the average of the function ${f}$ over the cube ${Q_x}$. In fact, ${\textbf E_m f}$ is the conditional expectation of ${f}$ with respect to the ${\sigma}$-algebra generated by the family ${\mathcal Q_m}$. Observe that for every generation ${m}$, if ${\Omega}$ is a union of cubes in ${\mathcal Q_m}$ then

$\displaystyle \int_\Omega \textbf E_m f=\int_\Omega f.$

The operator ${\textbf E_k}$ is the discrete dyadic analogue of an approximation to the identity dilated at level ${2^k}$. A difference however is that the averages here are not ‘centered’. Indeed, ${\textbf E_k f(x)}$ is the average of ${f}$ with respect to the cube ${Q}$ whenever ${x\in Q}$ for some ${Q\in \mathcal Q_k}$. However, ${x}$ is not necessarily the ‘center’ of the cube ${Q}$.

The dyadic maximal function is defined as

$\displaystyle M_{\Delta} (f)(x)=\sup_{k\in{\mathbb Z}} \textbf {E}_k |f| (x) = \sup _{Q \ni x} \frac{1}{|Q|} \int_Q |f(y)|dy.$

Thus the supremum is taken over all dyadic cubes that contain ${x}$ or, equivalently, over all generations of dyadic cubes. We have the analogue of the maximal theorem:

Theorem 15 (Dyadic Maximal Theorem) (i) The dyadic maximal function is of weak type ${(1,1)}$ with weak type norm at most ${1}$:

$\displaystyle |\{x\in{\mathbb R}^n:M_\Delta f (x)>\lambda\}|\leq \frac{\|f\|_{L^1({\mathbb R}^n)}}{\lambda},\quad \lambda>0,$

for all ${f\in L^1({\mathbb R}^n)}$. (ii) The dyadic maximal function is of strong type ${(p,p)}$, for all ${1<p\leq \infty}$; for all ${f\in L^p({\mathbb R}^n)}$ we have

$\displaystyle \|M_\Delta(f)\|_{L^p({\mathbb R}^n)}\lesssim_p \|f\|_{L^p({\mathbb R}^n)},$

where the implied constant depends only on ${p}$.

We conclude using Proposition 1 that

(iii) For every ${f\in L^1 _{\textnormal{loc}}({\mathbb R}^n)}$ we have that

$\displaystyle \lim_{k\rightarrow -\infty} \textbf E_k (f)(x)=f(x)\quad\mbox{for a.e.}\quad x\in{\mathbb R}^n.$
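
The weak type bound in (i) can be tested numerically in a discretized setting. The Python sketch below is an illustration under assumptions that are not part of the notes: ${f}$ is a step function on ${[0,1)}$, constant on the ${2^J}$ dyadic intervals of length ${2^{-J}}$ and given by a list of samples, only dyadic intervals contained in ${[0,1)}$ are used, and level sets are measured inside ${[0,1)}$. The function names and the random test data are hypothetical.

```python
import random


def block_averages(values, block):
    """Replace each aligned block of `block` consecutive samples by its mean.
    This plays the role of the conditional expectation E_m f."""
    out = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        mean = sum(chunk) / len(chunk)
        out.extend([mean] * len(chunk))
    return out


def dyadic_maximal(values):
    """Pointwise maximum of the block averages of |f| over the block
    sizes 1, 2, 4, ..., len(values) (len(values) is a power of two)."""
    abs_vals = [abs(v) for v in values]
    result = abs_vals[:]
    block = 2
    while block <= len(values):
        averages = block_averages(abs_vals, block)
        result = [max(a, b) for a, b in zip(result, averages)]
        block *= 2
    return result


random.seed(0)
J = 8
samples = [random.expovariate(1.0) for _ in range(2 ** J)]
h = 1.0 / len(samples)                       # length of one finest interval
l1_norm = h * sum(abs(v) for v in samples)   # L^1 norm of the step function
mf = dyadic_maximal(samples)

# Compare the measure of each level set with the bound ||f||_1 / lambda.
for lam in [0.5, 1.0, 2.0, 4.0]:
    measure = h * sum(1 for v in mf if v > lam)
    print(lam, measure, l1_norm / lam)
```

For every level the measure of the level set should not exceed ${\|f\|_{L^1}/\lambda}$, reflecting the weak type norm ${1}$ of the dyadic maximal function.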

Exercise 7 Give the proof of Theorem 15 above. Observe that the proof is essentially identical to that of Theorem 2 using the dyadic version of the Vitali covering Lemma instead of the non-dyadic one. For (iii) you need to observe that the statement is true for continuous functions (for example) and use Proposition 1.

Exercise 8 (The maximal function with respect to cubes) Let ${M_\square}$ denote the maximal function with respect to cubes, that is,

$\displaystyle M_\square(f)(x)=\sup_{r>0}\frac{1}{r^n}\int_{[-\frac{r}{2},\frac{r}{2}]^n}|f(x-y)|dy=\sup_{r>0}(|f|*\psi_r)(x),$

where ${\psi}$ is the indicator function of the cube ${[-\frac{1}{2},\frac{1}{2}]^n}$. Show that

$\displaystyle M_\square (f)(x)\simeq_n M (f)(x),$

where the implied constants depend only on the dimension ${n}$.

Exercise 9 Show the pointwise estimate

$\displaystyle M_\Delta(f)(x)\lesssim_n M (f)(x),$

where the implied constant depends only on the dimension ${n}$. On the other hand, show that the opposite estimate cannot be true. For example when ${n=1}$ test against the function ${\chi_{[0,1]}}$. Conclude that the dyadic maximal theorem follows from the non-dyadic one (with a different constant though). Hint: Observe that if ${x\in Q}$ and ${Q}$ is a dyadic cube, there exists a ball ${B(x,r)}$ which contains ${Q}$ and ${|B(x,r)|\simeq _n |Q|}$.

Exercise 10 Consider the non-centered maximal functions with respect to balls and cubes:

$\displaystyle M'(f)(x)=\sup_{B\ni x}\frac{1}{|B|}\int_B |f(y)|dy,$

where the supremum is taken over all Euclidean balls containing ${x}$. Likewise

$\displaystyle M' _\square (f)(x)=\sup_{Q\ni x}\frac{1}{|Q|}\int_Q |f(y)|dy,$

where the supremum is taken over all cubes (with sides parallel to the coordinate axes) that contain ${x}$. Show that ${M,M'}$ and ${M_\square '}$ are all pointwise equivalent, that is

$\displaystyle M'(f)(x)\simeq_n M' _\square(f)(x) \simeq_n M(f)(x)\quad x\in{\mathbb R}^n.$

5. The Calderón-Zygmund decomposition

Let ${(X,\mu)}$ be a measure space and ${f:X\rightarrow {\mathbb C}}$ be a measurable function (say) in ${L^p(X,\mu)}$. For a level ${\lambda>0}$ we have many times used the decomposition of ${f}$ at level ${\lambda>0}$:

$\displaystyle f= f \chi_{\{x\in X: |f(x)|\leq \lambda \}}+f\chi_{\{x\in X: |f(x)|>\lambda \}}=:g+b.$

The function ${g=f \chi_{\{x\in X: |f(x)|\leq \lambda \}}}$ is the ‘good’ part of ${f}$; indeed we have that

$\displaystyle \|g\|_{L^p}\leq \|f\|_{L^p}\quad\mbox{and}\quad \|g\|_{L^\infty} \leq \lambda.$

Thus the good part ${g}$ inherits the ${L^p}$-integrability of ${f}$ and furthermore it is bounded. On the other hand the ‘bad’ part ${b}$ satisfies

$\displaystyle \|b\|_{L^p} \leq \|f\|_{L^p}\quad\mbox{and}\quad \mu({\mathrm{supp}}(b))\leq \frac {\|f\|_{L^p} ^p}{\lambda^p}.$

Thus the bad part ${b}$ also inherits the ${L^p}$-integrability of ${f}$ but it also has ‘small’ support.
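As a quick illustration of this splitting, here is a minimal Python sketch. It assumes a finite set ${X}$ with counting measure, so that ${f}$ is just a list of numbers; the helper names `split_at_level` and `chebyshev_bound` are hypothetical.

```python
def split_at_level(values, lam):
    """Split f = g + b at height lam, for f given by its values on a finite set."""
    g = [v if abs(v) <= lam else 0 for v in values]
    b = [v if abs(v) > lam else 0 for v in values]
    return g, b


def chebyshev_bound(values, lam, p):
    """Return ||f||_p^p / lam^p with respect to counting measure."""
    return sum(abs(v) ** p for v in values) / lam ** p


f = [3, -1, 0.5, -4, 2]
g, b = split_at_level(f, lam=2)
support_size = sum(1 for v in b if v != 0)   # counting measure of supp(b)
print(g, b, support_size, chebyshev_bound(f, lam=2, p=2))
```

Here ${g}$ is bounded by ${\lambda=2}$, and the support of ${b}$ has two points, which is consistent with the bound ${\|f\|_{L^2}^2/\lambda^2}$.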

In a general measure space one cannot do much more than that in terms of decomposing ${f}$ in a good part and a bad part. If however there is also a metric structure in the space which is compatible with the measure, one can do a bit better and also get some control on the local oscillation of the bad part ${b}$. Various forms of this decomposition are usually referred to as Calderón-Zygmund decompositions. We present here the basic example in the dyadic Euclidean setup.

Proposition 16 (Dyadic Calderón-Zygmund decomposition) Let ${f\in L^1({\mathbb R}^n)}$ and ${\lambda>0}$. There exists a decomposition of ${f}$ of the form

$\displaystyle f=g+\sum_{Q\in\mathcal B} b_Q,$

where ${\mathcal B}$ is a collection of disjoint dyadic cubes and the sum is taken over all the cubes ${Q\in\mathcal B}$. This decomposition satisfies the following properties:

(i) The ‘good part’ ${g}$ satisfies the bound

$\displaystyle \|g\|_{L^1({\mathbb R}^n)}\leq \|f\|_{L^1({\mathbb R}^n)}\quad\mbox{and}\quad \|g\|_{L^\infty({\mathbb R}^n)}\leq 2^n \lambda.$

(ii) The ‘bad part’ is ${b=\sum_{Q\in\mathcal{B}}b_Q}$; each function ${b_Q}$ is supported on ${Q}$ and

$\displaystyle \begin{array}{rcl} \int_Q b_Q=0,\quad \|b_Q\|_{L^1({\mathbb R}^n)} \leq 2^{n+1} \lambda |Q| ,\quad \mbox{for all}\quad Q\in\mathcal B. \end{array}$

(iii) For each ${Q\in\mathcal B}$ we have

$\displaystyle \lambda\leq \frac{1}{|Q|}\int_Q |f(y)|dy \leq 2^n \lambda.$

Furthermore we have that

$\displaystyle \bigcup_{Q\in\mathcal B} Q=\{x\in{\mathbb R}^n:M_\Delta(f)(x)>\lambda\} \subset\{x\in{\mathbb R}^n:M(f)(x)>c_n\lambda\},$

where ${c_n>0}$ is a constant depending only on the dimension ${n}$ (see Exercise 9). In particular, from the dyadic maximal theorem we have

$\displaystyle \sum_{Q\in\mathcal B}|Q|\leq \frac{\|f\|_{L^1({\mathbb R}^n)}}{\lambda}.$

Proof: The proof is very similar to the proof of the dyadic covering lemma. We fix some level ${\lambda>0}$ and let us call a dyadic cube ${Q}$ bad if

$\displaystyle \frac{1}{|Q|}\int_Q |f|>\lambda.$

If a dyadic cube is not bad we call it good. A bad cube ${Q}$ will be called maximal if no dyadic cube strictly containing ${Q}$ is bad. Let us denote by ${\mathcal B}$ the collection of maximal bad cubes. Since the cubes in the collection ${\mathcal B}$ are dyadic and maximal, they are disjoint. Also, for any bad cube ${Q'}$, let ${x\in Q'}$. We have that

$\displaystyle M_\Delta(f)(x)=\sup_{{\stackrel {Q \mbox{ \tiny dyadic } }{ Q\ni x}}} \frac{1}{|Q|}\int_Q |f|\geq\frac{1}{|Q'|}\int_{Q'}|f|>\lambda.$

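Also, since ${f\in L^1({\mathbb R}^n)}$, every bad cube is contained in some maximal bad cube. Indeed, if ${Q'}$ is a bad cube, write ${2^kQ'}$ for its ${k}$-th dyadic ancestor, that is, the unique dyadic cube containing ${Q'}$ whose side-length is ${2^k}$ times that of ${Q'}$. Since ${|2^kQ'|=2^{kn}|Q'|}$ we have

$\displaystyle \frac{1}{|2^kQ'|}\int_{2^kQ'}|f|\leq \frac{\|f\|_{L^1({\mathbb R}^n)}}{2^{kn}|Q'|}\rightarrow 0\quad\mbox{as}\quad k\rightarrow\infty.$

It follows that there is a largest ${k\geq 0}$ such that

$\displaystyle \frac{1}{|2^kQ'|}\int_{2^kQ'}|f|>\lambda\quad\mbox{and}\quad\frac{1}{|2^{m}Q'|}\int_{2^{m}Q'}|f|\leq\lambda,$

for all ${m>k}$. Since every dyadic cube strictly containing ${2^kQ'}$ is of the form ${2^mQ'}$ with ${m>k}$, the dyadic cube ${2^kQ'}$ is maximal and bad.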

Now let ${Q}$ be a maximal bad cube and consider the parent of ${Q}$, ${Q^*}$, that is the unique dyadic cube with double the side-length that contains ${Q}$. Since ${Q}$ is maximal, ${Q^*}$ has to be good so we have

$\displaystyle \frac{1}{|Q^*|}\int_{Q^*}|f|\leq \lambda$

and thus

$\displaystyle \frac{1}{|Q|}\int_Q |f|\leq \frac{|Q^*|}{|Q|}\frac{1}{|Q^*|}\int_{Q^*} |f|\leq 2^n\lambda,$

for all maximal bad cubes ${Q}$. We set

$\displaystyle b_Q=(f-\frac{1}{|Q|}\int_Qf)\chi_Q,$

whenever ${Q\in\mathcal B}$ is a maximal bad cube. We also set

$\displaystyle g=(1-\chi_{\cup_{Q\in\mathcal B} Q})f+\sum_{Q\in\mathcal B} \bigg(\frac{1}{|Q|}\int_Q f\bigg)\chi_Q= f-\sum_{Q\in\mathcal B}b_Q.$

It is not hard to verify all the required properties of ${b,g}$ except maybe that ${\|g\|_{L^\infty({\mathbb R}^n)}\leq 2^n\lambda}$. It is easy to see that

$\displaystyle \sup_{x\in Q}|g(x)|=\bigg|\frac{1}{|Q|}\int_Q f\bigg|\leq\frac{1}{|Q|}\int_Q |f|\leq 2^n\lambda,$

whenever ${Q\in\mathcal B}$ is a bad cube. If ${x\notin \bigcup_{Q\in\mathcal B}Q}$ and ${x\in Q'}$, then necessarily ${Q'}$ is good. We thus have that

$\displaystyle \bigg|\frac{1}{|Q'|}\int_{Q'}f\bigg|\leq\frac{1}{|Q'|}\int_{Q'} |f|\leq\lambda,$

since ${Q'}$ is good. Now, by the dyadic maximal theorem (part (iii) of Theorem 15), we have that ${\frac{1}{|Q'|}\int_{Q'}f(y)dy\rightarrow f(x)}$ as ${|Q'|\rightarrow 0}$ with ${x\in Q'}$, for almost every ${x}$. Since ${g=f}$ outside ${\bigcup_{Q\in\mathcal B}Q}$ we conclude that ${|g(x)|=|f(x)|\leq \lambda}$ for almost every ${x\notin \bigcup_{Q\in\mathcal B}Q}$ and we are done in this case as well. $\Box$
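The selection of maximal bad cubes in the proof above can be sketched in a one-dimensional toy model. The following Python sketch makes assumptions that are not in the notes: ${f}$ is a step function on ${[0,1)}$, only dyadic intervals inside ${[0,1)}$ are considered, and ${\lambda}$ is at least the average of ${|f|}$ over ${[0,1)}$. The function name `cz_decomposition` is hypothetical.

```python
from fractions import Fraction


def cz_decomposition(values, lam):
    """Toy dyadic Calderon-Zygmund decomposition on [0, 1).

    values[i] is the value of f on the i-th of 2**N equal dyadic cells.
    Assumption of this sketch: the average of |f| over [0, 1) is at most lam,
    so every maximal bad interval has a good parent inside [0, 1).
    Returns (cubes, g, b) with cubes given as (first_cell, number_of_cells).
    """
    f = [Fraction(v) for v in values]
    n = len(f)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    assert sum(abs(v) for v in f) / n <= lam, "lam is below the global average"
    cubes = []

    def visit(start, length):
        average = sum(abs(v) for v in f[start:start + length]) / length
        if average > lam:
            # All ancestors were good, so this bad interval is maximal.
            cubes.append((start, length))
        elif length > 1:
            half = length // 2
            visit(start, half)
            visit(start + half, half)

    visit(0, n)
    g = list(f)
    b = [Fraction(0)] * n
    for start, length in cubes:
        mean = sum(f[start:start + length]) / length
        for i in range(start, start + length):
            g[i] = mean          # g is the average of f on each bad interval
            b[i] = f[i] - mean   # b_Q has mean zero on Q
    return cubes, g, b


cubes, g, b = cz_decomposition([8, 0, 0, 0, 1, 1, 0, 0], Fraction(3, 2))
print(cubes)  # [(0, 4)]
```

In this example the only maximal bad interval is ${[0,1/2)}$, where the average of ${|f|}$ is ${2\in(\lambda,2\lambda]}$, and ${g}$ is bounded by ${2\lambda}$. The search goes from large intervals to small ones, so the first bad interval met on each branch is automatically maximal.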

Observe that in the previous decomposition of ${f=b+g}$, the ‘bad set’, that is the set where ${b}$ lives, is given in the form

$\displaystyle B=\cup_{Q\in\mathcal B} Q = \{x\in{\mathbb R}^n: M_\Delta(f)(x)>\lambda\}.$

One could prove the Calderón-Zygmund decomposition starting from the set ${ \{x\in{\mathbb R}^n: M_\Delta(f)(x)>\lambda\}}$ and decomposing it as a union of disjoint dyadic cubes. This sort of decomposition is interesting in its own right. Let us see how this can be done.

Proposition 17 (Dyadic Whitney decomposition) Let ${\Omega \subset {\mathbb R}^n}$ be an open set which is not all of ${{\mathbb R}^n}$. Then there exists a decomposition

$\displaystyle \Omega=\bigcup_{Q\in\mathcal D} Q,$

where ${\mathcal D}$ is a collection of disjoint dyadic cubes. For each ${Q\in\mathcal D}$ we have

$\displaystyle \textnormal{dist}(Q,{\mathbb R}^n\setminus \Omega)\simeq \textnormal{diam}(Q).$

Proof: Let ${\mathcal D}$ denote the dyadic cubes inside ${\Omega}$ such that

$\displaystyle \textnormal{diam}(Q)\leq \textnormal{dist} (Q,{\mathbb R}^n\setminus \Omega)\leq 5\ \textnormal{diam}(Q). \ \ \ \ \ (1)$

Obviously ${\cup_{Q\in\mathcal D} Q \subset \Omega}$ but the opposite inclusion is also true. Indeed, if ${x\in\Omega}$ note that ${x}$ is contained in some dyadic cube ${Q\subset \Omega}$ since ${\Omega}$ is open. Now for ${Q}$ a dyadic cube let ${Q'}$ be its ‘parent’, that is the unique dyadic cube of side twice the side-length of ${Q}$, containing ${Q}$. Considering successive parents of ${Q}$ there will be a dyadic cube ${Q''}$ containing ${x}$ with diameter greater than ${\textnormal{dist}(x,{\mathbb R}^n\setminus \Omega)/4}$ and less than ${\textnormal{dist}(x,{\mathbb R}^n\setminus \Omega)/2}$. Thus ${Q''\subset \Omega}$ and ${\textnormal{diam}(Q'')\simeq \textnormal{dist}(Q'',{\mathbb R}^n\setminus \Omega)}$. The collection of dyadic cubes ${\mathcal D}$ is not necessarily disjoint so we only choose the cubes in ${\mathcal D}$ which are maximal with respect to set inclusion and call this collection again ${\mathcal D}$. Now maximal and dyadic means disjoint so we are done. $\Box$
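The construction in the proof can be tried out in the simplest case. The sketch below is only an illustration under assumptions of my own choosing: dimension one, ${\Omega=(0,1)}$, and only dyadic intervals of length at least ${2^{-\texttt{max\_depth}}}$ (the full decomposition has infinitely many intervals). The name `whitney_intervals` is hypothetical.

```python
from fractions import Fraction


def whitney_intervals(max_depth):
    """Maximal dyadic intervals [a, b) in Omega = (0, 1) satisfying
    diam <= dist([a, b), complement of Omega) <= 5 * diam,
    restricted to lengths 2**-d for 1 <= d <= max_depth."""
    selected = []
    for depth in range(1, max_depth + 1):        # from large to small intervals
        length = Fraction(1, 2 ** depth)
        for j in range(2 ** depth):
            a, b = j * length, (j + 1) * length
            dist = min(a, 1 - b)                 # distance to the complement of (0, 1)
            if not (length <= dist <= 5 * length):
                continue
            # Maximality: skip intervals inside an already selected larger one.
            if any(c <= a and b <= d for c, d in selected):
                continue
            selected.append((a, b))
    return sorted(selected)


for a, b in whitney_intervals(4):
    print(a, b)
```

With `max_depth = 4` the selected intervals are ${[1/16,1/8)}$, ${[1/8,1/4)}$, ${[1/4,1/2)}$ and their mirror images about ${1/2}$; they are disjoint and their lengths shrink in proportion to the distance to the boundary, as condition (1) requires.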

Using the Whitney decomposition lemma one can give an alternative proof of the Calderón-Zygmund decomposition by taking

$\displaystyle \Omega=\{ x\in{\mathbb R}^n:M_{\Delta}(f)(x)>\lambda \},$

and noting that the latter set is open.

As a corollary we get a control of the level sets of the usual (non-dyadic) maximal function by the level sets of the dyadic maximal function.

Corollary 18 For all ${\lambda>0}$ we have that

$\displaystyle |\{x\in{\mathbb R}^n: M_\square(f)(x)>4^n \lambda \}|\leq 2^n |\{x\in{\mathbb R}^n: M_\Delta(f)(x)>\lambda \}|.$

Proof: Let ${\mathcal B}$ be the collection of dyadic cubes obtained by the Calderón-Zygmund decomposition at level ${\lambda>0}$. We have that

$\displaystyle \bigcup_{Q\in\mathcal B}Q= \{x\in{\mathbb R}^n:M_\Delta(f)(x)>\lambda\}.$

We write ${Q^*}$ for the cube with the same center as ${Q}$ and twice its side-length. We claim that

$\displaystyle \{x\in{\mathbb R}^n:M_\square(f)(x)>4^n\lambda\} \subset \bigcup_{Q\in\mathcal B} Q^*. \ \ \ \ \ (2)$

Indeed, let ${x\notin \bigcup_{Q\in\mathcal B} Q^*}$ and ${R}$ be any cube centered at ${x}$. Denoting by ${r}$ the side-length of ${R}$, we choose ${k\in{\mathbb Z}}$ so that ${2^{k-1}\leq r <2^k}$. Then ${R}$ intersects ${m\leq 2^n}$ cubes in the ${k}$-th generation ${\mathcal Q_k}$; let us call them ${R_1,\ldots, R_m}$. Observe that none of these cubes can be contained in any of the ${Q\in\mathcal B}$ because otherwise we would have that ${x\in\bigcup _{Q\in \mathcal B} Q^*}$. Thus each ${R_j}$ is good, that is, the average of ${|f|}$ on each ${R_j}$ is at most ${\lambda}$, so

$\displaystyle \frac{1}{|R|}\int_R |f|\leq \frac{1}{|R|}\sum_{j=1} ^m \int_{R_j\cap R}|f|\leq \sum_{j=1} ^m\frac{2^{kn}}{|R|}\frac{1}{|R_j|}\int_{R_j}|f|\leq \lambda m 2^n\leq 4^n\lambda.$

This proves the claim (2) and thus the corollary. $\Box$
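The counting step in the proof (a cube of side-length ${r<2^k}$ meets at most ${2^n}$ dyadic cubes of side-length ${2^k}$) can be checked in dimension one with a short Python sketch. The helper names `scale_exponent` and `dyadic_intervals_meeting` are hypothetical, and the restriction to ${n=1}$ is an assumption made to keep the example small.

```python
from fractions import Fraction
import math


def scale_exponent(r):
    """Return the integer k with 2**(k-1) <= r < 2**k, for r > 0."""
    r = Fraction(r)
    assert r > 0
    k = 0
    while Fraction(2) ** k <= r:
        k += 1
    while Fraction(2) ** (k - 1) > r:
        k -= 1
    return k


def dyadic_intervals_meeting(center, r):
    """Dyadic intervals [j * 2**k, (j + 1) * 2**k) of length 2**k that meet
    the closed interval R of length r centered at `center`."""
    center, r = Fraction(center), Fraction(r)
    side = Fraction(2) ** scale_exponent(r)
    first = math.floor((center - r / 2) / side)
    last = math.floor((center + r / 2) / side)
    return [(j * side, (j + 1) * side) for j in range(first, last + 1)]


print(dyadic_intervals_meeting(1, Fraction(3, 2)))
print(dyadic_intervals_meeting(2, Fraction(3, 2)))
```

For ${r=3/2}$ we get ${k=1}$: the interval centered at ${1}$ lies inside the single dyadic interval ${[0,2)}$, while the one centered at ${2}$ meets ${[0,2)}$ and ${[2,4)}$. Since ${r<2^k}$, the count never exceeds ${2=2^n}$.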

Exercise 11 Using the dyadic maximal theorem only, conclude that the operators ${M_\square, M}$ are of weak type ${(1,1)}$.

5.1. The Fefferman-Stein weighted inequality.

We give a first application of the Calderón-Zygmund decomposition which in some sense is the prototype of a weighted norm inequality. It is a variation of the maximal theorem where the Lebesgue measure is replaced by a measure of the form ${w(x)dx}$ for some non-negative measurable function ${w}$. It then turns out that the maximal function maps ${L^p({\mathbb R}^n,Mw(x)dx)}$ to ${L^p({\mathbb R}^n,w(x)dx)}$ boundedly for all ${1<p<\infty}$ and that it also satisfies a weak endpoint analogue for ${p=1}$. In particular we have

Theorem 19 (Fefferman-Stein inequality) Let ${w}$ be a non-negative locally integrable function (a ‘weight’).

(i) We have that

$\displaystyle \int_{{\mathbb R}^n}[Mf(x)] ^p w(x)dx \lesssim_{p,n} \int_{{\mathbb R}^n} |f(x)|^p Mw(x)dx,$

for all ${f\in L^p({\mathbb R}^n,Mw(x)dx)}$ with ${1<p<\infty}$.

(ii) In the endpoint case ${p=1}$ we get the weak analogue

$\displaystyle \int_{\{x\in{\mathbb R}^n:M(f)(x)>\lambda\}} w(x)dx\lesssim_n \frac{1}{\lambda}\int_{{\mathbb R}^n}|f(x)|Mw(x)dx,$

for all ${f\in L^1({\mathbb R}^n,Mw(x)dx)}$ and all ${\lambda>0}$.

Proof: We will show that ${\|M(f)\|_{L^\infty(w)}\leq \|f\|_{L^\infty(Mw)}}$ and that the weak ${(1,1)}$ inequality in (ii) holds. Then the Marcinkiewicz interpolation theorem will give (i) as well.

The bound

$\displaystyle \|M(f)\|_ { L^\infty (w) }\leq \|f\|_{L^\infty (Mw)},$

is trivial and is left as an exercise. We turn our attention to the ${(1,1)}$-bound. Let ${\mathcal B}$ be the collection of the dyadic cubes obtained from the Calderón-Zygmund decomposition at level ${\lambda>0}$. By the proof of Corollary 18 we have that

$\displaystyle \{x\in{\mathbb R}^n: M_\square(f)(x)>4^n\lambda\}\subset \bigcup _{Q\in\mathcal B} Q^*,$

where ${Q^*}$ is the cube with the same center as ${Q}$ and twice its side-length. We have

$\displaystyle \begin{array}{rcl} \int_{\{x\in{\mathbb R}^n:M_\square(f)(x)>4^n\lambda\}}w(x)dx&\leq &\sum_{Q\in\mathcal B} \int_{Q^*} w(x)dx = \sum_{Q\in\mathcal B}2^n|Q|\frac{1}{|Q^*|}\int_{Q^*} w(x)dx. \end{array}$

Again, from the Calderón-Zygmund decomposition (at level ${\lambda}$) we have that

$\displaystyle |Q|<\frac{1}{\lambda}\int_Q |f(y)|dy=\frac{1}{\lambda}\int_{{\mathbb R}^n}|f(y)|\chi_Q(y)dy,$

for all ${Q\in\mathcal B}$ of the decomposition. Combining the last two estimates we can write

$\displaystyle \begin{array}{rcl} \int_{\{x\in{\mathbb R}^n:M_\square(f)(x)>4^n\lambda\}}w(x)dx&\leq &\frac{2^n}{\lambda}\sum_{Q\in\mathcal B} \int_{{\mathbb R}^n}|f(y)|\bigg(\frac{1}{|Q^*|}\int_{Q^*} w(x)dx \bigg)\chi_Q(y)dy. \end{array}$

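For fixed ${Q\in\mathcal B}$ the term ${|f(y)|\bigg(\frac{1}{|Q^*|}\int_{Q^*} w(x)dx \bigg)\chi_Q(y)}$ is non-zero only if ${y\in Q\subset Q^*}$, in which case ${\frac{1}{|Q^*|}\int_{Q^*} w(x)dx\leq M_\square '(w)(y)}$. Since the cubes in ${\mathcal B}$ are disjoint, the previous estimate implies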

$\displaystyle \begin{array}{rcl} \int_{\{x\in{\mathbb R}^n:M_\square(f)(x)>4^n\lambda\}}w(x)dx&\leq &\frac{2^n}{\lambda} \int_{{\mathbb R}^n}|f(y)|M_\square '(w)(y)dy, \end{array}$

where ${M_\square '}$ is the non-centered maximal function associated to cubes; see Exercise 10. Since ${M_\square '(w)(y)\lesssim_n M(w)(y)}$ and, by Exercise 8, ${M(f)(x)\lesssim_n M_\square(f)(x)}$, this concludes the proof. $\Box$

Exercise 12 (Hedberg’s inequality and Hardy-Littlewood-Sobolev theorem) Let ${0<\gamma<n}$, ${1<p<q<\infty}$ and ${\frac{1}{q}=\frac{1}{p}-\frac{n-\gamma}{n}}$.

(i) Show Hedberg’s inequality: If ${f\in L^p({\mathbb R}^n)}$ then

$\displaystyle |(f*|y|^{-\gamma})(x)|\lesssim_{\gamma,n,p} [M(f)(x)]^\frac{p}{q}\|f\|_{L^p({\mathbb R}^n)} ^{1-\frac{p}{q} }.$

(ii) Use the Hardy-Littlewood maximal theorem and (i) to conclude the Hardy-Littlewood-Sobolev theorem: For every ${f\in L^p({\mathbb R}^n)}$ we have that

$\displaystyle \|f*|y|^{-\gamma}\|_{L^q({\mathbb R}^n)}\lesssim_{\gamma,n,p}\|f\|_{L^p({\mathbb R}^n)}.$

Hint: In order to show (i) split the integral

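$\displaystyle \begin{array}{rcl} |(f*|y|^{-\gamma})(x)|&=&\bigg|\int_{{\mathbb R}^n}f(x-y)|y|^{-\gamma}dy\bigg|\\ \\ &\leq& \bigg|\int_{|y|<R}f(x-y)|y|^{-\gamma}dy\bigg|+\bigg|\int_{|y|\geq R}f(x-y)|y|^{-\gamma}dy\bigg|\\ \\ &=:&|I_1|+|I_2|, \end{array}$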

where ${R>0}$ is a parameter to be chosen later on. For ${I_1}$ observe that

$\displaystyle I_1 = f*(|y|^{-\gamma}\chi_{B(0,R)}).$

Observe that ${|y|^{-\gamma}\chi_{B(0,R)}}$ is decreasing, radial, non-negative and integrable (since ${\gamma<n}$). Use Proposition 6 and the calculation in its proof to show the bound

$\displaystyle |I_1|\lesssim R^{n-\gamma} M(f)(x).$

For ${I_2}$ use Hölder’s inequality to show

$\displaystyle |I_2|\lesssim R^{-\frac{n}{q}}\|f\|_{L^p({\mathbb R} ^n)}.$

Choose the parameter ${R>0}$ to minimize the sum of the two bounds for ${|I_1|}$ and ${|I_2|}$. Part (ii) is a trivial consequence of (i).

[Update 4 Apr 2011: Section 3.1 concerning the Marcinkiewicz integral added; numbering changed.

Update 9th May 2011: Typo in the hint of Exercise 1 corrected.]

I'm a postdoc researcher at the Center for mathematical analysis, geometry and dynamical systems at IST, Lisbon, Portugal.

### 2 Responses to DMat0101, Notes 5: The Hardy-Littlewood maximal function

1. ndthi says:

Hi. I find that your note is very useful. I only have a question:

How to prove that “any function ${\phi}$ which is positive and radially decreasing can be approximated monotonically from below by a sequence of simple functions of the form ${\sum a_j \chi_{B_j}}$”?

Thank you very much

• It suffices to do it on the real line and look at the preimages of the superlevel sets of the function.