**1. Convolutions and approximations to the identity**

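We restrict our attention to the Euclidean case $\mathbb{R}^n$ equipped with the Lebesgue measure. As we have seen, the space $L^1(\mathbb{R}^n)$ is a vector space; linear combinations of functions in $L^1(\mathbb{R}^n)$ remain in the space. There is however a `product’ defined between elements of $L^1(\mathbb{R}^n)$ that turns $L^1(\mathbb{R}^n)$ into a Banach algebra. For $f,g\in L^1(\mathbb{R}^n)$ we define the *convolution* $f\ast g$ of $f$ and $g$ to be the function

$$(f\ast g)(x)=\int_{\mathbb{R}^n} f(y)\,g(x-y)\,dy,\qquad x\in\mathbb{R}^n.$$

Furthermore, using Fubini’s theorem to change the order of integration we can easily see that

$$\|f\ast g\|_{L^1(\mathbb{R}^n)}\leq \|f\|_{L^1(\mathbb{R}^n)}\,\|g\|_{L^1(\mathbb{R}^n)}.$$

Thus for $f,g\in L^1(\mathbb{R}^n)$ we have that their convolution $f\ast g$ is again an element of $L^1(\mathbb{R}^n)$. Note that the previous estimate is the main difficulty in showing that $L^1(\mathbb{R}^n)$ is a Banach algebra.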
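A finite, discrete analogue can make the $L^1$ estimate concrete. The sketch below is only an illustration: it replaces $\mathbb{R}^n$ by the integers with counting measure, and the helper names `convolve` and `l1_norm` are hypothetical, not notation from these notes.

```python
def convolve(f, g):
    """Full discrete convolution of two finite sequences: (f*g)[k] = sum_i f[i] * g[k - i]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def l1_norm(f):
    """Sum of absolute values, i.e. the L^1 norm for counting measure."""
    return sum(abs(a) for a in f)


f = [1.0, -2.0, 0.5]
g = [0.25, 0.5, -1.0, 3.0]
h = convolve(f, g)

# Discrete counterpart of ||f*g||_1 <= ||f||_1 ||g||_1.
print(l1_norm(h), "<=", l1_norm(f) * l1_norm(g))
```

For nonnegative sequences the inequality becomes an equality, which matches the role of Fubini’s theorem in the continuous argument.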

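More generally, the convolution of $f\in L^p(\mathbb{R}^n)$, $1\leq p\leq\infty$, and $g\in L^1(\mathbb{R}^n)$, is a well defined element of $L^p(\mathbb{R}^n)$ and we have that

$$\|f\ast g\|_{L^p(\mathbb{R}^n)}\leq\|f\|_{L^p(\mathbb{R}^n)}\,\|g\|_{L^1(\mathbb{R}^n)}.\qquad (1)$$

**Exercise 1** Use the integral version of Minkowski’s inequality to prove estimate (1) above.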

Let us summarize some properties of convolution in the following proposition. We take the chance to give two definitions here that we will use throughout these notes.

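**Definition 1** Let $X$ be a topological space and $f:X\to\mathbb{C}$ be a continuous function. The *support* of $f$, denoted by $\mathrm{supp}(f)$, is the set

$$\mathrm{supp}(f):=\overline{\{x\in X: f(x)\neq 0\}}.$$

This is the smallest closed set in $X$ outside which $f=0$.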

Observe that we gave the definition of the support of a function for *continuous* functions. This is mostly a technical issue. It is easily understood that, in general, the support of a measurable function can only be defined up to sets of measure zero. The precise definition is as follows.

**Definition 2** Let $\mu$ be a regular Borel measure on a topological space $X$ and $f:X\to\mathbb{C}$ be a Borel measurable function. A point $x\in X$ is called a *support point* for $f$ if

$$\mu(\{y\in U: f(y)\neq 0\})>0$$

for every open neighborhood $U$ of $x$. The set

$$\mathrm{supp}(f):=\{x\in X: x \text{ is a support point for } f\}$$

is called the *support* of $f$.

**Exercise 2** Assume that the measure $\mu$ in the previous definition has the additional property that $\mu(U)>0$ for every non-empty open set $U$. Use exercise 1 of notes 1 to prove that for any continuous function $f$ the two definitions of $\mathrm{supp}(f)$, that is Definition 1 and Definition 2, coincide.

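**Proposition 3** Let $f,g,h$ be such that the convolutions below are well defined.

(i) $f\ast g=g\ast f$ (commutative)

(ii) $(f\ast g)\ast h=f\ast(g\ast h)$ (associative)

(iii) (translations) For $y\in\mathbb{R}^n$ and $f:\mathbb{R}^n\to\mathbb{C}$ we define the *translation operator*

$$(\tau_y f)(x):=f(x-y),\qquad x\in\mathbb{R}^n.$$

For $y\in\mathbb{R}^n$ we have

$$\tau_y(f\ast g)=(\tau_y f)\ast g=f\ast(\tau_y g).$$

(iv) (support) If $A:=\overline{\{x+y: x\in\mathrm{supp}(f),\ y\in\mathrm{supp}(g)\}}$ then

$$\mathrm{supp}(f\ast g)\subset A.$$

*Proof:* Statements *(i)*, *(ii)* and *(iii)* are trivial consequences of changes of variables and Fubini’s theorem. For *(iv)* observe that if $x\notin A$ then for any $y\in\mathrm{supp}(g)$ we have $x-y\notin\mathrm{supp}(f)$. Thus $f(x-y)g(y)=0$ for all $y$, so $(f\ast g)(x)=0$. Since the complement of $A$ is open, it follows that $\mathrm{supp}(f\ast g)\subset A$.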
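The algebraic properties of the proposition can be checked by hand on finitely supported functions on the integers. The following sketch is an illustration under stated assumptions: functions are stored as hypothetical `{point: value}` dictionaries, the sign convention $(\tau_y f)(x)=f(x-y)$ is assumed, and integer values are used so that all comparisons are exact.

```python
from itertools import product


def convolve(f, g):
    """Convolution of finitely supported functions on Z, stored as {point: value}."""
    out = {}
    for (x, a), (y, b) in product(f.items(), g.items()):
        out[x + y] = out.get(x + y, 0) + a * b
    return out


def translate(f, y):
    """(tau_y f)(x) = f(x - y): shift the graph of f by y (assumed convention)."""
    return {x + y: v for x, v in f.items()}


def support(f):
    """Set of points where f is non-zero."""
    return {x for x, v in f.items() if v != 0}


f = {-1: 2, 0: 1, 3: -1}
g = {0: 1, 1: 4}
h = {2: 3, 5: -2}

commutative = convolve(f, g) == convolve(g, f)
associative = convolve(convolve(f, g), h) == convolve(f, convolve(g, h))
translation_ok = translate(convolve(f, g), 7) == convolve(translate(f, 7), g)
sumset = {x + y for x in support(f) for y in support(g)}
support_ok = support(convolve(f, g)) <= sumset
print(commutative, associative, translation_ok, support_ok)
```

The support of the convolution can be strictly smaller than the sumset when cancellations occur, which is why the proposition states an inclusion rather than an equality.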

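A very useful property of the convolution of two functions is that it adopts the smoothness of the `nicest’ function. Formally this is because any differentiation operator applied to $f\ast g$ can be transferred to either $f$ or $g$:

$$\partial^\alpha(f\ast g)=(\partial^\alpha f)\ast g=f\ast(\partial^\alpha g).$$

Here we use the standard multi-index notation: for $\alpha=(\alpha_1,\ldots,\alpha_n)\in\mathbb{N}_0^n$ and $f:\mathbb{R}^n\to\mathbb{C}$ we write as usual $\partial^\alpha f=\partial_{x_1}^{\alpha_1}\cdots\partial_{x_n}^{\alpha_n}f$. We also write $|\alpha|=\alpha_1+\cdots+\alpha_n$. In practice we need one of the functions to have some regularity and some mild conditions on the second function to do this rigorously. For example we have the following:

**Proposition 4** Let $f\in L^1(\mathbb{R}^n)$ and suppose that $g$ has continuous partial derivatives up to $k$-th order, that is $g\in C^k(\mathbb{R}^n)$. Suppose also that $\partial^\alpha g$ is bounded for all $|\alpha|\leq k$. Then $f\ast g$ has continuous derivatives up to $k$-th order, i.e. $f\ast g\in C^k(\mathbb{R}^n)$, and $\partial^\alpha(f\ast g)=f\ast(\partial^\alpha g)$ for all $|\alpha|\leq k$.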

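*Proof:* Let’s just see the special case $n=1$ and $k=1$. The proof in the general case is identical. Call $d\mu(y)=f(y)\,dy$. Since $f\in L^1(\mathbb{R})$, $\mu$ is a finite, absolutely continuous measure. We then need to show that

$$\frac{d}{dx}\int_{\mathbb{R}} g(x-y)\,d\mu(y)=\int_{\mathbb{R}} g'(x-y)\,d\mu(y).$$

Fix some sequence $h_j\to 0$ with $h_j\neq 0$ and set $G_j(y):=\big(g(x+h_j-y)-g(x-y)\big)/h_j$. Observe that $G_j(y)\to g'(x-y)$ as $j\to\infty$ for every $y$. By the mean value theorem we have that

$$|G_j(y)|\leq\|g'\|_{L^\infty(\mathbb{R})}\qquad\text{for all } j \text{ and all } y.$$

Using Lebesgue’s dominated convergence theorem we get

$$\lim_{j\to\infty}\int_{\mathbb{R}} G_j(y)\,d\mu(y)=\int_{\mathbb{R}} g'(x-y)\,d\mu(y).$$

Observe that the hypothesis on the boundedness of the higher order derivatives will be used to show the uniform boundedness of (the analogues of) the functions $G_j$ in the general case.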

It is instructive to fix one function to be an indicator function, say $\phi=\frac{1}{2}\chi_{[-1,1]}$ where the constant $\frac12$ is there just in order to normalize the total $L^1$-mass of the function to $1$. Usually we consider smooth versions of $\phi$ but let’s just stick to the case of the characteristic function for the sake of simplicity. Consider the `reflection’ of $\phi$ given as $\tilde\phi(x):=\phi(-x)$. Since we have started with an even function this makes no difference so that $\tilde\phi=\phi$. Observe that we can write

$$(f\ast\phi)(x)=\int_{\mathbb{R}} f(y)\,\phi(x-y)\,dy=\int_{\mathbb{R}} f(y)\,(\tau_x\tilde\phi)(y)\,dy.$$

For some fixed $x$, the translation of $\tilde\phi$ by $x$ centers the function at the point $x$. So $\tau_x\tilde\phi$ is (a multiple of) the indicator function of an interval of length $2$, centered at $x$. Integrating $f$ against $\tau_x\tilde\phi$ essentially averages the function $f$ around the point $x$ with `weight’ the function $\tau_x\tilde\phi$. In this averaging process, our choice of $\phi$ implies that only the values of $f$ at a scale $\sim 1$ around $x$ will be important. Thus the convolution of $f$ and $\phi$ replaces the value of $f$ at a point with the average of the values of $f$ at a scale $\sim 1$ around the point.

One can take this process one step further and consider sequences of functions that are more and more concentrated around the origin, but have the same mass, say $1$. For example the second function in this sequence would be $\chi_{[-1/2,1/2]}$, the third could be $2\chi_{[-1/4,1/4]}$, and so on. Taking convolutions of the function $f$ with the functions in this sequence amounts to averaging the function $f$ around every point, in smaller and smaller scales around the point. Intuitively one thinks that in the limit, one should recover the function $f$ itself, at least in some weak sense. This turns out to be indeed the case. But what’s the gain? We just saw that taking convolutions of an integrable (say) function with a smooth bounded function gives us again a smooth function. Thus the previous process allows us to approximate (in some sense) any reasonable function by a sequence of very smooth functions. This has many technical advantages as one can think of any function as a limit, in the appropriate sense, of smooth approximations. This also gives a heuristic explanation of why the convolution of two functions behaves at least as well as the `nicest’ function in the convolution; averaging is a smoothing process.
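The averaging heuristic can be tried numerically. The sketch below is an illustration under stated assumptions: it takes $f$ to be the indicator of $[0,1]$, averages over windows $[x-t,x+t]$ with a midpoint rule, and measures the $L^1$ distance between the average and $f$; the function names and grid sizes are arbitrary choices, not part of the notes.

```python
def indicator(x):
    """Indicator function of the interval [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def box_average(f, x, t, n=200):
    """Midpoint-rule value of (1 / (2t)) * integral of f over [x - t, x + t]."""
    step = 2.0 * t / n
    return sum(f(x - t + (k + 0.5) * step) for k in range(n)) / n


def l1_error(f, t, a=-1.0, b=2.0, m=1500):
    """Midpoint-rule value of the integral over [a, b] of |average_t f - f|."""
    step = (b - a) / m
    total = 0.0
    for k in range(m):
        x = a + (k + 0.5) * step
        total += abs(box_average(f, x, t) - f(x))
    return total * step


errors = [l1_error(indicator, t) for t in (0.5, 0.25, 0.125)]
print(errors)
```

For this particular $f$ a direct computation shows that the exact $L^1$ distance equals $t$ when $t\leq 1/2$ (each endpoint of $[0,1]$ contributes $t/2$), so the printed values should shrink roughly in proportion to the window size.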

We will now make the previous heuristic discussion precise. Let $\phi$ be a function on $\mathbb{R}^n$ and $t>0$. We define the *dilations* of the function $\phi$ to be

$$\phi_t(x):=t^{-n}\phi(x/t),\qquad x\in\mathbb{R}^n.$$

Usually we will have a lot of freedom in choosing the function $\phi$ and we will require *at least* that $\phi\in L^1(\mathbb{R}^n)$. Observe that dilating the function $\phi$ by $t$ doesn’t change the integral:

$$\int_{\mathbb{R}^n}\phi_t(x)\,dx=\int_{\mathbb{R}^n}t^{-n}\phi(x/t)\,dx=\int_{\mathbb{R}^n}\phi(x)\,dx.$$

You should think of the function $\phi$ as a function concentrated around a point, as was for example the normalized indicator function in the previous discussion or, even better, as smooth approximations of it (bump function). Thus for example $\phi$ could be a smooth function with compact support around the origin. Observe that as $t\to 0$, the mass of the function $\phi_t$, which is constant, becomes more and more concentrated around the origin. We will refer to this construction as `an approximation to the identity’. The reason is that, as was mentioned before, one can recover any reasonable function $f$ by convolving $f$ with $\phi_t$ and taking the limit as $t\to 0$, at least in the $L^p$ sense. A more rigorous explanation is that $\phi_t$ converges (in a weak sense) to a Dirac mass at $0$.

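**Theorem 5** Let $\phi\in L^1(\mathbb{R}^n)$ with $\int\phi=1$. For $t>0$ define the dilations of $\phi$ as before, $\phi_t(x)=t^{-n}\phi(x/t)$. Then, for any $f\in L^p(\mathbb{R}^n)$, $1\leq p<\infty$, we have that $f\ast\phi_t\to f$ in the $L^p$ norm as $t\to 0$:

$$\lim_{t\to 0}\|f\ast\phi_t-f\|_{L^p(\mathbb{R}^n)}=0.$$

*Proof:* For $y\in\mathbb{R}^n$ we use the notation

$$(\tau_y f)(x):=f(x-y)$$

for the translation operator. Using the fact that $\phi_t$ has integral $1$ and the change of variables $y\mapsto ty$ we can write

$$(f\ast\phi_t)(x)-f(x)=\int_{\mathbb{R}^n}\big(f(x-y)-f(x)\big)\phi_t(y)\,dy=\int_{\mathbb{R}^n}\big((\tau_{ty}f)(x)-f(x)\big)\phi(y)\,dy.$$

By Minkowski’s integral inequality we get that

$$\|f\ast\phi_t-f\|_{L^p}\leq\int_{\mathbb{R}^n}\|\tau_{ty}f-f\|_{L^p}\,|\phi(y)|\,dy.$$

Now $\|\tau_{ty}f-f\|_{L^p}\to 0$ as $t\to 0$ for every fixed $y$ (see remark below), and the integrand is dominated by the integrable function $2\|f\|_{L^p}|\phi(y)|$, so by the dominated convergence theorem we get the result.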

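**Remark 1** The translation operator is continuous in $L^p(\mathbb{R}^n)$ for all $1\leq p<\infty$, that is

$$\lim_{h\to 0}\|\tau_h f-f\|_{L^p(\mathbb{R}^n)}=0$$

for all $f\in L^p(\mathbb{R}^n)$, $1\leq p<\infty$.

Observe that for $p=\infty$, $\|\tau_h f-f\|_{L^\infty}\to 0$ as $h\to 0$ means that $f$ is (almost everywhere equal to a function that is) uniformly continuous. This shows why the previous theorem breaks down in $L^\infty$.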
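The contrast between finite $p$ and $p=\infty$ is already visible for a single indicator function. The sketch below is an illustration only: it assumes $f=\chi_{[0,1)}$ on the real line and approximates the norms on a fixed grid, with hypothetical helper names.

```python
def indicator(x):
    """Indicator function of the interval [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


def translation_gap(h, p, a=-2.0, b=3.0, m=50000):
    """Midpoint-rule value of ||tau_h f - f||_p for f = indicator and finite p."""
    step = (b - a) / m
    total = 0.0
    for k in range(m):
        x = a + (k + 0.5) * step
        total += abs(indicator(x - h) - indicator(x)) ** p
    return (total * step) ** (1.0 / p)


def sup_gap(h, a=-2.0, b=3.0, m=50000):
    """Largest value of |tau_h f - f| on the same grid (a stand-in for the sup norm)."""
    step = (b - a) / m
    best = 0.0
    for k in range(m):
        x = a + (k + 0.5) * step
        best = max(best, abs(indicator(x - h) - indicator(x)))
    return best


for h in (0.1, 0.01, 0.001):
    print(h, translation_gap(h, 1), translation_gap(h, 2), sup_gap(h))
```

For $0<h\leq 1$ the exact values are $\|\tau_h f-f\|_{L^p}=(2h)^{1/p}$, which tends to $0$, while the supremum of $|\tau_h f-f|$ stays equal to $1$ for every $h\neq 0$.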

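**Exercise 3** Show that the translation operator is continuous in $L^p(\mathbb{R}^n)$ for $1\leq p<\infty$. Use the fact that continuous functions with compact support are dense in $L^p(\mathbb{R}^n)$ for $1\leq p<\infty$. See also section 2.

**Exercise 4** Let $\phi\in L^1(\mathbb{R}^n)$ with $\int\phi=1$ and let $\phi_t(x)=t^{-n}\phi(x/t)$, $t>0$, be its dilations.

(i) Show that

$$\|f\ast\phi_t-f\|_{L^\infty(\mathbb{R}^n)}\to 0$$

as $t\to 0$ for all $f$ which are bounded and uniformly continuous.

(ii) If $f$ is bounded and continuous on $\mathbb{R}^n$ show that

$$f\ast\phi_t\to f$$

as $t\to 0$, uniformly on compact subsets of $\mathbb{R}^n$.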

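**Remark 2** There is a slight abuse of notation here. We use $\|\cdot\|_{L^\infty}$ for the norm in the space $L^\infty$ defined in terms of the *essential supremum* of a function. However, the right norm in spaces of continuous functions should be defined in terms of the actual *supremum* of the function. Note however that for a continuous function, the two notions are identical so this should create no confusion.

**Exercise 5** Let $1<p<\infty$ and $p'$ be its dual exponent. Suppose that $f\in L^p(\mathbb{R}^n)$ and $g\in L^{p'}(\mathbb{R}^n)$. Show that $(f\ast g)(x)$ exists for every $x\in\mathbb{R}^n$ and that $f\ast g$ is continuous and decays to zero at infinity. Also show the estimate

$$\|f\ast g\|_{L^\infty(\mathbb{R}^n)}\leq\|f\|_{L^p(\mathbb{R}^n)}\,\|g\|_{L^{p'}(\mathbb{R}^n)}.$$

**Remark 3** If $\mu$ is a finite Borel measure on $\mathbb{R}^n$ and $f\in L^p(\mathbb{R}^n)$ it makes perfect sense to define the convolution of $f$ with $\mu$ to be the function

$$(f\ast\mu)(x):=\int_{\mathbb{R}^n}f(x-y)\,d\mu(y).$$

We then have

$$\|f\ast\mu\|_{L^p(\mathbb{R}^n)}\leq\|f\|_{L^p(\mathbb{R}^n)}\,\|\mu\|,$$

where $\|\mu\|=|\mu|(\mathbb{R}^n)$ is the total variation of the measure $\mu$.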

**2. Some dense classes of functions**

In this paragraph we will discuss some dense sub-classes of functions inside the $L^p$ spaces. These will prove to be very useful as many estimates will be easier to establish for these special sub-classes. Also, many times, working with a dense class in $L^p$ helps us avoid several technical difficulties or even define operators that are not obviously defined directly on some $L^p$ space. We will state some of the results here in the generality of a locally compact Hausdorff space, noting that everything goes through for $\mathbb{R}^n$ equipped with the Lebesgue measure.

**Simple functions:** Let $S$ be the class of all simple functions $f$ on a measure space $(X,\mu)$ such that

$$\mu(\{x\in X: f(x)\neq 0\})<\infty,$$

that is all simple complex valued functions that have support of finite measure. For $1\leq p<\infty$ the space $S$ is dense in $L^p(X,\mu)$. The space of all simple functions (not necessarily with support of finite measure) is dense in $L^p(X,\mu)$ for $1\leq p\leq\infty$.

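**Continuous functions with compact support:** Let $(X,\mathcal{X},\mu)$ be a measure space, where $X$ is a locally compact Hausdorff space, $\mathcal{X}$ is a $\sigma$-algebra that contains all compact subsets of $X$, and $\mu$ is such that

(i) $\mu$ is locally finite: $\mu(K)<\infty$ for all compact sets $K\subset X$.

(ii) $\mu$ is inner regular, meaning

$$\mu(E)=\sup\{\mu(K): K\subset E,\ K \text{ compact}\}\quad\text{for every measurable set } E.$$

(iii) $\mu$ is outer regular, meaning

$$\mu(E)=\inf\{\mu(U): U\supset E,\ U \text{ open}\}\quad\text{for every measurable set } E.$$

We denote by $C_c(X)$ the space of continuous functions $f:X\to\mathbb{C}$ with compact support. Then, for every $1\leq p<\infty$, $C_c(X)$ is dense in $L^p(X,\mu)$.

Remark here that whenever we embed $C_c(X)$ into $L^p(X,\mu)$, $C_c(X)$ automatically inherits the topology induced by the larger space, that is, the one defined by the norm $\|\cdot\|_{L^p(X,\mu)}$. Since $L^p$ spaces are complete under our hypotheses, this says that $L^p(X,\mu)$ is the completion of $C_c(X)$ with respect to the norm of $L^p(X,\mu)$ for $1\leq p<\infty$. For $p=\infty$, the completion of $C_c(X)$ with respect to the $L^\infty$ norm is not $L^\infty(X,\mu)$ but the space $C_0(X)$ of continuous functions on $X$ that *vanish at infinity*.

**Continuous functions that vanish at infinity:** Let $X$ be a locally compact Hausdorff space (a Hausdorff space where every point has a compact neighborhood). A function $f:X\to\mathbb{C}$ is said to *vanish at infinity* if for every $\epsilon>0$ there exists a compact set $K\subset X$ such that $|f(x)|<\epsilon$ for all $x\in X\setminus K$. We denote by $C_0(X)$ the space of all complex valued continuous functions on $X$ that vanish at infinity.

It is clear that $C_c(X)\subset C_0(X)$, and actually the two spaces coincide whenever $X$ is compact. We can equip the space $C_0(X)$ with the norm

$$\|f\|_{\infty}:=\sup_{x\in X}|f(x)|.$$

**Theorem 6** If $X$ is a locally compact Hausdorff space, then $C_0(X)$ is the completion of $C_c(X)$ with respect to the supremum norm defined above.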

For the proofs of the previous classical results see for example [F] or [R].

All the previous results apply to the Euclidean setup $\mathbb{R}^n$ with the Lebesgue measure. Of course simple functions with support of finite measure are dense in $L^p(\mathbb{R}^n)$ whenever $1\leq p<\infty$. A bit more can be said as we can choose our simple functions to be linear combinations of indicator functions of ($n$-dimensional) bounded intervals, and these are still dense in $L^p(\mathbb{R}^n)$. Continuous functions with compact support are also dense in $L^p(\mathbb{R}^n)$ for all $1\leq p<\infty$. We can also restrict to a smaller class of more regular functions:

**Infinitely differentiable functions with compact support:** Let us consider the space of functions $f:\mathbb{R}^n\to\mathbb{C}$ which are infinitely differentiable and have compact support. We denote this space by $C_c^\infty(\mathbb{R}^n)$. First of all it is not totally trivial that this space is non-trivial, that is, that it contains a function other than the zero function.

**Lemma 7** There exists a function $\phi\in C_c^\infty(\mathbb{R})$ that is not identically zero. From this we easily conclude that there is a non-zero $\phi\in C_c^\infty(\mathbb{R}^n)$.

**Exercise 6** Consider the function

(i) Show that , together with its derivatives of any order, is infinitely differentiable and bounded.

(ii) Consider the function . Show that if and otherwise. It is obvious then that .

(iii) For consider the function belongs to .

(iv) For consider the function

Show that .
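To make the construction of a smooth compactly supported function tangible, here is a minimal sketch. It assumes one standard choice, which may differ from the formulas intended in the exercise: the seed $t\mapsto e^{-1/t}$ for $t>0$ (and $0$ otherwise) composed with $1-|x|^2$. The names `seed` and `bump` are hypothetical.

```python
import math


def seed(t):
    """exp(-1/t) for t > 0 and 0 otherwise; a standard smooth function on the real line."""
    return math.exp(-1.0 / t) if t > 0 else 0.0


def bump(x):
    """Radial bump on R^n: positive for |x| < 1 and identically zero for |x| >= 1."""
    r2 = sum(c * c for c in x)
    return seed(1.0 - r2)


print(bump((0.0, 0.0)), bump((0.5, 0.5)), bump((1.0, 0.0)), bump((0.6, 0.9)))
```

The point of the seed function is that all of its derivatives vanish at $t=0$, so the two pieces glue together smoothly on the unit sphere.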

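Obviously $C_c^\infty(\mathbb{R}^n)\subset C_c(\mathbb{R}^n)$. However, it is not hard to see that the space $C_c^\infty(\mathbb{R}^n)$ is still dense in $L^p(\mathbb{R}^n)$ for $1\leq p<\infty$. It will however be easier to show this using some more tools from real analysis and, in particular, convolution.

**Schwartz functions:** Here we introduce the space of *Schwartz* functions $\mathcal{S}(\mathbb{R}^n)$, which will turn out to be extremely useful in what follows. So let $\mathcal{S}(\mathbb{R}^n)$ be the space of all infinitely differentiable ($C^\infty$) functions $f:\mathbb{R}^n\to\mathbb{C}$ such that

$$\sup_{x\in\mathbb{R}^n}|x^\alpha\,\partial^\beta f(x)|<\infty$$

for all multi-indices $\alpha$, $\beta$ of nonnegative integers. In other words, Schwartz functions are smooth functions that, together with their partial derivatives of every order, decay faster than any polynomial power at infinity. Of course every function in the class $C_c^\infty(\mathbb{R}^n)$ is trivially a Schwartz function since it vanishes identically outside a compact set together with its derivatives of every order. A more interesting example of a Schwartz function is the Gaussian function:

$$f(x)=e^{-\pi|x|^2},\qquad x\in\mathbb{R}^n.$$

The space $\mathcal{S}(\mathbb{R}^n)$ is also dense in all $L^p(\mathbb{R}^n)$ spaces for $1\leq p<\infty$. Of course this is immediate once one shows that $C_c^\infty(\mathbb{R}^n)$ is dense in $L^p(\mathbb{R}^n)$.

Schematically we have the following inclusions

$$C_c^\infty(\mathbb{R}^n)\subset\mathcal{S}(\mathbb{R}^n)\subset L^p(\mathbb{R}^n)$$

and each space in this chain is dense in $L^p(\mathbb{R}^n)$ with the topology induced by $\|\cdot\|_{L^p}$ for $1\leq p<\infty$. We will discuss the space of Schwartz functions in much more detail in what follows. For now you can think of it as another nice class of functions that is dense in all the spaces $L^p(\mathbb{R}^n)$ for $1\leq p<\infty$.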

In the following proposition we use convolutions to show the previous denseness properties:

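**Proposition 8** The space $C_c^\infty(\mathbb{R}^n)$, and thus also the space $\mathcal{S}(\mathbb{R}^n)$, is dense in $L^p(\mathbb{R}^n)$ for all $1\leq p<\infty$. Also the space $C_c^\infty(\mathbb{R}^n)$ is dense in $C_0(\mathbb{R}^n)$ in the supremum norm.

*Proof:* Let $f\in L^p(\mathbb{R}^n)$ and $\epsilon>0$. Since the space $C_c(\mathbb{R}^n)$ is dense in $L^p(\mathbb{R}^n)$, there is a $g\in C_c(\mathbb{R}^n)$ such that

$$\|f-g\|_{L^p}<\epsilon/2.$$

Let $\phi\in C_c^\infty(\mathbb{R}^n)$ with $\int\phi=1$. By Theorem 5 we have that $g\ast\phi_t\to g$ in $L^p(\mathbb{R}^n)$ as $t\to 0$. Thus for $t$ small enough we have that

$$\|g\ast\phi_t-g\|_{L^p}<\epsilon/2.$$

We conclude that

$$\|f-g\ast\phi_t\|_{L^p}\leq\|f-g\|_{L^p}+\|g-g\ast\phi_t\|_{L^p}<\epsilon.$$

It remains to verify that $g\ast\phi_t$ is in $C_c^\infty(\mathbb{R}^n)$ for every $t>0$. Note however that $g\ast\phi_t$ is smooth by Proposition 4. Also, since both $g$ and $\phi_t$ have compact support, Proposition 3 shows that $g\ast\phi_t$ also has compact support and we are done. Observe that the same argument applies if we start with a function $f\in C_0(\mathbb{R}^n)$. Using the fact that $C_c(\mathbb{R}^n)$ is dense in $C_0(\mathbb{R}^n)$ it suffices to approximate a function $g\in C_c(\mathbb{R}^n)$. However, functions in $C_c(\mathbb{R}^n)$ are obviously bounded and uniformly continuous, so Exercise 4 completes the proof in this case as well.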

Let us go back to approximations of the identity and justify their name.

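**Exercise 7 (convergence of approximations to the identity in the sense of distributions)** For $x\in\mathbb{R}^n$ we denote by $\delta_x$ the Dirac measure at the point $x$:

$$\delta_x(E):=\begin{cases}1, & x\in E,\\ 0, & x\notin E.\end{cases}$$

Let $\phi\in L^1(\mathbb{R}^n)$ with $\int\phi=1$ and consider the approximation to the identity $\phi_t(x)=t^{-n}\phi(x/t)$, $t>0$. Show that

$$\lim_{t\to 0}\int_{\mathbb{R}^n}\phi_t(x)f(x)\,dx=f(0)=\int_{\mathbb{R}^n}f\,d\delta_0$$

for every $f\in C_c^\infty(\mathbb{R}^n)$. We say that $\phi_t$ (considered as a sequence of finite measures) *converges in the sense of distributions* to the measure $\delta_0$. We will come back to that point later on in the course.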
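A quick numerical experiment illustrates this convergence in dimension one. The sketch below makes several simplifying assumptions that are not part of the exercise: the kernel is the triangular function $\max(0,1-|x|)$, the test function is $\cos$ (bounded and continuous rather than compactly supported), and the integral is approximated by a midpoint rule. All names are hypothetical.

```python
import math


def phi(x):
    """Triangular kernel: nonnegative, supported in [-1, 1], total integral 1."""
    return max(0.0, 1.0 - abs(x))


def dilate(kernel, t):
    """Return kernel_t(x) = kernel(x / t) / t, the dilation in dimension n = 1."""
    return lambda x: kernel(x / t) / t


def pairing(kernel_t, f, a=-1.0, b=1.0, m=20000):
    """Midpoint-rule value of the integral of kernel_t(x) * f(x) over [a, b]."""
    step = (b - a) / m
    total = 0.0
    for k in range(m):
        x = a + (k + 0.5) * step
        total += kernel_t(x) * f(x)
    return total * step


values = [pairing(dilate(phi, t), math.cos) for t in (1.0, 0.1, 0.01)]
print(values)  # the values should approach cos(0) = 1 as t decreases
```

For this kernel the pairing can be computed in closed form as $2(1-\cos t)/t^2$, which indeed tends to $1=\cos(0)$ as $t\to 0$.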

**3. Operators on $L^p$ spaces; boundedness and interpolation**

Having set up our main environment, the spaces $L^p(X,\mu)$, we come to the core of this introduction: operators acting on these spaces and their properties. In general, we will consider operators taking functions on some measure space $(X,\mu)$ to functions on some other measure space $(Y,\nu)$. Many times our operators will be initially defined on `nice functions’ such as smooth functions with compact support or Schwartz functions. The goal would then be to extend the operator to a standard normed vector space such as $L^p(X,\mu)$.

Suppose that $X$ and $Y$ are two normed vector spaces (usually Banach spaces of functions) and let $T:X\to Y$ be a linear operator, that is, we have

$$T(af+bg)=aT(f)+bT(g)$$

for all $f,g\in X$ and complex numbers $a,b$. We will say that $T$ is bounded if there is a constant $C>0$ such that $\|T(f)\|_Y\leq C\|f\|_X$ for every $f\in X$. The norm of the operator $T$, denoted by $\|T\|_{X\to Y}$ or just $\|T\|$, is the smallest constant $C$ so that such an inequality is true. We thus have

$$\|T\|_{X\to Y}=\sup_{f\neq 0}\frac{\|T(f)\|_Y}{\|f\|_X}.$$

Continuity is equivalent to boundedness for linear operators:

**Lemma 9** Let $T:X\to Y$ be a linear operator between normed vector spaces. The following are equivalent:

(i) The operator $T$ is continuous.

(ii) The operator $T$ is continuous at $0$.

(iii) The operator $T$ is bounded.

Suppose that we want to show that a linear operator $T:X\to Y$ is a well defined bounded linear operator, where $X,Y$ are Banach spaces. Many times however we can only define the operator on some dense subset $D\subset X$. Suppose we have then that $T:D\to Y$. When can we extend $T$ to the whole class $X$? Given $f\in X$, the obvious thing to do is to consider some sequence $\{f_n\}\subset D$ such that $f_n\to f$ in $X$. We then need to examine whether the limit $\lim_{n\to\infty}T(f_n)$ exists. Suppose that $T$ is bounded on the dense sub-class, that is,

$$\|T(f)\|_Y\leq C\|f\|_X$$

for all $f\in D$. Using the boundedness of $T$ on the dense class and linearity (essential) we can conclude that

$$\|T(f_n)-T(f_m)\|_Y=\|T(f_n-f_m)\|_Y\leq C\|f_n-f_m\|_X\to 0\quad\text{as } n,m\to\infty,$$

so the sequence $\{T(f_n)\}$ is a Cauchy sequence. The completeness of $Y$ then implies that the limit of $T(f_n)$ does indeed exist, so we can define

$$T(f):=\lim_{n\to\infty}T(f_n).$$

Observe also that for any other sequence $\{g_n\}\subset D$ with $g_n\to f$ we must have

$$\|T(f_n)-T(g_n)\|_Y\leq C\|f_n-g_n\|_X\leq C\big(\|f_n-f\|_X+\|f-g_n\|_X\big)$$

for any $n$. From this we conclude that $\lim T(g_n)=\lim T(f_n)$, thus the extension is unique. Many times we will only define the operator on the dense class and show its continuity on the dense sub-class. We will then say that $T$ is *densely defined*.

We will use this device many times in trying to show that some linear operator $T$ is well defined and bounded, by examining the continuity of $T$ on one of the dense classes that we have considered before (depending on what is more convenient).

A more general class of operators we will come across quite often is that of sublinear operators. Suppose that $T$ is an operator acting on a vector space of measurable functions. Then $T$ is called *sublinear* if $|T(cf)|=|c|\,|T(f)|$ for all complex constants $c$ and

$$|T(f+g)|\leq|T(f)|+|T(g)|$$

for all $f,g$ in the vector space. Of course all linear operators are sublinear. However, the most typical example of a sublinear operator we will come across is a maximal type operator. Such an operator has the form

$$T^\ast(f)(x):=\sup_{i\in I}|T_i(f)(x)|,$$

where $\{T_i\}_{i\in I}$ is a family of linear operators acting on some vector space of measurable functions, $I$ is an infinite countable or uncountable index set, and the function $T^\ast(f)$ is a measurable function of $x$. Such operators are called *maximal operators* and the linearity of each $T_i$ guarantees that $T^\ast$ is sublinear.

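**Definition 10** (i) Let $1\leq p,q\leq\infty$ and $T$ be a sublinear operator on $L^p(X,\mu)$. We will say that $T$ is of *strong type* $(p,q)$ if

$$\|T(f)\|_{L^q(Y,\nu)}\lesssim_{p,q}\|f\|_{L^p(X,\mu)}$$

for all $f\in L^p(X,\mu)$, where the implied constant depends only on $p$ and $q$. In this case we write $\|T\|_{L^p\to L^q}$ for the norm of the operator $T$.

(ii) We will say that $T$ is of *weak type* $(p,q)$ if

$$\|T(f)\|_{L^{q,\infty}(Y,\nu)}\lesssim_{p,q}\|f\|_{L^p(X,\mu)}$$

for all $f\in L^p(X,\mu)$; for $q<\infty$ this means that $\nu(\{y\in Y:|T(f)(y)|>\lambda\})\lesssim_{p,q}\big(\|f\|_{L^p(X,\mu)}/\lambda\big)^q$ for all $\lambda>0$. We will write $\|T\|_{L^p\to L^{q,\infty}}$ for the norm of the operator $T$.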

Observe that for fixed $p,q$, the strong type $(p,q)$ property of $T$ trivially implies, by Chebyshev’s inequality, that $T$ is of weak type $(p,q)$. The opposite, of course, is not true. However, we will see that in many cases the strong type bound can be deduced by interpolating between suitable endpoint weak type bounds. The first such result is the Marcinkiewicz interpolation theorem.
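At the level of a single function, the gap between the two notions is the gap between the $L^p$ norm and the weak quantity $\sup_{\lambda>0}\lambda\,\mu(\{|f|>\lambda\})^{1/p}$. The sketch below is an illustration for counting measure on a finite set; the helper names are hypothetical and the harmonic sequence is just one convenient example.

```python
def lp_norm(values, p):
    """(sum |v|^p)^(1/p), the L^p norm for counting measure."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def weak_lp_quasinorm(values, p):
    """sup over lam > 0 of lam * #{|f| > lam}**(1/p), computed from the sorted magnitudes."""
    mags = sorted((abs(v) for v in values), reverse=True)
    # On the range of lam where exactly k + 1 entries exceed lam, the supremum
    # of lam * (k + 1)**(1/p) equals mags[k] * (k + 1)**(1/p).
    return max(m * (k + 1) ** (1.0 / p) for k, m in enumerate(mags))


harmonic = [1.0 / k for k in range(1, 1001)]
print(weak_lp_quasinorm(harmonic, 1), lp_norm(harmonic, 1))
```

For the sequence $1/k$ the weak quantity stays equal to $1$ no matter how many terms are taken, while the $\ell^1$ norm grows like the logarithm of the number of terms; this is the discrete shadow of a function that is in weak $L^1$ but not in $L^1$.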

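**Theorem 11 (Marcinkiewicz interpolation theorem)** Let $(X,\mu)$ and $(Y,\nu)$ be measure spaces, $1\leq p_0<p_1\leq\infty$, and let $T$ be a sublinear operator defined on $L^{p_0}(X,\mu)+L^{p_1}(X,\mu)$ and taking values in the space of measurable functions on $Y$. Suppose that $T$ is of weak type $(p_0,p_0)$ and of weak type $(p_1,p_1)$. Then $T$ is of strong type $(p,p)$ for any $p_0<p<p_1$.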

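**Remark 4** Before going into the proof of this theorem let us discuss a bit its hypothesis. Given a function $f\in L^p(X,\mu)$ we first need to show that $T(f)$ is well defined. Having the information that $T$ is well defined on $L^{p_0}(X,\mu)+L^{p_1}(X,\mu)$ we essentially need to see that $f\in L^{p_0}+L^{p_1}$ whenever $f\in L^p$ with $p_0<p<p_1$. To see this, fix a positive constant $\gamma$, to be defined later, and consider the functions

$$f_1:=f\chi_{\{|f|>\gamma\}},\qquad f_2:=f\chi_{\{|f|\leq\gamma\}}.$$

Obviously we have $f=f_1+f_2$. Moreover,

$$\|f_1\|_{L^{p_0}}^{p_0}=\int_{\{|f|>\gamma\}}|f|^{p}\,|f|^{p_0-p}\,d\mu\leq\gamma^{p_0-p}\|f\|_{L^p}^p.$$

Similarly we can estimate, when $p_1<\infty$,

$$\|f_2\|_{L^{p_1}}^{p_1}=\int_{\{|f|\leq\gamma\}}|f|^{p}\,|f|^{p_1-p}\,d\mu\leq\gamma^{p_1-p}\|f\|_{L^p}^p,$$

while for $p_1=\infty$ we simply have $\|f_2\|_{L^\infty}\leq\gamma$. This shows that we can decompose any function $f\in L^p$ to a sum of two functions $f_1\in L^{p_0}$ and $f_2\in L^{p_1}$, whenever $p_0<p<p_1$, thus $L^p\subset L^{p_0}+L^{p_1}$. In particular, $T(f)$ is well defined for any $f\in L^p$.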
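The splitting at a given height is easy to experiment with on a finite list under counting measure. The sketch below is an illustration only; the sample data, the exponents $p_0=1$, $p=2$, $p_1=4$, and the helper names are arbitrary choices.

```python
def split_at_height(values, gamma):
    """Return (f1, f2) with f1 = f * 1_{|f| > gamma} and f2 = f * 1_{|f| <= gamma}."""
    f1 = [v if abs(v) > gamma else 0.0 for v in values]
    f2 = [v if abs(v) <= gamma else 0.0 for v in values]
    return f1, f2


def power_sum(values, p):
    """Sum of |v|^p, i.e. the p-th power of the L^p norm for counting measure."""
    return sum(abs(v) ** p for v in values)


f = [5.0, -0.2, 1.5, 0.0, -3.0, 0.7]
p0, p, p1 = 1.0, 2.0, 4.0
gamma = 1.0
f1, f2 = split_at_height(f, gamma)

# The large part is controlled in L^{p0}, the small part in L^{p1}.
print(power_sum(f1, p0), "<=", gamma ** (p0 - p) * power_sum(f, p))
print(power_sum(f2, p1), "<=", gamma ** (p1 - p) * power_sum(f, p))
```

Changing `gamma` trades off the two bounds against each other, which is the freedom exploited in the proof of the interpolation theorem.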

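*Proof:* We first prove the theorem when $p_1<\infty$. Since our hypothesis involves the distribution sets of $T(f)$ it is convenient to recall the representation of the $L^p$ norm of a function in terms of its distribution set. Indeed, from Proposition 9 of notes 1 we have

$$\|T(f)\|_{L^p(Y,\nu)}^p=p\int_0^\infty\lambda^{p-1}\,\nu(\{y\in Y:|T(f)(y)|>\lambda\})\,d\lambda.\qquad (2)$$

The measure of the set $\{|T(f)|>\lambda\}$ will appear many times in the proof so it is convenient to give it a shorter notation:

$$d_{T(f)}(\lambda):=\nu(\{y\in Y:|T(f)(y)|>\lambda\}).$$

Fix for a moment $\lambda>0$ and consider the decomposition of the function $f$ at level $\lambda$ as in the remark before:

$$f=f_1+f_2,\qquad f_1:=f\chi_{\{|f|>\lambda\}},\quad f_2:=f\chi_{\{|f|\leq\lambda\}}.$$

The sublinearity of $T$ allows us to write

$$|T(f)(y)|\leq|T(f_1)(y)|+|T(f_2)(y)|$$

for any $y\in Y$. Thus,

$$\{|T(f)|>\lambda\}\subset\{|T(f_1)|>\lambda/2\}\cup\{|T(f_2)|>\lambda/2\},$$

so that

$$d_{T(f)}(\lambda)\leq d_{T(f_1)}(\lambda/2)+d_{T(f_2)}(\lambda/2).\qquad (3)$$

Since $f_1\in L^{p_0}$ and $T$ is of weak type $(p_0,p_0)$ we can estimate the first summand as

$$d_{T(f_1)}(\lambda/2)\leq\Big(\frac{2A_0}{\lambda}\Big)^{p_0}\|f_1\|_{L^{p_0}}^{p_0}.$$

Similarly, since $f_2\in L^{p_1}$ and $T$ is of weak type $(p_1,p_1)$ we have

$$d_{T(f_2)}(\lambda/2)\leq\Big(\frac{2A_1}{\lambda}\Big)^{p_1}\|f_2\|_{L^{p_1}}^{p_1},$$

where $A_0,A_1$ are the constants in the weak type $(p_0,p_0)$ and weak type $(p_1,p_1)$ hypotheses respectively. For simplicity we suppress the dependence of the implied constants on these parameters. Combining the previous estimates we can write

$$d_{T(f)}(\lambda)\lesssim\lambda^{-p_0}\|f_1\|_{L^{p_0}}^{p_0}+\lambda^{-p_1}\|f_2\|_{L^{p_1}}^{p_1}.$$

Unravelling the definitions of $f_1$ and $f_2$, the previous estimate yields

$$d_{T(f)}(\lambda)\lesssim\lambda^{-p_0}\int_{\{|f|>\lambda\}}|f|^{p_0}\,d\mu+\lambda^{-p_1}\int_{\{|f|\leq\lambda\}}|f|^{p_1}\,d\mu.\qquad (4)$$

In order to recover the $L^p$ norm of $T(f)$ observe by (2) that it’s enough to multiply by $p\lambda^{p-1}$ and integrate in $\lambda$.

Multiplying the first summand on the right hand side of (4) by $p\lambda^{p-1}$ and integrating we get, by Fubini’s theorem,

$$p\int_0^\infty\lambda^{p-p_0-1}\int_{\{|f|>\lambda\}}|f|^{p_0}\,d\mu\,d\lambda=p\int_X|f|^{p_0}\int_0^{|f|}\lambda^{p-p_0-1}\,d\lambda\,d\mu=\frac{p}{p-p_0}\|f\|_{L^p}^p.$$

Similarly, multiplying the second summand in (4) by $p\lambda^{p-1}$ and integrating we have

$$p\int_0^\infty\lambda^{p-p_1-1}\int_{\{|f|\leq\lambda\}}|f|^{p_1}\,d\mu\,d\lambda=p\int_X|f|^{p_1}\int_{|f|}^\infty\lambda^{p-p_1-1}\,d\lambda\,d\mu=\frac{p}{p_1-p}\|f\|_{L^p}^p.$$

Summing up the previous two estimates we conclude that

$$\|T(f)\|_{L^p(Y,\nu)}^p\lesssim\Big(\frac{p}{p-p_0}+\frac{p}{p_1-p}\Big)\|f\|_{L^p(X,\mu)}^p,$$

which shows that $T$ is of strong type $(p,p)$ with

$$\|T\|_{L^p\to L^p}\lesssim\Big(\frac{p}{p-p_0}+\frac{p}{p_1-p}\Big)^{1/p}.$$

Observe that there is no claim here that this quantitative estimate on the norm of $T$ is optimal in general.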

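The proof in the case $p_1=\infty$ is very similar. Now the hypothesis that $T$ is of weak type $(p_1,p_1)$ is replaced by the hypothesis that $T$ maps $L^\infty(X,\mu)$ to $L^\infty(Y,\nu)$. That is, there exists some constant $A_1>0$ such that

$$\|T(f)\|_{L^\infty(Y,\nu)}\leq A_1\|f\|_{L^\infty(X,\mu)}$$

for all $f\in L^\infty(X,\mu)$. We also write $A_0$ for the constant in the weak type $(p_0,p_0)$ hypothesis. We fix some level $\lambda>0$ and we split the function as $f=f_1+f_2$ where $f_1:=f\chi_{\{|f|>\lambda/(2A_1)\}}$ and $f_2:=f-f_1$. Obviously $\|f_2\|_{L^\infty}\leq\lambda/(2A_1)$ so by the hypothesis we have that $\|T(f_2)\|_{L^\infty}\leq\lambda/2$. Arguing as in the case $p_1<\infty$ we can write

$$\nu(\{|T(f)|>\lambda\})\leq\nu(\{|T(f_1)|>\lambda/2\})+\nu(\{|T(f_2)|>\lambda/2\}).$$

Since $\|T(f_2)\|_{L^\infty}\leq\lambda/2$, the second summand in the previous estimate vanishes identically. We conclude that

$$\|T(f)\|_{L^p}^p\leq p\int_0^\infty\lambda^{p-1}\Big(\frac{2A_0}{\lambda}\Big)^{p_0}\int_{\{|f|>\lambda/(2A_1)\}}|f|^{p_0}\,d\mu\,d\lambda=\frac{p}{p-p_0}\,2^{p}A_0^{p_0}A_1^{p-p_0}\,\|f\|_{L^p}^p.$$

This concludes the proof in the case $p_1=\infty$ as well as providing the quantitative estimate $\|T\|_{L^p\to L^p}\leq 2\big(\frac{p}{p-p_0}\big)^{1/p}A_0^{p_0/p}A_1^{1-p_0/p}$.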

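**Exercise 8** Modify the proof above to show that under the hypotheses of the Marcinkiewicz interpolation theorem we can conclude that

$$\|T\|_{L^p\to L^p}\lesssim_{p,p_0,p_1}A_0^{1-\theta}A_1^{\theta},$$

where $A_0,A_1$ are the constants in the two weak type hypotheses and $\frac1p=\frac{1-\theta}{p_0}+\frac{\theta}{p_1}$ for some $0<\theta<1$.

Hint: This is already the constant appearing in the case $p_1=\infty$. For the case $p_1<\infty$ split the function at the level $c\lambda$ (instead of $\lambda$), for some $c>0$, and optimize in the parameter $c$ at the end of the proof. For this, use the heuristic that a sum is optimized when the terms in the sum are roughly equal in size.

**Exercise 9** Let $0<p_0<p_1\leq\infty$ and suppose that $f\in L^{p_0,\infty}\cap L^{p_1,\infty}$. Show that $f\in L^p$ for all $p_0<p<p_1$. Hint: The proof is very similar to the proof of the Marcinkiewicz interpolation theorem, only simpler. Use again the fact that

$$\|f\|_{L^p}^p=p\int_0^\infty\lambda^{p-1}\mu(\{|f|>\lambda\})\,d\lambda$$

and split the range of $\lambda$ as $(0,\infty)=(0,\gamma]\cup(\gamma,\infty)$, at an appropriate level $\gamma$. Use the weak integrability conditions for $f$ in the appropriate intervals of $\lambda$.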

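**Exercise 10** Let $X$ be a finite set equipped with counting measure and let $f:X\to\mathbb{C}$ be a function. Show that for any $0<p<\infty$ we have that

$$\|f\|_{L^{p,\infty}(X)}\leq\|f\|_{L^p(X)}\lesssim_p\big(\log(1+|X|)\big)^{1/p}\,\|f\|_{L^{p,\infty}(X)}.$$

Thus on finite sets, the spaces $L^p$ and $L^{p,\infty}$ are equivalent. Here $|X|$ denotes the cardinality of $X$.

Hint: Observe that $\mu(\{x\in X:|f(x)|>\lambda\})\leq|X|$ and use Proposition 9 of notes 1.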

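**Exercise 11 (Dual formulation of $L^{p,\infty}$)** Let $1<p<\infty$. Show that for every $f\in L^{p,\infty}(X,\mu)$, we have

$$\|f\|_{L^{p,\infty}}\sim_p\sup\Big\{\mu(E)^{-1/p'}\Big|\int_E f\,d\mu\Big|:\ 0<\mu(E)<\infty\Big\},$$

where $\frac1p+\frac1{p'}=1$.

Hint: As in the previous exercise, write

$$\int_E|f|\,d\mu=\int_0^\infty\mu(\{x\in E:|f(x)|>\lambda\})\,d\lambda.$$

Since the set $E$ has finite measure one can estimate further the measure of the level set by

$$\mu(\{x\in E:|f(x)|>\lambda\})\leq\min\big(\mu(E),\ \lambda^{-p}\|f\|_{L^{p,\infty}}^p\big).$$

Now split the integral we want to estimate accordingly in order to take advantage of this estimate. See also the hint in the previous exercise. This will give you one direction of the estimate, the other direction being trivial.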

While the Marcinkiewicz interpolation theorem is the prototype of real interpolation, complex methods can be used to derive similar conclusions. An example of such a method has already been used via the three lines lemma applied to exhibit the log convexity of the norms (which is also a form of interpolation). We will now describe the prototype of complex interpolation.

The following theorem has some differences compared to the Marcinkiewicz interpolation theorem. First of all we assume that is linear rather than sublinear. Note as well that our hypotheses concern *strong* type bounds for the operator rather than weak endpoint bounds. On the other hand, the conclusion gives a good estimate for the norm of the operator when interpolating between the endpoints and allows more freedom in the choice of the exponents at the endpoints.

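**Theorem 12 (Riesz-Thorin interpolation theorem)** Let $1\leq p_0,p_1\leq\infty$ and $1\leq q_0,q_1\leq\infty$. Let

$$T:L^{p_0}(X,\mu)+L^{p_1}(X,\mu)\to L^{q_0}(Y,\nu)+L^{q_1}(Y,\nu)$$

be a linear operator that is of strong type $(p_0,q_0)$ with norm $k_0$ and of strong type $(p_1,q_1)$ with norm $k_1$. That is we have that

$$\|T(f)\|_{L^{q_0}(Y,\nu)}\leq k_0\|f\|_{L^{p_0}(X,\mu)}$$

for all $f\in L^{p_0}(X,\mu)$ and

$$\|T(f)\|_{L^{q_1}(Y,\nu)}\leq k_1\|f\|_{L^{p_1}(X,\mu)}$$

for all $f\in L^{p_1}(X,\mu)$. Then $T$ is of strong type $(p_\theta,q_\theta)$ with norm at most $k_0^{1-\theta}k_1^{\theta}$:

$$\|T(f)\|_{L^{q_\theta}(Y,\nu)}\leq k_0^{1-\theta}k_1^{\theta}\|f\|_{L^{p_\theta}(X,\mu)}$$

for all $f\in L^{p_\theta}(X,\mu)$, where $\frac{1}{p_\theta}=\frac{1-\theta}{p_0}+\frac{\theta}{p_1}$ and $\frac{1}{q_\theta}=\frac{1-\theta}{q_0}+\frac{\theta}{q_1}$, with $0<\theta<1$.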
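The bookkeeping of interpolated exponents is easy to get wrong by hand, so a small helper can be useful. The sketch below is a convenience illustration, not part of the theorem: it works with the reciprocals $1/p$ and $1/q$ (so that an infinite exponent is encoded by $0$), uses exact rational arithmetic, and all function names are hypothetical.

```python
from fractions import Fraction


def interpolate(inv_a, inv_b, theta):
    """Return (1 - theta) * inv_a + theta * inv_b, where inv_a = 1/a and inv_b = 1/b."""
    return (1 - theta) * inv_a + theta * inv_b


def riesz_thorin_exponents(inv_p0, inv_q0, inv_p1, inv_q1, theta):
    """Reciprocals (1/p_theta, 1/q_theta); passing 0 encodes an infinite exponent."""
    return interpolate(inv_p0, inv_p1, theta), interpolate(inv_q0, inv_q1, theta)


def norm_bound(k0, k1, theta):
    """The interpolated bound k0^(1 - theta) * k1^theta for the operator norm."""
    return k0 ** (1 - theta) * k1 ** theta


# Endpoints (p0, q0) = (1, infinity) and (p1, q1) = (2, 2), with theta = 1/2.
inv_p, inv_q = riesz_thorin_exponents(
    Fraction(1), Fraction(0), Fraction(1, 2), Fraction(1, 2), Fraction(1, 2)
)
print(1 / inv_p, 1 / inv_q)
```

With these endpoints and $\theta=1/2$ one gets $p_\theta=4/3$ and $q_\theta=4$, a dual pair of exponents.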

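*Proof:* Throughout, $k_0$ and $k_1$ denote the norms of $T$ in the strong type $(p_0,q_0)$ and strong type $(p_1,q_1)$ hypotheses. Let us first consider the case $p_0=p_1$. Then by the log-convexity of the $L^q$ norm we get directly that

$$\|T(f)\|_{L^{q_\theta}}\leq\|T(f)\|_{L^{q_0}}^{1-\theta}\|T(f)\|_{L^{q_1}}^{\theta}\leq k_0^{1-\theta}k_1^{\theta}\|f\|_{L^{p_0}},$$

as desired. We can therefore focus on the case $p_0\neq p_1$ so that $p_\theta<\infty$. Without loss of generality we can assume that $k_0,k_1>0$.

We divide the proof in several steps:

**step 1:** It is enough to prove the theorem for $k_0=k_1=1$. To see this just observe that we can always replace the measures $\mu,\nu$ by $a\mu$, $b\nu$ respectively, for appropriate constants $a,b>0$. We can choose these constants so that $k_0=1$ and then we also have $k_1=1$. Doing the calculations you will see that we need to define the constants by means of the equations

$$b^{1/q_0}a^{-1/p_0}k_0=1,\qquad b^{1/q_1}a^{-1/p_1}k_1=1.$$

In what follows we will therefore assume that $k_0=k_1=1$ in the statement of the theorem.

**step 2:** We have that

$$\Big|\int_Y T(f)\,g\,d\nu\Big|\leq\|f\|_{L^{p_\theta}(X,\mu)}\,\|g\|_{L^{q_\theta'}(Y,\nu)}\qquad (5)$$

for all simple functions $f,g$ of finite measure support. Here $q_\theta'$ is the dual exponent of $q_\theta$.

First of all, since $T$ is of strong type $(p_0,q_0)$, Hölder’s inequality shows that

$$\Big|\int_Y T(f)\,g\,d\nu\Big|\leq\|T(f)\|_{L^{q_0}}\|g\|_{L^{q_0'}}\leq\|f\|_{L^{p_0}}\|g\|_{L^{q_0'}}\qquad (6)$$

and, similarly, by the type $(p_1,q_1)$ of $T$ we get that

$$\Big|\int_Y T(f)\,g\,d\nu\Big|\leq\|f\|_{L^{p_1}}\|g\|_{L^{q_1'}}.\qquad (7)$$

Thus, estimate (5) is true for $\theta\in\{0,1\}$. It is obvious that we need to interpolate between the two endpoint estimates above. We will do that by means of the three lines convexity lemma. First we define the map

$$F(z):=\int_Y T\big(|f|^{p_\theta\left(\frac{1-z}{p_0}+\frac{z}{p_1}\right)}\mathrm{sgn}(f)\big)\,|g|^{q_\theta'\left(\frac{1-z}{q_0'}+\frac{z}{q_1'}\right)}\mathrm{sgn}(g)\,d\nu,$$

where $\mathrm{sgn}(f)=f/|f|$ on the set where $f\neq 0$ and $\mathrm{sgn}(f)=0$ otherwise. Here there is a problem in the case $q_\theta'=\infty$ since the dual exponents $q_0',q_1'$ are equal to $\infty$. In this case the definition of $F$ should be understood as

$$F(z):=\int_Y T\big(|f|^{p_\theta\left(\frac{1-z}{p_0}+\frac{z}{p_1}\right)}\mathrm{sgn}(f)\big)\,g\,d\nu.$$

The function $F$ is a holomorphic function of $z$. Furthermore, since $f,g$ are simple functions of finite measure support, it is not hard to see that $F$ is actually bounded on the strip $\{z: 0\leq\mathrm{Re}(z)\leq 1\}$. Furthermore, for $z=\theta$ we see that $F(\theta)=\int_Y T(f)\,g\,d\nu$. Now, on the boundary of the strip we have that

$$|F(it)|\leq\|f\|_{L^{p_\theta}}^{p_\theta/p_0}\,\|g\|_{L^{q_\theta'}}^{q_\theta'/q_0'}$$

from (6) and similarly

$$|F(1+it)|\leq\|f\|_{L^{p_\theta}}^{p_\theta/p_1}\,\|g\|_{L^{q_\theta'}}^{q_\theta'/q_1'}$$

from (7). Using the three lines lemma we get that

$$|F(\theta)|\leq\Big(\|f\|_{L^{p_\theta}}^{p_\theta/p_0}\|g\|_{L^{q_\theta'}}^{q_\theta'/q_0'}\Big)^{1-\theta}\Big(\|f\|_{L^{p_\theta}}^{p_\theta/p_1}\|g\|_{L^{q_\theta'}}^{q_\theta'/q_1'}\Big)^{\theta}.$$

The right hand side however is equal to $\|f\|_{L^{p_\theta}}\|g\|_{L^{q_\theta'}}$, by the definition of $p_\theta$ and $q_\theta$. Since $F(\theta)=\int_Y T(f)\,g\,d\nu$ we get the claim of step 2. Observe that nothing really changes in the case $q_\theta'=\infty$.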

**step 3:** We have that

for all and all simple functions of finite measure support.

To see this let and be a simple function with finite measure support. We write . Observe that and . Now let be sequences of simple functions of finite measure support such that

and

as . We write . By step 2 and (6) and (7) we have that

Letting and observing that as we get the claim of this step as well.

**step 4:** We have that

for all and .

First of all observe that from step 3 we can actually conclude that

for all and simple functions of finite measure support. In order to see this let be any simple function that vanishes outside a set of finite measure and define . Consider a sequence of simple functions such that and . In particular vanishes outside the set . We thus have the estimate . Also observe that since is a function in by our hypothesis. Lebesgue’s dominated convergence theorem now shows that

Now for any and , let be a sequence of simple functions with finite measure support such that pointwise and . Fatou’s lemma now gives

This proves the claim of this step as well. Duality between and now completes the proof of the theorem.

As a first application of the Riesz-Thorin interpolation theorem we will now prove Young’s inequality on convolutions of functions.

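**Proposition 13 (Young’s inequality)** Let $1\leq p,q,r\leq\infty$ be such that $\frac1p+\frac1q=1+\frac1r$. If $f\in L^p(\mathbb{R}^n)$ and $g\in L^q(\mathbb{R}^n)$, then $f\ast g$ is a well defined function in $L^r(\mathbb{R}^n)$ and we have the estimate

$$\|f\ast g\|_{L^r(\mathbb{R}^n)}\leq\|f\|_{L^p(\mathbb{R}^n)}\,\|g\|_{L^q(\mathbb{R}^n)}.$$

*Proof:* For $1\leq p\leq\infty$ and $f\in L^p(\mathbb{R}^n)$ fixed we define the operator

$$T(g):=f\ast g.$$

As we have already seen (see Exercise 1) we have the bound $\|T(g)\|_{L^p}\leq\|f\|_{L^p}\|g\|_{L^1}$, that is, $T$ is of strong type $(1,p)$ with norm at most $\|f\|_{L^p}$. It is also very easy to see, using Hölder’s inequality, that if $p'$ is the conjugate exponent of $p$ then we have

$$\|T(g)\|_{L^\infty}\leq\|f\|_{L^p}\|g\|_{L^{p'}},$$

that is $T(g)\in L^\infty$ and $T$ is of strong type $(p',\infty)$, again with norm at most $\|f\|_{L^p}$. Letting $(p_0,q_0)=(1,p)$ and $(p_1,q_1)=(p',\infty)$, the Riesz-Thorin interpolation theorem shows that $T$ is of strong type $(p_\theta,q_\theta)$, where $\frac{1}{p_\theta}=1-\frac{\theta}{p}$ and $\frac{1}{q_\theta}=\frac{1-\theta}{p}$. Replacing $p_\theta=q$ and using the hypothesis $\frac1p+\frac1q=1+\frac1r$ we get that $q_\theta=r$. Thus we conclude that $T$ is of strong type $(q,r)$ with norm at most $\|f\|_{L^p}^{1-\theta}\|f\|_{L^p}^{\theta}=\|f\|_{L^p}$. That is we have $\|f\ast g\|_{L^r}\leq\|f\|_{L^p}\|g\|_{L^q}$ as we wanted to show.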
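Young’s inequality also holds for sequences on the integers with counting measure, which makes it easy to test numerically. The sketch below is an illustration of that discrete analogue, not of the Euclidean statement itself; the sample sequences, the exponents, and the helper names are arbitrary.

```python
def convolve(f, g):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def lp_norm(values, p):
    """(sum |v|^p)^(1/p), the L^p norm for counting measure."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def young_exponent(p, q):
    """Return r with 1/p + 1/q = 1 + 1/r (assumes 1/p + 1/q > 1, so r is finite)."""
    return 1.0 / (1.0 / p + 1.0 / q - 1.0)


f = [1.0, -2.0, 0.5, 3.0]
g = [0.25, 0.5, -1.0]
p, q = 1.5, 1.5
r = young_exponent(p, q)
lhs = lp_norm(convolve(f, g), r)
rhs = lp_norm(f, p) * lp_norm(g, q)
print(r, lhs, "<=", rhs)
```

With $p=q=3/2$ the relation $\frac1p+\frac1q=1+\frac1r$ gives $r=3$; choosing $q=1$ instead recovers estimate (1) with $r=p$.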

Exercise 12 (Schur’s test)Let and . Let be a -measurable function such that (i) For almost every we have that

(ii) For almost every we have that

.

We consider the operator

,

for suitable functions . Define and .

Show that is of strong type with norm at most where and are as in the Riesz-Thorin interpolation theorem.

Hint:First consider the sublinear operator

,

which is always well defined (though maybe infinite) and controls . Use Minkowski’s integral inequality and Hölder’s inequality to show that , and thus $T$ is of strong type and of strong type . Use the Riesz-Thorin interpolation theorem to conclude the proof.
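The classical special case of Schur’s test for matrices gives a feeling for the statement. The sketch below is an illustration under assumptions that go beyond the exercise as stated: it takes a finite matrix $K$ acting on vectors, with hypotheses in the form of absolute row and column sums, and checks the resulting $\ell^2$ bound $\sqrt{AB}$ obtained by interpolating between $\ell^1$ and $\ell^\infty$ with $\theta=1/2$. The helper names are hypothetical.

```python
def apply_matrix(K, x):
    """Matrix-vector product (K x)_i = sum_j K[i][j] * x[j]."""
    return [sum(k * v for k, v in zip(row, x)) for row in K]


def l2_norm(x):
    """Euclidean norm of a vector."""
    return sum(v * v for v in x) ** 0.5


def schur_bound(K):
    """sqrt(A * B), with A the largest absolute row sum and B the largest absolute column sum."""
    A = max(sum(abs(k) for k in row) for row in K)
    B = max(sum(abs(row[j]) for row in K) for j in range(len(K[0])))
    return (A * B) ** 0.5


K = [[1.0, -2.0, 0.0],
     [0.5, 0.0, 3.0],
     [-1.0, 1.0, 1.0]]
x = [2.0, -1.0, 0.5]

print(l2_norm(apply_matrix(K, x)), "<=", schur_bound(K) * l2_norm(x))
```

Here the largest absolute column sum is the $\ell^1\to\ell^1$ norm of $K$ and the largest absolute row sum is its $\ell^\infty\to\ell^\infty$ norm, so the bound is exactly what the Riesz-Thorin theorem predicts at the midpoint.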

*[Update 24 Feb 2011: Omission in Exercise 12 corrected and a solution hint added.]*

*[Update 15 Mar 2011: Typo in Exercise 8 corrected, Exercise 4 edited.]*