1. Boolean functions and the Fourier expansion

More generally, given two functions $f, g : \{-1,1\}^n \to \mathbb{R}$, we can compute $\langle f, g \rangle$ by taking the "dot product" of their coordinates in the orthonormal basis of parities. The resulting formula is called Plancherel's Theorem.

Plancherel's Theorem. For any $f, g : \{-1,1\}^n \to \mathbb{R}$,
\[
\langle f, g \rangle = \mathop{\mathbf{E}}_{\boldsymbol{x} \sim \{-1,1\}^n}[f(\boldsymbol{x})g(\boldsymbol{x})] = \sum_{S \subseteq [n]} \hat{f}(S)\hat{g}(S).
\]

We can verify this formula explicitly as we did in (1.9):
\[
\langle f, g \rangle = \Big\langle \sum_{S \subseteq [n]} \hat{f}(S)\chi_S,\ \sum_{T \subseteq [n]} \hat{g}(T)\chi_T \Big\rangle = \sum_{S,T \subseteq [n]} \hat{f}(S)\hat{g}(T)\langle \chi_S, \chi_T \rangle = \sum_{S \subseteq [n]} \hat{f}(S)\hat{g}(S).
\]

Now is a good time to remark that for Boolean-valued functions $f, g : \{-1,1\}^n \to \{-1,1\}$, the inner product $\langle f, g \rangle$ can be interpreted as a kind of "correlation" between $f$ and $g$, measuring how similar they are. Since $f(x)g(x) = 1$ if $f(x) = g(x)$ and $f(x)g(x) = -1$ if $f(x) \neq g(x)$, we have:

Proposition 1.9. If $f, g : \{-1,1\}^n \to \{-1,1\}$,
\[
\langle f, g \rangle = \Pr[f(\boldsymbol{x}) = g(\boldsymbol{x})] - \Pr[f(\boldsymbol{x}) \neq g(\boldsymbol{x})] = 1 - 2\,\mathrm{dist}(f,g).
\]

Here we are using the following definition:

Definition 1.10. Given $f, g : \{-1,1\}^n \to \{-1,1\}$, we define their (relative Hamming) distance to be
\[
\mathrm{dist}(f,g) = \Pr_{\boldsymbol{x}}[f(\boldsymbol{x}) \neq g(\boldsymbol{x})],
\]
the fraction of inputs on which they disagree.

With a number of Fourier formulas now in hand we can begin to illustrate a basic theme in the analysis of Boolean functions: interesting combinatorial properties of a Boolean function $f$ can be "read off" from its Fourier coefficients. Let's start by looking at one way to measure the "bias" of $f$:

Definition 1.11. The mean of $f : \{-1,1\}^n \to \mathbb{R}$ is $\mathbf{E}[f]$. When $f$ has mean $0$ we say that it is unbiased, or balanced. In the particular case that $f : \{-1,1\}^n \to \{-1,1\}$ is Boolean-valued, its mean is
\[
\mathbf{E}[f] = \Pr[f = 1] - \Pr[f = -1];
\]
thus $f$ is unbiased if and only if it takes value $1$ on exactly half of the points of the Hamming cube.

Fact 1.12. If $f : \{-1,1\}^n \to \mathbb{R}$ then $\mathbf{E}[f] = \hat{f}(\emptyset)$.

This formula holds simply because $\mathbf{E}[f] = \langle f, 1 \rangle = \hat{f}(\emptyset)$ (taking $S = \emptyset$ in Proposition 1.8). In particular, a Boolean function is unbiased if and only if its empty-set Fourier coefficient is $0$.

Next we obtain a formula for the variance of a real-valued Boolean function (thinking of $f(\boldsymbol{x})$ as a real-valued random variable):

Copyright © Ryan O'Donnell, 2014
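Plancherel's Theorem, Proposition 1.9, and Fact 1.12 can all be checked by brute force on a small cube. The sketch below is not from the text: the helper names (`chi`, `inner`, `fourier_coeffs`) and the choice of test functions ($\mathrm{Maj}_3$ against the first dictator $x \mapsto x_1$) are ours, and everything is computed by direct enumeration of $\{-1,1\}^3$.

```python
from itertools import product

def chi(S, x):
    """Parity chi_S(x) = product of x_i over i in S (chi_emptyset = 1)."""
    out = 1
    for i in S:
        out *= x[i]
    return out

def inner(f, g, n):
    """<f, g> = E_x[f(x) g(x)], x uniform on {-1,1}^n."""
    cube = list(product([-1, 1], repeat=n))
    return sum(f(x) * g(x) for x in cube) / len(cube)

def fourier_coeffs(f, n):
    """All 2^n Fourier coefficients f_hat(S) = <f, chi_S>, by brute force."""
    subsets = [frozenset(i for i in range(n) if b[i])
               for b in product([0, 1], repeat=n)]
    return {S: inner(f, lambda x, S=S: chi(S, x), n) for S in subsets}

n = 3
maj3 = lambda x: 1 if sum(x) > 0 else -1   # majority of three bits
dictator = lambda x: x[0]                  # the first coordinate

fh, gh = fourier_coeffs(maj3, n), fourier_coeffs(dictator, n)

# Plancherel: <f, g> = sum_S f_hat(S) g_hat(S)
assert abs(inner(maj3, dictator, n) - sum(fh[S] * gh[S] for S in fh)) < 1e-12

# Proposition 1.9: <f, g> = 1 - 2 dist(f, g)
cube = list(product([-1, 1], repeat=n))
dist = sum(maj3(x) != dictator(x) for x in cube) / len(cube)
assert abs(inner(maj3, dictator, n) - (1 - 2 * dist)) < 1e-12

# Fact 1.12: E[f] = f_hat(emptyset); Maj3 is unbiased
assert fh[frozenset()] == 0
```

Here $\langle \mathrm{Maj}_3, x_1 \rangle = 1/2$: the two functions disagree on exactly $2$ of the $8$ inputs, so $\mathrm{dist} = 1/4$ and $1 - 2 \cdot 1/4 = 1/2$, matching the single surviving term $\hat{f}(\{1\})\hat{g}(\{1\})$ in Plancherel's sum.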
1.4. Basic Fourier formulas

Proposition 1.13. The variance of $f : \{-1,1\}^n \to \mathbb{R}$ is
\[
\mathbf{Var}[f] = \langle f - \mathbf{E}[f],\ f - \mathbf{E}[f] \rangle = \mathbf{E}[f^2] - \mathbf{E}[f]^2 = \sum_{S \neq \emptyset} \hat{f}(S)^2.
\]

This Fourier formula follows immediately from Parseval's Theorem and Fact 1.12.

Fact 1.14. For $f : \{-1,1\}^n \to \{-1,1\}$,
\[
\mathbf{Var}[f] = 1 - \mathbf{E}[f]^2 = 4\Pr[f(\boldsymbol{x}) = 1]\Pr[f(\boldsymbol{x}) = -1] \in [0,1].
\]

In particular, a Boolean-valued function $f$ has variance $1$ if it's unbiased and variance $0$ if it's constant. More generally, the variance of a Boolean-valued function is proportional to its "distance from being constant".

Proposition 1.15. Let $f : \{-1,1\}^n \to \{-1,1\}$. Then $2\epsilon \le \mathbf{Var}[f] \le 4\epsilon$, where $\epsilon = \min\{\mathrm{dist}(f,1), \mathrm{dist}(f,-1)\}$.

The proof of Proposition 1.15 is an exercise. See also Exercise 1.17. By using Plancherel in place of Parseval, we get a generalization of Proposition 1.13 for covariance:

Proposition 1.16. The covariance of $f, g : \{-1,1\}^n \to \mathbb{R}$ is
\[
\mathbf{Cov}[f,g] = \langle f - \mathbf{E}[f],\ g - \mathbf{E}[g] \rangle = \mathbf{E}[fg] - \mathbf{E}[f]\mathbf{E}[g] = \sum_{S \neq \emptyset} \hat{f}(S)\hat{g}(S).
\]

We end this section by discussing the Fourier weight distribution of Boolean functions.

Definition 1.17. The (Fourier) weight of $f : \{-1,1\}^n \to \mathbb{R}$ on set $S$ is defined to be the squared Fourier coefficient, $\hat{f}(S)^2$.

Although we lose some information about the Fourier coefficients when we square them, many Fourier formulas only depend on the weights of $f$. For example, Proposition 1.13 says that the variance of $f$ equals its Fourier weight on nonempty sets. Studying Fourier weights is particularly pleasant for Boolean-valued functions $f : \{-1,1\}^n \to \{-1,1\}$ since Parseval's Theorem says that they always have total weight $1$. In particular, the weights define a probability distribution on subsets of $[n]$.

Definition 1.18. Given $f : \{-1,1\}^n \to \{-1,1\}$, the spectral sample for $f$, denoted $\mathscr{S}_f$, is the probability distribution on subsets of $[n]$ in which the set $S$ has probability $\hat{f}(S)^2$. We write $\boldsymbol{S} \sim \mathscr{S}_f$ for a draw from this distribution.

For example, the spectral sample for the $\max_2$ function is the uniform distribution on all four subsets of $[2]$; the spectral sample for $\mathrm{Maj}_3$ is the uniform distribution on the four subsets of $[3]$ with odd cardinality.
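The claim about $\mathrm{Maj}_3$'s spectral sample is easy to verify numerically. The following sketch (our own brute-force code; the helper `fourier_coeffs` is an assumed name, not from the text) checks Parseval's total weight, Proposition 1.13's variance formula, and that the weight sits uniformly on the four odd-cardinality subsets:

```python
from itertools import product
from math import prod

def fourier_coeffs(f, n):
    """f_hat(S) = E_x[f(x) * chi_S(x)] with chi_S(x) = prod_{i in S} x_i."""
    cube = list(product([-1, 1], repeat=n))
    coeffs = {}
    for bits in product([0, 1], repeat=n):
        S = frozenset(i for i in range(n) if bits[i])
        coeffs[S] = sum(f(x) * prod(x[i] for i in S) for x in cube) / len(cube)
    return coeffs

maj3 = lambda x: 1 if sum(x) > 0 else -1
fh = fourier_coeffs(maj3, 3)

# Parseval: a Boolean-valued function has total Fourier weight 1
total_weight = sum(c * c for c in fh.values())
assert abs(total_weight - 1.0) < 1e-12

# Proposition 1.13: Var[f] = weight on nonempty sets; Maj3 is unbiased, so Var = 1
variance = sum(c * c for S, c in fh.items() if S)
assert abs(variance - 1.0) < 1e-12

# Spectral sample of Maj3: uniform (weight 1/4) on the four odd-size subsets
support = {S: c * c for S, c in fh.items() if c * c > 1e-12}
assert all(len(S) % 2 == 1 for S in support)
assert all(abs(w - 0.25) < 1e-12 for w in support.values())
```

Concretely, $\mathrm{Maj}_3$ has $\widehat{\mathrm{Maj}_3}(\{i\}) = 1/2$ for each singleton and $\widehat{\mathrm{Maj}_3}([3]) = -1/2$, so squaring gives weight $1/4$ on each of $\{1\},\{2\},\{3\},[3]$.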
Given a Boolean function it can be helpful to try to keep a mental picture of its weight distribution on the subsets of $[n]$, partially ordered by inclusion. Figure 1.1 is an example for the $\mathrm{Maj}_3$ function, with the white circles indicating weight $0$ and the shaded circles indicating weight $1/4$.

[Figure 1.1. Fourier weight distribution of the $\mathrm{Maj}_3$ function: the eight subsets of $[3]$ arranged by inclusion, with the three singletons and $[3]$ shaded.]

Finally, as suggested by the diagram we often stratify the subsets $S \subseteq [n]$ according to their cardinality (also called "height" or "level"). Equivalently, this is the degree of the associated monomial $x^S$.

Definition 1.19. For $f : \{-1,1\}^n \to \mathbb{R}$ and $0 \le k \le n$, the (Fourier) weight of $f$ at degree $k$ is
\[
\mathbf{W}^k[f] = \sum_{\substack{S \subseteq [n] \\ |S| = k}} \hat{f}(S)^2.
\]
If $f : \{-1,1\}^n \to \{-1,1\}$ is Boolean-valued, an equivalent definition is
\[
\mathbf{W}^k[f] = \Pr_{\boldsymbol{S} \sim \mathscr{S}_f}[|\boldsymbol{S}| = k].
\]
By Parseval's Theorem, $\mathbf{W}^k[f] = \|f^{=k}\|_2^2$, where
\[
f^{=k} = \sum_{|S| = k} \hat{f}(S)\chi_S
\]
is called the degree $k$ part of $f$. We will also sometimes use notation like $\mathbf{W}^{>k}[f] = \sum_{|S| > k} \hat{f}(S)^2$ and $f^{\le k} = \sum_{|S| \le k} \hat{f}(S)\chi_S$.

1.5. Probability densities and convolution

For variety's sake, in this section we write the Hamming cube as $\mathbb{F}_2^n$ rather than $\{-1,1\}^n$. In developing the Fourier expansion, we have generalized from Boolean-valued Boolean functions $f : \mathbb{F}_2^n \to \{-1,1\}$ to real-valued Boolean functions $f : \mathbb{F}_2^n \to \mathbb{R}$. Boolean-valued functions arise more often in combinatorial problems, but there are important classes of real-valued Boolean functions. One example is probability densities.
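Returning briefly to the degree weights of Definition 1.19: for $\mathrm{Maj}_3$ they can be computed directly. This sketch (our own helper names, brute-force enumeration as before) confirms that all of $\mathrm{Maj}_3$'s weight lies at degrees $1$ and $3$, matching the spectral-sample picture above:

```python
from itertools import product
from math import prod

def fourier_coeffs(f, n):
    """f_hat(S) = E_x[f(x) * chi_S(x)] over the uniform cube {-1,1}^n."""
    cube = list(product([-1, 1], repeat=n))
    return {frozenset(i for i in range(n) if b[i]):
            sum(f(x) * prod(x[i] for i in range(n) if b[i])
                for x in cube) / len(cube)
            for b in product([0, 1], repeat=n)}

def weight_at_degree(fh, k):
    """W^k[f] = sum of f_hat(S)^2 over |S| = k (Definition 1.19)."""
    return sum(c * c for S, c in fh.items() if len(S) == k)

fh = fourier_coeffs(lambda x: 1 if sum(x) > 0 else -1, 3)  # Maj3
W = [weight_at_degree(fh, k) for k in range(4)]

# Maj3 has weight 3/4 at degree 1 and 1/4 at degree 3, nothing elsewhere
assert abs(W[1] - 0.75) < 1e-12 and abs(W[3] - 0.25) < 1e-12
assert abs(W[0]) < 1e-12 and abs(W[2]) < 1e-12

# The degree weights sum to 1, as Parseval requires for Boolean-valued f
assert abs(sum(W) - 1.0) < 1e-12
```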
Definition 1.20. A (probability) density function on the Hamming cube $\mathbb{F}_2^n$ is any nonnegative function $\varphi : \mathbb{F}_2^n \to \mathbb{R}^{\ge 0}$ satisfying $\mathbf{E}_{\boldsymbol{x} \sim \mathbb{F}_2^n}[\varphi(\boldsymbol{x})] = 1$. We write $\boldsymbol{y} \sim \varphi$ to denote that $\boldsymbol{y}$ is a random string drawn from the associated probability distribution, defined by
\[
\Pr_{\boldsymbol{y} \sim \varphi}[\boldsymbol{y} = y] = \varphi(y)\,\frac{1}{2^n} \qquad \forall y \in \mathbb{F}_2^n.
\]

Here you should think of $\varphi(y)$ as being the relative density of $y$ with respect to the uniform distribution on $\mathbb{F}_2^n$. For example, we have:

Fact 1.21. If $\varphi$ is a density function and $g : \mathbb{F}_2^n \to \mathbb{R}$, then
\[
\mathbf{E}_{\boldsymbol{y} \sim \varphi}[g(\boldsymbol{y})] = \langle \varphi, g \rangle = \mathbf{E}_{\boldsymbol{x} \sim \mathbb{F}_2^n}[\varphi(\boldsymbol{x})g(\boldsymbol{x})].
\]

The simplest example of a probability density is just the constant function $1$, which corresponds to the uniform probability distribution on $\mathbb{F}_2^n$. The most common case arises from the uniform distribution over some subset $A \subseteq \mathbb{F}_2^n$.

Definition 1.22. If $A \subseteq \mathbb{F}_2^n$ we write $1_A : \mathbb{F}_2^n \to \{0,1\}$ for the $0$-$1$ indicator function of $A$; i.e.,
\[
1_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}
\]
Assuming $A \neq \emptyset$ we write $\varphi_A$ for the density function associated to the uniform distribution on $A$; i.e.,
\[
\varphi_A = \frac{1}{\mathbf{E}[1_A]}\, 1_A.
\]
We typically write $\boldsymbol{y} \sim A$ rather than $\boldsymbol{y} \sim \varphi_A$.

A simple but useful example is when $A$ is the singleton set $A = \{0\}$. (Here $0$ is denoting the vector $(0,0,\dots,0) \in \mathbb{F}_2^n$.) In this case the function $\varphi_{\{0\}}$ takes value $2^n$ on input $0 \in \mathbb{F}_2^n$ and is zero elsewhere on $\mathbb{F}_2^n$. In Exercise 1.1 you will verify the Fourier expansion of $\varphi_{\{0\}}$:

Fact 1.23. Every Fourier coefficient of $\varphi_{\{0\}}$ is $1$; i.e., its Fourier expansion is
\[
\varphi_{\{0\}}(y) = \sum_{S \subseteq [n]} \chi_S(y).
\]

We now introduce an operation on functions that interacts particularly nicely with density functions, namely, convolution.
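Fact 1.23 can be checked by brute force as well. The sketch below is our own code (not the book's Exercise 1.1 solution): it represents $\mathbb{F}_2^3$ by $0/1$ tuples, uses the $\mathbb{F}_2^n$ form of the parity $\chi_S(y) = (-1)^{\sum_{i \in S} y_i}$, and verifies that $\varphi_{\{0\}}$ is a density whose Fourier coefficients are all $1$:

```python
from itertools import product

n = 3
cube = list(product([0, 1], repeat=n))  # F_2^n as 0/1 tuples

# phi_{0}: value 2^n at the all-zeros string, 0 elsewhere
phi0 = lambda y: 2 ** n if not any(y) else 0

# A density: nonnegative with E[phi0] = 1 (Definition 1.20)
assert sum(phi0(y) for y in cube) / len(cube) == 1

# Fact 1.23: every coefficient phi0_hat(S) = E_y[phi0(y) chi_S(y)] equals 1,
# where chi_S(y) = (-1)^{sum_{i in S} y_i}
for bits in product([0, 1], repeat=n):
    S = [i for i in range(n) if bits[i]]
    coeff = sum(phi0(y) * (-1) ** sum(y[i] for i in S)
                for y in cube) / len(cube)
    assert coeff == 1
```

The check is immediate once you see it: only $y = 0$ contributes to the expectation, and there $\varphi_{\{0\}}(0)\chi_S(0)/2^n = 2^n \cdot 1 / 2^n = 1$ for every $S$.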
Definition 1.24. Let $f, g : \mathbb{F}_2^n \to \mathbb{R}$. Their convolution is the function $f * g : \mathbb{F}_2^n \to \mathbb{R}$ defined by
\[
(f * g)(x) = \mathbf{E}_{\boldsymbol{y} \sim \mathbb{F}_2^n}[f(\boldsymbol{y})g(x - \boldsymbol{y})] = \mathbf{E}_{\boldsymbol{y} \sim \mathbb{F}_2^n}[f(x - \boldsymbol{y})g(\boldsymbol{y})].
\]
Since subtraction is equivalent to addition in $\mathbb{F}_2^n$ we may also write
\[
(f * g)(x) = \mathbf{E}_{\boldsymbol{y} \sim \mathbb{F}_2^n}[f(\boldsymbol{y})g(x + \boldsymbol{y})] = \mathbf{E}_{\boldsymbol{y} \sim \mathbb{F}_2^n}[f(x + \boldsymbol{y})g(\boldsymbol{y})].
\]
If we were representing the Hamming cube by $\{-1,1\}^n$ rather than $\mathbb{F}_2^n$ we would replace $x + y$ with $x \circ y$, where $\circ$ denotes entry-wise multiplication.

Exercise 1.25 asks you to verify that convolution is associative and commutative:
\[
f * (g * h) = (f * g) * h, \qquad f * g = g * f.
\]

Using Fact 1.21 we can deduce the following two simple results:

Proposition 1.25. If $\varphi$ is a density function on $\mathbb{F}_2^n$ and $g : \mathbb{F}_2^n \to \mathbb{R}$ then
\[
\varphi * g(x) = \mathbf{E}_{\boldsymbol{y} \sim \varphi}[g(x - \boldsymbol{y})] = \mathbf{E}_{\boldsymbol{y} \sim \varphi}[g(x + \boldsymbol{y})].
\]
In particular, $\mathbf{E}_{\boldsymbol{y} \sim \varphi}[g(\boldsymbol{y})] = \varphi * g(0)$.

Proposition 1.26. If $g = \psi$ is itself a probability density function then so is $\varphi * \psi$; it represents the distribution on $\boldsymbol{x} \in \mathbb{F}_2^n$ given by choosing $\boldsymbol{y} \sim \varphi$ and $\boldsymbol{z} \sim \psi$ independently and setting $\boldsymbol{x} = \boldsymbol{y} + \boldsymbol{z}$.

The most important theorem about convolution is that it corresponds to multiplication of Fourier coefficients:

Theorem 1.27. Let $f, g : \mathbb{F}_2^n \to \mathbb{R}$. Then for all $S \subseteq [n]$,
\[
\widehat{f * g}(S) = \hat{f}(S)\hat{g}(S).
\]

Proof. We have
\[
\begin{aligned}
\widehat{f * g}(S) &= \mathbf{E}_{\boldsymbol{x} \sim \mathbb{F}_2^n}[(f * g)(\boldsymbol{x})\chi_S(\boldsymbol{x})] && \text{(the Fourier formula)} \\
&= \mathbf{E}_{\boldsymbol{x} \sim \mathbb{F}_2^n}\Big[\mathbf{E}_{\boldsymbol{y} \sim \mathbb{F}_2^n}[f(\boldsymbol{y})g(\boldsymbol{x} - \boldsymbol{y})]\,\chi_S(\boldsymbol{x})\Big] && \text{(by definition)} \\
&= \mathbf{E}_{\substack{\boldsymbol{y},\boldsymbol{z} \sim \mathbb{F}_2^n \\ \text{independently}}}[f(\boldsymbol{y})g(\boldsymbol{z})\chi_S(\boldsymbol{y} + \boldsymbol{z})] && \text{(as } \boldsymbol{x} - \boldsymbol{y} \text{ is uniform on } \mathbb{F}_2^n\ \forall x\text{)} \\
&= \mathbf{E}_{\boldsymbol{y},\boldsymbol{z} \sim \mathbb{F}_2^n}[f(\boldsymbol{y})\chi_S(\boldsymbol{y})\,g(\boldsymbol{z})\chi_S(\boldsymbol{z})] && \text{(by identity (1.5))} \\
&= \hat{f}(S)\hat{g}(S) && \text{(Fourier formula, independence),}
\end{aligned}
\]
as claimed. $\square$
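Theorem 1.27 and the commutativity claim of Exercise 1.25 can be sanity-checked numerically. In this sketch (our own helpers, representing functions as dictionaries on $0/1$ tuples) we draw two random $\pm 1$-valued functions on $\mathbb{F}_2^3$ and verify $\widehat{f * g}(S) = \hat{f}(S)\hat{g}(S)$ for every $S$:

```python
from itertools import product
import random

n = 3
cube = list(product([0, 1], repeat=n))

def add(x, y):
    """Coordinate-wise addition in F_2^n (same as subtraction there)."""
    return tuple((a + b) % 2 for a, b in zip(x, y))

def fhat(f, S):
    """f_hat(S) = E_y[f(y) chi_S(y)], with chi_S(y) = (-1)^{sum_{i in S} y_i}."""
    return sum(f[y] * (-1) ** sum(y[i] for i in S) for y in cube) / len(cube)

def convolve(f, g):
    """(f * g)(x) = E_y[f(y) g(x + y)] (Definition 1.24, using x - y = x + y)."""
    return {x: sum(f[y] * g[add(x, y)] for y in cube) / len(cube) for x in cube}

random.seed(0)
f = {y: random.choice([-1, 1]) for y in cube}
g = {y: random.choice([-1, 1]) for y in cube}
fg = convolve(f, g)

# Theorem 1.27: the Fourier transform turns convolution into multiplication
for bits in product([0, 1], repeat=n):
    S = [i for i in range(n) if bits[i]]
    assert abs(fhat(fg, S) - fhat(f, S) * fhat(g, S)) < 1e-12

# Commutativity (Exercise 1.25): f * g = g * f
gf = convolve(g, f)
assert all(abs(fg[x] - gf[x]) < 1e-12 for x in cube)
```

The change of variables in the proof is visible in `convolve`: fixing $x$ and letting $y$ range over the cube makes $z = x + y$ range over the cube too, which is exactly why the two coefficients factor.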