and now (9), with $X^* = X$, follows for a large class of functions $f$ by integration by parts.

We can gain some additional intuition regarding the zero bias transformation by observing its action on non-normal distributions, which, in some sense, moves them closer to normality. Let $B$ be a Bernoulli random variable with success probability $p \in (0, 1)$, and let $\mathcal{U}[a, b]$ denote the uniform distribution on the finite interval $[a, b]$. Centering $B$ to form the mean zero discrete random variable $X = B - p$ having variance $\sigma^2 = p(1 - p)$, substitution into the right hand side of (9) yields
$$
E[Xf(X)] = E[(B - p)f(B - p)] = p(1 - p)f(1 - p) - (1 - p)pf(-p)
= \sigma^2\left[f(1 - p) - f(-p)\right] = \sigma^2 \int_{-p}^{1-p} f'(u)\,du = \sigma^2 E f'(U),
$$
for $U$ having uniform density over $[-p, 1 - p]$. Hence, with $=_d$ indicating the equality of two random variables in distribution,
$$
(B - p)^* =_d U \quad \text{where $U$ has distribution } \mathcal{U}[-p, 1 - p]. \tag{10}
$$
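Identity (10) is easy to test numerically. The following minimal Monte Carlo sketch checks that $E[Xf(X)] = \sigma^2 Ef'(U)$ for a sample test function; the choices $f(x) = x^3$, $p = 0.3$, and the sample size are arbitrary and purely illustrative.

```python
import numpy as np

# Check the zero bias identity for X = B - p, B ~ Bernoulli(p):
# E[X f(X)] should equal sigma^2 E[f'(U)] with U ~ Uniform[-p, 1-p],
# in accordance with (9) and (10).
rng = np.random.default_rng(0)
p = 0.3
sigma2 = p * (1 - p)
N = 10**6

f = lambda x: x**3          # smooth test function (illustrative choice)
fprime = lambda x: 3 * x**2

X = rng.binomial(1, p, size=N) - p       # centered Bernoulli samples
U = rng.uniform(-p, 1 - p, size=N)       # claimed zero bias distribution

lhs = np.mean(X * f(X))                  # estimates E[X f(X)]
rhs = sigma2 * np.mean(fprime(U))        # estimates sigma^2 E[f'(U)]
print(lhs, rhs)
```

For this $f$ both sides evaluate exactly to $p(1-p)\left[(1-p)^3 + p^3\right]$, so the two printed estimates should agree up to Monte Carlo error.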
This example highlights the general fact that the distribution of $X^*$ is always absolutely continuous, regardless of the nature of the distribution of $X$.

It is the uniqueness of the fixed point of the zero bias transformation, that is, the fact that $X^*$ has the same distribution as $X$ only when $X$ is normal, that provides the probabilistic reason behind the CLT. This 'only if' direction of Stein's characterization suggests that a distribution which gets mapped to one nearby is close to being a fixed point of the zero bias transformation, and therefore must be close to the transformation's only fixed point, the normal. Hence the normal approximation should apply whenever the distribution of a random variable is close to that of its zero bias transformation.
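For instance, taking $p = 1/2$ in the example above, the two-point mass of $X = B - 1/2$ at $\pm 1/2$ is mapped to the flat density of $\mathcal{U}[-1/2, 1/2]$; since this uniform has mean zero and variance $1/12$, it can itself be zero biased, and one can check directly from (9) that the result has the smooth, unimodal density
$$
g(u) = \left(\frac{3}{2} - 6u^2\right)\mathbf{1}(|u| \le 1/2).
$$
Indeed, $g$ is nonnegative and integrates to one, and since $g$ vanishes at $\pm 1/2$ with $g'(u) = -12u$, integration by parts gives $\frac{1}{12}\int_{-1/2}^{1/2} f'(u)g(u)\,du = \int_{-1/2}^{1/2} uf(u)\,du = E[Uf(U)]$. Each application of the transformation thus produces an absolutely continuous law that more closely resembles the normal.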
Moreover, the zero bias transformation has a special property that immediately shows why the distribution of a sum $W_n$ of comparably sized independent random variables is close to that of $W_n^*$: a sum of independent terms can be zero biased by selecting a single summand with probability proportional to its variance and replacing it with one of comparable size. Thus, by differing only in a single summand, the variables $W_n$ and $W_n^*$ are close, making $W_n$ an approximate fixed point of the zero bias transformation, and therefore approximately normal. This explanation, when given precisely, becomes a probabilistic proof of the Lindeberg-Feller central limit theorem under a condition, equivalent to (8), which we call the 'small zero bias condition'.

We first consider more precisely this special property of the zero bias transformation on independent sums. Given $\mathbf{X}_n$ satisfying Condition 1.1, let $\mathbf{X}_n^* = \{X_{i,n}^* : 1 \le i \le n\}$ be a collection of random variables such that $X_{i,n}^*$ has the $X_{i,n}$ zero bias distribution and is independent of $\mathbf{X}_n$. Further, let $I_n$ be a random index, independent of $\mathbf{X}_n$ and $\mathbf{X}_n^*$, with distribution
$$
P(I_n = i) = \sigma_{i,n}^2, \tag{11}
$$
and write the variable selected by $I_n$, that is, the mixture, using indicator functions as
$$
X_{I_n,n} = \sum_{i=1}^n \mathbf{1}(I_n = i) X_{i,n} \quad \text{and} \quad X_{I_n,n}^* = \sum_{i=1}^n \mathbf{1}(I_n = i) X_{i,n}^*. \tag{12}
$$
Then
$$
W_n^* = W_n - X_{I_n,n} + X_{I_n,n}^* \tag{13}
$$
has the $W_n$ zero bias distribution. For the simple proof of this fact, see [6].
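The construction (11)-(13) is straightforward to simulate. The sketch below uses i.i.d. centered and scaled Bernoulli summands $X_{i,n} = (B_i - p)/\sqrt{np(1-p)}$, so that the variances $\sigma^2_{i,n} = 1/n$ sum to one and the index distribution (11) is uniform; by (10), together with the easily verified scaling relation $(aX)^* =_d aX^*$, each replacement variable is a scaled uniform. All numerical values are illustrative choices.

```python
import numpy as np

# Simulate W_n and its zero biased version W*_n of (13) for i.i.d.
# centered, scaled Bernoulli summands.
rng = np.random.default_rng(1)
p, n, reps = 0.3, 50, 10**5
scale = 1.0 / np.sqrt(n * p * (1 - p))

# reps independent copies of (X_{1,n}, ..., X_{n,n})
X = (rng.binomial(1, p, size=(reps, n)) - p) * scale
W = X.sum(axis=1)                            # W_n, variance one

# (11): the variances sigma^2_{i,n} = 1/n are equal, so I_n is uniform
I = rng.integers(0, n, size=reps)

# (10) with scaling: X*_{I_n,n} ~ Uniform[-p, 1-p] / sqrt(n p (1-p))
X_star = rng.uniform(-p, 1 - p, size=reps) * scale

W_star = W - X[np.arange(reps), I] + X_star  # (13)

qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(W, qs))                    # the two sets of quantiles
print(np.quantile(W_star, qs))               # should nearly coincide
```

That the printed quantiles nearly coincide reflects the discussion above: differing from $W_n$ only in the single summand selected by $I_n$, $W_n^*$ is close to $W_n$ in distribution, and both are close to normal.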
From (13) we see that the CLT should hold when the random variables $X_{I_n,n}$ and $X_{I_n,n}^*$ are both small asymptotically, since then the distribution of $W_n$ is close to that of $W_n^*$, making $W_n$ an approximate fixed point of the zero bias transformation. The following theorem shows that properly formalizing this notion of smallness results in a condition equivalent to Lindeberg's. Recall that we say a sequence of random variables $Y_n$ converges in probability to $Y$, and write $Y_n \to_p Y$, if
$$
\lim_{n \to \infty} P(|Y_n - Y| \ge \epsilon) = 0 \quad \text{for all } \epsilon > 0.
$$

Theorem 1.1. For a collection of random variables $\mathbf{X}_n$, $n = 1, 2, \ldots$, satisfying Condition 1.1, the small zero bias condition
$$
X_{I_n,n}^* \to_p 0 \tag{14}
$$
and the Lindeberg condition (8) are equivalent.
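As a concrete check of the theorem, consider again the i.i.d. summands $X_{i,n} = (B_i - p)/\sqrt{np(1-p)}$ of the simulation sketch above. By (10) and the scaling relation $(aX)^* =_d aX^*$,
$$
X_{I_n,n}^* =_d \frac{U}{\sqrt{np(1-p)}}, \qquad U \sim \mathcal{U}[-p, 1 - p],
$$
so that $|X_{I_n,n}^*| \le \max(p, 1-p)/\sqrt{np(1-p)} \to 0$ deterministically and the small zero bias condition (14) holds; the Lindeberg condition (8) holds as well, as the theorem requires, since the uniformly bounded summands eventually fall below any fixed truncation level.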