By the Cauchy-Schwarz inequality ($E|YZ| \le E(Y^2)^{1/2}E(Z^2)^{1/2}$), we have
$$\left[\mathrm{cov}\left(h(\mathbf{x}),\frac{\partial\ln f(\mathbf{x};\theta)}{\partial\theta}\right)\right]^2=\left[\frac{\partial E(h(\mathbf{x}))}{\partial\theta}\right]^2\le \mathrm{Var}(h(\mathbf{x}))\cdot \mathrm{Var}\left(\frac{\partial\ln f(\mathbf{x};\theta)}{\partial\theta}\right). \tag{3}$$
In light of (3), we have
$$\mathrm{Var}(h(\mathbf{x}))\ge\frac{\left[\frac{\partial E(h(\mathbf{x}))}{\partial\theta}\right]^2}{-E\left[\frac{\partial^2\ln f(\mathbf{x};\theta)}{\partial\theta^2}\right]}.$$
If the estimator is unbiased, $E(h(\mathbf{x}))=\theta$ and
$$\mathrm{Var}(h(\mathbf{x}))\ge\frac{1}{-E\left[\frac{\partial^2\ln f(\mathbf{x};\theta)}{\partial\theta^2}\right]}.$$

Example: For a random sample of size $n$ from a normal distribution, the Cramér-Rao variance lower bound for an unbiased estimator of $\theta=(\mu,\sigma^2)'$ is derived as follows.
$$f(x_1,x_2,\ldots,x_n;\theta)=f(\mathbf{x};\theta)=\prod_{i=1}^{n}\left\{(2\pi)^{-1/2}(\sigma^2)^{-1/2}\exp\left[-\frac{1}{2}\left(\frac{x_i-\mu}{\sigma}\right)^2\right]\right\}=(2\pi)^{-n/2}(\sigma^2)^{-n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right].$$
$$\ln f(x_1,x_2,\ldots,x_n;\theta)=-\frac{n}{2}\ln(2\pi)-\frac{n}{2}\ln\sigma^2-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2,$$
$$\frac{\partial\ln L}{\partial\mu}=\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu),\qquad \frac{\partial\ln L}{\partial\sigma^2}=-\frac{n}{2\sigma^2}+\frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2,$$
$$\frac{\partial^2\ln L}{\partial\mu^2}=-\frac{n}{\sigma^2},\qquad \frac{\partial^2\ln L}{\partial(\sigma^2)^2}=\frac{n}{2\sigma^4}-\frac{1}{\sigma^6}\sum_{i=1}^{n}(x_i-\mu)^2,\qquad \frac{\partial^2\ln L}{\partial\mu\,\partial\sigma^2}=-\frac{1}{\sigma^4}\sum_{i=1}^{n}(x_i-\mu).$$
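These derivatives are easy to check symbolically. Below is a minimal sympy sketch (not part of the original notes; the symbol names and the per-observation formulation are our own choices) that reproduces the score and, after applying $E(x_i-\mu)=0$ and $E(x_i-\mu)^2=\sigma^2$, the information matrix derived next.

import sympy as sp

x = sp.Symbol('x', real=True)
mu = sp.Symbol('mu', real=True)
s2 = sp.Symbol('sigma2', positive=True)
n = sp.Symbol('n', positive=True)

# Log-density of a single N(mu, sigma2) observation; the n-sample
# log-likelihood is the sum of n such terms.
logf = -sp.log(2*sp.pi)/2 - sp.log(s2)/2 - (x - mu)**2/(2*s2)

print(sp.simplify(sp.diff(logf, mu)))   # (x - mu)/sigma2
print(sp.simplify(sp.diff(logf, s2)))   # ((x - mu)^2 - sigma2)/(2*sigma2^2)

# Hessian of the per-observation log-density in (mu, sigma2)
H = sp.hessian(logf, (mu, s2))

# Take expectations: write x = mu + t, then use E(t) = 0, E(t^2) = sigma2
t = sp.Symbol('t', real=True)
EH = H.subs(x, mu + t).applyfunc(lambda e: sp.expand(e).subs(t**2, s2).subs(t, 0))
print(-n * EH)   # Matrix([[n/sigma2, 0], [0, n/(2*sigma2**2)]]) = I_n(theta)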
The sample information matrix is
$$I_n(\theta)=E\left[\frac{\partial\ln f(\mathbf{x};\theta)}{\partial\theta}\frac{\partial\ln f(\mathbf{x};\theta)}{\partial\theta'}\right]=E\left[-\frac{\partial^2\ln f(\mathbf{x};\theta)}{\partial\theta\,\partial\theta'}\right]=-E\begin{bmatrix}\frac{\partial^2\ln L}{\partial\mu^2} & \frac{\partial^2\ln L}{\partial\mu\,\partial\sigma^2}\\[4pt] \frac{\partial^2\ln L}{\partial\mu\,\partial\sigma^2} & \frac{\partial^2\ln L}{\partial(\sigma^2)^2}\end{bmatrix}=\begin{bmatrix}\frac{n}{\sigma^2} & 0\\[4pt] 0 & \frac{n}{2\sigma^4}\end{bmatrix}.$$
(How? Use $E(x_i-\mu)=0$ and $E(x_i-\mu)^2=\sigma^2$ when taking the expectation.) The Cramér-Rao variance lower bound therefore is
$$I_n^{-1}(\mu,\sigma^2)=\begin{bmatrix}\frac{\sigma^2}{n} & 0\\[4pt] 0 & \frac{2\sigma^4}{n}\end{bmatrix}.$$
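The bound can also be illustrated by simulation: the unbiased estimator $\bar{x}$ of $\mu$ attains its bound exactly ($\mathrm{Var}(\bar{x})=\sigma^2/n$), while the unbiased sample variance $s^2$ has variance $2\sigma^4/(n-1)$, slightly above its bound $2\sigma^4/n$. A short numpy sketch (the parameter values, sample size, and replication count below are arbitrary choices of ours):

import numpy as np

rng = np.random.default_rng(42)
n, mu, s2 = 20, 1.0, 2.0
reps = 200_000

# reps independent samples of size n from N(mu, s2)
x = rng.normal(mu, np.sqrt(s2), size=(reps, n))
xbar = x.mean(axis=1)            # unbiased estimator of mu
svar = x.var(axis=1, ddof=1)     # unbiased estimator of sigma^2

print(xbar.var(), s2 / n)           # ~0.100 vs. bound 0.100 (attained)
print(svar.var(), 2 * s2**2 / n)    # ~0.421 (= 2*s2^2/(n-1)) vs. bound 0.400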
2.1.3 Sufficiency

Efficiency can be seen as a property indicating that the estimator ``utilizes'' all the information contained in the statistical model. An important concept related to the information of a statistical model is that of a sufficient statistic, introduced by Fisher (1922) as a way to reduce the sampling information by discarding only the information of no relevance to any inference about $\theta$. In other words, a statistic $\tau(\mathbf{x})$ is said to be sufficient for $\theta$ if it makes no difference whether we use $\mathbf{x}$ or $\tau(\mathbf{x})$ in inference concerning $\theta$. Obviously, in such a case we would prefer to work with $\tau(\mathbf{x})$ instead of $\mathbf{x}$, the former being of lower dimensionality.

Definition 11 (sufficient statistic):
A statistic $\tau(\cdot): \mathcal{X} \to \mathbb{R}^m$, $n > m$ ($n$ is the size of a sample), is called sufficient for $\theta$ if the conditional distribution $f(\mathbf{x}\,|\,\tau(\mathbf{x})=\tau)$ is independent of $\theta$, i.e. $\theta$ does not appear in $f(\mathbf{x}\,|\,\tau(\mathbf{x})=\tau)$ and the domain of $f(\cdot)$ does not involve $\theta$.

Verifying this directly by deriving $f(\mathbf{x}\,|\,\tau(\mathbf{x})=\tau)$ and showing that it is independent of $\theta$ can be a very difficult exercise. One indirect way of verifying sufficiency