By convention, the derivatives of $g(\cdot)$ at the boundary points $a$ and $b$ are

$$g'(a) = \lim_{x \to 0^+} \frac{g(a+x) - g(a)}{x}, \qquad g'(b) = \lim_{x \to 0^-} \frac{g(b+x) - g(b)}{x}.$$

Similarly for the second derivatives $g''(a)$ and $g''(b)$ at the boundary points of the support $[a,b]$. For convenience, we further impose an additional condition on the kernel $K(\cdot)$, which will be maintained throughout this chapter.

Assumption 3.2 [Second Order Kernel with Bounded Support]: $K(u)$ is a positive kernel function with bounded support on $[-1,1]$.

This bounded support assumption is not necessary, but it simplifies the asymptotic analysis and interpretation.

2.1.2 Asymptotic Bias and Boundary Effect

Our purpose is to show that $\hat g(x)$ is a consistent estimator of $g(x)$ for a given point $x$ in the support. Now we decompose

$$\hat g(x) - g(x) = [E\hat g(x) - g(x)] + [\hat g(x) - E\hat g(x)].$$

It follows that the mean squared error of the kernel density estimator $\hat g(x)$ is given by

$$\mathrm{MSE}(\hat g(x)) = [E\hat g(x) - g(x)]^2 + E[\hat g(x) - E\hat g(x)]^2 = \mathrm{Bias}^2[\hat g(x)] + \mathrm{var}[\hat g(x)].$$

The first term is the squared bias of the estimator $\hat g(x)$, which is nonstochastic, and the second term is the variance of $\hat g(x)$ at the point $x$. We shall show that under suitable regularity conditions, both the bias and the variance of $\hat g(x)$ vanish as the sample size $T$ goes to infinity.
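The decomposition $\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{var}$ can be verified by simulation. Below is a minimal Monte Carlo sketch, not part of the text's development: it assumes an Epanechnikov kernel and IID Beta(2,5) data on $[0,1]$ (both illustrative choices, as are the function name `kde` and all parameter values), estimates $\hat g(x)$ at an interior point across many replications, and checks that squared bias plus variance reproduces the mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

def kde(x, data, h):
    # Epanechnikov kernel K(u) = 0.75*(1 - u^2) on [-1, 1]: a positive
    # second order kernel with bounded support, as in Assumption 3.2
    u = (x - data) / h
    return np.mean(np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)) / h

T, h, x, reps = 500, 0.10, 0.5, 2000    # sample size, bandwidth, point, replications
g_true = 30.0 * x * (1.0 - x)**4        # Beta(2,5) density on [0,1]; 1/B(2,5) = 30

estimates = np.array([kde(x, rng.beta(2.0, 5.0, size=T), h) for _ in range(reps)])
bias2 = (estimates.mean() - g_true)**2  # squared bias, estimated across replications
var = estimates.var()                   # variance (ddof=0 makes the identity exact)
mse = np.mean((estimates - g_true)**2)  # mean squared error
print(f"bias^2 + var = {bias2 + var:.6f}   MSE = {mse:.6f}")
```

Because the variance uses the population form (ddof=0), the identity holds exactly up to floating point; both sides still carry Monte Carlo error relative to the true MSE.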
We first consider the bias. For any given point $x$ in the interior region $[a+h, b-h]$ of the support $[a,b]$ of $X_t$, we have

$$
\begin{aligned}
E[\hat g(x)] - g(x) &= \frac{1}{T} \sum_{t=1}^{T} E K_h(x - X_t) - g(x) \\
&= E[K_h(x - X_t)] - g(x) \quad \text{(by identical distribution)} \\
&= \int_a^b \frac{1}{h} K\!\left(\frac{x-y}{h}\right) g(y)\, dy - g(x) \\
&= \int_{(a-x)/h}^{(b-x)/h} K(u)\, g(x + hu)\, du - g(x) \quad \text{(by change of variable } u = \tfrac{y-x}{h}\text{)} \\
&= \int_{-1}^{1} K(u)\, g(x + hu)\, du - g(x) \\
&= g(x) \int_{-1}^{1} K(u)\, du - g(x) + h g'(x) \int_{-1}^{1} u K(u)\, du + \frac{1}{2} h^2 \int_{-1}^{1} u^2 K(u)\, g''(x + \lambda hu)\, du \\
&= \frac{1}{2} h^2 C_K g''(x) + \frac{1}{2} h^2 \int_{-1}^{1} [g''(x + \lambda hu) - g''(x)]\, u^2 K(u)\, du \\
&= \frac{1}{2} h^2 C_K g''(x) + o(h^2),
\end{aligned}
$$

where $\lambda \in [0,1]$ arises from a second order Taylor expansion of $g(x+hu)$ around $x$, $C_K \equiv \int_{-1}^{1} u^2 K(u)\, du$, and the fifth equality uses the fact that $x \in [a+h, b-h]$ implies $(a-x)/h \le -1$ and $(b-x)/h \ge 1$, so the bounded support $[-1,1]$ of $K(\cdot)$ is fully covered. The second term in the seventh line satisfies

$$\int_{-1}^{1} [g''(x + \lambda hu) - g''(x)]\, u^2 K(u)\, du \to 0 \quad \text{as } h \to 0$$

by Lebesgue's dominated convergence theorem, given the boundedness and continuity of $g''(\cdot)$ and $\int_{-1}^{1} u^2 K(u)\, du < \infty$.

Therefore, for a point $x$ in the interior region $[a+h, b-h]$, the bias of $\hat g(x)$ is proportional to $h^2$. Thus, we must let $h \to 0$ as $T \to \infty$ in order for the bias to vanish as $T \to \infty$.

The above result for the bias is obtained under the identical distribution assumption on $\{X_t\}$. It is irrelevant whether $\{X_t\}$ is IID or serially dependent. In other words, it is robust to serial dependence in $\{X_t\}$.
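The $O(h^2)$ interior bias can be checked without simulation, since $E\hat g(x) = \int K(u) g(x+hu)\,du$ depends only on the kernel and the true density. The following sketch (again with the illustrative Epanechnikov kernel and Beta(2,5) density, our choices) compares the exact bias with the leading term $\frac{1}{2} h^2 C_K g''(x)$:

```python
import numpy as np

g   = lambda x: 30.0 * x * (1.0 - x)**4   # Beta(2,5) density on [0,1] (illustrative)
gpp = lambda x: 30.0 * (-8.0 * (1.0 - x)**3 + 12.0 * x * (1.0 - x)**2)  # second derivative
K   = lambda u: 0.75 * (1.0 - u**2)       # Epanechnikov kernel on [-1, 1]
C_K = 0.2                                 # int_{-1}^{1} u^2 K(u) du = 1/5 for this kernel

x = 0.5                                   # interior point: x lies in [h, 1-h] for all h below
u = np.linspace(-1.0, 1.0, 200001)
du = u[1] - u[0]
for h in [0.20, 0.10, 0.05]:
    # E[g_hat(x)] - g(x) = int_{-1}^{1} K(u) g(x + h u) du - g(x), via Riemann sum
    bias = np.sum(K(u) * g(x + h * u)) * du - g(x)
    lead = 0.5 * h**2 * C_K * gpp(x)      # the leading bias term (1/2) h^2 C_K g''(x)
    print(f"h={h:.2f}  exact bias={bias:+.6f}  leading term={lead:+.6f}")
# halving h divides the bias by roughly four, confirming the O(h^2) rate
```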
Question: What happens to the bias of $\hat g(x)$ if $x$ is outside the interior region $[a+h, b-h]$?

We say that $x$ is outside the interior region $[a+h, b-h]$ if $x \in [a, a+h)$ or $(b-h, b]$. These two regions are called the boundary regions of the support. Their sizes are equal to $h$ and so vanish to zero as the sample size $T$ increases.

Suppose $x = a + \lambda h \in [a, a+h)$, where $\lambda \in [0,1)$. We call such an $x$ a point in the left boundary region of the support $[a,b]$. Then

$$
\begin{aligned}
E[\hat g(x)] - g(x) &= E[K_h(x - X_t)] - g(x) \\
&= \frac{1}{h} \int_a^b K\!\left(\frac{x-y}{h}\right) g(y)\, dy - g(x) \\
&= \int_{(a-x)/h}^{(b-x)/h} K(u)\, g(x + hu)\, du - g(x) \\
&= \int_{-\lambda}^{1} K(u)\, g(x + hu)\, du - g(x) \\
&= g(x) \int_{-\lambda}^{1} K(u)\, du - g(x) + h \int_{-\lambda}^{1} u K(u)\, g'(x + \bar\lambda hu)\, du \\
&= g(x) \left[ \int_{-\lambda}^{1} K(u)\, du - 1 \right] + O(h) \\
&= O(1)
\end{aligned}
$$

where $\bar\lambda \in [0,1]$ arises from a first order Taylor expansion, and the $O(1)$ conclusion holds if $g(x)$ is bounded away from zero, that is, if $g(x) \ge \epsilon > 0$ for all $x \in [a,b]$ for some small but fixed constant $\epsilon$. Note that the $O(1)$ term arises since $\int_{-\lambda}^{1} K(u)\, du < 1$ for any $\lambda < 1$.

Thus, if $x \in [a, a+h)$ or $(b-h, b]$, the bias $E[\hat g(x)] - g(x)$ may never vanish even if $h \to 0$. This is due to the fact that there is no symmetric coverage of observations in the boundary regions $[a, a+h)$ and $(b-h, b]$. This phenomenon is called the boundary effect or boundary problem of kernel estimation. A numerical illustration is given below.
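For Uniform$[0,1]$ data the boundary bias at $x = \lambda h$ has a closed form that is free of $h$, which makes the $O(1)$ behavior easy to see. A minimal sketch, with our illustrative setup (Epanechnikov kernel, uniform density):

```python
import numpy as np

K = lambda u: 0.75 * (1.0 - u**2)   # Epanechnikov kernel on [-1, 1] (illustrative)
# Take X_t ~ Uniform[0,1], so g(x) = 1 is bounded away from zero.  At a left-boundary
# point x = lambda*h, the change of variable gives E[g_hat(x)] = int_{-lambda}^{1} K(u) du,
# which does not depend on h, so the bias g(x)*[int_{-lambda}^{1} K(u) du - 1] is O(1).
for lam in [0.0, 0.25, 0.50, 0.75, 1.0]:
    u = np.linspace(-lam, 1.0, 100001)
    mass = np.sum(K(u)) * (u[1] - u[0])   # int_{-lambda}^{1} K(u) du < 1 when lambda < 1
    print(f"lambda={lam:.2f}  bias = {mass - 1.0:+.4f}")
# at lambda = 0 the bias is -0.5 no matter how small h is; it vanishes only at lambda = 1
```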
There have been several solutions proposed in the smoothed nonparametric literature. These include the following methods.

Trimming Observations: Do not use the estimate $\hat g(x)$ when $x$ is in the boundary regions. That is, only estimate and use the densities for points in the interior region $[a+h, b-h]$.

This approach has a drawback: valuable information may be lost, because $\hat g(x)$ in the boundary regions contains information on the tail distribution of $\{X_t\}$, which is particularly important to financial economists (e.g., extreme downside market risk) and welfare economists (e.g., the low-income population).

Using a Boundary Kernel: Modify the kernel $K[(x - X_t)/h]$ when (and only when) $x$ is in the boundary regions, so that the kernel becomes location-dependent there. For example, Hong and Li (2005) use a simple kernel-based density estimator (written here for support $[0,1]$)

$$\hat g(x) = \frac{1}{T} \sum_{t=1}^{T} K_h(x, X_t),$$

where

$$
K_h(x, y) \equiv
\begin{cases}
h^{-1} K\!\left(\frac{x-y}{h}\right) \Big/ \int_{-(x/h)}^{1} K(u)\, du, & \text{if } x \in [0, h), \\[4pt]
h^{-1} K\!\left(\frac{x-y}{h}\right), & \text{if } x \in [h, 1-h], \\[4pt]
h^{-1} K\!\left(\frac{x-y}{h}\right) \Big/ \int_{-1}^{(1-x)/h} K(u)\, du, & \text{if } x \in (1-h, 1],
\end{cases}
$$

and $K(\cdot)$ is a standard second order kernel. The idea is to modify the kernel function in the boundary regions so that the integral of the kernel function is unity. Then the bias is $O(h^2)$ for all $x \in [a+h, b-h]$ in the interior region and is at most $O(h)$ for $x \in [a, a+h)$ and $(b-h, b]$ in the boundary regions. The advantage of this method is that it is very simple and always gives positive density estimates. The drawback is that the bias in the boundary regions can be as slow as $O(h)$, which is slower than the $O(h^2)$ rate in the interior region. A code sketch follows.
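Below is a minimal sketch of this boundary-kernel estimator on support $[0,1]$, assuming an Epanechnikov kernel; the helper names (`kernel_mass`, `g_hat_boundary`) are ours, not Hong and Li's. Without the rescaling, the estimate at $x = 0$ would be biased toward one half of the true density.

```python
import numpy as np

def K(u):
    # Epanechnikov kernel, support [-1, 1] (illustrative second order kernel)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_mass(lo, hi, n=20001):
    # numerical value of int_{lo}^{hi} K(u) du, by Riemann sum
    u = np.linspace(lo, hi, n)
    return np.sum(K(u)) * (u[1] - u[0])

def g_hat_boundary(x, data, h):
    # boundary-kernel estimator on support [0, 1]: near an edge, divide by the
    # kernel mass actually covered so the effective kernel integrates to one
    vals = K((x - data) / h) / h
    if x < h:                                    # left boundary region [0, h)
        vals = vals / kernel_mass(-(x / h), 1.0)
    elif x > 1.0 - h:                            # right boundary region (1-h, 1]
        vals = vals / kernel_mass(-1.0, (1.0 - x) / h)
    return np.mean(vals)

rng = np.random.default_rng(0)
data = rng.uniform(size=2000)                    # true density g(x) = 1 on [0, 1]
h = 0.10
for x in [0.0, 0.05, 0.50, 0.95, 1.0]:
    print(f"x={x:.2f}  g_hat={g_hat_boundary(x, data, h):.3f}")  # close to 1 at the edges
```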
Using a Jackknife Kernel: For $x$ in the interior region $[a+h, b-h]$, use the standard positive kernel $K(\cdot)$. For $x$ in the boundary regions $[a, a+h)$ and $(b-h, b]$, use the following jackknife kernel

$$K_\xi(u) \equiv (1 + r)\, \frac{K(u)}{\omega_K(0, \xi)} - \frac{r}{\alpha}\, \frac{K(u/\alpha)}{\omega_K(0, \xi/\alpha)},$$

where $\omega_K(l, \xi) \equiv \int_{-1}^{\xi} u^l K(u)\, du$ for $l = 0, 1$, and $r \equiv r(\xi)$ and $\alpha \equiv \alpha(\xi)$ depend on the parameter $\xi \in [0,1]$. When $x \in [a, a+h)$, we have $\xi = (x-a)/h$; when $x \in (b-h, b]$, we have $\xi = (b-x)/h$. In both cases, we set

$$r = \frac{\omega_K(1, \xi)/\omega_K(0, \xi)}{\alpha\, \omega_K(1, \xi/\alpha)/\omega_K(0, \xi/\alpha) - \omega_K(1, \xi)/\omega_K(0, \xi)},$$

which makes the first moment of $K_\xi(\cdot)$ over the boundary region vanish. As suggested in Rice (1986), we set $\alpha = 2 - \xi$. Given $\xi \in [0,1]$, the support of $K_\xi(\cdot)$ is $[-\alpha, \alpha]$. Consequently, for any $\xi \in [0,1]$,

$$\int_{-\alpha}^{\xi} K_\xi(u)\, du = 1, \quad \int_{-\alpha}^{\xi} u K_\xi(u)\, du = 0, \quad \int_{-\alpha}^{\xi} u^2 K_\xi(u)\, du > 0, \quad \int_{-\alpha}^{\xi} K_\xi^2(u)\, du > 0.$$

The bias is then $O(h^2)$ for all points $x \in [a,b]$, including those in the boundary regions. We note that the jackknife kernel formula in Härdle (1990, Section 4.4) is incorrect.

Data Reflection: The reflection method constructs the kernel density estimate from an augmented data set that combines the "reflected" data $\{-X_t\}_{t=1}^{T}$ with the original data $\{X_t\}_{t=1}^{T}$, where the support of $X_t$ is $[0,1]$. Suppose $x \in [0, h)$ is a point in the left boundary region. Then the reflection method gives the estimator

$$\hat g(x) = \frac{1}{T} \sum_{t=1}^{T} K_h(x - X_t) + \frac{1}{T} \sum_{t=1}^{T} K_h[x - (-X_t)].$$

Note that given the support $[-1,1]$ of the kernel $K(\cdot)$, when $x$ is away from the boundary, the second term is zero. Hence, this method only corrects the density estimate in the boundary region. See Schuster (1985, Communications in Statistics: Theory and Methods) and Hall and Wehrly (1991, Journal of the American Statistical Association). This method has been extended by Chen and Hong (2012) and Hong, Sun and Wang (2018) to estimate time-varying functions (i.e., deterministic functions of time).

Question: What is the general formula for the kernel density estimator when the support of $X_t$ is $[a,b]$ rather than $[0,1]$?

Suppose $x$ is a point in the left boundary region $[a, a+h)$. Then the density estimator becomes

$$\hat g(x) = \frac{1}{T} \sum_{t=1}^{T} K_h(x - X_t) + \frac{1}{T} \sum_{t=1}^{T} K_h[x - (a - (X_t - a))],$$

where $a - (X_t - a) = 2a - X_t$ is the reflection of $X_t$ about the left endpoint $a$. A code sketch of the reflection estimator is given below.
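The following is a minimal sketch of the reflection estimator, reflecting about both endpoints at once (a natural two-sided extension of the one-sided formulas above); the function name `g_hat_reflect` and the Epanechnikov kernel are our illustrative choices.

```python
import numpy as np

def K(u):
    # Epanechnikov kernel, support [-1, 1] (illustrative choice)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def g_hat_reflect(x, data, h, a=0.0, b=1.0):
    # reflection estimator on support [a, b]: augment the sample with its mirror
    # images 2a - X_t and 2b - X_t about the two endpoints; away from the
    # boundaries the reflected points lie more than h from x and contribute zero
    augmented = np.concatenate([data, 2.0 * a - data, 2.0 * b - data])
    # divide by T (not 3T): each reflected copy only matters within h of an edge
    return np.sum(K((x - augmented) / h)) / (len(data) * h)

rng = np.random.default_rng(0)
data = rng.uniform(size=2000)          # a = 0, b = 1, true density g(x) = 1
h = 0.10
for x in [0.0, 0.05, 0.50, 0.95, 1.0]:
    print(f"x={x:.2f}  g_hat={g_hat_reflect(x, data, h):.3f}")  # near 1 across [0, 1]
```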