In general, we do not assume that the functional form of $r_1(x)$ is known, except that we still maintain the assumption that $r_1(x)$ is a square-integrable function. Because $r_1(x)$ is square-integrable, we have
\[
\begin{aligned}
\int_{-\infty}^{\infty} r_1^2(x)\,dx
&= \sum_{j=0}^{\infty}\sum_{k=0}^{\infty}\alpha_j\alpha_k\int_{-\infty}^{\infty}\psi_j(x)\psi_k(x)\,dx \\
&= \sum_{j=0}^{\infty}\sum_{k=0}^{\infty}\alpha_j\alpha_k\,\delta_{jk} \qquad\text{by orthonormality} \\
&= \sum_{j=0}^{\infty}\alpha_j^2 < \infty,
\end{aligned}
\]
where $\delta_{jk}$ is the Kronecker delta: $\delta_{jk}=1$ if $j=k$ and $0$ otherwise. Square summability implies $\alpha_j \to 0$ as $j \to \infty$; that is, $\alpha_j$ becomes less important as the order $j \to \infty$.

This suggests that a truncated sum
\[
r_{1p}(x) = \sum_{j=0}^{p}\alpha_j\psi_j(x)
\]
can be used to approximate $r_1(x)$ arbitrarily well if $p$ is sufficiently large. The approximation error, or the bias,
\[
b_p(x) \equiv r_1(x) - r_{1p}(x) = \sum_{j=p+1}^{\infty}\alpha_j\psi_j(x) \to 0 \quad\text{as } p \to \infty.
\]
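To make the truncation idea concrete, here is a minimal numerical sketch (an illustration added here, not part of the original notes). It approximates an arbitrarily chosen square-integrable function on $[0,1]$ with the cosine orthonormal basis $\psi_0(x)=1$, $\psi_j(x)=\sqrt{2}\cos(j\pi x)$, and reports how the $L^2$ norm of the bias $b_p$ shrinks as the truncation order $p$ grows. The target function, the basis, and the grid-based integration are illustrative assumptions.

import numpy as np

# Cosine orthonormal basis on [0, 1]: psi_0(x) = 1, psi_j(x) = sqrt(2) * cos(j*pi*x).
def psi(j, x):
    return np.ones_like(x) if j == 0 else np.sqrt(2.0) * np.cos(j * np.pi * x)

# Illustrative square-integrable target function (an assumption, not from the notes).
def r1(x):
    return np.sin(2 * np.pi * x) + 0.5 * x**2

x = np.linspace(0.0, 1.0, 2001)   # grid used for numerical integration

for p in (2, 5, 10, 20):
    # Fourier coefficients alpha_j = int_0^1 r1(x) psi_j(x) dx (trapezoid rule).
    alpha = [np.trapz(r1(x) * psi(j, x), x) for j in range(p + 1)]
    # Truncated sum r_{1p}(x) = sum_{j=0}^p alpha_j psi_j(x).
    r1p = sum(a * psi(j, x) for j, a in enumerate(alpha))
    # L2 norm of the bias b_p(x) = r1(x) - r_{1p}(x); it shrinks as p grows.
    bias_l2 = np.sqrt(np.trapz((r1(x) - r1p) ** 2, x))
    print(f"p = {p:2d}, ||b_p||_L2 = {bias_l2:.5f}")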
However, the coefficients $\alpha_j$ are unknown. To obtain a feasible estimator for $r_1(x)$, we consider the following sequence of truncated regression models
\[
X_t = \sum_{j=0}^{p}\alpha_j\psi_j(X_{t-1}) + \varepsilon_{pt},
\]
where $p \equiv p(T) \to \infty$ is the number of series terms, which depends on the sample size $T$. We need $p/T \to 0$ as $T \to \infty$; that is, $p$ is much smaller than the sample size $T$. Note that the regression error $\varepsilon_{pt}$ is not the same as the true innovation $\varepsilon_t$ for each given $p$. Since $X_t = r_1(X_{t-1}) + \varepsilon_t = r_{1p}(X_{t-1}) + b_p(X_{t-1}) + \varepsilon_t$, we have $\varepsilon_{pt} = \varepsilon_t + b_p(X_{t-1})$; that is, $\varepsilon_{pt}$ contains both the true innovation $\varepsilon_t$ and the bias $b_p(X_{t-1})$.
The ordinary least squares estimator is
\[
\hat{\alpha} = (\Psi'\Psi)^{-1}\Psi'X
= \left(\sum_{t=2}^{T}\psi_t\psi_t'\right)^{-1}\sum_{t=2}^{T}\psi_t X_t,
\]
where $\Psi = (\psi_1,\ldots,\psi_T)'$ is a $T \times p$ matrix, and $\psi_t = [\psi_0(X_{t-1}),\psi_1(X_{t-1}),\ldots,\psi_p(X_{t-1})]'$ is a $p \times 1$ vector. The series-based regression estimator is
\[
\hat{r}_{1p}(x) = \sum_{j=0}^{p}\hat{\alpha}_j\psi_j(x).
\]
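As an illustration of this series least squares estimator, the following Python sketch simulates a simple nonlinear autoregression and fits the truncated model by OLS. The Hermite-polynomial basis (via numpy's hermevander), the data-generating process $X_t = \tanh(X_{t-1}) + \varepsilon_t$, and the choice $p = 6$ are assumptions made purely for demonstration.

import numpy as np
from numpy.polynomial.hermite_e import hermevander  # probabilists' Hermite polynomial basis

rng = np.random.default_rng(0)

# Simulate an illustrative nonlinear AR(1) process X_t = tanh(X_{t-1}) + eps_t.
T = 2000
X = np.zeros(T)
for t in range(1, T):
    X[t] = np.tanh(X[t - 1]) + 0.5 * rng.standard_normal()

p = 6                               # truncation order (the smoothing parameter)
Psi = hermevander(X[:-1], p)        # (T-1) x (p+1) regressor matrix with entries psi_j(X_{t-1})
y = X[1:]                           # dependent variable X_t

# OLS: alpha_hat = (Psi'Psi)^{-1} Psi'y, computed via least squares.
alpha_hat, *_ = np.linalg.lstsq(Psi, y, rcond=None)

# Series estimator r1p_hat(x) = sum_j alpha_hat_j psi_j(x), evaluated on a grid.
xg = np.linspace(-2.0, 2.0, 9)
r1p_hat = hermevander(xg, p) @ alpha_hat
for xi, ri in zip(xg, r1p_hat):
    print(f"x = {xi:5.2f}   r1p_hat(x) = {ri:6.3f}   true r1(x) = {np.tanh(xi):6.3f}")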
To ensure that $\hat{r}_{1p}(x)$ is asymptotically unbiased, we must let $p = p(T) \to \infty$ as $T \to \infty$ (e.g., $p = \sqrt{T}$). However, if $p$ is too large, the number of estimated parameters will be too large, and as a consequence, the sampling variation of $\hat{\alpha}$ will be large (i.e., the estimator $\hat{\alpha}$ is imprecise). We must choose an appropriate $p = p(T)$ so as to balance the bias and the sampling variation. The truncation order $p$ is called a smoothing parameter because it controls the smoothness of the estimated function $\hat{r}_{1p}(x)$. In general, for any given sample, a small $p$ will give a smooth estimated curve whereas a large $p$ will give a wiggly estimated curve. If $p$ is too small, so that the variance of $\hat{r}_{1p}(x)$ is smaller than its squared bias, we say that there is oversmoothing. In contrast, if $p$ is too large, so that the variance of $\hat{r}_{1p}(x)$ is larger than its squared bias, we say that there is undersmoothing. Optimal smoothing is achieved when the variance of $\hat{r}_{1p}(x)$ balances its squared bias; the trade-off is illustrated in the sketch below. The series estimator $\hat{r}_{1p}(x)$ is called a global smoothing method, because once $p$ is given, the estimated function $\hat{r}_{1p}(x)$ is determined over the entire domain of $X_t$.

Under suitable regularity conditions, $\hat{r}_{1p}(x)$ will consistently estimate the unknown function $r_1(x)$ as the sample size $T$ increases. This is called nonparametric estimation because no parametric functional form is imposed on $r_1(x)$.

The basis functions $\{\psi_j(\cdot)\}$ can be the Fourier series (i.e., the sine and cosine functions) or B-spline functions if $X_t$ has bounded support. See, for example, Andrews (1991, Econometrica) and Hong and White (1995, Econometrica) for applications.
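Referring back to the smoothing-parameter discussion above, this short continuation of the simulated example contrasts a very small and a fairly large truncation order on one sample: the in-sample residual variance falls mechanically as $p$ grows, even though a large $p$ mainly adds sampling noise to the fitted curve. The particular values of $p$ and the single simulated sample are illustrative assumptions only.

import numpy as np
from numpy.polynomial.hermite_e import hermevander

rng = np.random.default_rng(1)
T = 400
X = np.zeros(T)
for t in range(1, T):
    X[t] = np.tanh(X[t - 1]) + 0.5 * rng.standard_normal()

for p in (1, 4, 12):                        # too small, moderate, and fairly large p
    Psi = hermevander(X[:-1], p)
    alpha_hat, *_ = np.linalg.lstsq(Psi, X[1:], rcond=None)
    resid_var = np.var(X[1:] - Psi @ alpha_hat)              # falls mechanically as p grows
    fit_at_0 = hermevander(np.array([0.0]), p) @ alpha_hat   # true value r_1(0) = tanh(0) = 0
    print(f"p = {p:2d}   in-sample residual variance = {resid_var:.4f}   r1p_hat(0) = {fit_at_0[0]: .4f}")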
Example 2 [Probability Density Function]: Suppose the PDF $g(x)$ of $X_t$ is a smooth function with unbounded support. We can expand
\[
g(x) = \phi(x)\sum_{j=0}^{\infty}\beta_j H_j(x),
\]
where the function
\[
\phi(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}x^2\right)
\]
is the $N(0,1)$ density function, and $\{H_j(x)\}$ is the sequence of Hermite polynomials, defined by
\[
(-1)^j\frac{d^j}{dx^j}\Phi(x) = -H_{j-1}(x)\phi(x) \quad\text{for } j > 0,
\]
where $\Phi(\cdot)$ is the $N(0,1)$ CDF. For example,
\[
\begin{aligned}
H_0(x) &= 1,\\
H_1(x) &= x,\\
H_2(x) &= x^2 - 1,\\
H_3(x) &= x(x^2 - 3),\\
H_4(x) &= x^4 - 6x^2 + 3.
\end{aligned}
\]
See, for example, Magnus, Oberhettinger and Soni (1966, Section 5.6) and Abramowitz and Stegun (1972, Ch. 22).
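These are the probabilists' Hermite polynomials, which satisfy the standard three-term recursion $H_{j+1}(x) = xH_j(x) - jH_{j-1}(x)$ with $H_0(x) = 1$ and $H_1(x) = x$. The following small sketch (added for illustration) generates them by this recursion and checks the result against the closed forms listed above.

import numpy as np

def hermite(j, x):
    """Probabilists' Hermite polynomial H_j(x) via the three-term recursion."""
    h_prev, h = np.ones_like(x), x.copy()        # H_0(x) = 1, H_1(x) = x
    if j == 0:
        return h_prev
    for k in range(1, j):                        # H_{k+1}(x) = x H_k(x) - k H_{k-1}(x)
        h_prev, h = h, x * h - k * h_prev
    return h

x = np.linspace(-3.0, 3.0, 7)
# Compare against the closed forms listed above.
print(np.allclose(hermite(2, x), x**2 - 1))             # True
print(np.allclose(hermite(3, x), x**3 - 3 * x))         # True
print(np.allclose(hermite(4, x), x**4 - 6 * x**2 + 3))  # True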
Here, the Fourier coefficient
\[
\beta_j = \int_{-\infty}^{\infty} g(x)H_j(x)\phi(x)\,dx.
\]
Again, $\beta_j \to 0$ as $j \to \infty$, given $\sum_{j=0}^{\infty}\beta_j^2 < \infty$.

The $N(0,1)$ PDF $\phi(x)$ is the leading term to approximate the unknown density $g(x)$, and the Hermite polynomial series will capture departures from normality (e.g., skewness and heavy tails).

To estimate $g(x)$, we can consider the sequence of truncated probability densities
\[
g_p(x) = C_p^{-1}\phi(x)\sum_{j=0}^{p}\beta_j H_j(x),
\]
where the constant
\[
C_p = \sum_{j=0}^{p}\beta_j\int H_j(x)\phi(x)\,dx
\]
is a normalization factor to ensure that $g_p(x)$ is a PDF for each $p$. The unknown parameters $\{\beta_j\}$ can be estimated from the sample $\{X_t\}_{t=1}^{T}$ via the maximum likelihood estimation (MLE) method. For example, suppose $\{X_t\}$ is an IID sample. Then
\[
\hat{\beta} = \arg\max_{\beta}\sum_{t=1}^{T}\ln g_p(X_t).
\]
To ensure that
\[
\hat{g}_p(x) = \hat{C}_p^{-1}\phi(x)\sum_{j=0}^{p}\hat{\beta}_j H_j(x)
\]
is asymptotically unbiased, we must let $p = p(T) \to \infty$ as $T \to \infty$. However, $p$ must grow more slowly than the sample size $T$, so that the sampling variation of $\hat{\beta}$ will not be too large.

For the use of Hermite polynomial series expansions, see (e.g.) Gallant and Tauchen (1996, Econometric Theory), Aït-Sahalia (2002, Econometrica), and Cui, Hong and Li (2020).
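A minimal sketch of this estimation step, assuming an IID sample, a fixed small $p$, the normalization $\beta_0 = 1$ (so that $C_p = 1$, since $\int H_j(x)\phi(x)\,dx = 0$ for $j \geq 1$), and a generic numerical optimizer from scipy. Because a truncated expansion need not be nonnegative, the sketch crudely clips it inside the logarithm; all of these choices are illustrative assumptions rather than part of the notes.

import numpy as np
from numpy.polynomial.hermite_e import hermeval   # evaluates sum_j beta_j H_j(x)
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
X = rng.standard_t(df=6, size=1000)   # illustrative IID sample (mildly heavy-tailed)

p = 4                                 # truncation order, fixed here for illustration

def g_p(x, beta_tail):
    """Truncated Hermite density with beta_0 = 1 fixed, so that C_p = 1."""
    beta = np.concatenate(([1.0], beta_tail))
    dens = norm.pdf(x) * hermeval(x, beta)    # phi(x) * sum_{j=0}^p beta_j H_j(x)
    return np.maximum(dens, 1e-300)           # crude guard against negative values

def neg_loglik(beta_tail):
    return -np.sum(np.log(g_p(X, beta_tail)))

res = minimize(neg_loglik, x0=np.zeros(p), method="Nelder-Mead")
print("estimated (beta_1, ..., beta_p):", np.round(res.x, 4))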
Question: What are the advantages of nonparametric smoothing methods?

They require few assumptions or restrictions on the data generating process. In particular, they do not assume a specific functional form for the function of interest (of course, certain smoothness conditions such as differentiability are required). They can deliver a consistent estimator for the unknown function, no matter whether it is linear or nonlinear. Thus, nonparametric methods can effectively reduce potential systematic biases due to model misspecification, which is more likely to be encountered in parametric modeling.

Question: What are the disadvantages of nonparametric methods?

• Nonparametric methods require a large data set for reasonable estimation. Furthermore, there exists the notorious problem of the "curse of dimensionality" when the function of interest contains multiple explanatory variables. This will be explained below.

• There exists another notorious problem, the "boundary effect," for nonparametric estimation near the boundary regions of the support. This occurs due to asymmetric coverage of data in the boundary regions.
• Coefficients are usually difficult to interpret from an economic point of view.

• There exists a danger of potential overfitting, in the sense that a nonparametric method, due to its flexibility, tends to capture non-essential features of a data set which will not appear in out-of-sample scenarios.

The above two motivating examples are the so-called orthogonal series expansion methods. There are other nonparametric methods, such as spline smoothing, kernel smoothing, k-nearest neighbor estimation, and local polynomial smoothing. As mentioned earlier, series expansion methods are examples of so-called global smoothing, because the coefficients are estimated using all observations, and they are then used to evaluate the underlying function at all points in the support of $X_t$. A nonparametric series model is an increasing sequence of parametric models as the sample size $T$ grows. In this sense, it is also called a sieve estimator. In contrast, kernel and local polynomial methods are examples of the so-called local smoothing methods, because estimation only requires the observations in a neighborhood of the point of interest. Below we will mainly focus on kernel and local polynomial smoothing methods, due to their simplicity and intuitive nature.

2 Kernel Density Method

2.1 Univariate Density Estimation

Suppose $\{X_t\}$ is a strictly stationary time series process with unknown marginal PDF $g(x)$.

Question: How can we estimate the marginal PDF $g(x)$ of the time series process $\{X_t\}$?
We first consider a parametric approach. Assume that $g(x)$ is an $N(\mu,\sigma^2)$ PDF with unknown $\mu$ and $\sigma^2$. Then we know the functional form of $g(x)$ up to two unknown parameters $\theta = (\mu,\sigma^2)'$:
\[
g(x;\theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{1}{2\sigma^2}(x-\mu)^2\right], \qquad -\infty < x < \infty.
\]
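As a quick illustration of this parametric approach (an added sketch with an arbitrarily chosen sample), the two parameters can be estimated by their Gaussian maximum likelihood estimates, the sample mean and the divide-by-$T$ sample variance, and then plugged into $g(x;\theta)$.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=2.0, size=500)   # illustrative sample

# Gaussian MLEs: sample mean and divide-by-T sample variance.
mu_hat = X.mean()
sigma2_hat = X.var()                           # ddof=0 gives the MLE of sigma^2

# Plug-in parametric density estimate g(x; theta_hat) on a small grid.
xg = np.linspace(X.min(), X.max(), 5)
print("mu_hat =", round(mu_hat, 3), " sigma2_hat =", round(sigma2_hat, 3))
print(norm.pdf(xg, loc=mu_hat, scale=np.sqrt(sigma2_hat)))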