Ch. 11 Panel Data Model

Data sets that combine time series and cross sections are common in econometrics. For example, the published statistics of the OECD contain numerous series of economic aggregates observed yearly for many countries. The PSID (Panel Study of Income Dynamics) is a study of roughly 6000 families and 15000 individuals who have been interviewed periodically from 1968 to the present. Panel data sets are more oriented toward cross-section analysis; they are wide but typically relatively short. Heterogeneity across units is an integral part of the analysis.

Recall that the (multiple) linear model is used to study the relationship between a dependent variable and several independent variables. That is,

y = f(x_1, x_2, ..., x_k) + ε = β_1 x_1 + β_2 x_2 + ... + β_k x_k + ε = x'β + ε,

where y is the dependent or explained variable, x_i, i = 1, ..., k, are the independent or explanatory variables, and β_i, i = 1, ..., k, are unknown coefficients that we are interested in learning about, either through estimation or through hypothesis testing. The term ε is an unobservable random disturbance.

In the following, we will see that panel data sets provide a richer source of information, but also call for more complex stochastic specifications. The fundamental advantage of a panel data set over a cross section is that it allows the researcher greater flexibility in modeling differences in behavior across individuals. The basic framework for this statistical model is of the form

y_it = x_it'β + z_i'α + ε_it,  i = 1, 2, ..., N;  t = 1, 2, ..., T.

There are k regressors in x_it, not including a constant term. The heterogeneity, or individual effect, is z_i'α, where z_i contains a constant term and a set of individual or group specific variables, which may be observed, such as race, sex, location, and so on, or unobserved, such as family specific characteristics, individual heterogeneity in skill or preference, and so on, all of which are taken to be constant over time t.
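To make the framework concrete, here is a minimal simulated sketch in Python. It is not from the text: all numbers are invented for illustration, and the unobserved individual effect z_i'α is collapsed into a single scalar a_i per unit.

```python
import numpy as np

# A minimal sketch (assumed values): simulate a balanced panel of the form
# y_it = x_it'beta + z_i'alpha + e_it, collapsing the unobserved individual
# effect z_i'alpha into one scalar a_i per unit, constant over t.
rng = np.random.default_rng(0)
N, T, k = 6, 4, 2                      # units, periods, regressors

beta = np.array([1.5, -0.8])           # common slope coefficients
a = rng.normal(size=N)                 # individual effect, constant over t

X = rng.normal(size=(N, T, k))         # regressors vary over i and t
eps = rng.normal(scale=0.1, size=(N, T))
y = X @ beta + a[:, None] + eps        # broadcast a_i across the T periods

print(y.shape)                         # (6, 4): N units, each observed T times
```

The (N, T) layout makes the "wide but short" shape of a typical panel visible: many units i, few periods t.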
The various cases we will consider are:

1. Pooled Regression: If z_i contains only a constant term, then there are no individual specific characteristics in this model. All we need is to pool the data,

   y_it = x_it'β + α + ε_it,  i = 1, 2, ..., N;  t = 1, 2, ..., T,

   and OLS provides consistent and efficient estimates of the common β and α.

2. Fixed Effects: If z_i'α = α_i, then the fixed effects approach takes α_i as a group-specific constant term in the regression model,

   y_it = x_it'β + α_i + ε_it,  i = 1, 2, ..., N;  t = 1, 2, ..., T.

3. Random Effects: If the unobserved individual heterogeneity can be assumed to be uncorrelated with the included variables, then the model may be formulated as

   y_it = x_it'β + E(z_i'α) + [z_i'α − E(z_i'α)] + ε_it
        = x_it'β + α + u_i + ε_it,  i = 1, 2, ..., N;  t = 1, 2, ..., T.

   The random effects approach specifies that u_i is a group-specific random element, similar to ε_it, except that for each group there is but a single draw that enters the regression identically in each period.

1 Fixed Effects

This formulation of the model assumes that differences across units can be captured in differences in the constant term. Each α_i is treated as an unknown parameter to be estimated. Let y_i and X_i be the T observations on the ith unit, let i be a T × 1 column of ones, and let ε_i be the associated T × 1 vector of disturbances. Then

y_i = X_i β + i α_i + ε_i,  i = 1, 2, ..., N.

It is also assumed that the disturbance terms are well behaved, that is,

E(ε_i) = 0;  E(ε_i ε_i') = σ² I_T;  and  E(ε_i ε_j') = 0 if i ≠ j.
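A small sketch of this unit-by-unit form, again with invented values: each unit contributes a block y_i = X_iβ + iα_i + ε_i, and stacking the N blocks produces exactly the NT-row system written out next.

```python
import numpy as np

# Sketch (assumed simulated values): build the fixed effects model
# y_i = X_i beta + i alpha_i + e_i one unit at a time, then stack.
rng = np.random.default_rng(0)
N, T, k = 6, 4, 2
beta = np.array([1.5, -0.8])
alpha = rng.normal(size=N)                   # one constant term alpha_i per unit
ones = np.ones(T)                            # the T x 1 column i of ones

X_blocks, y_blocks = [], []
for i in range(N):
    X_i = rng.normal(size=(T, k))            # T observations on unit i
    eps_i = rng.normal(scale=0.1, size=T)    # well-behaved disturbances
    y_blocks.append(X_i @ beta + ones * alpha[i] + eps_i)
    X_blocks.append(X_i)

# Stacking the N units gives the NT x 1 and NT x k arrays used below.
y, X = np.concatenate(y_blocks), np.vstack(X_blocks)
print(y.shape, X.shape)                      # (24,) and (24, 2), i.e. NT rows
```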
Observations on all the cross-section units can be rewritten as

\[
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix} β
+
\begin{bmatrix} i & 0 & \cdots & 0 \\ 0 & i & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & i \end{bmatrix}
\begin{bmatrix} α_1 \\ α_2 \\ \vdots \\ α_N \end{bmatrix}
+
\begin{bmatrix} ε_1 \\ ε_2 \\ \vdots \\ ε_N \end{bmatrix},
\]

or, in more compact form,

y = Xβ + Dα + ε,

where y and ε are NT × 1, X is NT × k, β is k × 1, and D = [d_1 d_2 ... d_N] is NT × N, with d_i a dummy variable indicating the ith unit. This model is usually referred to as the least squares dummy variable (LSDV) model. Since this model satisfies the ideal conditions, the OLS estimator is BLUE. By using the familiar partitioned regression of Ch. 6, the slope estimator would be

β̂ = (X' M_D X)^{-1} X' M_D y,  where M_D = I_NT − D(D'D)^{-1} D'.

Lemma:

\[
M_D = I_NT − D(D'D)^{-1}D' =
\begin{bmatrix} M_0 & 0 & \cdots & 0 \\ 0 & M_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & M_0 \end{bmatrix},
\]

where M_0 = I_T − (1/T) ii' is the demeaning matrix.
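The LSDV slope formula can be checked directly on simulated data. The sketch below builds D with a Kronecker product and applies β̂ = (X'M_D X)^{-1}X'M_D y; the data and parameter values are assumptions for illustration only.

```python
import numpy as np

# Sketch of the LSDV computation on simulated data (values assumed): build
# the dummy matrix D and apply the partitioned-regression slope formula
# beta_hat = (X'M_D X)^{-1} X'M_D y with M_D = I_NT - D(D'D)^{-1}D'.
rng = np.random.default_rng(0)
N, T, k = 6, 4, 2
beta = np.array([1.5, -0.8])
X = rng.normal(size=(N * T, k))              # stacked regressors, NT x k
alpha = rng.normal(size=N)
D = np.kron(np.eye(N), np.ones((T, 1)))      # NT x N dummies [d_1 ... d_N]
y = X @ beta + D @ alpha + rng.normal(scale=0.1, size=N * T)

M_D = np.eye(N * T) - D @ np.linalg.inv(D.T @ D) @ D.T
beta_hat = np.linalg.solve(X.T @ M_D @ X, X.T @ M_D @ y)
print(beta_hat)                              # close to (1.5, -0.8)
```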
Proof: By definition,

\[
D'D =
\begin{bmatrix} i'i & 0 & \cdots & 0 \\ 0 & i'i & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & i'i \end{bmatrix}
=
\begin{bmatrix} T & 0 & \cdots & 0 \\ 0 & T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & T \end{bmatrix}
= T I_N,
\]

an N × N matrix, and therefore

\[
I_NT − D(D'D)^{-1}D' =
\begin{bmatrix} I_T & 0 & \cdots & 0 \\ 0 & I_T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & I_T \end{bmatrix}
− \frac{1}{T}
\begin{bmatrix} ii' & 0 & \cdots & 0 \\ 0 & ii' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & ii' \end{bmatrix}
=
\begin{bmatrix} I_T − \frac{1}{T}ii' & 0 & \cdots & 0 \\ 0 & I_T − \frac{1}{T}ii' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & I_T − \frac{1}{T}ii' \end{bmatrix}
=
\begin{bmatrix} M_0 & 0 & \cdots & 0 \\ 0 & M_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & M_0 \end{bmatrix},
\]

using (D'D)^{-1} = (1/T) I_N, so that D(D'D)^{-1}D' = (1/T) DD' is block-diagonal with ii' blocks.
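For readers who want a quick numerical sanity check, the following sketch confirms the lemma (and the idempotency of M_D used next) for small N and T. It is illustrative only, not part of the proof.

```python
import numpy as np

# Sketch: verify numerically that M_D = I_NT - D(D'D)^{-1}D' equals the
# block-diagonal matrix with the T x T demeaning matrix M0 on the diagonal,
# i.e. M_D = I_N (Kronecker) M0, and that M_D is idempotent.
N, T = 3, 4
i = np.ones((T, 1))                               # the column of ones
D = np.kron(np.eye(N), i)                         # NT x N dummy matrix
M_D = np.eye(N * T) - D @ np.linalg.inv(D.T @ D) @ D.T
M0 = np.eye(T) - (i @ i.T) / T                    # M0 = I_T - (1/T) ii'
print(np.allclose(M_D, np.kron(np.eye(N), M0)))   # True: the lemma holds
print(np.allclose(M_D @ M_D, M_D))                # True: M_D is idempotent
```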
It is easy to see that the matrix M_D is idempotent and that

\[
M_D y =
\begin{bmatrix} M_0 & 0 & \cdots & 0 \\ 0 & M_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & M_0 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} y_1 − ȳ_1 i \\ y_2 − ȳ_2 i \\ \vdots \\ y_N − ȳ_N i \end{bmatrix}
\]

and

\[
M_D X =
\begin{bmatrix} M_0 & 0 & \cdots & 0 \\ 0 & M_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & M_0 \end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}
=
\begin{bmatrix} M_0 X_1 \\ M_0 X_2 \\ \vdots \\ M_0 X_N \end{bmatrix},
\]

where the scalar ȳ_i = (1/T) Σ_{t=1}^T y_it, i = 1, 2, ..., N. Writing X_i = [x_i1 x_i2 ... x_ik] column by column, we have M_0 X_i = [M_0 x_i1 M_0 x_i2 ... M_0 x_ik], and therefore M_0 x_ij = x_ij − x̄_ij i, j = 1, 2, ..., k, with x̄_ij = (1/T) Σ_{t=1}^T x_ijt. Denoting x̄_i = [x̄_i1 x̄_i2 ... x̄_ik]', the least squares regression of M_D y on M_D X is equivalent to the regression of [y_it − ȳ_i] on [x_it − x̄_i].

The dummy variable coefficients can be recovered from

D'y = D'X β̂ + D'D α̂ + D'e,

or

α̂ = (D'D)^{-1} D'(y − X β̂),

since D'e = 0.
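Putting the pieces together, here is a sketch of the within estimator on simulated data (values assumed): demeaning y and X within each group reproduces the LSDV slope, and the formula α̂ = (D'D)^{-1}D'(y − Xβ̂) reduces to the group-mean form α̂_i = ȳ_i − x̄_i'β̂.

```python
import numpy as np

# Sketch (simulated data): the within regression of [y_it - ybar_i] on
# [x_it - xbar_i] gives the LSDV slope estimate, and the dummy variable
# coefficients are recovered as alpha_hat_i = ybar_i - xbar_i' beta_hat.
rng = np.random.default_rng(0)
N, T, k = 6, 4, 2
X = rng.normal(size=(N * T, k))
alpha = rng.normal(size=N)
D = np.kron(np.eye(N), np.ones((T, 1)))
y = X @ np.array([1.5, -0.8]) + D @ alpha + rng.normal(scale=0.1, size=N * T)

# Demean within each group (rows are ordered unit by unit, T rows per unit).
Xg, yg = X.reshape(N, T, k), y.reshape(N, T)
X_dm = (Xg - Xg.mean(axis=1, keepdims=True)).reshape(N * T, k)
y_dm = (yg - yg.mean(axis=1, keepdims=True)).reshape(N * T)

beta_hat, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
alpha_hat = yg.mean(axis=1) - Xg.mean(axis=1) @ beta_hat   # group-mean form
print(beta_hat)            # matches the LSDV slope estimate
print(alpha_hat - alpha)   # small estimation error for each unit
```

Note the computational payoff of the within form: no NT × NT matrix M_D is ever built, only group means, which is why the demeaning approach is the standard way to compute the fixed effects estimator.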