EC2610 Fall 2004

GMM Notes for EC2610

1 Introduction

These notes provide an introduction to GMM estimation. Their primary purpose is to make the reader familiar enough with GMM to be able to solve problem set assignments. For the more theoretical foundations, properties, and extensions of GMM, or to better understand its workings, the interested reader should consult any of the standard graduate econometrics textbooks, e.g., by Greene, Wooldridge, Hayashi, Hamilton, etc., as well as the original GMM article by Hansen (1982). Available lecture notes for graduate econometrics courses, e.g. by Chamberlain (Ec 2140) and by Pakes and Porter (Ec 2144), also contain very useful reviews of GMM.

The Generalized Method of Moments provides asymptotic properties for estimators and is general enough to include many other commonly used techniques, like OLS and ML. Having such an umbrella to encompass many of the estimators is very useful, as one doesn't have to derive each estimator's properties separately. With such a wide range, it is not surprising to see GMM used extensively, but one should also be careful about when it is appropriate to apply. Since GMM deals with asymptotic properties, it works well for large samples, but does not provide an answer when the sample size is small, or say what a "large" enough sample size is. Also, when applying GMM, one may forgo certain desirable properties, like efficiency.

2 GMM Framework

2.1 Definition of GMM Estimator

Let $x_i$, $i = 1, \ldots, n$ be i.i.d. random draws from the unknown population distribution $P$. For a known function $\psi$, the parameter $\theta_0 \in \Theta$ (usually also in the interior of $\Theta$) is known to satisfy the key moment condition:

$$E[\psi(x_i; \theta_0)] = 0 \quad (1)$$

This equation provides the core of GMM estimation. The appropriate function $\psi$ and the parameter $\theta_0$ are usually derived from a theoretical model. Both $\psi$ and $\theta_0$ can be vector valued, and not necessarily of the same size. Let the size of $\psi$ be $q$, and the size of $\theta$ be $p$. The mean is 0 only at the true parameter value $\theta_0$, which is assumed to be the unique such value over some neighborhood around $\theta_0$.
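To see how familiar estimators fit this framework, consider two standard special cases (well-known examples, not taken from these notes). In a linear IV regression $y_i = w_i'\beta_0 + \varepsilon_i$ with a $q$-vector of instruments $z_i$ satisfying the exclusion restriction $E[z_i \varepsilon_i] = 0$, the moment function is

$$\psi(x_i; \beta) = z_i (y_i - w_i'\beta)$$

with $p = \dim(\beta)$; OLS is the special case $z_i = w_i$ (so $q = p$). For ML, the score has mean zero at the truth, so one can take $\psi(x_i; \theta) = \partial \log f(x_i; \theta) / \partial \theta$.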
Along with equation (1), one also imposes certain boundary conditions on the second-order moment and on the partial derivatives:

$$E[\psi(x_i; \theta_0)\,\psi(x_i; \theta_0)'] \equiv \Phi < \infty \quad \text{and} \quad \left| \frac{\partial^2 \psi_j(x; \theta)}{\partial \theta_k \, \partial \theta_l} \right| \leq m(x)$$

for all $\theta \in \Theta$, where $E[m(x)] < \infty$. Also, define

$$D \equiv E\left[ \frac{\partial \psi(x_i; \theta_0)}{\partial \theta'} \right]$$

and assume $D$ has rank equal to $p$, the dimension of $\theta$. (Note: the above conditions are sufficient, and properties of GMM estimators can also be obtained under weaker conditions.)

The task of the econometrician lies in obtaining an estimate $\hat{\theta}$ of $\theta_0$ from the key moment condition. Since there is a sample of size $n$ from the population distribution, one may try to obtain the estimate by replacing the population mean with a sample one:

$$\frac{1}{n} \sum_i \psi(x_i; \hat{\theta}) = 0 \quad (2)$$

This is a system of $q$ equations with $p$ unknowns. If $p = q$, we're "just-identified," and under some weak conditions one can obtain a (unique) solution to (2) around the neighborhood of $\theta_0$. When $q > p$, we're "over-identified," and a solution will not exist for most functions $\psi$. A natural approach for the latter case might be to try to get the left hand side as close to 0 as possible, with "closeness" defined over some norm $\| \cdot \|_{A_n}$:

$$\| y \|_{A_n} = y' A_n^{-1} y$$

where $A_n$ is a $q$-by-$q$ symmetric, positive definite matrix. Another approach could be to find the solution to (2) by making some linear combination of the $\psi_j$ equations equal to 0. I.e., for some $p$-by-$q$ matrix $C_n$ of rank $p$, solve for:

$$C_n \left[ \frac{1}{n} \sum_i \psi(x_i; \hat{\theta}) \right] = 0 \quad (3)$$

which will give us $p$ equations with $p$ unknowns.

In fact, both approaches are equivalent, and GMM estimation is set up to do exactly that. That is, when $p = q$, GMM is just-identified and we can usually solve for $\hat{\theta}$ exactly. When $q > p$, we're in the over-identified case, and for some appropriate matrix $A_n$ (or $C_n$), the GMM estimate $\hat{\theta}$ is found by:

$$\hat{\theta} = \arg\min_{\theta \in \Theta} \left[ \frac{1}{n} \sum_i \psi(x_i; \theta) \right]' A_n^{-1} \left[ \frac{1}{n} \sum_i \psi(x_i; \theta) \right] \quad (4)$$

(or, equivalently, by solving equation (3)). The choice of $A_n$ will be discussed later, but for now assume $A_n \to \Psi$ a.s., where $\Psi$ is also symmetric and positive definite.
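As a minimal sketch of equation (4), the following Python code minimizes the quadratic form for the hypothetical linear IV moment function from the example above. The moment function, the simulated data, and all names are illustrative assumptions, not part of these notes; scipy's generic minimizer stands in for whatever solver one would actually use.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical moment function for a linear IV model (an assumption for
# illustration): psi(x_i; theta) = z_i * (y_i - w_i' theta).
def psi(theta, y, W, Z):
    """Return the n-by-q matrix whose i-th row is psi(x_i; theta)."""
    residuals = y - W @ theta          # n-vector of residuals
    return Z * residuals[:, None]      # each row is z_i * (y_i - w_i' theta)

def gmm_objective(theta, y, W, Z, A_inv):
    """Sample analogue of equation (4): gbar' A_n^{-1} gbar."""
    gbar = psi(theta, y, W, Z).mean(axis=0)   # (1/n) sum_i psi(x_i; theta)
    return gbar @ A_inv @ gbar

# Simulated data so the sketch runs end to end (purely illustrative).
rng = np.random.default_rng(0)
n, p, q = 500, 2, 3
Z = rng.normal(size=(n, q))                   # q instruments; q > p: over-identified
W = Z[:, :p] + 0.5 * rng.normal(size=(n, p))  # regressors correlated with instruments
beta0 = np.array([1.0, -0.5])
y = W @ beta0 + rng.normal(size=n)            # errors independent of Z: E[z e] = 0

A_inv = np.eye(q)                             # a simple first choice: A_n = I_q
res = minimize(gmm_objective, x0=np.zeros(p), args=(y, W, Z, A_inv))
print("GMM estimate:", res.x)                 # should be close to beta0
```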
2.2 Asymptotic Properties of GMM

Given the above setup, GMM provides two key results: consistency and asymptotic normality. Consistency shows that our estimator gives us the "right" answer, and asymptotic normality provides us with a variance-covariance matrix, which we can use for hypothesis testing. More specifically, the estimator $\hat{\theta}$ found via equation (3) satisfies $\hat{\theta} \to \theta_0$ a.s. (consistency), and

$$\sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, \Delta) \quad (5)$$

(asymptotic normality), where

$$\Delta = \Gamma \Phi \Gamma' \quad \text{and} \quad \Gamma = (D'\Psi^{-1}D)^{-1} D'\Psi^{-1}$$

(Looking at the above properties, one can draw obvious similarities between the GMM estimator and the Delta Method.)

To do hypothesis testing, let $\overset{A}{\sim}$ denote the asymptotic distribution. Then equation (5) implies:

$$\hat{\theta} \overset{A}{\sim} N\left(\theta_0, \tfrac{1}{n}\Delta\right), \quad \text{where} \quad \Delta = \Gamma \Phi \Gamma' = (D'\Psi^{-1}D)^{-1} D'\Psi^{-1} \Phi \Psi^{-1} D (D'\Psi^{-1}D)^{-1} \quad (6)$$

$\Phi$ and $D$ are population means defined over the true parameter values, and $\Psi$ is the probability limit of $A_n$. When computing the variance matrix for a given sample, one usually replaces the population mean with the sample mean, the true parameter value with the estimated value, and $\Psi$ with $A_n$:

$$\Phi = E[\psi(x_i; \theta_0)\,\psi(x_i; \theta_0)'] \approx \frac{1}{n} \sum_i \psi(x_i; \hat{\theta})\,\psi(x_i; \hat{\theta})'$$

$$D = E\left[\frac{\partial \psi(x_i; \theta_0)}{\partial \theta'}\right] \approx \frac{1}{n} \sum_i \left. \frac{\partial \psi(x_i; \theta)}{\partial \theta'} \right|_{\theta = \hat{\theta}}$$

$$\Psi \approx A_n$$

The standard errors are obtained from:

$$SE_k = \sqrt{\frac{1}{n} \Delta_{kk}}$$

where $\Delta_{kk}$ is the $k$th diagonal entry of $\Delta$.
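Continuing the illustrative IV sketch above, the sample versions of $\Phi$, $D$, and the sandwich variance in (6) can be computed directly. For the linear $\psi$ of the example, $\partial \psi(x_i; \beta)/\partial \beta' = -z_i w_i'$, so the sample $\hat{D}$ is $-Z'W/n$; everything here continues the hypothetical objects defined in the previous sketch.

```python
def gmm_variance(theta_hat, y, W, Z, A_inv):
    """Sandwich variance from equation (6), with population quantities
    replaced by their sample analogues as described above."""
    n = len(y)
    g = psi(theta_hat, y, W, Z)
    Phi_hat = g.T @ g / n                 # (1/n) sum_i psi psi'
    D_hat = -(Z.T @ W) / n                # (1/n) sum_i d psi / d theta' = -z_i w_i'
    Psi_inv = A_inv                       # A_n^{-1} estimates Psi^{-1}
    bread = np.linalg.inv(D_hat.T @ Psi_inv @ D_hat)
    Gamma = bread @ D_hat.T @ Psi_inv     # Gamma = (D' Psi^{-1} D)^{-1} D' Psi^{-1}
    Delta = Gamma @ Phi_hat @ Gamma.T     # Delta = Gamma Phi Gamma'
    se = np.sqrt(np.diag(Delta) / n)      # SE_k = sqrt((1/n) Delta_kk)
    return Delta, se

Delta, se = gmm_variance(res.x, y, W, Z, A_inv)
print("standard errors:", se)
```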
2.3 Optimal Weighting Matrices

2.3.1 Choice of $A_n$

Having established the properties of GMM, we now turn to the choice of the weighting matrices $A_n$ and $C_n$. When GMM is just-identified, one can usually solve for $\hat{\theta}$ from equation (2). This is equivalent to finding a unique minimum point in equation (4) for any positive-definite matrix $A_n$. Also, $D$ will be square; and since it has full rank, it will be invertible. Then the variance matrix will be:

$$\Delta = \Gamma \Phi \Gamma' = (D'\Psi^{-1}D)^{-1} D'\Psi^{-1} \Phi \Psi^{-1} D (D'\Psi^{-1}D)^{-1} = D^{-1}\Psi D'^{-1} D'\Psi^{-1} \Phi \Psi^{-1} D\, D^{-1}\Psi D'^{-1} = D^{-1}\Phi D'^{-1}$$

As expected, the choice of $A_n$ doesn't affect the asymptotic distribution in the just-identified case.

For the over-identified case, the choice of the weight matrix will now matter for $\hat{\theta}$. However, since the consistency and asymptotic normality results of GMM do not depend on the choice of $A_n$ (as long as it's symmetric and positive definite), we should get our main results again for any choice of $A_n$. In such a case, the most common choice is the identity matrix:

$$A_n = I_q$$

Then $\Psi = I_q$ and

$$\Gamma = (D'\Psi^{-1}D)^{-1} D'\Psi^{-1} = (D'D)^{-1} D'$$

and the approximate variance-covariance matrix will be:

$$\frac{1}{n}\Delta = \frac{1}{n}\Gamma \Phi \Gamma' = \frac{1}{n}(D'D)^{-1} D' \Phi D (D'D)^{-1}$$

(This is the format of the GMM variance-covariance matrix Prof. Pakes uses in the IO lecture notes.)

Given that one is free to choose which particular $A_n$ to use, one can try to pick the weighting matrix to give GMM other desirable properties as well, like efficiency. From equation (6), we know that:

$$\Delta = (D'\Psi^{-1}D)^{-1} D'\Psi^{-1} \Phi \Psi^{-1} D (D'\Psi^{-1}D)^{-1}$$

Since we're now free to pick $\Psi$, one can choose it to minimize the variance:

$$\Psi^* = \arg\min_{\Psi} \Delta = \arg\min_{\Psi} (D'\Psi^{-1}D)^{-1} D'\Psi^{-1} \Phi \Psi^{-1} D (D'\Psi^{-1}D)^{-1}$$

It is easy to show that the minimum is equal to:

$$\min_{\Psi} (D'\Psi^{-1}D)^{-1} D'\Psi^{-1} \Phi \Psi^{-1} D (D'\Psi^{-1}D)^{-1} = (D'\Phi^{-1}D)^{-1}$$

which is obtained at

$$\Psi^* = \Phi$$

The above solution has very intuitive appeal: components of $\psi$ with larger variances are assigned smaller weights in the estimation.
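To verify the "easy to show" claim at the candidate $\Psi = \Phi$, substitute into the sandwich formula (a one-line check; the full argument that no other $\Psi$ does better, e.g., via a generalized Cauchy-Schwarz inequality, can be found in the references cited in the introduction):

$$\Delta \big|_{\Psi=\Phi} = (D'\Phi^{-1}D)^{-1} D'\Phi^{-1}\,\Phi\,\Phi^{-1}D\,(D'\Phi^{-1}D)^{-1} = (D'\Phi^{-1}D)^{-1}(D'\Phi^{-1}D)(D'\Phi^{-1}D)^{-1} = (D'\Phi^{-1}D)^{-1}$$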
2.3.2 2-Step GMM Estimation

The above procedure then gives rise to 2-step GMM estimation, in the spirit of FGLS (a code sketch follows at the end of these notes):

1. Pick $A_n = I$ (equal weighting), and solve for the 1st stage GMM estimate $\hat{\theta}_1$. Since $\hat{\theta}_1$ is consistent, $\frac{1}{n} \sum_i \psi(x_i; \hat{\theta}_1)\,\psi(x_i; \hat{\theta}_1)'$ will be a consistent estimate of $\Phi$.

2. Pick $A_n = \frac{1}{n} \sum_i \psi(x_i; \hat{\theta}_1)\,\psi(x_i; \hat{\theta}_1)'$, and obtain the 2nd stage GMM estimate $\hat{\theta}_2$. The variance matrix $\frac{1}{n}\Delta_{\hat{\theta}_2}$ will then be the smallest.

2.3.3 Choice of $C_n$

It should be clear by now how equations (3) and (4) are related to each other, and correspondingly, how $A_n$ and $C_n$ are related. By differentiating the minimization problem in equation (4), we obtain the FOC:

$$\left[ \frac{1}{n} \sum_i \frac{\partial \psi(x_i; \hat{\theta})}{\partial \theta'} \right]' A_n^{-1} \left[ \frac{1}{n} \sum_i \psi(x_i; \hat{\theta}) \right] = 0 \quad (7)$$

If we now define

$$C_n \equiv \left[ \frac{1}{n} \sum_i \frac{\partial \psi(x_i; \hat{\theta})}{\partial \theta'} \right]' A_n^{-1}$$

we have equation (7) turning into (3).

One caveat should be pointed out. We specified that equation (3) is a linear combination of the $\psi_j(x; \hat{\theta})$, i.e., $C_n$ is a matrix of constants. But in equation (7), $C_n$ will in general depend on the solution of the equation, $\hat{\theta}$. This can be easily circumvented if we look at the 2nd stage GMM solution and use the 1st stage estimate $\hat{\theta}_1$ for $C_n$. That is, where in the second step we'd normally solve:

$$\left[ \frac{1}{n} \sum_i \frac{\partial \psi(x_i; \hat{\theta}_2)}{\partial \theta'} \right]' A_n^{-1} \left[ \frac{1}{n} \sum_i \psi(x_i; \hat{\theta}_2) \right] = 0$$

we can instead fix the derivative term at $\hat{\theta}_1$, so that $C_n$ no longer depends on $\hat{\theta}_2$.
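Finally, here is the promised minimal sketch of the 2-step procedure from Section 2.3.2, continuing the illustrative IV example; `psi`, `gmm_objective`, `gmm_variance`, and the simulated data are the hypothetical objects defined in the earlier sketches, not anything prescribed by these notes.

```python
# Step 1: equal weighting, A_n = I_q, gives the consistent 1st stage estimate.
res1 = minimize(gmm_objective, x0=np.zeros(p), args=(y, W, Z, np.eye(q)))
theta1 = res1.x

# Estimate Phi at the 1st stage estimate: (1/n) sum_i psi psi'.
g1 = psi(theta1, y, W, Z)
Phi_hat = g1.T @ g1 / n

# Step 2: A_n = Phi_hat (so A_n^{-1} = Phi_hat^{-1}), the efficient weighting.
A_inv2 = np.linalg.inv(Phi_hat)
res2 = minimize(gmm_objective, x0=theta1, args=(y, W, Z, A_inv2))
theta2 = res2.x

# With Psi = Phi, the sandwich collapses to (1/n)(D' Phi^{-1} D)^{-1}.
Delta2, se2 = gmm_variance(theta2, y, W, Z, A_inv2)
print("2nd stage estimate:", theta2)
print("standard errors:", se2)
```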