Matthew Blackwell and Adam N. Glynn

According to the model in Equation (4), the time-varying covariates here would include the LDV. This assumption states that the errors of the TSCS model are mean independent of welfare spending at time $t$ given the conditioning set that depends on the history of the data up to $t$. Thus, this allows the errors for levels of terrorism to be related to future values of welfare spending.

Sequential ignorability weakens baseline randomization to allow for feedback between the treatment status and the time-varying covariates, including lagged outcomes. For instance, sequential ignorability allows for the welfare spending of a country to impact future levels of terrorism and for this terrorism to affect future welfare spending. Thus, in this dynamic case, treatments can affect the covariates, and so the covariates also have potential responses: $Z_{it}(x_{1:t-1})$. This dynamic feedback implies that the lagged treatment may have both a direct effect on the outcome and an indirect effect through these covariates. For example, welfare spending might directly affect terrorism by reducing resentment among potential terrorists, but it might also have an indirect effect if it helps to increase levels of state capacity, which could, in turn, help combat future terrorism.

In TSCS models, the LDV is often included in the above time-varying conditioning set, $V_{it}$, to assess the dynamics of the time-series process or to capture the effects of longer lags of treatment in a simple manner.⁶ In either case, sequential ignorability would allow the LDV to have an effect on the treatment history as well, but baseline randomization would not. For instance, welfare spending may have a strong effect on terrorism levels, which, in turn, affect future welfare spending. Under this type of feedback, an LDV must be in the conditioning set $V_{it}$ and strict exogeneity will be violated.

Unmeasured Confounding and Fixed Effects Assumptions

Sequential ignorability is a selection-on-observables assumption: the researcher must be able to choose a (time-varying) conditioning set to eliminate any confounding. An oft-cited benefit of having repeated observations is that it allows scholars to estimate causal effects in spite of time-constant unmeasured confounders. Linear fixed effects models have the benefit of adjusting for all time-constant covariates, measured or unmeasured. This would be very helpful if, for instance, each country had its own baseline level of welfare spending that was determined by factors correlated with terrorist attacks, but the year-to-year variation in spending within a country was exogenous. At first glance, this ability to avoid time-constant omitted variable bias appears to be a huge benefit.

⁶ In certain parametric models, the LDV can be interpreted as summarizing the effects of the entire history of treatment. More generally, the LDV may effectively block confounding for contemporaneous treatment even if it has no causal effect on the current outcome.

Unfortunately, these fixed effects estimation strategies require within-unit baseline randomization to identify any quantity other than the CET (Sobel 2012; Imai and Kim 2017). Specifically, standard fixed effects models assume that previous values of covariates like GDP growth or lagged terrorist attacks (that is, the LDV) have no impact on the current value of welfare spending. Thus, to estimate any effects of lagged treatment, fixed effects models would allow for time-constant unmeasured confounding but would also rule out a large number of TSCS applications where there is feedback between the covariates and the treatment.
Furthermore, the assumptions of fixed-effects-style models in nonlinear settings can impose strong restrictions on over-time variation in the treatment and outcome (Chernozhukov et al. 2013). For these reasons, and because there is a large TSCS literature in political science that relies on selection-on-observables assumptions, we focus on situations where sequential ignorability holds. We return to the avenues for future research on fixed effects models in this setting in the conclusion.

THE POST-TREATMENT BIAS OF TRADITIONAL TSCS MODELS

Under sequential ignorability, standard TSCS models like the ADL model above can become biased for common TSCS estimands. The basic problem with these models is that sequential ignorability allows for the possibility of post-treatment bias when estimating lagged effects in the ADL model. While this problem is well known in statistics (Rosenbaum 1984; Robins 1997; Robins, Greenland, and Hu 1999), we review it here in the context of TSCS models to highlight the potential for biased and inconsistent estimators.

The root of the bias in the ADL approach is the nature of time-varying covariates, $Z_{it}$. Under the assumption of baseline randomization, there is no need to control or adjust for these covariates beyond the baseline covariates, $Z_{i0}$, because treatment is assigned at baseline, so future covariates cannot confound past treatment assignment. The ADL approach thrives in this setting. But when baseline randomization is implausible, as we argue is true in most TSCS settings, we will typically require conditioning on these covariates to obtain credible causal estimates. And this conditioning on $Z_{it}$ is what can create large biases in the ADL approach.

To demonstrate the potential for bias, we focus on a simple case where we are only interested in the first two lags of treatment and the sequential ignorability assumption holds with $V_{it} = \{Y_{i,t-1}, Z_{it}, X_{i,t-1}\}$.
This means that treatment is randomly assigned conditional on the contemporaneous value of the time-varying covariate and the lagged values of the outcome and the treatment. Given this setting, the ADL approach would model the outcome as follows:

$$Y_{it} = \beta_0 + \alpha Y_{i,t-1} + \beta_1 X_{it} + \beta_2 X_{i,t-1} + Z_{it}'\delta + \varepsilon_{it}. \tag{17}$$

Downloaded from https://www.cambridge.org/core. Shanghai JiaoTong University, on 26 Oct 2018 at 03:56:49, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0003055418000357
How to Make Causal Inferences with Time-Series Cross-Sectional Data under Selection on Observables

Assuming this functional form is correct and assuming that the $\varepsilon_{it}$ are independent and identically distributed, this model would consistently estimate the CET, $\beta_1$, given the sequential ignorability assumption. But what about the effect of lagged treatment? In the ADL approach, one would combine the coefficients as $\alpha\beta_1 + \beta_2$. The problem with this approach is that, if $Z_{it}$ is affected by $X_{i,t-1}$, then $Z_{it}$ will be post-treatment and in many cases induce bias in the estimation of $\beta_2$ (Rosenbaum 1984; Acharya, Blackwell, and Sen 2016). Why not simply omit $Z_{it}$ from our model? Because this would bias the estimates of the contemporaneous treatment effect, $\beta_1$, due to omitted variable bias.⁷

In this setting, there is no way to estimate the direct effect of lagged treatment without bias with a single ADL model. Unfortunately, even weakening the parametric modeling assumptions via matching or generalized additive models will fail to overcome this problem, because it is inherent to the data generating process (Robins 1997). These biases exist even in favorable settings for the ADL, such as when the outcome is stationary and treatment effects are constant over time. Furthermore, as discussed above, standard fixed effects models cannot eliminate this bias because it involves time-dependent causal feedback. Traditional approaches can only avoid the bias under special circumstances, such as when treatment is randomly assigned at baseline or when the time-varying covariates are completely unaffected by treatment. Both of these assumptions lack plausibility in TSCS settings, which is why many TSCS studies control for time-varying covariates. Below, we demonstrate this bias in simulations, but we first turn to two methods from biostatistics that can avoid these biases.
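
The trade-off described above can be illustrated with a small simulation. What follows is only a minimal sketch, not the authors' simulation design: it assumes a hypothetical two-period linear data generating process in which the lagged treatment affects the time-varying covariate, and treatment depends only on the covariate and the LDV. All parameter values, variable names, and the helper functions `ols` and `run_demo` are assumptions introduced for illustration.

```python
import numpy as np


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) computed by least squares."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def run_demo(n=200_000, alpha=0.5, beta1=1.0, beta2=0.5,
             delta=1.0, rho=0.8, seed=0):
    """Simulate a hypothetical two-period process and fit two ADL-style models."""
    rng = np.random.default_rng(seed)

    x_lag = rng.normal(size=n)                      # X_{i,t-1}: randomized
    y_lag = beta1 * x_lag + rng.normal(size=n)      # Y_{i,t-1}
    z = rho * x_lag + rng.normal(size=n)            # Z_it: affected by X_{i,t-1}
    x = 0.5 * z + 0.3 * y_lag + rng.normal(size=n)  # X_it: depends on Z_it, Y_{i,t-1}
    y = (alpha * y_lag + beta1 * x + beta2 * x_lag
         + delta * z + rng.normal(size=n))          # outcome, linear as in the ADL

    # Effect of X_{i,t-1} on Y_it holding X_it fixed: it runs through the LDV,
    # directly, and through the post-treatment covariate Z_it.
    true_lagged = alpha * beta1 + beta2 + delta * rho

    full = ols(y, y_lag, x, x_lag, z)   # ADL that conditions on Z_it
    short = ols(y, y_lag, x, x_lag)     # ADL that omits Z_it

    return {
        "true_lagged_effect": true_lagged,
        "adl_lagged_effect": full[1] * full[2] + full[3],  # alpha*beta1 + beta2
        "adl_beta1": full[2],
        "no_z_beta1": short[2],
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed parameter values, the lagged effect is 1.8, while the combination $\alpha\beta_1 + \beta_2$ from the model that conditions on the covariate targets 1.0, because the path through the covariate is blocked. Dropping the covariate instead shifts the population coefficient on contemporaneous treatment from 1.0 to 1.4. Neither single regression recovers both quantities in this sketch.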

TWO METHODS FOR ESTIMATING THE EFFECT OF TREATMENT HISTORIES

If the traditional ADL model is biased in the presence of time-varying covariates, how can we proceed with estimating both contemporaneous and lagged effects of treatment in the TSCS setting? In this section, we show how to estimate the causal quantities of interest defined above under sequential ignorability using two approaches developed in biostatistics specifically to address the potential for bias in this type of setting. The first approach is based on SNMMs, which, in their simplest form, represent an extension of the ADL approach to avoid the post-treatment bias described above. The second class of estimators, based on MSMs and IPTW, is semiparametric in the sense that it models the treatment history but leaves the relationship between the outcome and the time-varying covariates unspecified. Because of this, MSMs have the advantage of being robust to our ability or inability to model the outcome. We focus our attention on these two broad classes of models because they are commonly used approaches that both (a) avoid post-treatment bias in this setting and (b) do not require the parametric modeling of the distribution of the time-varying covariates.

⁷ A second issue is that ADL models often only include conditioning variables to identify the contemporaneous effect, not any lagged effects of treatment. Thus, the effect of $X_{i,t-1}$ might also suffer from omitted variable bias. This issue can be more easily corrected by including the proper conditioning set, $V_{i,t-1}$, in the model.

One modeling choice that is common to all of these approaches, including the ADL, is the choice of causal lag length. Should we attempt to estimate the effect of the entire history of welfare spending on terrorist incidents with potential outcome $Y_{it}(x_{1:t})$? Or should we only investigate the contemporaneous and first lagged effects with potential outcome $Y_{it}(x_{t-1}, x_t)$?
As we discussed above, we can always focus on effects that marginalize over lags of treatment beyond the scope of our investigation. Thus, this choice of lag length is less about the “correct” specification and more about choosing what question the researcher wants to answer. A separate question is what variables and their lags need to be included in the various models for our answers to be correct. We discuss the details of what needs to be controlled for and when in our discussion of each estimator.

Structural Nested Mean Models

Our first class of models, SNMMs, can be seen as an extension of the ADL approach that allows for estimation of lagged effects in a relatively straightforward manner (Robins 1986, 1997). At their most general, these models focus on parameterizing a conditional version of the lagged effects (that is, the impulse response function):⁸

$$b_t(x_{1:t}, j) = E\left[Y_{it}(x_{1:t-j}, 0_j) - Y_{it}(x_{1:t-j-1}, 0_{j+1}) \mid X_{1:t-j} = x_{1:t-j}\right]. \tag{18}$$

Robins (1997) refers to these impulse responses as “blip-down functions.” This function gives the effect of a change from 0 to $x_{t-j}$ in terms of welfare spending on levels of terrorism at time $t$, conditional on the treatment history up to time $t-j$. Inference in SNMMs focuses on estimating the causal parameters of this function. The conditional mean of the outcome given the covariates needs to be estimated as part of this approach, but this is seen as a nuisance function rather than the object of direct interest.

Given the chosen lag length to study, a researcher must only specify the parameters of the impulse response up to that many lags. If we chose a lag length of 1, for example, then we might parameterize the impulse response function as

$$b_t(x_{1:t}, j; \gamma) = \gamma_j x_{t-j}, \qquad j \in \{0, 1\}. \tag{19}$$

⁸ Because of our focus on being faithful to the ADL setup, we assume that the lagged effects are constant across levels of the time-varying confounders, as is standard in ADL models.
One can include interactions with these variables, though SNMMs then require additional models for Zit. See Robins (1997, sec. 8.3) for more details.
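
As a concrete reading of the lag-1 parameterization in Equation (19), the following minimal sketch evaluates the impulse response for a given treatment history. The function name `blip_down`, the example history, and the values of the parameters are hypothetical and chosen only for illustration. The sketch evaluates the function at assumed parameter values and does not show how those parameters would be estimated.

```python
from typing import Sequence


def blip_down(x_history: Sequence[float], j: int, gamma: Sequence[float]) -> float:
    """Evaluate b_t(x_{1:t}, j; gamma) = gamma_j * x_{t-j}.

    x_history holds (x_1, ..., x_t); gamma holds (gamma_0, ..., gamma_J),
    where J is the chosen lag length.
    """
    t = len(x_history)
    if not 0 <= j < len(gamma):
        raise ValueError("lag j is outside the parameterized lag length")
    if j >= t:
        raise ValueError("treatment history is shorter than the requested lag")
    # x_{t-j} sits at zero-based position t - 1 - j in the history.
    return gamma[j] * x_history[t - 1 - j]


# Hypothetical values: gamma_0 and gamma_1 are assumed, not estimated.
gamma = (-0.4, -0.1)
history = (0.0, 2.0, 3.0)  # (x_1, x_2, x_3), so t = 3

contemporaneous = blip_down(history, 0, gamma)  # gamma_0 * x_3
lagged = blip_down(history, 1, gamma)           # gamma_1 * x_2
```

With a lag length of 1, only the two parameters $\gamma_0$ and $\gamma_1$ need to be specified, and requesting a longer lag raises an error here because the parameterization says nothing about it.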