American Political Science Review (2018) 112, 4, 1067–1082
doi:10.1017/S0003055418000357
© American Political Science Association 2018

How to Make Causal Inferences with Time-Series Cross-Sectional Data under Selection on Observables

MATTHEW BLACKWELL Harvard University
ADAM N. GLYNN Emory University

Repeated measurements of the same countries, people, or groups over time are vital to many fields of political science. These measurements, sometimes called time-series cross-sectional (TSCS) data, allow researchers to estimate a broad set of causal quantities, including contemporaneous effects and direct effects of lagged treatments. Unfortunately, popular methods for TSCS data can only produce valid inferences for lagged effects under some strong assumptions. In this paper, we use potential outcomes to define causal quantities of interest in these settings and clarify how standard models like the autoregressive distributed lag model can produce biased estimates of these quantities due to post-treatment conditioning. We then describe two estimation strategies that avoid these post-treatment biases (inverse probability weighting and structural nested mean models) and show via simulations that they can outperform standard approaches in small-sample settings. We illustrate these methods in a study of how welfare spending affects terrorism.

Matthew Blackwell is an Associate Professor, Department of Government and Institute for Quantitative Social Science, Harvard University, 1737 Cambridge St., MA 02138. Web: http://www.mattblackwell.org (mblackwell@gov.harvard.edu).

Adam N. Glynn is an Associate Professor, Department of Political Science, Emory University, 327 Tarbutton Hall, 1555 Dickey Drive, Atlanta, GA 30322 (aglynn@emory.edu).

We are grateful to Neal Beck, Jake Bowers, Patrick Brandt, Simo Goshev, and Cyrus Samii for helpful advice and feedback and Elisha Cohen for research support. Any remaining errors are our own. This research project was supported by Riksbankens Jubileumsfond, Grant M13-0559:1, PI: Staffan I. Lindberg, V-Dem Institute, University of Gothenburg, Sweden and by European Research Council, Grant 724191, PI: Staffan I. Lindberg, V-Dem Institute, University of Gothenburg, Sweden. Replication files are available on the American Political Science Review Dataverse: https://doi.org/10.7910/DVN/SFBX6Z.

Received: September 30, 2017; revised: March 16, 2018; accepted: June 4, 2018. First published online: August 3, 2018.

INTRODUCTION

Many inquiries in political science involve the study of repeated measurements of the same countries, people, or groups at several points in time. This type of data, sometimes called time-series cross-sectional (TSCS) data, allows researchers to draw on a larger pool of information when estimating causal effects. TSCS data also give researchers the power to ask a richer set of questions than data with a single measurement for each unit (for example, see Beck and Katz 2011). Using these data, researchers can move past the narrowest contemporaneous questions (what are the effects of a single event?) and instead ask how the history of a process affects the political world. Unfortunately, the most common approaches to modeling TSCS data require strict assumptions to estimate the effect of treatment histories without bias and make it difficult to understand the nature of the counterfactual comparisons.

This paper makes three contributions to the study of TSCS data. Our first contribution is to define some counterfactual causal effects and discuss the assumptions needed to identify them nonparametrically. We also relate these quantities of interest to common quantities in the TSCS literature, like impulse responses, and show how to derive them from the parameters of a common TSCS model, the autoregressive distributed lag (ADL) model.
These treatment effects can be nonparametrically identified under a key selection-on-observables assumption called sequential ignorability; however, many common TSCS approaches rely on more stringent assumptions, including a lack of causal feedback between the treatment and time-varying covariates. This feedback, for example, might involve a country’s level of welfare spending affecting the vote share of left-wing parties, which in turn might affect future levels of spending. We argue that this type of feedback is common in TSCS settings. While we focus on a selection-on-observables assumption in this paper, we discuss the tradeoffs with this choice compared to standard fixed-effects methods, noting that the latter may also rule out this type of dynamic feedback.

Our second contribution is to provide an introduction to two methods from biostatistics that can estimate the effect of treatment histories without bias and under weaker assumptions than common TSCS models. We focus on two methods: (1) structural nested mean models or SNMMs (Robins 1997) and (2) marginal structural models (MSMs) with inverse probability of treatment weighting (IPTW) (Robins, Hernán, and Brumback 2000). These models allow for consistent estimation of lagged effects of treatment by paying careful attention to the causal ordering of the treatment, the outcome, and the time-varying covariates. The SNMM approach generalizes the standard regression modeling of ADLs and often implies very simple and intuitive multi-step estimators. The MSM approach focuses on modeling the treatment process to develop weights that adjust for confounding in simple weighted regression models. Both of these approaches have the ability
to incorporate weaker modeling assumptions than traditional TSCS models. We describe the modeling choices involved and provide guidance on how to implement these methods.

Our third contribution is to show how traditional models like the ADL are biased for the direct effects of lagged treatments in common TSCS settings, while MSMs and SNMMs are not. This bias arises from the time-varying covariates: researchers must control for them to accurately estimate contemporaneous effects, but they induce post-treatment bias for lagged effects. Thus, ADL models can only consistently estimate lagged effects when time-varying covariates are unaffected by past treatment. SNMMs and MSMs, on the other hand, can estimate these effects even when such feedback exists. We provide simulation evidence that this type of feedback can lead to significant bias in ADL models compared to the SNMM and MSM approaches. Overall, these latter methods could be promising for TSCS scholars, especially those who are interested in longer-term effects.

This paper proceeds as follows. We first clarify the causal quantities of interest available with TSCS data and show how they relate to parameters from traditional TSCS models. Causal assumptions are a key part of any TSCS analysis and we discuss them in the following section. We then turn to discussing the post-treatment bias stemming from traditional TSCS approaches, and then introduce the SNMM and MSM approaches, which avoid this post-treatment bias, and show how to estimate causal effects using these methodologies. We present simulation evidence of how these methods outperform traditional TSCS models in small samples in the following section. Next, we present an empirical illustration of each approach, based on Burgoon (2006), investigating the connection between welfare spending and terrorism. Finally, we conclude with thoughts on both the limitations of these approaches and avenues for future research.
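The post-treatment bias described in the third contribution can be illustrated with a small simulation. The sketch below is not the authors' simulation design: it is a hypothetical two-period setup in which the lagged treatment is randomized, a time-varying covariate `z` is affected by both the lagged treatment and an unobserved factor `u`, and current treatment depends only on `z`. All coefficient values, variable names, and the helper `wls` are assumptions chosen for illustration, and the inverse probability weights use the true treatment probabilities rather than an estimated treatment model.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Hypothetical two-period data with treatment-covariate feedback."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                       # unobserved cause of z and y
    x_lag = rng.binomial(1, 0.5, size=n)         # lagged treatment (randomized)
    z = 1.0 * x_lag + u + rng.normal(size=n)     # covariate affected by x_lag
    p = 1.0 / (1.0 + np.exp(-(z - 0.5)))         # P(current treatment = 1 | z)
    x = rng.binomial(1, p)                       # current treatment depends on z only
    # Assumed true effects: 1.0 for current treatment, 0.5 for lagged treatment.
    y = 1.0 * x + 0.5 * x_lag + u + rng.normal(size=n)
    return x_lag, z, x, p, y


def wls(columns, y, w=None):
    """(Weighted) least squares with an intercept; returns the coefficients."""
    design = np.column_stack([np.ones_like(y)] + list(columns))
    if w is None:
        w = np.ones_like(y)
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta


x_lag, z, x, p, y = simulate()

# Regression that conditions on z, a covariate affected by the lagged treatment.
cond = wls([x, x_lag, z], y)

# Inverse probability of treatment weights for the current treatment, then a
# weighted regression that leaves z out of the outcome model.
w = np.where(x == 1, 1.0 / p, 1.0 / (1.0 - p))
iptw = wls([x, x_lag], y, w)

print("conditioning on z: current =", cond[1], "lagged =", cond[2])
print("IPTW:              current =", iptw[1], "lagged =", iptw[2])
```

In this setup both approaches should land near the assumed contemporaneous effect of 1.0, but conditioning on `z` pulls the lagged coefficient away from the assumed value of 0.5, while the weighted regression should stay close to it.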
CAUSAL QUANTITIES OF INTEREST IN TSCS DATA

At their most basic, TSCS data consist of a treatment (or main independent variable of interest), an outcome, and some covariates, all measured for the same units at various points in time. In our empirical setting below, we focus on a dataset of countries with the number of terrorist incidents as an outcome and domestic welfare spending as a binary treatment. With one time period, only one causal comparison exists: a country has either high or low levels of welfare spending. As we gather data on these countries over time, there are more counterfactual comparisons to investigate. How does the history of welfare spending affect the incidence of terrorism? Does the spending regime today only affect terrorism today or does the recent history matter as well? The variation over time provides the opportunity and the challenge of answering these complex questions.

To fix ideas, let $X_{it}$ be the treatment for unit $i$ in time period $t$. For simplicity, we focus first on the case of a binary treatment so that $X_{it} = 1$ if the unit is treated in period $t$ and $X_{it} = 0$ if the unit is untreated in period $t$ (it is straightforward to generalize to arbitrary treatment types). In our running example, $X_{it} = 1$ would represent a country that had high welfare spending in year $t$ and $X_{it} = 0$ would be a country with low welfare spending. We collect all of the treatments for a given unit into a treatment history, $X_i = (X_{i1}, \ldots, X_{iT})$, where $T$ is the number of time periods in the study. For example, we might have a country that always had high spending, $(1, 1, \ldots, 1)$, or a country that always had low spending, $(0, 0, \ldots, 0)$. We refer to the partial treatment history up to $t$ as $X_{i,1:t} = (X_{i1}, \ldots, X_{it})$, with $x_{1:t}$ as a possible particular realization of this random vector.
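As a concrete reading of this notation, the sketch below stores treatment histories as rows of an array, with one row per unit and one column per period. The panel values and the helper `partial_history` are hypothetical and introduced only for illustration.

```python
import numpy as np

# Hypothetical panel: 3 countries (rows) observed for T = 4 years (columns).
# Row 0 always has high spending, row 1 always low, row 2 switches over time.
X = np.array([
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
])


def partial_history(X, i, t):
    """Return X_{i,1:t} = (X_{i1}, ..., X_{it}), with periods counted from 1."""
    return tuple(int(v) for v in X[i, :t])


full_history = partial_history(X, 2, X.shape[1])   # X_i for the third country
up_to_three = partial_history(X, 2, 3)             # X_{i,1:3} for the same country
print(full_history, up_to_three)
```

Note that the array index `i` is zero-based while the period index `t` follows the one-based convention of the text.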
We define $Z_{it}$, $Z_{i,1:t}$, and $z_{1:t}$ similarly for a set of time-varying covariates that are causally prior to the treatment at time $t$, such as government capability, population size, and whether or not the country is in a conflict.

The goal is to estimate causal effects of the treatment on an outcome, $Y_{it}$, that also varies over time. In our running example, $Y_{it}$ is the number of terrorist incidents in a given country in a given year. We take a counterfactual approach and define potential outcomes for each time period, $Y_{it}(x_{1:t})$ (Rubin 1978; Robins 1986).¹ This potential outcome represents the incidence of terrorism that would occur in country $i$ in year $t$ if $i$ had followed a history of welfare spending equal to $x_{1:t}$. Obviously, for any country in any year, we only observe one of these potential outcomes since a country cannot follow multiple histories of welfare spending over the same time window. To connect the potential outcomes to the observed outcomes, we make the standard consistency assumption. Namely, we assume that the observed outcome and the potential outcome are the same for the observed history: $Y_{it} = Y_{it}(x_{1:t})$ when $X_{i,1:t} = x_{1:t}$.²

To create a common playing field for all the methods we evaluate, we limit ourselves to making causal inferences about the time window observed in the data; that is, we want to study the effect of welfare spending on terrorism for the years in our data set. Under certain assumptions like stationarity of the covariates and error terms, many TSCS methods can make inferences about the long-term effects beyond the end of the study. This extrapolation is typically required with a single time series, but with the multiple units we have in TSCS data, we have the ability to focus our inferences on a particular window and avoid these assumptions about the time-series processes.
We view this as a conservative approach because all methods for handling TSCS data should be able to generate sensible estimates of causal effects in the period under study. There is a tradeoff with this approach: we cannot study some common TSCS estimands like the long-run multiplier that are based on time-series analysis. We discuss this estimand in particular in the Supplemental Material.

¹ The definition of potential outcomes in this manner implicitly assumes the usual stable unit treatment value assumption (SUTVA) (Rubin 1978). This assumption is questionable for many comparative politics and international relations applications, but we avoid discussing this complication in this paper to focus on the issues regarding TSCS data. Implicit in our definition of the potential outcomes is that outcomes at time $t$ only depend on past values of treatment, not future values (Abbring and van den Berg 2003).

² Implicit in the definition of the potential outcomes is that the treatment history can affect the outcome through the history of time-varying covariates: $Y_{it}(x_{1:t}) = Y_{it}(x_{1:t}, Z_{i,1:t}(x_{1:t-1}))$. Here, $Z_{i,1:t}(x_{1:t-1})$ represents the values that the covariate history would take under this treatment history.

Given our focus on a fixed time window, we will define expectations over cross-sectional units and consider asymptotic properties of the estimators as the number of these units grows (rather than the length of the time series). Asymptotics are only useful in how they guide our analyses in the real world of finite samples, and we may worry that “large-N, fixed-T” asymptotic results do not provide a reliable approximation when N and T are roughly the same size, as is often the case for TSCS data. Fortunately, as we show in the simulation studies below, our analysis of the various TSCS estimators holds even when N and T are small and close in size. Thus, we do not see the choices of “fixed time-window” versus “time-series analysis” or large-N versus large-T asymptotics as consequential to the conclusions we draw.

The Effect of a Treatment History

For an individual country, the causal effect of a particular history of welfare spending, $x_{1:t}$, relative to some other history of spending, $x'_{1:t}$, is the difference $Y_{it}(x_{1:t}) - Y_{it}(x'_{1:t})$. That is, it is the difference between the potential or counterfactual level of terrorism when the country follows history $x_{1:t}$ and the counterfactual outcome when it follows history $x'_{1:t}$. Given the number of possible treatment histories, there can be numerous causal effects to investigate, even with a simple binary treatment. As the length of time under study grows, so does the number of possible comparisons. In fact, with a binary treatment, there are $2^t$ different potential outcomes for the outcome in period $t$.
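The growth in the number of potential outcomes can be checked by direct enumeration. The sketch below lists every binary treatment history of length $t$ and counts the unordered pairs of distinct histories that could be contrasted. The function names are hypothetical, and the pair count is an added illustration rather than a quantity defined in the text.

```python
from itertools import product


def treatment_histories(t):
    """All binary treatment histories x_{1:t}; there are 2**t of them."""
    return list(product((0, 1), repeat=t))


def n_pairwise_comparisons(t):
    """Number of unordered pairs of distinct histories of length t."""
    k = 2 ** t
    return k * (k - 1) // 2


for t in (1, 2, 3, 5, 10):
    print(t, len(treatment_histories(t)), n_pairwise_comparisons(t))
```

Even a ten-period study already implies more than a thousand distinct histories, which is one reason to focus on a small set of contrasts.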
This large number of potential outcomes allows for a very large number of comparisons and a host of causal questions: Does the stability of spending over time matter for the impact on the incidence of terrorism? Is there a cumulative impact of welfare spending or is it only the current level that matters? These individual-level causal effects are difficult to identify without strong assumptions, so we often focus on estimating the average causal effect of a treatment history (Robins, Greenland, and Hu 1999; Hernán, Brumback, and Robins 2001):

$$\tau(x_{1:t}, x'_{1:t}) = E[Y_{it}(x_{1:t}) - Y_{it}(x'_{1:t})]. \tag{1}$$

Here, the expectations are over the units so that this quantity is the average difference in outcomes between the world where all units had history $x_{1:t}$ and the world where all units had history $x'_{1:t}$. For example, we might be interested in the effect of a country always having high welfare spending versus always having low spending. Thus, this quantity considers the effect of treatment at time t, but also the effect of all lagged values of the treatment. A graphical depiction of the pathways contained in $\tau(x_{1:t}, x'_{1:t})$ is presented in Figure 1, where the dotted arrows correspond to components of the effect. These arrows represent all of the effects of $X_{it}$, $X_{i,t-1}$, $X_{i,t-2}$, and so on, that end up at $Y_{it}$. Note that many of these effects flow through the time-varying covariates, $Z_{it}$. This point complicates the estimation of causal effects in this setting and we return to it below.

FIGURE 1. Directed acyclic graph (DAG) of typical TSCS data. Dotted lines are the causal pathways that constitute the average causal effect of a treatment history at time t. [Diagram not reproduced; nodes: $X_{t-1}$, $Z_{t-1}$, $Y_{t-1}$, $X_t$, $Z_t$, $Y_t$.]

Marginal Effects of Recent Treatments

As mentioned above, there are numerous possible treatment histories to compare when estimating causal effects.
This can be daunting for applied researchers who may only be interested in the effects of the first few lags of welfare spending. Furthermore, any particular treatment history may not be well-represented in the data if the number of time periods is moderate. To avoid these problems, we introduce causal quantities that focus on recent values of treatment and average over more distant lags. We define the potential outcomes just intervening on treatment in the last j periods as $Y_{it}(x_{t-j:t}) = Y_{it}(X_{i,1:t-j-1}, x_{t-j:t})$. This "marginal" potential outcome represents the potential or counterfactual level of terrorism in country i if we let welfare spending run its natural course up to $t - j - 1$ and just set the last j lags of spending to $x_{t-j:t}$.³

With this definition in hand, we can define one important quantity of interest, the contemporaneous effect of treatment (CET) of $X_{it}$ on $Y_{it}$:

$$\tau_c(t) = E[Y_{it}(X_{i,1:t-1}, 1) - Y_{it}(X_{i,1:t-1}, 0)] = E[Y_{it}(1) - Y_{it}(0)].$$

Here we have switched from potential outcomes that depend on the entire history to potential outcomes that only depend on treatment in time t. The CET reflects the effect of treatment in period t on the outcome in period t, averaging across all of the treatment histories up to period t. Thus, it would be the expected effect of switching a random country from low levels of welfare spending to high levels in period t. A graphical depiction of a CET is presented in Figure 2, where the dotted arrow corresponds to the component of the effect. It is common in pooled TSCS analyses to assume that this effect is constant over time so that $\tau_c(t) = \tau_c$.

³ See Shephard and Bojinov (2017) for a similar approach to defining recent effects in time-series data.

Researchers are also often interested in how more distant changes to treatment affect the outcome. Thus,
we define the lagged effect of treatment, which is the marginal effect of treatment in time $t - 1$ on the outcome in time t, holding treatment at time t fixed: $E[Y_{it}(1, 0) - Y_{it}(0, 0)]$. More generally, the j-step lagged effect is defined as follows:

$$\tau_l(t, j) = E[Y_{it}(X_{i,1:t-j-1}, 1, 0_j) - Y_{it}(X_{i,1:t-j-1}, 0, 0_j)] = E[Y_{it}(1, 0_j) - Y_{it}(0_{j+1})], \tag{2}$$

where $0_s$ is a vector of s zero values. For example, the two-step lagged effect would be $E[Y_{it}(1, 0, 0) - Y_{it}(0, 0, 0)]$ and represents the effect of welfare spending two years ago on terrorism today holding the intervening welfare spending fixed at low levels. A graphical depiction of the one-step lagged effect is presented in Figure 3, where again the dotted arrows correspond to components of the effect. These effects are similar to a common quantity of interest in both time-series and TSCS applications called the impulse response (Box, Jenkins, and Reinsel 2013).

FIGURE 2. DAG of a TSCS setting where the dotted line represents the contemporaneous effect of treatment at time t. [Diagram not reproduced; nodes: $X_{t-1}$, $Z_{t-1}$, $Y_{t-1}$, $X_t$, $Z_t$, $Y_t$.]

FIGURE 3. DAG of a panel setting where the dotted lines represent the paths that constitute the lagged effect of treatment at time $t - 1$ on the outcome at time t. [Diagram not reproduced; nodes: $X_{t-1}$, $Z_{t-1}$, $Y_{t-1}$, $X_t$, $Z_t$, $Y_t$.]

Another common quantity of interest in the TSCS literature is the step response, which is the cumulative effect of a permanent shift in treatment status on some future outcome (Box, Jenkins, and Reinsel 2013; Beck and Katz 2011). The step response function, or SRF, describes how this effect varies by time period and distance between the shift and the outcome:

$$\tau_s(t, j) = E[Y_{it}(1_j) - Y_{it}(0_j)], \tag{3}$$

where $1_s$ has a similar definition to $0_s$. Thus, $\tau_s(t, j)$ is the effect of j periods of treatment starting at time $t - j$ on the outcome at time t. Without further assumptions, there are separate lagged effects and step responses for each pair of periods.
As we discuss next, traditional modeling of TSCS data imposes restrictions on the data-generating processes, in part, to summarize this large number of effects with a few parameters.

Relationship to Traditional TSCS Models

The potential outcomes and causal effects defined above are completely nonparametric in the sense that they impose no restrictions on the distribution of $Y_{it}$. To situate these quantities in the TSCS literature, it is helpful to see how they are parameterized in a particular TSCS model. One general model that encompasses many different possible specifications is an ADL model:⁴

$$Y_{it} = \beta_0 + \alpha Y_{i,t-1} + \beta_1 X_{it} + \beta_2 X_{i,t-1} + \varepsilon_{it}, \tag{4}$$

where $\varepsilon_{it}$ are independent and identically distributed errors, independent of $X_{is}$ for all t and s. The key features of such a model are the presence of lagged independent and dependent variables and the exogeneity of the independent variables. This model for the outcome would imply the following form for the potential outcomes:

$$Y_{it}(x_{1:t}) = \beta_0 + \alpha Y_{i,t-1}(x_{1:t-1}) + \beta_1 x_t + \beta_2 x_{t-1} + \varepsilon_{it}. \tag{5}$$

In this form, it is easy to see what TSCS scholars have long pointed out: causal effects are complicated with lagged dependent variables (LDVs) since a change in $x_{t-1}$ can have both a direct effect on $Y_{it}$ and an indirect effect through $Y_{i,t-1}$. This is why even seemingly simple TSCS models such as the ADL imply quite complicated expressions for long-run effects.

The ADL model also has implications for the various causal quantities, both short term and long term. The coefficient on the contemporaneous treatment, $\beta_1$, is constant over time and does not depend on past values of the treatment, so it is equal to the CET, $\tau_c(t) = \beta_1$. One can derive the lagged effects from different combinations of $\alpha$, $\beta_1$, and $\beta_2$:

$$\tau_l(t, 0) = \beta_1, \tag{6}$$

$$\tau_l(t, 1) = \alpha\beta_1 + \beta_2, \tag{7}$$

$$\tau_l(t, 2) = \alpha^2\beta_1 + \alpha\beta_2. \tag{8}$$

Note that these lagged effects are constant across t.
The step response, on the other hand, has a stronger impact because it accumulates the impulse responses over time:

$$\tau_s(t, 0) = \beta_1, \tag{9}$$

$$\tau_s(t, 1) = \beta_1 + \alpha\beta_1 + \beta_2, \tag{10}$$

$$\tau_s(t, 2) = \beta_1 + \alpha\beta_1 + \beta_2 + \alpha^2\beta_1 + \alpha\beta_2. \tag{11}$$

⁴ For introductions to modeling choices for TSCS data in political science, see De Boef and Keele (2008) and Beck and Katz (2011).
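As a numerical check on Equations (6) through (11), the following Python sketch iterates the potential-outcome recursion in Equation (5) with the errors set to zero and differences the results under two treatment paths. The function names, the parameter values, and the starting values `y_init` and `x_init` are hypothetical. The sketch also assumes that the step response at horizon j sets treatment in periods $t - j$ through t, which is the indexing that reproduces Equations (9) through (11).

```python
def adl_potential_outcome(x_path, alpha, beta1, beta2,
                          beta0=0.0, y_init=0.0, x_init=0.0):
    """Iterate Eq. (5) along x_path with errors set to zero; return the final outcome.

    y_init and x_init stand in for the (shared) history before the intervention.
    """
    y_prev, x_prev = y_init, x_init
    for x in x_path:
        y_prev = beta0 + alpha * y_prev + beta1 * x + beta2 * x_prev
        x_prev = x
    return y_prev


def lagged_effect(j, alpha, beta1, beta2, **history):
    """Effect of treatment j periods ago, with the later periods held at zero."""
    treated = adl_potential_outcome([1] + [0] * j, alpha, beta1, beta2, **history)
    control = adl_potential_outcome([0] * (j + 1), alpha, beta1, beta2, **history)
    return treated - control


def step_response(j, alpha, beta1, beta2, **history):
    """Effect of treating in every period from t - j through t versus never."""
    treated = adl_potential_outcome([1] * (j + 1), alpha, beta1, beta2, **history)
    control = adl_potential_outcome([0] * (j + 1), alpha, beta1, beta2, **history)
    return treated - control


if __name__ == "__main__":
    a, b1, b2 = 0.5, 1.0, 0.25  # illustrative values only
    for j in range(3):
        print(j, lagged_effect(j, a, b1, b2), step_response(j, a, b1, b2))
```

Because the model is linear, the shared pre-intervention history cancels in each difference, so the results do not depend on `beta0`, `y_init`, or `x_init`.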
Note that the step response here is just the sum of all previous lagged effects. It is clear that one benefit of such a TSCS model is to summarize a broad set of estimands with just a few parameters. This helps to simplify the complexity of the TSCS setting while introducing the possibility of bias if this model is incorrect or misspecified.

CAUSAL ASSUMPTIONS AND DESIGNS IN TSCS DATA

Under what assumptions are the above causal quantities identified? When we have repeated measurements on the outcome-treatment relationship, there are a number of assumptions we could invoke to identify causal effects. In this section, we discuss several of these assumptions. We focus on cross-sectional assumptions given our fixed time-window approach. That is, we make no assumptions on the time-series processes such as stationarity even though imposing these types of assumptions will not materially affect our conclusions about the bias of traditional TSCS methods. This result is confirmed in the simulations below, where the data generating process is stationary and the biases we describe below still occur.

Baseline Randomized Treatments

A powerful, if rare, research design for TSCS data is one that randomly assigns the entire history of treatment, $X_{1:T}$, at time $t = 0$. Under this assumption, treatment at time t cannot be affected by, say, previous values of the outcome or time-varying covariates. In terms of potential outcomes, the baseline randomized treatment history assumption is

$$\{Y_{it}(x_{1:t}) : t = 1, \ldots, T\} \perp\!\!\!\perp X_{i,1:t} \mid Z_{i0}, \tag{12}$$

where $A \perp\!\!\!\perp B \mid C$ is defined as "A is independent of B conditional on C." This assumes that the entire history of welfare spending is independent of all potential levels of terrorism, possibly conditional on baseline (that is, time-invariant) covariates. Hernán, Brumback, and Robins (2001) called $X_{i,1:t}$ causally exogenous under this assumption.
The lack of time-varying covariates or past values of $Y_{it}$ on the right-hand side of the conditioning bar in Equation (12) implies that these variables do not confound the relationship between the treatment and the outcome. For example, this assumes there are no time-varying covariates that affect both welfare spending and the number of terrorist incidents. Thus, baseline randomization relies on strong assumptions that are rarely satisfied outside of randomized experiments and is unsuitable for most observational TSCS studies.⁵

⁵ Notable exceptions are experiments with a panel design that randomize rollout of a treatment (e.g., Gerber et al. 2011).

Baseline randomization is closely related to exogeneity assumptions in linear TSCS models. For example, suppose we had the following distributed lag model with no autoregressive component:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 X_{i,t-1} + \eta_{it}. \tag{13}$$

Here, baseline randomization of the treatment history, combined with the assumptions implicit in linear TSCS models, implies the usual identifying assumption in these models, strict exogeneity of the errors:

$$E[\eta_{it} \mid X_{i,1:T}] = E[\eta_{it}] = 0. \tag{14}$$

This is a mean independence assumption about the relationship between the errors, $\eta_{it}$, and the treatment history, $X_{i,1:T}$.

Sequentially Randomized Treatments

Beginning with Robins (1986), scholars in epidemiology have expanded the potential outcomes framework to handle weaker identifying assumptions than baseline randomization. These innovations centered on sequentially randomized experiments, where at each period, $X_{it}$ was randomized conditional on the past values of the treatment and time-varying covariates (including past values of the outcome). Under this sequential ignorability assumption, the treatment is randomly assigned not at the beginning of the process, but at each point in time and can be affected by the past values of the covariates and the outcome.
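To illustrate what a sequentially randomized design looks like, the following Python sketch simulates a panel in which treatment in each period is a coin flip whose probability depends on the lagged outcome, lagged treatment, and a contemporaneous covariate, and in which that covariate responds to past treatment. The data-generating process, every coefficient, and the function name `simulate_sequential` are assumptions made for illustration; this is not the simulation design used in the paper.

```python
import numpy as np


def simulate_sequential(n_units=2000, n_periods=5, seed=0):
    """Simulate a panel with sequentially randomized binary treatment (hypothetical DGP)."""
    rng = np.random.default_rng(seed)
    X = np.zeros((n_units, n_periods), dtype=int)   # treatment
    Z = np.zeros((n_units, n_periods))              # time-varying covariate
    Y = np.zeros((n_units, n_periods))              # outcome
    x_prev = np.zeros(n_units)
    y_prev = np.zeros(n_units)
    for t in range(n_periods):
        # Feedback: the covariate is affected by last period's treatment.
        Z[:, t] = 0.5 * x_prev + rng.normal(size=n_units)
        # Treatment is random given the observed past (Y_{t-1}, X_{t-1}, Z_t).
        logit = -0.5 + 1.0 * y_prev + 0.5 * x_prev + 1.0 * Z[:, t]
        prob = 1.0 / (1.0 + np.exp(-logit))
        X[:, t] = rng.binomial(1, prob)
        # Outcome depends on its own lag, current treatment, and the covariate.
        Y[:, t] = 0.5 * y_prev + 1.0 * X[:, t] + 0.5 * Z[:, t] + rng.normal(size=n_units)
        x_prev, y_prev = X[:, t], Y[:, t]
    return X, Z, Y


if __name__ == "__main__":
    X, Z, Y = simulate_sequential()
    # Units treated in period 2 had higher outcomes in period 1 on average,
    # so treatment is not independent of the past as baseline randomization requires.
    gap = Y[X[:, 1] == 1, 0].mean() - Y[X[:, 1] == 0, 0].mean()
    print(round(gap, 3))
```

In data generated this way, treatment is predictable from the past, so baseline randomization fails, yet assignment is still random conditional on the observed history.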
At its core, sequential ignorability assumes there is some function or subset of the observed history up to time t, $V_{it} = g(X_{i,1:t-1}, Y_{i,1:t-1}, Z_{i,1:t})$, that is sufficient to satisfy no unmeasured confounders for the effect of $X_{it}$ on future outcomes. Formally, the assumption states that, conditional on this set of variables, $V_{it}$, the treatment at time t is independent of the potential outcomes at time t:

Assumption 1 (Sequential Ignorability). For every treatment history $x_{1:T}$ and period t,

$$\{Y_{is}(x_{1:s}) : s = t, \ldots, T\} \perp\!\!\!\perp X_{it} \mid V_{it}. \tag{15}$$

For example, a researcher might assume that sequential ignorability for current welfare spending holds conditional on lagged levels of terrorism, lagged welfare spending, and some contemporaneous covariates, so that $V_{it} = \{Y_{i,t-1}, X_{i,t-1}, Z_{it}\}$. Unlike baseline randomization and strict exogeneity, sequential ignorability allows for observed time-varying covariates like conflict status and lagged values of terrorism to confound the relationship between welfare spending and current terrorism levels, so long as we have measures of these confounders. Furthermore, these time-varying covariates can be affected by past values of welfare spending.

In the context of traditional linear TSCS models such as Equation (4), with their implicit assumptions, sequential ignorability implies the sequential exogeneity assumption:

$$E[\varepsilon_{it} \mid X_{i,1:t}, Z_{i,1:t}, Y_{i,1:t-1}] = E[\varepsilon_{it} \mid X_{it}, V_{it}] = 0. \tag{16}$$