American Political Science Review (2018) 112, 4, 1083–1089. doi:10.1017/S0003055418000254. © American Political Science Association 2018

Letter

Are Human Rights Practices Improving?

DAVID CINGRANELLI, Binghamton University
MIKHAIL FILIPPOV, Binghamton University

Has government protection of human rights improved? The answer to this and many other research questions is strongly affected by the assumptions we make and the modeling strategy we choose as the basis for creating human rights country scores. Fariss (2014) introduced a statistical model that produced latent scores showing an improving trend in human rights. Consistent with his stringent assumptions, his statistical model heavily weighted rare incidents of mass killings such as genocide, while discounting indicators of lesser and more common violations such as torture and political imprisonment. We replicated his analysis, replacing the actual values of all indicators of lesser human rights violations with randomly generated data, and obtained an identical improving trend. However, when we replicated the analysis, relaxing his assumptions by allowing all indicators to potentially have a similar effect on the latent scores, we find no human rights improvement.

Science is advanced by a community of investigators who often disagree about explanations for important phenomena. Sometimes disagreements over fundamental conceptual and theoretical issues are so deep that researchers reach an impasse. In statistical analysis, they may disagree over what evidence should be used or how various types of indicators should be weighted to measure important concepts. In that case, depending upon the indicators each scientist emphasizes, conflicting findings persist.

Scholars and policymakers in the human rights subfield are now facing a significant disagreement.
Those emphasizing distinctive types of evidence are reaching different conclusions about trends in human rights and about a variety of other questions relevant to scholarship and policy making. Fariss (2014) suggested a novel statistical approach to reevaluate human rights indicators. The new measure he introduced gave rise to further debate. Most importantly, his new scores showing improving global trends in human rights are at odds with trends in previously used measures (Cingranelli and Richards 2010; Wood and Gibney 2010).

Fariss’s new scores encourage scholars to reexamine many research findings accumulated by the subfield. Since his scores, on average, increase over time, they are likely to be correlated with many variables that also increase over time such as treaty ratifications, degree of globalization, the degree of economic inequality within nations, and democratization. Fariss (2014, 2018) has already challenged previous results on the effects of human rights treaty ratification (but see Cingranelli and Filippov 2018). Future studies using his scores are likely to produce many findings that conflict with previous results.

David Cingranelli is a Professor of Political Science, Binghamton University, State University of New York, Vestal Parkway East, Binghamton, NY 13902-6000, USA (davidc@binghamton.edu). Mikhail Filippov is an Associate Professor of Political Science, Binghamton University, State University of New York, Vestal Parkway East, Binghamton, NY 13902-6000, USA (Mikhail.filippov@gmail.com). The authors thank Rodwan Abouharb, Sabine Carey, David Davis, Peter Haschke, Neil Mitchell, and David Richards for their helpful comments on earlier versions of this paper. Replication files are available at the American Political Science Review Dataverse: https://doi.org/10.7910/DVN/KGVBNC. Received: February 2, 2017; revised: December 17, 2017; accepted: April 27, 2018. First published online: June 13, 2018.
Fariss’s model specification and results were strongly affected by his assumptions. He assumes that:

- Mass killing events (such as genocide) and lesser human rights violations are indicators of the same underlying variable—respect for physical integrity human rights.
- Incidents of mass killing are recorded more accurately than lesser violations.
- Since there has been a substantial decline in the records of mass killings (Figure 1), other indicators of human rights also should reflect an improving trend.
- If they do not, it is because of the “changing standards of accountability” (S1 varies) in human rights reports of lesser human rights violations.
- There has been no change in the “standards of accountability” (S2 = constant) in records of mass killings.
- Indicators of lesser human rights violations that do not reflect an improving trend should be corrected to remove distortion due to the difference in the changes in the standards of accountability (S1 – S2) between the two types of records.

The crucial assumption is that there has been no change in the standards of accountability in records of mass killings. An alternative, less restrictive assumption is that there also has been a change in the standards for recording mass killings (S2 varies). However, a model based on this assumption would produce an estimation of no improvement in human rights latent scores, because it would assign lesser weights to indicators of mass killing.

Fariss’s assumptions, and, crucially, that S2 does not vary, led him to use a statistical technique that heavily weights rare “event-based” incidents of mass
killing such as genocide, discounting “standards-based indicators” of “lesser” and more common human rights violations such as torture and political imprisonment. The model weighted mass killings so heavily that the increase in the proportion of countries with no mass killings (beginning in the mid-1970s) closely mimics the pattern in Fariss’s latent scores (Figure 1).

FIGURE 1. A Comparison of the Trends in Mass Killing Events and in Fariss’s Latent Scores, 1949–2010. Top panel: Proportion of Countries with no Mass Killings; bottom panel: Fariss’s Latent Scores.

Moreover, as shown in Figure 2, trends in latent scores produced only from records of mass killing are hardly distinguishable from Fariss’s latent scores. A similar trend also can be generated from mass killing events combined with random numbers substituted for the actual values of lesser violations (Figure 3). Thus, the model he chose would have produced the appearance of an improving trend in human rights between 1950 and 2010 no matter what the records of lesser violations had been.

More generally, we show that alternative modeling choices have substantive consequences for answering questions about human rights improvement and for the development of human rights theory and relevant public policy. Any modeling strategy must assign weights to different types of evidence. We do not claim that a “correct” model should treat all of the human rights indicators similarly. Rather, we emphasize that the conclusion one reaches about the pattern of human rights improvement depends upon the specific weights assigned to two different types of evidence.
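The weighting argument above can be sketched with a toy simulation. Everything here is invented for illustration (the 0.9/0.1 weights, the trend slope, and the noise levels are hypothetical, and this weighted average is not Fariss's actual IRT model): when a composite score weights a trending mass-killing component heavily, replacing the lesser-violations component with random noise barely changes the trend, and the mass-killing series alone explains nearly all of the composite's variance in a bivariate OLS sense.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1950, 2011)
T = years.size

# Stylized inputs (illustrative, not the article's data): the mass-killing
# component improves over time; lesser violations stay flat and noisy
mass = 0.02 * (years - 1950) + rng.normal(scale=0.1, size=T)
lesser = rng.normal(scale=0.3, size=T)

# A composite that weights mass killings heavily tracks the mass-killing trend
score = 0.9 * mass + 0.1 * lesser

# Replace the lesser-violations component with fresh random noise:
# the resulting series is nearly identical to the original composite
score_random = 0.9 * mass + 0.1 * rng.normal(scale=0.3, size=T)
print(np.corrcoef(score, score_random)[0, 1])

# Bivariate R^2 of the composite on mass killings alone is near 1,
# analogous in spirit to the 88% the authors report for Fariss's scores
r2 = np.corrcoef(score, mass)[0, 1] ** 2
print(r2)
```

Both printed values are close to 1 under these assumptions, which is the point: once one component dominates the weights, the other component's actual values are almost irrelevant to the trend.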
Before discussing the substantive consequences of modeling decisions as they relate to human rights, it is illustrative to consider one of the most famous similar divides in political science. This was the debate over whether the power structure within US communities was pyramidal (elitist) or horizontal (pluralist). Scholars who used the decisional method pioneered by Dahl (1961) discovered a power structure that was pluralist. Dahl and his associates attended public meetings in a particular city, recording who was influential and who was not. Others used the reputational method pioneered by Hunter (1953). They asked people in a community to identify the most influential people who, either formally or informally, swayed the outcomes of decisions. They found that community power structure was elitist. By 1980, research on this topic came to a halt with no satisfactory resolution of the disagreements. Neither conclusion was right or wrong. Each was based on a different theoretical position and modeling strategy. The choice of modeling strategy determined the conclusions.

The human rights subfield is engaged in a similar debate over whether human rights are improving or declining by focusing on different types of evidence. Scholars analyzing annual reports of commonplace repressive government practices such as torture and political imprisonment conclude that, in most countries, governments continue to violate human rights. On the
other hand, scholars focusing on the decline in mass killings could conclude that human rights violations are becoming less common. Like the previous debate over the structure of power in US communities, neither position is right or wrong. The two types of evidence are conceptually different.

FIGURE 2. A Comparison of the Trends of the Dynamic Latent Human Rights Estimates, 1949–2010. Black Line: Replication of Figure 3 in Fariss (2014, 308). Blue Line: The values of ALL indicators of Lesser Human Rights violations are replaced by random numbers.

FIGURE 3. A Comparison of Latent Human Rights Trends Estimated Using Only Indicators of Lesser Human Rights Violations or Only Mass Killing Indicators. Panels: Estimates based on 8 indicators of Lesser Violations; Fariss’s Estimates; Estimates based on 5 indicators of Mass Killings.

A NEW VERSION OF DYNAMIC IRT

The two most commonly used measures of human rights are the Political Terror Scale (PTS) and the CIRI Physical Integrity Index. Both data projects assign numerical scores to countries based on information included in annual reports produced by the US Department of State and Amnesty International. According to Fariss, these scores do not accurately record changes in human rights over time.
Fariss presents the problem as a “changing standard of accountability.” Human rights scores may be inconsistent over time, because: (a) human rights reports have gotten longer, and more information may have led coders to make more negative assessments of human rights practices; (b) coders may have applied more stringent standards in more recent years; and (c) there may be new types of critiques included in more recent reports (Clark and Sikkink 2013; Fariss 2014; Hafner-Burton and Ron 2009). For counterarguments and contrary evidence, see Richards (2016) and Haschke and Gibney (2017).

Fariss suggested that possible biases in human rights data could be identified and corrected by estimating a
latent index of human rights abuses. The assumption behind such an index is that, while the true level of human rights abuses is latent (i.e., unobserved), it is correlated with observable indicators of human rights. Various statistical techniques, ranging from factor analysis to IRT, would allow one to estimate a latent index based on observable values of several available indicators. Dynamic versions of IRT assume that criteria for recording the indicators could change over time. Fariss (2014) introduced a unique version of IRT to estimate latent human rights scores.

FIGURE 4. Fariss’s Scores (unfilled circles) Compared to Latent Scores Estimated Using Only Five Indicators of Mass Killing, 1949–2010. Country panels include South Africa, United States, Iraq, Russia, China, India, Nigeria, Ethiopia, and Argentina.

Combining two types of indicators, giving some weight to each type, should produce latent scores somewhere between the scores obtained when using each type separately. However, in Fariss’s specification, that is not the case.
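The generic latent-index idea described above can be illustrated with a minimal sketch on simulated data. The loadings and noise levels below are invented for illustration, and the estimator here is a first principal component rather than the dynamic IRT the article discusses; the point is only that several noisy observable indicators can jointly recover an unobserved trait.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500                      # hypothetical country-years
latent = rng.normal(size=n)  # unobserved "true" respect for rights

# Five noisy observable indicators, each driven by the latent trait
# (loadings and noise scale are illustrative, not estimated from any data)
loadings = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
X = latent[:, None] * loadings + rng.normal(scale=0.5, size=(n, 5))

# First principal component of the centered indicators as a latent index
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
score = Xc @ vt[0]

# The sign of a principal component is arbitrary, so compare by |correlation|
r = np.corrcoef(score, latent)[0, 1]
print(abs(r))  # close to 1: the index tracks the unobserved trait
```

How the indicators are weighted in such an index is exactly what is at stake in the debate: a PCA-style estimator lets the data set the weights symmetrically, whereas the specification discussed next treats the two groups of indicators asymmetrically.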
When we replicated his analysis (using Fariss’s computer code), we found that the reported upward trend in human rights depended almost entirely on the inclusion of the mass killing indicators. No indicators of lesser human rights violations were necessary. We replicated the analysis replacing the actual values of all indicators of lesser human rights violations with randomly generated data and obtained an identical trend (Figure 2).1 When we repeated Fariss’s computations using only the five indicators of mass killing, again we obtained latent scores that show an improving trend (Figure 3) similar to the trend reported by Fariss (2014, 308). In contrast, when we replicated the analysis including only indicators of lesser human rights violations, there is no upward trend in the calculated latent scores (Figure 3). A simple (bivariate) OLS regression shows that scores generated using only records of mass killing can explain 88% of the variation in Fariss’s scores.

1 We generated the random numbers by several distinct algorithms. All methods produced similar improving trends. See online appendix.

Figure 4 illustrates that the trends in Fariss’s latent scores for many specific countries also are similar to the trends in latent scores produced only from records of mass killing. For more country examples, see our online appendix (Figures A.1.1–A.1.4).

MODELING CHOICE MATTERS

The customary way to use IRT is to treat all observable indicators similarly in their relationship with the latent variable. This is how dynamic latent IRT models were used previously (e.g., Martin and Quinn 2002; Schnakenberg and Fariss 2014; Wang, Berger, and Burdick 2013). As applied to the debate in human rights, this approach would test the argument about the changing standards of accountability in human rights records by assuming that potentially all indicators are subject to such changes. Thus, they all could have variable intercepts.
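The difference between a fixed and a year-varying intercept can be made concrete with a small, hypothetical calculation (the yearly proportions below are illustrative, not the article's data). In a logistic item model, if the intercept is fixed, a rising share of countries with no mass killings can only be explained by a rising latent trait; if the intercept is free to vary by year, it can absorb the entire trend while the latent trait stays flat.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

years = np.arange(1950, 2011)
# Hypothetical yearly proportion of countries with no mass killings,
# rising over time as in the article's Figure 1 (values are made up)
p_no_killing = np.linspace(0.70, 0.98, years.size)

# Item response setup: P(no mass killing in year t) = sigmoid(theta_t - alpha).
# With a FIXED intercept alpha, all movement in p must flow into theta_t,
# so the latent trait is forced to show an improving trend.
alpha = 0.0
theta_fixed = logit(p_no_killing) - alpha
print(theta_fixed[-1] - theta_fixed[0])  # positive: latent "improvement"

# With YEAR-VARYING intercepts alpha_t, a completely flat latent trait
# reproduces the same observed proportions: the intercepts absorb the trend.
theta_flat = np.zeros(years.size)
alpha_t = -logit(p_no_killing)
print(np.allclose(sigmoid(theta_flat - alpha_t), p_no_killing))  # True
```

This is only the identification logic in miniature, not an estimator: it shows why the choice of which items get fixed intercepts predetermines where an observed trend ends up, in the latent scores or in the year intercepts.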
This approach leaves the possibility of a fixed intercept to be endogenously generated in the estimation.

When Fariss’s model combines the two types of data in a single estimator, it treats the two groups of indicators differently (Fariss 2014, 305–306). It sets the mass killing indicators to follow a logistic regression with a fixed intercept (cut point) but allows the indicators of lesser human rights violations to follow an ordered logistic regression with variable intercepts for every year. Thus, the latent variable has to fit actual observations of the mass killing indicators without allowing a possible adjustment to the intercepts. With the indicators of lesser human rights violations, on the contrary, it is much easier for the algorithm to fit the latent variable, as there are several dozen additional parameters (time-specific intercepts) that could also adjust. Consequently, variation in the mass killing indicators generates the improving trend of the latent variable. This is
FIGURE 5. Means of Dynamic Latent Physical Integrity Estimates, 1949–2010

Key: Unfilled Circles: Replication of Figure 3 in Fariss (2014, 308). Filled Circles: The scores are calculated with all indicators assumed to have a similar relationship to the latent variable; that is, all intercepts are allowed to vary. The assumption is that there also has been a change in the standards for recording mass killings.

This is why the actual values of the indicators of lesser human rights violations do not matter much.

Fixing the intercepts for the mass killing indicators is necessary to obtain the scores showing an improvement in human rights. A model where the intercepts for all items vary across time produces latent variable estimates similar to those from a model where none of the intercepts vary. When we rerun Fariss's analysis allowing all indicators to have variable intercepts (in all other ways relying on Fariss's original computer code), we obtain the trend displayed in Figure 5, showing no human rights improvement. These results are robust to the choice of indicators included in the estimation, from four individual CIRI components to all 13 available indicators.

MASS KILLINGS AS A BASELINE?

Fariss (2014, 301) assumed that instances of mass killings, genocides, and political executions by oppressive regimes could "act as a consistent baseline" by which to compare the levels of variables measuring lesser human rights violations. As we demonstrated above, the crucial assumption is that there has been no change in the "standards of accountability" in records of mass killings.
We prefer an alternative, less restrictive assumption, that there has also been a change in the standards for recording mass killings, as a starting point for thinking about ways to combine the two types of evidence. The difference in the changes in the standards of accountability (S1 – S2) between the two types of records should be treated as an empirical question. It is likely that both lesser human rights violations and mass killing events are recorded more accurately now than in the past. There is a higher likelihood now that mass killings in remote places will be recorded. Coding rules for recording mass killings may be changing. Coders may have applied more stringent standards in more recent years. And coding rules across mass killing recording projects may be becoming more or less consistent with one another.

Though Fariss's model distinguishes between events-based and standards-based data, it is important to recognize that even mass killing indicators are standards-based. To record mass killings, scholars must make judgment calls that require coding rules determining such things as whether, and under what circumstances, to include (a) relatively low death toll events, (b) deaths due to interstate and civil war, and (c) killings by nongovernmental actors such as paramilitaries.

Harff and Gurr (1988, 365) developed relatively restrictive, explicit coding rules, only counting events in which "(a) many noncombatants were deliberately killed, (b) the death toll was high (in the thousands or more), and (c) the campaign was a protracted one." Like the PTS and CIRI coders, they relied on the annual reports issued by Amnesty International and the US State Department, among other sources, to identify their cases. Other mass killing scholars, such as Rummel (1994), applied less restrictive coding rules and recorded the greatest number of mass killings (Figure A3), even counting the United States as committing mass killings based on civilian wartime deaths in Korea and Vietnam.
Even if the standards for recording mass killings had been more consistent over time and among coders,