Chapter 1 The Nature of Econometrics and Economic Data
As described earlier, this may not seem like a very good experiment, because we have said nothing about choosing plots of land that are identical in all respects except for the amount of fertilizer. In fact, choosing plots of land with this feature is not feasible: some of the factors, such as land quality, cannot even be fully observed. How do we know the results of this experiment can be used to measure the ceteris paribus effect of fertilizer? The answer depends on the specifics of how fertilizer amounts are chosen. If the levels of fertilizer are assigned to plots independently of other plot features that affect yield (that is, other characteristics of plots are completely ignored when deciding on fertilizer amounts), then we are in business. We will justify this statement in Chapter 2.

The next example is more representative of the difficulties that arise when inferring causality in applied economics.

EXAMPLE 1.4 (Measuring the Return to Education)

Labor economists and policy makers have long been interested in the “return to education.” Somewhat informally, the question is posed as follows: If a person is chosen from the population and given another year of education, by how much will his or her wage increase? As with the previous examples, this is a ceteris paribus question, which implies that all other factors are held fixed while another year of education is given to the person.

We can imagine a social planner designing an experiment to get at this issue, much as the agricultural researcher can design an experiment to estimate fertilizer effects. One approach is to emulate the fertilizer experiment in Example 1.3: Choose a group of people, randomly give each person an amount of education (some people have an eighth grade education, some are given a high school education, etc.), and then measure their wages (assuming that each then works in a job).
The people here are like the plots in the fertilizer example, where education plays the role of fertilizer and wage rate plays the role of soybean yield. As with Example 1.3, if levels of education are assigned independently of other characteristics that affect productivity (such as experience and innate ability), then an analysis that ignores these other factors will yield useful results. Again, it will take some effort in Chapter 2 to justify this claim; for now we state it without support.

Unlike the fertilizer-yield example, the experiment described in Example 1.4 is infeasible. The moral issues, not to mention the economic costs, associated with randomly determining education levels for a group of individuals are obvious. As a logistical matter, we could not give someone only an eighth grade education if he or she already has a college degree.

Even though experimental data cannot be obtained for measuring the return to education, we can certainly collect nonexperimental data on education levels and wages for a large group by sampling randomly from the population of working people. Such data are available from a variety of surveys used in labor economics, but these data sets have a feature that makes it difficult to estimate the ceteris paribus return to education.
People choose their own levels of education, and therefore education levels are probably not determined independently of all other factors affecting wage. This problem is a feature shared by most nonexperimental data sets.

One factor that affects wage is experience in the work force. Since pursuing more education generally requires postponing entering the work force, those with more education usually have less experience. Thus, in a nonexperimental data set on wages and education, education is likely to be negatively associated with a key variable that also affects wage. It is also believed that people with more innate ability often choose higher levels of education. Since higher ability leads to higher wages, we again have a correlation between education and a critical factor that affects wage.

The omitted factors of experience and ability in the wage example have analogs in the fertilizer example. Experience is generally easy to measure and therefore is similar to a variable such as rainfall. Ability, on the other hand, is nebulous and difficult to quantify; it is similar to land quality in the fertilizer example. As we will see throughout this text, accounting for other observed factors, such as experience, when estimating the ceteris paribus effect of another variable, such as education, is relatively straightforward. We will also find that accounting for inherently unobservable factors, such as ability, is much more problematic. It is fair to say that many of the advances in econometric methods have tried to deal with unobserved factors in econometric models.

One final parallel can be drawn between Examples 1.3 and 1.4. Suppose that in the fertilizer example, the fertilizer amounts were not entirely determined at random. Instead, the assistant who chose the fertilizer levels thought it would be better to put more fertilizer on the higher quality plots of land.
(Agricultural researchers should have a rough idea about which plots of land are better quality, even though they may not be able to fully quantify the differences.) This situation is completely analogous to the level of schooling being related to unobserved ability in Example 1.4. Because better land leads to higher yields, and more fertilizer was used on the better plots, any observed relationship between yield and fertilizer might be spurious.

EXAMPLE 1.5 (The Effect of Law Enforcement on City Crime Levels)

The issue of how best to prevent crime has been, and will probably continue to be, with us for some time. One especially important question in this regard is: Does the presence of more police officers on the street deter crime? The ceteris paribus question is easy to state: If a city is randomly chosen and given 10 additional police officers, by how much would its crime rates fall? Another way to state the question is: If two cities are the same in all respects, except that city A has 10 more police officers than city B, by how much would the two cities’ crime rates differ?

It would be virtually impossible to find pairs of communities identical in all respects except for the size of their police force. Fortunately, econometric analysis does not require this. What we do need to know is whether the data we can collect on community crime levels and the size of the police force can be viewed as experimental. We can certainly imagine a true experiment involving a large collection of cities where we dictate how many police officers each city will use for the upcoming year.
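The difference between an experiment of this kind and data in which cities choose their own police forces can be made concrete with a small simulation. The sketch below is purely illustrative and is not taken from the text: the variable names, the single unobserved "city factor," and every numerical value (including the assumed effect of -0.5) are hypothetical. It compares the slope from a simple regression of crime on police when police levels are assigned at random with the slope obtained when cities with a higher unobserved crime propensity also hire more officers.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000                 # number of simulated cities (hypothetical)
true_effect = -0.5       # assumed ceteris paribus effect of one more officer

# Unobserved city characteristic that raises crime (hypothetical).
city_factor = rng.normal(0.0, 1.0, n)
noise = rng.normal(0.0, 1.0, n)


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    x_centered = x - x.mean()
    return float(x_centered @ (y - y.mean()) / (x_centered @ x_centered))


# Experimental data: police levels are assigned independently of city_factor.
police_exp = rng.normal(10.0, 2.0, n)
crime_exp = 20.0 + true_effect * police_exp + 3.0 * city_factor + noise

# Nonexperimental data: cities with more underlying crime hire more police.
police_obs = 10.0 + 1.5 * city_factor + rng.normal(0.0, 1.0, n)
crime_obs = 20.0 + true_effect * police_obs + 3.0 * city_factor + noise

slope_exp = ols_slope(police_exp, crime_exp)
slope_obs = ols_slope(police_obs, crime_obs)

print(f"assumed true effect:        {true_effect:.2f}")
print(f"slope, random assignment:   {slope_exp:.2f}")
print(f"slope, cities choose police: {slope_obs:.2f}")
```

Under these assumed values, the slope from the randomly assigned data should be close to the assumed effect, while the slope from the self-selected data has the wrong sign, because police levels pick up the influence of the unobserved city factor. This is only one simple way that police and crime can be related through other city characteristics; it does not model the full simultaneous determination of the two variables.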
While policies can be used to affect the size of police forces, we clearly cannot tell each city how many police officers it can hire. If, as is likely, a city’s decision on how many police officers to hire is correlated with other city factors that affect crime, then the data must be viewed as nonexperimental. In fact, one way to view this problem is to see that a city’s choice of police force size and the amount of crime are simultaneously determined. We will explicitly address such problems in Chapter 16.

The first three examples we have discussed have dealt with cross-sectional data at various levels of aggregation (for example, at the individual or city levels). The same hurdles arise when inferring causality in time series problems.

EXAMPLE 1.6 (The Effect of the Minimum Wage on Unemployment)

An important, and perhaps contentious, policy issue concerns the effect of the minimum wage on unemployment rates for various groups of workers. While this problem can be studied in a variety of data settings (cross-sectional, time series, or panel data), time series data are often used to look at aggregate effects. An example of a time series data set on unemployment rates and minimum wages was given in Table 1.3.

Standard supply and demand analysis implies that, as the minimum wage is increased above the market clearing wage, we slide up the demand curve for labor and total employment decreases. (Labor supply exceeds labor demand.) To quantify this effect, we can study the relationship between employment and the minimum wage over time. In addition to some special difficulties that can arise in dealing with time series data, there are possible problems with inferring causality. The minimum wage in the United States is not determined in a vacuum. Various economic and political forces impinge on the final minimum wage for any given year. (The minimum wage, once determined, is usually in place for several years, unless it is indexed for inflation.)
Thus, it is probable that the amount of the minimum wage is related to other factors that have an effect on employment levels.

We can imagine the U.S. government conducting an experiment to determine the employment effects of the minimum wage (as opposed to worrying about the welfare of low wage workers). The minimum wage could be randomly set by the government each year, and then the employment outcomes could be tabulated. The resulting experimental time series data could then be analyzed using fairly simple econometric methods. But this scenario hardly describes how minimum wages are set. If we can control enough other factors relating to employment, then we can still hope to estimate the ceteris paribus effect of the minimum wage on employment. In this sense, the problem is very similar to the previous cross-sectional examples.

Even when economic theories are not most naturally described in terms of causality, they often have predictions that can be tested using econometric methods. The following is an example of this approach.
EXAMPLE 1.7 (The Expectations Hypothesis)

The expectations hypothesis from financial economics states that, given all information available to investors at the time of investing, the expected return on any two investments is the same. For example, consider two possible investments with a three-month investment horizon, purchased at the same time: (1) Buy a three-month T-bill with a face value of $10,000, for a price below $10,000; in three months, you receive $10,000. (2) Buy a six-month T-bill (at a price below $10,000) and, in three months, sell it as a three-month T-bill. Each investment requires roughly the same amount of initial capital, but there is an important difference. For the first investment, you know exactly what the return is at the time of purchase because you know the initial price of the three-month T-bill, along with its face value. This is not true for the second investment: while you know the price of a six-month T-bill when you purchase it, you do not know the price you can sell it for in three months. Therefore, there is uncertainty in this investment for someone who has a three-month investment horizon.

The actual returns on these two investments will usually be different. According to the expectations hypothesis, the expected return from the second investment, given all information at the time of investment, should equal the return from purchasing a three-month T-bill. This theory turns out to be fairly easy to test, as we will see in Chapter 11.

SUMMARY

In this introductory chapter, we have discussed the purpose and scope of econometric analysis. Econometrics is used in all applied economic fields to test economic theories, to inform government and private policy makers, and to predict economic time series. Sometimes an econometric model is derived from a formal economic model, but in other cases econometric models are based on informal economic reasoning and intuition.
The goal of any econometric analysis is to estimate the parameters in the model and to test hypotheses about these parameters; the values and signs of the parameters determine the validity of an economic theory and the effects of certain policies.

Cross-sectional, time series, pooled cross-sectional, and panel data are the most common types of data structures that are used in applied econometrics. Data sets involving a time dimension, such as time series and panel data, require special treatment because of the correlation across time of most economic time series. Other issues, such as trends and seasonality, arise in the analysis of time series data but not cross-sectional data.

In Section 1.4, we discussed the notions of ceteris paribus and causal inference. In most cases, hypotheses in the social sciences are ceteris paribus in nature: all other relevant factors must be fixed when studying the relationship between two variables. Because of the nonexperimental nature of most data collected in the social sciences, uncovering causal relationships is very challenging.
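The point made in Example 1.4, that observed factors such as experience are relatively easy to account for while unobserved factors such as ability are not, can be illustrated with simulated data. The following is a minimal sketch under invented assumptions: the data-generating process, the coefficient values, and all variable names are hypothetical and are not estimates from any real data set. The regression that includes ability is shown only as a benchmark, since ability would not be observed in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000  # simulated workers (hypothetical)

# Assumed population coefficients (hypothetical values).
beta_educ, beta_exper, beta_abil = 0.8, 0.2, 1.5

ability = rng.normal(0.0, 1.0, n)                   # unobserved in practice
educ = 12.0 + 2.0 * ability + rng.normal(0.0, 2.0, n)
exper = 30.0 - 1.0 * educ + rng.normal(0.0, 3.0, n)  # more school, less experience
wage = (5.0 + beta_educ * educ + beta_exper * exper
        + beta_abil * ability + rng.normal(0.0, 1.0, n))


def ols(y, *regressors):
    """Least squares coefficients of y on an intercept and the regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_simple = ols(wage, educ)[1]                  # ignores experience and ability
b_exper = ols(wage, educ, exper)[1]            # controls for observed experience
b_full = ols(wage, educ, exper, ability)[1]    # infeasible benchmark

print(f"assumed effect of education:   {beta_educ:.2f}")
print(f"educ only:                     {b_simple:.2f}")
print(f"educ + experience:             {b_exper:.2f}")
print(f"educ + experience + ability:   {b_full:.2f}")
```

In this setup, adding experience changes the education coefficient, but the estimate remains above the assumed effect because ability is still omitted. Only the benchmark regression, which uses a variable that real data would not contain, recovers the assumed value.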
KEY TERMS

Causal Effect
Ceteris Paribus
Cross-Sectional Data Set
Data Frequency
Econometric Model
Economic Model
Empirical Analysis
Experimental Data
Nonexperimental Data
Observational Data
Panel Data
Pooled Cross Section
Random Sampling
Time Series Data