Gujarati: Basic Econometrics, Fourth Edition. I. Single-Equation Regression Models. 1. The Nature of Regression Analysis. © The McGraw-Hill Companies, 2004

CHAPTER ONE: THE NATURE OF REGRESSION ANALYSIS

[FIGURE 1.4 Money holding in relation to the inflation rate π. Axes: inflation rate π (horizontal); k = money/income (vertical).]

6. From monetary economics it is known that, other things remaining the same, the higher the rate of inflation π, the lower the proportion k of their income that people would want to hold in the form of money, as depicted in Figure 1.4. A quantitative analysis of this relationship will enable the monetary economist to predict the amount of money, as a proportion of their income, that people would want to hold at various rates of inflation.

7. The marketing director of a company may want to know how the demand for the company's product is related to, say, advertising expenditure. Such a study will be of considerable help in finding out the elasticity of demand with respect to advertising expenditure, that is, the percent change in demand in response to, say, a 1 percent change in the advertising budget. This knowledge may be helpful in determining the "optimum" advertising budget.

8. Finally, an agronomist may be interested in studying the dependence of crop yield, say, of wheat, on temperature, rainfall, amount of sunshine, and fertilizer. Such a dependence analysis may enable the prediction or forecasting of the average crop yield, given information about the explanatory variables.

The reader can supply scores of such examples of the dependence of one variable on one or more other variables. The techniques of regression analysis discussed in this text are specially designed to study such dependence among variables.
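The elasticity described in example 7 is simply a ratio of percentage changes. A minimal sketch, using invented demand and advertising figures (nothing below comes from the text):

```python
# Arc elasticity of demand with respect to advertising expenditure:
# (percent change in demand) / (percent change in advertising).
# All numbers are hypothetical.
def elasticity(demand_old, demand_new, adv_old, adv_new):
    pct_demand = (demand_new - demand_old) / demand_old
    pct_adv = (adv_new - adv_old) / adv_old
    return pct_demand / pct_adv

# Suppose a 10% rise in the advertising budget (100 -> 110) is
# accompanied by a 4% rise in demand (500 -> 520 units):
e = elasticity(500, 520, 100, 110)
print(round(e, 2))  # 0.4: demand rises 0.4% for each 1% rise in advertising
```

An elasticity below 1, as in this made-up case, would suggest demand responds less than proportionally to changes in the advertising budget.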
PART ONE: SINGLE-EQUATION REGRESSION MODELS

1.3 STATISTICAL VERSUS DETERMINISTIC RELATIONSHIPS

From the examples cited in Section 1.2, the reader will notice that in regression analysis we are concerned with what is known as the statistical, not functional or deterministic, dependence among variables, such as those of classical physics. In statistical relationships among variables we essentially deal with random or stochastic⁴ variables, that is, variables that have probability distributions. In functional or deterministic dependency, on the other hand, we also deal with variables, but these variables are not random or stochastic.

The dependence of crop yield on temperature, rainfall, sunshine, and fertilizer, for example, is statistical in nature in the sense that the explanatory variables, although certainly important, will not enable the agronomist to predict crop yield exactly because of errors involved in measuring these variables as well as a host of other factors (variables) that collectively affect the yield but may be difficult to identify individually. Thus, there is bound to be some "intrinsic" or random variability in the dependent-variable crop yield that cannot be fully explained no matter how many explanatory variables we consider.

In deterministic phenomena, on the other hand, we deal with relationships of the type, say, exhibited by Newton's law of gravity, which states: Every particle in the universe attracts every other particle with a force directly proportional to the product of their masses and inversely proportional to the square of the distance between them.

⁴The word stochastic comes from the Greek word stokhos meaning "a bull's eye." The outcome of throwing darts on a dart board is a stochastic process, that is, a process fraught with misses.
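The deterministic character of Newton's law, and the way measurement error in the constant k turns it into a statistical relationship, can be sketched with invented masses and distances:

```python
import random

# Newton's law F = k*(m1*m2/r**2) is deterministic: the same inputs
# always give the same force. Values below are invented.
def force(k, m1, m2, r):
    return k * m1 * m2 / r**2

k, m1, m2, r = 6.674e-11, 5.0, 3.0, 2.0
f_exact = force(k, m1, m2, r)  # identical on every evaluation

# If k can only be measured with (say) 1% gaussian error, the force
# computed from the measured k scatters around the true value:
# F has become a random variable.
random.seed(0)
f_noisy = [force(k * (1 + random.gauss(0, 0.01)), m1, m2, r)
           for _ in range(5)]
print(f_exact)   # one fixed number
print(f_noisy)   # five different numbers near f_exact
```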
Symbolically, F = k(m1m2/r²), where F = force, m1 and m2 are the masses of the two particles, r = distance, and k = constant of proportionality.

Another example is Ohm's law, which states: For metallic conductors over a limited range of temperature the current C is proportional to the voltage V; that is, C = (1/k)V, where 1/k is the constant of proportionality. Other examples of such deterministic relationships are Boyle's gas law, Kirchhoff's law of electricity, and Newton's law of motion. In this text we are not concerned with such deterministic relationships.

Of course, if there are errors of measurement, say, in the k of Newton's law of gravity, the otherwise deterministic relationship becomes a statistical relationship. In this situation, force can be predicted only approximately from the given value of k (and m1, m2, and r), which contains errors. The variable F in this case becomes a random variable.

1.4 REGRESSION VERSUS CAUSATION

Although regression analysis deals with the dependence of one variable on other variables, it does not necessarily imply causation. In the words of Kendall and Stuart, "A statistical relationship, however strong and however
suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other."⁵

In the crop-yield example cited previously, there is no statistical reason to assume that rainfall does not depend on crop yield. The fact that we treat crop yield as dependent on rainfall (among other things) is due to nonstatistical considerations: Common sense suggests that the relationship cannot be reversed, for we cannot control rainfall by varying crop yield.

In all the examples cited in Section 1.2 the point to note is that a statistical relationship in itself cannot logically imply causation. To ascribe causality, one must appeal to a priori or theoretical considerations.

⁵M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, Charles Griffin Publishers, New York, 1961, vol. 2, chap. 26, p. 279.
⁶But as we shall see in Chap. 3, classical regression analysis is based on the assumption that the model used in the analysis is the correct model. Therefore, the direction of causality may be implicit in the model postulated.
⁷It is crucial to note that the explanatory variables may be intrinsically stochastic, but for the purpose of regression analysis we assume that their values are fixed in repeated sampling (that is, X assumes the same values in various samples), thus rendering them in effect nonrandom or nonstochastic. But more on this in Chap. 3, Sec. 3.2.
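Kendall and Stuart's point can be seen in a simulation: two fabricated series driven by a shared factor are strongly correlated even though neither causes the other (all data below are invented):

```python
import random
import statistics

# Two variables that merely share an unobserved common driver are
# highly correlated, yet neither causes the other.
random.seed(1)
common = [random.gauss(0, 1) for _ in range(200)]  # unobserved factor
x = [c + random.gauss(0, 0.3) for c in common]
y = [c + random.gauss(0, 0.3) for c in common]

def corr(a, b):
    """Sample correlation coefficient."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = sum((u - ma) ** 2 for u in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (sa * sb)

print(corr(x, y))  # close to 1, yet x neither causes nor is caused by y
```

Only a priori or theoretical reasoning, not the size of the coefficient, tells us which variable (if either) is causal.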
Thus, in the third example cited, one can invoke economic theory in saying that consumption expenditure depends on real income.⁶

1.5 REGRESSION VERSUS CORRELATION

Closely related to but conceptually very much different from regression analysis is correlation analysis, where the primary objective is to measure the strength or degree of linear association between two variables. The correlation coefficient, which we shall study in detail in Chapter 3, measures this strength of (linear) association. For example, we may be interested in finding the correlation (coefficient) between smoking and lung cancer, between scores on statistics and mathematics examinations, between high school grades and college grades, and so on. In regression analysis, as already noted, we are not primarily interested in such a measure. Instead, we try to estimate or predict the average value of one variable on the basis of the fixed values of other variables. Thus, we may want to know whether we can predict the average score on a statistics examination by knowing a student's score on a mathematics examination.

Regression and correlation have some fundamental differences that are worth mentioning. In regression analysis there is an asymmetry in the way the dependent and explanatory variables are treated. The dependent variable is assumed to be statistical, random, or stochastic, that is, to have a probability distribution. The explanatory variables, on the other hand, are assumed to have fixed values (in repeated sampling),⁷ which was made explicit in the definition of regression given in Section 1.2. Thus, in Figure 1.2 we assumed that the variable age was fixed at given levels and height measurements were obtained at these levels. In correlation analysis, on the
other hand, we treat any (two) variables symmetrically; there is no distinction between the dependent and explanatory variables. After all, the correlation between scores on mathematics and statistics examinations is the same as that between scores on statistics and mathematics examinations. Moreover, both variables are assumed to be random. As we shall see, most of the correlation theory is based on the assumption of randomness of variables, whereas most of the regression theory to be expounded in this book is conditional upon the assumption that the dependent variable is stochastic but the explanatory variables are fixed or nonstochastic.⁸

1.6 TERMINOLOGY AND NOTATION

Before we proceed to a formal analysis of regression theory, let us dwell briefly on the matter of terminology and notation. In the literature the terms dependent variable and explanatory variable are described variously. A representative list is:

    Dependent variable   ⇔  Explanatory variable
    Explained variable   ⇔  Independent variable
    Predictand           ⇔  Predictor
    Regressand           ⇔  Regressor
    Response             ⇔  Stimulus
    Endogenous           ⇔  Exogenous
    Outcome              ⇔  Covariate
    Controlled variable  ⇔  Control variable

Although it is a matter of personal taste and tradition, in this text we will use the dependent variable/explanatory variable or the more neutral regressand and regressor terminology.

If we are studying the dependence of a variable on only a single explanatory variable, such as that of consumption expenditure on real income, such a study is known as simple, or two-variable, regression analysis. However, if we are studying the dependence of one variable on more than

⁸In advanced treatment of econometrics, one can relax the assumption that the explanatory variables are nonstochastic (see introduction to Part II).
one explanatory variable, as in the crop-yield, rainfall, temperature, sunshine, and fertilizer examples, it is known as multiple regression analysis. In other words, in two-variable regression there is only one explanatory variable, whereas in multiple regression there is more than one explanatory variable.

The term random is a synonym for the term stochastic. As noted earlier, a random or stochastic variable is a variable that can take on any set of values, positive or negative, with a given probability.⁹

Unless stated otherwise, the letter Y will denote the dependent variable and the X's (X1, X2, ..., Xk) will denote the explanatory variables, Xk being the kth explanatory variable. The subscript i or t will denote the ith or the tth observation or value. Xki (or Xkt) will denote the ith (or tth) observation on variable Xk. N (or T) will denote the total number of observations or values in the population, and n (or t) the total number of observations in a sample. As a matter of convention, the observation subscript i will be used for cross-sectional data (i.e., data collected at one point in time) and the subscript t will be used for time series data (i.e., data collected over a period of time). The nature of cross-sectional and time series data, as well as the important topic of the nature and sources of data for empirical analysis, is discussed in the following section.

1.7 THE NATURE AND SOURCES OF DATA FOR ECONOMIC ANALYSIS¹⁰

The success of any econometric analysis ultimately depends on the availability of the appropriate data.

⁹See App. A for formal definition and further details.
¹⁰For an informative account, see Michael D. Intriligator, Econometric Models, Techniques, and Applications, Prentice Hall, Englewood Cliffs, N.J., 1978, chap. 3.
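The subscript conventions just introduced can be mirrored in a toy data layout; the variable names and numbers below are purely illustrative:

```python
# Cross-sectional data: i indexes units observed at one point in time.
# X_cross[k][i] plays the role of X_ki, the ith observation on the
# kth explanatory variable. All values are invented.
X_cross = {
    1: [45.2, 61.0, 38.7],   # X_1i, e.g. income of unit i
    2: [12.1, 15.8, 9.4],    # X_2i, e.g. years of schooling of unit i
}
n = len(X_cross[1])          # sample size n

# Time series data: t indexes successive periods, so X_time[k][t]
# plays the role of X_kt.
X_time = {
    1: {2001: 3.4, 2002: 2.8, 2003: 1.6},  # X_1t, e.g. an annual rate
}
T = len(X_time[1])           # number of periods observed in this sample

print(n, T)  # 3 3
```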
It is therefore essential that we spend some time discussing the nature, sources, and limitations of the data that one may encounter in empirical analysis.

Types of Data

Three types of data may be available for empirical analysis: time series, cross-section, and pooled (i.e., combination of time series and cross-section) data.

Time Series Data  The data shown in Table I.1 of the Introduction are an example of time series data. A time series is a set of observations on the values that a variable takes at different times. Such data may be collected at regular time intervals, such as daily (e.g., stock prices, weather reports), weekly (e.g., money supply figures), monthly [e.g., the unemployment rate, the Consumer Price Index (CPI)], quarterly (e.g., GDP), annually (e.g