Lecture 13
Poisson Regression Using Stata

This lecture covers
• The Univariate Poisson Distribution
• The Poisson Regression Model
• The Negative Binomial Distribution
• Negative Binomial Regression
• Comparing the Poisson and Negative Binomial Regression Models

Introduction
• This lecture deals with the modeling of dependent variables that are event counts, or count data. An event count is the number of times an event occurs; it is the realization of a nonnegative integer-valued random variable. Variables that count the number of times that something has happened are common in the social sciences.
• Some examples of count variables are the number of times in a year that a person visits the doctor, the number of car accidents that occur each day in a city, the number of love affairs a university student has during the four years of university, the number of industrial injuries in a workplace in a day, and the number of cigarettes a person smokes in a day.
• In demography, popular count variables are the number of children born to a woman, the number of pregnancies a woman has, the number of times a person has sexual intercourse in a week, the number of sexual partners in a year, the number of abortions a woman has had in her lifetime, the number of residential migrations a person makes in a lifetime, and the number of jobs a migrant worker has held since arriving.
• Frequently, count variables are treated as though they are continuous and unbounded, and OLS models are then used to estimate the effects of X variables on their occurrence. OLS is appropriate if the dependent variable, the count, is independently and identically distributed. However, the use of OLS for count outcomes can result in inefficient, inconsistent, and biased estimates if one or more of the OLS assumptions are not met.
• Thus, models other than OLS have been used to handle count data. This lecture covers two: (1) the Poisson regression model and (2) the negative binomial regression model. The software used in this lecture is Stata. (Poisson regression is also available in SPSS under the general log-linear model.) Before discussing Poisson regression, let’s first take a look at the univariate Poisson distribution.

The Univariate Poisson Distribution
• The univariate Poisson distribution provides the benchmark for Poisson regression. Let Y be a random variable that represents the number of times that an event has occurred during an interval of time. Y has a Poisson distribution with a parameter μ greater than 0:

$$\Pr(Y = y) = \frac{\exp(-\mu)\,\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$
• where the parameter μ represents the expected number of times the event has occurred; for this distribution, μ is also the mean. Thus,

$$
\begin{aligned}
\text{if } y = 0,\ &\Pr(Y = 0) = \exp(-\mu) \\
\text{if } y = 1,\ &\Pr(Y = 1) = \exp(-\mu)\,\mu \\
\text{if } y = 2,\ &\Pr(Y = 2) = \exp(-\mu)\,\mu^{2}/2 \\
\text{if } y = 3,\ &\Pr(Y = 3) = \exp(-\mu)\,\mu^{3}/6
\end{aligned}
$$

Some properties of a Poisson distribution
• As μ increases, the mass of the distribution shifts to the right; we’ll see this below in the sample graphs of the univariate Poisson distribution.
• The variance of Y equals μ. The equality of the mean and the variance is known as equidispersion. In practice, many count variables have a variance greater than the mean, which is called overdispersion. For such variables the Poisson regression model is not entirely appropriate, which often leads the analyst to the negative binomial regression model (to be discussed later).
• As μ increases, the probability of a 0 decreases. In a Poisson distribution, the probability of a 0 is exp(−μ): for μ = 0.8 it is 0.45; for μ = 1.5 it is 0.22; for μ = 2.9 it is 0.055; for μ = 10.5 it is about 0.00003.
• As μ increases, the Poisson distribution approximates a normal distribution.
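The properties listed above can be checked numerically. What follows is a minimal sketch in Python rather than Stata, chosen only so that the arithmetic is easy to run anywhere. The function names `poisson_pmf` and `mean_and_variance` and the truncation point of 100 are illustrative assumptions, not part of the lecture.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """Pr(Y = y) = exp(-mu) * mu**y / y! for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def mean_and_variance(mu: float, upper: int = 100):
    """Mean and variance computed from the pmf.

    The infinite sum is truncated at `upper` (an assumption); the omitted
    tail is negligible for the small values of mu used here.
    """
    probs = [poisson_pmf(y, mu) for y in range(upper + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

MUS = [0.8, 1.5, 2.9, 10.5]  # the four values of mu used in the lecture

for mu in MUS:
    p0 = poisson_pmf(0, mu)                      # probability of a zero count
    mean, var = mean_and_variance(mu)            # should both be close to mu
    mode = max(range(21), key=lambda y: poisson_pmf(y, mu))  # most likely count in 0..20
    print(f"mu={mu:5.1f}  Pr(Y=0)={p0:.5f}  mean={mean:.3f}  var={var:.3f}  mode={mode}")
```

Reading the printed rows from top to bottom shows the three properties together: the probability of a 0 falls, the mean and variance stay equal to μ, and the most likely count moves to the right as μ grows.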
• Here are four examples of univariate Poisson distributions, varying in their values of μ.
• The first Poisson distribution has μ = 0.8, the second μ = 1.5, the third μ = 2.9, and the fourth μ = 10.5.
• The Stata commands to produce the graph are as below:

    prcounts pya, plot max(20)
    prcounts pyb, plot max(20)
    prcounts pyc, plot max(20)
    prcounts pyd, plot max(20)
    graph pyapreq pybpreq pycpreq pydpreq pyaval, c(llll) gap(3) l2("Probability")

[Figure: Four Univariate Poisson Distributions: 0.8, 1.5, 2.9 and 10.5]

• A critical assumption of the Poisson distribution is that when an event occurs, it does not affect the probability of the event occurring in the future. If the “count” is children born to mothers, the assumption of independence implies that when a woman has a baby, it does not affect the probability of her having another baby. In demography, however, future fertility is not independent of past fertility. Particularly in China, the next birth (or abortion) depends heavily on the previous ones in the context of the strict family planning policy.
The Poisson Regression Model
• In the Poisson regression model, the number of events (the dependent variable) is a nonnegative integer; it has a Poisson distribution with a conditional mean that depends on the characteristics (the X variables) of the individuals according to the following structural model:

$$\mu_i = \exp(a + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki})$$

or, equivalently,

$$\ln(\mu_i) = a + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$

• The Poisson regression model is a nonlinear model, predicting for each individual the expected number of times, μ, that the event has occurred. The X variables are related to μ nonlinearly.
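To make the nonlinearity of the structural model μ = exp(a + b1 X1 + ... + bk Xk) concrete, here is a minimal sketch in Python. The intercept, coefficients, and covariate values are made-up numbers for illustration only (they are not estimates from any data in this lecture), and `conditional_mean` is a hypothetical helper name.

```python
import math
from typing import Sequence

def conditional_mean(a: float, b: Sequence[float], x: Sequence[float]) -> float:
    """mu = exp(a + b_1*x_1 + ... + b_k*x_k)."""
    linear_predictor = a + sum(bj * xj for bj, xj in zip(b, x))
    return math.exp(linear_predictor)

# Hypothetical values, for illustration only.
a = 0.5
b = [0.10, -0.30]

mu_low = conditional_mean(a, b, [1.0, 0.0])
mu_high = conditional_mean(a, b, [2.0, 0.0])    # x_1 raised by one unit
mu_higher = conditional_mean(a, b, [3.0, 0.0])  # x_1 raised by one more unit

# A one-unit increase in x_1 multiplies mu by exp(b_1), whatever the starting point.
print(mu_high / mu_low, mu_higher / mu_high, math.exp(b[0]))

# The absolute change in mu is therefore not constant, unlike in a linear (OLS) model.
print(mu_high - mu_low, mu_higher - mu_high)
```

Because exp(b) acts as a constant multiplier on the expected count, Poisson coefficients are commonly read as factor changes rather than as additive effects.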
The Poisson Regression Model Predicting the Number of Children Ever Born to Chinese Women
• We are going to model the number of children ever born (CEB) to Chinese women from the 1997 survey. Before running the Poisson regression, we want to know whether the count data are Poisson distributed.
• Thus we conducted an analysis of the count dependent variable, comparing the observed distribution of the count data with a univariate Poisson distribution that has the same mean as the count data.
• The dependent variable is a count variable, namely, the number of children ever born to a woman. The variable is called “CEB”. Here are descriptive data on this variable for the sample women: