Journal of Health Economics 21 (2002) 601–625

The structure of demand for health care: latent class versus two-part models

Partha Deb a,∗, Pravin K. Trivedi b

a Department of Economics, IUPUI, Cavanaugh Hall 516, 425 University Boulevard, Indianapolis, IN 46202, USA
b Department of Economics, Indiana University, Wylie Hall, Bloomington, IN 47405, USA

Received 1 November 2000; accepted 1 January 2002

∗ Corresponding author. Tel.: +1-317-274-5216; fax: +1-317-274-0097. E-mail address: pdeb@iupui.edu (P. Deb).
0167-6296/02/$ – see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0167-6296(02)00008-5

Abstract

We contrast the two-part model (TPM) that distinguishes between users and non-users of health care, with a latent class model (LCM) that distinguishes between infrequent and frequent users. In model comparisons using data on counts of utilization from the RAND Health Insurance Experiment (RHIE), we find strong evidence in favor of the LCM. We show that individuals in the infrequent and frequent user latent classes may be described as being healthy and ill, respectively. Although sample averages of price elasticities, conditional means and event probabilities are not statistically different, the estimates of these policy-relevant measures are substantively different when calculated for hypothetical individuals with specific characteristics. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Latent class model; Finite mixture model; Two-part model; Count data

1. Introduction

This paper examines empirical strategies for modeling the demand for health services, measured as counts of utilization. The choice of the econometric framework has implications for a number of empirical issues of central importance in health economics, e.g. the price sensitivity of the demand for medical services, predicted use and the likelihood of being extensive users of services. The paper proposes an approach based on a finite mixture variant of the latent class model (LCM). The proposed approach is compared with the “standard” two-part framework for modeling the demand for health care.

The literature on the demand for medical care analyzes either discrete measures, such as the number of physician or non-physician visits (Cameron et al., 1988; Pohlmeier and Ulrich, 1995; Deb and Trivedi, 1997; Gerdtham, 1997), or continuous measures such as
expenditures (Duan et al., 1983; Manning et al., 1987; Keeler et al., 1988; McCall et al., 1991). In modeling the usage of medical services, the two-part model (TPM) has served as a methodological cornerstone of empirical analysis. The first part of the TPM is a binary outcome model that describes the distinction between non-users and users. The second part describes the distribution of use conditional on some use, modeled either as a continuous or integer-valued random variable. Although in health economics the TPM is used predominantly to refer to models of health expenditures, the structure of the TPM is equally applicable for discrete or continuous outcomes. The TPM for count data is often referred to as a hurdle model.

The appeal of the TPM is partly driven by an important feature of the demand for medical care, which is the high incidence of zero usage. For example, approximately 30% of typical cross-sectional samples of non-institutionalized individuals in the US report no outpatient visits in the survey year. Moreover, the TPM is well supported empirically, with explanatory variables often playing different roles in the two parts of the model.

The appeal of the TPM in health economics is also based on its connection to a principal-agent model (see, for example, Zweifel, 1981) where the physician (the agent) determines utilization on behalf of the patient (the principal) once initial contact is made. The following quotes highlight the strength of this argument in the literature:

... the decision to receive some care is largely the consumer’s, while the physician influences the decision about the level of care (Manning et al., 1987, p. 109).

... while at the first stage it is the patient who determines whether to visit the physician (contact analysis), it is essentially up to the physician to determine the intensity of the treatment (frequency analysis) (Pohlmeier and Ulrich, 1995, p. 340).

... where the first part relates to the patient who decides whether to contact the physician (contact decision) and the second to the decision about repeated visits and/or referrals, which is determined largely by the preferences of the physician (frequency decision) (Gerdtham, 1997, p. 308).

This sharp dichotomy between users and non-users may be appealing in modeling data on episodes of medical care, but this distinction may not be tenable in the case of typical cross-sectional datasets. In these data, health care events are recorded over a fixed time period (e.g. a year or a month) and not over an episode of illness. More generally, the first part of the TPM may be thought of as modeling the decision to initiate the first episode of treatment, while the second part is a combination of the patient’s decisions to initiate subsequent treatment and the physicians’ decisions about the intensity of each of those episodes. Unless one believes that the initiation of the first episode of care during a fixed time period has special characteristics (relative to initiation of subsequent episodes), the appeal of the TPM may, in principle, be diminished.

A more tenable distinction for typical cross-sectional data may be between an “infrequent user” and a “frequent user” of medical care, the difference being determined by health status, attitudes to health risk, and choice of lifestyle. The LCM, in which there is no distinction between users and non-users of care, but which can distinguish between groups with high average demand and low average demand, therefore provides a better framework.
We hypothesize that the underlying unobserved heterogeneity which splits the population into latent classes is based on an individual’s latent long-term health status. Proxy variables such as self-perceived health status and chronic health conditions may not fully capture population heterogeneity from this source. Consequently, in the case of two latent subpopulations, a distinction may be made between the “healthy” and the “ill” groups, whose demands for medical care are characterized by low mean and high mean, respectively.

From a statistical point of view, the TPM is also a finite mixture with a degenerate component. It combines zeros from a binomial density with the positives from a zero-truncated density. The LCM is more flexible because it permits mixing with respect to both zeros and positives. While the TPM and LCM are clearly related, they are not nested. Hence it is not a priori clear which model would perform better empirically. In a study of medical care demand by the elderly, Deb and Trivedi (1997) find that the LCM is superior to the TPM. In other empirical work (Cameron and Trivedi, 1998), it is shown that the TPM describes the number of recreational trips taken by individuals better than the LCM.

A careful comparison of the LCM and TPM is useful from a policy perspective. The TPM has been used extensively to estimate demand responses to prices, income and changes in insurance status. The results have been used to propose changes in health insurance design. Statistics of interest in many such policy exercises are non-linear functions of the underlying parameters of the conditional mean function. Therefore, consistent estimation of the conditional mean function does not ensure consistent estimates of the statistics of interest for policy exercises; see Mullahy (1998) for a detailed discussion. Estimating a model that fits the empirical distribution adequately does, on the other hand, ensure that such statistics will be estimated consistently. Moreover, if in fact the TPM is dominated by the LCM, the accumulated evidence in favor of the TPM, interpreted as evidence in favor of a principal-agent framework, is ambiguous. Policies based on the principal-agent framework might, therefore, have unintended consequences.

Finally, both the TPM and the LCM require that the investigator specifies the probability distribution of the data. Although this is a potential source of misspecification in both cases, its impact is smaller in the case of the LCM. This is because the LCM is more flexible and can serve as a better approximation to any true, but unknown, probability density (Laird, 1978; Heckman and Singer, 1984). The growing popularity of the LCM is reflected in an increase in the number of regression-based applications in econometrics. Recent applications include Heckman et al. (1990), Gritz (1993), Wedel et al. (1993), Deb and Trivedi (1997), Geweke and Keane (1997), Morduch and Stern (1997), and Wang et al. (1998).

We use data from the RAND Health Insurance Experiment (RHIE). The RHIE is one of the largest social experiments ever completed, generating over 400 research studies by members of the RAND group (Newhouse et al., 1993). It is widely regarded as the basis of the most reliable estimates of price sensitivity of demand for medical services. For example, Burtless (1995, p. 82) has stated: “The Health Insurance Experiment improved our knowledge about the price sensitivity of demand for medical services in a way that no non-experimental study has been able to match.” Therefore, the public-use data from the RHIE provide a suitable test-bed for our proposed investigations. We examine two measures of counts of utilization. The covariates are among those commonly used in studies of health care demand.
The RHIE data are most suitable for our work in spite of the fact that they are considerably older than other nationally representative surveys like the National Medical Expenditure Survey of 1987, the National Health Interview Survey of 1994 or the Medical Expenditure Panel Survey of 1997. First, the RHIE dataset is the only one in which individuals were randomized into insurance plans, thus making insurance choice exogenous. Endogeneity of insurance choice is a major problem in non-experimental data; even in cases where suitable instruments exist, they are typically weak thus making statistical corrections for endogeneity unreliable. Second, RAND researchers gave careful consideration to issues of attrition bias and other sources of “sample contamination” which affect some social experiments (Newhouse et al., 1993, chapter 2; Heckman and Smith, 1995).

In the following section of the paper, we formally present the competing models used in this paper and discuss model comparison, selection, and evaluation strategies. The data are described in Section 3. Empirical results are reported in Section 4, and we conclude in Section 5.

2. Econometric models

We develop models for counts of outpatient visits using the LCM and TPM frameworks. Both are derived from the negative binomial model (NBM) for count data, so we begin by describing that model.

2.1. NBM

Let $y_i$ be a count dependent variable that takes values 0, 1, 2, ... The density function for the NBM is given by

$$
f(y_i \mid \theta) = \frac{\Gamma(y_i + \psi_i)}{\Gamma(\psi_i)\,\Gamma(y_i + 1)} \left(\frac{\psi_i}{\lambda_i + \psi_i}\right)^{\psi_i} \left(\frac{\lambda_i}{\lambda_i + \psi_i}\right)^{y_i} \tag{2.1}
$$

where $\Gamma(\cdot)$ is the gamma function, $\lambda_i = \exp(x_i'\beta)$ and the precision parameter $(\psi_i^{-1})$ is specified as $\psi_i = (1/\alpha)\lambda_i^{k}$. The parameter $\alpha > 0$ is an overdispersion parameter and $k$ is an arbitrary constant. In this specification, the conditional mean is given by

$$
E(y_i \mid x_i) = \lambda_i \tag{2.2}
$$

and the variance by

$$
V(y_i \mid x_i) = \lambda_i + \alpha \lambda_i^{2-k}. \tag{2.3}
$$

The parameter $k$ is usually held fixed in empirical work. The NB1 model is obtained by specifying $k = 1$ while the NB2 is obtained by setting $k = 0$.

2.2. TPM

We choose a NB density to construct the TPM because we wish to focus on the differences between a statistical structure that distinguishes infrequent and frequent users (LCM) from
one that distinguishes non-users and users (TPM) while minimizing all other sources of variation. From the NB density shown in Eq. (2.1) one can derive the probability of being a non-user as

$$
\mathrm{Pr}_1(y_i = 0 \mid x_i, \theta_1) = \left(\frac{\psi_{1,i}}{\lambda_{1,i} + \psi_{1,i}}\right)^{\psi_{1,i}}, \tag{2.4}
$$

where the subscript 1 denotes parameters associated with the first part of the TPM, $\lambda_{1,i} = \exp(x_i'\beta_1)$ and $\psi_{1,i} = (1/\alpha_1)\lambda_{1,i}^{k}$. The probability of being a user is calculated as $(1 - \mathrm{Pr}_1(y_i = 0 \mid x_i, \theta_1))$. The first part involves only binary information so the parameters ($\beta_1$) of the mean function and the parameter $\alpha_1$ are not separately identifiable. We set $\alpha_1 = 1$ without loss of generality.

In the second part of the TPM, the distribution of utilization conditional on some use is assumed to follow a truncated NB distribution. After some algebraic manipulation, one gets

$$
f_2(y_i \mid x_i, y_i > 0, \theta_2) = \frac{\Gamma(y_i + \psi_{2,i})}{\Gamma(\psi_{2,i})\,\Gamma(y_i + 1)} \left[\left(\frac{\lambda_{2,i} + \psi_{2,i}}{\psi_{2,i}}\right)^{\psi_{2,i}} - 1\right]^{-1} \left(\frac{\lambda_{2,i}}{\lambda_{2,i} + \psi_{2,i}}\right)^{y_i} \tag{2.5}
$$

as the conditional density of use.¹ Note that, although the first and second parts are derived from the NB density, the parameters are allowed to be different.

The first and second parts of the TPM enter multiplicatively in the likelihood function. Therefore, the likelihood function associated with the binary choice can be maximized separately from the second part, which is estimated using the truncated subsample of positive observations of $y_i$. The mean of the count variable in this TPM is given by

$$
E(y_i \mid x_i) = \frac{\mathrm{Pr}_1(y_i > 0 \mid x_i, \theta_1)}{\mathrm{Pr}_2(y_i > 0 \mid x_i, \theta_2)}\, \lambda_{2,i} \tag{2.6}
$$

and the variance by

$$
V(y_i \mid x_i) = \frac{\mathrm{Pr}_1(y_i > 0 \mid x_i, \theta_1)}{\mathrm{Pr}_2(y_i > 0 \mid x_i, \theta_2)} \left[\lambda_{2,i} + \alpha_2 \lambda_{2,i}^{2-k} + \left(1 - \frac{\mathrm{Pr}_1(y_i > 0 \mid x_i, \theta_1)}{\mathrm{Pr}_2(y_i > 0 \mid x_i, \theta_2)}\right) \lambda_{2,i}^{2}\right]. \tag{2.7}
$$

Both the mean and the variance in the TPM are, in general, different from their standard NB counterparts. The TPM can accommodate over- and underdispersed data relative to the NBM.
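As an illustrative check on Eqs. (2.1) and (2.4)–(2.7), the following Python sketch codes the NB density, the zero probability, the zero-truncated density and the closed-form TPM moments, and compares those moments with brute-force sums over the support. It is a minimal sketch rather than the estimation code used in the paper: the function names, the truncation point `y_max` and the parameter values are assumptions made purely for illustration.

```python
import math


def nb_pmf(y, lam, alpha, k=0):
    """NB density of Eq. (2.1), with psi = lam**k / alpha."""
    psi = lam ** k / alpha
    log_p = (math.lgamma(y + psi) - math.lgamma(psi) - math.lgamma(y + 1)
             + psi * math.log(psi / (lam + psi))
             + y * math.log(lam / (lam + psi)))
    return math.exp(log_p)


def prob_zero(lam, alpha, k=0):
    """Probability of a zero count, as in Eq. (2.4)."""
    psi = lam ** k / alpha
    return (psi / (lam + psi)) ** psi


def truncated_nb_pmf(y, lam, alpha, k=0):
    """Zero-truncated NB density of Eq. (2.5), for y = 1, 2, ..."""
    psi = lam ** k / alpha
    log_p = (math.lgamma(y + psi) - math.lgamma(psi) - math.lgamma(y + 1)
             - math.log(((lam + psi) / psi) ** psi - 1.0)
             + y * math.log(lam / (lam + psi)))
    return math.exp(log_p)


def tpm_mean_var(lam1, lam2, alpha2, k=0, alpha1=1.0):
    """Closed-form TPM mean and variance, Eqs. (2.6) and (2.7)."""
    ratio = ((1.0 - prob_zero(lam1, alpha1, k))
             / (1.0 - prob_zero(lam2, alpha2, k)))
    mean = ratio * lam2
    var = ratio * (lam2 + alpha2 * lam2 ** (2 - k)
                   + (1.0 - ratio) * lam2 ** 2)
    return mean, var


def brute_force_moments(lam1, lam2, alpha2, k=0, alpha1=1.0, y_max=2000):
    """TPM mean and variance by direct summation over y = 1, ..., y_max - 1."""
    p_user = 1.0 - prob_zero(lam1, alpha1, k)
    mean = 0.0
    second = 0.0
    for y in range(1, y_max):
        p = p_user * truncated_nb_pmf(y, lam2, alpha2, k)
        mean += y * p
        second += y * y * p
    return mean, second - mean ** 2


# Hypothetical values of lambda_1, lambda_2 and alpha_2 (illustration only).
lam1, lam2, alpha2 = 1.5, 3.0, 0.8
for k in (0, 1):  # NB2 and NB1 variance functions
    print(k, tpm_mean_var(lam1, lam2, alpha2, k),
          brute_force_moments(lam1, lam2, alpha2, k))
```

If the equations are coded as intended, the closed-form and brute-force moments should agree up to floating-point error for both values of `k`.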
¹ Although we have chosen to derive both parts of the hurdle model from parent NB distributions, we recognize that users may sometimes choose to estimate the binary choice part using more familiar logit or probit models. This choice is typically not significant, because, as is commonly known, the exact choice of distribution in binary choice models makes very little difference to the estimated probabilities. In our case, we have also estimated logit models with almost identical results.
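One way to see why an NB-based binary part and a logit can give such similar answers: with α1 = 1 and the NB2 form (k = 0), ψ1,i = 1, so Eq. (2.4) gives Pr(yi > 0) = λ1,i/(1 + λ1,i), which is exactly the logistic function of the linear index. Under the NB1 form (k = 1) the same expression becomes 1 − 2^(−λ1,i), which differs from the logit. The sketch below checks this numerically. The function names and index values are hypothetical, and the snippet makes no claim about which value of k was used for the first part in the paper.

```python
import math


def nb_prob_user(xb, k=0, alpha1=1.0):
    """Pr(y > 0) implied by Eq. (2.4), with lambda_1 = exp(xb)."""
    lam = math.exp(xb)
    psi = lam ** k / alpha1
    return 1.0 - (psi / (lam + psi)) ** psi


def logit_prob(xb):
    """Standard logistic probability for the linear index xb."""
    return 1.0 / (1.0 + math.exp(-xb))


# Hypothetical values of the linear index (illustration only).
for xb in (-2.0, -0.5, 0.0, 1.0, 2.5):
    print(xb, nb_prob_user(xb, k=0), logit_prob(xb), nb_prob_user(xb, k=1))
```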