128 N.H. Stern, Optimum income taxation

work. The main purpose of this section is to draw attention to the levels of calculated optimum marginal tax rates, in models similar to those of section 4, which are, on the whole, lower than one might have predicted. Indeed Mirrlees (1971, p. 207) remarked '..., I must confess that I had expected the rigorous analysis of income-taxation in the utilitarian manner to provide arguments for high tax rates. It has not done so.'¹⁰ A partial response to these results has been the use of strongly egalitarian ('highly concave') social welfare functions and the limiting case, the 'maxi-min welfare function'.¹¹ We shall suggest later that there is no need to use these more extreme social welfare functions to obtain tax rates that seem closer to observed rates, and that one has merely to use labour supply functions which seem closer to those which are usually estimated. For the moment, however, we give a brief sketch of these earlier results and, in the process, set out the model of income taxation to be used later.

The original work on the current models of income taxation was that of Mirrlees (1971). In his model individuals supply labour of different qualities and hence face different pre-tax wage rates. They choose how much to supply by maximising $u(c, l)$ subject to $c = g(nlw)$, where $c$ is consumption, $l$ the hours worked, $nw$ the hourly wage of an n-man (he produces $n$ efficiency hours per hour worked), $w$ is the wage per efficiency hour and $g(\cdot)$ the tax function giving post-tax income as a function of pre-tax income. The aggregate production constraint is

$$X = \int c\,f(n)\,dn = H\left(\int n\,l\,f(n)\,dn\right) = H(Z),$$

where $X$ (total consumption) is a function $H(Z)$ of effective labour $Z$, and $f(n)$ is the density of the distribution of individuals. The problem is to vary $g(\cdot)$ to maximise $\int G(u)\,f\,dn$, where $G(\cdot)$ is a concave function and the constraints are that the amounts individuals choose to supply of labour and consume of goods be compatible with the production relation. Note that the formulation involves taxation of $nlw$ and does not require $(nw)$ and $l$ to be separately observable. If one can identify an n-man without affecting his behaviour, then the first-best optimum can be achieved by levying an appropriate lump sum tax for each $n$ with a zero marginal rate of taxation.

Mirrlees provided detailed calculations for the cases where $u(c, l) = \log c + \log(1 - l)$, $n$ distributed lognormally (parameters of the associated normal distribution being $\bar{\mu}$ and $\sigma$), $H$ linear and $G(u) = u$ or $-e^{-u}$. Using a value of $\sigma = 0.39$, derived from the work of Lydall¹² on the distribution of earnings, he obtained median marginal tax rates for the case of $G(u) = u$ of 22% and 20%.¹³ The higher rate was for the case where 7% of product was required by the government and the lower where 17% could be added - the additions or subtractions corresponding respectively to cases where profits or revenues

¹⁰ The utilitarian optimum ignoring incentives involves 100 percent taxation. One is initially surprised therefore when the introduction of incentives drops the rate down to 20 percent.
¹¹ See Atkinson (1972).
¹² See Lydall (1968) and Mirrlees (1971).
¹³ Interpolated from Mirrlees (1971, tables I-IV, p. 202).
elsewhere outweighed or were outweighed by fixed costs, or necessary expenditure. With a net government expenditure of 12% of product and $G(u) = -e^{-u}$, the median marginal rate rises to 33%. The highest marginal rates for the three cases respectively are 26%, 21% and 39%. The marginal rates rise at first but begin falling before the median is reached. Mirrlees proves that, for the log-normal distribution and where the elasticity of substitution between consumption and leisure is less than one, the marginal rate tends to zero as $n$ tends to $\infty$. There is a higher limit in the case of the Pareto distribution where, with the same condition on the substitution elasticity, the marginal rate tends to $1/(1+\gamma)$ as $n \to \infty$ when $(nf'/f) \to -(\gamma+2)$. Examination of distributions of earnings (see section 3.2) suggests values of $\gamma$ from 0.5 to 2.5, giving limiting marginal rates from 67% to 29%.

Higher rates can also be produced by widening the distribution of skills - if $\sigma$ in the log-normal case is increased to 1.0 (from 0.39), the median rate is 56% for the case $G(u) = -e^{-u}$ and a government requirement of 7% of product. Presumably with a wider distribution of skills, inequality considerations increase relative to those concerned with incentives. However, Mirrlees (1971, p. 207) suggests that such a $\sigma$ 'does not seem to be at all realistic...' since it gives a dispersion of skills too wide to be compatible with observed distributions of earnings.¹⁴

There are two main features of the calculated tax schedules which look different from actual income tax structures.¹⁵ Marginal rates are not monotonically increasing - most of the population is in the region where they are falling - and the highest marginal rates are low. For the 'realistic' case of $\sigma = 0.39$, applying to 5 out of 6 of Mirrlees' examples, the highest marginal rate is 39%.
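The limiting rates quoted above for the Pareto case follow from the formula 1/(1+γ) alone. A minimal Python sketch of that arithmetic is given here; the function name `pareto_limiting_rate` is ours, introduced only for illustration, and the two values of γ are the endpoints mentioned in the text.

```python
def pareto_limiting_rate(gamma: float) -> float:
    """Limiting marginal tax rate 1/(1 + gamma) for the Pareto case."""
    if gamma <= -1:
        raise ValueError("gamma must exceed -1 for the limit to be a valid rate")
    return 1.0 / (1.0 + gamma)


# Endpoints of the range of gamma suggested by earnings distributions.
for gamma in (0.5, 2.5):
    rate = pareto_limiting_rate(gamma)
    print(f"gamma = {gamma}: limiting marginal rate = {rate:.0%}")
```

Rounded to whole percentages, the two endpoints reproduce the 67% and 29% figures given in the text.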

Atkinson (1972, 1973a) discusses the effect of increasing the concavity of $G(\cdot)$ and the limiting case of maximin. It seems clear that he was in part influenced by the low rates in the Mirrlees calculations - see Atkinson (1972, p. 2) and (1973a, pp. 390-391). The maximin criterion in the Mirrlees model yields tax rates around 50% for the median person [see Atkinson (1972, p. 28)].

We have already given the Sadka argument which explains why, for a finite population, we should expect zero marginal tax rates at the top of the distribution. This argument may also have some intuitive force for distributions with an infinite domain, provided the weight in the tail is not too big. We have noted, for example, that the log-normal gives a limiting marginal rate of zero but the Pareto does not. The zero limit of the marginal rate for certain distributions suggests that a declining rate at the upper end may be a feature of many models of optimum income taxation. We shall say no more (except for the special case of section 5) about the shape of the tax function, and concentrate on the labour supply function and its relation to optimum linear taxation.

¹⁴ We discuss in section 3.2 whether the distribution of earnings gives a misleading impression of the distribution of skills.
¹⁵ These are announced rates rather than effective rates.

3. The estimation of supply functions and skill distributions

3.1. Supply functions¹⁶

The work on optimum income taxation has dealt exclusively with situations where individuals have the same preference relation but differ in their earnings capacity. One can also imagine cases where individuals differ in their preference relations but face the same earnings function which is determined, as far as they are concerned, exogenously. In this subsection we shall be discussing such alternative specifications, and the different problems they pose for estimation.

We shall suppose, for the moment (but see section 3.2), that the number of hours of work is the appropriate argument of an individual's utility function and that the pre-tax wage measures the skill or efficiency of a worker per hour of work. For estimation (but not taxation) purposes we suppose that the wage and hours are separately observable.

To make some of our formulae explicit we shall consider utility functions of the constant elasticity of substitution (CES) form, although it is clear that many of the problems we shall discuss do not depend on the particular form of the utility function. We suppose an individual maximises

$$u(c, l) = \left[(1-\alpha)\,c^{-\mu} + \alpha\,\big(h(L-l)\big)^{-\mu}\right]^{-1/\mu}, \qquad (1)$$

subject to the budget constraint

$$c = A + (nw)\,l. \qquad (2)$$

We thus have a linear tax schedule. The individual is characterised by the triple $(h, n, L)$ and one could consider a distribution of this triple over the population. We shall be discussing some special cases. We should think of $L$ as the number of hours available to the individual for allocation between work and leisure, given his family commitments, sleeping requirements, physical attributes and so on. The parameter $h$ measures the ability to enjoy leisure and $n$ the ability to produce efficiency hours of work from clock hours. Different specifications of the relations between $h$, $n$ and $L$ may lead to very different interpretations of data on wages and hours.

The first-order condition for maximisation of utility subject to the budget constraint is

$$\frac{A + nwl}{h^{1-\varepsilon}\,(L-l)} = \left[\frac{(1-\alpha)\,nw}{\alpha}\right]^{\varepsilon}, \qquad (3)$$

where $\varepsilon = 1/(1+\mu)$.

¹⁶ The comments of A.B. Atkinson on this subsection were particularly useful.
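
Reading condition (3) as (A + nwl) / (h^(1-ε)(L - l)) = [(1-α)nw/α]^ε, it can be rearranged to give hours explicitly: l = (kL - A)/(nw + k), where k denotes the right-hand side multiplied by h^(1-ε). The sketch below, in Python, computes this l and checks that nearby hours give lower utility under (1) and (2). The function names and all parameter values are hypothetical, chosen only for illustration, and an interior solution is assumed.

```python
def ces_utility(c, l, L, alpha, mu, h):
    """CES utility (1): [(1-alpha) c^-mu + alpha (h (L-l))^-mu]^(-1/mu), mu != 0."""
    return ((1 - alpha) * c ** (-mu) + alpha * (h * (L - l)) ** (-mu)) ** (-1 / mu)


def optimal_hours(A, nw, L, alpha, mu, h):
    """Hours solving the first-order condition (3), assuming an interior solution."""
    eps = 1 / (1 + mu)
    # k is the optimal ratio of consumption to hours of leisure implied by (3).
    k = h ** (1 - eps) * ((1 - alpha) * nw / alpha) ** eps
    return (k * L - A) / (nw + k)


# Illustrative (hypothetical) parameter values; mu = 1 gives eps = 0.5.
A, nw, L, alpha, mu, h = 0.1, 1.0, 1.0, 0.4, 1.0, 1.0
l_star = optimal_hours(A, nw, L, alpha, mu, h)


def utility_at(l):
    """Utility along the budget line c = A + (nw) l."""
    return ces_utility(A + nw * l, l, L, alpha, mu, h)


print(f"hours from (3): {l_star:.4f}")
print("better than working slightly less:", utility_at(l_star) > utility_at(l_star - 0.01))
print("better than working slightly more:", utility_at(l_star) > utility_at(l_star + 0.01))
```

Because the CES function is concave and the budget constraint is linear in l, a local check of this kind is enough to confirm that the solution of (3) is the maximum on the budget line.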

In the Mirrlees case individuals have identical preferences so that $h$ and $L$ are constant over the population. Putting $h = 1$ and taking logarithms, we have

$$\log\left(\frac{c}{L-l}\right) = \varepsilon\,\log(nw) + \varepsilon\,\log\left(\frac{1-\alpha}{\alpha}\right). \qquad (4)$$

We see immediately that where the total quantity of hours available ($L$) is known or specified, we can estimate $\varepsilon$ and $\alpha$ by regressing consumption per hour of leisure on the wage rate $(nw)$, both in logarithms.

[Fig. 1. Relation between $l$ and the wage rate per clock hour $(nw)$; figure not reproduced.]

Note that our assumption of identical preferences enables us to identify the supply function by merely plotting the relation between $l$ and the post-tax wage rate per clock hour $(nw)$ (see fig. 1). This formulation is, therefore, especially convenient for estimation purposes (see section 3.3). The skill distribution is then given by the distribution of wage rates.

The above procedure is very sensitive to the assumption of identical preferences. We give two examples to illustrate this point. First, suppose $L$ is constant in the population but $h = n$. In other words, individuals have identical available hours but those who produce more efficiency hours of work obtain a similarly increased satisfaction per hour of leisure. And suppose, for the sake of illustration, that $A = 0$. We see from (3) that $l$ is independent of $n$. In other words, everyone works the same number of hours. Thus we might infer, on seeing a distribution of wages and no variation of hours, that the supply curve was inelastic when an increase in $w$ (the wage per efficiency hour) would change hours worked.
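
Both points can be illustrated with simulated data. The Python sketch below uses only the standard library and assumes interior solutions of (3). First it generates noise-free observations under identical preferences (h = 1) and recovers ε and α from regression (4). Then it sets h = n and A = 0 and shows that hours do not vary with n even though they respond to w. The function names, the skill levels and the parameter values are all hypothetical.

```python
import math


def hours(A, n, w, L, alpha, eps, h):
    """Interior hours implied by (3); h is the leisure-enjoyment parameter."""
    nw = n * w
    k = h ** (1 - eps) * ((1 - alpha) * nw / alpha) ** eps
    return (k * L - A) / (nw + k)


def ols(x, y):
    """Slope and intercept of a simple least-squares line."""
    n_obs = len(x)
    mx, my = sum(x) / n_obs, sum(y) / n_obs
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical parameter values, chosen only for illustration.
A, w, L, alpha, eps = 0.05, 1.0, 1.0, 0.4, 0.5
skills = [0.5, 0.75, 1.0, 1.5, 2.0, 3.0]

# Case 1: identical preferences (h = 1). Regression (4) recovers eps and alpha.
x, y = [], []
for n in skills:
    l = hours(A, n, w, L, alpha, eps, h=1.0)
    x.append(math.log(n * w))
    y.append(math.log((A + n * w * l) / (L - l)))
eps_hat, intercept = ols(x, y)
# The intercept of (4) is eps * log((1 - alpha) / alpha); invert it for alpha.
alpha_hat = 1 / (1 + math.exp(intercept / eps_hat))
print(f"eps_hat = {eps_hat:.3f}, alpha_hat = {alpha_hat:.3f}")

# Case 2: h = n and A = 0. Hours are the same for every n, yet depend on w.
same_w = [hours(0.0, n, w, L, alpha, eps, h=n) for n in skills]
higher_w = [hours(0.0, n, 1.5 * w, L, alpha, eps, h=n) for n in skills]
print("hours at w:      ", [round(v, 4) for v in same_w])
print("hours at 1.5 * w:", [round(v, 4) for v in higher_w])
```

In the second case a cross-section shows no variation in hours across wage rates, which could be misread as an inelastic supply curve, although in this parametrisation (ε < 1) hours fall when w rises.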

A second example has been used by Hall (1974). He supposes $h = n = 1$ but $L$ varies in the population. He deals in particular with the case where $\varepsilon = 1$ (equivalent to $\mu = 0$ or $u(c, l) = c^{1-\alpha}(L-l)^{\alpha}$) so that we have $(L-l) = \alpha(A + wL)/w$. He assumes $L = (1-\theta)\bar{L}$, where $\theta$ has the beta density on $[0, 1]$: $f(\theta) = 6\theta(1-\theta)$. He applies the model to the Penn-New Jersey negative income tax (NIT) experiment. Families were offered a choice between $(A_0, w_0)$ (participation) and $(A, w)$ (nonparticipation) with $A_0 > A$, $w_0 < w$. His model predicts both participation rates and changes in hours given participation fairly well. Hall argues that the representative individual is not a sensible concept when we see a dispersion of hours worked for a given $(A, w)$, and that a theory of labour supply should account for this dispersion.

3.2. Some problems of estimating the skill distribution

In the previous subsection we supposed for our discussion of estimation that $nw$ and $l$ were separately observable. We had been interpreting $l$ as clock-hours and regarding $l$ as the relevant argument for the disutility of labour and as the basis of the productivity measure. The problem is more complicated than this, however. Both disutility and productivity of labour may be a function primarily of the effort required rather than the number of hours, although the latter is obviously of importance. In the absence of a direct measurement of effort we should discuss estimation problems when we can observe $nwl$, total pre-tax labour income, and not $nw$ and $l$ separately. Here we interpret $l$ as effort.

There is one special formulation¹⁷ which makes the problem disappear. If individuals maximise $(1-\alpha)\log c + \alpha\log(1-l)$ subject to $c = a(nwl)^{\delta}$, where $a$ and $\delta$ define the tax function, then $l$ is constant and (pre-tax) incomes are distributed as a constant times $n$. We can, therefore, read off the distribution of skills from the distribution of labour income.
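
The constancy of effort in this special formulation can be checked directly. The following is a sketch of the derivation, assuming an interior solution; the symbol $l^{*}$ for the chosen level of effort is ours.

```latex
\begin{align*}
\max_{l}\quad & (1-\alpha)\log\!\big(a\,(nwl)^{\delta}\big) + \alpha\log(1-l) \\[4pt]
\text{first-order condition:}\quad & \frac{(1-\alpha)\,\delta}{l} = \frac{\alpha}{1-l} \\[4pt]
\Longrightarrow\quad & l^{*} = \frac{(1-\alpha)\,\delta}{(1-\alpha)\,\delta + \alpha},
\qquad nwl^{*} = (w\,l^{*})\,n .
\end{align*}
```

Neither $n$ nor $a$ enters $l^{*}$, so pre-tax income $nwl^{*}$ is proportional to $n$, with the factor of proportionality $w\,l^{*}$ depending only on $w$, $\alpha$ and $\delta$.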

Since $l$ is not directly observable, the assumption that it is constant is not violated, although we cannot estimate $\alpha$. It is clear, however, that the trick is rather special and will not work for more general utility and tax functions.

In general then, if $l$ is not directly observable, we cannot pass from a distribution of labour income to a distribution of $n$ unless we have full knowledge of the utility function and the tax function, when $l$ can be deduced. We can, however, gain information on the utility function and skill distribution separately if the tax schedule changes. We can illustrate this as follows. Put $\varepsilon = 1$ in (3), and we have

$$nwl = (1-\alpha)\,nwL - \alpha A. \qquad (5)$$

We can now use (5) to estimate $\alpha$. Let us suppose that the current post-tax wage

¹⁷ This formulation was used by Vickrey (1947) and Bevan (1974).