Taxation
Based on Laszlo Goerke’s 2008 lecture in Tübingen
Lion Hirth
Eberhard-Karls-Universität Tübingen
email@example.com
TeX code available on request
This document is published under the GFDL.
3 April 2008

Contents
I Foundations
  1 What are taxes and what are they for?
  2 German Tax revenues in comparison
  3 Pareto-Efficiency and Social Welfare Functions
    3.1 Conditions for Pareto-Efficiency
    3.2 Market Outcome and Market failure
    3.3 Social Welfare Functions
  4 Welfare Effects of Taxation
II Tax Incidence
  1 One Sector
    1.1 Specific vs. Ad Valorem Tax
    1.2 Invariance of legal Incidence
    1.3 Determinants of the incidence
    1.4 Some Applications
  2 Market Power
    2.1 Invariance of legal Incidence
    2.2 Determinants of the incidence
    2.3 Specific vs. Ad Valorem Tax
  3 One-sector General Equilibrium
    3.1 Taxes on Factor Returns
    3.2 Taxes on Output, Income, and Consumption
  4 General Equilibrium Model
III Optimal Commodity Taxation
  1 Lump-Sum Taxes
  2 No Distortion means No Revenues
    2.1 Requirements for Pareto-Efficiency
    2.2 Constraints on Tax Rates
    2.3 Tax Revenue under Pareto-Efficiency
    2.4 Non-distortionary Tax Systems
  3 Theory of the Second Best
  4 Homogeneous Households
    4.1 Ramsey’s Rule
    4.2 Reformulating the Ramsey Rule
      a) Income Elasticities of Demand
      b) Wage Elasticities of Hicksian Demand
    4.3 Special Cases: Additional Restrictions
      a) Fixed Labor Supply
      b) Homothetic Preferences
      c) Zero Cross-Price Elasticities
  5 Heterogeneous Households
    5.1 General Result
    5.2 Zero Cross-Price Elasticities
  6 The Production Efficiency Theorem
IV Optimal Income Taxation
  1 Wages, Taxation, and Labor Supply
  2 Fixed Labor Supply
  3 Variable Labor Supply
    3.1 Model
    3.2 Self-Selection Constraint
    3.3 First Results
    3.4 Optimization
  4 Continuous Households
  5 Different Tax functions
    5.1 Random Taxation
    5.2 Linear Tax
  6 Additional Generalizations
    6.1 Home Production
    6.2 Tax shifting
  7 Commodity Taxation in an Atkinson-Stiglitz framework
V Tax Evasion
  1 Basic model
  2 Comparative Statics
    2.1 Fine F
    2.2 Detection Probability z
    2.3 Income y
    2.4 Tax rate t
    2.5 Tax exemption
VI On this script

I Foundations
• this part draws on Stefan Homburg’s “Allgemeine Steuerlehre” (2007), Christian Keuschnigg’s “Öffentliche Finanzen: Einnahmepolitik” (2005), and Laszlo Goerke’s lecture

1 What are taxes and what are they for?
• taxes are compulsory payments made without a specific service in return
• classification
– direct versus indirect taxes: direct taxes tax economic capacity (ability to pay) directly and can take the personal characteristics of the taxpayer into account; indirect taxes tax it only indirectly and cannot differentiate by personal characteristics
– taxes on persons versus taxes on objects (Subjektsteuern versus Objektsteuern)
– consumption taxes versus transaction taxes: consumption taxes tax value added; transaction taxes tax legal transactions and can cause cascading
• reasons for taxation
– allocation (of goods and factors)
– redistribution (between households, firms, and the state)
– stabilization → today covered by macroeconomics
• reasons for taxation II
– revenue (shifting resources from private agents to the state)
– changing the behavior of agents (steering purpose, Lenkungszweck)
– redistribution (between individuals and firms)

2 German Tax revenues in comparison
• legal system
– the right to levy taxes is laid down in the constitution (“Finanzverfassung”, Art.
105–108 GG)
– today all taxes (except local consumption taxes) are uniform across Germany
– the Bundesrat has to approve the laws governing all of these taxes
– the revenue from corporate income tax (Körperschaftsteuer) and income tax is shared equally between the federal government (Bund) and the states (Länder), with a small share going to the municipalities (Gemeinden)
– the distribution of VAT revenue is variable and can be changed by simple statute
– tax administration, however, lies with the Länder, which pass the federal share of the revenue on to the Bund
• less than half of all government expenditure in Germany (48% of GDP) is financed through taxes (22% of GDP); the rest comes from social security contributions, fees, and debt
• “total tax revenue” (Abgaben) comprises taxes and social security contributions
• between 1991 and 2000, total tax revenue in Germany remained remarkably stable (between 36% and 37.2% of GDP), followed by a strong downward trend in recent years (down to 34.7% in 2005)
• that means it is as low as in the 1970s
• this is considerably higher than in Japan and the US (26% each), but below the EU15 and OECD averages (and below the UK, France, and Italy)
• looking only at tax revenues, Germany looks even more like a tax paradise: 21% of GDP in 2005, at the level of the US; only Japan (17%) is lower, while the UK, France, and Italy stand at around 29%
• over the long run, tax revenues peaked around 1980 at 25% of GDP and are today lower than at any time during the last 40 years
• most countries introduced VAT only in the 1970s or 1980s (Germany was one of the first, in 1968; the US still has no VAT system but relies on state-level sales taxes)
• in the OECD, indirect taxes (on goods and services) generate slightly less revenue than direct taxes (on income and profits): 11% and 12% of GDP, respectively
• Germany lies slightly below the average in both categories
• Japan and, even more so, the US (and also Switzerland) raise very little revenue from taxes on goods and services: around 5% (7%) of GDP
• labor is heavily taxed in Germany
– an average production worker receives only 59% (single person) to 80%
(married with two children) of his gross pay as disposable income
– this is less than in any other OECD country except Belgium
– the tax wedge (income tax plus all social security contributions, minus cash benefits, as a share of labor costs) is far above average in Germany, at between 35% and 50% of labor costs
– this is true regardless of marital status, children, or income level
• in 2006, Germany raised 480 billion euros in taxes, of which:
– 25% wage tax (Lohnsteuer; Bund / Länder / Gemeinden)
– 5% corporate income tax (Körperschaftsteuer; Bund / Länder)
– 8% local business tax (Gewerbesteuer; Gemeinden / Bund / Länder)
– 30% VAT (Umsatzsteuer; Bund / Länder)
– 8% energy taxes (Energiesteuern; Bund)
– 3% tobacco tax (Tabaksteuer; Bund)
– (all the rest: about 21%)
• in Germany, even medium earners pay the top tax rate: the top income tax rate applies from incomes of 55,000 euros, in the US only from 270,000 euros

3 Pareto-Efficiency and Social Welfare Functions
• to judge the efficiency effects of taxation, one usually analyzes their effects on the marginal conditions for a Pareto-efficient allocation
• in part IV, where income taxation forces the government to trade off efficiency against equity, a social welfare function is sometimes maximized instead

3.1 Conditions for Pareto-Efficiency
• assumptions
– two households h = 1, 2 with utility functions $u^h = u^h(x_1^h, x_2^h, L^h, K^h)$
– utility increases in the consumption of goods $x_i^h$ and decreases in the factor supplies $L^h$, $K^h$; the MRS is diminishing
– two production processes $x_i = f_i(L_i, K_i)$, i = 1, 2
– there is no waste, so that $K_1 + K_2 = K^1 + K^2 = K$, $L_1 + L_2 = L^1 + L^2 = L$, and $x_i^1 + x_i^2 = x_i$
• maximization problem
– $u^1$ is maximized while holding $u^2$ constant at $\bar u^2$
– to make the problem more tractable, some of the seven constraints are substituted in, so that three remain:
$\mathcal{L} = u^1 + \lambda_1[u^2 - \bar u^2] + \lambda_2[f_1 - x_1] + \lambda_3[f_2 - x_2]$
$\mathcal{L} = u^1(x_1^1, x_2^1, L^1, K^1) + \lambda_1\big[u^2(x_1 - x_1^1,\ x_2 - x_2^1,\ L^2, K^2) - \bar u^2\big] + \lambda_2\big[f_1(L_1, K_1) - x_1\big] + \lambda_3\big[f_2(L - L_1,\ K - K_1) - x_2\big]$ (1)
where $L = L^1 + L^2$ and $K = K^1 + K^2$
• the resulting 13 FOCs (ten choice variables $x_1^1, x_2^1, L^1, K^1, L^2, K^2, L_1, K_1, x_1, x_2$ plus three multipliers) can be rearranged into six marginal conditions for
Pareto-efficient allocations:
1. Marginal Rates of Substitution (MRS) are the same across households
• requirement on the consumption decisions of households (households / goods)
• the MRS is the ratio of a household’s marginal utilities of good 1 and good 2
• if the MRS differ across households, households could gain from trading goods with each other
$\frac{\partial u^1/\partial x_1^1}{\partial u^1/\partial x_2^1} = \frac{\partial u^2/\partial x_1^2}{\partial u^2/\partial x_2^2}$ (2a)
2. Marginal Rates of Factor Substitution (MRFS) are the same across households
• requirement on the factor supply decisions of households (households / factors)
• the MRFS is the ratio of a household’s marginal disutilities of supplying factors L and K
• if the MRFS differ across households, households could gain from trading factors with each other
$\frac{\partial u^1/\partial L^1}{\partial u^1/\partial K^1} = \frac{\partial u^2/\partial L^2}{\partial u^2/\partial K^2}$ (2b)
3. Marginal Rates of Technical Substitution (MRTS) are the same across production processes
• requirement on the input decisions of firms (production / factors)
• the MRTS is the ratio of the marginal products of factors L and K in a production process
• if the MRTS differ across firms, firms could gain from trading factors with each other
$\frac{\partial f_1/\partial L_1}{\partial f_1/\partial K_1} = \frac{\partial f_2/\partial L_2}{\partial f_2/\partial K_2}$ (2c)
• cross-multiplying (2c) shows that the Marginal Rate of Transformation (MRT) is the same whichever factor is reallocated: $\frac{\partial f_2/\partial L_2}{\partial f_1/\partial L_1} = \frac{\partial f_2/\partial K_2}{\partial f_1/\partial K_1}$
4. MRS has to equal MRT
• requirement on the relative quantities of good 1 and good 2 produced and consumed
• if the MRS differs from the MRT, everyone could gain from producing and consuming more of one good and less of the other
$\frac{\partial u^1/\partial x_1^1}{\partial u^1/\partial x_2^1} = \frac{\partial u^2/\partial x_1^2}{\partial u^2/\partial x_2^2} = \frac{\partial f_2/\partial L_2}{\partial f_1/\partial L_1} = \frac{\partial f_2/\partial K_2}{\partial f_1/\partial K_1}$ (2d)
5. MRFS has to equal MRTS
• requirement on the relative quantities of factors L and K employed and supplied
• if the MRFS differs from the MRTS, everyone could gain from employing and supplying more of one factor and less of the other
$\frac{\partial u^1/\partial L^1}{\partial u^1/\partial K^1} = \frac{\partial u^2/\partial L^2}{\partial u^2/\partial K^2} = \frac{\partial f_1/\partial L_1}{\partial f_1/\partial K_1} = \frac{\partial f_2/\partial L_2}{\partial f_2/\partial K_2}$ (2e)
6. A household’s MRS between supplying a factor and consuming a commodity has to equal the marginal product of that factor in producing that commodity
• requirement on the overall level of consumption and production
• if it fails, households could gain from consuming less and supplying fewer factors (or consuming more and supplying more factors)
$-\frac{\partial u^h/\partial L^h}{\partial u^h/\partial x_i^h} = \frac{\partial f_i}{\partial L_i}$ and $-\frac{\partial u^h/\partial K^h}{\partial u^h/\partial x_i^h} = \frac{\partial f_i}{\partial K_i}$ for h = 1, 2 and i = 1, 2 (2f)
(the minus sign reflects that supplying factors lowers utility)
• the number of conditions naturally depends on the assumptions of the model
– models without a production side only have the MRS and MRFS conditions
– models with only one household cannot have these two, but all the others
– models with only one factor cannot have the MRTS equality or the MRTS = MRFS condition
– models with cost functions implicitly assume the MRTS equality to be fulfilled (costs are minimized)

3.2 Market Outcome and Market failure
• it is easy to show that in a perfectly functioning free-market economy, the market outcome fulfills these conditions:
1. the MRSs equal relative goods prices, and thus each other
2. the MRFSs equal relative factor prices, and thus each other
3. the MRTSs equal relative factor prices, and thus each other
4. the MRT equals relative goods prices, and thus the MRS
5. the MRFS equals the MRTS, since both equal relative factor prices
6. the last condition is fulfilled since both sides equal the ratio of factor price to goods price
• the intuition is straightforward: there are profit (or utility) opportunities as long as the conditions are not fulfilled
1. if the MRS differs from relative prices, a household could gain from changing its consumption bundle
2. if the MRFS differs from relative factor prices, a household could gain from changing its combination of factor supplies
3. if the MRTS differs from relative factor prices, a firm could gain from substituting one factor for another
4. if the MRT differs from relative prices, a firm could gain by producing more (or less)
5. MRFS = MRTS, since both equal relative factor prices
6. if a household’s MRS between factor supply and consumption differs from the real factor price, it could gain by supplying more (or less) of that factor; if a factor’s marginal product differs from its real price, a firm could gain by employing more (or less) of it
• in other words, the market approaches the Pareto-efficient allocation through Pareto improvements; if markets fail, a Pareto-efficient allocation may no longer be reachable without making someone worse off
• this result can be derived analytically very easily (so easily that it is not shown here)
• but it depends heavily on perfectly functioning markets
• market imperfections that prevent the conditions from being fulfilled include
– violations of the assumptions of perfectly mobile factors, zero transaction costs, and complete information
– market power (heterogeneous goods, market entry barriers)
– economies of scale
– external effects
– public goods
– asymmetric information
• this suggests that in the real world a free-market outcome hardly ever fulfills the conditions
• as discussed in section III.3, the Theory of the Second Best states that once any one of the conditions for Pareto efficiency is violated, fulfilling the remaining conditions is in general no longer desirable

3.3 Social Welfare Functions
• social welfare functions allow judgments about a significantly broader set of allocations than the Pareto criterion does
• for example, they make it possible to value redistribution (to equalize disposable incomes); that is, only with a social welfare function can efficiency be traded off against equity
• the cost,
however, is that significantly more information is needed
– about individual preferences: utility functions have to be cardinal to allow interpersonal utility comparisons
– about the “preferences of society” (for example, the weights of individual utilities)
– as a side remark: any political scientist would laugh at the idea of exogenous “preferences of society”
• the most important specific social welfare functions are
– the special utilitarian welfare function: $W^{SU} = \sum_h u^h$
– the utilitarian welfare function: $W^{U} = \sum_h g^h u^h$
– the Nash social welfare function: $W^{N} = \prod_h (u^h - u_0^h)$
– the Rawlsian social welfare function: $W^{R} = \min(u^1, \ldots, u^n)$
• all these functions belong to the class of Bergson-Samuelson welfare functions, in which social welfare depends positively on the utility of the individuals: $W^{BS} = W(u^1, \ldots, u^n)$ with $\partial W^{BS}/\partial u^h > 0$
• for identical quasi-linear utility functions, the sum of producer and consumer surplus (the area between the inverse demand function and marginal costs) is equivalent to the special utilitarian welfare function

4 Welfare Effects of Taxation
• welfare-reducing substitution effects
– for any positive revenue requirement R > 0 there is obviously an income effect: utility (and profits) fall because households can consume less
– but substitution effects reduce utility further (by definition, since we start from an optimum); this is the deadweight loss of taxation, also referred to as the excess burden of taxation (Zusatzlast in German)
– agents try to avoid taxation by changing their behavior and thus shrink the tax base; the government then has to raise the tax rate, and agents change their behavior further
– what looks sensible from an individual agent’s point of view (changing one’s behavior to avoid taxation) reduces welfare from a social perspective
– the simplest and most drastic example is a commodity tax high enough to kill all demand: there is no tax revenue, but all rents (consumer and producer surplus) are
lost
• significance of the welfare effects
– the welfare effects of taxation are at the center of any economic analysis of taxes
– but note that their very existence seems to be virtually unknown to most policymakers and the wider public: public debate often centers on how large tax revenues should be, not on how they can be raised most efficiently
– the reason is obvious: while tax payments are visible, the welfare losses are invisible
• lump-sum taxes have no substitution effects (but, as discussed in section III.1, lump-sum taxes may not exist and are surely not an option in today’s democratic societies)
• how to measure the welfare reduction
– sum up utilities (read off the value of a social welfare function)
– compare equivalent variations (“what income reduction would make the households indifferent between the tax and this reduction?”)
– add up consumer and producer surplus (and tax revenues), that is, estimate the size of the Harberger triangle
– these triangles are named after Arnold Harberger, who used them in an analysis of taxes (1964); today they are used in a wide range of economic analyses, perhaps most famously in trade theory
• all these measures have problems
– summing up utilities raises all kinds of valuation and information problems
– so do equivalent incomes
– the surplus calculation ignores general equilibrium effects (side effects on other markets) and implicitly assumes that there are no income effects
• for linear demand and supply curves, $x^d = a - b(q + \tau)$ and $x^s = cq$, revenue increases less than proportionally with the tax (and even decreases beyond a certain point), while the excess burden increases quadratically:
$R = c\,\frac{a\tau - b\tau^2}{b + c}$ (3)
$\Delta W = -\frac{bc\,\tau^2}{2(b + c)}$ (4)
• under the assumption of quasi-linear utility functions, it will be shown in section II.1.3 that the reduction in welfare caused by introducing a small specific tax dτ, proxied by the reduction in the sum of CS and PS, is the Harberger triangle:
$dW = \frac{d\tau\,dx}{2} = \frac{d\tau}{2}\,\frac{x}{p}\,\frac{\varepsilon_d\,\varepsilon_s}{\varepsilon_s - \varepsilon_d}\,d\tau = \frac{x}{2p}\,\frac{\varepsilon_d\,\varepsilon_s}{\varepsilon_s - \varepsilon_d}\,(d\tau)^2 < 0$ (5)
• this implies that the welfare costs rise with the square of the tax rate
• the smaller the (absolute) price elasticities, the smaller the welfare reduction
• it can easily be shown that the welfare cost of increasing an existing tax rises only linearly with the tax increase (but is higher, the higher the initial rate):
$dW = dx\,(\tau + d\tau/2) < 0$ (6)
• using equivalent variations, a widely cited 1985 study estimates for the US that the excess burden is around 33% (that is, each dollar of tax revenue causes a utility loss of 1.33 dollars)

II Tax Incidence
• legal incidence does not equal economic incidence
– in German terms: the formal tax burden (formelle Steuerlast, Zahllast) is not the same as the material tax burden (materielle Steuerlast, Steuerinzidenz)
– because of the deadweight loss, the economic burden is always at least as large as the legal burden, as discussed in section I.4
– tax shifting (Überwälzung) causes the economic burden to fall on agents other than those legally liable; this is the topic of part II
• levels of analysis
– specific tax incidence (looking at one single tax)
– differential tax incidence (comparing two or more taxes while holding the overall tax burden constant → analysis of differential effects)
– budgetary tax incidence (also taking government expenditure into account)
• direct vs. indirect effects (Nahwirkung vs. Fernwirkung, the latter determining the final economic incidence)
– taxes can be shifted forward (e.g., by raising goods prices), backward (by lowering factor incomes), or sideways (by raising the prices of other goods)
– every tax affects every price and the behavior of all agents in the economy → fairly complex modeling with a fully-fledged general equilibrium model
– normally, the analysis covers only the market mainly affected (Nahwirkung), using a partial equilibrium model or a one-sector GE model
• what questions do we ask?
– are ad valorem and specific taxes equivalent?
(equivalence of taxes)
– does it matter who is legally taxed? (invariance of incidence)
– who bears the tax? (tax incidence)
∗ how do quantities and prices react?
∗ what determines these reactions?
∗ how does the distribution of welfare change?
• stages of modeling
– one good, no factors: buyer vs. seller (partial equilibrium model)
– one good, no factors, market power (partial equilibrium model of imperfect competition)
– one good, two factors of production: capitalists vs. workers (one-sector GE)
– two goods, two factors (general equilibrium (GE))

1 One Sector
• model
– one homogeneous good with quantity x (one sector)
– no factors of production (no production, only trade)
– perfect competition: all agents are price takers
– we look at buyer vs. seller
– the supply price q (the price the seller receives) differs from the demand price p (the price the buyer pays)
– a specific tax τ and an ad valorem tax t (both paid by the seller) are allowed for, so that q = p(1 − t) − τ, i.e., p = (q + τ)/(1 − t)
– supply always has to equal demand: $x^s(q) = x^d(p)$
• central findings
– a tax on a good has three effects: it reduces the supply price, raises the demand price, and reduces the quantity traded
– specific and ad valorem taxes are equivalent
– invariance of incidence: legal incidence does not matter; taxation of income, property, and expenditure are equivalent
– shifting and economic incidence depend crucially on the agents’ price elasticities: the less elastic side of a market bears more
– the more elastic the agents, the higher the welfare loss

1.1 Specific vs. Ad Valorem Tax
• specific tax τ and ad valorem tax t
• with both taxes present, and both defined as paid by the seller:
– supply price q = p(1 − t) − τ
– demand price p = (q + τ)/(1 − t)
• specific taxes are defined per unit of a good
– this causes problems with product definition (an incentive to change the good to evade taxation) and creates an incentive to raise quality
– in the model we assume quality to be fixed (and since there is only one good, there are no problems with definitions)
• argument
– concern: the multiplier effect of an ad valorem tax: raising the supply price by one unit raises the demand price by 1/(1 − t) > 1 units
– conversely, part of a price cut is financed by the (falling) tax payment
– but in a competitive market, the supply price (like the demand price) is given → a price increase is not possible in the first place
– the firm receives the supply price q = p(1 − t) or q = p − τ, but it can never influence p, and hence not q either
– that means specific and ad valorem taxes are equivalent if they introduce the same tax wedge
• results
– if a specific and an ad valorem tax impose the same difference between supply and demand price, they have the same effect on the behavior, income, and utility of economic agents and on government revenue
– this result is not very robust and no longer holds once market power is introduced in section 2
• implications
– for policy: the choice between the two taxes can be based on other criteria (e.g.
the problem of quality changes)
– for analytical purposes: which type of tax we use (t or τ) is just a matter of convenience

1.2 Invariance of legal Incidence
• formal analysis
– assume only a specific tax τ
– either the tax is collected from the buyer: $p_C = q_C + \tau$
– or it is collected from the seller: $q_S = p_S - \tau$
$x_C = x^d(p_C) = x^d(q_C + \tau) = x^s(q_C) = x^s(p_C - \tau)$ (7)
$x_S = x^d(p_S) = x^d(q_S + \tau) = x^s(q_S) = x^s(p_S - \tau)$ (8)
– if both taxes result in the same quantity traded ($x_C = x_S$), legal incidence does not matter
– the definitions $p_C = q_C + \tau$ and $q_S = p_S - \tau$ show that taxing either side at the same rate introduces the same tax wedge between demand and supply price: (7) and (8) both reduce to $x^d(q + \tau) = x^s(q)$, so the quantity traded falls by the same amount
– the same fall in quantity implies that prices, incomes, and utility are affected identically by both taxes ($q_C = q_S$, $p_C = p_S$)
• graphical illustration
– in a graph of the market equilibrium, τ means a downward shift of the demand curve or an upward shift of the supply curve by τ
– shifting S up by τ and shifting D down by τ result in the identical equilibrium quantity x
– hence prices, incomes, and utility have to be the same
• intuition
– what matters are the prices that suppliers and buyers face, q and p; who transfers the tax, and whether buyers even notice they are paying it, does not matter
– put more simply, it does not matter whether you buy a good for 100 euros and put 20 euros in a box for the tax authorities, or pay 120 euros and the seller puts 20 euros in the box
• results
– legal incidence does not matter for economic incidence
– nor does it matter whether agents (or one side of the market) know that the good is taxed
– the result is robust: it does not depend on the competitiveness of the market or on the shape of the supply and demand curves; only where prices are bargained over might it matter
• implications
– for policy: a large part of the political debate about taxes (e.g., employer vs. employee contributions to social security) is pretty pointless
– for analytical purposes: how we define taxes (which side pays) is just a matter of convenience

1.3 Determinants of the incidence
• assume a newly introduced specific tax τ is imposed on buyers (p = q + τ), so that $x^d(q + \tau) = x^s(q)$
• we look at the changes in prices and quantities
• the total derivative of the market equilibrium has to equal zero
– the market has to be in equilibrium with and without the tax ($x^s = x^d$)
– hence the tax has to change $x^s$ and $x^d$ by the same amount, i.e., the total derivative of $x^s - x^d$ has to be zero:
$dx^s = dx^d$ (9)
$\frac{\partial x^d}{\partial(q+\tau)}\frac{\partial(q+\tau)}{\partial q}\,dq + \frac{\partial x^d}{\partial(q+\tau)}\frac{\partial(q+\tau)}{\partial\tau}\,d\tau = \frac{\partial x^s}{\partial q}\,dq$ (10)
$x^{d\prime}\,dq + x^{d\prime}\,d\tau = x^{s\prime}\,dq$ (11)
• using $x^s = x^d = x$ (equilibrium), $q = p$ (the tax is newly introduced, so initially τ = 0), and the definitions $\varepsilon_d = x^{d\prime}p/x^d$ and $\varepsilon_s = x^{s\prime}q/x^s$, we obtain the marginal changes in prices and quantities:
$\frac{dq}{d\tau} = \frac{x^{d\prime}}{x^{s\prime} - x^{d\prime}} = \frac{x^{d\prime}(q/x)}{x^{s\prime}(q/x) - x^{d\prime}(q/x)} = \frac{\varepsilon_d}{\varepsilon_s - \varepsilon_d}$ (12)
$\frac{dp}{d\tau} = \frac{dq}{d\tau} + 1 = \frac{\varepsilon_s}{\varepsilon_s - \varepsilon_d}$ (13)
$\frac{dx^d}{d\tau} = x^{d\prime}\frac{dp}{d\tau} = x^{d\prime}\frac{\varepsilon_s}{\varepsilon_s - \varepsilon_d} = \frac{x}{p}\,\frac{\varepsilon_s\,\varepsilon_d}{\varepsilon_s - \varepsilon_d}$ (14)
$\frac{dx^s}{d\tau} = x^{s\prime}\frac{dq}{d\tau} = x^{s\prime}\frac{\varepsilon_d}{\varepsilon_s - \varepsilon_d} = \frac{x}{q}\,\frac{\varepsilon_s\,\varepsilon_d}{\varepsilon_s - \varepsilon_d}$
• note that
– this result was already used in section I.4
– $\varepsilon_s$ is the percentage change in supply for a 1 percent change in the supply price; it is assumed to be positive
– $\varepsilon_d$ is the percentage change in demand for a 1 percent change in the demand price; it is assumed
to be negative
– this analysis holds only for the introduction of a marginal tax, not for an increase of an existing tax, since only then p = q
– the changes in p and q have to differ by the change in the tax: dp − dq = dτ
• for an ad valorem tax (p = q(1 + t)) the results are slightly different
– (10) becomes
$\frac{\partial x^d}{\partial(q(1+t))}\frac{\partial(q(1+t))}{\partial q}\,dq + \frac{\partial x^d}{\partial(q(1+t))}\frac{\partial(q(1+t))}{\partial t}\,dt = \frac{\partial x^s}{\partial q}\,dq$ (15)
$x^{d\prime}(1+t)\,dq + x^{d\prime}q\,dt = x^{s\prime}\,dq$, which for a newly introduced tax (t = 0) reduces to
$x^{d\prime}\,dq + x^{d\prime}q\,dt = x^{s\prime}\,dq$ (16)
$\frac{dq}{dt} = q\,\frac{\varepsilon_d}{\varepsilon_s - \varepsilon_d}$ (17)
$\frac{dp}{dt} = q\,\frac{\varepsilon_s}{\varepsilon_s - \varepsilon_d}$ (18)
$\frac{dx}{dt} = x\,\frac{\varepsilon_s\,\varepsilon_d}{\varepsilon_s - \varepsilon_d}$ (19)
– the result is easy to understand, since the tax wedge introduced by a change dt is not dt but q dt
– the qualitative results are unchanged
• results
– introducing a tax weakly reduces the supply price and weakly increases the demand price
– that means overshifting cannot occur
– if demand is perfectly inelastic ($\varepsilon_d = 0$), the supply price does not change and the demand price rises by the full amount of the tax: the buyer bears the full tax
– the same is true if supply is perfectly elastic ($\varepsilon_s \to \infty$)
– if supply is perfectly inelastic ($\varepsilon_s = 0$), the demand price does not change and the supply price falls by the full amount of the tax: the seller bears the full tax
– the same is true if demand is perfectly elastic ($\varepsilon_d \to -\infty$)
– the more elastic side of the market bears less
– intuition: the inelastic side cannot avoid being taxed
– the higher the absolute price elasticities of demand and supply, the bigger the change in the quantity traded
– the more elastic the agents, the higher the welfare loss
• determinants of elasticity
– price elasticities are not exogenous but depend on the market structure, the definition of the tax base, market imperfections, and other factors
– note that in a perfectly competitive market with free entry and constant costs, long-run supply is perfectly elastic
– the intuition is clear: firms (sellers) make no profits and therefore cannot bear the tax, so buyers bear the entire tax
– the definition of the tax base also matters: in general, the narrower the
base, the more elastic supply and demand
– elasticities vary greatly with the time horizon considered: in general, agents react more elastically in the long run
– often, the demand side is more elastic in the short run and the supply side in the long run: thus, in the short run firms bear a relatively large share of most commodity taxes, while in the long run consumers bear most of them
– if the market is not in equilibrium (e.g., due to administered prices), the short side of the market bears all of the tax (and the gap between supply and demand narrows)
• so far we have looked at price changes, but changes in rents (CS, PS) are a more precise measure of welfare effects (the lecture script covers this only graphically)

1.4 Some Applications
• this subsection draws on Homburg (2007) and gives some illustrative examples
• coffee tax
– since there is no coffee production in Germany, a coffee tax is equivalent to a tariff
– if Germany is a small country and the world market price is unaffected by the tax, supply is perfectly elastic and consumers bear all of the tax
– this is not the case if the coffee market is oligopolistic and suppliers price-discriminate
– if all consuming countries worldwide introduce the tax and supply is inelastic in the short run (since coffee plants involve large sunk costs and depreciate slowly), suppliers bear the lion’s share of the tax
• wine tax
– wine is the only alcoholic beverage in Germany that is not taxed (apart from VAT)
– white and red wine are presumably good substitutes
– in the case of a tax on red wine only, demand reacts highly elastically (consumers switch to white wine) and producers bear most of the tax
– this example shows that the tax base of a commodity tax matters
– in general, the narrower the tax base, the more elastic both demand and supply
• land tax
– land is perhaps the most inelastically supplied good of all
– the unexpected introduction (or increase) of a land tax is borne entirely by the landowner, since the land value falls by the present value of future
    value decreases by the present value of future tax payments
• note that the incidence of subsidies is driven by elasticities, just as tax incidence is
2 Market Power
• model
  – augmented model: one market side (here the sellers) has market power
  – what matters in principle is the asymmetry between the two sides (market power vs. perfect competition); oligopsonistic markets could be modeled in exactly the same way (with all results reversed)
  – Cournot-Nash competition is assumed
    ∗ n identical firms with constant marginal cost c
    ∗ quantity competition
    ∗ all competitors decide simultaneously, taking the other firms' quantities as given
  – there is an ad valorem tax t paid by the firm, so that the producer price is q = p(1 − t), and a specific tax τ per unit
  – the second-order condition (SOC) is assumed to hold (second derivative negative), which is ensured by assuming the demand curve to be convex, but not too convex
• central findings
  – legal incidence does not matter
  – over-shifting may occur
  – specific and ad valorem taxes are no longer equivalent
2.1 Invariance of legal Incidence
• compare an ad valorem tax t paid by sellers and an ad valorem tax t_C paid by buyers
• sellers receive only the fraction (1 − t) of the price p consumers pay, and buyers have to pay (1 + t_C) times the price q sellers receive
• only the seller is taxed: q = p(1 − t)
• only the buyer is taxed: p = q(1 + t_C), i.e. q = p/(1 + t_C)
• for the rate t = t_C/(1 + t_C) we have 1 − t = 1/(1 + t_C), so both cases create the same wedge between p and q and are equivalent
• if the prices are the same, the equilibrium quantity X* has to be the same, too
• that means the invariance of legal incidence holds
2.2 Determinants of the incidence
• from the profit maximization of firm j with output x_j, the equilibrium condition Z can be derived:
$$\pi_j = (1-t)\,p\,x_j - (c+\tau)\,x_j, \qquad p = p(x_j + X_{-j})$$
$$\frac{\partial \pi_j}{\partial x_j} = (1-t)\left(p'\,x_j^* + p\right) - (c+\tau) = 0$$
$$Z := (1-t)\left(p'(X)\,x_j^* + p(X)\right) - (c+\tau) = 0 \quad \text{with } X = n\,x_j^* \qquad (20)$$
• the total derivative of Z has to be zero; it is evaluated along the symmetric equilibrium, so that all n firms adjust and dX = n dx_j*:
$$dZ = \frac{\partial Z}{\partial t}\,dt + \frac{\partial Z}{\partial x_j^*}\,dx_j^* = 0 \qquad (21)$$
• solving for dx_j*/dt gives the change in each firm's output; with ε_d := p/(p'X) < 0 the price elasticity of demand and η := X p''/p' the elasticity of the slope of the inverse demand function (so that p' x_j* = p/(n ε_d) and p'' n x_j* = η p'):
$$\frac{\partial x_j^*}{\partial t} = -\frac{Z_t}{Z_{x_j^*}} = \frac{p'\,x_j^* + p}{(1-t)\left(p''\,n\,x_j^* + p'(1+n)\right)} = \frac{p\,(1/\varepsilon_d + n)}{(1-t)\,n\,p'\,(\eta + 1 + n)} \qquad (22)$$
• η is negative whenever demand is strictly convex (p'' > 0)
• the equilibrium is stable if η + 1 + n > 0, which is assumed to hold
• we are interested in price and quantity changes:
$$\frac{\partial X}{\partial t} = n\,\frac{\partial x_j^*}{\partial t} = \frac{p\,(1/\varepsilon_d + n)}{(1-t)\,p'\,(\eta + 1 + n)} \qquad (23)$$
$$\frac{\partial p}{\partial t} = p'\,\frac{\partial X}{\partial t} = \frac{p\,(1/\varepsilon_d + n)}{(1-t)(\eta + 1 + n)} \qquad (24)$$
$$\frac{\partial q}{\partial t} = \frac{\partial [p(1-t)]}{\partial t} = (1-t)\,p'\,\frac{\partial X}{\partial t} - p = \frac{p\,(1/\varepsilon_d - \eta - 1)}{\eta + 1 + n} \qquad (25)$$
• Results
  – over-shifting may occur, depending on the demand function
  – over-shifting is defined as an increase of the producer price q due to the introduction of a tax
  – that implies that the buyers pay more than 100 percent of the tax
  – by (25), this occurs when 1/ε_d − η − 1 > 0, i.e. (multiplying by ε_d < 0) when ε_d(η + 1) > 1
  – a monopolist always sets a price such that ε_d < −1
  – hence, in a monopoly, η < −2 is a sufficient condition for over-shifting
  – for a linear demand curve, p'' = 0 and thus η = 0, so (25) is negative: there is under-shifting
  – for a constant elasticity of demand, η = 1/ε_d − 1, so (25) is zero: there is full forward shifting
2.3 Specific vs. Ad Valorem Tax
• here the same procedure is repeated for the specific tax τ (evaluated at t = 0, so that q = p − τ):
$$\frac{\partial x_j^*}{\partial \tau} = -\frac{Z_\tau}{Z_{x_j^*}} = \frac{1}{p'\,(\eta + 1 + n)} \qquad (26)$$
$$\frac{\partial X}{\partial \tau} = \frac{n}{p'\,(\eta + 1 + n)} \qquad (27)$$
$$\frac{\partial p}{\partial \tau} = \frac{n}{\eta + 1 + n} \qquad (28)$$
$$\frac{\partial q}{\partial \tau} = \frac{n}{\eta + 1 + n} - 1 = -\frac{\eta + 1}{\eta + 1 + n} \qquad (29)$$
• specific results for specific demand functions
  – for a linear demand curve (η = 0), under-shifting occurs
  – for a constant-elasticity demand curve (η + 1 = 1/ε_d < 0), over-shifting occurs
  – in a monopoly with linear demand (n = 1, η = 0), ∂p/∂τ = 1/2: both sides bear half of the tax
  – over-shifting is more likely with a specific tax than with an ad valorem tax: by (29) it requires only η + 1 < 0, whereas by (25) it requires η + 1 < 1/ε_d < 0
  – profits are reduced even in the case of full forward shifting, since q remains constant but x_j* declines
• an ad valorem tax is always better than a specific tax raising the same revenue
  – the reduction in quantities is more pronounced in the case of a specific tax
  – this is because with an ad valorem tax, firms bear only part of the price decrease when increasing output: an ad valorem tax acts as an implicit output subsidy
  – consumers are better off since output increases
  – for higher output,
    the tax rate required to raise a given revenue decreases, so that a monopoly firm is also better off
  – for an oligopoly, profit effects are ambiguous, but welfare increases unambiguously
  – → replacing a specific tax by an ad valorem tax in a budget-neutral way is Pareto-improving in a monopoly and welfare-improving (as measured by the sum of surpluses) in all oligopolies
  – mathematical proof in the script, pp. 18-21
• Results
  – ad valorem and specific taxes are no longer equivalent
  – an ad valorem tax is superior in welfare terms
3 One-sector General Equilibrium
• even with only one sector, the general equilibrium (GE) model is significantly more complex than the previous models
• some of the original assumptions of the model in section (1) remain the same
  – one homogeneous good with quantity x (one sector)
  – the supply price q differs from the demand price p
  – supply always has to equal demand: s(q) = d(p)
  – perfect competition: all agents are price takers
• now we introduce production into the model
  – two factors of production, labor L and capital K, with the real prices w (wage rate) and r (real interest rate)
  – that means we now have three markets: one for the good and one for each of the two factors
  – both backward and forward shifting are possible
• the factors have different price elasticities of supply
  – in our example, K is supplied perfectly inelastically (a constant quantity)
  – L is supplied elastically, depending positively on the real wage
  – what matters for the results is the difference in supply elasticities: one could equally model L as the inelastic factor, or model one factor as merely less elastic than the other (instead of perfectly inelastic)
• households' demand functions are homogeneous of degree zero in all prices (and income)
  – that means the price level does not matter for demand
  – this implies that we can work with real factor prices
• the production function is linear homogeneous (CRS)
  – factors can be substituted in the production process (in contrast to Leontief-type production functions)
• instead of
    analysing suppliers and demanders, we now focus on capitalists (capital owners) vs. workers (labor owners)
• five ad valorem taxes are analyzed, on
  – labor returns (t_w)
  – capital returns (t_r)
  – output (t_X)
  – consumption (t_C)
  – income (t_Y)
  – taxes on factors and output are legally paid by the firm, taxes on income and consumption by the households
• exogenous variables
  – capital supply K
  – tax rates (t_w, t_r, t_X)
• endogenous variables
  – the prices (p, w, r)
  – labor supply L_s
• central findings
  – a tax on capital reduces the interest rate
  – a tax on labor reduces both the wage and the interest rate
  – the share of a labor tax borne by workers is determined crucially by the demand and supply elasticities of labor
  – invariance of incidence holds for a number of uniform taxes
  – the share of a uniform tax borne by workers again depends on the demand and supply elasticities of labor
• empirical labor supply and demand elasticities
  – here labor is modeled as being supplied elastically (compared to perfectly inelastic capital)
  – labor supply at the intensive margin (more weekly hours or higher work intensity) is often found to be fairly inelastic
  – labor supply at the extensive margin (changes in labor force participation) is much more elastic, but often only in the long run
  – classical economists like David Ricardo often assumed that wages are fixed at subsistence level and thus that labor supply is perfectly elastic; this means that a tax on wage income is effectively a tax on firms (which they opposed on the grounds of long-run capital accumulation and growth)
  – note also that elasticities may vary dramatically with the time horizon: in the short run, labor is often assumed to be supplied elastically while capital supply is fixed; in the long run it might be plausible to assume the opposite
3.1 Taxes on Factor Returns
• while wage and capital income are often taxed uniformly as "income" in practice, taxes on wage income and taxes on capital income should be
    considered separately in an incidence analysis
• in Germany, with the "Abgeltungssteuer" (a flat final withholding tax on capital income) from 2009 on, there will in fact be a differentiated income tax on labor and on capital
• the profit equation in nominal terms, given that x = F(L_s, K) (implicitly assuming that the equilibrium condition holds), is
$$\pi = \frac{p}{1+t_X}\,F(L_s, K) - p\,w\,L_s\,(1+t_w) - p\,r\,K\,(1+t_r) \qquad (30)$$
• differentiating with respect to the two factors yields the optimality conditions for the representative firm
$$Z_1 = F_L(L_s, K) - w\,(1+t_w)(1+t_X) = 0 \qquad (31)$$
$$Z_2 = F_K(L_s, K) - r\,(1+t_r)(1+t_X) = 0 \qquad (32)$$
• both FOCs have to hold before and after introducing a tax, so their total changes must be zero; we therefore take the total derivatives of (31) and (32) with respect to w, r, t_w, t_r, t_X, where L_s' := dL_s/dw
$$dZ = \frac{\partial Z}{\partial w}dw + \frac{\partial Z}{\partial r}dr + \frac{\partial Z}{\partial t_w}dt_w + \frac{\partial Z}{\partial t_r}dt_r + \frac{\partial Z}{\partial t_X}dt_X \qquad (33)$$
$$dZ_1 = \left[F_{LL}L_s' - (1+t_w)(1+t_X)\right]dw + 0\,dr - w(1+t_X)\,dt_w + 0\,dt_r - w(1+t_w)\,dt_X = 0 \qquad (34)$$
$$dZ_2 = F_{KL}L_s'\,dw - (1+t_r)(1+t_X)\,dr + 0\,dt_w - r(1+t_X)\,dt_r - r(1+t_r)\,dt_X = 0 \qquad (35)$$
• this system of linear equations can be solved by hand or by applying Cramer's rule
• for the analysis of one tax, we set the other two taxes to zero for simplicity; with ε_d := F_L/(F_LL L_s) < 0 the elasticity of labor demand and ε_s := L_s' w/L_s > 0 the elasticity of labor supply, we obtain
$$\frac{\partial w}{\partial t_w} = \frac{w}{1+t_w}\cdot\frac{\varepsilon_d}{\varepsilon_s-\varepsilon_d} < 0 \qquad (36)$$
$$\frac{\partial r}{\partial t_w} = \frac{F_{KL}L_s'\,w}{1+t_w}\cdot\frac{\varepsilon_d}{\varepsilon_s-\varepsilon_d} < 0 \qquad (37)$$
$$\frac{\partial w}{\partial t_r} = 0 \qquad (38)$$
$$\frac{\partial r}{\partial t_r} = -\frac{r}{1+t_r} < 0 \qquad (39)$$
• Results
  – a tax on capital only reduces the return on capital r
  – a tax on labor reduces the returns on both factors, w and r (cross-shifting)
  – the more elastic the supply of labor and the less elastic the demand for labor, the smaller the share of a labor tax borne by workers
  – this is because capital is supplied inelastically and labor is supplied elastically
• empirically, it is often found (or assumed) that capital supply is highly elastic in times of trade openness and globalized capital markets (at least in the long run); this means that all taxes on factor returns are ultimately borne by workers
• not analyzed formally here, but quite intuitive: taxes on economic
    rents (like profits or land rents) cannot be shifted (a tax on consumer surplus is not feasible, since consumer surplus cannot be observed by tax authorities)
3.2 Taxes on Output, Income, and Consumption
• all taxes (t_X, t_C, t_Y) are uniform ("flat taxes" without exemptions), not progressive as most real-world income taxes are
• invariance of incidence restated
  – uniform taxes on income, consumption, and output are all equivalent to each other and to a uniform tax on factor incomes
  – given a linear homogeneous production function, the Euler theorem applies: the value of inputs (the sum of factor incomes) equals the value of output, which equals consumption (all income is spent on x)
  – that implies that the tax base of all four taxes is the same: what the households get (income), what they spend (consumption), what firms pay for inputs, and what they produce (output) are all the same amount
  – if the taxes create the same tax wedge, they are equivalent
  – this is the case for t_w = t_r = t_X = t_Y/(1 − t_Y) = t_C/(1 − t_C)
  – the rates differ only because of how the taxes are legally paid (by the household or by the firm)
  – it does not matter whether factor incomes are taxed uniformly (paid by the firm) or income is taxed (paid by the households)
  – it does not matter whether income or consumption is taxed
  – it does not matter whether inputs are taxed uniformly or output is taxed
• tax incidence
  – the question of tax incidence analysis is: which share of the tax is borne by workers and which by capitalists?
  – the shares depend on the demand and supply elasticities of labor
  – the higher the demand elasticity and the lower the supply elasticity, the higher the relative tax burden on workers
  – the share of the tax borne by workers is bounded from above by their share of total income
  – the share of the tax borne by capitalists is bounded from below by their share of total income
  – if labor and capital are both supplied inelastically, each bears a share of the tax equal to its share of total income
• Results
  – uniform taxes on income, consumption, output, and both factor incomes are equivalent
  – the higher the supply elasticity and the lower the demand elasticity of a factor, the smaller the share of the tax that factor has to bear
  – if labor is supplied perfectly elastically (or demanded perfectly inelastically), capital bears the entire tax burden
  – if both factors are demanded and supplied with the same elasticities, they bear shares equal to their income shares
• the case of savings
  – if we allow for savings, an expenditure tax is like an income tax under which all savings are exempt from taxation
  – in a dynamic model where all income is eventually consumed, an expenditure tax is like an income tax under which interest income is exempt
  – thus, exempting savings and exempting interest income are equivalent
  – switching from an income tax to an expenditure tax (or, less dramatically, increasing the VAT rate while reducing income tax rates) is an intergenerational redistribution: people who live off previously accumulated capital pay twice (their savings were built from already-taxed income, and their consumption is now taxed again)
4 General Equilibrium Model
• the model is very restrictive in its assumptions, but the formal analysis is still heavy in notation and quite messy
• two goods (two sectors) and two factors
• factors can be substituted in production, and goods are substitutes in consumption
• the supply of capital and labor is fixed, but the factors are perfectly mobile between sectors
  – formally, K = K_1 + K_2 and L = L_1 + L_2
  – that means there is no distortion of the labor-leisure decision,
    but only of the allocation of factors between the two sectors
  – uniform taxes on factor returns are lump-sum taxes, since factor supply cannot be reduced
• production functions are well-behaved and linear homogeneous (CRS), and markets are perfectly competitive
  – firms make no profits, so they cannot bear any tax
  – since the economy is closed, the Euler theorem gives rK + wL = p_1x_1 + p_2x_2
  – the labor-capital ratio chosen by firms does not depend on the scale of production
• all firms in a sector are identical (have the same production technology)
• households have identical homothetic preferences
  – that implies that households are identical in their demand behavior
  – demand for goods depends only on relative prices and aggregate income; pure redistribution between households does not change the structure of overall demand
• taxes
  – taxes on factor returns can be differentiated between sectors
  – but taxes on commodities cannot be differentiated between households
  – we allow for t_1, t_2, t_{w1}, t_{w2}, t_{r1}, t_{r2}
  – that means taxes can be shifted forward (to consumers), backward (to factor suppliers), or across (to the other factor)
• the formal analysis is skipped here
• a graphical analysis is given in the script, p.
    41-46
• central findings
  – a uniform tax on factor returns in one sector is equivalent to an ad valorem commodity tax at the same rate in that sector: t_{w1} = t_{r1} = t ⇔ t_1 = t
  – this implies that an economy-wide uniform tax on factor returns is equivalent to a VAT at the same rate: t_{w1} = t_{w2} = t_{r1} = t_{r2} = t ⇔ t_1 = t_2 = t
  – a tax on the output of the labor-intensive sector reduces the wage; this effect is stronger for
    ∗ higher labor intensity in that sector
    ∗ a lower elasticity of technical substitution (in both sectors)
    ∗ a higher price elasticity of demand for the good produced in that sector
  – a wage tax in the labor-intensive sector reduces the wage (since labor is substituted and production shifts to the other sector)
  – a wage tax in the capital-intensive sector has an ambiguous effect on the wage (since the substitution and output effects work in opposite directions)
III Optimal Commodity Taxation
• while tax incidence is a positive theory, the analysis of optimal taxation is a normative one
• the question is which tax system is best from the taxpayers' point of view
• the central question of the chapter is that of "optimal taxation": what is the best way to generate government revenues?
• since (as will be shown) no revenue-generating tax system is free of distortions, this question calls for a second-best analysis
• the amount of tax revenue is exogenously given and deficits are ruled out, that is, we conduct a differential tax analysis
• we do not allow for income taxation (which, in the case of homogeneous households, would be a lump-sum tax)
• so the main question is whether taxes on commodities should be differentiated or not (and, if so, how)
• if households are homogeneous, distribution does not matter; if they are heterogeneous, it does (then an explicit welfare function has to be specified)
• note that there is always one untaxed good, leisure, so that there is always one decision that is distorted (the labor-leisure decision)
• this is "the decisive ingredient of the approach" that makes a first-best solution unattainable
• the central result is that, whenever possible, the second-best solution counterbalances the distortion of the labor-leisure decision by distorting relative prices through non-uniform commodity taxes
• here, the untaxable good is named "leisure", but the theory can easily be reformulated with goods produced at home as "good number n + 1"
• the analysis of this section is in the tradition of Frank Ramsey's seminal article "A Contribution to the Theory of Taxation" (1927)
1 Lump-Sum Taxes
• lump-sum taxes are taxes that cannot be avoided by changing one's behavior
• lump-sum taxes have no negative welfare effects (as measured by the sum of producer and consumer surplus): they are purely redistributive
  – all FOCs of utility maximization remain the same
  – that means there is no change of behavior at the margin
  – all consumption and factor supply choices remain the same at the margin
  – thus all relative prices remain the same
  – of course, incomes are lower, by exactly the amount of the tax
  – the lower income affects others if the taxpayer has market power (then the taxpayer's reduced demand affects relative prices)
  – but there is no deadweight loss, i.e. no efficiency loss
• lump-sum taxes and incidence
  – in general, if agents cannot reduce their tax payments by changing their behavior, they will not change their behavior, and thus there is no way to shift the tax burden
  – but lump-sum taxes can be (partially) shifted if voluntary transfers are involved or if there is market power (so that a tighter budget constraint has an effect on others)
  – what about differential effects in the case of utility functions that are not linear homogeneous?
• proposals for lump-sum taxes
  – a head tax comes close to a lump-sum tax, but even this can be avoided, by emigration or, in the long run, by having fewer children
  – the tax could be tailored to exogenous characteristics such as age or sex
  – a tax on "earning capacities" has also been proposed as a lump-sum tax, but how should earning capacities be measured?
  – a tax on all incomes and commodities has been proposed, but it is not a lump-sum tax, since leisure (and other goods) cannot be taxed directly
  – a tax on profits seems to be lump-sum, but it is not once the choice of production location, or the decision to become an entrepreneur rather than an employee, is taken into account
• problems
  – all these taxes are probably highly regressive: by definition, they impose a tax burden that is unrelated to (for example) income levels; for Laszlo Goerke, this is "the strongest objection"
  – more generally, a lump-sum tax is incompatible with almost any accepted notion of justice
  – there are also political economy issues: a differentiated head tax (which would be as lump-sum as a uniform one) would induce corruption and other kinds of rent-seeking behavior and would be regarded as highly unfair
  – the technically feasible lump-sum tax (a head tax) is politically not feasible in today's democratic societies due to its regressiveness
• → it is not clear that any lump-sum taxes exist, and surely they are not an option in the real
    world
• in recent history, the closest thing to a lump-sum tax in the OECD was Margaret Thatcher's head tax (the "poll tax") for financing local government; it had to be abolished after two years due to protests
2 No Distortion means No Revenues
• the "model"
  – in this subsection we do not model agents' choices explicitly; we only state the conditions for Pareto efficiency
  – we allow for taxes t_w, t_r, t_1, t_2
  – production functions are linear homogeneous (CRS)
• procedure of the argument
  – first step: recall the marginal conditions for Pareto efficiency
  – second step: derive the constraints on tax rates under which these conditions hold
  – third step: calculate tax revenues
• central findings
  – a non-distortionary tax system (that is, a system that preserves the Pareto efficiency of the market outcome) cannot generate any revenues
  – this is the "basic result" of commodity taxation
  – the result is driven by the assumption that there is at least one good that cannot be taxed (in our case leisure)
2.1 Requirements for Pareto-Efficiency
• recall the conditions (2) for a Pareto-efficient allocation from section (3.1); superscripts denote households, subscripts denote goods or production processes
1. Marginal Rates of Substitution (MRS) are the same across households
$$\frac{\partial u^1/\partial x_1^1}{\partial u^1/\partial x_2^1} = \frac{\partial u^2/\partial x_1^2}{\partial u^2/\partial x_2^2}$$
2. Marginal Rates of Factor Substitution (MRFS) are the same across households
$$\frac{\partial u^1/\partial L^1}{\partial u^1/\partial K^1} = \frac{\partial u^2/\partial L^2}{\partial u^2/\partial K^2}$$
3. Marginal Rates of Technical Substitution (MRTS) are the same across production processes
$$\frac{\partial f_1/\partial L_1}{\partial f_1/\partial K_1} = \frac{\partial f_2/\partial L_2}{\partial f_2/\partial K_2}$$
4. MRS has to equal the marginal rate of transformation (MRT)
$$\frac{\partial u^1/\partial x_1^1}{\partial u^1/\partial x_2^1} = \frac{\partial u^2/\partial x_1^2}{\partial u^2/\partial x_2^2} = \frac{\partial f_2/\partial L_2}{\partial f_1/\partial L_1} = \frac{\partial f_2/\partial K_2}{\partial f_1/\partial K_1}$$
5. MRFS has to equal MRTS
$$\frac{\partial u^1/\partial L^1}{\partial u^1/\partial K^1} = \frac{\partial u^2/\partial L^2}{\partial u^2/\partial K^2} = \frac{\partial f_1/\partial L_1}{\partial f_1/\partial K_1} = \frac{\partial f_2/\partial L_2}{\partial f_2/\partial K_2}$$
6. The marginal rate of substitution between factor supply and the consumption of a commodity for a household has to equal (in absolute value) the marginal productivity of that factor in the production of that commodity
$$\left|\frac{\partial u/\partial L}{\partial u/\partial x_j}\right| = \frac{\partial f_j}{\partial L_j}, \qquad \left|\frac{\partial u/\partial K}{\partial u/\partial x_j}\right| = \frac{\partial f_j}{\partial K_j}$$
2.2 Constraints on Tax Rates
1. MRS are the same across households if tax rates on goods are the same for all households
$$\frac{\partial u^1/\partial x_1^1}{\partial u^1/\partial x_2^1} = \frac{\partial u^2/\partial x_1^2}{\partial u^2/\partial x_2^2} = \frac{q_1(1+t_1)}{q_2(1+t_2)} \qquad (40a)$$
2. MRFS are the same across households if tax rates on factor incomes are the same for all households
$$\frac{\partial u^1/\partial L^1}{\partial u^1/\partial K^1} = \frac{\partial u^2/\partial L^2}{\partial u^2/\partial K^2} = \frac{w(1-t_w)}{r(1-t_r)} \qquad (40b)$$
3. MRTS are the same across production processes, since households pay the tax (otherwise condition 6. would have to hold)
$$\frac{\partial f_1/\partial L_1}{\partial f_1/\partial K_1} = \frac{\partial f_2/\partial L_2}{\partial f_2/\partial K_2} = \frac{w}{r} \qquad (40c)$$
4. The MRT equals the ratio of producer prices, since households pay the tax (otherwise condition 5. would have to hold)
$$\frac{\partial f_2/\partial L_2}{\partial f_1/\partial L_1} = \frac{\partial f_2/\partial K_2}{\partial f_1/\partial K_1} = \frac{q_1}{q_2} \qquad (40d)$$
5. MRT equals MRS if tax rates on both goods are equal
$$\frac{q_1}{q_2} = \frac{q_1(1+t_1)}{q_2(1+t_2)} \qquad (40e)$$
6. MRTS equals MRFS if tax rates on both factors are equal
$$\frac{w}{r} = \frac{w(1-t_w)}{r(1-t_r)} \qquad (40f)$$
7. The MRS between the supply of a factor and the consumption of a good equals the marginal productivity of that factor in the production of that good if the tax rates on factors are the negative of the tax rates on goods
$$\frac{w(1-t_w)}{q_1(1+t_1)} = \frac{w}{q_1} \qquad (40g)$$
• in sum, the constraints are: t_w = t_r = −t_1 = −t_2 =: t
2.3 Tax Revenue under Pareto-Efficiency
• under these constraints, tax revenues are given by
$$R = (wL + rK)\,t + (q_1x_1 + q_2x_2)(-t) \qquad (41)$$
• since we assume linear homogeneous production functions, the sum of factor incomes must equal the value of the goods produced (Euler theorem): wL + rK = q_1x_1 + q_2x_2
• hence there are no net revenues: R = 0
• taxing profits
  – extending the model to allow for profits shows that taxing profits has no distortionary effects in the short run, when households take profit income as exogenously given
  – but once the occupational decision (worker vs. entrepreneur) is endogenous in the long run, taxes on profits are distortionary in the sense that they bias this decision towards becoming a worker
• Result: A tax system that preserves Pareto-efficiency cannot generate any revenues.
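As a numerical check of (41), the Python sketch below prices two goods competitively and confirms that the Pareto-preserving rates t_w = t_r = t and t_1 = t_2 = −t raise no net revenue. The Cobb-Douglas technologies, the factor prices, the output levels, and the helper names `unit_cost` and `net_revenue` are illustrative assumptions, not part of the lecture; the only point is that under CRS factor payments exhaust the value of output, so the tax on factor incomes and the subsidy on goods cancel exactly.

```python
# Sketch (hypothetical setup): two CRS sectors with Cobb-Douglas technology
# x_j = L_j**a_j * K_j**(1 - a_j), competitive producer prices q_j equal to
# unit cost, and the Pareto-preserving tax rates t_w = t_r = t, t_1 = t_2 = -t.
from typing import Sequence


def unit_cost(w: float, r: float, a: float) -> float:
    """Minimum cost of one unit of output for Cobb-Douglas with labor exponent a."""
    return (w / a) ** a * (r / (1.0 - a)) ** (1.0 - a)


def net_revenue(t: float, w: float, r: float,
                exponents: Sequence[float], outputs: Sequence[float]) -> float:
    """Revenue (41): tax t on factor incomes plus a subsidy at rate t on goods."""
    labor = capital = value_of_output = 0.0
    for a, x in zip(exponents, outputs):
        q = unit_cost(w, r, a)            # competitive producer price q_j
        labor += a * q * x / w            # Shephard's lemma: L_j = x_j * dc/dw
        capital += (1.0 - a) * q * x / r  # Shephard's lemma: K_j = x_j * dc/dr
        value_of_output += q * x
    factor_income = w * labor + r * capital
    return t * factor_income + (-t) * value_of_output


if __name__ == "__main__":
    R = net_revenue(t=0.2, w=1.3, r=0.7, exponents=(0.7, 0.3), outputs=(5.0, 8.0))
    print(f"net revenue: {R:.2e}")  # zero up to floating-point rounding
```

Any other positive factor prices, output levels, or tax rate give the same zero, which is why a positive revenue requirement forces the tax system away from these rates.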
2.4 Non-distortionary Tax Systems
• any tax system that does not change marginal behavior is non-distortionary
• if income is fixed and all income is consumed, a proportional tax on all goods is non-distortionary
• if income is fixed and only a fixed consumption bundle is available, taxing that bundle is non-distortionary
• if leisure can be taxed, a proportional tax on both leisure and work (labor income) is non-distortionary, since it amounts to a tax on the time endowment
• if "not consuming" can be taxed, a proportional tax on consumption and "not consuming" is non-distortionary
• the crucial assumptions behind the result derived in section (2.3) are
  – non-taxability of leisure
  – endogeneity of labor supply
  – unavailability of income taxes
  – constant returns to scale
  – taken together, these assumptions effectively rule out any lump-sum taxation
  – in this sense, we get out of the model what we already put into it
3 Theory of the Second Best
• this subsection draws heavily on Lipsey and Lancaster's (1956) article in the Review of Economic Studies, "The General Theory of Second Best"
• the general theory of the second best
  – a Pareto-efficient allocation requires all optimality conditions to be fulfilled simultaneously
  – if an additional constraint (such as the revenue requirement combined with the absence of lump-sum taxes) prevents the allocation from attaining Pareto efficiency, in general all other optimality conditions have to change to attain a second-best solution
  – there is no way to judge a priori in which direction and by how much the conditions change
  – in particular, it is not true that a situation where more (but not all) Pareto-efficiency conditions are fulfilled is superior to one where fewer are fulfilled
  – or, as Stiglitz (1987) puts it, "counting the number of distortions is no way to do welfare analysis!"
  – this implies that introducing a second (or third) distortion might be beneficial
• implications for the analysis of optimal taxes
  – we might be able to counterbalance the
    distortion introduced by taxation by introducing another tax or by deviating from the rules derived for the first-best solution
  – what follows in the subsequent sections is essentially a second-best analysis showing that, in general, we have to deviate from the rules derived in section (2)
  – there it was shown that commodity taxes should be uniform
  – once leisure is assumed to be untaxable (which rules out the first best), it will be shown that uniform commodity taxes are no longer desirable
  – in sum, in a second-best analysis we are not looking for non-distortionary taxes, but for optimally distorting taxes
• why second-best optimal taxes are almost always distortionary
  – it is no coincidence that we almost always find the second-best tax structure to be distortionary
  – starting from first-best (non-distorting) commodity taxes, the welfare cost of introducing a small distortion is of second order (envelope theorem)
  – but the effect on labor supply is of first order and thus dominates the second-order effect
4 Homogeneous Households
• the model
  – simple GE model
  – one (representative) household (this is equivalent to assuming homogeneous households that are identical in their utility functions and their productivity)
  – u = u(x_1, ..., x_n, F)
  – n consumption goods, each subject to a specific tax τ_k, but no tax on leisure F ("good number n + 1")
  – leisure cannot be taxed since its consumption involves no market transaction and thus is not observable by tax authorities
  – labor is the only factor of production
  – L = T − F, where T is the total time endowment
  – perfect competition
  – constant returns to scale / linear homogeneous production functions
  – perfect competition and CRS imply that there are no profits, so that firms cannot bear the tax
  – the wage w is fixed and untaxed
  – the household receives labor income and, in addition, exogenous income y
• central findings
  – the Ramsey rule states that, to minimize the efficiency loss of a tax system, the relative reduction in Hicksian demand should be equal for all goods
  – this shows that in the second-best optimum taxes should not be non-distortionary, but should distort optimally
  – this result is hard to implement, since Hicksian demand cannot be observed
  – the first best is not attainable since we do not allow for taxing leisure; put differently, for n + 1 goods only n tax instruments are available
  – using additional assumptions, this result can be made more tractable
  – assuming inelastic labor supply or homothetic preferences, we find that a uniform commodity tax rate is optimal
  – rewriting the Ramsey rule in terms of wage elasticities makes clear that tax differentiation increases efficiency when it can indirectly tax the untaxable good leisure by taxing its complements (fixed labor supply and homothetic preferences make exactly this impossible)
  – the results are not robust to changes in the assumptions
  – → no consistent policy recommendation emerges
  – in general, a uniform commodity tax (VAT) is not even second-best
4.1 Ramsey's Rule
• utility and revenues
  – u = u(x_1, ..., x_n, F) (utility given the quantities of goods and leisure consumed)
  – v = v(p_1, ..., p_n, w, y) (indirect utility, i.e. utility under optimal choices given prices, the wage rate, and income)
  – note that ∂v/∂p_k = ∂v/∂τ_k, since producer prices are fixed (labor is the only factor, the wage is fixed, and technology is CRS), so consumer prices rise one-for-one with the specific taxes
  – revenues are exogenously given: R = τ·x = R̄ > 0
• the government maximizes the household's utility with respect to τ_1, τ_2, ..., τ_n and λ, subject to its own budget constraint:
$$\mathcal{L} = v(p_1, \dots, p_n, w, y) + \lambda\left(\tau\cdot x - \bar R\right) \qquad (42)$$
$$\frac{\partial \mathcal{L}}{\partial \tau_k} = \frac{\partial v}{\partial p_k} + \lambda\left(x_k + \sum_{j=1}^{n}\tau_j\,\frac{\partial x_j}{\partial p_k}\right) = 0 \qquad (43)$$
  – there are n FOCs like (43), one for each tax rate
  – the term in brackets is ∂R/∂τ_k and must be positive (otherwise we are in an inefficient region of excess taxation, where higher taxes result in lower revenues); since ∂v/∂p_k < 0, λ must be positive
  – in the case of excess taxation we could not obtain an interior solution
  – the first term in the brackets (x_k) is positive, and it can be shown that the second (∂R/∂τ_k − x_k) is negative
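To make the first-order conditions (43) concrete, the Python sketch below solves them in the simplest special case we could construct and checks both the ratio condition (44a) and the equal proportional demand reduction that the Ramsey rule of this subsection predicts. Everything specific is an assumption for illustration: quasilinear utility (so the marginal utility of income is constant at α = 1 and Hicksian and Marshallian demands coincide), independent linear demands x_k = a_k − b_k p_k with made-up parameters, producer prices normalized to 1, and a plain bisection solver named `solve_increasing`.

```python
# Sketch under hypothetical assumptions: quasilinear utility (alpha = 1), so
# dv/dtau_k = -x_k (Roy's identity); independent linear demands
# x_k = a_k - b_k * p_k; producer prices fixed at 1, so p_k = 1 + tau_k.
# FOC (43) then reads -x_k + lam * (x_k - b_k * tau_k) = 0, i.e. every good
# must show the same proportional demand reduction b_k * tau_k / x_k =: theta.

A = (10.0, 10.0)  # demand intercepts a_k (illustrative values)
B = (1.0, 4.0)    # demand slopes b_k (illustrative values)
R_BAR = 5.0       # exogenous revenue requirement


def demand(k, tau):
    return A[k] - B[k] * (1.0 + tau)


def revenue(taus):
    return sum(t * demand(k, t) for k, t in enumerate(taus))


def utility_loss(taus):
    """Consumer-surplus loss from raising each p_k from 1 to 1 + tau_k."""
    return sum(t * (A[k] - B[k]) - 0.5 * B[k] * t * t for k, t in enumerate(taus))


def ramsey_taxes(theta):
    """Solve b_k * tau_k = theta * x_k for tau_k, good by good."""
    return [theta * (A[k] - B[k]) / (B[k] * (1.0 + theta)) for k in range(len(A))]


def solve_increasing(f, target, lo, hi, iters=200):
    """Bisection: find x in [lo, hi] with f(x) = target for increasing f."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# theta < 1 keeps every good below its revenue-maximizing tax rate
theta = solve_increasing(lambda th: revenue(ramsey_taxes(th)), R_BAR, 0.0, 0.999)
tau_ramsey = ramsey_taxes(theta)
# uniform rate with the same revenue (revenue rises in tau up to 1.5 here)
tau_uniform = solve_increasing(lambda t: revenue([t, t]), R_BAR, 0.0, 1.5)

for k, t in enumerate(tau_ramsey):
    x = demand(k, t)
    dR = x - B[k] * t  # dR/dtau_k
    dv = -x            # dv/dtau_k
    print(f"good {k + 1}: tau={t:.4f}  b*tau/x={B[k] * t / x:.4f}  "
          f"(dR/dtau)/(dv/dtau)={dR / dv:.4f}")
print(f"utility loss: Ramsey {utility_loss(tau_ramsey):.4f}, "
      f"uniform {utility_loss([tau_uniform] * 2):.4f}")
```

With these numbers the good with the flatter demand curve (smaller b_k) carries the much higher tax, both goods show the same ratio b_kτ_k/x_k and the same ratio in (44a), and the Ramsey taxes cause a smaller utility loss than the uniform rate raising the same revenue. Once cross-price or income effects are present, the closed form in `ramsey_taxes` no longer applies and (43) has to be solved as a full system.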
• solving the system of FOCs yields two alternative interpretations of optimality
  – the ratio of revenue gain to utility loss has to be the same for all taxes, i.e. the additional tax revenue per unit of utility lost due to a tax increase has to be equal for all taxes
$$\frac{\partial R/\partial \tau_k}{\partial v/\partial \tau_k} = \frac{\partial R/\partial \tau_j}{\partial v/\partial \tau_j} \qquad (44a)$$
  – equivalently, the ratio of utility losses for two taxes has to equal the ratio of revenue gains for those two taxes
$$\frac{\partial v/\partial \tau_k}{\partial v/\partial \tau_j} = \frac{\partial R/\partial \tau_k}{\partial R/\partial \tau_j} \qquad (44b)$$
• there is an alternative way to derive this result:
  – for optimal taxation, two conditions have to hold simultaneously
$$dR = \frac{\partial R}{\partial \tau_k}\,d\tau_k + \frac{\partial R}{\partial \tau_j}\,d\tau_j = 0 \qquad (45)$$
$$dv = \frac{\partial v}{\partial \tau_k}\,d\tau_k + \frac{\partial v}{\partial \tau_j}\,d\tau_j = 0 \qquad (46)$$
  – solving both for dτ_k/dτ_j and setting them equal gives the same result as above
  – the two conditions imply that at the optimum any change in the tax rates τ_k and τ_j (holding the others fixed) either reduces revenues or reduces utility: no Pareto improvement is possible
• using Roy's identity
  – Roy's identity:
$$-\frac{\partial v/\partial p_k}{\partial v/\partial y} = x_k \qquad (47)$$
  – substituting the identity into (44b) yields
$$\frac{\partial R/\partial \tau_k}{\partial R/\partial \tau_j} = \frac{x_k}{x_j} \qquad (48)$$
  – the ratio of revenue gains has to equal the ratio of the consumption (or output) levels of the goods taxed
  – this is hard to implement, since changing one tax rate affects all quantities consumed and thus the revenues derived from all other taxes
• the Slutsky equation
  – the Slutsky equation states that any change in quantity due to a price change (of the good itself or of another good) can be separated into a substitution effect and an income effect; ∂y/∂p_k = x_k is the income compensation needed to keep utility constant
$$\frac{\partial x_j^H}{\partial p_k} = \frac{\partial x_j}{\partial p_k} + \frac{\partial x_j}{\partial y}\cdot\frac{\partial y}{\partial p_k}, \qquad \frac{\partial y}{\partial p_k} = x_k$$
$$\frac{\partial x_j}{\partial p_k} = \frac{\partial x_j^H}{\partial p_k} - \frac{\partial x_j}{\partial y}\cdot x_k \qquad (49)$$
  – where x^H is Hicksian demand (which depends on the level of utility, while Marshallian demand depends on the level of income)
  – this holds for both j = k and j ≠ k: the effect of a price change can be decomposed both for the good itself and for all other goods
  – the substitution effect is the change in demand holding utility fixed (while relative prices change); the income effect is the
    change in demand holding relative prices fixed (while income changes)
• using the Slutsky equation in the optimality condition (43)
  – we assume that the marginal utility of income α is constant, so that by Roy's identity ∂v/∂τ_k = −αx_k
  – substituting into (43), solving for αx_k, using the Slutsky equation (49) together with the symmetry of the substitution effects (∂x_j^H/∂p_k = ∂x_k^H/∂p_j), and collecting terms yields:
$$\alpha x_k = \lambda\left(x_k + \sum_j \tau_j\,\frac{\partial x_j}{\partial p_k}\right)$$
$$\frac{\alpha-\lambda}{\lambda}\,x_k = \sum_j \tau_j\,\frac{\partial x_j}{\partial p_k} \qquad (50)$$
$$\frac{\alpha-\lambda}{\lambda}\,x_k = \sum_j \tau_j\left(\frac{\partial x_k^H}{\partial p_j} - \frac{\partial x_j}{\partial y}\,x_k\right)$$
$$x_k\left(\frac{\alpha-\lambda}{\lambda} + \sum_j \tau_j\,\frac{\partial x_j}{\partial y}\right) = \sum_j \tau_j\,\frac{\partial x_k^H}{\partial p_j}$$
$$x_k\,b = \sum_j \tau_j\,\frac{\partial x_k^H}{\partial p_j} \qquad (51)$$
  – with b := (α − λ)/λ + Σ_j τ_j ∂x_j/∂y, which does not depend on which good k is considered and thus takes the same value for every good
  – it can be shown that b has to be non-positive for government revenues to be positive
  – this equation holds only if λ > α > 0
• the Ramsey rule
  – interpreting the right-hand side of (51) as the effect of introducing the taxes, (51) can be rewritten as
$$b = \frac{dX_k^H}{x_k^H} \qquad (52)$$
  – where dX_k^H := Σ_j τ_j ∂x_k^H/∂p_j is the impact of the changes in all taxes on the Hicksian (compensated) demand for good k
  – this implicitly defines the optimal tax structure
  – it states that the reduction in Hicksian demand has to be proportional to Marshallian demand, and since both demands coincide in levels, the reduction in demand has to be proportional to demand itself
• Results
  – the relative reduction in Hicksian demand due to the tax system has to be equal for all goods (Ramsey's rule)
  – we have shown in section (2) that a non-distortionary tax system raises prices proportionally (but does not generate revenues)
  – this result cannot be extended to the second best
  – a second-best revenue-generating tax system does not raise prices proportionally, but reduces Hicksian demand proportionally
  – that means that, in general, uniform commodity taxes are not even second-best
  – since it is probably empirically impossible to design a tax system according to the Ramsey rule, the main finding of this section is negative
4.2 Reformulating the Ramsey Rule
• implementing the Ramsey rule is difficult because Hicksian demand is not observable
• in this section, the result is rewritten in two different ways that
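allow an easier implementation (although it is still quite problematic)
• rewriting the Ramsey rule both in terms of income elasticities of demand and in terms of wage elasticities of Hicksian demand shows that, in general, commodity taxes should be differentiated (not uniform)
• this result can be interpreted as an application of the "Theory of the Second Best", since differentiated taxes violate the finding of section (2)
• since we have introduced another constraint ($R > 0$), the old optimality conditions are no longer desirable
a) Income Elasticities of Demand
• income elasticities of demand can be observed
• intuition
  – the welfare loss is caused by substitution effects only
  – the higher the share of the overall reduction in demand for a good that is due to a pure income effect, the smaller the share due to the substitution effect and thus the smaller the welfare loss
  – the higher the income elasticity of a good, the larger the share of the income effect
  – thus we try to maximize the income effect by taxing goods with a high income elasticity of demand more heavily
• we use the symmetry of the substitution effects:
$$\frac{\partial x_j^H}{\partial p_k} = \frac{\partial x_k^H}{\partial p_j} \tag{53}$$
$$\frac{\partial x_j}{\partial p_k} + x_k\frac{\partial x_j}{\partial y} = \frac{\partial x_k}{\partial p_j} + x_j\frac{\partial x_k}{\partial y} \tag{54}$$
$$\frac{\partial x_j}{\partial p_k} = \frac{\partial x_k}{\partial p_j} + x_j\frac{\partial x_k}{\partial y} - x_k\frac{\partial x_j}{\partial y} \tag{55}$$
• this expression is substituted into (50):
$$\frac{\alpha-\lambda}{\lambda}\,x_k = \sum_j \tau_j\frac{\partial x_j}{\partial p_k} \tag{56}$$
$$\frac{\alpha-\lambda}{\lambda}\,x_k = \sum_j \tau_j\left(\frac{\partial x_k}{\partial p_j} + x_j\frac{\partial x_k}{\partial y} - x_k\frac{\partial x_j}{\partial y}\right) \tag{57}$$
$$\frac{\alpha-\lambda}{\lambda} - \frac{1}{x_k}\sum_j \tau_j\left(x_j\frac{\partial x_k}{\partial y} - x_k\frac{\partial x_j}{\partial y}\right) = \frac{1}{x_k}\sum_j \tau_j\frac{\partial x_k}{\partial p_j} \tag{58}$$
• defining the impact of all taxes on good $k$, as above, as $dX_k = \sum_j \tau_j\,\partial x_k/\partial p_j$, we can rewrite the equation:
$$\frac{\alpha-\lambda}{\lambda} - \frac{1}{y}\sum_j \tau_j x_j\left(\frac{\partial x_k}{\partial y}\frac{y}{x_k} - \frac{\partial x_j}{\partial y}\frac{y}{x_j}\right) = \frac{dX_k}{x_k} \tag{59}$$
$$\frac{\alpha-\lambda}{\lambda} - \frac{1}{y}\left(\varepsilon_{ky}R - z\right) = \frac{dX_k}{x_k} \tag{60}$$
• where $\varepsilon_{ky}$ is the income elasticity of demand for good $k$, $R = \sum_j \tau_j x_j$ is the tax revenue of the government and $z = \sum_j \tau_j x_j \varepsilon_{jy}$ is a constant, independent of good $k$
• Result: the higher the income elasticity of a good, the larger should be the tax-induced reduction in its (Marshallian) demand
• for both fixed labor supply (section 4.3 a)) and homothetic preferences (section 4.3 b)) the income elasticities of demand are equal for all goods, hence goods are taxed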
uniformly
  – with fixed labor supply, income is fixed; since there are no income effects, they can't differ across goods
  – with homothetic preferences, the composition of the consumption bundle doesn't depend on income, so all income elasticities are equal (they are all unity)
b) Wage Elasticities of Hicksian Demand
• the idea of representing the Ramsey rule in terms of wage elasticities of Hicksian demand rests on the assumption that leisure is the untaxed good
• intuition
  – commodity taxes have to (partly) replace the missing tax on leisure to minimize the welfare loss
  – that is, a second distortion is introduced to counterbalance the first one
  – leisure is interpreted here as a consumption good whose price is the wage $w$
  – note that wage elasticities are just another cross-price elasticity
  – goods that are strong complements of leisure should be taxed more heavily because their consumption makes the tax base smaller
  – that is, goods with a (strongly) negative wage elasticity of demand should be taxed heavily (Corlett-Hague rule)
  – as Homburg argues, coffee might be a substitute for leisure, while liquor and watching movies are complements
  – this view strongly supports differentiated taxes (in contrast to subsections 4.3 a) and b))
    ∗ fixing labor supply means fixing leisure, so there is no need to tax complements of leisure more heavily
    ∗ homothetic preferences imply by definition that all wage elasticities are zero: given the utility function, there are no complements to leisure (and no substitutes either)
• for this analysis, we restrict our model to two goods instead of $n$
• equation (51) then collapses to just two equations; solving for $\tau_1$ and collecting terms gives:
$$x_1 b = \tau_1\frac{\partial x_1^H}{\partial p_1} + \tau_2\frac{\partial x_1^H}{\partial p_2}$$
$$x_2 b = \tau_1\frac{\partial x_2^H}{\partial p_1} + \tau_2\frac{\partial x_2^H}{\partial p_2}$$
$$\tau_1\left(\frac{\partial x_1^H}{\partial p_1}\frac{\partial x_2^H}{\partial p_2} - \frac{\partial x_1^H}{\partial p_2}\frac{\partial x_2^H}{\partial p_1}\right) = b\left(x_1\frac{\partial x_2^H}{\partial p_2} - x_2\frac{\partial x_1^H}{\partial p_2}\right) \tag{61}$$
$$\tau_1\left(\varepsilon^H_{11}\varepsilon^H_{22} - \varepsilon^H_{12}\varepsilon^H_{21}\right) = b\,p_1\left(\varepsilon^H_{22} - \varepsilon^H_{12}\right) \tag{62}$$
• where $\varepsilon^H_{ij} = (\partial x_i^H/\partial p_j)(p_j/x_i)$ is the compensated elasticity of demand for good $i$ with respect to the price of good $j$
• if we assume normal goods, the own-price elasticities have to be negative and, since we only have two goods,
this implies that the cross-price elasticities have to be positive
• this means that the right-hand bracket in (62) is negative and, since $b$ is negative, the left-hand bracket has to be positive
• since $\tau = pt/(1 + t)$, we can simplify:
$$\frac{t_1}{1+t_1} = b\,\frac{\varepsilon^H_{22} - \varepsilon^H_{12}}{\varepsilon^H_{11}\varepsilon^H_{22} - \varepsilon^H_{12}\varepsilon^H_{21}} \tag{63}$$
  and analogously $\frac{t_2}{1+t_2} = b\,\frac{\varepsilon^H_{11} - \varepsilon^H_{21}}{\varepsilon^H_{11}\varepsilon^H_{22} - \varepsilon^H_{12}\varepsilon^H_{21}}$
• since compensated demand is homogeneous of degree zero in $(p_1, p_2, w)$, we have $\varepsilon^H_{12} = -\varepsilon^H_{11} - \varepsilon_{1w}$ and $\varepsilon^H_{21} = -\varepsilon^H_{22} - \varepsilon_{2w}$; hence the ratio of the two tax terms depends on three elasticities: the (own-)price elasticities of goods 1 and 2 and the wage elasticity. This is the rule of free-time complementarity:
$$\frac{t_1/(1+t_1)}{t_2/(1+t_2)} = \frac{\varepsilon^H_{11} + \varepsilon^H_{22} + \varepsilon_{1w}}{\varepsilon^H_{11} + \varepsilon^H_{22} + \varepsilon_{2w}} \tag{64}$$
• if both wage elasticities are equal, the ratio collapses to unity and we have a uniform tax rate
• in this case, there is no possibility to tax a complement of leisure more heavily, since both goods are equally strong complements (or substitutes)
• since both $\varepsilon^H_{11}$ and $\varepsilon^H_{22}$ are negative, the higher the wage elasticity of a good, the smaller its tax rate should be (complements of leisure have a negative wage elasticity)
• Result: complements of leisure should be taxed at a higher rate
4.3 Special Cases: Additional Restrictions
• implementing the Ramsey rule is difficult because Hicksian demand is not observable
• we have seen that rewriting it helps, but the results are still hard to implement
• in this section, we impose three additional assumptions to simplify the result
• note that all three assumptions are fairly strong and no robust result emerges
a) Fixed Labor Supply
• the assumption of fixed labor supply might be suitable for certain demographic groups (e.g., "prime age" men)
• this assumption makes leisure consumption fixed, too
• this effectively removes the consequences of the non-taxability of leisure
• taxing labor income causes no distortion, since there is no work-leisure decision: $L = \bar L$
• that means taxing labor income is a non-distortionary lump-sum tax that preserves Pareto efficiency: it cannot be avoided by changing behavior
• as we have shown, taxing labor is equivalent to a uniform tax on
commodities (because such a uniform commodity tax doesn't distort the consumption decision)
• a tax on labor income, a uniform commodity tax, or any combination of the two are equivalent: we have an infinite number of possible lump-sum taxes
b) Homothetic Preferences
• homotheticity and separability of the utility function
  – the utility function can be split into a function of the consumption bundle and of leisure $F$: $u(C(x_1, \ldots, x_n), F)$ (separability)
  – the sub-utility function $C$ of the consumption bundle is homogeneous of degree $z$ (which need not be unity)
  – this implies that its partial derivatives are homogeneous of degree $z - 1$
• the household's maximization problem
  – in the budget constraint, leisure is explicitly modeled as a consumption good with price $w$ ($T$ denoting the time endowment):
$$wT + y - wF - p\cdot x = 0 \tag{65}$$
  – given the homothetic sub-utility function, it can be shown that, regardless of the level of consumption (the size of the consumption bundle), the composition of the bundle is fixed
  – that is, the household always spends the same share of its consumption expenditure on each good
  – the separability assumption guarantees that, for a given overall tax burden, the tax rates can affect the composition of the consumption bundle, but not the work-leisure decision
  – this is because leisure does not affect the marginal rates of substitution between the commodities
• Result: uniform tax rates on all commodities are optimal
• as in the case of inelastic labor supply, this result is driven by the assumption that changing the tax structure cannot affect the work-leisure decision
• for homothetic (and separable) preferences, the wage elasticities of Hicksian demand are zero and the income elasticities are equal across goods
c) Zero Cross-Price Elasticities
• the Ramsey rule states that quantities should fall proportionally to their Hicksian demand
• this suggests that goods with a high price elasticity should face a small price increase, that is, a small tax rate
• but the Ramsey rule states that the quantity reduction due to the whole tax system (that is, the price changes of all
goods) should be proportional, while price elasticities are defined in terms of the own price only
• in this section we assume that the Hicksian demand for a good is only affected by a change in its own price, that is, there are no cross-price (cross-substitution) effects: $\varepsilon_{jk} = 0$ for all $j \neq k$
• since $\partial x_j/\partial p_k = 0$ for all $j \neq k$, equation (50) then collapses to
$$\frac{\alpha-\lambda}{\lambda}\,x_k = \tau_k\frac{\partial x_k}{\partial p_k}$$
$$\frac{\alpha-\lambda}{\lambda} = \frac{\tau_k}{p_k}\,\varepsilon_{kk}$$
$$\frac{\alpha-\lambda}{\lambda} = \frac{t_k}{1+t_k}\,\varepsilon_{kk} \tag{66}$$
• since $\alpha$ and $\lambda$ are independent of the good $k$, the left-hand side is the same constant for all goods, and the inverse elasticity rule holds:
$$\frac{t_j(1+t_k)}{t_k(1+t_j)} = \frac{\varepsilon_{kk}}{\varepsilon_{jj}} \tag{67}$$
• the ad valorem tax rates have to be inversely proportional to the price elasticities of demand
• if price changes don't affect the demand for other goods, then reducing Hicksian demand proportionally is equivalent to reducing Marshallian demand proportionally
• this is done by taxing price-elastic goods less than price-inelastic goods
• note that this stands in sharp contrast to the results derived in the preceding subsections
• the inverse elasticity rule was for decades the most general result derived from optimal commodity taxation, before Ramsey's (1927) paper was rediscovered
5 Heterogeneous Households
• here we drop the assumption that all households have identical utility functions
• to get any results, we have to work with a social welfare function
• here we use a Bergson-Samuelson-type welfare function
• the analysis becomes so complex that we consider only the special case of zero cross-price elasticities
• central findings
  – goods consumed more by high-income households should be taxed more heavily
  – there is a trade-off between (so-called) efficiency and equity
5.1 General Result
• we use a model with $N$ households, $n$ goods and specific taxes on all goods, but no income tax
• taxes are uniform across households (that is, no discrimination between consumers is possible)
• the objective function is welfare (a function of the $N$
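indirect utilities); with $X_j = \sum_h x_j^h$ denoting aggregate demand for good $j$ and the government's budget constraint $R = \bar R$, the problem reads:
$$W = W(v^1, v^2, \ldots, v^N), \qquad v^h = v^h(p_1, p_2, \ldots, p_n, w, y)$$
$$\mathcal{L} = W(v^1, v^2, \ldots, v^N) + \lambda\left(\sum_{j=1}^{n}\tau_j X_j - \bar R\right) \tag{68}$$
• the objective function is maximized with respect to $\tau_k$, and the marginal indirect utility of income of each household is assumed to be constant ($\partial v^h/\partial y = \alpha^h$), so that Roy's identity reads $\partial v^h/\partial p_k = \partial v^h/\partial\tau_k = -\alpha^h x_k^h$:
$$\frac{\partial\mathcal{L}}{\partial\tau_k} = \sum_{h=1}^{N}\frac{\partial W}{\partial v^h}\frac{\partial v^h}{\partial\tau_k} + \lambda\left(X_k + \sum_{j=1}^{n}\tau_j\frac{\partial X_j}{\partial p_k}\right) = 0 \tag{69}$$
$$\lambda\sum_{j=1}^{n}\tau_j\frac{\partial X_j}{\partial p_k} = \sum_{h=1}^{N}\frac{\partial W}{\partial v^h}\,\alpha^h x_k^h - \lambda X_k \tag{70}$$
• at this point of the analysis, we're stuck
5.2 Zero Cross-Price Elasticities
• to make the analysis tractable, we have to assume zero cross-price elasticities (as we did before as an additional restriction in the case of one household)
• the left-hand side simplifies (similar to equation (50))
• further, we use the fact that $\tau_k/p_k = t_k/(1+t_k)$ and the definition of price elasticities to get
$$\lambda\tau_k\frac{\partial X_k}{\partial p_k} = \sum_{h=1}^{N}\frac{\partial W}{\partial v^h}\,\alpha^h x_k^h - \lambda X_k$$
$$\lambda\frac{\tau_k}{p_k}\frac{\partial X_k}{\partial p_k}\frac{p_k}{X_k} = \sum_{h=1}^{N}\frac{\partial W}{\partial v^h}\,\alpha^h\frac{x_k^h}{X_k} - \lambda$$
$$\frac{t_k}{1+t_k}\,\varepsilon_{kk} = \sum_{h=1}^{N}\frac{\alpha^h}{\lambda}\frac{\partial W}{\partial v^h}\frac{x_k^h}{X_k} - 1 \tag{71}$$
• since the own-price elasticity is negative and the sum must be positive, a larger value of the sum implies a smaller tax rate
• if the marginal utility of income decreases with income and society values marginal utility changes of low-utility households more than those of high-utility households, both $\alpha^h$ and $\partial W/\partial v^h$ are large for low-income households and small for high-income households
• Result: goods consumed mainly by households with higher incomes should be taxed more heavily
• again, this is an argument against uniform commodity tax rates
• for homogeneous households (or one representative household), $\alpha^h = \alpha$, $\partial W/\partial v^h = \partial W/\partial v$ and $x_k^h = x_k$, so that we can derive the well-known inverse elasticity rule as in equation (67):
$$\frac{t_k(1+t_m)}{t_m(1+t_k)} = \frac{\varepsilon_{mm}}{\varepsilon_{kk}}$$
6 The Production Efficiency Theorem
• introducing another distortion?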
  – in this chapter it has been shown that the requirements of a Pareto-efficient allocation cannot be fulfilled in the presence of commodity taxes, unless labor supply is fixed
  – the theory of the second best shows that it might be beneficial to introduce a second distortion to counterbalance the distortionary effects of taxes
  – we have shown that this is indeed done by taxing complements of leisure more heavily (and by taxing goods with a higher income elasticity more heavily)
  – another idea would be to tax firms differently (e.g., according to their labor intensity)
  – the "Production Efficiency Theorem" states that production shouldn't be taxed differentially
• importance of the Production Efficiency Theorem
  – the theorem was derived in a seminal paper by Diamond and Mirrlees (1971, AER)
  – it is "perhaps the most important result of the theory of taxation" (Stefan Homburg 2007, p. 181)
  – in the middle of a second-best world where "nothing can be said", a positive and robust result emerges: don't tax intermediate goods!
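The point that taxing firms (or their inputs) differently moves the economy off the production possibility frontier (PPF) can be illustrated numerically. The following Python sketch is not part of the lecture and rests on illustrative assumptions: two sectors with identical Cobb-Douglas technology $y_i = K_i^a L_i^{1-a}$, fixed endowments of capital and labor, labor split equally between the sectors, and a hypothetical tax at rate $t$ on capital used in sector 1 only. With identical constant-returns technologies the PPF is the line $y_1 + y_2 = \bar K^a \bar L^{1-a}$, so any allocation in which the two sectors use different capital-labor ratios (different MRTS) produces strictly less in total.

```python
"""Sketch: a sector-specific input tax pushes production inside the PPF.

Illustrative assumptions (not from the lecture): two sectors with identical
Cobb-Douglas technology y_i = K_i**a * L_i**(1 - a), fixed endowments of
capital and labor, labor split equally between the sectors, and a
hypothetical tax at rate t on capital used in sector 1 only.
"""

A = 0.5        # capital share a (assumed)
K_BAR = 100.0  # capital endowment (assumed)
L_BAR = 100.0  # labor endowment (assumed)


def output(k: float, labor: float) -> float:
    """Cobb-Douglas output of a sector using `labor` workers and K = k * labor."""
    return (k * labor) ** A * labor ** (1 - A)


def total_output(t: float) -> float:
    """Total output y_1 + y_2 when only sector 1 pays a capital tax at rate t.

    Cost minimization gives K_i / L_i = a / (1 - a) * wage / (capital cost),
    so the tax makes k_1 = k_2 / (1 + t): the MRTS differ across sectors.
    """
    labor = L_BAR / 2  # equal labor split (simplifying assumption)
    # capital market clearing: labor * k_1 + labor * k_2 = K_BAR
    k2 = K_BAR / (labor * (1 / (1 + t) + 1))
    k1 = k2 / (1 + t)
    return output(k1, labor) + output(k2, labor)


# With identical constant-returns technologies the PPF is the straight line
# y_1 + y_2 = K_BAR**A * L_BAR**(1 - A); a smaller sum lies strictly inside it.
FRONTIER = K_BAR ** A * L_BAR ** (1 - A)

for t in (0.0, 0.5, 1.0):
    print(f"t = {t:.1f}: y1 + y2 = {total_output(t):.2f} (frontier: {FRONTIER:.2f})")
```

With these assumed parameters, total output equals the frontier value of 100.00 at $t = 0$ and falls below it as soon as $t > 0$ (to about 98.56 at $t = 1$). The equal labor split is only a simplification: by strict concavity of output per worker, any allocation with unequal capital-labor ratios lies inside the frontier, whatever the split.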
• intuition of the production efficiency theorem
  – firms might be taxed differently through different taxes on intermediate goods (taxing the same inputs at different rates in different sectors is equivalent to taxing firms differently)
  – one might think that taxing (or subsidizing) firms according to the labor intensity of production could counterbalance the labor-supply-reducing effect of commodity taxation
  – but this implies that the marginal rates of technical substitution vary across firms (since input costs vary)
  – this moves the economy away from the production possibility frontier (PPF) and can't be beneficial
  – to frame it differently: introducing a second distortion into the consumption decision came at the price of moving us away from the optimal consumption point, but had the benefit of increasing production (moving the PPF to the upper right)
  – introducing a distortion on the production side has the cost of moving us below the PPF, but has no benefit
  – the deeper reason for this is that the tax system distorts the consumption decision (consumption vs. leisure), but not production decisions (labor vs.
capital)
  – since there is no distortion in production, there is nothing to counterbalance; the first-best FOC holds
  – on a more abstract level this means: taxes should be levied as close as possible to the objective function of the taxpayers
• formal model
  – consider the production of a consumer good (output price $q$, production function $f$) with two intermediate goods 1 and 2 as inputs (quantities $m_1$ and $m_2$, prices $s_1$ and $s_2$)
  – with taxes on the intermediate goods, profit maximization yields:
$$\pi = q f(m_1, m_2) - s_1(1+t_1)m_1 - s_2(1+t_2)m_2$$
$$\frac{\partial\pi}{\partial q} = f(m_1, m_2) + \frac{\partial\pi}{\partial m_1}\frac{\partial m_1}{\partial q} + \frac{\partial\pi}{\partial m_2}\frac{\partial m_2}{\partial q} = f(m_1, m_2) \tag{72}$$
$$\frac{\partial\pi}{\partial t_z} = -s_z m_z + \frac{\partial\pi}{\partial m_1}\frac{\partial m_1}{\partial t_z} + \frac{\partial\pi}{\partial m_2}\frac{\partial m_2}{\partial t_z} = -s_z m_z, \qquad z = 1, 2 \tag{73}$$
  – this is because the second and third terms of both conditions are zero according to the envelope theorem ($\partial\pi/\partial m_1 = \partial\pi/\partial m_2 = 0$ at the optimum)
  – since profits are zero both before and after the introduction of the taxes:
$$d\pi = \frac{\partial\pi}{\partial q}\,dq + \frac{\partial\pi}{\partial t_z}\,dt_z = 0$$
$$\frac{dq}{dt_z} = -\frac{\partial\pi/\partial t_z}{\partial\pi/\partial q} = \frac{s_z m_z}{f(m_1, m_2)} > 0 \tag{74}$$
  – for the welfare analysis, we look at a representative household that consumes two consumer goods (which can be taxed at $\tau_1$ and $\tau_2$), together with taxes on the intermediate goods used to produce the two consumer goods:
$$R = t_1 s_1 m_1 + t_2 s_2 m_2 + \tau_1 x_1 + \tau_2 x_2$$
$$\mathcal{L} = v(p_1, p_2, y) + \lambda\left(t_1 s_1 m_1 + t_2 s_2 m_2 + \tau_1 x_1 + \tau_2 x_2 - \bar R\right) \tag{75}$$
  – the FOCs are quite messy and are skipped here; substituting and rewriting them results in the condition:
$$s_z m_z = s_z m_z + t_1 s_1\frac{\partial m_1}{\partial t_z} + t_2 s_2\frac{\partial m_2}{\partial t_z}, \qquad z = 1, 2 \tag{76}$$
  – hence $t_1 s_1\,\partial m_1/\partial t_z + t_2 s_2\,\partial m_2/\partial t_z = 0$ for $z = 1, 2$; as long as the input responses are not linearly dependent, this condition is only fulfilled if $t_1 = t_2 = 0$
• Results:
  – intermediate goods shouldn't be taxed
  – there is no need to complement a system of commodity (consumer good) taxes with taxes on intermediate goods
  – in contrast to the Ramsey rule, this result can be implemented directly
  – since tariffs on inputs can be interpreted as taxes on intermediate goods, this result is also an argument against tariffs on intermediate goods
• empirical interpretation
  – it was claimed above that the production efficiency theorem is one of the central, if not the most important, results of the theory of optimal taxation
  – so it is
straightforward to ask: "has this been implemented in actual tax systems?"
  – Homburg argues that both the principle of taxing only value added (the net principle, "Nettoprinzip") and the input tax deduction ("Vorsteuerabzug") are in line with the production efficiency theorem
  – similarly, training can be seen as an intermediate good that doesn't yield any utility directly; thus expenses for education and training should be tax-deductible
IV Optimal Income Taxation
• historical role of income taxes
  – monetary income taxes appear fairly late (1799 in England, 1869 for the first time in Germany (in Hessen), and in the US)
  – note that both in the US and in England this happened in the context of extreme revenue needs due to large wars
  – reasons for the late introduction:
    ∗ first, people need monetary incomes
    ∗ second, trade taxes, inflation taxes, and some consumption taxes are easier to collect
    ∗ third, if only few people receive monetary incomes, they often have the political power to prevent income taxation
  – during the last decades, the share of income taxes in total revenues has been declining in most countries of the world (developed as well as less developed), while there was a remarkable increase in revenues from indirect (commodity) taxes, mainly through VAT
• a short history of the optimal income taxation literature
  – at least since the late 19th century, economists have argued for progressive taxation on fundamental principles
  – the British economists Francis Ysidro Edgeworth and Arthur Cecil Pigou were the main contributors in this field
  – both argued that diminishing marginal utility of consumption in combination with utilitarian (and other) welfare functions implies progressive taxation
  – with identical utility functions, diminishing marginal utility and a simple utilitarian welfare function, social welfare is obviously maximized when disposable incomes are equalized
  – both Edgeworth and Pigou ignored the incentive effect of taxation (the effect of income taxes on labor supply)
  – the
"New Welfare Economics" of the 1930s argued that the question of whether taxes should be progressive is a philosophical one and limited itself to characterizing Pareto-efficient allocations
  – the first to analyze the incentive effect of income taxation formally was the Scot James Mirrlees in his seminal 1971 article (for his "contributions to the economic theory of incentives under asymmetric information" he won the 1996 Nobel prize together with William Vickrey)
  – in the 1980s Joseph Stiglitz contributed much to what he calls the "New New Welfare Economics"
  – he argues that the fundamental problem in taxation is the lack of information that prevents the government from making lump-sum redistributions
  – this implies that the whole Ramsey-tradition analysis of commodity taxation is flawed, since there lump-sum taxation is excluded completely by assumption; in fact (argues Stiglitz) only household-specific lump-sum taxes are infeasible
• general ideas of today's optimal income tax literature
  – potential distortions of the consumption decision are neglected, as commodity taxes are not available
  – only the work-leisure decision can be distorted by taxes
  – households are assumed to differ in their productivity and thus in their wages, while they have identical utility functions
  – but tax authorities cannot observe productivity (in contrast to firms) and thus have to tax based on observed income
  – household-specific lump-sum taxes are not available (tax authorities cannot identify households: informational asymmetries make household-specific lump-sum taxes unavailable)
  – but general lump-sum taxes are available in the sense that zero marginal rates are possible while the tax burden is positive
  – the government wants to tax differentially to reduce income gaps or even equalize incomes (justified, for example, by a Bergson-Samuelson-type welfare function)
  – differential tax burdens imply that
households have an incentive to change their behavior to reduce their tax burden (and this change reduces welfare)
  – this becomes the central problem in optimal income taxation analysis: how to prevent mimicking
  – if we could prevent mimicking costlessly, we could tax without distortion and equalize incomes without costs
  – but preventing mimicking is costly in terms of distortion (we have to introduce a positive marginal tax rate)
  – the additional distortion (besides reducing labor supply) arises from mimicking
  – in other words, people don't avoid taxes only by working less, which automatically reduces their tax payment, but by behaving like a different household in order to pay a lower "lump-sum" tax
• the available income tax function is highly flexible
  – in general, income taxes are variable both in absolute value and at the margin
  – it is possible to tax a household positively while setting its marginal rate to zero
  – that is, the tax function can have any functional form
  – indeed, the optimal tax function is in general highly non-linear, not differentiable and very complex
  – it is hard to find any income tax in the world that is organized like this (indeed, it is hard to imagine building one in a democratic policy-making process)
  – this assumption is made partly for analytical purposes (to clarify the problem): even if we allow for (general) lump-sum taxes, distortions arise
• organization of part IV
  – first, the individual effects of marginal and absolute taxation on labor supply are analyzed (this is closely related to the analysis of the effects of wage changes on labor supply)
  – second, labor supply is assumed to be fixed, and it is shown that income taxes are then lump-sum taxes and incomes can be equalized (if society wishes so) without affecting efficiency
  – third, under endogenous labor supply this is no longer the case: to avoid mimicking, disposable incomes cannot be equalized. This is the basic model of optimal income taxation analysis.
  – fourth, the analysis is generalized to a continuum of households
  – fifth, cases of both more general and more restricted tax functions are considered
  – sixth, the analysis is generalized by first allowing for home production and second looking at tax shifting
  – seventh, both income and commodity taxation are allowed for. It is shown that if utility functions are identical and separable, commodity taxes are not needed. If they are not, commodity taxes have to be used to obtain a second-best result.
• this part of the script draws both on Laszlo Goerke's lecture and on Joe Stiglitz' article in the 1987 edition of the Handbook of Public Economics
1 Wages, Taxation, and Labor Supply
• totally differentiating $u(x, L)$ yields an upward-sloping and convex indifference curve in the income-consumption space (gross income $wL$, consumption $x$):
$$\frac{dx}{d(wL)} = -\frac{\partial u/\partial L}{w\,\partial u/\partial x} > 0, \qquad \frac{d^2x}{d(wL)^2} > 0$$
• assume that households pay an income tax $T(wL)$ depending on their wage earnings, which is weakly increasing in income and has a constant marginal rate $T'$ (for example, a proportional tax)
• maximizing the household's utility with respect to consumption and labor under the budget constraint $x = wL - T$ results in the condition that the (negative) marginal rate of substitution between consumption and labor has to equal the change in disposable income (since the price of $x$ is normalized to unity); $u_x$ and $u_L$ denote partial derivatives:
$$u_x - \lambda = 0 \tag{77a}$$
$$u_L + \lambda w(1 - T') = 0 \tag{77b}$$
$$wL - T - x = 0 \tag{77c}$$
$$-\frac{u_L}{u_x} = w(1 - T') \tag{77d}$$
• we would like to solve the system for $L$ and differentiate it with respect to $T'$, $T$, and $w$, but we can't, since $u$ is not specified
• by totally differentiating the three FOCs and solving the system (either by substitution or using Cramer's rule) we can get the partial effects without solving explicitly for $L$:
$$\begin{pmatrix} u_{xx} & u_{xL} & -1 \\ u_{Lx} & u_{LL} & w(1-T') \\ -1 & w(1-T') & 0 \end{pmatrix}\begin{pmatrix} dx \\ dL \\ d\lambda \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ \lambda w & 0 & -\lambda(1-T') \\ 0 & 1 & -L(1-T') \end{pmatrix}\begin{pmatrix} dT' \\ dT \\ dw \end{pmatrix} \tag{78}$$
• a pure increase of the marginal tax rate (holding the tax level $T$ constant) decreases labor supply, and changes in the level of taxation have ambiguous effects on labor
supply
• a (non-pure) fall in the marginal tax rate has a positive substitution effect and an ambiguous income effect (allowing for a change of the tax level due to the change in the marginal tax rate)
• a rise in the wage rate is qualitatively the same as a fall in the marginal tax rate
• as is well known, a rising wage rate has a positive substitution effect and an ambiguous income effect
  – the price of leisure rises relative to the price of consumption, which decreases leisure and thus increases labor supply
  – whether leisure increases due to the rising income depends on whether leisure is a normal good
  – if it is a normal good, the income effect on labor supply is negative (more income means more leisure, i.e. less labor) while the substitution effect is positive (a higher relative price means less leisure, i.e. more labor); thus the overall effect is ambiguous
2 Fixed Labor Supply
• historically, in the analysis of optimal income taxation, labor supply has often been assumed to be constant
• obviously, and as mentioned before, with fixed labor supply there is no efficiency loss due to taxation because households can't change their behavior: the income tax is a lump-sum tax
• in this case, "the other" objective of society can be reached perfectly: if utilities have equal weight in the welfare function, disposable incomes are equalized ($u^h = u(x_h)$):
$$\mathcal{L}(x_1, x_2) = W(u^1, u^2) + \lambda(z_1 - x_1 + z_2 - x_2 - \bar R) \tag{79}$$
$$\frac{\partial\mathcal{L}}{\partial x_h} = \frac{\partial W}{\partial u^h}\frac{\partial u^h}{\partial x_h} - \lambda = 0 \;\Rightarrow\; \frac{\partial u^1}{\partial x_1} = \frac{\partial u^2}{\partial x_2} \;\Rightarrow\; x_1 = x_2$$
• this implies that the marginal tax rate for household 2 is 100% and for household 1 is 50%: $T_2' = 1$, $T_1' = 0.5$
• with $x_1 = x_2 = (z_1 + z_2 - \bar R)/2$, household 1's tax burden is $T_1 = z_1 - x_1 = (\bar R - (z_2 - z_1))/2$, so $T_1$ becomes negative if the revenue need is smaller than the difference in gross incomes, $\bar R < z_2 - z_1$
3 Variable Labor Supply
• general idea
  – there are two households that differ in their wages and thus in their choices of consumption and labor supply
  – this assumption is equivalent to households facing different relative prices of two consumption goods
  – since the
analysis involves equity issues, a social welfare function is maximized
  – any income taxation is allowed: tax burden and marginal rate can be set freely for each individual household
  – these very flexible assumptions cause strange results for the optimal tax function (such as zero marginal rates combined with a positive tax burden, which is hard to imagine and doesn't exist empirically)
• central findings
  – if labor supply is endogenous, the tax structure has to be set such that the high-wage household doesn't behave like the low-wage household (mimic), since this can't be optimal
  – the high-wage household is not taxed at the margin, while the low-wage household faces positive marginal taxes
  – incomes are not equalized, but income gaps are reduced
3.1 Model
• objective function
  – the government's preferences are captured by a Bergson-Samuelson social welfare function $W = W(u^1, u^2)$
  – the concavity of the function can be interpreted as society's aversion to inequality
  – determining the marginal conditions for Pareto efficiency would give qualitatively the same results
• two households
  – the two households differ in their productivity and thus in their wages, their labor supply and their incomes: $w_2 > w_1 > 0$, $z_h = w_h L_h$
  – utility is a function of consumption and labor, and utility functions are identical: $u^h = u(x_h, L_h) = u(x_h, z_h/w_h)$
  – totally differentiating the utility function shows that the indifference curves are increasing and convex in the income-consumption space, and that at any given point the high-productivity household has the flatter indifference curve:
$$\frac{dx_h}{dz_h} = -\frac{u_L}{w_h\,u_x}$$
• tax system
  – wages are the only source of income and households are taxed according to their gross income only; commodity taxes are not feasible
  – the government can observe neither the type (productivity) of a household nor the labor it supplies, only gross earnings
  – thus taxes are based on gross income only: $T = T(z_h)$
  – $T$ is weakly increasing in $z_h$: $T' \ge 0$
  – gross income is $z_h = w_h L_h$, and all income
is spent on one consumption good (or bundle), so that $x_h = z_h - T(z_h)$
  – the income tax structure is not specified; thus the budget constraint is formulated as $\bar R = z_1 - x_1 + z_2 - x_2$
  – this effectively allows for lump-sum taxation, since it allows for a positive tax burden in combination with a zero marginal tax rate
  – the only remaining problem is that the households cannot each be taxed in an individual lump-sum manner, since household characteristics cannot be observed (only gross income)
3.2 Self-Selection Constraint
• it is assumed that the welfare function is concave, that is, the government tries to equalize disposable (after-tax) incomes; this rules out a Pareto-efficient tax system
• households have two ways to reduce their tax burden
  – reduce labor supply to reduce gross income and thus taxes (only attractive if taxes are strictly increasing in income)
  – mimic the other household to benefit from its lower taxes (only attractive if taxes are lower for the other household)
  – both changes of behavior reduce welfare, since they occur only to save taxes
  – optimizing welfare can be understood as finding an efficient trade-off between these two negative effects
• mimicking cannot be efficient
  – if a household is mimicking, its income is fixed at the other household's level, and thus so is its labor supply
  – that means this can't be an interior solution of utility maximization, and thus it can't be optimal
  – graphically, the mimicking household's indifference curve won't be tangent to the budget constraint
  – in the case of mimicking and marginal rates of zero, taxes are effectively lump-sum, since there is no way to change behavior that would reduce the tax burden
  – preventing the household from mimicking results in a Pareto improvement
  – graphically, starting from a mimicking solution and moving household 2 to the right along its (flatter) indifference curve leaves both households' utility unchanged but increases tax revenues
  – in other words, a "self-selection equilibrium" is always
Pareto-superior to a "pooling equilibrium" (this is only true in the case of two households)
• only the high-wage household 2 has an incentive to mimic
  – $T_2 \ge T_1$ is assumed in the model (above)
  – household 1
    ∗ for household 1, mimicking household 2 would mean supplying more labor, that is, a loss in utility
    ∗ the additional consumption cannot compensate for this loss if the original values were chosen optimally
    ∗ in addition, her tax payment (weakly) increases, which is an additional decrease in utility
    ∗ overall utility has to decrease: there is no incentive to mimic
  – in contrast, household 2
    ∗ for household 2, mimicking household 1 means a reduction in consumption
    ∗ this is insufficiently compensated by the reduction in labor
    ∗ but there might be a gain due to lower tax payments that makes the net effect of mimicking on utility positive
  – → thus, only household 2 has an incentive to mimic
• Stiglitz (1987) argues as follows:
  – maximizing a simple utilitarian welfare function requires that the marginal utilities of consumption are equal for both households ($u_x^1 = u_x^2$)
  – the marginal rate of substitution between consumption and leisure has to equal the wage (the goods price is normalized to unity): $-u_L^h/u_x^h = w_h$ (that is, consumption and thus disposable incomes are equalized)
  – this implies that the high-productivity household 2 has a higher marginal disutility of labor, which in turn implies that he supplies more labor
  – that is, household 2 is actually worse off (in absolute terms)
  – then, obviously, he has an incentive to mimic household 1
  – "From each according to his abilities, to each according to his needs" (Karl Marx, "Critique of the Gotha Programme")
• the relevant self-selection constraint (SSC) thus is:
$$u^2\!\left(x_2^*, \frac{z_2^*}{w_2}\right) \ge u^2\!\left(x_1^*, \frac{z_1^*}{w_2}\right) =: u^{2m} \tag{80}$$
• in the following analysis the SSC is assumed to be binding; for this to hold, the welfare function has to be sufficiently concave (that is, it has to weight equality heavily)
• if the SSC were not binding, lump-sum taxation without
distortions would be possible (indeed, within a limited range this is the case; a sufficiently concave welfare function ensures that the welfare optimum doesn't lie in this range)
3.3 First Results
• some results can be obtained before starting the formal analysis
• the SSC is binding by assumption (that is, $\lambda_1$ below is positive)
• the same is true for the budget constraint ($\lambda_2$ is also positive)
• marginal tax rates of 100 percent or higher can't be optimal: in this case households are strictly better off reducing their labor supply, so that both utility and tax revenues fall
• negative marginal tax rates cannot be optimal either: direct transfers (unconditional subsidies) are Pareto-improving, since households don't have to increase their labor supply (above the optimum) and revenues are unaffected
• a tax burden larger than gross income doesn't make sense, since we assumed that zero income is not taxed
• a negative tax burden might be optimal
3.4 Optimization
• welfare is maximized under the self-selection constraint and the government's budget constraint:
$$\mathcal{L} = W(u^1, u^2) + \lambda_1\left(u^2 - u^{2m}\right) + \lambda_2\left(R - \bar R\right)$$
$$\mathcal{L}(x_h, z_h, \lambda_i) = W\!\left(u^1\!\left(x_1, \frac{z_1}{w_1}\right), u^2\!\left(x_2, \frac{z_2}{w_2}\right)\right) + \lambda_1\left[u^2\!\left(x_2, \frac{z_2}{w_2}\right) - u^2\!\left(x_1, \frac{z_1}{w_2}\right)\right] + \lambda_2\left(z_1 - x_1 + z_2 - x_2 - \bar R\right) \tag{81}$$
• optimizing results in four FOCs (plus the constraints):
$$\frac{\partial\mathcal{L}}{\partial x_1} = \frac{\partial W}{\partial u^1}\frac{\partial u^1}{\partial x_1} - \lambda_1\frac{\partial u^{2m}}{\partial x_1} - \lambda_2 = 0 \tag{82a}$$
$$\frac{\partial\mathcal{L}}{\partial z_1} = \frac{\partial W}{\partial u^1}\frac{\partial u^1}{\partial L_1}\frac{1}{w_1} - \lambda_1\frac{\partial u^{2m}}{\partial L}\frac{1}{w_2} + \lambda_2 = 0 \tag{82b}$$
$$\frac{\partial\mathcal{L}}{\partial x_2} = \frac{\partial W}{\partial u^2}\frac{\partial u^2}{\partial x_2} + \lambda_1\frac{\partial u^2}{\partial x_2} - \lambda_2 = 0 \tag{82c}$$
$$\frac{\partial\mathcal{L}}{\partial z_2} = \frac{\partial W}{\partial u^2}\frac{\partial u^2}{\partial L_2}\frac{1}{w_2} + \lambda_1\frac{\partial u^2}{\partial L_2}\frac{1}{w_2} + \lambda_2 = 0 \tag{82d}$$
• note that increasing the net income of household 1 (or lowering its gross income) is costly, since it makes mimicking more attractive for household 2 and thus tightens the SSC
• increasing the net income of household 2 (or lowering its gross income), in turn, relaxes the SSC
• obviously, increasing $x_h$ is costly in terms of foregone government revenue, while the opposite is true for increasing $z_h$
• under the specific budget constraint used here (taxes defined as the difference between gross and net income),
the tax structure is only defined implicitly in the optimality conditions
• the conditions do not define the entire tax function; they only characterize it at the two points the households will choose
• solving the FOCs (82c) and (82d) for household 2 gives (with $u^i_x$ and $u^i_L$ denoting the partial derivatives of $u^i$ with respect to consumption and labor):
$$\frac{u^2_L}{u^2_x\,w^2} = -1 \quad\Longleftrightarrow\quad -\frac{u^2_L}{u^2_x} = \frac{w^2}{1} \tag{83}$$
  – this is the well-known result that the MRS (the ratio of the marginal utilities of two goods, here leisure and consumption) has to equal the ratio of prices
  – comparing with (77d) shows that this holds only for a zero marginal income tax, $T'_2 = 0$ (we speak of the marginal tax rate at this point although, as mentioned above, the tax function will in general not be differentiable)
  – this is in line with the often derived result that any positive marginal income tax distorts the labor supply decision
  – this result is identical to maximizing the income tax paid by a single household while holding its utility fixed (maximizing the vertical distance between an indifference curve and the 45 degree line in z-x space)
• the optimality condition for household 1 is less intuitive and rather messy.
Again, we use (77d) (subscripts $x$ and $L$ denote partial derivatives):
$$\frac{\lambda_1}{\partial W/\partial u^1} = \frac{u^1_x}{u^{2m}_x}\cdot\frac{1 + \dfrac{u^1_L}{w^1 u^1_x}}{1 + \dfrac{u^{2m}_L}{w^2 u^{2m}_x}} \tag{84}$$
$$1 + \frac{u^1_L}{w^1 u^1_x} = 1 + (T'_1 - 1) = T'_1 \tag{85}$$
  – since all other five terms in (84) are positive, the numerator on the right, $1 + u^1_L/(w^1 u^1_x)$, has to be positive, too
  – this means that household 1 is taxed positively at the margin, $T'_1 > 0$
  – in contrast, if the SSC were not binding, $\lambda_1$ would be zero and the marginal tax rate would be zero, too
• combining the FOCs (82a) and (82c) of both households, we can see that
$$\frac{\partial W}{\partial u^1}\frac{\partial u^1}{\partial x^1} - \frac{\partial W}{\partial u^2}\frac{\partial u^2}{\partial x^2} = \lambda_1\left(\frac{\partial u^{2m}}{\partial x^1} + \frac{\partial u^2}{\partial x^2}\right) > 0 \tag{86}$$
  – given plausible assumptions about the welfare function and the utility functions (as in the last subsection: marginal utilities have the same weight, diminishing marginal utility of consumption), this implies that $x^2 > x^1$
  – that means, net incomes are not equalized
• results and interpretation
  – the marginal income tax for the high-productivity household is zero, while the marginal rate for the low-productivity household is positive
  – this is because there is a trade-off: higher marginal tax rates for household 1 decrease its labor supply, but at the same time relax the SSC and thus allow higher absolute taxation of household 2 (without inducing mimicking), so that income gaps can be reduced
  – disposable incomes are not equalized (since this would cause mimicking)
  – if there is no problem with mimicking (meaning the SSC is not binding), there is no trade-off and neither household is taxed at the margin
  – the whole analysis is closely related to second-degree (nonlinear) price discrimination by a monopolist who cannot observe consumers' types

4 Continuous Households
• the results derived for two households are not general
• it is hard to derive any results for a model with a large number n of households or with a continuum of households
• one reason is that for n households there are n(n − 1)
self-selection constraints (SSC) that have to hold
• no general rule of decreasing marginal tax rates can be derived
• on the contrary, $T'$ in general behaves non-monotonically, and $T$ is not differentiable
• moreover, a partial pooling equilibrium is often optimal
• often, quasi-linear preferences are assumed, but even for this special case very little can be said (the result most often cited, for example, is that the highest-productivity household should not be taxed at the margin)
• under these preferences, very low incomes are not taxed at the margin either
• the optimal marginal tax rate $T'$ depends on four variables and is lower
  – the larger the fraction of the population that pays that marginal tax rate
  – the smaller the shadow price $\lambda$ of the SSC (which is hard to analyse and, in the case of a utilitarian welfare function, first increases and then decreases); this result implies that the marginal rate for the highest-earning household should be zero
  – the higher the wage of the taxpayers at that marginal rate
  – the more elastically labor supply responds
• mathematical problems
  – a convex tax function (which, at least over some range, is the case if poor households face negative tax rates) induces randomized wage payments
  – circumstances where the tax function is non-differentiable correspond precisely to those where a partial pooling equilibrium is optimal
  – if the tax function is partially convex (and since indifference curves are convex as well), there might be multiple tangencies
• nevertheless, Mirrlees estimated his model empirically for the US and calculated an optimal tax function that was remarkably close to linear

5 Different Tax functions
• this section of the script draws almost exclusively on Stiglitz (1987)
• the tax function $T$, or "tax schedule", relates before-tax (gross) to after-tax (net) income
• one of the central lessons of the last decades of taxation literature is that what is optimal depends crucially on the assumption of what types of
taxes are allowed (compare the different results of the Ramsey-type analysis, where no lump-sum taxes are allowed, and the Stiglitz-type analysis, where only household-specific lump-sum taxes are ruled out)
• so far, we have at the same time restricted the tax function strongly and allowed it to be very flexible
  – the tax function was restricted because we allowed it to depend only on wage income: $T = T(wL)$
  – this excluded, for example, random taxation
  – the tax function was very general because any functional form was allowed (indeed, it was shown that the optimal income tax is highly non-linear)
  – in the real world, we observe much simpler tax functions, either for practical reasons (administration, collection and monitoring costs) or for political reasons (negative marginal rates for high-income earners would not be easy to argue for in a democracy)
• in this section, both generalizations and restrictions are discussed

5.1 Random Taxation
• in ex ante randomization, the government assigns individuals randomly to one of two tax functions
• in ex post randomization, the individuals are assigned only after they have announced their productivity
• ex ante randomization is always beneficial if welfare as a function of tax revenues, $W(R)$, is convex
• ex post randomization is beneficial, for example, if household 2 is much more risk averse than household 1
• the chance of losing much (when paying the high taxes) makes mimicking less attractive for household 2 (because earning little and paying much is a frightening scenario for household 2, but not so much for household 1)
• note that if individuals are risk averse, ex post randomization has the cost of imposing risk on both households
• random taxation violates the principle of horizontal equity
• one interesting result of the debate is that the principle of horizontal equity may in fact be inconsistent with Pareto efficiency (which raises doubts about Pareto efficiency, too)
• a
second lesson is that it is not a trivial question what the set of available taxes is (and results depend crucially on this choice)

5.2 Linear Tax
• not only are empirically observed income taxes much simpler than the highly non-linear optimal tax; most of the debate in recent years has been about simplifying them further (flat taxes, i.e. linear income taxes, are discussed both in Germany and in the US)
• problems of non-linear taxes
  – income averaging (for example, intertemporally or between couples) becomes an issue
  – the unit of taxation becomes important
  – as noted above, decreasing marginal rates (convex tax functions) provide incentives to pay random wages
  – taxing at the source is much more difficult (if there is more than one source of income)
  – administrative costs
  – for these reasons, and because of the doubtful political feasibility of highly non-linear or random taxation, it might be reasonable to focus on linear taxes
• optimal linear income tax
  – compared to other issues in this field, this problem is a fairly simple one
  – all households receive a lump-sum payment $a$ and pay a marginal rate $T$ on all income:
$$x = a + (1 - T)\,z \tag{87}$$
  – for a continuum of households with distribution $F(w)$ over their productivity, the optimization problem is as follows:
$$\mathcal{L} = \int W\big(v(w(1-T), a)\big)\,dF(w) + \lambda\left[\int T\,wL(w)\,dF(w) - a - \bar R\right] \tag{88}$$
• three general results
  – for $\bar R = 0$, the optimal tax entails $a > 0$, which implies that $T > 0$: the deadweight loss due to marginal taxation is overcompensated by the welfare gain due to income redistribution (obviously, the first result also holds for $\bar R \le 0$)
  – if $\bar R$ is very large, $a$ becomes negative (and if $\bar R$ becomes so large that an increase of $T$ decreases revenues, $a$ has to generate all additional revenue needs)
  – the optimal income tax can be written in a remarkably simple formula, where $Y = wL$ denotes gross income:
$$\frac{T}{1-T} = -\frac{\operatorname{cov}\!\left(\dfrac{W' v_a}{\lambda} + T\,w L_a,\; Y\right)}{\displaystyle\int Y\,\varepsilon^H_L\,dF} \tag{89}$$
    ∗ $W' v_a/\lambda$ is the net social marginal value of income: the marginal utility of income relative to the marginal value of government revenues,
multiplied by the marginal welfare of utility $W'$
    ∗ $L_a$ is the change of labor supply due to a change in the lump-sum payment, that is, how labor supply reacts to a pure income effect (its sign is not determined in general)
    ∗ $\varepsilon^H_L$ is the compensated elasticity of labor supply
    ∗ the covariance can be seen as a marginal measure of inequality
    ∗ thus, the marginal tax rate should be higher the larger this measure of marginal inequality and the smaller the weighted average of the compensated elasticity of labor supply

6 Additional Generalizations

6.1 Home Production
• home production is modeled here as a final consumption good $c$ that is produced with labor and a commodity (labor is used either in market production, to buy the commodity, or in home production)
• productivity in home production $h$ differs from market productivity $w$, but is proportionally related: $h = kw$
• the trick in the model is that the home production function (Cobb-Douglas) is chosen in such a way that, apart from this relationship ($h = h(w)$), there is no way to infer home productivity from the labor supply decision (again: this is an artifact of the production function used); with the commodity bought from market income, $x = wL$:
$$c = h\,x^{\alpha}(1-L)^{1-\alpha} = h\,w^{\alpha}L^{\alpha}(1-L)^{1-\alpha} \tag{90}$$
$$L^* = \alpha \tag{91}$$
• for a certain range of $\alpha$, this implies redistribution towards the high-productivity individuals
• intuition ???
• Stiglitz argues that since there is a much stronger social agreement on redistribution than what utilitarian ethics implies, the utilitarian approach is a questionable guide to policy

6.2 Tax shifting
• much of the traditional tax theory (and part II of this script) has dealt with tax incidence, that is, tax shifting
• this has been ignored completely in the analysis of optimal taxation so far
• implicitly it was assumed that before-tax prices and wages are not affected by taxation
• if we allow for tax shifting (that is, endogenize the before-tax incomes), there is a new channel for redistribution!
• we can use taxes not only to change disposable incomes by transferring income from one household to another, but also to change the market outcome in the first place
• it turns out that if the labor of households with different productivity is not perfectly substitutable (which seems to be a plausible assumption), the marginal tax rate on the most productive household should be negative
• the smaller the elasticity of substitution, the higher the marginal tax rate for the low-productivity household (although it is always positive)
• that is, the less substitutable different types of labor are, the more the government relies on general equilibrium effects for redistribution (in the extreme case of perfect substitutes, general equilibrium effects cannot be used and we are back in the standard analysis)

7 Commodity Taxation in an Atkinson-Stiglitz framework
• the theory of the second best implies that it might be beneficial to introduce a second distortion (by taxing commodities differently) to counterbalance the first distortion (due to income taxation / mimicking)
• to make the analysis interesting, we have to introduce a second good and allow for differentiated taxation
• the budget constraint reads $R = z^1 + z^2 - x^1_1 - x^1_2 - x^2_1 - x^2_2$ (superscripts denote households, subscripts goods)
• that means, the tax structure is not restricted at all; everything is allowed for:
  – "shopping center entrance fee" taxes (zero marginal commodity taxes that are positive in absolute value)
  – differentiated taxes on the same good for different households
  – commodity taxes that depend on the consumption quantities of this good or of other goods, too
  – mimicking applies to commodity taxes, too: if household 2 behaves like household 1, she also pays $t^1_1$ and $t^1_2$
• the government maximizes welfare:
$$\mathcal{L} = W(u^1, u^2) + \lambda_1\,(u^2 - u^{2m}) + \lambda_2\,(R - \bar R)$$
$$\mathcal{L}(x_j, z, \lambda) = W\!\left(u^1\!\left(x^1_1, x^1_2, \tfrac{z^1}{w^1}\right), u^2\!\left(x^2_1, x^2_2, \tfrac{z^2}{w^2}\right)\right) + \lambda_1\left[u^2\!\left(x^2_1, x^2_2, \tfrac{z^2}{w^2}\right) - u^2\!\left(x^1_1, x^1_2, \tfrac{z^1}{w^2}\right)\right] + \lambda_2\left[z^1 + z^2 - x^1_1 - x^1_2 - x^2_1 - x^2_2 - \bar R\right] \tag{92}$$
• the six FOCs (plus the constraints) resemble the conditions of the one
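good analysis in (82), for goods $j = 1, 2$:
$$\frac{\partial \mathcal{L}}{\partial x^1_j} = \frac{\partial W}{\partial u^1}\frac{\partial u^1}{\partial x^1_j} - \lambda_1\frac{\partial u^{2m}}{\partial x^1_j} - \lambda_2 = 0 \tag{93a}$$
$$\frac{\partial \mathcal{L}}{\partial x^2_j} = \frac{\partial W}{\partial u^2}\frac{\partial u^2}{\partial x^2_j} + \lambda_1\frac{\partial u^2}{\partial x^2_j} - \lambda_2 = 0 \tag{93b}$$
$$\frac{\partial \mathcal{L}}{\partial z^1} = \frac{\partial W}{\partial u^1}\frac{\partial u^1}{\partial L^1}\frac{1}{w^1} - \lambda_1\frac{\partial u^{2m}}{\partial L}\frac{1}{w^2} + \lambda_2 = 0 \tag{93c}$$
$$\frac{\partial \mathcal{L}}{\partial z^2} = \frac{\partial W}{\partial u^2}\frac{\partial u^2}{\partial L^2}\frac{1}{w^2} + \lambda_1\frac{\partial u^2}{\partial L^2}\frac{1}{w^2} + \lambda_2 = 0 \tag{93d}$$
• again, increasing $x_j$ increases welfare while increasing $z$ reduces it due to higher labor supply; increasing $x^1_j$ or $z^2$ tightens the SSC while increasing $x^2_j$ or $z^1$ relaxes it; and increasing $x_j$ tightens the budget constraint while the opposite is true for increasing $z$
• combining the three optimality conditions for household 2 results in
$$\frac{\partial u^2}{\partial x^2_1} = \frac{\partial u^2}{\partial x^2_2} =: u^2_x \tag{94}$$
$$-\frac{u^2_L}{u^2_x} = \frac{w^2}{1} \tag{95}$$
  – the first condition states that the marginal utilities of both goods have to be equal, i.e. the MRS between them equals their producer price ratio of one
  – this implies that both goods can only be taxed at the same marginal rate for household 2 (which might or might not be zero)
  – the second condition implies that household 2's income cannot be taxed at the margin (recall (77d))
• to interpret the conditions for household 1, we have to assume identical and separable utility functions, $u(h(x_1, x_2), z/w)$; using (93a) for $j = 1, 2$:
$$\frac{\partial h/\partial x^1_1}{\partial h/\partial x^1_2} = \frac{\frac{\partial W}{\partial u^1}\frac{\partial u^1}{\partial x^1_1}}{\frac{\partial W}{\partial u^1}\frac{\partial u^1}{\partial x^1_2}} = \frac{\lambda_1\frac{\partial u^{2m}}{\partial x^1_1} + \lambda_2}{\lambda_1\frac{\partial u^{2m}}{\partial x^1_2} + \lambda_2} = \frac{\lambda_1\frac{\partial u^{2m}}{\partial h}\frac{\partial h}{\partial x^1_1} + \lambda_2}{\lambda_1\frac{\partial u^{2m}}{\partial h}\frac{\partial h}{\partial x^1_2} + \lambda_2} \tag{96}$$
  – since the mimicker consumes household 1's bundle and has the same sub-utility $h$, the first and the last expression are equal only if $\partial h/\partial x^1_1 = \partial h/\partial x^1_2$ (because $\lambda_2 > 0$), i.e. if the MRS is not affected by taxation; hence both commodities have to be taxed at the same marginal rate for household 1, too
• results and interpretation
  – there is no need for differentiated commodity taxation if utility functions are identical and separable
  – this is because commodity taxes cannot help make the SSC less binding
  – the results derived for optimal income taxation hold: $T'_1 > 0$ and $T'_2 = 0$
  – if the assumption of identical and separable utility functions is dropped, however, a good that household 2 values relatively highly should be taxed more heavily when consumed by household 1
  – This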
makes mimicking less attractive, since it makes household 1's labor-consumption bundle less attractive for household 2.
• note that the results of the Ramsey-type commodity taxation analysis are completely different from the results of this (Atkinson-Stiglitz) type of analysis
• this comes only from the fact that Ramsey excluded all lump-sum taxes, while here only household-specific lump-sum taxes are excluded

V Tax Evasion
• tax evasion, tax avoidance and changes of relative prices
  – tax evasion is often defined as "violations of the law" or "illegal and intentional actions to reduce tax obligations", while tax avoidance is changing one's behavior to reduce taxes within the legal framework
  – but the distinction (and the separation from normal consumption and input adjustments due to changing relative prices) is not that clear-cut
  – the border between legal and illegal in taxation is often a bargaining process and determined by courts
  – defining avoidance as "behavior that reduces taxes while leaving the consumption basket unchanged" (as some authors do) runs into problems if the income effect of tax avoidance causes a change in the relative quantities consumed
  – here, tax evasion is defined as being risky: not being detected reduces tax payments, while being caught means paying more taxes than initially owed
• costs of tax evasion
  – in the literature, tax evasion is generally seen as a bad (welfare-reducing) action
  – direct costs are caused by the reduced provision of public goods, which reduces the welfare of all consumers (while the evader's utility increases)
  – (obviously, if the provision of public goods was excessive, evasion might increase welfare)
  – both sides, tax authorities and evaders, spend real resources to detect evasion and to conceal it, respectively
  – tax authorities have to adjust the tax system to prevent evasion; this means another constraint is added that moves the system further away from Pareto efficiency and the welfare optimum
  – tax
evasion by definition causes uncertainty, which reduces welfare in a world of risk-averse households
  – further negative effects might arise, for example an eroding belief in the legal system or negative consequences for the political culture or for the voluntary provision of local public goods (of course, these are not investigated further here)
• measurement and empirical estimates
  – evasion is inherently hard to measure, since there are incentives not to declare it openly
  – in the US, the tax authorities estimate that 16% of the legal tax burden is evaded
  – of this, only 16% is detected and recovered
  – 80% of evasion comes through underdeclaration of incomes, the rest through overdeclaration of expenditures
  – only 1% of taxes on wages and salaries are evaded, but 43% of taxes on business incomes
• legal situation in Germany
  – there is a distinction between "Steuerstraftaten" (criminal tax offenses) and "Steuerordnungswidrigkeiten" (administrative tax offenses)
  – prison sentences are hardly ever imposed (although up to 10 years are possible for professionally committed tax evasion): the highest sentence was probably 3.5 years for Steffi Graf's father Peter
  – the maximal fine is 1.8 million euros (360 "Tagessätze", i.e. daily rates, times 5000 euros)
  – fines are set by judges, which means that there is no clear-cut rule for the fine (in contrast to the model employed below)

1 Basic model
• tax evasion is modeled as a rational gamble of risk-averse households (tax evasion by firms is modeled only slightly differently, since firms are often assumed to be risk-neutral)
• households pay a linear tax of the form $T(y) = (y - t_0)\,t$, where $t_0$ is a tax exemption and $t$ is a constant tax rate
• households decide on the fraction $\alpha$ of their income that they declare
• if not caught, they receive income $y^e = y - T(\alpha y) = y - (\alpha y - t_0)\,t$
• they are detected with probability $z$ (first assumed to be exogenous, later endogenized)
• if detected, they have to pay the full amount of taxes plus a fine $F y (1 - \alpha) t^{\beta}$
• the fine depends both on the income not declared (for $\beta = 0$)
and on the amount of taxes evaded ($\beta = 1$) (but is not a linear combination of the two)
• if detected, income is $y^d = y - T(y) - F y (1 - \alpha) t^{\beta}$
• risk aversion is modeled by assuming strictly concave utility functions
• the household maximizes expected (von Neumann-Morgenstern) utility with respect to $\alpha$:
$$EU = (1-z)\,u(y^e) + z\,u(y^d) = (1-z)\,u\big(y - (\alpha y - t_0)t\big) + z\,u\big(y - (y - t_0)t - F y (1-\alpha) t^{\beta}\big) \tag{97}$$
$$\frac{\partial EU}{\partial \alpha} = -(1-z)\,u'(y^e) + z\,u'(y^d)\,F t^{\beta-1} = 0 \tag{98}$$
  (the positive factor $yt$ has been dropped from the derivative)
• the cost of declaring more is a higher tax payment if not detected (first term); the gain of declaring more is a lower fine in the case of detection (second term)
• corner solutions
  – $\alpha$ is set to unity (which implies $y^e = y^d$) if $zFt^{\beta-1} + z - 1 \ge 0$
  – this implies that a high detection probability $z$ or a high fine $F$ leads to declaration of the full income (which is pretty intuitive)
  – the other corner solution ($\alpha = 0$) cannot be derived as neatly
• the second derivative is always negative for a linear tax
• totally differentiating (97) gives us the indifference curve in $y^d$-$y^e$ space:
$$\frac{dy^d}{dy^e} = -\frac{(1-z)\,u'(y^e)}{z\,u'(y^d)} < 0 \tag{99}$$
• the indifference curve is always decreasing (and strictly convex if households are strictly risk averse)
• graphical analysis
  – the feasible combinations of $y^e$ and $y^d$ form a line in $y^d$-$y^e$ space (the "feasibility line")
  – for $t_0 = 0$, the line has the slope $-Ft^{\beta-1}$
  – it is a line (constant slope) because both $y^e$ and $y^d$ are linear in $\alpha$
  – in the case of (risk-neutral) firms the "indifference curve" is linear, so the feasibility constraint has to be modeled as concave (by a convex fine function)

2 Comparative Statics
• in this section, we analyze how $\alpha^*$ changes when $F$, $z$, $y$, $t$ and $t_0$ change
• that is, we are interested in how the voluntarily declared fraction of income varies with the parameters
• analytical procedure
  – we want to solve for $\alpha^*$ and differentiate
the term with respect to $F$, $z$, $y$, $t$ and $t_0$
  – but since we have not specified the utility function, we cannot solve for $\alpha^*$ (this is a pattern that shows up over and over again when working with unspecified functions; see section (1), but there we had a system of FOCs)
  – instead, we differentiate the first-order condition (98), $\partial EU/\partial\alpha$, with respect to the parameter
  – for a change in a parameter, the EU function changes
  – we ask: what is the slope of the new EU function at the old $\alpha^*$?
  – if it is positive, the new $\alpha^*$ has to lie to the right, that is, $\alpha^*$ rises (the opposite is true if the slope is negative)
  – we always assume an interior solution ($0 < \alpha < 1$)
• central findings
  – quite obviously, $\alpha^*$ rises for a higher $F$ and $z$
  – the effect of a rising $y$, $t$ or $t_0$ depends on risk aversion and is often ambiguous

2.1 Fine F
• differentiating (98) with respect to $F$ results in:
$$\frac{\partial(\partial EU/\partial\alpha)}{\partial F}\bigg|_{\alpha^*} = z t^{\beta-1}\left[u'(y^d) - u''(y^d)\,F y (1-\alpha) t^{\beta}\right] > 0 \tag{100}$$
• a higher fine increases the gains of honesty because of two effects
  – each unit of income declared saves a larger fine if detected (first term in brackets)
  – detection reduces income further, which raises the marginal utility of income in that state (second term in brackets; positive because strict risk aversion, $u'' < 0$, has been assumed)
  – the costs of honesty (first term in (98)) are unaffected, since higher fines have no effect if not detected
• graphically, an increase of $F$ makes the feasibility line steeper
• differentiating (97) with respect to $F$ shows that the level of expected utility is reduced unambiguously; this makes perfect sense, since there is no way a higher fine could make the household better off
• empirical testing is hard since here the incentive effects of fines matter, and most people do not know how big the fines are
• Result: an increase in $F$ raises both $\alpha^*$ (the share) and $y\alpha^*$ (the amount) of income declared

2.2 Detection Probability z
• differentiating (98) with respect to $z$ results in:
$$\frac{\partial(\partial EU/\partial\alpha)}{\partial z}\bigg|_{\alpha^*} = u'(y^e) + u'(y^d)\,F t^{\beta-1} > 0 \tag{101}$$
• a higher detection probability increases
the gains of honesty and reduces the costs of honesty
  – the gain (paying a smaller fine) is more probable
  – the cost (a higher tax payment) is less probable
• graphically, an increase of $z$ makes the indifference curve flatter, as differentiating (99) with respect to $z$ makes clear
• differentiating (97) with respect to $z$ shows that the level of expected utility is reduced unambiguously; there is no way more controls could make the household better off
• empirical testing is hard since here the incentive effects of the detection probability matter, and most people do not know how big the probability of controls and detection is
• further, people do not even know what $z$ depends on (high incomes, unusual declarations, randomness, ...), and tax authorities do not reveal it so as not to facilitate tax evasion
• Result: an increase in $z$ raises both $\alpha^*$ (the share) and $y\alpha^*$ (the amount) of income declared

2.3 Income y
• recall that $y$ was exogenously given
• the effect of a change in $y$ on $\alpha^*$ is a lot less obvious than the effect of changing $F$ or $z$
• differentiating (98) with respect to $y$ gives an ambiguous result:
$$\frac{\partial(\partial EU/\partial\alpha)}{\partial y}\bigg|_{\alpha^*} = -(1-z)\,u''(y^e)(1-\alpha t) + z\,u''(y^d)\big(1 - t - F(1-\alpha)t^{\beta}\big)F t^{\beta-1} \tag{102}$$
• we substitute for $zFt^{\beta-1}$ according to (98) and factor out $(1-z)\,u'(y^e)$ to get
$$EU_{\alpha y} = -(1-z)\,u'(y^e)\left[\frac{u''(y^e)}{u'(y^e)}(1-\alpha t) - \frac{u''(y^d)}{u'(y^d)}\big(1 - t - F(1-\alpha)t^{\beta}\big)\right] = (1-z)\,u'(y^e)\left[r(y^e)(1-\alpha t) - r(y^d)\big(1 - t - F(1-\alpha)t^{\beta}\big)\right] \tag{103}$$
• absolute risk aversion
  – $r = -u''/u'$ is the Arrow-Pratt measure of absolute risk aversion
  – if $r$ is constant or rising with income, $\alpha^*$ will rise with income
  – if $r$ falls with income, the change of $\alpha^*$ is not determined
• relative risk aversion
  – to get the relative measure of risk aversion $r_r = y\,r$, we multiply by $y$ and set $t_0$ to zero (so that the terms in parentheses, multiplied by $y$, collapse to $y^e$ and $y^d$, respectively):
$$y\,EU_{\alpha y} = (1-z)\,u'(y^e)\left[r_r(y^e) - r_r(y^d)\right] \tag{104}$$
  – if $r_r = -y\,u''/u'$ is increasing with income, $\alpha^*$ will rise with income
  – if $r_r$ is constant with income, $\alpha^*$ will be
unchanged if income changes
  – if $r_r$ is decreasing with income, $\alpha^*$ will fall with income
• Results
  – the effect of a rising income on the share of income declared depends on how risk aversion changes as income rises
  – for increasing risk aversion, the share will increase
  – for decreasing (relative) risk aversion, the share will decrease
  – for constant risk aversion the result depends on how we measure risk aversion: constant absolute risk aversion implies an increasing share, constant relative risk aversion a constant share
  – the results do make some sense: if risk aversion increases, households are less willing to gamble and thus voluntarily declare a higher share to the tax authorities
  – the opposite is true if risk aversion decreases

2.4 Tax rate t
• the linear tax system can be changed by changing the tax exemption $t_0$ or the marginal tax rate $t$
• here, a change of $t$ is analyzed; in the subsequent subsection, $t_0$ is changed
• changing $t$ and $t_0$ simultaneously in such a way that $R$ remains constant can be interpreted as a change of the progressivity of the tax system
• the effects of a change of $t_0$, and even more the effects of a change of $t$, depend on how the fine is defined, that is, how big $\beta$ is:
$$\frac{\partial(\partial EU/\partial\alpha)}{\partial t}\bigg|_{\alpha^*} = (1-z)\,u''(y^e)(\alpha y - t_0) - zFt^{\beta-1}u''(y^d)\big(y - t_0 + \beta F y(1-\alpha)t^{\beta-1}\big) + zF(\beta-1)\,t^{\beta-2}\,u'(y^d) \tag{105}$$
• income and substitution effect
  – there is an income and a substitution effect
  – income effect: raising the tax rate changes income in both states of the world, and this changes marginal utilities
  – substitution effect: for $\beta < 1$, the fine payment is proportional to $t^{\beta}$ and thus rises less than proportionally with $t$, while the utility loss due to a higher declaration is proportional to $t$
• further restrictions: no tax exemption and $\beta = 1$
  – restricting $\beta$ to 1 implies that the fine is a function of taxes evaded only: $F y (1-\alpha) t$
  – this also implies that there is no substitution effect, since both the fine and the utility loss increase proportionally to $t$
  – mathematically, the double restriction is needed to relate the expression to the FOC (98), so that we can derive the Arrow-Pratt measures of risk aversion
  – with $t_0 = 0$ and $\beta = 1$, (105) simplifies to
$$EU_{\alpha t} = (1-z)\,u''(y^e)\,\alpha y - zF\,u''(y^d)\,y\big(1 + (1-\alpha)F\big) = (1-z)\,y\,u'(y^e)\left[\big(1 + (1-\alpha)F\big)\,r(y^d) - \alpha\,r(y^e)\right] \tag{106}$$
• Results
  – for $\beta = 1$ and $t_0 = 0$, a higher tax rate $t$ will cause the share $\alpha^*$ to rise if absolute risk aversion is decreasing or constant
  – for $\beta = 1$ and $t_0 > 0$, a higher tax rate $t$ will cause the share $\alpha^*$ to rise if absolute risk aversion is constant (if it varies, nothing can be said)
  – if $\beta = 0$, the effect of a higher $t$ on $\alpha^*$ is ambiguous

2.5 Tax exemption
• a change in $t_0$ is very similar to a change in income $y$
• for increasing absolute risk aversion, the share $\alpha^*$ rises with an increasing tax exemption $t_0$
• for constant absolute risk aversion, it will remain constant
• for falling absolute risk aversion, it will fall with a rising tax exemption

VI On this script
• this script was written during the winter term 2007/08 and reflects the structure of the lecture during this term
• it should be understood as a complement to Goerke's lecture notes and the additional literature rather than a substitute: my idea was to compile a short summary that can be used in class and to look up formulas and results quickly
• I highly recommend reading the following additional literature
  – Stefan Homburg's (2007) Allgemeine Steuerlehre should be read in advance to get some intuition and empirical examples (it's easy reading, and it's in German)
  – Lipsey and Lancaster's (1956) short article in the Review of Economic Studies, The General Theory of Second Best, should be read before starting with optimal taxation theory
  – Joe Stiglitz' (1987) article in the Handbook of Public Economics should be read after working through section IV of the lecture, since it is rather technical and has a broader scope; nevertheless it is incredibly rich and helpful
  – the books by Salanié (2003), Kotlikoff & Summers (1987), and Myles (1995) are quite technical and did not help me much
• I tried to stick as closely as possible to the notation used in class, but sometimes I deviate (with some justification, I believe):
  – more often than not, household-specific items (goods, factors) are indexed with a superscript, while firm-specific items are indexed with subscripts; I did this to avoid double subscripts as much as possible
  – sometimes I skip indices altogether
  – sometimes I use 2 households (or firms) instead of n
• the ordering and naming of sections and subsections is close to the structure of the lecture, but not identical
• the script is written in TeX; the code is available on request at firstname.lastname@example.org
• hyperref features are included in the .pdf version, so you can jump to sections and equations by clicking on the numbers, and can use the tree to jump to sections quickly
• the script is far from free of errors, ranging from typos and bad translations to layout problems and fundamental misunderstandings; if you find errors, please write me an email!
• the layout is optimized for printing two pages on a sheet, with odd pages on the left
• This document is published under the GFDL. That means you can do whatever you want with it (copy it, change it, distribute it, ...) as long as your work is released under the same open license again.
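The following minimal Python sketch is a numerical companion to the basic tax evasion model of Section V, equations (97) and (98), and its comparative statics. It is not part of the lecture. It assumes CRRA utility with marginal utility $u'(c) = c^{-\rho}$, which is log utility for $\rho = 1$. The function names and parameter values are illustrative only. Because expected utility is strictly concave in $\alpha$ when $u'' < 0$, the left-hand side of (98) falls in $\alpha$. The sketch therefore first checks the corner $\alpha = 1$ and otherwise finds $\alpha^*$ by bisection.

```python
# Sketch of the basic tax evasion model (97)-(98).
# Assumption (not from the lecture): CRRA utility with u'(c) = c**(-rho);
# rho = 1 is log utility. Names and parameter values are illustrative.


def marginal_utility(c, rho):
    return c ** (-rho)


def incomes(alpha, y, t, t0, F, beta):
    """Income if not detected (y^e) and if detected (y^d)."""
    y_e = y - (alpha * y - t0) * t
    y_d = y - (y - t0) * t - F * y * (1.0 - alpha) * t ** beta
    return y_e, y_d


def foc(alpha, y, t, t0, F, beta, z, rho):
    """Left-hand side of (98): dEU/dalpha with the positive factor y*t dropped."""
    y_e, y_d = incomes(alpha, y, t, t0, F, beta)
    return (-(1.0 - z) * marginal_utility(y_e, rho)
            + z * marginal_utility(y_d, rho) * F * t ** (beta - 1.0))


def optimal_alpha(y, t, t0, F, beta, z, rho=1.0, tol=1e-12):
    """Declared share alpha* in [0, 1], found by bisection on (98).

    With u'' < 0, EU is strictly concave in alpha, so the FOC falls in alpha.
    """
    args = (y, t, t0, F, beta, z, rho)
    # Corner alpha* = 1: FOC non-negative at full declaration, where y^e = y^d.
    if foc(1.0, *args) >= 0.0:
        return 1.0
    # y^d rises linearly in alpha; only search the range where y^d > 0.
    alpha_zero = 1.0 - (y - (y - t0) * t) / (F * y * t ** beta)
    if alpha_zero < 0.0:
        if foc(0.0, *args) <= 0.0:
            return 0.0  # corner: nothing is declared
        lo = 0.0
    else:
        lo = alpha_zero  # u'(y^d) grows without bound as y^d -> 0, so FOC > 0 above lo
    hi = 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid, *args) > 0.0:
            lo = mid  # declaring more still raises expected utility
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    base = dict(y=100.0, t=0.3, t0=0.0, F=2.0, beta=1.0, z=0.2)
    print("alpha* at base:", optimal_alpha(**base))
    for name, value in [("F", 3.0), ("z", 0.25), ("y", 250.0), ("t", 0.4), ("t0", 10.0)]:
        print(name, "=", value, "->", optimal_alpha(**{**base, name: value}))
```

For log utility with $\beta = 1$ and $t_0 = 0$, condition (98) can also be solved by hand. It requires $y^d = k\,y^e$ with $k = zF/(1-z)$, which gives $\alpha^* = (k - 1 + t(1+F))/(t(F+k))$. For $y = 100$, $t = 0.3$, $F = 2$ and $z = 0.2$ this is about 0.533. Log utility has decreasing absolute and constant relative risk aversion. Varying one parameter at a time in this case reproduces the results of subsections 2.1 to 2.5: $\alpha^*$ rises with $F$, $z$ and $t$, is unchanged when $y$ changes, and falls with $t_0$.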
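The first general result on the optimal linear income tax in subsection 5.2 states that, for $\bar R = 0$, both the lump sum $a$ and the marginal rate $T$ are positive. The small Python sketch below illustrates this numerically. Every specific element is an assumption made for illustration, not something taken from the lecture:
- three equally large groups with hypothetical productivities $w = 1, 2, 3$;
- identical utility $\ln x - L^2/2$, with $x = a + (1-T)wL$ as in (87);
- a utilitarian welfare function (the sum of utilities);
- $\bar R = 0$.

For each $T$ on a grid, the sketch sets $a$ so that the budget in (88) balances. It then picks the welfare-maximizing $T$ by grid search.

```python
import math

# Hypothetical economy (illustrative assumption): three equally large groups.
WAGES = [1.0, 2.0, 3.0]


def labor_supply(w, T, a):
    """Maximizer of ln(a + (1 - T) w L) - L**2 / 2 (assumed utility).

    The FOC (1 - T) w / x = L gives m L**2 + a L - m = 0 with m = (1 - T) w.
    """
    m = (1.0 - T) * w
    return (-a + math.sqrt(a * a + 4.0 * m * m)) / (2.0 * m)


def balanced_lump_sum(T, wages=WAGES, tol=1e-12):
    """Lump sum a = T * mean(w L), i.e. the budget in (88) with R_bar = 0."""
    def surplus(a):
        revenue = T * sum(w * labor_supply(w, T, a) for w in wages) / len(wages)
        return revenue - a

    # L <= 1 whenever a >= 0, so the root lies in [0, T * mean(w)].
    lo, hi = 0.0, T * sum(wages) / len(wages)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if surplus(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def welfare(T, wages=WAGES):
    """Utilitarian welfare (sum of utilities) at the budget-balancing lump sum."""
    a = balanced_lump_sum(T, wages)
    total = 0.0
    for w in wages:
        L = labor_supply(w, T, a)
        total += math.log(a + (1.0 - T) * w * L) - 0.5 * L * L
    return total


def optimal_linear_tax(step=0.01, T_max=0.9):
    """Grid search over the marginal rate T; returns (T, a)."""
    grid = [i * step for i in range(int(round(T_max / step)) + 1)]
    T_best = max(grid, key=welfare)
    return T_best, balanced_lump_sum(T_best)


if __name__ == "__main__":
    T_opt, a_opt = optimal_linear_tax()
    print(f"T = {T_opt:.2f}, a = {a_opt:.3f}")
```

Starting from $T = 0$, a small marginal tax yields a first-order welfare gain from redistribution but only a second-order deadweight loss. This is why the search ends at an interior $T$ with a positive lump sum. The grid search is used only for transparency; a finer grid or a one-dimensional optimizer would locate the optimum more precisely.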
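Result (91), that market labor supply equals $\alpha$ regardless of the productivities $w$ and $h$, can be checked numerically from the home production function (90). The Python sketch below is not from the lecture, and its grid size and parameter values are arbitrary. It maximizes $c = h\,(wL)^{\alpha}(1-L)^{1-\alpha}$ over a fine grid of $L$ for several productivities, with $h = kw$ as in subsection 6.1.

```python
def home_consumption(L, w, h, alpha):
    # (90) with the commodity bought from market income, x = w L
    return h * (w * L) ** alpha * (1.0 - L) ** (1.0 - alpha)


def best_market_labor(w, h, alpha, n=100_000):
    # Grid search over interior L in (0, 1); (91) predicts the maximizer L* = alpha
    grid = (i / n for i in range(1, n))
    return max(grid, key=lambda L: home_consumption(L, w, h, alpha))


if __name__ == "__main__":
    k, alpha = 0.5, 0.3
    for w in (1.0, 2.0, 5.0):
        print(w, best_market_labor(w, k * w, alpha))
```

The maximizer does not depend on $w$ or $h$. This is exactly why, in this model, the labor supply decision reveals nothing about home productivity beyond the relationship $h = kw$.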