					               Taxation
Based on Laszlo Goerke’s 2008 lecture in Tübingen



                  Lion Hirth
       Eberhard-Karls-Universität Tübingen
              lion.hirth@gmail.com
           TeX code available on request
     this document is published under the GFDL

                  3 April 2008

Contents

I   Foundations
    1  What are taxes and what are they for?
    2  German Tax revenues in comparison
    3  Pareto-Efficiency and Social Welfare Functions
       3.1  Conditions for Pareto-Efficiency
       3.2  Market Outcome and Market failure
       3.3  Social Welfare Functions
    4  Welfare Effects of Taxation

II  Tax Incidence
    1  One Sector
       1.1  Specific vs. Ad Valorem Tax
       1.2  Invariance of legal Incidence
       1.3  Determinants of the incidence
       1.4  Some Applications
    2  Market Power
       2.1  Invariance of legal Incidence
       2.2  Determinants of the incidence
       2.3  Specific vs. Ad Valorem Tax
    3  One-sector General Equilibrium
       3.1  Taxes on Factor Returns
       3.2  Taxes on Output, Income, and Consumption
    4  General Equilibrium Model

III Optimal Commodity Taxation
    1  Lump-Sum Taxes
    2  No Distortion means No Revenues
       2.1  Requirements for Pareto-Efficiency
       2.2  Constraints on Tax Rates
       2.3  Tax Revenue under Pareto-Efficiency
       2.4  Non-distortionary Tax Systems
    3  Theory of the Second Best
    4  Homogeneous Households
       4.1  Ramsey’s Rule
       4.2  Reformulating the Ramsey Rule
            a)  Income Elasticities of Demand
            b)  Wage Elasticities of Hicksian Demand
       4.3  Special Cases: Additional Restrictions
            a)  Fixed Labor Supply
            b)  Homothetic Preferences
            c)  Zero Cross-Price Elasticities
    5  Heterogeneous Households
       5.1  General Result
       5.2  Zero Cross-Price Elasticities
    6  The Production Efficiency Theorem

IV  Optimal Income Taxation
    1  Wages, Taxation, and Labor Supply
    2  Fixed Labor Supply
    3  Variable Labor Supply
       3.1  Model
       3.2  Self-Selection Constraint
       3.3  First Results
       3.4  Optimization
    4  Continuous Households
    5  Different Tax functions
       5.1  Random Taxation
       5.2  Linear Tax
    6  Additional Generalizations
       6.1  Home Production
       6.2  Tax shifting
    7  Commodity Taxation in an Atkinson-Stiglitz framework

V   Tax Evasion
    1  Basic model
    2  Comparative Statics
       2.1  Fine F
       2.2  Detection Probability z
       2.3  Income y
       2.4  Tax rate t
       2.5  Tax exemption

VI  On this script

I    Foundations
    • this part draws on Stefan Homburg’s “Allgemeine Steuerlehre” (2007), Christian
      Keuschnigg’s “Öffentliche Finanzen: Einnahmepolitik” (2005) as well as on Laszlo
      Goerke’s lecture


1    What are taxes and what are they for?
    • taxes are compulsory payments without a specific service in return
    • classification
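         – direct taxes versus indirect taxes: direct taxes tax economic performance
           directly and can take into account the personal characteristics of the
           taxed subject; indirect taxes tax performance indirectly and cannot
           differentiate according to personal characteristics
         – personal taxes (Subjektsteuern), which take the taxpayer’s personal
           circumstances into account, versus impersonal taxes (Objektsteuern), which
           are levied on an object irrespective of its owner
         – consumption taxes versus transaction taxes: consumption taxes tax
           value added, transaction taxes tax legal transactions and can cause
           cascading (the same value being taxed repeatedly along the production chain)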
    • reasons for taxation
         – allocation (goods and factors)
         – redistribution (between households, firms and to the state)
         – stabilization → today covered by macroeconomics
    • reasons for taxation, alternative classification
         – revenue (transferring resources from private agents to the state)
         – changing the behavior of agents (steering purpose, Lenkungszweck)
         – redistribution (between individuals and firms)


2    German Tax revenues in comparison
    • legal system
         – the power to tax is laid down in the constitution (“Finanzverfassung”,
           Art. 105–108 GG)
         – today all taxes (except local consumption taxes) are uniform throughout Germany
         – for all of these, the Bundesrat has to approve the legislation
         – the revenue from the Körperschaftsteuer (corporate income tax) and the income
           tax is shared equally between the Bund and the Länder (with a small share
           going to the Gemeinden)
         – the distribution of VAT revenue is variable and can be changed by
           ordinary legislation
         – but tax administration is in the hands of the Länder (which pass the
           federal share of the revenue on to the Bund)
    • less than half of all government expenditure in Germany (48% of GDP) is
      financed through taxes (22% of GDP); the rest is financed through social
      security contributions, fees, and debt
    • “total tax revenue” (Abgaben) comprises taxes and social security contributions
    • between 1991 and 2000, total tax revenue remained remarkably stable in
      Germany (between 36% and 37.2% of GDP), followed by a strong downward trend in
      recent years (down to 34.7% in 2005)
    • that means it is as low as in the 1970s
    • this is considerably higher than in Japan and the US (26% each), but below the
      EU15 and OECD averages (and below the UK, France, and Italy)
    • looking only at tax revenue (excluding social security contributions), Germany
      looks even more like a tax haven: 21% of GDP in 2005, at the level of the US;
      only Japan (17%) is lower, while the UK, France, and Italy stand at around 29%
    • over the long run, tax revenue peaked around 1980 at 25% of GDP and is
      today lower than at any time during the last 40 years
    • VAT was introduced in most countries only in the 1970s or 80s (Germany
      was one of the first, in 1968, and the US still has no VAT, only uniform
      consumption (sales) taxes at the state level)
    • indirect taxes (on goods and services) generate slightly less revenue in the
      OECD than direct taxes (on income and profits): 11% and 12% of GDP, respectively
    • Germany lies slightly below average in both categories
    • Japan and even more the US (and also Switzerland) have very little income tax
      revenues: around 5% (7%)
    • labor is heavily taxed in Germany
         – an average production worker receives only 59% (single person) to 80%
           (married with two children) of his gross pay as disposable income
         – this is less than in any other OECD country except Belgium
         – the tax wedge (income tax plus overall social security contributions minus
           cash benefits) is far above average in Germany and amounts to between 35%
           and 50% of labor costs
         – this holds whether the worker is married or single, with or without
           children, and for high as well as low incomes
    • Germany collected 480 billion euros in taxes in 2006, of which (levels of
      government receiving the revenue in parentheses):
         – 25% Lohnsteuer, wage tax (Bund / Länder / Gemeinden)
         – 5% Körperschaftsteuer, corporate income tax (Bund / Länder)
         – 8% Gewerbesteuer, local business tax (Gemeinden / Bund / Länder)
         – 30% Umsatzsteuer, VAT (Bund / Länder)
         – 8% Energiesteuern, energy taxes (Bund)
         – 3% Tabaksteuer, tobacco tax (Bund)
         – about 21% all other taxes
    • medium earners in Germany pay top tax rates: the highest income tax rate in
      Germany applies from incomes of 55,000 euros onwards, in the US only from
      incomes of 270,000 euros
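The percentages in this section translate into approximate euro amounts with a few lines of arithmetic. The sketch below uses only the rounded shares and totals stated above (no additional data); the English labels are glosses, and because the shares are rounded the amounts are approximations. It also checks the claim that less than half of expenditure is tax-financed (22% vs. 48% of GDP).

```python
# Approximate euro amounts behind the 2006 revenue shares listed above.
# Only the rounded shares and the 480 bn euro total from the text are used.
TOTAL_BN_EUR = 480.0

shares = {
    "wage tax": 0.25,
    "corporate income tax": 0.05,
    "local business tax": 0.08,
    "VAT": 0.30,
    "energy taxes": 0.08,
    "tobacco tax": 0.03,
    "other taxes": 0.21,
}

# The listed shares should add up to 100% of revenue.
assert abs(sum(shares.values()) - 1.0) < 1e-9

amounts = {name: TOTAL_BN_EUR * share for name, share in shares.items()}
for name, amount in amounts.items():
    print(f"{name:22s} {shares[name]:4.0%}  ~ {amount:6.1f} bn euros")

# Taxes (22% of GDP) as a share of government expenditure (48% of GDP).
tax_financed_share = 0.22 / 0.48
print(f"tax-financed share of expenditure: {tax_financed_share:.0%}")
```

VAT comes out at roughly 144 bn euros and the wage tax at roughly 120 bn euros; taxes finance about 46% of expenditure, consistent with "less than half".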


3     Pareto-Efficiency and Social Welfare Functions
    • to judge the efficiency effects of taxation, one usually analyzes its effects on
      the marginal conditions for a Pareto-Efficient allocation
    • in part IV, where income taxation implies that the government trades off
      efficiency and equity, a social welfare function is sometimes maximized


3.1    Conditions for Pareto-Efficiency

    • assumptions
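        – two households $h = 1, 2$ with utility functions $u^h = u^h(x_1^h, x_2^h, L^h, K^h)$,
          where $x_j^h$ is household $h$’s consumption of good $j$ and $L^h$, $K^h$ are
          its supplies of labor and capital
        – utility increases with $x_j^h$ and decreases with $L^h$, $K^h$; the MRS is decreasing
        – two production processes $X_j = f_j(L_j, K_j)$, $j = 1, 2$, where $L_j$, $K_j$
          are the factor inputs of process $j$
        – there is no waste, so that $K_1 + K_2 = K^1 + K^2 = K$, $L_1 + L_2 = L^1 + L^2 = L$,
          and $x_j^1 + x_j^2 = X_j$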
• maximization problem
   – $u^1$ is maximized while holding $u^2$ constant at $\bar{u}^2$
   – to make the problem more tractable, four of the seven constraints are
     substituted in, so that three remain

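$$\mathcal{L} = u^1 + \lambda_1 \left[u^2 - \bar{u}^2\right] + \lambda_2 \left[f_1 - X_1\right] + \lambda_3 \left[f_2 - X_2\right]$$

$$\begin{aligned}
\mathcal{L} = {} & u^1\left(x_1^1, x_2^1, L^1, K^1\right) \\
& + \lambda_1 \left[u^2\left(X_1 - x_1^1,\, X_2 - x_2^1,\, L^2, K^2\right) - \bar{u}^2\right] \\
& + \lambda_2 \left[f_1(L_1, K_1) - X_1\right] \\
& + \lambda_3 \left[f_2(L - L_1,\, K - K_1) - X_2\right]
\end{aligned} \tag{1}$$

   where $L = L^1 + L^2$ and $K = K^1 + K^2$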

 • the resulting 13 first-order conditions (ten choice variables plus three Lagrange
   multipliers) can be rearranged into six marginal conditions for Pareto-Efficient allocations
1. Marginal Rates of Substitution (MRS) are the same across households
     • requirement regarding the consumption decision of households (households
       / goods)
     • the MRS is the ratio of marginal utilities in consumption of good one and
       good two for a household
     • if the MRS are not the same across households, households could gain from
       trading goods with the other household
$$\frac{\partial u^1/\partial x_1^1}{\partial u^1/\partial x_2^1} = \frac{\partial u^2/\partial x_1^2}{\partial u^2/\partial x_2^2} \tag{2a}$$
2. Marginal Rates of Factor Substitution (MRFS) are the same across households
     • requirement regarding the factor supply decisions of households (households / factors)
     • the MRFS is the ratio of marginal disutilities in providing factor L and K for a
       household
     • if the MRFS are not the same across households, households could gain from
       trading factors with the other household
$$\frac{\partial u^1/\partial L^1}{\partial u^1/\partial K^1} = \frac{\partial u^2/\partial L^2}{\partial u^2/\partial K^2} \tag{2b}$$
3. Marginal Rates of Technical Substitution (MRTS) are the same across production
   processes
     • requirement regarding the input decisions of firms (production / factors)
     • the MRTS is the ratio of marginal productivities of the factors L and K in a
       production process
     • if the MRTS are not the same across firms, firms could gain from trading
       factors with the other firm
$$\frac{\partial f_1/\partial L_1}{\partial f_1/\partial K_1} = \frac{\partial f_2/\partial L_2}{\partial f_2/\partial K_2} \tag{2c}$$

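    • this implies that the Marginal Rate of Transformation (MRT), obtained by
      cross-multiplying (2c), is the same no matter which input factor is shifted between
      the production processes: $(\partial f_2/\partial L_2)/(\partial f_1/\partial L_1) = (\partial f_2/\partial K_2)/(\partial f_1/\partial K_1)$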
4. MRS has to equal MRT
    • requirement regarding the relative quantities of production and consumption
      of goods one and two
    • if the MRS doesn’t equal the MRT, all could gain from producing and consuming
      more of one good and less of the other
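$$\frac{\partial u^1/\partial x_1^1}{\partial u^1/\partial x_2^1} = \frac{\partial u^2/\partial x_1^2}{\partial u^2/\partial x_2^2} = \frac{\partial f_2/\partial L_2}{\partial f_1/\partial L_1} = \frac{\partial f_2/\partial K_2}{\partial f_1/\partial K_1} \tag{2d}$$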

    5. MRFS has to equal MRTS
        • requirement regarding the relative quantities of employment and supply of
          factor L and K
        • if the MRFS doesn’t equal the MRTS, all could gain from employing and
          supplying more of one factor and less of the other
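$$\frac{\partial u^1/\partial L^1}{\partial u^1/\partial K^1} = \frac{\partial u^2/\partial L^2}{\partial u^2/\partial K^2} = \frac{\partial f_1/\partial L_1}{\partial f_1/\partial K_1} = \frac{\partial f_2/\partial L_2}{\partial f_2/\partial K_2} \tag{2e}$$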
    6. The MRS between factor supply and the consumption of a commodity
       for a household has to equal the marginal productivity of that factor in the
       production of that commodity
          • requirement regarding the overall level of consumption and production
          • if this requirement doesn’t hold, households could gain from consuming less
            and supplying fewer factors (or consuming more and supplying more factors)


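$$-\frac{\partial u^h/\partial L^h}{\partial u^h/\partial x_j^h} = \frac{\partial f_j}{\partial L_j}, \qquad -\frac{\partial u^h/\partial K^h}{\partial u^h/\partial x_j^h} = \frac{\partial f_j}{\partial K_j}, \qquad h, j \in \{1, 2\} \tag{2f}$$

          (eight conditions in total; the minus signs appear because utility decreases
          with factor supply, so that $\partial u^h/\partial L^h < 0$ and $\partial u^h/\partial K^h < 0$)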

         • obviously, the number of conditions depends on the assumptions of the
           model
            – models without a production side have only the MRS and the MRFS
              conditions for Pareto efficiency
            – models with only one household have neither of these two, but all the others
            – models with only one factor have neither the MRTS equality nor the
              MRTS = MRFS condition
            – models with cost functions implicitly assume the MRTS equality to be
              fulfilled (costs are minimized)
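As a sketch of how the marginal conditions follow from the Lagrangian (1), here are some of its first-order conditions. The notation is an assumption of this reconstruction: $x_j^h$ is household $h$’s consumption of good $j$, $X_j = f_j(L_j, K_j)$ is output of good $j$, household 2 consumes $X_j - x_j^1$, and the second process uses $L - L_1$ and $K - K_1$ with $L = L^1 + L^2$, $K = K^1 + K^2$. Dividing the two consumption conditions gives (2a); combining the conditions for $X_1$, $X_2$ and $L_1$ gives the labor part of (2d); the condition for $L^1$, together with $\partial u^1/\partial x_2^1 = \lambda_1\,\partial u^2/\partial x_2^2 = \lambda_3$, gives one instance of (2f), including its minus sign.

```latex
\begin{aligned}
\frac{\partial \mathcal{L}}{\partial x_j^1} &= \frac{\partial u^1}{\partial x_j^1} - \lambda_1 \frac{\partial u^2}{\partial x_j^2} = 0, \qquad j = 1, 2 \\
\frac{\partial \mathcal{L}}{\partial X_1} &= \lambda_1 \frac{\partial u^2}{\partial x_1^2} - \lambda_2 = 0, \qquad
\frac{\partial \mathcal{L}}{\partial X_2} = \lambda_1 \frac{\partial u^2}{\partial x_2^2} - \lambda_3 = 0 \\
\frac{\partial \mathcal{L}}{\partial L_1} &= \lambda_2 \frac{\partial f_1}{\partial L_1} - \lambda_3 \frac{\partial f_2}{\partial L_2} = 0, \qquad
\frac{\partial \mathcal{L}}{\partial L^1} = \frac{\partial u^1}{\partial L^1} + \lambda_3 \frac{\partial f_2}{\partial L_2} = 0 \\[6pt]
\Rightarrow \quad
\frac{\partial u^1/\partial x_1^1}{\partial u^1/\partial x_2^1} &= \frac{\partial u^2/\partial x_1^2}{\partial u^2/\partial x_2^2} = \frac{\lambda_2}{\lambda_3} = \frac{\partial f_2/\partial L_2}{\partial f_1/\partial L_1},
\qquad
-\frac{\partial u^1/\partial L^1}{\partial u^1/\partial x_2^1} = \frac{\lambda_3\,\partial f_2/\partial L_2}{\lambda_3} = \frac{\partial f_2}{\partial L_2}
\end{aligned}
```

The conditions for $K_1$, $K^1$, $L^2$ and $K^2$ yield the capital part of (2d), (2b), (2e) and the remaining instances of (2f) in the same way.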

3.2   Market Outcome and Market failure

  • it is easy to show that in a perfectly working free market economy, the market
    outcome fulfills these conditions
      1. the MRSs equal relative prices, and thus each other
      2. the MRFSs equal relative factor prices, and thus each other
      3. the MRTSs equal relative factor prices, and thus each other
      4. the MRT equals relative prices (price equals marginal cost), and thus the MRS
      5. the MRFS equals the MRTS since both equal relative factor prices
      6. the last condition is fulfilled since both sides equal the ratio of factor
         price to commodity price
  • the intuition is straightforward: there are profit opportunities as long as the
    conditions are not fulfilled
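      1. if the MRS doesn’t equal relative prices, a household could gain from
          changing its consumption bundle
      2. if the MRFS doesn’t equal relative factor prices, a household could gain
          from changing its factor supply combination
      3. if the MRTS doesn’t equal relative factor prices, a firm could gain from
          substituting one factor for another
      4. if the MRT doesn’t equal relative prices, a firm could gain by producing
          more (or less)
      5. MRFS = MRTS since both equal relative factor prices
      6. if a household’s ratio of marginal disutility of factor supply to marginal
          utility of consumption differed from the real factor price (factor price
          over commodity price), it could gain from supplying more (or less) of the
          factor; if a firm’s marginal product differed from the real factor price, it
          could gain from hiring more (or less); so both equal the real factor price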
  • in other words: through voluntary trades (Pareto improvements), the market moves
    towards a Pareto-Efficient allocation, at which, absent market failures, no one can
    be made better off without making someone else worse off
  • this result can be derived analytically very easily (so easily that it is not shown
    here)
  • but this result is highly dependent on perfectly working markets
  • market imperfections that cause the conditions not to be fulfilled include
        – violations of the assumptions of perfectly mobile factors, zero transaction
          costs, and complete information
        – market power (heterogeneous goods, barriers to market entry)
        – economies of scale
        – external effects
        – public goods
        – asymmetric information
  • this suggests that in the real world the conditions are hardly ever fulfilled in a
    free market outcome
  • as discussed in part III, section (3), the Theory of the Second Best states that once
    any of the conditions for Pareto efficiency is violated, fulfilling the remaining
    conditions is no longer necessarily desirable
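The market argument for condition 1 can be checked numerically in a pure exchange setting. The sketch below assumes Cobb-Douglas preferences u = α ln x₁ + (1 − α) ln x₂ and hypothetical preference parameters, incomes and prices (none of these come from the text): because both households face the same prices, each chooses a bundle whose MRS equals p₁/p₂, so their MRSs coincide. A hypothetical tax on good 1 paid by one household only drives a wedge between the prices the two households face and breaks the equality.

```python
# Sketch under assumed Cobb-Douglas preferences u = alpha*ln(x1) + (1 - alpha)*ln(x2);
# households, incomes and prices are hypothetical.

def demand(alpha, income, p1, p2):
    """Utility-maximizing bundle: spend share alpha of income on good 1."""
    return alpha * income / p1, (1 - alpha) * income / p2

def mrs(alpha, x1, x2):
    """MRS = (du/dx1) / (du/dx2) = (alpha/x1) / ((1 - alpha)/x2)."""
    return (alpha / x1) / ((1 - alpha) / x2)

p1, p2 = 2.0, 1.0
households = [(0.3, 100.0), (0.7, 50.0)]   # (alpha, income)

# Same prices for both households: each MRS equals p1/p2 = 2 (condition 1 holds).
same_prices = [mrs(a, *demand(a, m, p1, p2)) for a, m in households]

# Hypothetical 50% tax on good 1 paid only by household 2: it now faces
# p1 * 1.5 = 3, so its MRS becomes 3 while household 1's stays 2.
taxed = [mrs(0.3, *demand(0.3, 100.0, p1, p2)),
         mrs(0.7, *demand(0.7, 50.0, p1 * 1.5, p2))]

print(same_prices, taxed)
```

Up to floating-point rounding, `same_prices` contains two values of 2 and `taxed` contains 2 and 3: after the tax the households could both gain from trading with each other, which is exactly the violation described under condition 1.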


3.3   Social Welfare Functions

  • social welfare functions allow judgments about a significantly broader set of
    allocations than the Pareto criterion does
  • for example, they allow redistribution (to equalize disposable incomes) to be
    valued; that is, only with social welfare functions can a trade-off between
    efficiency and equity be made
  • the cost, however, is that significantly more information is needed
       – about individual preferences: utility functions have to be cardinal to allow for
         interpersonal utility comparison
         – about “the preferences of society” (for example, the weights of individual
           utilities)
         – just as a side remark: any political scientist would laugh at the idea of
           exogenous “preferences of society”
    • the most important specific social welfare functions are
         – special utilitarian welfare function: $W^{SU} = \sum_h u^h$
         – utilitarian welfare function: $W^{U} = \sum_h g^h u^h$, with weights $g^h > 0$
         – Nash social welfare function: $W^{N} = \prod_h (u^h - u_0^h)$, with reference
           utility levels $u_0^h$
         – Rawlsian social welfare function: $W^{R} = \min(u^1, \ldots, u^n)$
    • all these functions belong to the class of Bergson-Samuelson welfare functions,
      where social welfare depends positively on the utility of the individuals:
      $W^{BS} = W(u^1, \ldots, u^n)$, with $\partial W^{BS}/\partial u^h > 0$
    • for identical quasi-linear utility functions, the sum of producer surplus and
      consumer surplus (the area between the inverse demand function and the
      marginal cost curve) is equivalent to the special utilitarian welfare function
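To see how differently these functions rank allocations, the sketch below evaluates them on two hypothetical two-household utility vectors, an equal one (5, 5) and an unequal one (9, 2). It assumes cardinal, interpersonally comparable utilities, as required above. The weighted-sum form of $W^U$ and the reference levels $u_0^h$ in the Nash function are readings of the formulas above, and all numbers are illustrative.

```python
from math import prod  # Python 3.8+

# Hypothetical sketch: the four social welfare functions, evaluated on a list u
# of cardinal, interpersonally comparable utilities u[h].

def special_utilitarian(u):
    """W^SU: unweighted sum of utilities."""
    return sum(u)

def utilitarian(u, g):
    """W^U: weighted sum with positive weights g[h] (assumed reading)."""
    return sum(gh * uh for gh, uh in zip(g, u))

def nash(u, u0):
    """W^N: product of utility gains over reference levels u0[h]."""
    return prod(uh - u0h for uh, u0h in zip(u, u0))

def rawlsian(u):
    """W^R: utility of the worst-off household."""
    return min(u)

equal, unequal = [5.0, 5.0], [9.0, 2.0]   # hypothetical allocations
criteria = [
    ("special utilitarian", special_utilitarian),
    ("Rawlsian", rawlsian),
    ("Nash, u0 = 0", lambda u: nash(u, [0.0, 0.0])),
    ("utilitarian, g = (0.5, 2)", lambda u: utilitarian(u, [0.5, 2.0])),
]
for name, w in criteria:
    print(f"{name}: equal {w(equal)}, unequal {w(unequal)}")
```

Only the unweighted sum prefers the unequal allocation (11 vs. 10); the Rawlsian (2 vs. 5), Nash (18 vs. 25) and the weighted function favoring household 2 (8.5 vs. 12.5) all prefer the equal one. This is the extra information a social welfare function brings in compared with the Pareto criterion, which cannot rank these two allocations at all.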


4    Welfare Effects of Taxation
    • welfare-reducing substitution effects
         – for any positive revenue requirement R > 0 there is obviously an income
           effect: utility (and profits) are reduced because households can consume less
         – but substitution effects reduce utility further (by definition, since we start
           at an optimum); this is the deadweight loss of taxation, also referred to
           as the excess burden of taxation or “Zusatzlast” in German
         – agents try to avoid taxation by changing their behavior and thus reduce the
           tax base; the government has to raise the tax rate, and agents change their
           behavior further
         – what looks sensible from an individual agent’s point of view (changing one’s
           behavior to avoid taxation) is welfare reducing in a social perspective
         – the simplest and most drastic example is a commodity tax that is high
           enough to kill all demand: there are no tax revenues, but all rents (consumer
           and producer surplus) are lost
    • significance of the welfare effects
         – the welfare effects of taxation are at the center of any economic analysis
           of taxes
         – but note that their mere existence seems to be virtually unknown to most
           policy makers and the wider public: public debate often centers on the
           question of how large tax revenue should be, not on how it can be raised
           most efficiently
         – the reason is obvious: while tax payments are visible, the welfare effects are
           invisible
    • lump-sum taxes have no substitution effects (but, as discussed in part III,
      section (1), lump-sum taxes might not exist and are surely not an option in
      today’s democratic societies)
    • how to measure the welfare reduction
         – sum up utilities (read the value of a social welfare function)
         – compare equivalent variations (“how large an income reduction would make
           the households indifferent between the tax and this reduction?”)
         – add up the changes in consumer and producer surplus (and tax revenue), that
           is, estimate the size of the Harberger triangle
         – these triangles are named after Arnold Harberger, who used them in a
           discussion of taxes (1964); today they are used in a wide range of economic
           analyses, perhaps most famously in trade theory
• all these measurements have problems
     – summing up utilities raises all kinds of valuation and information problems
     – so do equivalent variations
     – the surplus calculation ignores any general equilibrium effects (side effects
       on other markets) and implicitly assumes that there is no income effect
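 • for linear demand and supply curves ($x^d = a - b(q + \tau)$ and $x^s = cq$, where $q$
   is the producer price and $\tau$ a specific tax), revenue increases less than
   proportionally with the tax rate (and even decreases beyond a certain point), while
   the excess burden increases quadratically:

$$R = c\,\frac{a\tau - b\tau^2}{b + c} \tag{3}$$

$$\Delta W = -\frac{bc\,\tau^2}{2(b + c)} \tag{4}$$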
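 • under the assumption of quasi-linear utility functions, it will be shown in section
   (1.3) of part (II) that the reduction in welfare due to the introduction of a small
   specific tax dτ, as proxied by the reduction in the sum of CS and PS, is the
   Harberger triangle:
   $$W = \frac{d\tau\,dx}{2} = \frac{d\tau}{2}\,\frac{x}{p}\,\frac{\varepsilon_d\varepsilon_s}{\varepsilon_s-\varepsilon_d}\,d\tau = \frac{x}{2p}\,\frac{\varepsilon_d\varepsilon_s}{\varepsilon_s-\varepsilon_d}\,(d\tau)^2 < 0 \tag{5}$$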
• that implies that the welfare costs rise with the square of the tax rate
 • the smaller the price elasticities (in absolute value), the smaller the welfare reduction
 • it can easily be shown that the welfare cost of raising an existing tax τ by dτ
   rises only linearly with the increase (to first order), but is higher the higher
   the initial rate:
   $$W = dx\left(\tau + \frac{d\tau}{2}\right) < 0 \tag{6}$$
 • using equivalent variations, a widely cited study from 1985 estimates for the US
   that the excess burden is in the range of 33% (that is, each dollar of tax revenue
   causes a utility loss of 1.33 dollars)
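Equations (3), (4) and (6) can be checked numerically. The following is a minimal Python sketch, assuming the linear curves above (x^d = a − b(q + τ), x^s = cq) and purely illustrative parameter values a = 100, b = 2, c = 3 that do not come from the lecture; all function names are hypothetical. It compares the closed forms with a direct computation of the change in CS + PS + tax revenue.

```python
def equilibrium(a, b, c, tau):
    """Linear market x_d = a - b*(q + tau), x_s = c*q: returns (q, p, x)."""
    q = (a - b * tau) / (b + c)        # from a - b*(q + tau) = c*q
    return q, q + tau, c * q


def revenue(a, b, c, tau):
    """Equation (3): R = c*(a*tau - b*tau**2)/(b + c)."""
    return c * (a * tau - b * tau ** 2) / (b + c)


def welfare_change(b, c, tau):
    """Equation (4): W = -b*c*tau**2/(2*(b + c)), the Harberger triangle."""
    return -b * c * tau ** 2 / (2 * (b + c))


def welfare_change_direct(a, b, c, tau):
    """Change in CS + PS + tax revenue relative to the untaxed market."""
    def total_surplus(t):
        x = equilibrium(a, b, c, t)[2]
        # with linear curves: CS = x**2/(2b), PS = x**2/(2c)
        return x ** 2 / (2 * b) + x ** 2 / (2 * c) + t * x
    return total_surplus(tau) - total_surplus(0.0)


def marginal_burden(a, b, c, tau, dtau):
    """Equation (6): W = dx*(tau + dtau/2) when an existing tax tau rises by dtau."""
    dx = equilibrium(a, b, c, tau + dtau)[2] - equilibrium(a, b, c, tau)[2]
    return dx * (tau + dtau / 2)


if __name__ == "__main__":
    a, b, c = 100.0, 2.0, 3.0          # illustrative values only
    for tau in (5.0, 10.0, 20.0):
        print(f"tau={tau:5.1f}  R={revenue(a, b, c, tau):8.2f}  "
              f"W={welfare_change(b, c, tau):8.2f}  "
              f"W_direct={welfare_change_direct(a, b, c, tau):8.2f}")
```

With these values, raising τ from 10 to 20 lifts revenue only from 480 to 720 but quadruples the excess burden from 60 to 240.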

II     Tax Incidence
     • legal incidence doesn't equal economic incidence
         – in German terminology: the formal tax burden (formelle Steuerlast, Zahllast)
           is not the same as the material tax burden (materielle Steuerlast,
           Steuerinzidenz)
         – welfare effects due to the deadweight loss imply that the economic incidence
           always exceeds the legal incidence, as discussed in section (4) of part (I)
         – tax shifting (Überwälzung) causes the economic burden to be borne by agents
           other than those legally liable; this is the topic of part (II)
     • levels of analysis
         – specific tax incidence (looking at one single tax)
         – differential tax incidence (looking at two (or more) taxes; assume no change
           in overall tax burden → analysis of differential effects)
         – budgetary tax incidence (taking government expenditure into account)
     • direct effects (Nahwirkung) vs. indirect effects (Fernwirkung, final economic
       incidence)
         – taxes can be shifted forward (e.g., by increasing goods prices), backward (by
           decreasing factor incomes) or across (by increasing the prices of other goods)
         – every tax affects every price and the behavior of all agents in the
           economy → modeling this fully requires a complex, fully-fledged general
           equilibrium model
         – normally, the analysis covers only the market mainly affected (Nahwirkung),
           using a partial equilibrium model or a one-sector GE model
     • what questions do we ask?
         – are ad valorem and specific taxes equivalent? (equivalence of taxes)
         – does it matter who is legally taxed? (invariance of incidence)
         – who bears the tax? (tax incidence)
             ∗ how do quantities and prices react?
             ∗ what are the determinants of these reactions?
             ∗ how does the distribution of welfare change?
     • stages of modelling
         – one good, no factors: buyer vs. seller (partial equilibrium model)
         – one good, no factors, market power (partial equilibrium model of imperfect
           competition)
         – one good, two factors of production: capitalists vs. workers (one sector GE)
         – two goods, two factors (general equilibrium (GE))


1     One Sector
     • model
         – one homogeneous good with quantity x (one sector)
         – no factors of production (no production, only trade)
         – perfect competition: all agents are price takers
         – we look at buyer vs. seller
         – the supply price q (the price the seller receives) differs from the demand
           price p (the price the buyer pays)
         – both a specific tax τ and an ad valorem tax t are allowed for, paid by the
           seller, so that q = p(1 − t) − τ, i.e. p = (q + τ)/(1 − t)
         – supply always has to equal demand: x^s(q) = x^d(p)
     • central findings
         – a tax on a good has three effects: it reduces the supply price, increases
           the demand price, and reduces the quantity traded
       – specific tax and ad valorem tax are equivalent
       – invariance of incidence: legal incidence doesn't matter; taxation of
         income, property, and expenditure are equivalent
       – shifting and economic incidence depend crucially on the price elasticities
         of agents: the less elastic side of a market bears more
         agents: the less elastic side of a market bears more
       – the more elastic the agents, the higher the welfare loss


1.1   Specific vs. Ad Valorem Tax

  • specific tax τ and ad valorem tax t
  • in the presence of both taxes, both defined as paid by the seller:
       – supply price q = p(1 − t) − τ
       – demand price p = (q + τ)/(1 − t)
  • specific taxes are defined in terms of units of a good
       – this causes problems with product definition (an incentive to change the
         good to evade taxation) and creates an incentive to increase quality
       – in the model we assume quality to be fixed (and since there is only one
         good, we don't have problems with definitions)
  • argument
       – concern: a multiplier effect of the ad valorem tax: an increase in the supply
         price by one unit increases the demand price by (1 + t)
       – likewise, a decrease in the demand price is partly absorbed by the (falling)
         tax payment
       – but in a competitive market, the supply price (like the demand price) is given
         → a price increase isn't possible in the first place
       – the firm receives the supply price q = p(1 − t) or q = p − τ, but it can never
         influence p, and hence not q either
       – that means, specific and ad valorem taxes are equivalent if they introduce the
         same tax wedge
       – Results
       – if specific and ad valorem tax impose the same difference between
         supply and demand price, they have the same effect on behavior,
         income, and utility of economic agents and on the state’s income
       – this result is not very robust and no longer holds once market power is
         introduced in section (2)
  • implications
       – for policy: the choice of tax can be based on other criteria (e.g. the
         problem of quality changes)
       – for analytical purposes: which type of tax we use (t or τ) is just a matter
         of convenience
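The equivalence can be illustrated with a small numerical sketch. It assumes hypothetical curves x^d(p) = 100/p and x^s(q) = 2q, both taxes paid by the seller (q = p(1 − t) − τ), and finds the equilibrium by bisection; the function names are illustrative only. It solves the market under a specific tax τ = 3 and then sets the ad valorem rate to t = τ/p, which creates the same wedge at that price.

```python
def demand(p):
    return 100.0 / p                    # hypothetical demand x_d(p)


def supply(q):
    return 2.0 * max(q, 0.0)            # hypothetical supply x_s(q)


def equilibrium(t=0.0, tau=0.0, lo=1e-6, hi=1e6, iters=200):
    """Demand price p, supply price q and quantity x, with q = p(1 - t) - tau.

    Bisection on p; excess demand is decreasing in p and changes sign on [lo, hi].
    """
    def excess(p):
        return demand(p) - supply(p * (1 - t) - tau)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid                    # excess demand: the price must rise
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    return p, p * (1 - t) - tau, demand(p)


p_s, q_s, x_s = equilibrium(tau=3.0)        # specific tax only
t_equiv = 3.0 / p_s                          # ad valorem rate with the same wedge p - q
p_a, q_a, x_a = equilibrium(t=t_equiv)       # ad valorem tax only
print(p_s, p_a, x_s, x_a)                    # same prices and quantities
print(3.0 * x_s, t_equiv * p_a * x_a)        # same tax revenue
```

Both runs give the same prices, quantity and revenue; under market power (section (2)) this equivalence would break down.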


1.2   Invariance of legal Incidence

  • formal analysis
       – assume only specific tax τ
       – either the tax is collected from the buyer: p_c = q_c + τ
       – or it is collected from the seller: q_s = p_s − τ

   $$x_c = x^d(p_c) = x^d(q_c + \tau) = x^s(q_c) = x^s(p_c - \tau) \tag{7}$$
   $$x_s = x^d(p_s) = x^d(q_s + \tau) = x^s(q_s) = x^s(p_s - \tau) \tag{8}$$
            – if both taxes result in the same quantity traded (x_c = x_s), the legal
              incidence doesn't matter
            – using the definitions p_c = q_c + τ and q_s = p_s − τ shows that taxing
              either side at the same rate introduces the same tax wedge between
              supply and demand price, which causes the same fall in the quantity
              exchanged
            – the same fall in quantity implies that prices, incomes and utility are
              affected identically by both taxes (q_c = q_s, p_c = p_s)
     •   graphical illustration
            – in a graph of the market equilibrium, τ means a downward shift of the
              demand curve or an upward shift of the supply curve
            – shifting S up by τ or shifting D down by τ results in the same equilibrium
              quantity x
            – in this case, prices, incomes and utility have to be the same
     •   intuition
            – what matters are the prices that suppliers and buyers face: q and p. Who
              transfers the tax, and whether buyers notice they're paying a tax, doesn't
              matter.
            – put more simply, it doesn't matter whether you buy a good for 100 Euros
              and put 20 Euros in a box for the tax authorities, or you pay 120 Euros
              and the seller puts 20 Euros in the box
     •   Results
            – the legal incidence doesn’t matter for economic incidence
            – nor does it matter whether agents (or one market side) know that the good
              is taxed
            – the result is robust and depends neither on
                 ∗ the competitiveness of the market, nor
                 ∗ the form of the supply and demand curves
            – legal incidence might matter only where prices are bargained over
     •   implications
            – for policy: a large part of the political debate about taxes (employer's
              vs. employee's contribution to social security) is pretty pointless
            – for analytical purposes: how we define taxes (which side pays) is just a
              matter of convenience
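The invariance result can be checked the same way. The sketch below assumes hypothetical linear curves x^d(p) = 120 − 4p and x^s(q) = 2q and a specific tax τ = 6. It solves the market once with buyers remitting the tax and once with sellers remitting it. The helper names are illustrative.

```python
def bisect(f, lo, hi, iters=200):
    """Root of a decreasing function f with f(lo) > 0 >= f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def demand(p):
    return max(120.0 - 4.0 * p, 0.0)    # hypothetical linear demand


def supply(q):
    return max(2.0 * q, 0.0)            # hypothetical linear supply


def tax_on_buyers(tau):
    """Sellers post q_c; buyers pay p_c = q_c + tau and remit tau."""
    q = bisect(lambda q: demand(q + tau) - supply(q), 0.0, 1000.0)
    return q + tau, q, supply(q)


def tax_on_sellers(tau):
    """Buyers pay the posted p_s; sellers keep q_s = p_s - tau and remit tau."""
    p = bisect(lambda p: demand(p) - supply(p - tau), 0.0, 1000.0)
    return p, p - tau, demand(p)


print(tax_on_buyers(6.0))    # (p, q, x)
print(tax_on_sellers(6.0))   # same (p, q, x) up to rounding
```

Both calls give p = 22, q = 16 and x = 32 (up to floating-point rounding). Only the wedge τ matters, not who hands it over.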

1.3      Determinants of the incidence

     • assume a newly introduced specific tax τ is imposed on buyers (p = q + τ), so
       that x^d(q + τ) = x^s(q)
     • we look at the changes of prices and quantities
     • the total derivative of the market equilibrium condition has to equal zero
         – the market has to be in equilibrium with and without a tax (x^s = x^d)
         – that implies that the tax has to change x^s and x^d by the same amount,
           i.e. the total derivative of x^s − x^d has to be zero

   $$dx^d = dx^s \tag{9}$$
   $$\frac{\partial x^d}{\partial (q+\tau)}\,\frac{\partial (q+\tau)}{\partial q}\,dq + \frac{\partial x^d}{\partial (q+\tau)}\,\frac{\partial (q+\tau)}{\partial \tau}\,d\tau = \frac{\partial x^s}{\partial q}\,dq \tag{10}$$
   $$x^{d\prime}\,dq + x^{d\prime}\,d\tau = x^{s\prime}\,dq \tag{11}$$

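     • using the equalities x^s = x^d = x (equilibrium) and q = p (the tax is newly
       introduced, so initially τ = 0) and the definitions εd = x^d′ q/x^d and
       εs = x^s′ q/x^s, we obtain the marginal changes in prices and quantities

   $$\frac{dq}{d\tau} = \frac{x^{d\prime}}{x^{s\prime} - x^{d\prime}} = \frac{x^{d\prime}(q/x)}{x^{s\prime}(q/x) - x^{d\prime}(q/x)} = \frac{\varepsilon_d}{\varepsilon_s - \varepsilon_d} \tag{12}$$
   $$\frac{dp}{d\tau} = \frac{dq}{d\tau} + 1 = \frac{\varepsilon_s}{\varepsilon_s - \varepsilon_d} \tag{13}$$
   $$\frac{dx^d}{d\tau} = x^{d\prime}\,\frac{dp}{d\tau} = x^{d\prime}\,\frac{\varepsilon_s}{\varepsilon_s - \varepsilon_d} = \frac{x}{p}\,\frac{\varepsilon_s\varepsilon_d}{\varepsilon_s - \varepsilon_d}, \qquad \frac{dx^s}{d\tau} = x^{s\prime}\,\frac{dq}{d\tau} = x^{s\prime}\,\frac{\varepsilon_d}{\varepsilon_s - \varepsilon_d} = \frac{x}{q}\,\frac{\varepsilon_s\varepsilon_d}{\varepsilon_s - \varepsilon_d} \tag{14}$$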

• note that
    – this result has already been used in section (4) of part (I)
    – εs is the percent change of supply for a 1 percent change in the supply price
       and is assumed to be positive
    – εd is the percent change of demand for a 1 percent change in the demand price
       and is assumed to be negative
    – this analysis holds for the introduction of a marginal tax only, not for an
       increase of an existing tax, since it uses p = q
    – the difference between the changes of p and q has to equal the change in the
       tax: dp − dq = dτ
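• for an ad valorem tax (p = q(1 + t)) the results are slightly different
    – (10) becomes (15); evaluated at t = 0 (a newly introduced tax) this yields
      (16) to (19)

   $$\frac{\partial x^d}{\partial (q(1+t))}\,\frac{\partial (q(1+t))}{\partial q}\,dq + \frac{\partial x^d}{\partial (q(1+t))}\,\frac{\partial (q(1+t))}{\partial t}\,dt = \frac{\partial x^s}{\partial q}\,dq \tag{15}$$
   $$x^{d\prime}(1+t)\,dq + x^{d\prime}q\,dt = x^{s\prime}\,dq$$
   $$x^{d\prime}\,dq + x^{d\prime}q\,dt = x^{s\prime}\,dq \tag{16}$$
   $$\frac{dq}{dt} = q\,\frac{\varepsilon_d}{\varepsilon_s - \varepsilon_d} \tag{17}$$
   $$\frac{dp}{dt} = q\,\frac{\varepsilon_s}{\varepsilon_s - \varepsilon_d} \tag{18}$$
   $$\frac{dx}{dt} = x\,\frac{\varepsilon_s\varepsilon_d}{\varepsilon_s - \varepsilon_d} \tag{19}$$

    – the result is easy to understand, since the tax wedge introduced by a change
      dt is not dt, but q dt
    – the qualitative results are unchanged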
• Results
    – the introduction of a tax weakly reduces the supply price and weakly increases
      the demand price
    – that means over-shifting cannot occur
    – if demand is perfectly inelastic, the supply price won't change and the demand
      price rises by the full amount of the tax: the buyer bears the full tax
    – the same is true if supply is perfectly elastic
    – if supply is perfectly inelastic, the demand price won't change and the supply
      price falls by the full amount of the tax: the seller bears the full tax
    – the same is true if demand is perfectly elastic
         – the more elastic side of the market bears less
         – intuition: the inelastic side cannot avoid being taxed
         – the higher the absolute price elasticities of demand and supply, the larger
           the change in the quantity traded
         – the more elastic the agents, the higher the welfare loss
     • determinants of elasticity
         – price elasticities are not exogenous, but depend on the market structure,
           the definition of the tax base, market imperfections, and other factors
         – note that in a perfectly competitive market, supply is perfectly elastic
           in the long run (free entry, constant costs)
         – the intuition is clear: firms (sellers) make no profits, so they cannot
           bear the tax and the buyers bear all of it
         – the definition of the tax base also matters: in general, the narrower the
           base, the more elastic supply and demand
         – elasticities vary greatly with the time horizon considered: in general,
           agents behave more elastically in the long run
         – often, the demand side is more elastic in the short run and the supply side
           in the long run: thus, in the short run firms bear a relatively large share
           of most commodity taxes, while in the long run consumers bear most
         – if the market is not in equilibrium (e.g. due to administered prices), the
           short market side bears all of the tax (and the gap between supply and
           demand narrows)
     • we've looked at price changes, but changes in rents (CS, PS) are a more precise
       measure of welfare effects (this analysis is covered in the script with a graphical
       analysis only)
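Equations (12) and (13) translate directly into incidence shares. The sketch below (function names hypothetical) returns (dq/dτ, dp/dτ) for given elasticities. As a sanity check, it compares (13) with a finite-difference derivative in an assumed constant-elasticity market x^d = 100 p^(−0.5), x^s = q^2, where εd = −0.5 and εs = 2 everywhere.

```python
def incidence(eps_s, eps_d):
    """Equations (12)-(13): (dq/dtau, dp/dtau) for a newly introduced specific tax.

    eps_s >= 0 is the supply elasticity and eps_d <= 0 the demand elasticity;
    dp/dtau is the share of the tax borne by buyers, -dq/dtau the sellers' share.
    """
    denom = eps_s - eps_d
    if denom == 0:
        raise ValueError("incidence is undetermined if both elasticities are zero")
    return eps_d / denom, eps_s / denom


def equilibrium_ce(tau, A=100.0, B=1.0, eps_d=-0.5, eps_s=2.0):
    """(q, p) for hypothetical curves x_d = A p**eps_d, x_s = B q**eps_s, p = q + tau."""
    lo, hi = 1e-9, 1e6
    for _ in range(200):                    # bisection on the supply price q
        q = 0.5 * (lo + hi)
        if A * (q + tau) ** eps_d - B * q ** eps_s > 0:
            lo = q                          # excess demand: q must rise
        else:
            hi = q
    q = 0.5 * (lo + hi)
    return q, q + tau


if __name__ == "__main__":
    print(incidence(1.0, -1.0))   # (-0.5, 0.5): each side bears half
    print(incidence(2.0, 0.0))    # (0.0, 1.0): perfectly inelastic demand, buyers bear all
    print(incidence(0.0, -2.0))   # (-1.0, 0.0): perfectly inelastic supply, sellers bear all
    h = 1e-5                      # finite-difference check of (13) at tau = 0
    q0, p0 = equilibrium_ce(0.0)
    q1, p1 = equilibrium_ce(h)
    print((p1 - p0) / h, incidence(2.0, -0.5)[1])   # both close to 0.8
```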


1.4    Some Applications

     • this subsection draws on Homburg (2007) and gives some illustrative examples
     • coffee tax
          – since there is no coffee production in Germany, a coffee tax is equivalent to
            a tariff
          – if Germany is a small country and the world market price is unaffected by
            the tax, supply is perfectly elastic and consumers bear all of the tax
          – this is not the case if the coffee market is oligopolistic and suppliers
            price-discriminate
          – if all consuming countries worldwide introduce a tax, and supply is inelastic
            in the short run (since coffee plants involve large sunk costs and depreciate
            slowly), suppliers will bear the lion's share of the tax
     • wine tax
          – wine is the only alcoholic beverage that is not taxed in Germany (apart from
            VAT)
          – white and red wine are supposedly good substitutes
          – in the case of a tax on red wine, demand reacts highly elastically (by
            switching to white wine) and producers bear most of the tax
          – this example shows that the tax base of a commodity tax matters
          – in general, the narrower the tax base, the more elastic are both demand and
            supply
     • land tax
          – land is perhaps the most inelastic good of all
          – the unexpected introduction (or increase) of a land tax is borne entirely by
            the landowner, since the land value decreases by the present value of future
            tax payments
    • note that the incidence of subsidies is driven by elasticities, just as tax incidence is
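The land tax example rests on a present-value calculation. The following is a minimal sketch, assuming a constant interest rate, a tax fixed in Euros per year and paid at the end of each year, and otherwise unchanged land rents. All numbers are illustrative and the function name is hypothetical.

```python
def capitalized_tax(annual_tax, interest_rate, years=None):
    """Present value of land-tax payments made at the end of each future year.

    years=None treats the tax as perpetual, giving annual_tax / interest_rate.
    """
    if interest_rate <= 0:
        raise ValueError("a positive interest rate is assumed")
    if years is None:
        return annual_tax / interest_rate
    # annuity formula: sum of annual_tax / (1 + r)**k for k = 1..years
    return annual_tax * (1 - (1 + interest_rate) ** -years) / interest_rate


# illustrative numbers: a tax of 1,000 Euros per year at 5% interest
print(capitalized_tax(1000.0, 0.05))        # perpetual: land value falls by 20,000 Euros
print(capitalized_tax(1000.0, 0.05, 30))    # levied for 30 years only: about 15,372 Euros
```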


2     Market Power
    • model
         – augmented model: one market side (here the seller) has market power
         – in principle, what matters is the asymmetry between the two sides (market
           power vs. perfect competition): oligopsonistic markets could be modeled in
           exactly the same way (with all results reversed)
         – Nash-Cournot competition is assumed
             ∗ n identical firms
             ∗ quantity competition
             ∗ all competitors decide simultaneously, taking the other firms' quantities
               as given
         – there is an ad valorem tax t paid by the firm: q = p(1 − t)
         – the second-order condition (SOC, second derivative negative) is assumed to
           hold, by assuming the demand curve to be convex, but not too convex
    • central findings
        – legal incidence doesn’t matter
        – over-shifting may occur
         – specific and ad valorem taxes are no longer equivalent


2.1   Invariance of legal Incidence

    • compare an ad valorem tax t paid by sellers and an ad valorem tax t_C paid by
      buyers
    • sellers receive only (1 − t) of the price consumers pay, and buyers have to pay
      (1 + t_C) times the price sellers get
    • only the seller is taxed: q = p(1 − t)
    • only the buyer is taxed: p = q(1 + t_C), i.e. q = p/(1 + t_C)
    • for the rate t = t_C/(1 + t_C), the two cases are equivalent
    • if the prices are the same, the quantity X∗ has to be the same, too
    • that means, the invariance of incidence holds
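The rate conversion t = t_C/(1 + t_C) can be verified in a few lines. The price and rate below are illustrative, and the function name is hypothetical.

```python
def seller_rate(t_c):
    """Seller-side ad valorem rate equivalent to a buyer-side rate t_C: t = t_C/(1 + t_C)."""
    return t_c / (1.0 + t_c)


p = 10.0                     # illustrative price paid by consumers
t_c = 0.25                   # buyer-side rate
t = seller_rate(t_c)         # equivalent seller-side rate
print(p / (1 + t_c), p * (1 - t))   # sellers receive the same q in both cases
```

Since 1 − t_C/(1 + t_C) = 1/(1 + t_C), the two formulas for q coincide for every rate, not just for this example.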


2.2   Determinants of the incidence

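    • from profit maximization, the equilibrium condition Z can be derived (c is the
      constant unit cost)

   $$\pi_j = (1-t)\,p\,x_j - (c+\tau)\,x_j$$
   $$p = p(x_j + X_{-j})$$
   $$\frac{\partial \pi_j}{\partial x_j} = (1-t)\left(p'\,x_j^* + p\right) - (c+\tau) = 0$$
   $$Z := (1-t)\left(p'(X)\,x_j^* + p\right) - (c+\tau) = 0 \quad \text{with } X = n\,x_j^* \tag{20}$$

    • the total derivative of Z has to be zero

   $$dZ = \frac{\partial Z}{\partial t}\,dt + \frac{\partial Z}{\partial x_j^*}\,dx_j^* = 0 \tag{21}$$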
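     • solving for dx_j^*/dt results in an expression for the change in quantity traded

   $$\frac{\partial x_j^*}{\partial t} = -\frac{Z_t}{Z_{x_j^*}} = \frac{p'x_j^* + p}{(1-t)\left(p''\,n\,x_j^* + p'(1+n)\right)} = \frac{p\left(\frac{p'\,n\,x_j^*}{p} + n\right)}{(1-t)\,n\,p'\left(\frac{p''\,n\,x_j^*}{p'} + 1 + n\right)} = \frac{p\,(1/\varepsilon_d + n)}{(1-t)\,n\,p'\,(\eta + 1 + n)} \tag{22}$$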

     • where η = X p''(X)/p'(X) is the elasticity of the slope of the inverse demand
       function, assumed to be negative (convex demand)
     • the equilibrium is stable if η + 1 + n > 0, which is assumed to hold
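     • we are interested in price and quantity changes

   $$\frac{\partial X}{\partial t} = n\,\frac{\partial x_j^*}{\partial t} = \frac{p\,(1/\varepsilon_d + n)}{(1-t)\,p'\,(\eta + 1 + n)} \tag{23}$$
   $$\frac{\partial p}{\partial t} = p'\,\frac{\partial X}{\partial t} = \frac{p\,(1/\varepsilon_d + n)}{(1-t)(\eta + 1 + n)} \tag{24}$$
   $$\frac{\partial q}{\partial t} = \frac{\partial\,(p(1-t))}{\partial p}\,\frac{\partial p}{\partial X}\,\frac{\partial X}{\partial t} + \frac{\partial\,(p(1-t))}{\partial t} = (1-t)\,p'\,\frac{\partial X}{\partial t} - p = \frac{p}{\eta + 1 + n}\left(\frac{1}{\varepsilon_d} - \eta - 1\right) \tag{25}$$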

     • Results
     • over-shifting might occur, depending on the demand function
         – over-shifting is defined as an increase of q due to the introduction of a tax
         – that implies that the buyer pays more than 100 percent of the tax
         – this occurs when εd(η + 1) > 1
         – a monopolist always sets a price such that εd < −1
         – that means, in a monopoly, η < −2 is a sufficient condition for over-shifting
         – for a linear demand curve, η = p'' = 0, so there is under-shifting
         – for constant elasticity of demand, η = 1/εd − 1, so there is full forward
           shifting
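Equations (24) and (25) and the over-shifting condition can be coded directly. In this sketch (names hypothetical), η is taken to be X p''/p', which is consistent with η = 0 for linear demand and η = 1/εd − 1 for constant elasticity. The three calls reproduce the linear, constant-elasticity and over-shifting cases above, using illustrative parameter values.

```python
def advalorem_effects(p, eps_d, eta, n, t=0.0):
    """Equations (24) and (25): (dp/dt, dq/dt) in the symmetric Cournot model.

    p: demand price, eps_d < 0: price elasticity of demand, eta: assumed to be
    X p''/p', n: number of firms. Requires stability, eta + 1 + n > 0.
    """
    stab = eta + 1 + n
    if stab <= 0:
        raise ValueError("stability condition eta + 1 + n > 0 is violated")
    dp_dt = p * (1 / eps_d + n) / ((1 - t) * stab)   # equation (24)
    dq_dt = p * (1 / eps_d - eta - 1) / stab         # equation (25), q = p(1 - t)
    return dp_dt, dq_dt


def overshifting(eps_d, eta):
    """True if the supply price q rises with t, i.e. eps_d * (eta + 1) > 1."""
    return eps_d * (eta + 1) > 1


if __name__ == "__main__":
    eps_d = -2.0
    print(advalorem_effects(10.0, eps_d, 0.0, 1))              # linear: dq/dt < 0
    print(advalorem_effects(10.0, eps_d, 1 / eps_d - 1, 1))    # constant elasticity: dq/dt = 0
    print(advalorem_effects(10.0, eps_d, -3.0, 3))             # eta < -2: dq/dt > 0
    print(overshifting(eps_d, -3.0), overshifting(eps_d, 0.0)) # True False
```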


2.3    Specific vs. Ad Valorem Tax

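     • here the same procedure is repeated for a specific tax τ (with t = 0)

   $$\frac{\partial x_j^*}{\partial \tau} = -\frac{Z_\tau}{Z_{x_j^*}} = \frac{1}{p'\,(\eta + 1 + n)} \tag{26}$$
   $$\frac{\partial X}{\partial \tau} = \frac{n}{p'\,(\eta + 1 + n)} \tag{27}$$
   $$\frac{\partial p}{\partial \tau} = \frac{n}{\eta + 1 + n} \tag{28}$$
   $$\frac{\partial q}{\partial \tau} = \frac{n}{\eta + 1 + n} - 1 = -\frac{\eta + 1}{\eta + 1 + n} \tag{29}$$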

     • results for particular demand functions
        – for a linear demand curve under-shifting occurs
        – for a constant elasticity demand curve, over-shifting occurs
        – in a monopoly with linear demand, both sides bear half of the tax
         – over-shifting is more likely with a specific tax than with an ad valorem tax
         – profits are reduced even in the case of full forward shifting, since q
           remains constant, but x_j^* declines
     • at equal tax revenue, an ad valorem tax is always better than a specific tax
         – the reduction in quantities is more pronounced in the case of a specific tax
         – this is because with an ad valorem tax, firms bear only part of the price
           decrease when they increase output: an ad valorem tax acts as an implicit
           output subsidy
         – consumers are better off since output increases
         – for higher output, the tax rate decreases (holding revenues constant), so
           that a monopoly firm is also better off
         – for an oligopoly, profit effects are ambiguous, but welfare increases
           unambiguously
         – → a budget-neutral substitution of a specific tax by an ad valorem tax is
           Pareto-improving in a monopoly and welfare-improving (as measured by the sum
           of surpluses) in all oligopolies
        – mathematical proof in the script, p. 18-21
    • Results
         – ad valorem and specific taxes are no longer equivalent
        – an ad valorem tax is superior in welfare terms
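The results of this section can be checked with a numerical Cournot model. The sketch below assumes two hypothetical inverse demand functions: a linear one, p(X) = 100 − X, and a constant-elasticity one, p(X) = 10 X^(−1/2) (εd = −2). It also assumes a constant unit cost c and solves the symmetric condition (20) by bisection. It estimates dp/dτ and dq/dτ by finite differences and finds the ad valorem rate that raises the same revenue as a given specific tax. All names and numbers are illustrative.

```python
def cournot(p, dp, n, c, t=0.0, tau=0.0, lo=1e-9, hi=1e6, iters=200):
    """Symmetric Cournot equilibrium from condition (20).

    Solves Z(X) = (1 - t)(p'(X) X/n + p(X)) - (c + tau) = 0 by bisection, assuming
    Z is decreasing in X and changes sign on [lo, hi] (stability and SOC).
    Returns (demand price, supply price q = p(1 - t) - tau, total output X).
    """
    def Z(X):
        return (1 - t) * (dp(X) * X / n + p(X)) - (c + tau)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if Z(mid) > 0:
            lo = mid                      # marginal revenue above cost: expand output
        else:
            hi = mid
    X = 0.5 * (lo + hi)
    price = p(X)
    return price, price * (1 - t) - tau, X


def passthrough(p, dp, n, c, h=1e-4):
    """Finite-difference estimate of (dp/dtau, dq/dtau) for a specific tax at tau = 0."""
    p0, q0, _ = cournot(p, dp, n, c)
    p1, q1, _ = cournot(p, dp, n, c, tau=h)
    return (p1 - p0) / h, (q1 - q0) / h


def revenue_matched_advalorem(p, dp, n, c, tau, lo=0.0, hi=0.5, iters=100):
    """Ad valorem rate t raising the same revenue as the specific tax tau.

    Bisection on t; assumes revenue t * p * X is increasing in t on [lo, hi].
    """
    _, _, X_spec = cournot(p, dp, n, c, tau=tau)
    target = tau * X_spec
    for _ in range(iters):
        t = 0.5 * (lo + hi)
        price, _, X = cournot(p, dp, n, c, t=t)
        if t * price * X < target:
            lo = t
        else:
            hi = t
    return 0.5 * (lo + hi)


# hypothetical inverse demand functions and their derivatives
lin, dlin = (lambda X: 100.0 - X), (lambda X: -1.0)                  # eta = 0
ce, dce = (lambda X: 10.0 * X ** -0.5), (lambda X: -5.0 * X ** -1.5)  # eps_d = -2

if __name__ == "__main__":
    print(passthrough(lin, dlin, 1, 10.0))   # linear monopoly: about (0.5, -0.5)
    print(passthrough(ce, dce, 3, 1.0))      # constant elasticity, n = 3: about (1.2, 0.2)
    t_adv = revenue_matched_advalorem(lin, dlin, 1, 10.0, tau=10.0)
    X_spec = cournot(lin, dlin, 1, 10.0, tau=10.0)[2]
    X_adv = cournot(lin, dlin, 1, 10.0, t=t_adv)[2]
    print(t_adv, X_spec, X_adv)              # same revenue, higher output under t
```

For the linear monopoly, the specific tax is split half and half. With constant elasticity and n = 3, dq/dτ ≈ 0.2 > 0 shows over-shifting, in line with (28) and (29). At an equal revenue of 400, the ad valorem tax leaves output at about 44 instead of 40.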


3    One-sector General Equilibrium
    • even with only one sector, the General Equilibrium (GE) model is significantly
      more complex than the previous models
    • some of the original assumptions of the model in section (1) remain the same
         – one homogeneous good with quantity x (one sector)
         – supply price q differs from demand price p
         – supply always has to equal demand: x^s(q) = x^d(p)
         – perfect competition: all agents are price takers
    • now we introduce production into the model
         – two factors of production, labor L and capital K, with the real factor
           prices w (wage rate) and r (real interest rate)
         – that means, now we have three markets: for the good and the two factors
         – backward and forward shifting are possible
    • factors have different price elasticities of supply
         – in our example, K is supplied perfectly inelastically (at a constant quantity)
         – L is supplied elastically, depending positively on the real wage
         – what matters for the outcomes of the model is a difference in the price
           elasticities of supply; one could also model L as inelastic, or model one
           factor as less elastic than the other (instead of perfectly inelastic)
    • household demand is homogeneous of degree zero in all prices
         – that means the price level doesn't matter for demand
         – this implies we can look at real factor prices
    • the production function is linearly homogeneous (constant returns to scale, CRS)
         – factors can be substituted in the production process (in contrast to
           Leontief-type production functions)
     • instead of analysing suppliers and demanders, we now focus on capitalists
       (capital owners) vs. workers (labor owners)
     • five ad valorem taxes are analyzed
          – labor returns (t_w)
          – capital returns (t_r)
          – output (t_x)
          – consumption (t_C)
          – income (t )
          – taxes on factors and output are legally paid by the firm, taxes on income and
            consumption are legally paid by the consumers
     • exogenous variables
          – capital supply K
          – tax rates (t_w, t_r, t_x)
     • endogenous variables
          – the output price p and the real factor prices w and r
          – labor supply Ls
     • central findings
          – a tax on capital reduces the interest rate
          – a tax on labor reduces both the wage and the interest rate
          – the share of a labor tax workers have to bear is determined crucially
            by demand and supply elasticities of labor
          – invariance of incidence holds for a number of uniform taxes
          – the share of a uniform tax workers bear depends again on demand
            and supply elasticities of labor
     • empirical labor supply and demand elasticities
          – here labor is modeled as being supplied elastically (compared to perfectly
            inelastic capital)
          – the elasticity of labor supply at the intensive margin (through more weekly
            hours or higher labor intensity) is often found to be pretty low
          – labor supply at the extensive margin (through a change in participation
            in the labor force) is a lot more elastic, but often only in the long run
          – classical economists like David Ricardo often assumed that wages are fixed
            at subsistence levels and thus labor supply is perfectly elastic; that means
            that a tax on wage income is effectively a tax on firms (which they opposed
            on the grounds of long-run capital accumulation and growth considerations)
          – note also that elasticities might vary dramatically depending on the time
            horizon: in the short run, labor is often assumed to be supplied elastically
            while capital supply is fixed; in the long run it might be plausible to assume
            the opposite


3.1    Taxes on Factor Returns

     • while wage income and capital income are empirically often taxed uniformly as
       "income", taxes on them should be considered separately in an incidence analysis
     • empirically, with the "Abgeltungssteuer" (a flat-rate withholding tax on capital
       income) in Germany from 2009 on, there will in fact be differentiated income
       taxation of labor and of capital
     • the profit equation in nominal terms, given that x = F(L^s, K) (implicitly assuming
       the equilibrium condition holds), is

   $$\pi = \frac{p}{1+t_x}\,F(L^s, K) - w\,p\,L^s(1+t_w) - r\,p\,K(1+t_r) \tag{30}$$

  • differentiating with respect to the two factors results in the optimality conditions
    for the representative firm

   $$Z_1 = F_L(L^s, K) - w(1+t_w)(1+t_x) = 0 \tag{31}$$
   $$Z_2 = F_K(L^s, K) - r(1+t_r)(1+t_x) = 0 \tag{32}$$

  • both FOCs have to hold before and after introducing a tax, so the changes induced
    by the tax have to sum to zero; we therefore take the total derivatives of (31)
    and (32) with respect to w, r, $t_w$, $t_r$ and $t_x$

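    $$dZ = \frac{\partial Z}{\partial w}dw + \frac{\partial Z}{\partial r}dr + \frac{\partial Z}{\partial t_w}dt_w + \frac{\partial Z}{\partial t_r}dt_r + \frac{\partial Z}{\partial t_x}dt_x \tag{33}$$
    $$dZ_1 = \left[F_{LL}L^{s\prime} - (1+t_w)(1+t_x)\right]dw + 0\,dr - w(1+t_x)\,dt_w + 0\,dt_r - w(1+t_w)\,dt_x = 0 \tag{34}$$
    $$dZ_2 = F_{KL}L^{s\prime}\,dw - (1+t_r)(1+t_x)\,dr + 0\,dt_w - r(1+t_x)\,dt_r - r(1+t_r)\,dt_x = 0 \tag{35}$$

    where $L^{s\prime} = dL^s/dw$ is the slope of labor supply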

  • this system of linear equations can be solved by hand or by applying Cramer’s
    rule
  • for the analysis of one tax, we set the other two taxes to zero for simplicity;
    $\varepsilon^s$ and $\varepsilon^d < 0$ denote the wage elasticities of labor supply and labor demand

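    $$\frac{\partial w}{\partial t_w} = \frac{w}{1+t_w}\,\frac{\varepsilon^d}{\varepsilon^s-\varepsilon^d} < 0 \tag{36}$$
    $$\frac{\partial r}{\partial t_w} = F_{KL}L^{s\prime}\,\frac{w}{1+t_w}\,\frac{\varepsilon^d}{\varepsilon^s-\varepsilon^d} < 0 \tag{37}$$
    $$\frac{\partial w}{\partial t_r} = 0 \tag{38}$$
    $$\frac{\partial r}{\partial t_r} = -\frac{r}{1+t_r} < 0 \tag{39}$$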

  • Results
       – a tax on capital only reduces the return on capital r
       – a tax on labor reduces the returns on both factors, w and r (cross
         shifting)
       – the more elastic the supply of labor and the less elastic the demand
         for labor, the smaller the share of a labor tax that the workers bear
       – cross shifting occurs because capital is supplied inelastically and labor
         is supplied elastically: the labor tax reduces employment and thereby the
         marginal product of capital
  • empirically, it is often found (or assumed) that capital supply is highly elastic in
    times of trade openness and globalized capital markets (at least in the long run):
    that means all taxes on any factor returns are ultimately borne by workers
  • not analyzed formally here, but quite intuitive: taxes on economic rents (like
    profits or land rents) cannot be shifted (a tax on consumer surplus is not
    feasible, since consumer surplus cannot be observed by tax authorities)
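The comparative statics (36)-(39) can be checked numerically. The following is a minimal sketch, assuming (for illustration only; these functional forms are not part of the lecture) a Cobb-Douglas technology $F(L, K) = L^a K^{1-a}$ with fixed capital and an isoelastic labor supply $L^s(w) = w^e$, so that $\varepsilon^s = e$ and $\varepsilon^d = 1/(a-1)$. It solves the linear system (34)-(35) by Cramer's rule and compares the result with the closed forms; `det2`, `cramer2` and `incidence` are hypothetical helper names.

```python
def det2(m):
    """Determinant of a 2x2 matrix [[m11, m12], [m21, m22]]."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def cramer2(A, b):
    """Solve the 2x2 linear system A z = b by Cramer's rule."""
    d = det2(A)
    z1 = det2([[b[0], A[0][1]], [b[1], A[1][1]]]) / d
    z2 = det2([[A[0][0], b[0]], [A[1][0], b[1]]]) / d
    return z1, z2


def incidence(a=0.6, e=0.5, K=1.0, t_w=0.1, t_r=0.0, t_x=0.0):
    """Comparative statics of (34)-(35) for an assumed Cobb-Douglas economy.

    F(L, K) = L**a * K**(1 - a), capital fixed at K, labor supply L_s(w) = w**e.
    """
    g = (1 + t_w) * (1 + t_x)
    # Equilibrium net wage from (31): a * K**(1-a) * w**(e*(a-1)) = w * g
    w = (g / (a * K ** (1 - a))) ** (1 / (e * (a - 1) - 1))
    L = w ** e
    L_prime = e * w ** (e - 1)                      # dL_s/dw
    F_LL = a * (a - 1) * L ** (a - 2) * K ** (1 - a)
    F_KL = a * (1 - a) * L ** (a - 1) * K ** (-a)
    F_K = (1 - a) * L ** a * K ** (-a)
    r = F_K / ((1 + t_r) * (1 + t_x))               # return on capital from (32)

    # Coefficients of (dw, dr) in (34) and (35)
    A = [[F_LL * L_prime - g, 0.0],
         [F_KL * L_prime, -(1 + t_r) * (1 + t_x)]]
    dw_dtw, dr_dtw = cramer2(A, [w * (1 + t_x), 0.0])  # response to dt_w = 1
    dw_dtr, dr_dtr = cramer2(A, [0.0, r * (1 + t_x)])  # response to dt_r = 1

    eps_s, eps_d = e, 1 / (a - 1)                   # supply and demand elasticities
    return {
        "dw_dtw": dw_dtw, "dr_dtw": dr_dtw, "dw_dtr": dw_dtr, "dr_dtr": dr_dtr,
        # closed forms (36), (37), (39); valid here because t_r = t_x = 0
        "eq36": w / (1 + t_w) * eps_d / (eps_s - eps_d),
        "eq37": F_KL * L_prime * w / (1 + t_w) * eps_d / (eps_s - eps_d),
        "eq39": -r / (1 + t_r),
        # share of a small labor tax borne by workers
        "worker_share": -eps_d / (eps_s - eps_d),
    }


res = incidence()
for key in ("dw_dtw", "eq36", "dr_dtw", "eq37", "dr_dtr", "eq39", "worker_share"):
    print(f"{key:>12}: {res[key]: .6f}")
```

With the default parameters ($\varepsilon^s = 0.5$, $\varepsilon^d = -2.5$) workers bear about 83% of a small labor tax (2.5/3); raising e to 2 lowers their share to about 56%, in line with the results above.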


3.2   Taxes on Output, Income, and Consumption

  • all taxes ($t_x$, $t_C$, $t_y$) are uniform (“flat taxes” without exemptions), not
    progressive as most real-world income taxes are
  • invariance of incidence restated



         – uniform taxes on income, consumption, output are all equivalent to each
            other and equivalent to a uniform tax on factor incomes
         – given a linear homogeneous production function, the Euler theorem applies:
            the value of output equals the sum of factor incomes, which in turn equals
            consumption (all income is spent on $x$)
         – that implies that the tax base for all four taxes is the same: what the house-
            holds get (income), what they spend (consumption), what firms buy (inputs)
            and what they produce (output) - it is all the same
         – if the taxes introduce the same tax wedge, they are equivalent
         – this is the case for $t_w = t_r = t_x = t_y/(1-t_y) = t_C/(1-t_C)$
         – the nominal tax rates differ only because of how the taxes are legally
            defined and paid (on a net or a gross base, by the household or the firm)
         – it does not matter if factor income is taxed uniformly (and paid by the firm)
            or income is taxed (and paid by the households)
         – it does not matter if income is taxed or consumption
         – it does not matter if inputs are taxed uniformly or output is taxed
     • tax incidence
         – the question of the tax incidence analysis is: which share of the tax is borne
            by workers and which by capitalists?
         – the shares depend on elasticities of demand and supply of labor
         – the higher the demand elasticity and the lower the supply elasticity, the
            higher the relative tax burden for workers
         – the share of the tax the workers bear is bounded from above by their share
            of total income
         – the share of the tax the capitalists bear is bounded from below by their
            share of total income
         – if labor and capital are both supplied inelastically, each factor bears a
            share of the tax equal to its share of total income
     • Results
         – uniform taxes on income, consumption, output, and both factor in-
            comes are equivalent
         – the higher the supply elasticity and the lower the demand elasticity of
            a factor, the smaller the share of the tax the factor has to bear
         – if labor is supplied perfectly elastically (or demanded perfectly inelasti-
            cally), capital bears the entire tax burden
         – if both factors are demanded and supplied with the same elastici-
            ties, they bear a share equal to their income share
     • the case of savings
         – if we allow for savings, an expenditure tax is like an income tax where all
            savings are exempted from taxation
         – in a dynamic model where all income is eventually consumed, an expenditure
            tax is like an income tax where interest income is exempted
         – thus, exempting savings and exempting interest income are equivalent
         – switching from an income tax to an expenditure tax (or, less dramatically,
            increasing the VAT rate while reducing income tax rates) implies intergene-
            rational redistribution: people who live on previously accumulated capital
            pay twice (their savings were taxed as income, and their consumption is
            taxed again)
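The tax-wedge equivalence and the savings argument above reduce to simple arithmetic. The sketch below assumes, as the formula $t_C/(1-t_C)$ suggests, that income and consumption taxes are levied on a gross (tax-inclusive) base while factor and output taxes are levied on a net base; it also assumes a hypothetical two-period saver whose expenditure tax is tax-inclusive. `wedge` and `second_period_consumption` are illustrative names, not part of the lecture.

```python
def wedge(rate, base):
    """Gross-to-net price ratio created by a tax.

    base="net":   tax levied on the net base (t_w, t_r, t_x); ratio 1 + rate
    base="gross": tax levied on the gross base (t_y, t_C); ratio 1 / (1 - rate)
    """
    if base == "net":
        return 1 + rate
    if base == "gross":
        return 1 / (1 - rate)
    raise ValueError(f"unknown base: {base}")


def second_period_consumption(Y, c1, t, r, regime):
    """Period-2 consumption of a hypothetical two-period saver.

    Y: period-1 labor income, c1: period-1 consumption, t: tax rate, r: interest rate.
    """
    if regime == "expenditure":        # savings exempt: only spending is taxed
        savings = Y - c1 / (1 - t)     # spending c1 / (1 - t) buys c1 of consumption
        return (1 - t) * (1 + r) * savings
    if regime == "interest_exempt":    # wage income taxed, interest income untaxed
        savings = (1 - t) * Y - c1
        return (1 + r) * savings
    if regime == "comprehensive":      # wage and interest income both taxed
        savings = (1 - t) * Y - c1
        return (1 + r * (1 - t)) * savings
    raise ValueError(f"unknown regime: {regime}")


t_y = 0.25
print(wedge(t_y / (1 - t_y), "net"), wedge(t_y, "gross"))   # equal wedges (up to rounding)
for regime in ("expenditure", "interest_exempt", "comprehensive"):
    print(regime, round(second_period_consumption(100, 40, 0.25, 0.10, regime), 6))
```

The expenditure tax and the interest-exempt income tax leave the saver with the same period-2 consumption, while the comprehensive income tax leaves less. The "paying twice" of the old generation follows the same logic: a stock of after-tax wealth W buys W units of consumption under the interest-exempt income tax, but only (1 − t)W after a switch to the expenditure tax.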


4     General Equilibrium Model
     • the model is very restrictive in its assumptions, but still the formal analysis is
       heavy in notation and quite messy



• two goods (two sectors) and two factors
• factors are substitutes in the production process, and goods are substitutes in
  consumption
• supply of capital and labor is fixed, but the factors are perfectly mobile between
  sectors
     – formally, $K = K_1 + K_2$, $L = L_1 + L_2$
     – that means there is no distortion of the labor-leisure decision, only of the
       allocation of factors between the two sectors
     – uniform taxes on factor returns are lump-sum, since supply cannot be reduced
• production functions are well-behaved and linear homogeneous (CRS), and markets
  are perfectly competitive
     – firms make no profits, thus they can’t bear any tax
     – since production exhibits CRS and the economy is closed, the Euler theorem
       implies that factor incomes equal the value of output: $rK + wL = p_1x_1 + p_2x_2$
     – the labor-capital ratio chosen by firms doesn’t depend on the scale
• all firms are identical (have the same production technology)
• households have identical homothetic preferences
     – that implies households can be aggregated as if they were identical
     – demand for goods depends only on relative prices and aggregate income;
       pure redistribution between households doesn’t change the structure of
       overall demand
• taxes
     – taxes on factor returns can be differentiated between sectors
     – but taxes on commodities cannot be differentiated between households
     – we allow for $t_1, t_2, t_{w1}, t_{w2}, t_{r1}, t_{r2}$

     – that means taxes can be shifted forward (to consumers), backward (to factor
       suppliers) or across (to the other factor)
• the formal analysis is skipped here
• a graphical analysis can be found in the lecture notes, pp. 41-46
• central findings
     – a uniform tax on factor returns in one sector is equivalent to an ad valorem
       commodity tax in that sector at the same rate: $t_{w1} = t_{r1} = t \Leftrightarrow t_1 = t$

     – this implies that an economy-wide uniform tax on factor returns is equivalent
       to a VAT of the same rate: $t_{w1} = t_{w2} = t_{r1} = t_{r2} = t \Leftrightarrow t_1 = t_2 = t$

     – a tax on output of the labor intensive sector will reduce the wage; this is
       stronger for
         ∗ higher labor intensity in that sector
         ∗ lower elasticity of technical substitution (in both sectors)
         ∗ higher price elasticity of demand for the good produced in that sector
     – a wage tax in labor intensive sectors will reduce the wage (since labor is
       substituted and production is shifted to the other sector)
     – a wage tax in capital intensive sectors has an ambiguous effect on the wage
       (since the substitution and output effects work in opposite directions)
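The equivalence of a uniform sector-1 factor tax and a commodity tax rests on the linear homogeneity of unit costs in factor prices. A minimal check, assuming a Cobb-Douglas technology in sector 1 (the notes do not specify one) and taking factor prices w and r as given; all function names are hypothetical.

```python
def unit_cost(w, r, alpha):
    """Minimum cost of one unit of output for x = L**alpha * K**(1 - alpha)."""
    return (w / alpha) ** alpha * (r / (1 - alpha)) ** (1 - alpha)


def price_with_factor_taxes(w, r, alpha, t_w1, t_r1):
    """Competitive price of good 1 when sector-1 firms pay factor taxes t_w1, t_r1."""
    return unit_cost(w * (1 + t_w1), r * (1 + t_r1), alpha)


def price_with_commodity_tax(w, r, alpha, t1):
    """Consumer price of good 1 under an ad valorem commodity tax t1."""
    return (1 + t1) * unit_cost(w, r, alpha)


def capital_labor_ratio(w_cost, r_cost, alpha):
    """Cost-minimizing K/L; depends only on the relative factor costs firms face."""
    return (1 - alpha) / alpha * w_cost / r_cost


w, r, alpha, t = 1.2, 0.8, 0.7, 0.15
print(price_with_factor_taxes(w, r, alpha, t, t), price_with_commodity_tax(w, r, alpha, t))
print(capital_labor_ratio(w * (1 + t), r * (1 + t), alpha), capital_labor_ratio(w, r, alpha))
```

Since both taxes produce the same consumer price and the same input mix for every (w, r), they also lead to the same general equilibrium. A tax on labor alone in sector 1 changes the capital-labor ratio and is therefore not equivalent to a commodity tax.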



III     Optimal Commodity Taxation
     • while tax incidence is a positive theory, the analysis of optimal taxation is a
       normative one
     • it is analysed which tax system is best from the taxpayers’ point of view
     • central question of the chapter is about “optimal taxation”: what is the best
       way to generate government revenues?
     • since (as will be shown) there is no feasible revenue-generating tax system that
       is non-distortionary, this question calls for a second-best analysis
     • the amount of tax revenue is exogenously given and deficits are ruled out, that
       is, we conduct a differential tax analysis
     • we don’t allow for income taxation (which, in the case of homogeneous households,
       would be a lump-sum tax)
     • so the main question is whether taxes on commodities should be differentiated
       or not (and, if yes, how)
     • if households are homogeneous, distribution doesn’t matter; if they’re
       heterogeneous, it does matter (then an explicit welfare function has to be specified)
     • note that there is always one untaxed good, leisure, so that there is always
       one decision that is distorted (the labor-leisure decision)
     • this is “the decisive ingredient of the approach” that makes a first best solution
       unattainable
     • the central result is that, whenever possible, in the second best the distortion
       of the labor-leisure decision is counterbalanced by distorting relative prices
       through non-uniform commodity taxes
     • here, the untaxable good is named “leisure”, but it is easy to reformulate the
       theory and use goods produced at home as the “good number n + 1”
     • the analysis of this section is in the tradition of Frank Ramsey’s seminal article
       “A Contribution to the Theory of Taxation” (1927)


1     Lump-Sum Taxes
     • lump-sum taxes are taxes that cannot be avoided by changing one’s be-
       havior
     • lump-sum taxes have no negative welfare effects (measured as the sum of
       producer surplus and consumer surplus); they are purely redistributive
         – all FOC of utility maximization will remain the same
         – that means, there is no change of behavior at the margin
         – all consumption choices and factor supply choices remain the same at the
           margin
         – thus all relative prices remain the same
         – of course incomes will be lower - just by the amount of taxation
          – the lower income will have effects on others if the taxpayer has market
            power (then its reduced demand affects relative prices)
         – but there is no dead weight loss: no welfare effects
     • lump-sum taxes and incidence
         – in general, if one cannot reduce taxes by changing one’s behavior, agents
           won’t change their behavior and thus there is no chance to shift the tax
           burden
          – but lump-sum taxes can be (partially) shifted if voluntary transfers are
            involved or if there is market power (so that a tighter budget constraint
            has an effect on others)



           – what about differential effects in the case of not linear homogeneous utility
             functions?
    •   proposals for lump-sum taxes
            – a head tax comes close to a lump-sum tax, but even this can be avoided:
              by emigration or, in the long run, by having fewer children
            – the tax could be tailored to exogenous characteristics such as age or sex
            – a tax on “earning capacities” has also been proposed as a lump-sum tax,
              but how should we measure earning capacities?
            – a tax on all incomes and commodities has been proposed, but it is not a
              lump-sum tax since leisure (and other goods) can’t be taxed directly
            – a tax on profits seems to be lump-sum, but it is not when taking into
              account the decision about production locations or the decision of
              becoming an entrepreneur or a dependent worker
    •   problems
           – all these taxes are probably highly regressive: by definition they have to
             impose a tax burden that is unrelated to (for example) income levels; for
             Laszlo Goerke, this is “the strongest objection”
            – more generally, a lump-sum tax is incompatible with almost all accepted
              notions of justice
           – there are also political economy issues: a differentiated head tax (which
             would be as lump-sum as a uniform one) would induce corruption and other
             kinds of rent-seeking behaviour and would be regarded as highly unfair
            – the technically feasible lump-sum tax (the head tax) is politically infeasible
              in today’s democratic societies due to its regressiveness
     •   → it’s not clear that any lump-sum taxes exist, and they surely aren’t an
         option in the real world
     •   in recent history, the closest thing to a lump-sum tax in the OECD was Margaret
         Thatcher’s head tax for local government financing (the “Community Charge”,
         popularly called the “poll tax”); it had to be abolished after two years due
         to protests
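The claim that a lump-sum tax leaves all marginal conditions unchanged, while a distortionary tax raising the same revenue lowers welfare, can be illustrated with one household. The sketch assumes (not from the notes) Cobb-Douglas utility $u = c^\beta F^{1-\beta}$ over consumption c and leisure F, a time endowment T, a consumption good priced at 1, and no market power; `household` is a hypothetical helper.

```python
def household(w, T, beta, t=0.0, lump_sum=0.0):
    """Choice of a household with u = c**beta * F**(1 - beta), F = leisure.

    Budget (price of c normalized to 1): c + w(1 - t)F = w(1 - t)T - lump_sum.
    Returns consumption, leisure, utility and the MRS u_F / u_c.
    """
    net_wage = w * (1 - t)
    full_income = net_wage * T - lump_sum
    c = beta * full_income                        # Cobb-Douglas expenditure shares
    F = (1 - beta) * full_income / net_wage
    utility = c ** beta * F ** (1 - beta)
    mrs = (1 - beta) / beta * c / F               # marginal rate of substitution
    return c, F, utility, mrs


w, T, beta, t = 1.0, 1.0, 0.5, 0.2
c_t, F_t, u_t, mrs_t = household(w, T, beta, t=t)               # wage tax
revenue = t * w * (T - F_t)                                      # revenue of the wage tax
c_l, F_l, u_l, mrs_l = household(w, T, beta, lump_sum=revenue)  # same revenue, lump-sum
print(f"wage tax: u={u_t:.4f}, MRS={mrs_t:.3f};  lump-sum: u={u_l:.4f}, MRS={mrs_l:.3f}")
```

Under the lump-sum tax the MRS between leisure and consumption still equals the market wage w, whereas the wage tax drives it down to w(1 − t); for the same revenue, utility is higher under the lump-sum tax, which is the missing dead weight loss.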


2       No Distortion means No Revenues
    • the “model”
         – we don’t model agents’ optimization explicitly in this subsection; we only
           state the conditions for Pareto efficiency
         – we allow for taxes $t_w, t_r, t_1, t_2$
        – production functions are linear homogeneous (CRS)
    • procedure of the argument
         – first step: reminder of the marginal conditions for Pareto efficiency
        – second step: finding constraints of tax rates for the requirements to hold
        – third step: calculating tax revenues
    • central findings
         – a non-distortionary tax system (that is, a system that preserves the
           Pareto efficiency of the market outcome) cannot generate any revenues
        – this is the “basic result” of commodity taxation
        – the result is driven by the assumption that there is at least one good
          that we cannot tax (in our case leisure)


2.1     Requirements for Pareto-Efficiency

    • recall the conditions stated in (2) for a Pareto-efficient allocation in section (3.1)



 1. Marginal Rates of Substitution (MRS) are the same across households

    $$\frac{\partial u^1/\partial x_1^1}{\partial u^1/\partial x_2^1} = \frac{\partial u^2/\partial x_1^2}{\partial u^2/\partial x_2^2}$$

 2. Marginal Rates of Factor Substitution (MRFS) are the same across households

    $$\frac{\partial u^1/\partial L^1}{\partial u^1/\partial K^1} = \frac{\partial u^2/\partial L^2}{\partial u^2/\partial K^2}$$

 3. Marginal Rates of Technical Substitution (MRTS) are the same across production
    processes
    $$\frac{\partial f_1/\partial L_1}{\partial f_1/\partial K_1} = \frac{\partial f_2/\partial L_2}{\partial f_2/\partial K_2}$$
 4. MRS has to equal MRT

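    $$\frac{\partial u^1/\partial x_1^1}{\partial u^1/\partial x_2^1} = \frac{\partial u^2/\partial x_1^2}{\partial u^2/\partial x_2^2} = \frac{\partial f_2/\partial L_2}{\partial f_1/\partial L_1} = \frac{\partial f_2/\partial K_2}{\partial f_1/\partial K_1}$$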

 5. MRFS has to equal MRTS

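    $$\frac{\partial u^1/\partial L^1}{\partial u^1/\partial K^1} = \frac{\partial u^2/\partial L^2}{\partial u^2/\partial K^2} = \frac{\partial f_1/\partial L_1}{\partial f_1/\partial K_1} = \frac{\partial f_2/\partial L_2}{\partial f_2/\partial K_2}$$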

 6. The marginal rate of substitution between factor supply and the consumption
    of a commodity for a household has to equal the marginal productivity of that
    factor in the production of that commodity

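    $$-\frac{\partial u/\partial L}{\partial u/\partial x_j} = \frac{\partial f_j}{\partial L_j}, \qquad -\frac{\partial u/\partial K}{\partial u/\partial x_j} = \frac{\partial f_j}{\partial K_j}$$
    (the minus sign reflects that supplying a factor reduces utility)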


2.2   Constraints on Tax Rates

 1. MRS are the same across households if tax rates on goods are the same for
    all households
    $$\frac{\partial u^1/\partial x_1^1}{\partial u^1/\partial x_2^1} = \frac{\partial u^2/\partial x_1^2}{\partial u^2/\partial x_2^2} = \frac{q_1(1+t_1)}{q_2(1+t_2)} \tag{40a}$$
 2. MRFS are the same across households if tax rates on factor incomes are the
    same for all households

    $$\frac{\partial u^1/\partial L^1}{\partial u^1/\partial K^1} = \frac{\partial u^2/\partial L^2}{\partial u^2/\partial K^2} = \frac{w(1-t_w)}{r(1-t_r)} \tag{40b}$$

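 3. MRTS are the same across production processes since households pay the tax, so
    all firms face the same factor prices (otherwise, a condition like 6. would have to hold)
    $$\frac{\partial f_1/\partial L_1}{\partial f_1/\partial K_1} = \frac{\partial f_2/\partial L_2}{\partial f_2/\partial K_2} = \frac{w}{r} \tag{40c}$$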



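 4. the MRT is the same whether it is computed via labor or via capital, since households
    pay the tax (otherwise, a condition like 5. would have to hold)
    $$\frac{\partial f_2/\partial L_2}{\partial f_1/\partial L_1} = \frac{\partial f_2/\partial K_2}{\partial f_1/\partial K_1} = \frac{q_1}{q_2} \tag{40d}$$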
 5. MRT equals MRS if tax rates on both goods are equal
    $$\frac{q_1}{q_2} = \frac{q_1(1+t_1)}{q_2(1+t_2)} \tag{40e}$$
 6. MRTS equals MRFS if tax rates on both factors are equal
    $$\frac{w}{r} = \frac{w(1-t_w)}{r(1-t_r)} \tag{40f}$$
 7. MRS between the supply of a factor and the consumption of a good equals the
    marginal productivity of that factor in the production of that good if the tax
    rates on factors are the negatives of the tax rates on goods
    $$\frac{w(1-t_w)}{q_1(1+t_1)} = \frac{w}{q_1} \tag{40g}$$

  • in sum, the constraints are: $t_w = t_r = -t_1 = -t_2 := t$
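The constraints can be checked mechanically: with $t_w = t_r = -t_1 = -t_2$ every price ratio households face equals the corresponding producer-side ratio. A minimal sketch with arbitrary illustrative prices; `price_ratios` and `efficient` are hypothetical helpers, and the fourth ratio is the capital/good-2 analogue of (40g).

```python
def price_ratios(q1, q2, w, r, t1, t2, t_w, t_r):
    """Household-side and producer-side price ratios behind (40e)-(40g).

    q1, q2: producer prices; w, r: factor prices paid by firms.
    Households pay q_i(1 + t_i) and receive w(1 - t_w), r(1 - t_r).
    """
    p1, p2 = q1 * (1 + t1), q2 * (1 + t2)
    w_net, r_net = w * (1 - t_w), r * (1 - t_r)
    household_side = (p1 / p2, w_net / r_net, w_net / p1, r_net / p2)
    producer_side = (q1 / q2, w / r, w / q1, r / q2)
    return household_side, producer_side


def efficient(household_side, producer_side, tol=1e-12):
    """True if every household-side ratio equals the producer-side one."""
    return all(abs(h - p) < tol for h, p in zip(household_side, producer_side))


t = 0.2
print(efficient(*price_ratios(1.5, 2.0, 1.0, 0.6, -t, -t, t, t)))    # constraints hold
print(efficient(*price_ratios(1.5, 2.0, 1.0, 0.6, t, t, 0.0, 0.0)))  # uniform commodity tax only
```

A uniform positive commodity tax alone keeps (40e) and (40f) but violates (40g): it distorts the choice between supplying factors and consuming goods.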

2.3   Tax Revenue under Pareto-Efficiency

  • Tax revenues under the constraints presented are given as:

                             $$R = (wL + rK)\,t + (q_1x_1 + q_2x_2)(-t) \tag{41}$$

  • since we assume linear homogeneous production functions, the sum of factor incomes
    must equal the value of the goods produced, $wL + rK = q_1x_1 + q_2x_2$ (Euler theorem)
  • hence $R = 0$: there are no net revenues
  • taxing profits
       – Extending the model and allowing for profits shows that taxing profits does
         not have any distortionary effects in the short run, when households take
         profit income as exogenously given
       – But when making the employment decision (worker vs. entrepreneur) endo-
         genous in the long run, taxes on profits are distortionary in the sense that
         they bias this decision towards becoming a worker.
  • Result: A tax system that preserves Pareto-efficiency cannot generate
    any revenues.
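Equation (41) can be checked in a small numerical economy. The sketch assumes two Cobb-Douglas CRS sectors (labor exponents `alphas`), arbitrary factor prices and outputs, zero-profit producer prices equal to unit costs, and cost-minimizing inputs from Shephard's lemma; `net_revenue` is a hypothetical helper. Whatever the common rate t, factor-tax revenue and commodity subsidies cancel.

```python
def unit_cost(w, r, alpha):
    """Minimum unit cost for the CRS technology x = L**alpha * K**(1 - alpha)."""
    return (w / alpha) ** alpha * (r / (1 - alpha)) ** (1 - alpha)


def net_revenue(t, w, r, x, alphas):
    """Revenue (41) under t_w = t_r = t and t_1 = t_2 = -t.

    x: outputs of the two goods, alphas: labor exponents of the two technologies.
    """
    q = [unit_cost(w, r, a) for a in alphas]        # zero-profit producer prices
    # Cost-minimizing inputs (Shephard's lemma for Cobb-Douglas)
    L = sum(xi * a * qi / w for xi, a, qi in zip(x, alphas, q))
    K = sum(xi * (1 - a) * qi / r for xi, a, qi in zip(x, alphas, q))
    factor_taxes = t * (w * L + r * K)
    commodity_taxes = -t * sum(qi * xi for qi, xi in zip(q, x))
    return factor_taxes + commodity_taxes


for t in (0.1, 0.3, 0.5):
    print(t, net_revenue(t, w=1.0, r=0.5, x=(3.0, 2.0), alphas=(0.3, 0.7)))
```

The printed revenues are zero up to floating-point rounding, because the Euler theorem makes the two tax bases identical.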

2.4   Non-distortionary Tax Systems

  • any tax system that doesn’t change marginal behavior is non-distortionary
  • if income is fixed and all income is consumed, a proportional tax on all goods is
    non-distortionary
  • if income is fixed and there is only a fixed consumption bundle available, taxing
    that bundle is non-distortionary
  • if leisure can be taxed, a proportional tax on both leisure and work is non-
    distortionary



     • if “not consuming” can be taxed, a proportional tax on consumption and “not
       consuming” is non-distortionary
     • the crucial assumptions for the result derived in section (2.3) are
          – non-taxability of leisure
          – endogeneity of labor supply
          – income taxes are not available
          – constant returns to scale
          – taken together, these assumptions effectively rule out any lump-sum
            taxation
          – in this sense, we get out of the model what we already assumed




3     Theory of the Second Best

     • this subsection draws heavily on Lipsey and Lancaster’s (1956) article “The
       General Theory of Second Best” in the Review of Economic Studies
     • the General Theory of the Second Best
          – a Pareto-efficient allocation requires all optimality conditions to be
            fulfilled simultaneously
          – if an additional constraint (such as the revenue requirement in combination
            with the no-lump-sum-taxes assumption) prevents the allocation from attaining
            Pareto efficiency, in general all other optimality conditions have to change
            to attain a second best solution
          – there is no way to judge a priori in which direction and by what amount
            the conditions change
          – in particular, it is not true that a situation where more (but not all)
            PE-conditions are fulfilled is superior to a situation where fewer
            conditions are fulfilled
          – or, as Stiglitz (1987) puts it, “counting the number of distortions is no way
            to do welfare analysis!”
          – this implies that introducing a second (or third) distortion might be beneficial
     • implications for the analysis of optimal taxes
          – we might be able to counterbalance the distortion introduced by taxation
            by introducing another tax or by deviating from the rules we have derived
            for the first-best solution
          – what follows in the subsequent sections is essentially a second best analysis
            that shows that in general we have to deviate from the rules derived in
            section (2)
          – there it was shown that commodity taxes should be uniform
          – when introducing the assumption of non-taxability of leisure (and thereby
            ruling out the first best), it will be shown that uniform commodity taxes
            are no longer desirable
          – in sum, in a second best analysis we are not looking for non-distortionary
            taxes, but for optimal distorting taxes
     • why second best optimal taxes are almost always distortionary
          – it is no coincidence that we will almost always find the second best tax
            structure to be distortionary
          – starting from first best (non-distorting) commodity taxes, the welfare cost
            of introducing a small distortion is of second order (Envelope theorem)
          – but the effects on labor supply are of first order and thus dominate the
            second order effect
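The second-order argument can be illustrated with the textbook Harberger triangle. Assuming (for illustration only) a linear demand x = a − b·p and supply perfectly elastic at unit cost c, the excess burden of a per-unit tax τ is exactly bτ²/2, so it vanishes faster than τ itself, while revenue per unit of τ stays close to the untaxed quantity, a first-order effect. The function names are hypothetical.

```python
def consumer_surplus(p, a, b):
    """Consumer surplus under linear demand x = a - b p at price p (with p < a / b)."""
    return (a - b * p) ** 2 / (2 * b)


def deadweight_loss(tau, a, b, c):
    """Excess burden of a per-unit tax tau when supply is perfectly elastic at cost c."""
    x_taxed = a - b * (c + tau)
    revenue = tau * x_taxed
    return consumer_surplus(c, a, b) - consumer_surplus(c + tau, a, b) - revenue


a, b, c = 10.0, 2.0, 1.0
for tau in (0.01, 0.02, 0.04):
    dwl = deadweight_loss(tau, a, b, c)
    print(f"tau={tau:.2f}  DWL={dwl:.6f}  DWL/tau={dwl / tau:.6f}  "
          f"revenue/tau={a - b * (c + tau):.3f}")
```

Doubling τ quadruples the dead weight loss, and DWL/τ shrinks toward zero as τ → 0.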



4     Homogeneous Households

    • the model
        – simple GE model
         – one (representative) household (this is equivalent to assuming homogeneous
           households who are identical in their utility functions and their productivity)
         – $u = u(x_1, \dots, x_n, F)$
         – n consumption goods, each subject to a specific tax $\tau_i$, but no tax on
           leisure F (“good number n+1”)
         – leisure cannot be taxed since its consumption doesn’t involve a market
           transaction and thus isn’t observable by tax authorities
        – labor is the only production factor
         – $L = T - F$ ($T$: total time endowment)
        – perfect competition
        – constant returns to scale / linear homogeneous production functions
        – perfect competition and CRS imply that there are no profits, so that firms
          cannot bear the tax
         – the wage $w$ is fixed and untaxed
         – income consists of labor income $wL$ plus exogenous income $y$
    • central findings
        – the Ramsey Rule states that the relative reduction in Hicksian de-
          mand should be equal for all goods to minimize efficiency loss of a tax
          system
        – this shows that in the second best optimum taxes should not be non-distortionary,
          but distort optimally
        – this result is hard to implement since Hicksian demand cannot be obser-
          ved
        – the first best is not attainable since we don’t allow for taxing leisure; putting
          it differently, for (n + 1) goods there are only n tax instruments available
         – using additional assumptions, this result can be made more tractable
        – assuming inelastic supply of labor or assuming homothetic prefe-
          rences we find that a uniform commodity tax rate is optimal
        – rewriting the Ramsey Rule in terms of wage elasticities makes clear that tax
          differentiation increases efficiency when it can indirectly tax the untaxa-
          ble good leisure by taxing its complements (and fixed labor supply as
          well as homothetic preferences just make this impossible)
        – the results are not robust for changes in the assumptions
        – → no consistent policy recommendation emerges
        – in general, a uniform commodity tax (VAT) is not even second-best
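The Ramsey rule is hard to apply in general, but in a special case it reduces to the familiar inverse elasticity rule. The sketch below assumes (well beyond the notes) independent constant-elasticity demands $x_k = A_k p_k^{-\varepsilon_k}$, no income effects (so Hicksian and Marshallian demands coincide), producer prices normalized to 1, and an untaxed numeraire that loosely plays the role of leisure. Under these assumptions the first-order conditions reduce to $\tau_k/(1+\tau_k) = \theta/\varepsilon_k$, i.e. equal proportional (marginal) reductions in demand. All names and parameter values are hypothetical.

```python
import math


def cs_loss(tau, A, eps):
    """Consumer-surplus loss when the price of a good with demand A * p**(-eps)
    rises from its producer price 1 to 1 + tau."""
    if eps == 1:
        return A * math.log(1 + tau)
    return A * ((1 + tau) ** (1 - eps) - 1) / (1 - eps)


def revenue(taus, A, eps):
    """Tax revenue sum_k tau_k * x_k with x_k = A_k (1 + tau_k)**(-eps_k)."""
    return sum(t * a * (1 + t) ** (-e) for t, a, e in zip(taus, A, eps))


def excess_burden(taus, A, eps):
    """Total consumer-surplus loss minus revenue raised."""
    return sum(cs_loss(t, a, e) for t, a, e in zip(taus, A, eps)) - revenue(taus, A, eps)


def ramsey_rates(theta, eps):
    """Inverse-elasticity rule tau_k / (1 + tau_k) = theta / eps_k, solved for tau_k."""
    return [theta / (e - theta) for e in eps]


def bisect(f, lo, hi, tol=1e-12):
    """Root of a function that is increasing on [lo, hi], found by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


A, eps, R = [1.0, 1.0], [0.5, 2.0], 0.2     # hypothetical demand parameters and revenue target
theta = bisect(lambda th: revenue(ramsey_rates(th, eps), A, eps) - R, 0.0, 0.999 * min(eps))
ramsey = ramsey_rates(theta, eps)
# uniform rate raising the same revenue (revenue is increasing in tau on [0, 1] here)
tau_u = bisect(lambda tau: revenue([tau] * len(eps), A, eps) - R, 0.0, 1.0)
uniform = [tau_u] * len(eps)

for name, taus in (("Ramsey", ramsey), ("uniform", uniform)):
    print(name, [round(t, 4) for t in taus], round(excess_burden(taus, A, eps), 5))
```

For the same revenue, the Ramsey structure taxes the less elastic good more heavily and produces a smaller excess burden than the uniform rate. In the notes' model with leisure, cross effects with the untaxed good also matter, which this independent-demand sketch deliberately leaves out.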



4.1   Ramsey’s Rule

    • Utility and Revenues
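        – $u = u(x_1, \dots, x_n, F)$ (utility given the quantities of goods and leisure consumed)
        – $v = v(p_1, \dots, p_n, w, y)$ (indirect utility given optimal choices of quantities,
            as a function of prices, the wage rate and income)
        – note that $\partial v/\partial p_k = \partial v/\partial \tau_k$, since producer prices $q_k$ are fixed
            (CRS, fixed wage) and consumer prices are $p_k = q_k + \tau_k$
        – revenues are assumed to be exogenously given: $R = \sum_k \tau_k x_k = \bar{R} > 0$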
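     • the government maximizes the utility of the household with respect to
       $\tau_1, \tau_2, \dots, \tau_n$ and λ, given its own budget constraint

       $$\mathcal{L} = v(p_1, \dots, p_n, w, y) + \lambda\left(\sum_{j=1}^{n}\tau_j x_j - \bar{R}\right) \tag{42}$$
       $$\frac{\partial \mathcal{L}}{\partial \tau_k} = \frac{\partial v}{\partial p_k} + \lambda\left(x_k + \sum_{j=1}^{n}\tau_j\frac{\partial x_j}{\partial p_k}\right) = 0 \tag{43}$$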

          – there are n FOCs like (43), one for each tax rate
          – the term in the brackets is $\partial R/\partial \tau_k$ and must be positive (if not, we are in
            an inefficient area of excess taxation, where higher taxes result in lower
            revenues); since $\partial v/\partial p_k < 0$, λ must then be positive
          – in the case of excess taxation we couldn’t obtain an interior solution
          – the first term in the brackets ($x_k$) is positive, and it can be shown that the
            second ($\partial R/\partial \tau_k - x_k$) is negative
     • solving the system of FOCs results in two alternative interpretations of optimality
          – the ratio of revenue gain to utility loss has to be equal for all taxes, i.e.
            the additional tax revenue for every unit of utility lost due to a tax increase
            has to be equal for all taxes
            $$\frac{\partial R/\partial \tau_k}{\partial v/\partial \tau_k} = \frac{\partial R/\partial \tau_j}{\partial v/\partial \tau_j} \tag{44a}$$
         – the ratio of utility losses for two taxes has to equal the ratio of revenue gains
           for those two taxes
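            $$\frac{\partial v/\partial \tau_k}{\partial v/\partial \tau_j} = \frac{\partial R/\partial \tau_k}{\partial R/\partial \tau_j} \tag{44b}$$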
     • there is an alternative way to derive this result:
         – for optimal taxation, two conditions have to hold simultaneously
            $$dR = \frac{\partial R}{\partial \tau_k}d\tau_k + \frac{\partial R}{\partial \tau_j}d\tau_j = 0 \tag{45}$$
            $$dv = \frac{\partial v}{\partial \tau_k}d\tau_k + \frac{\partial v}{\partial \tau_j}d\tau_j = 0 \tag{46}$$
         – solving both for $d\tau_k/d\tau_j$ and setting the results equal yields (44b)
         – the two conditions imply that at the optimum any change in tax rates either
           reduces revenues or reduces utility: no Pareto improvement is possible
     • Using Roy’s Identity
         – Roy’s identity:
            $$-\frac{\partial v/\partial p_k}{\partial v/\partial y} = x_k \tag{47}$$
         – substituting the identity in (44b) results in:
            $$\frac{\partial R/\partial \tau_k}{\partial R/\partial \tau_j} = \frac{x_k}{x_j} \tag{48}$$

         – the ratio of revenue gains has to equal the ratio of consumption (or output)
           levels of the goods taxed
         – this is hard to implement, since changing a tax rate affects all quantities
           consumed and thus affects the revenues derived from all other taxes
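The substitution behind (48) can be made explicit. A minimal sketch, assuming specific taxes with fixed producer prices (so that ∂pk/∂τk = 1) and writing α = ∂V/∂y for the marginal utility of income:

```latex
\frac{\partial V}{\partial \tau_k} = \frac{\partial V}{\partial p_k} = -\alpha\, x_k
\quad\Longrightarrow\quad
\frac{\partial V/\partial \tau_k}{\partial V/\partial \tau_j} = \frac{-\alpha x_k}{-\alpha x_j} = \frac{x_k}{x_j}
\quad\overset{(44b)}{\Longrightarrow}\quad
\frac{\partial R/\partial \tau_k}{\partial R/\partial \tau_j} = \frac{x_k}{x_j}
```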
• the Slutsky Equation
    – the Slutsky equation states that any change in quantity due to a price change
      (of the good itself or of another good) can be separated into a substitution effect
      and an income effect
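              $$\frac{\partial x_j}{\partial p_k} = \frac{\partial x_j^H}{\partial p_k} + \frac{\partial x_j}{\partial Y}\cdot\frac{\partial Y}{\partial p_k}$$
              $$\frac{\partial x_j}{\partial p_k} = \frac{\partial x_j^H}{\partial p_k} - \frac{\partial x_j}{\partial Y}\cdot x_k \tag{49}$$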
    – where $x_j^H$ is the Hicksian demand (which depends on the level of utility, while
      Marshallian demand $x_j$ depends on the level of income)
    – this holds both for j = k and for j ≠ k: the effect of a price change can be
      decomposed for the good itself as well as for all other goods
    – the substitution effect is the change in demand holding the utility level fixed
      (while relative prices change); the income effect is the change in demand
      holding relative prices fixed (while income changes)
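To see the decomposition at work, the following sketch checks (49) numerically with central finite differences. It assumes hypothetical Cobb-Douglas preferences with budget shares A (chosen for illustration, not taken from the lecture), for which Marshallian demand, the expenditure function and Hicksian demand have closed forms; the function names are made up for this example.

```python
import math

# Hypothetical Cobb-Douglas preferences U = prod_j x_j**A[j]; the shares sum to one
A = [0.5, 0.3, 0.2]

def marshallian(p, Y):
    # Marshallian demand: x_j = A_j * Y / p_j
    return [a * Y / pj for a, pj in zip(A, p)]

def indirect_utility(p, Y):
    return math.prod(xj ** a for xj, a in zip(marshallian(p, Y), A))

def hicksian(p, u):
    # expenditure function e(p, u) = u * prod (p_j / A_j)**A_j, then h_j = A_j * e / p_j
    e = u * math.prod((pj / a) ** a for pj, a in zip(p, A))
    return [a * e / pj for a, pj in zip(A, p)]

def slutsky_residual(p, Y, j, k, h=1e-6):
    """dx_j/dp_k - (dh_j/dp_k - x_k * dx_j/dY); close to zero if (49) holds."""
    u = indirect_utility(p, Y)          # utility held fixed for the Hicksian term
    p_up, p_dn = list(p), list(p)
    p_up[k] += h
    p_dn[k] -= h
    dx_dp = (marshallian(p_up, Y)[j] - marshallian(p_dn, Y)[j]) / (2 * h)
    dh_dp = (hicksian(p_up, u)[j] - hicksian(p_dn, u)[j]) / (2 * h)
    dx_dY = (marshallian(p, Y + h)[j] - marshallian(p, Y - h)[j]) / (2 * h)
    x_k = marshallian(p, Y)[k]
    return dx_dp - (dh_dp - x_k * dx_dY)
```

Calling `slutsky_residual([1.0, 2.0, 4.0], 10.0, j, k)` for every pair (j, k), including j = k, gives values that are zero up to finite-difference error.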
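• using Slutsky in the optimality condition (43)
    – we assume that the marginal utility of income, α = ∂V/∂y, is constant, so that by
      Roy’s identity ∂V/∂τk = −α xk
    – substituting in (43) and solving for α xk, using the Slutsky equation and the
      symmetry of substitution effects ($\partial x_j^H/\partial p_k = \partial x_k^H/\partial p_j$), and collecting terms yields:
              $$\alpha x_k = \lambda\Big(x_k + \sum_j \tau_j \frac{\partial x_j}{\partial p_k}\Big)$$
              $$\frac{\alpha-\lambda}{\lambda}\,x_k = \sum_j \tau_j \frac{\partial x_j}{\partial p_k} \tag{50}$$
              $$\frac{\alpha-\lambda}{\lambda}\,x_k = \sum_j \tau_j \Big(\frac{\partial x_k^H}{\partial p_j} - x_k\frac{\partial x_j}{\partial Y}\Big)$$
              $$x_k\Big(\frac{\alpha-\lambda}{\lambda} + \sum_j \tau_j \frac{\partial x_j}{\partial Y}\Big) = \sum_j \tau_j \frac{\partial x_k^H}{\partial p_j}$$
              $$x_k\, b = \sum_j \tau_j \frac{\partial x_k^H}{\partial p_j} \tag{51}$$
    – where $b = (\alpha-\lambda)/\lambda + \sum_j \tau_j\,\partial x_j/\partial Y$ does not depend on which good k is
      looked at: it takes the same value for every good k
    – it can be shown that b has to be non-positive for government revenue to be positive
    – this equation holds only if λ > α > 0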
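The claim that b cannot be positive follows from multiplying (51) by τk and summing over k: b·R = Σk Σj τk τj ∂xk^H/∂pj, a quadratic form in the Slutsky matrix, which is negative semidefinite. The sketch below checks this numerically for a Cobb-Douglas Slutsky matrix; the shares, prices and expenditure level are hypothetical, and `slutsky_matrix_cd` is a name made up for this example.

```python
import random

def slutsky_matrix_cd(a, p, e):
    """Slutsky matrix S[k][j] = dx_k^H/dp_j for Cobb-Douglas preferences.

    Hicksian demand is h_k = a_k * e / p_k and de/dp_j = h_j, so
    S[k][j] = a_k * a_j * e / (p_k * p_j), minus a_k * e / p_k**2 on the diagonal.
    """
    n = len(a)
    return [[a[k] * a[j] * e / (p[k] * p[j]) - (a[k] * e / p[k] ** 2 if k == j else 0.0)
             for j in range(n)] for k in range(n)]

def quadratic_form(S, tau):
    # sum_k tau_k * sum_j tau_j * S[k][j]; at an optimum satisfying (51) this equals b * R
    n = len(tau)
    return sum(tau[k] * tau[j] * S[k][j] for k in range(n) for j in range(n))

a = [0.5, 0.3, 0.2]   # hypothetical budget shares, summing to one
p = [1.0, 2.0, 4.0]   # hypothetical consumer prices
S = slutsky_matrix_cd(a, p, e=10.0)

rng = random.Random(0)
# random non-negative specific taxes: the quadratic form never becomes positive
largest = max(quadratic_form(S, [rng.uniform(0.0, 1.0) for _ in a]) for _ in range(1000))
```

`largest` stays at or below zero up to rounding, and taxes proportional to prices (equal τj/pj) make the form exactly zero. Since R = Σk τk xk > 0, b = τ'Sτ / R therefore cannot be positive.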
• The Ramsey rule
    – interpreting the right-hand side of (51) as the (first-order) change in compensated
      demand caused by introducing the taxes, (51) can be rewritten as
              $$x_k\, b = dX_k^H \tag{52}$$
    – where $dX_k^H$ is the impact of changes in all taxes on the Hicksian demand
      (compensated demand) for good k
    – this implicitly defines the optimal tax structure
    – it states that the reduction in Hicksian demand has to be proportional to the
      Marshallian demand, and since both demands are the same (in levels), the
      reduction in demand should be proportional to the demand
     • Results
         – The relative reduction in Hicksian demand due to the tax system
           has to be equal for all goods (Ramsey’s Rule)
         – we have shown in section (2) that a non-distortionary tax system raises prices
           proportionally (but doesn’t generate revenues)
         – this result cannot be extended to the second best
         – a second-best revenue-generating tax system doesn’t raise prices
           proportionally, but reduces Hicksian demand proportionally
         – that means that, in general, uniform commodity taxes are not even
           second best
         – since it is probably empirically impossible to design a tax system according
           to the Ramsey rule, the main finding of this section is negative



4.2    Reformulating the Ramsey Rule

     • implementing the Ramsey Rule is difficult because Hicksian demand is not
       observable
     • in this section, the result is rewritten in two different ways that make
       implementation easier (although it remains problematic)
     • rewriting the Ramsey rule in terms of income elasticities of demand and in terms of
       wage elasticities of Hicksian demand shows that, in general, commodity taxes
       should be differentiated (and not uniform)
     • this result can be interpreted as an application of the “Theory of the Second
       Best”, since differentiated taxes violate the finding of section (2)
     • since we have introduced another constraint (R > 0), the old (first-best)
       optimality conditions are no longer desirable



a)    Income Elasticities of Demand
     • income elasticities of demand can be observed
     • intuition
          – welfare loss is caused by substitution effects only
          – the higher the share of overall reduction in demand for a good due to a pure
            income effect, the smaller the share for the substitution effect and thus the
            smaller the welfare loss
          – the higher the income elasticity of a good, the larger the share of the income
            effect
          – thus we try to maximize the income effect by taxing goods with a high income
            elasticity of demand more heavily
     • we use that all substitution effects are symmetric (Slutsky symmetry)

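              $$\frac{\partial x_j^H}{\partial p_k} = \frac{\partial x_k^H}{\partial p_j} \tag{53}$$
              $$\frac{\partial x_j}{\partial p_k} + x_k\frac{\partial x_j}{\partial y} = \frac{\partial x_k}{\partial p_j} + x_j\frac{\partial x_k}{\partial y} \tag{54}$$
              $$\frac{\partial x_j}{\partial p_k} = \frac{\partial x_k}{\partial p_j} + x_j\frac{\partial x_k}{\partial y} - x_k\frac{\partial x_j}{\partial y} \tag{55}$$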
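     • this expression is substituted into (50):
              $$\frac{\alpha-\lambda}{\lambda}\,x_k = \sum_j \tau_j\frac{\partial x_j}{\partial p_k} \tag{56}$$
              $$\frac{\alpha-\lambda}{\lambda}\,x_k = \sum_j \tau_j\Big(\frac{\partial x_k}{\partial p_j} + x_j\frac{\partial x_k}{\partial y} - x_k\frac{\partial x_j}{\partial y}\Big) \tag{57}$$
              $$\frac{\alpha-\lambda}{\lambda} - \frac{1}{x_k}\sum_j \tau_j\Big(x_j\frac{\partial x_k}{\partial y} - x_k\frac{\partial x_j}{\partial y}\Big) = \frac{1}{x_k}\sum_j \tau_j\frac{\partial x_k}{\partial p_j} \tag{58}$$
     • defining the impact of all taxes on good k, now in terms of Marshallian demand, as
       $dX_k = \sum_j \tau_j\,\partial x_k/\partial p_j$, we can rewrite the equation:
              $$\frac{\alpha-\lambda}{\lambda} - \frac{1}{y}\sum_j \tau_j x_j\Big(\frac{\partial x_k}{\partial y}\frac{y}{x_k} - \frac{\partial x_j}{\partial y}\frac{y}{x_j}\Big) = \frac{dX_k}{x_k} \tag{59}$$
              $$\frac{\alpha-\lambda}{\lambda} - \frac{1}{y}\big(\varepsilon_{ky}R - z\big) = \frac{dX_k}{x_k} \tag{60}$$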

     • where R = Σj τj xj is the tax revenue of the government, εky is the income
       elasticity of good k, and z = Σj τj xj εjy is a constant, independent of good k
     • Result: the higher the income elasticity for a good, the higher should be
       the tax-induced reduction in (Marshallian) demand
     • for both fixed labor supply (section 4.3 a)) and homothetic preferences (section
       4.3 b)) the income elasticities of demand are equal for all goods, hence goods are
       taxed uniformly
          – with fixed labor supply, income is fixed; since income does not vary,
            income elasticities play no role and cannot differ across goods
          – with homothetic preferences, the composition of the consumption bundle doesn’t
            depend on income, so all income elasticities are equal (they all equal one)
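Equation (60) can be evaluated directly once the elasticities are known. The sketch below does only that arithmetic: all numbers (θ = (α − λ)/λ, taxes, quantities, income elasticities, income y) are hypothetical and not an equilibrium of any model, and `demand_reduction` is a name made up for illustration. It shows the comparative statics of the result: the good with the higher income elasticity gets the larger tax-induced reduction in demand.

```python
def demand_reduction(theta, eps_y_k, taxes, quantities, eps_y, y):
    """Relative Marshallian demand change dX_k/x_k implied by equation (60).

    theta    : (alpha - lambda) / lambda, common to all goods
    eps_y_k  : income elasticity of the good under consideration
    taxes, quantities, eps_y : tau_j, x_j and income elasticities of all goods
    """
    R = sum(t * x for t, x in zip(taxes, quantities))                 # tax revenue
    z = sum(t * x * e for t, x, e in zip(taxes, quantities, eps_y))  # same for every k
    return theta - (eps_y_k * R - z) / y

# hypothetical values: good 0 is a necessity, good 1 a luxury
taxes, quantities, eps_y = [0.2, 0.2], [10.0, 5.0], [0.5, 1.5]
reductions = [demand_reduction(-0.05, e, taxes, quantities, eps_y, y=100.0) for e in eps_y]
```

Here `reductions` is about [−0.04, −0.07]: demand for the good with income elasticity 1.5 falls by 7 %, for the good with income elasticity 0.5 only by 4 %.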

b)    Wage Elasticities of Hicksian Demand
     • the idea of representing the Ramsey Rule in terms of wage elasticities of Hicksian
       demand rests on the assumption that leisure is the untaxed good
     • intuition
          – commodity taxes have to (partly) replace the missing tax on leisure to mini-
            mize welfare loss
          – that is, a second distortion is introduced to counterbalance the first distortion
          – leisure is here interpreted as a consumption good whose price is the wage
          – note that wage elasticities are just another kind of cross-price elasticity
          – goods that are strong complements of leisure should be taxed more heavily,
            because consuming them goes along with consuming more (untaxed) leisure,
            which makes the tax base smaller
          – that is, we should tax heavily those goods that have a (highly) negative wage
            elasticity of demand (Corlett-Hague rule)
          – as Homburg argues, substitutes of leisure might be coffee, while liquor and
            watching movies are complements
          – this view gives strong support to differentiated taxes (in contrast to the
            special cases a) and b) of section 4.3)
              ∗ fixing labor supply means fixing leisure, which means there is no need to
                tax complements extra
              ∗ homothetic preferences mean by definition that all wage elasticities are
                zero, thus there are no complements to leisure by definition of the utility
                function (there aren’t any substitutes either)
     • for this analysis, we restrict our model to two goods instead of n
     • equation (51) then collapses to just two equations; collecting terms brings us to:

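              $$x_1 b = \tau_1\frac{\partial x_1^H}{\partial p_1} + \tau_2\frac{\partial x_1^H}{\partial p_2}$$
              $$x_2 b = \tau_1\frac{\partial x_2^H}{\partial p_1} + \tau_2\frac{\partial x_2^H}{\partial p_2}$$
              $$\tau_1\Big(\frac{\partial x_1^H}{\partial p_1}\frac{\partial x_2^H}{\partial p_2} - \frac{\partial x_1^H}{\partial p_2}\frac{\partial x_2^H}{\partial p_1}\Big) = b\Big(x_1\frac{\partial x_2^H}{\partial p_2} - x_2\frac{\partial x_1^H}{\partial p_2}\Big) \tag{61}$$
              $$\tau_1\big(\varepsilon^H_{11}\varepsilon^H_{22} - \varepsilon^H_{12}\varepsilon^H_{21}\big) = b\,p_1\big(\varepsilon^H_{22} - \varepsilon^H_{12}\big) \tag{62}$$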

     • if we assume normal goods, the own-price elasticities have to be negative and,
       since we only have two goods, this implies that the cross-price elasticities have
       to be positive
     • this means that the bracket on the right-hand side of (62) is negative and, since b
       is negative, the bracket on the left-hand side has to be positive for τ1 to be positive
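     • since τk = pk tk/(1 + tk), i.e. τk/pk = tk/(1 + tk), (62) simplifies; dividing it by
       the analogous condition for good 2, $\tau_2(\varepsilon^H_{11}\varepsilon^H_{22} - \varepsilon^H_{12}\varepsilon^H_{21}) = b\,p_2(\varepsilon^H_{11} - \varepsilon^H_{21})$,
       gives the ratio of the tax rates:

              $$\frac{t_1}{1+t_1} = b\,\frac{\varepsilon^H_{22}-\varepsilon^H_{12}}{\varepsilon^H_{11}\varepsilon^H_{22}-\varepsilon^H_{12}\varepsilon^H_{21}}, \qquad \frac{t_1/(1+t_1)}{t_2/(1+t_2)} = \frac{\varepsilon^H_{22}-\varepsilon^H_{12}}{\varepsilon^H_{11}-\varepsilon^H_{21}} \tag{63}$$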

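     • it can be shown (with some effort) that this ratio can be written in terms of sums
       of three elasticities: the own-price elasticities of goods 1 and 2 and the wage
       elasticity of the respective good. This is the rule of free-time complementarity:

              $$\frac{t_1/(1+t_1)}{t_2/(1+t_2)} = \frac{\varepsilon^H_{11}+\varepsilon^H_{22}+\varepsilon^H_{1w}}{\varepsilon^H_{11}+\varepsilon^H_{22}+\varepsilon^H_{2w}} \tag{64}$$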
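Most of the "effort" is one substitution. A sketch, assuming the compensated demands for goods 1 and 2 depend only on p1, p2 and the wage w, so that they are homogeneous of degree zero in these three prices and each row of compensated elasticities sums to zero; inserting this into the ratio of the tax rates obtained from (62) and its analogue for good 2 gives (64):

```latex
\begin{aligned}
&\varepsilon^H_{k1} + \varepsilon^H_{k2} + \varepsilon^H_{kw} = 0
\;\Rightarrow\;
\varepsilon^H_{12} = -\varepsilon^H_{11} - \varepsilon^H_{1w}, \quad
\varepsilon^H_{21} = -\varepsilon^H_{22} - \varepsilon^H_{2w},\\
&\frac{t_1/(1+t_1)}{t_2/(1+t_2)}
= \frac{\varepsilon^H_{22} - \varepsilon^H_{12}}{\varepsilon^H_{11} - \varepsilon^H_{21}}
= \frac{\varepsilon^H_{11} + \varepsilon^H_{22} + \varepsilon^H_{1w}}{\varepsilon^H_{11} + \varepsilon^H_{22} + \varepsilon^H_{2w}}
\end{aligned}
```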

     • if both wage elasticities are equal, the ratio collapses to unity and we have a
       uniform tax rate
     • in this case, there is no possibility to tax a complement of leisure more heavily,
       since both goods are equally good complements (or substitutes)
     • since both $\varepsilon^H_{11}$ and $\varepsilon^H_{22}$ are negative, the higher the wage elasticity of a good,
       the smaller its tax rate should be (the wage elasticity of complements of leisure is negative)
     • Result: complements of leisure should be taxed at a higher rate
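The two-good case can be checked numerically: solve the pair of equations behind (61) for τ1 and τ2 by Cramer's rule, convert to ad valorem rates via τk/pk = tk/(1 + tk), and compare the ratio with the Corlett-Hague form (64). All elasticities, quantities, prices and b below are hypothetical; they are chosen so that each row of compensated elasticities sums to zero (homogeneity) and the Slutsky terms are symmetric. The function names are illustrative.

```python
def ramsey_two_goods(S, x, b):
    """Solve x_k * b = tau_1 * S[k][0] + tau_2 * S[k][1] (k = 1, 2) by Cramer's rule."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    tau1 = b * (x[0] * S[1][1] - x[1] * S[0][1]) / det
    tau2 = b * (S[0][0] * x[1] - S[1][0] * x[0]) / det
    return tau1, tau2

def corlett_hague_check(eps, x, p, b):
    """eps[k] = (eps_k1, eps_k2, eps_kw): compensated elasticities of good k with respect
    to p1, p2 and the wage w. Returns the tax-rate ratio from the linear system, the
    Corlett-Hague ratio (64), and the ad valorem rates t_k."""
    # Slutsky terms dx_k^H/dp_j = eps_kj * x_k / p_j
    S = [[eps[k][j] * x[k] / p[j] for j in range(2)] for k in range(2)]
    tau = ramsey_two_goods(S, x, b)
    r = [tau[k] / p[k] for k in range(2)]        # r_k = t_k / (1 + t_k)
    own = eps[0][0] + eps[1][1]
    ratio_ch = (own + eps[0][2]) / (own + eps[1][2])
    t = [rk / (1 - rk) for rk in r]
    return r[0] / r[1], ratio_ch, t

# hypothetical data: rows of elasticities sum to zero, Slutsky terms are symmetric
x, p, b = [10.0, 5.0], [1.0, 2.0], -0.1
lhs, rhs, t = corlett_hague_check([(-0.8, 0.5, 0.3), (0.5, -0.9, 0.4)], x, p, b)
```

Both ratios come out at about 1.077. Good 1 has the lower wage elasticity, i.e. it is the relatively stronger complement of leisure, and it gets the higher rate (about 42 % vs. 38 %); setting both wage elasticities equal makes the ratio one, the uniform case.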

4.3    Special Cases: Additional Restrictions

     • implementing the Ramsey Rule is difficult because Hicksian demand is not
       observable
     • we have seen that rewriting it helps, but the results are still hard to implement
     • in this section, we impose three additional assumptions to simplify the result
     • note that all three assumptions are fairly strong and no robust result emerges

a)    Fixed Labor Supply
     • the assumption of fixed labor supply might be suitable for certain demographic
       groups (e.g., “prime age” men)
     • this assumption makes leisure consumption fixed, too
     • this effectively removes the consequences of non-taxability of leisure
     • taxing labor income causes no distortion, since there is no work-leisure decision:
       L = L̄
     • that means, taxing labor income is a non-distortionary lump-sum tax that
       preserves Pareto-Efficiency: it cannot be avoided by changing behavior
     • as we have shown, taxing labor is equivalent to a uniform tax on all commodities
       (because this commodity tax doesn’t distort the consumption decision)
     • a tax on labor income, a uniform commodity tax, or any combination of these
       two are equivalent: we have an infinite number of possible lump-sum taxes
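The equivalence can be verified with simple arithmetic. A minimal sketch under hypothetical assumptions (no non-labor income, all net income is spent, spending measured at producer prices): a labor income tax t_L and a uniform ad valorem commodity tax t_c leave the same real budget whenever (1 − t_L)/(1 + t_c) is the same, and they then raise the same revenue. The function names and numbers are illustrative.

```python
def real_budget(w, L_bar, t_L, t_c):
    """Spending at producer prices with labor income tax t_L and uniform commodity tax t_c."""
    return (1 - t_L) * w * L_bar / (1 + t_c)

def revenue(w, L_bar, t_L, t_c):
    # labor-tax receipts plus commodity-tax receipts on producer-price spending
    return t_L * w * L_bar + t_c * real_budget(w, L_bar, t_L, t_c)

# hypothetical wage and fixed labor supply; three tax mixes with (1 - t_L)/(1 + t_c) = 0.8
w, L_bar = 20.0, 40.0
mixes = [(0.2, 0.0), (0.0, 0.25), (0.1, 0.125)]
results = [(real_budget(w, L_bar, tl, tc), revenue(w, L_bar, tl, tc)) for tl, tc in mixes]
```

A pure labor tax of 20 %, a pure commodity tax of 25 %, and the 10 %/12.5 % mix all leave a budget of 640 and raise 160, which is the "infinite number of possible lump-sum taxes" in action.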



b)    Homothetic Preferences
     • Homotheticity and separability of the utility function
         – the utility function can be divided into a function of the consumption bundle
           and of leisure: U(C(x1, ..., xn), F) (separability)
         – the sub-utility function of the consumption bundle is homogeneous of degree
           z (not necessarily one)
         – this implies that its partial derivatives are homogeneous of degree z − 1
     • the household’s maximization problem
         – in the budget constraint, leisure F is explicitly modeled as a consumption good
           (with T the time endowment and w the wage):
              $$wT + y - wF - \sum_k p_k x_k = 0 \tag{65}$$

         – given the homothetic utility function, it can be shown that, for given prices,
            the composition of the bundle is fixed regardless of the level of consumption
            (the size of the consumption bundle)
         – that is, the household will always spend the same share of its consumption
            expenditure on each good
         – the separability assumption guarantees that, for a given overall tax burden,
            the tax rates can affect the composition of the consumption bundle, but not
            the work-leisure decision
         – this is because the composition of the consumption bundle doesn’t affect the
            marginal utility of leisure directly
     • Result: uniform tax rates on all commodities are optimal
     • as in the case of inelastic labor supply, this result is driven by the assumption
       that changing the tax structure cannot affect the work-leisure decision
     • for homothetic preferences, both the wage elasticities of Hicksian demand and the
       income elasticities are the same for all goods (the income elasticities all equal one)
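That a homothetic sub-utility function implies identical income elasticities (all equal to one) is easy to check numerically. The sketch assumes a hypothetical Stone-Geary demand system x_j = γj + aj(Y − Σi pi γi)/pj, chosen for illustration and not taken from the lecture: with all γj = 0 it reduces to homothetic Cobb-Douglas demand, with some γj ≠ 0 it is non-homothetic and the income elasticities differ.

```python
def stone_geary_demand(p, Y, a, gamma):
    # x_j = gamma_j + a_j * (Y - sum_i p_i * gamma_i) / p_j; gamma = 0 gives Cobb-Douglas
    supernumerary = Y - sum(pi * gi for pi, gi in zip(p, gamma))
    return [g + aj * supernumerary / pj for g, aj, pj in zip(gamma, a, p)]

def income_elasticities(p, Y, a, gamma, h=1e-6):
    # eps_jY = (dx_j/dY) * Y / x_j, with the derivative taken by central differences
    x = stone_geary_demand(p, Y, a, gamma)
    up = stone_geary_demand(p, Y + h, a, gamma)
    dn = stone_geary_demand(p, Y - h, a, gamma)
    return [(u - d) / (2 * h) * Y / xj for u, d, xj in zip(up, dn, x)]

p, Y, a = [1.0, 2.0, 4.0], 10.0, [0.5, 0.3, 0.2]   # hypothetical prices, income, shares
homothetic = income_elasticities(p, Y, a, gamma=[0.0, 0.0, 0.0])
non_homothetic = income_elasticities(p, Y, a, gamma=[2.0, 0.0, 0.0])
```

`homothetic` is [1, 1, 1] up to rounding; with a subsistence quantity for good 1, the elasticities become about [0.83, 1.25, 1.25], and uniform taxation would no longer follow from the income-elasticity version of the Ramsey rule.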



c)    Zero Cross-Price Elasticities
     • the Ramsey Rule states that quantities should fall in proportion to their
       (Hicksian) demand
     • this suggests that goods with a high price elasticity should face a small price
       increase, that is, a small tax rate
     • but the Ramsey Rule states that the quantity reduction due to the whole tax
       system (that is, due to the price changes of all goods) should be proportional,
       while price elasticities are defined in terms of the own price only
     • in this section we assume that the Hicksian demand for a good is only affected
       by a change in its own price, that is, there are no cross-price effects
       (no cross-substitution effects): εjk = 0 for all j ≠ k
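     • since ∂xj/∂pk = 0 for all j ≠ k, equation (50) then collapses to

              $$\frac{\alpha-\lambda}{\lambda}\,x_k = \tau_k\,\frac{\partial x_k}{\partial p_k}$$
              $$\frac{\alpha-\lambda}{\lambda} = \frac{\tau_k}{p_k}\,\varepsilon_{kk}$$
              $$\frac{\alpha-\lambda}{\lambda} = \frac{t_k}{1+t_k}\,\varepsilon_{kk} \tag{66}$$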

     • since α and λ do not depend on the good k, the left-hand side of (66) is the same
       constant for all goods, and the inverse elasticity rule holds:

              $$\frac{t_j(1+t_k)}{t_k(1+t_j)} = \frac{\varepsilon_{kk}}{\varepsilon_{jj}} \tag{67}$$

     • the ad valorem tax rates have to be inversely proportional to the price
       elasticities of demand
     • if price changes don’t affect the demand for other goods, then reducing
       Hicksian demand proportionally is equal to reducing Marshallian
       demand proportionally
     • this is done by taxing price elastic goods less than price inelastic goods
     • note that this stands in sharp contrast to the results derived in the preceding
       subsections
     • the inverse elasticity rule was for decades the most general result derived from
       optimal commodity taxation, before Ramsey’s (1927) paper was rediscovered
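Equation (66) pins down each rate once the common constant is known, which makes the inverse elasticity rule easy to illustrate. A minimal sketch: c stands for (α − λ)/λ, and its value and the elasticities are hypothetical; the function name is made up.

```python
def inverse_elasticity_taxes(c, own_elasticities):
    """Ad valorem rates solving t_k/(1+t_k) * eps_kk = c, i.e. equation (66).

    c stands for (alpha - lambda)/lambda, the same for every good; with
    lambda > alpha it is negative, as are the own-price elasticities.
    """
    taxes = []
    for eps in own_elasticities:
        r = c / eps            # r = t_k / (1 + t_k), must lie below one
        taxes.append(r / (1 - r))
    return taxes

t = inverse_elasticity_taxes(-0.1, [-0.5, -2.0])   # hypothetical values
```

The less elastic good (ε = −0.5) is taxed at 25 %, the more elastic one (ε = −2) at about 5.3 %, and the ratio in (67) equals the elasticity ratio of 4.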



5     Heterogeneous Households

     • here we drop the assumption that all households have identical utility
       functions
     • to get any results, we have to work with a social welfare function
     • here we use a Bergson-Samuelson-type welfare function
     • the analysis becomes so complex that we consider only the special case of zero
       cross-price elasticities
     • central findings
          – goods consumed more by high-income households should be taxed
            more heavily
          – there is a trade-off between so-called efficiency and equity



5.1    General Result

     • we use a model with N households, n goods and specific taxes on all goods, but
       no income tax
     • taxes are uniform across households (that means, no discrimination between
       consumers is possible)
     • the objective function is social welfare (a function of the N indirect utilities);
       together with the government’s budget constraint, it gives the Lagrangian:
              $$W = W(V^1, V^2, \dots, V^N), \qquad V^i = V^i(p_1, p_2, \dots, p_n, w, y)$$
              $$R = \bar R$$
              $$\mathcal{L} = W(V^1, V^2, \dots, V^N) + \lambda\Big(\sum_{j=1}^{n}\tau_j\sum_{i=1}^{N} x_j^i - \bar R\Big) \tag{68}$$

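  • the objective function is maximized with respect to τk, and the marginal indirect
    utility of income is assumed to be constant ($\partial V^i/\partial y = \alpha^i$), so that Roy’s identity
    reads: $\partial V^i/\partial p_k = \partial V^i/\partial \tau_k = -\alpha^i x_k^i$; below, $x_k = \sum_i x_k^i$ denotes aggregate demand
              $$\frac{\partial \mathcal{L}}{\partial \tau_k} = \sum_{i=1}^{N}\frac{\partial W}{\partial V^i}\frac{\partial V^i}{\partial \tau_k} + \lambda\Big(x_k + \sum_{j=1}^{n}\tau_j\frac{\partial x_j}{\partial p_k}\Big) = 0 \tag{69}$$
              $$\lambda\sum_{j=1}^{n}\tau_j\frac{\partial x_j}{\partial p_k} = \sum_{i=1}^{N}\frac{\partial W}{\partial V^i}\,\alpha^i x_k^i - \lambda x_k \tag{70}$$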

  • at this point of the analysis, we’re stuck

5.2   Zero Cross-Price Elasticities

  • to make the analysis easier, we have to assume zero cross-price elasticities (as
    we did before as an additional restriction in the case of one household)
  • the left-hand side simplifies (similar to equation (50))
  • further, we use the fact that τk/pk = tk/(1 + tk) and the definition of price
    elasticities to get
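              $$\lambda\tau_k\frac{\partial x_k}{\partial p_k} = \sum_{i=1}^{N}\frac{\partial W}{\partial V^i}\,\alpha^i x_k^i - \lambda x_k$$
              $$\lambda\frac{\tau_k}{p_k}\frac{\partial x_k}{\partial p_k}\frac{p_k}{x_k} = \sum_{i=1}^{N}\alpha^i\frac{\partial W}{\partial V^i}\frac{x_k^i}{x_k} - \lambda$$
              $$\frac{t_k}{1+t_k}\,\varepsilon_{kk} = \sum_{i=1}^{N}\frac{\alpha^i}{\lambda}\frac{\partial W}{\partial V^i}\frac{x_k^i}{x_k} - 1 \tag{71}$$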

  • since the own-price elasticity is negative and the sum is positive, a larger
    value of the sum implies a smaller tax rate
  • if the marginal utility of income decreases with income and marginal changes in
    the utility of low-utility households are valued more by society than those of
    high-utility households, both $\alpha^i$ and $\partial W/\partial V^i$ are large for low-income
    households and small for high-income households
  • Result: goods consumed mainly by households with higher income should
    be taxed more heavily
  • again, this is an argument against uniform commodity tax rates
  • for homogeneous households (or one representative household), $\alpha^i = \alpha$,
    $\partial W/\partial V^i = \partial W/\partial V$ and $x_k^i/x_k = 1/N$, so that the sum in (71) is the same for all
    goods and we obtain the well-known inverse elasticity rule as in equation (67):
              $$\frac{t_k(1+t_m)}{t_m(1+t_k)} = \frac{\varepsilon_{mm}}{\varepsilon_{kk}}$$
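Equation (71) turns into tax rates once the social weights are given. A minimal sketch with two hypothetical households and two goods of equal own-price elasticity; the weights g_i = α^i (∂W/∂V^i)/λ and the consumption figures are assumptions for illustration, and the function name is made up.

```python
def tax_rate_71(eps_kk, weights, consumption_k):
    """Ad valorem rate t_k implied by equation (71).

    weights[i]       : alpha^i * dW/dV^i / lambda, the social value of one unit
                       of income for household i (hypothetical)
    consumption_k[i] : x_k^i, household i's demand for good k
    """
    x_k = sum(consumption_k)
    weighted_share = sum(g * xi for g, xi in zip(weights, consumption_k)) / x_k
    r = (weighted_share - 1) / eps_kk      # r = t_k / (1 + t_k)
    return r / (1 - r)

weights = [1.2, 0.6]                                   # poor household, rich household
t_necessity = tax_rate_71(-1.0, weights, [8.0, 2.0])   # bought mainly by the poor
t_luxury = tax_rate_71(-1.0, weights, [2.0, 8.0])      # bought mainly by the rich
```

With these numbers the good bought mainly by the poor household gets a negative rate (a subsidy of about 7 %), the good bought mainly by the rich household a rate of about 39 %; with equal weights for both households the two rates coincide, as the inverse elasticity rule predicts for equal elasticities.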

6     The Production Efficiency Theorem



     • introducing another distortion?
         – in this chapter it has been shown that the requirements of a Pareto-efficient
           allocation cannot be fulfilled in the presence of commodity taxes, unless
           labor supply is fixed
         – the theory of the second best shows that it might be beneficial to establish
           a second distortion to counterbalance the distortionary effects of taxes
         – we have shown that indeed this is done by taxing complements of leisure
           higher (and by taxing goods with a higher income elasticity higher)
         – another idea would be to tax firms differently (e.g., according to their labor
           intensity)
         – the “Production Efficiency Theorem” states that production shouldn’t
           be taxed differently
     • importance of the Production Efficiency Theorem
         – the theorem was derived in a seminal paper by Diamond and Mirrlees (1971,
           AER)
         – it is “perhaps the most important result of the theory of taxation” (Stefan
           Homburg 2007, p. 181)
         – in the middle of a second-best world where “nothing can be said”, a positive
           and robust result emerges: don’t tax intermediate goods!
     • intuition of the production efficiency theorem
         – firms might be taxed differently by different taxes on intermediate goods
           (taxing firms differently across sectors is equivalent to taxing their inputs
           differently)
         – one might think that taxing (or subsidizing) firms according to the labor in-
           tensity of production might counterbalance the labor supply reducing effect
           of commodity taxation
         – but this implies that the marginal rates of technical substitution vary across
           firms (since input costs vary)
         – this brings the economy away from the production possibility frontier and
           can’t be beneficial
         – to frame it differently: introducing a second distortion in the commodity
           decision came at the price of moving us away from the optimal consumption
           point, but had the benefit of increasing production (moving the PPF to the
           upper right)
         – introducing a distortion on the production side has the costs of bringing us
           down from the PPF, but has no benefit
         – the deeper reason for this is that the tax system distorts the consumption
           decision (consumption vs. leisure), but not production decisions (labor vs. capital)
         – since there is no distortion, we can’t counterbalance it; the first best FOC
           holds
         – on a more abstract level this means: taxes should be located as close as
           possible to the objective function of the taxpayers
     • formal model
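         – suppose consumer goods are produced with two intermediate goods, in quantities
           v1 and v2, as inputs (with prices s1 and s2)
    – taxing the intermediate goods yields for profit maximization:
              $$\pi = q\,f(v_1, v_2) - s_1(1+t_1)\,v_1 - s_2(1+t_2)\,v_2$$
              $$\frac{\partial \pi}{\partial q} = f(v_1, v_2) + \frac{\partial \pi}{\partial v_1}\frac{\partial v_1}{\partial q} + \frac{\partial \pi}{\partial v_2}\frac{\partial v_2}{\partial q} = f(v_1, v_2) \tag{72}$$
              $$\frac{\partial \pi}{\partial t_z} = -s_z v_z + \frac{\partial \pi}{\partial v_1}\frac{\partial v_1}{\partial t_z} + \frac{\partial \pi}{\partial v_2}\frac{\partial v_2}{\partial t_z} = -s_z v_z, \qquad z = 1, 2 \tag{73}$$
    – this is because the second and third terms of both conditions are zero according
      to the Envelope Theorem (at the profit-maximizing input choice, ∂π/∂v1 = ∂π/∂v2 = 0)
    – since profits are zero both before and after the introduction of the taxes:
              $$d\pi = \frac{\partial \pi}{\partial q}\,dq + \frac{\partial \pi}{\partial t_z}\,dt_z = 0$$
              $$\frac{dq}{dt_z} = -\frac{\partial \pi/\partial t_z}{\partial \pi/\partial q} = \frac{s_z v_z}{f(v_1, v_2)} > 0 \tag{74}$$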
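Result (74) can be checked for a concrete technology. A minimal sketch, assuming a hypothetical Cobb-Douglas production function with constant returns (so that zero profits pin the output price q to the unit cost at tax-inclusive input prices) and using Shephard's lemma for the cost-minimizing input per unit of output; names and numbers are illustrative.

```python
A_SHARE = 0.4  # hypothetical exponent: f(v1, v2) = v1**A_SHARE * v2**(1 - A_SHARE)

def unit_cost(w1, w2, a=A_SHARE):
    # minimum cost of one unit of output for Cobb-Douglas technology with constant returns
    return (w1 / a) ** a * (w2 / (1 - a)) ** (1 - a)

def output_price(s, t, a=A_SHARE):
    # zero profits under constant returns: q equals unit cost at tax-inclusive input prices
    return unit_cost(s[0] * (1 + t[0]), s[1] * (1 + t[1]), a)

def inputs_per_unit(s, t, a=A_SHARE):
    # cost-minimizing v_z / f(v1, v2): cost share times unit cost over the input price
    q = output_price(s, t, a)
    w = [s[0] * (1 + t[0]), s[1] * (1 + t[1])]
    return [a * q / w[0], (1 - a) * q / w[1]]

def dq_dt_numeric(s, t, z, h=1e-6):
    # central finite difference of the zero-profit output price with respect to t_z
    up, dn = list(t), list(t)
    up[z] += h
    dn[z] -= h
    return (output_price(s, up) - output_price(s, dn)) / (2 * h)

s, t = [1.0, 2.0], [0.1, 0.05]   # hypothetical producer input prices and input tax rates
v = inputs_per_unit(s, t)
```

For both inputs, `dq_dt_numeric(s, t, z)` is positive and matches sz·vz/f, i.e. `s[z] * v[z]`, as (74) states.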

    – for the welfare analysis, we look at the case of a representative household
      who consumes two consumer goods (quantities x1 and x2, which can be taxed with
      τ1 and τ2), and at taxes on the intermediate goods (quantities v1 and v2) that are
      used to produce the two consumption goods
              $$\bar R = t_1 s_1 v_1 + t_2 s_2 v_2 + \tau_1 x_1 + \tau_2 x_2$$
              $$\mathcal{L} = V(p_1, p_2, y) + \lambda\big(t_1 s_1 v_1 + t_2 s_2 v_2 + \tau_1 x_1 + \tau_2 x_2 - \bar R\big) \tag{75}$$

    – FOCs are pretty messy and skipped here; substituting and rewriting them
      results in the condition:
              $$s_z v_z = s_z v_z + t_1 s_1\frac{\partial v_1}{\partial t_z} + t_2 s_2\frac{\partial v_2}{\partial t_z} \tag{76}$$
    – this condition is only fulfilled if t1 = t2 = 0
• Results:
    – intermediate goods shouldn’t be taxed
    – there is no need to complement a system of commodity (consumer good)
      taxes with taxes on intermediate goods
    – in contrast to the Ramsey rule, this result can be implemented directly
    – since tariffs on inputs can be interpreted as taxes on intermediate goods,
      this result is also an argument against tariffs on intermediates
• empirical interpretations
    – it was claimed above that the production efficiency theorem is one of the
      central if not the most important result of the theory of optimal taxation
     – so it is natural to ask: “has this been implemented in actual tax
       systems?”
     – Homburg argues that both the principle of taxing only value added (“Nettoprinzip”)
       and input tax deduction (“Vorsteuerabzug”) are in line with
       the production efficiency theorem
     – similarly, training can be seen as an intermediate good that doesn’t yield
       any utility directly, thus expenses for education and training should be
       deductible from taxes

IV      Optimal Income Taxation

     • historical role of income taxes
         – monetary income taxes appeared fairly late: in England in 1799, in Germany
            for the first time in 1869 (in Hessen), and in the US
         – note that in both the US and England this happened in the context of
            extreme revenue needs due to large wars
         – there are several reasons for this late appearance: first, an income tax
            requires that people earn monetary incomes
         – second, trade taxes, inflation taxes, and some consumption taxes are easier
            to collect
         – third, if only few people receive monetary incomes, they often have the
            political power to prevent income taxation
         – during the last decades, the share of income taxes in total revenues has
            been declining in most countries of the world (developed as well as less
            developed), while there was a remarkable increase in revenues from indirect
            (commodity) taxes, mainly through VAT
     • a short history of optimal income taxation literature
         – at least since the late 19th century, economists have argued for progressive
            taxation on the basis of fundamental principles
         – the English economists Francis Ysidro Edgeworth and Arthur Cecil Pigou were
            the main contributors in this field
         – both argued that diminishing marginal utility of consumption in combination
            with utilitarian (and other) welfare functions implies progressive taxation
         – with identical utility functions, diminishing marginal utility and a simple
            utilitarian welfare function (the unweighted sum of utilities), social
            welfare is maximized when disposable incomes are equalized, since only then
            are the marginal utilities of all households equal
         – both Edgeworth and Pigou ignored the incentive effect of taxation (the effect
            of income taxes on labor supply)
         – the “New Welfare Economics” of the 1930s argued that the question of whether
            taxes should be progressive is a philosophical one and limited itself to
            characterizing Pareto-efficient allocations
         – the first one to recognize the incentive effect of income taxation was the
            Scot James Mirrlees in his 1971 seminal article (for his “contributions to the
            economic theory of incentives under asymmetric information” he won the
            1996 Nobel prize together with William Vickrey)
         – in the 1980s, Joseph Stiglitz contributed much to what he calls the “New
            New Welfare Economics”
         – he argues that the fundamental problem in taxation is the lack of information
            that prevents the government from making lump-sum redistributions
         – this implies that the whole Ramsey-tradition analysis of commodity taxation
            is flawed, since there lump-sum taxation is excluded completely by
            assumption; in fact (argues Stiglitz), only household-specific lump-sum
            taxes are infeasible
     • general ideas of today’s optimal income tax literature
         – potential distortions of the consumption decision are neglected as commo-
            dity taxes are not available
         – only the work-leisure decision can be distorted by taxes
         – households are assumed to differ in their productivity and thus in their wa-
            ges, while they are identical in their utility functions
         – but tax authorities cannot observe productivity (in contrast to firms) and
            thus have to tax based on observed income
    – household-specific lump-sum taxes are not available, since tax authorities
       cannot identify households’ types (informational asymmetries rule them out)
    – but general lump-sum taxes are available in the sense that a positive tax
       burden can be combined with a zero marginal rate
    – the government wants to tax differentially to make income gaps smaller or
       even equalize incomes (justified, for example, with a Bergson-Samuelson-type
       welfare function)
    – differential tax burdens give households an incentive to change their
       behavior in order to reduce their tax burden (and this change reduces
       welfare)
    – this becomes the central problem in optimal income taxation analysis: how
       to prevent mimicking
    – if we could prevent mimicking costlessly, we could tax without distortion and
       equalize incomes without costs
    – but preventing mimicking is costly in terms of distortion (we have to introduce
       a positive marginal tax rate)
    – the additional distortion (besides reduced labor supply) arises from
       mimicking
    – in other words, people do not avoid taxes simply by working less (which would
       reduce their tax payment automatically), but by behaving like a different
       household in order to pay a lower “lump-sum” tax
• the income tax function available is highly flexible
    – in general, income taxes are variable in absolute value and at the margin
    – it is possible to tax a household positively while setting the marginal rate to
       zero
    – that is, the tax function can have any functional form
    – indeed, the optimal tax function in general is highly non-linear, not
       differentiable and very complex
    – it is hard to find any income tax in the world that is organized like this
       (indeed, it is hard to imagine building one in a democratic process of
       policy-making)
    – this assumption is partly made for analytical purposes (to clarify the pro-
       blem): even if we allow for (general) lump-sum taxes, distortions arise
• organization of part IV
    – first, the effects of marginal and absolute taxation on individual labor
       supply are analyzed (this is closely related to the analysis of the effects
       of wage changes on labor supply)
    – second, labor supply is assumed to be fixed and it is shown that income
       taxes are then lump-sum taxes and incomes can be equalized (if society
       wishes so) without affecting efficiency
    – third, under endogenous labor supply this is no longer the case: to avoid
       mimicking, disposable incomes cannot be equalized. This is the basic model
       of the optimal income taxation analysis.
    – fourth, the analysis is generalized to a continuum of households
    – fifth, cases of both more general and more restricted tax functions are looked
       at
    – sixth, the analysis is generalized by first allowing for home production and
       second looking at tax shifting
    – seventh, both income and commodity taxation are allowed for. It is shown
       that if utility functions are identical and separable, commodity taxes are not
       needed. If they are not, commodity taxes have to be used to obtain a
       second-best result.
     • this part of the script draws both on Laszlo Goerke’s lecture and on Joe
       Stiglitz’s article in the 1987 edition of the Handbook of Public Economics
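The Edgeworth-Pigou argument for equalization can be illustrated with a small grid search. This is a minimal sketch under assumptions that are not in the lecture: two people with the identical utility function u(y) = ln y, a fixed total disposable income Y (so that, exactly as Edgeworth and Pigou assumed, there is no incentive effect), and an unweighted utilitarian sum of utilities. The function name `utilitarian_split` is hypothetical.

```python
import math

# Assumed example: two people with identical u(y) = ln y share a fixed total
# disposable income Y; the utilitarian planner maximises u(y1) + u(y2).


def utilitarian_split(Y, steps=1000):
    """Return the welfare-maximising share of Y going to person 1 (grid search)."""
    best_share, best_welfare = None, -math.inf
    for k in range(1, steps):            # shares strictly inside (0, 1)
        s = k / steps
        welfare = math.log(s * Y) + math.log((1 - s) * Y)
        if welfare > best_welfare:
            best_share, best_welfare = s, welfare
    return best_share


print(utilitarian_split(100.0))  # prints 0.5: incomes are equalized
```

Because ln is strictly concave, any unequal split lowers the sum of utilities; with a fixed total, nothing prevents the planner from equalizing.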


1     Wages, Taxation, and Labor Supply
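     • totally differentiating u(x, L) results in an upward-sloping and convex indifference
       curve in the labor-consumption space (for a given wage w, equivalently in the
       income-consumption space)

$$ \frac{dx}{dL} = -\frac{\partial u/\partial L}{\partial u/\partial x} > 0, \qquad \frac{d^2 x}{dL^2} > 0 $$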
     • assume that households pay an income tax T(wL) depending on their wage earnings
       wL, which is weakly increasing in income and has a constant marginal rate T′
       (for example, a proportional tax)
     • maximizing the household’s utility with respect to consumption and labor under
       the budget constraint results in the condition that the (negative) marginal
       rate of substitution between consumption and labor has to equal the marginal
       net wage w(1 − T′) (since the price of x is normalized to unity)

$$ u_x - \lambda = 0 \tag{77a} $$
$$ u_L + \lambda\, w\,(1 - T') = 0 \tag{77b} $$
$$ wL - T - x = 0 \tag{77c} $$
$$ -\frac{u_L}{u_x} = w\,(1 - T') \tag{77d} $$

     • we want to solve the system for L and differentiate it with respect to T′, T,
       and w, but we can’t, since u is not specified
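     • by totally differentiating the three FOCs and solving the system (either by
       substituting or by using Cramer’s rule) we can get partial effects without
       solving explicitly for L

$$
\begin{pmatrix} u_{xx} & u_{xL} & -1 \\ u_{Lx} & u_{LL} & w(1-T') \\ -1 & w(1-T') & 0 \end{pmatrix}
\begin{pmatrix} dx \\ dL \\ d\lambda \end{pmatrix}
=
\begin{pmatrix} 0 & 0 & 0 \\ \lambda w & 0 & -\lambda(1-T') \\ 0 & 1 & -L(1-T') \end{pmatrix}
\begin{pmatrix} dT' \\ dT \\ dw \end{pmatrix}
\tag{78}
$$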

     • a pure increase of the marginal tax rate (holding tax level T constant)
       decreases labor supply, and changes in the level of taxation have am-
       biguous effects on labor supply
     • a (non-pure) fall in the marginal tax rate has a positive substitution effect and
       an ambiguous income effect (allowing for a change of the tax level due to the
       change in the marginal tax rate)
     • a rise in the wage rate is qualitatively the same as a fall in the marginal tax rate
     • as is well known, a rising wage rate has a positive substitution effect and an
       ambiguous income effect on labor supply
          – the price of leisure goes up relative to the price of consumption, which
            decreases leisure and thus increases labor supply
          – whether leisure increases due to the rising income depends on whether
            leisure is a normal good
          – if it is a normal good, the income effect on labor supply is negative (more
            income means more leisure, that is, less labor) while the substitution
            effect is positive (a higher relative price means less leisure, that is,
            more labor); thus the overall effect is ambiguous
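The comparative statics behind (78) can be illustrated numerically once a utility function is assumed. The sketch below is only an example: it assumes u(x, L) = ln x − L²/2 and a linear tax with marginal rate t and a lump-sum transfer b (so that the tax paid is T = twL − b); these functional forms and the function name `labor_supply` are hypothetical, not from the lecture. A pure rise in the marginal rate keeps the tax paid at the initial labor supply unchanged; a pure level change cuts b with t fixed.

```python
import math

# Assumed specification (not from the lecture): u(x, L) = ln x - L**2 / 2 and a
# linear tax with marginal rate t and lump-sum transfer b, i.e. T = t*w*L - b.


def labor_supply(w, t, b):
    """Optimal L for max ln(x) - L**2/2 subject to x = w*(1 - t)*L + b.
    The FOC w(1-t)/x = L gives w(1-t)L**2 + bL - w(1-t) = 0; take the positive root."""
    omega = w * (1.0 - t)  # marginal net wage w(1 - T')
    return (-b + math.sqrt(b * b + 4.0 * omega * omega)) / (2.0 * omega)


w, t0, b0 = 1.0, 0.2, 0.0
L0 = labor_supply(w, t0, b0)
tax0 = t0 * w * L0 - b0            # tax paid at the initial point

# Pure rise of the marginal rate: raise t and adjust b so that the tax paid at
# the initial labor supply L0 stays the same (only the slope of T changes).
t1 = 0.3
b1 = t1 * w * L0 - tax0
L_pure = labor_supply(w, t1, b1)

# Pure rise of the tax level: marginal rate unchanged, lump-sum transfer cut.
L_level = labor_supply(w, t0, b0 - 0.1)

# Wage rise with b = 0: income and substitution effects offset in this example.
L_wage = labor_supply(2.0 * w, t0, b0)

print(L0, L_pure, L_level, L_wage)
```

In this specification the pure rise in T′ lowers labor supply, the pure rise in the tax level raises it (leisure is a normal good here, so the income effect works towards more labor), and a wage rise with b = 0 leaves L at 1 because the two effects exactly offset. Other utility functions can resolve the ambiguous effects differently.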


2    Fixed Labor Supply
    • historically, in the analysis of optimal income taxation, labor supply has often
      been assumed to be fixed
    • obviously, and as mentioned before, with fixed labor supply there is no efficiency
      loss due to taxation because households can’t change their behavior: the income
      tax is a lump-sum tax
    • in this case, “the other” objective of society can be reached perfectly: if utilities
      have equal weight in the welfare function, disposable incomes are equalized

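$$
\begin{aligned}
u_i &= u(x_i) \\
\mathcal{L}(x_1, x_2) &= W(u_1, u_2) + \lambda\,(z_1 - x_1 + z_2 - x_2 - R) \\
\frac{\partial \mathcal{L}}{\partial x_i} &= \frac{\partial W}{\partial u_i}\,\frac{\partial u_i}{\partial x_i} - \lambda = 0 \\
\frac{\partial u_1}{\partial x_1} &= \frac{\partial u_2}{\partial x_2} \\
x_1 &= x_2
\end{aligned}
\tag{79}
$$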

    • this implies that the marginal tax rate for household 2 is 100% and for household
      1 is 50%: T′2 = 1, T′1 = 0.5
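    • with equalized disposable incomes x = (z1 + z2 − R)/2, household 1 pays
      T1 = z1 − x = (R − (z2 − z1))/2; thus, if the revenue requirement is smaller
      than the difference in gross incomes (R < z2 − z1), T1 becomes negative
      (household 1 receives a transfer)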
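The arithmetic of equalization with fixed labor supply is easy to check. A minimal sketch with hypothetical gross incomes (the numbers and the function name `equalize` are illustrative, not from the lecture):

```python
def equalize(z1, z2, R):
    """Fixed labor supply: choose equal disposable incomes x1 = x2 = x subject to
    the budget z1 - x1 + z2 - x2 = R; return (x, T1, T2) with T_i = z_i - x."""
    x = (z1 + z2 - R) / 2.0
    return x, z1 - x, z2 - x


# Hypothetical gross incomes z1 = 20 < z2 = 60.
print(equalize(20.0, 60.0, 10.0))  # (35.0, -15.0, 25.0): R < z2 - z1, T1 < 0
print(equalize(20.0, 60.0, 50.0))  # (15.0, 5.0, 45.0): R > z2 - z1, both pay
```

In the first case R = 10 is smaller than z2 − z1 = 40, so household 1 receives a transfer of 15; in the second case R = 50 exceeds the gap and both households pay tax.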


3    Variable Labor Supply
    • general idea
        – there are two households that differ in their wages and thus in their choices
          of consumption and labor supply
        – this assumption is equivalent to households that differ in the relative prices
          of two consumption goods
        – since the analysis involves equity issues, a social welfare function is maxi-
          mized
        – any income taxation is allowed: tax burden and marginal rate can be set
          freely for each individual household
        – these very flexible assumptions produce strange results for the optimal tax
          function (for example, zero marginal rates combined with a positive tax
          burden, which are hard to imagine and don’t exist empirically)
    • central findings
        – if labor supply is endogenous, the tax structure has to be set such that
          the high-wage household does not behave like (mimic) the low-wage
          household, since mimicking can’t be optimal
        – the high wage household is not taxed at the margin, while the low wage
          household faces positive marginal taxes
        – incomes are not equalized, but income gaps are reduced

3.1    Model

     • objective function
         – the government’s preferences are captured by a Bergson-Samuelson social
           welfare function W = W(u1, u2)
         – the concavity of the function can be interpreted as society’s aversion
           to inequality
         – determining the marginal conditions for Pareto efficiency would give qualita-
           tively the same results
     • two households
         – the two households differ in their productivity and thus in their wages, their
           labor supply and their incomes: w2 > w1 > 0, zi = wi Li
         – utility is a function of consumption and labor and utility functions are equal:
           ui = u(xi, Li) = u(xi, zi/wi)
         – totally differentiating the utility function shows that the indifference curves
           are increasing in the consumption-income space, that they are convex and
           that the high-productivity household has a flatter indifference curve:
           dxi/dzi = −uL/(wi ux)
     • tax system
         – wages are the only income source and households are taxed according to
           their gross income only; commodity taxes are not feasible
         – the government can observe neither the type (productivity) of a household
           nor the labor supplied, only gross earnings
         – thus taxes are based on gross income only: Ti = T(zi)
         – T is weakly increasing in z: T′ ≥ 0
         – gross income is zi = wi Li; all income is spent on one consumption good (or
           a bundle), so that xi = zi − T(zi)
         – the income tax structure is not specified; thus the budget constraint is
           formulated as R = z1 − x1 + z2 − x2
         – this effectively allows for lump-sum taxation, since it allows for a positive
           tax burden in combination with a zero marginal tax rate
         – the only remaining problem is that households cannot each be taxed in a
           lump-sum manner individually, since household characteristics cannot be
           observed (only gross income)
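The claim that the high-productivity household has the flatter indifference curve in (z, x)-space can be checked for a concrete utility function. The sketch assumes u(x, L) = ln x − L²/2 with L = z/w; the specification and the function name `ic_slope` are hypothetical.

```python
# Assumed specification (not from the lecture): u(x, L) = ln x - L**2 / 2, L = z / w.


def ic_slope(x, z, w):
    """Slope dx/dz of the indifference curve through (z, x) for wage w:
    dx/dz = -u_L / (w * u_x), here equal to x * z / w**2."""
    u_x = 1.0 / x      # marginal utility of consumption
    u_L = -(z / w)     # marginal utility of labor at L = z / w (negative)
    return -u_L / (w * u_x)


print(ic_slope(1.0, 1.0, 1.0), ic_slope(1.0, 1.0, 2.0))  # prints 1.0 0.25
```

With this specification the slope is xz/w², so at any common bundle household 2's indifference curve is flatter than household 1's by the factor (w1/w2)².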


3.2    Self-Selection Constraint

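     • it is assumed that the welfare function is sufficiently concave (its social
       indifference curves sufficiently convex), that is, the government tries to
       equalize disposable (after-tax) incomes; given the information problem, this
       rules out a first-best (undistorted) tax system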
     • households have two ways to reduce tax burden
          – reduce labor supply to reduce gross income and thus taxes (only attractive
             if taxes are strictly increasing with income)
          – mimic the other household to benefit from its lower taxes (only attractive if
             taxes are lower for the other household)
          – both changes of behavior reduce welfare since they occur only to save taxes
          – optimizing welfare can be understood as finding an efficient trade-off bet-
             ween these two negative effects
     • mimicking cannot be efficient
          – if a household is mimicking, it fixes its gross income at the level of the
             other household, and thus also its labor supply
          – that means its choice is not an interior solution of its utility
             maximization and thus can’t be optimal
          – graphically, the mimicking household’s indifference curve is not tangent
             to its budget constraint
     – in the case of mimicking and marginal rates of zero, taxes are effectively
       lump-sum, since there is no way to change behaviour that would reduce the
       tax burden
     – preventing the household from mimicking results in a Pareto improvement
     – graphically, starting from a mimicking solution and moving household 2
       along its (flatter) indifference curve to the right leaves both households
       unchanged in utility but increases tax revenues
     – in other words, a “self-selection equilibrium” is always Pareto-superior to a
       “pooling equilibrium” (this is only true in the case of two households)
• only the high-wage household 2 has an incentive to mimic
     – T2 ≥ T1 is assumed in the model (above)
     – household 1
         ∗ for household 1, mimicking household 2 would mean supplying more labor,
            that is, a loss in utility
         ∗ the additional consumption cannot compensate for this loss if the original
            values were chosen optimally
         ∗ in addition, her tax payment (weakly) increases, that is an additional
            decrease in utility
         ∗ overall utility has to decrease: there are no incentives to mimic
     – in contrast, household 2
         ∗ for household 2 there is a reduction in consumption when mimicking
            household 1
         ∗ that is insufficiently compensated by a reduction in labor
         ∗ but there might be a gain due to lower tax payments that makes the net
            effect of mimicking on utility positive
     – → thus, only household 2 has incentives to mimic
• Stiglitz (1987) argues like this:
     – maximizing a simple utilitarian welfare function requires that the marginal
       utilities of consumption are equal for both households (u1x = u2x), that is,
       consumption and thus disposable incomes are equalized
     – the marginal rate of substitution between consumption and leisure has to
       equal the wage (the goods price is normalized to unity): −uiL/uix = wi
     – this implies that the high-productivity household 2 has a higher marginal
       utility of leisure, which in turn implies that he supplies more labor
     – that is, household 2 is actually worse off (in absolute terms)
     – then, obviously, he has an incentive to mimic household 1
     – “From each according to his abilities, to each according to his needs” (Karl
       Marx in the “Kritik des Gothaer Programms”, Critique of the Gotha Programme)
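• the relevant self-selection constraint (SSC) thus is:

$$ u_2\!\left(x_2^*, \frac{z_2^*}{w_2}\right) \;\ge\; u_2\!\left(x_1^*, \frac{z_1^*}{w_2}\right) =: u_2^m\!\left(x_1^*, \frac{z_1^*}{w_2}\right) \tag{80} $$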

• in the following analysis the SSC is assumed to be binding; for this to hold,
  the welfare function has to be sufficiently concave (that is, it has to weight
  equality heavily)
• if the SSC were not binding, lump-sum taxation without distortions would be
  possible (indeed, within a limited range this is the case; a sufficiently concave
  welfare function makes sure that the welfare optimum doesn’t lie in this range)
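Stiglitz's argument that the utilitarian first best violates the SSC can be verified for a concrete case. The sketch assumes u(x, z; w) = ln x − (z/w)²/2, wages w1 = 1 and w2 = 2, an unweighted utilitarian welfare function and R = 0; these assumptions and all names in the code are illustrative, not from the lecture.

```python
import math

# Assumed specification (not from the lecture): u(x, z; w) = ln x - (z/w)**2 / 2,
# wages W1 = 1 < W2 = 2, unweighted utilitarian welfare, revenue requirement R = 0.
W1, W2 = 1.0, 2.0


def u(x, z, w):
    return math.log(x) - 0.5 * (z / w) ** 2


# First best (types observable): equal marginal utilities of consumption give
# x1 = x2 = x; the labor FOC z_i / w_i**2 = 1 / x gives z_i = w_i**2 / x; the
# budget 2x = z1 + z2 then yields x**2 = (W1**2 + W2**2) / 2.
x = math.sqrt((W1 ** 2 + W2 ** 2) / 2.0)
z1, z2 = W1 ** 2 / x, W2 ** 2 / x

u1_own = u(x, z1, W1)
u2_own = u(x, z2, W2)
u2_mimic = u(x, z1, W2)   # household 2 choosing household 1's bundle

print(round(u1_own, 3), round(u2_own, 3), round(u2_mimic, 3))  # prints 0.258 -0.342 0.408
```

Household 2 is worse off than household 1 at the first best (−0.342 < 0.258) and would gain by choosing household 1's bundle (0.408 > −0.342), so this first best is not self-selecting.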

3.3      First Results

     • some results can be obtained before starting the formal analysis
     • the SSC is binding by assumption (that is, λ1 below is positive)
     • the same is true for the budget constraint (λ2 is also positive)
     • marginal tax rates of 100 percent or higher can’t be optimal: at such rates a
       household loses no disposable income by reducing its labor supply, so it is
       strictly better off working less, while tax revenues fall
     • negative marginal tax rates cannot be optimal either: replacing them with
       direct (unconditional) transfers is Pareto-improving, since households then
       don’t have to increase their labor supply above the optimum and revenues are
       unaffected
     • a tax burden larger than gross income doesn’t make sense, since a household
       could always earn zero income and, by assumption, pay no tax on it
     • a negative tax burden might be optimal


3.4      Optimization

     • welfare is maximized under the self selection constraint and the government’s
       budget constraint

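$$ \mathcal{L} = W(u_1, u_2) + \lambda_1\,(u_2 - u_2^m) + \lambda_2\,(z_1 - x_1 + z_2 - x_2 - R) $$

$$
\begin{aligned}
\mathcal{L}(x_i, z_i, \lambda_j) = {} & W\!\left(u_1\!\left(x_1, \tfrac{z_1}{w_1}\right),\, u_2\!\left(x_2, \tfrac{z_2}{w_2}\right)\right) \\
& + \lambda_1 \left[ u_2\!\left(x_2, \tfrac{z_2}{w_2}\right) - u_2^m\!\left(x_1, \tfrac{z_1}{w_2}\right) \right] \\
& + \lambda_2\,(z_1 - x_1 + z_2 - x_2 - R)
\end{aligned}
\tag{81}
$$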

     • optimizing results in four FOCs (plus the constraints)

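$$ \frac{\partial \mathcal{L}}{\partial x_1} = \frac{\partial W}{\partial u_1}\frac{\partial u_1}{\partial x_1} - \lambda_1 \frac{\partial u_2^m}{\partial x_1} - \lambda_2 = 0 \tag{82a} $$
$$ \frac{\partial \mathcal{L}}{\partial z_1} = \frac{\partial W}{\partial u_1}\frac{\partial u_1}{\partial L_1}\frac{1}{w_1} - \lambda_1 \frac{\partial u_2^m}{\partial L_1}\frac{1}{w_2} + \lambda_2 = 0 \tag{82b} $$
$$ \frac{\partial \mathcal{L}}{\partial x_2} = \frac{\partial W}{\partial u_2}\frac{\partial u_2}{\partial x_2} + \lambda_1 \frac{\partial u_2}{\partial x_2} - \lambda_2 = 0 \tag{82c} $$
$$ \frac{\partial \mathcal{L}}{\partial z_2} = \frac{\partial W}{\partial u_2}\frac{\partial u_2}{\partial L_2}\frac{1}{w_2} + \lambda_1 \frac{\partial u_2}{\partial L_2}\frac{1}{w_2} + \lambda_2 = 0 \tag{82d} $$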

     • note that increasing the net income x1 of household 1 is costly (it makes the
       SSC more binding), since it makes mimicking more attractive for household 2,
       whereas increasing its gross income z1 relaxes the SSC (a mimicker would
       have to work more)
     • for household 2 it is the other way round: increasing x2 relaxes the SSC,
       while increasing z2 tightens it
     • obviously, increasing x is costly in terms of foregone government revenues,
       while the opposite is true for increasing z
     • under the specific budget constraint used here (taxes defined as difference bet-
       ween gross and net income), the tax structure is only defined implicitly in the
       optimality conditions
     • the conditions don’t define the entire tax function, but only characterize the
       conditions at the two points the households will choose
• solving the FOCs for household 2 gives:
$$ \frac{u_{2L}}{w_2\,u_{2x}} = -1 \qquad\Longleftrightarrow\qquad -\frac{u_{2L}}{u_{2x}} = \frac{w_2}{1} \tag{83} $$

    – this is the well-known result that the MRS (the ratio of the marginal utilities
      of two goods, here consumption and leisure) has to equal the ratio of prices
    – comparing with (77d) shows that this holds only for a zero marginal income tax,
      T′2 = 0 (we speak of the marginal tax rate at this point although, as
      mentioned above, the tax function will in general not be differentiable)
    – this is in line with the often derived result that any positive marginal income
      tax reduces labor supply
    – this result is identical to maximizing the income tax paid by a single
      household while holding its utility fixed (maximizing the vertical distance
      between an indifference curve and the 45-degree line in the z-x space)
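• the optimality condition for household 1 is less intuitive and pretty messy.
  Adding (82a) and (82b) gives (84) (subscripts x and L denote partial derivatives);
  again, we use (77d) to rewrite the bracket in the numerator as in (85)

$$ \frac{\lambda_1}{\partial W/\partial u_1} = \frac{u_{1x}\left(1 + \dfrac{u_{1L}}{w_1\,u_{1x}}\right)}{u^m_{2x}\left(1 + \dfrac{u^m_{2L}}{w_2\,u^m_{2x}}\right)} \tag{84} $$

$$ 1 + \frac{u_{1L}}{w_1\,u_{1x}} = 1 + (T_1' - 1) = T_1' \tag{85} $$

    – since the five other terms in (84) are positive, the bracket in the numerator,
      1 + u1L/(w1 u1x), has to be positive, too
    – by (85), this means that household 1 is taxed positively at the margin
    – in contrast, if the SSC were not binding, λ1 would be zero and the marginal
      tax rate would be zero, too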
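• combining the FOCs (82a) and (82c) of both households we can see that

$$ \frac{\partial W}{\partial u_1}\frac{\partial u_1}{\partial x_1} - \frac{\partial W}{\partial u_2}\frac{\partial u_2}{\partial x_2} = \lambda_1 \left( \frac{\partial u_2^m}{\partial x_1} + \frac{\partial u_2}{\partial x_2} \right) > 0 \tag{86} $$

    – given plausible assumptions about the welfare function and the utility
      functions (as in the last subsection: marginal utilities have the same weight,
      diminishing marginal utility of consumption), this implies that x2 > x1
    – that means, incomes are not equalized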
• results and interpretation
    – the marginal income tax for the high-productivity household is zero,
      while the marginal rate for the low-productivity household is positive
    – this is because there is a trade-off: higher marginal tax rates for household 1
      decrease its labor supply but at the same time relax the SSC and thus allow
      higher absolute taxation of household 2 (without inducing mimicking), so that
      income gaps are reduced
    – disposable incomes are not equalized (since this would cause mimicking)
    – if there is no problem with mimicking (meaning the SSC isn’t binding), there
      is no trade-off and neither household is taxed at the margin
    – the whole analysis is closely related to (second-degree) price discrimination
      by a monopolist who cannot observe its customers’ types
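The two-household results can be reproduced numerically. The following is a minimal sketch under assumed functional forms, not the lecture's method: u(x, z; w) = ln x − (z/w)²/2, wages w1 = 1 < w2 = 2, R = 0 and an unweighted utilitarian W; all names are hypothetical. For a given bundle of household 1, welfare, household 2's SSC and the budget all favour giving household 2 the highest attainable utility, so its gross income solves its own first-order condition 1/x2 = z2/w2² (mirroring (82c) and (82d)). The remaining search over household 1's bundle is a plain grid search that discards any allocation violating either SSC.

```python
import math

# Assumed specification (not from the lecture): u(x, z; w) = ln x - (z/w)**2 / 2,
# wages W1 < W2, revenue requirement R, unweighted utilitarian welfare u1 + u2.
W1, W2, R = 1.0, 2.0, 0.0


def utility(x, z, w):
    """Utility of a household with wage w from net income x and gross income z."""
    return math.log(x) - 0.5 * (z / w) ** 2


def marginal_tax(x, z, w):
    """Implicit marginal rate T' = 1 - x*z/w**2, using (77d): -u_L/(w u_x) = x*z/w**2."""
    return 1.0 - x * z / w ** 2


def bundle_type2(x1, z1):
    """Household 2's bundle given household 1's: maximise u2 subject to the budget
    x2 = (z1 - x1 - R) + z2. The FOC 1/x2 = z2/W2**2 is a quadratic in z2."""
    a = z1 - x1 - R
    z2 = (-a + math.sqrt(a * a + 4.0 * W2 ** 2)) / 2.0
    return a + z2, z2


def solve(step=0.005):
    """Grid search over household 1's bundle (x1, z1) subject to both SSCs."""
    best = None
    for i in range(1, int(2.0 / step)):
        x1 = i * step
        for j in range(int(1.5 / step)):
            z1 = j * step
            x2, z2 = bundle_type2(x1, z1)
            u1, u2 = utility(x1, z1, W1), utility(x2, z2, W2)
            if u2 < utility(x1, z1, W2) or u1 < utility(x2, z2, W1):
                continue  # one household would rather mimic the other
            if best is None or u1 + u2 > best[0]:
                best = (u1 + u2, x1, z1, x2, z2)
    return best


welfare, x1, z1, x2, z2 = solve()
print("T1' =", round(marginal_tax(x1, z1, W1), 3), "T2' =", round(marginal_tax(x2, z2, W2), 3))
print("x1 =", round(x1, 3), "x2 =", round(x2, 3), "T1 =", round(z1 - x1, 3))
```

The output should show T′2 = 0 (up to rounding), a strictly positive T′1 (solving the first-order conditions by hand for these parameters gives roughly 0.2; the grid value is only approximate), a negative tax payment for household 1 and x2 > x1, in line with the results above.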

4        Continuous Households
     •   the results derived for two households are not general
     •   it’s hard to derive any results for a model with many (n) households or a
         continuum of households
     •   one reason is that for n households there are n(n − 1) SSCs that have to hold
     •   in particular, no general rule of decreasing marginal tax rates can be
         derived
     •   instead, T′ in general behaves non-monotonically and is not differentiable
     •   moreover, often a partial pooling equilibrium is optimal
     •   often, quasi-linear preferences are assumed, but even for this special case, very
         little can be said (the most often cited result, for example, is that the
         highest-productivity household shouldn’t be taxed at the margin)
     •   under these preferences, very low incomes won’t be taxed at the margin either
     •   the optimal marginal tax rate T′ depends on four variables; it is lower
             – the larger the fraction of the population that pays that marginal tax rate
             – the smaller the shadow price of the SSC λ (which is hard to analyze and, in
               the case of a utilitarian welfare function, first increases and then
               decreases). This result implies that the marginal rate for the
               highest-earning household should be zero.
             – the higher the wage of the tax payers at that marginal rate
             – the more elastically labor supply responds
     •   mathematical problems
             – a convex tax function (which is the case at least over some range if tax
               rates for poor households are negative) induces randomized wage payments
             – circumstances where the tax function is non-differentiable correspond
               precisely to those where a partial pooling equilibrium is optimal
             – if the tax function is partially convex (and since indifference curves are
               also convex), there might be multiple tangencies
     •   nevertheless, Mirrlees estimated his model empirically for the US and calculated
         an optimal tax function that was remarkably close to linear


5        Different Tax functions
     • this section of the script draws almost exclusively on Stiglitz (1987)
     • the tax function T, or “tax schedule”, relates before-tax (gross) to after-tax (net)
       income
     • one of the central lessons of the last decades of taxation literature is that
       what is optimal depends crucially on the assumption about which types of taxes
       are allowed (compare the different results of Ramsey-type analysis, where no
       lump-sum taxes are allowed, and Stiglitz-type analysis, where only
       household-specific lump-sum taxes are ruled out)
     • so far, we have at the same time limited the tax function strongly and allowed it
       to be very flexible
          – the tax function was limited because we allowed it only to be a function of
            the wage income: T = T(wL)
          – this excluded, for example, random taxation
          – the tax function was very general because any functional form was allowed
            (indeed, it was shown that the optimal income tax is highly non-linear)
          – in the real world, we observe tax functions that are much simpler, either
            for practical reasons (administration, collection and monitoring costs) or for
            political reasons (negative marginal rates for high-income earners wouldn’t
            be easy to argue for in a democracy)
  • in this section, both generalizations and restrictions are discussed



5.1   Random Taxation

  • in ex ante randomization the government assigns individuals randomly to one of
    two tax functions
  • in ex post randomization the individuals are assigned only after they have an-
    nounced their productivity
  • ex ante randomization is always beneficial if the welfare as a function of tax
    revenues W(R) is convex
  • ex post randomization is beneficial, for example, if household 2 is much more
    risk averse than household 1
  • the chance of losing much (when paying the high taxes) makes mimicking less
    attractive for household 2 (because earning little and paying much is a frightening
    prospect for household 2, but much less so for household 1)
  • note that if individuals are risk averse, ex post randomization has the costly
    effect of imposing risk on both households
  • random taxation violates the principle of horizontal equity
  • one interesting result of the debate is that the principle of horizontal equity may
    in fact be inconsistent with Pareto efficiency (which raises doubts about Pareto
    efficiency, too)
  • a second lesson is that it is not a trivial question what the set of available taxes
    is (and results depend crucially on this decision)
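One way to read the condition for ex ante randomization is as an application of Jensen's inequality. As a minimal sketch under an illustrative setup that is not taken from the lecture, assume the lottery yields revenue R_1 or R_2 with probability 1/2 each, while the deterministic alternative yields the average revenue. If W(R) is convex, expected welfare under the lottery is at least as high:

```latex
\tfrac{1}{2}\,W(R_1) + \tfrac{1}{2}\,W(R_2) \;\ge\; W\!\left(\tfrac{1}{2}R_1 + \tfrac{1}{2}R_2\right),
\qquad \text{with strict inequality if } W \text{ is strictly convex and } R_1 \neq R_2 .
```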



5.2   Linear Tax

  • not only are empirically observed income taxes much simpler than the highly
    non-linear optimal tax, but most of the debate in recent years has been about
    simplifying them further (both in Germany and the US, flat taxes, i.e. linear
    income taxes, are discussed)
  • problems of non-linear taxes
       – income averaging (for example, intertemporally or between couples) becomes
         an issue
       – the unit of taxation becomes important
       – as noted above, decreasing marginal rates (convex tax functions) provide
         incentives to pay random wages
       – taxing at the source is much more difficult (if there is more than one source
         of income)
       – administrative costs
       – for these reasons, and because highly non-linear or random taxation is hard
         to make politically feasible, it might be reasonable to focus on linear taxes
  • optimal linear income tax
       – compared to other issues of the subject, this problem is a fairly simple one
       – all households receive a lump-sum payment α and pay a marginal rate T on
         all income z, so that consumption is

         $$c = \alpha + (1 - T)\,z \tag{87}$$

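       – for a continuum of households distributed according to F(ω) over their
         productivity ω, the optimization problem is as follows (V denotes indirect
         utility as a function of the net wage and the lump-sum payment α):

         $$\mathcal{L} = \int W\big(V(\omega(1-T), \alpha)\big)\,dF(\omega) + \lambda\left[\int T\,\omega L(\omega)\,dF(\omega) - \alpha - R\right] \tag{88}$$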

     • three general results
         – for R = 0, the optimal tax entails α > 0, which implies that T > 0: the
           deadweight loss due to marginal taxation is more than compensated by the
           welfare gain from income redistribution (obviously, this also holds for R ≤ 0)
         – if R is very large, α becomes negative (and if R becomes so large that an
           increase of T decreases revenues, α has to cover all additional revenue needs)
         – the optimal income tax can be written as a remarkably simple formula, where
           Y denotes gross income ωL:

           $$\frac{T}{1-T} = -\,\frac{\operatorname{cov}\!\left(W_V V_\alpha/\lambda + T\omega L_\alpha,\; Y\right)}{\int Y\,\varepsilon^H_L\,dF} \tag{89}$$

             ∗ $W_V V_\alpha/\lambda$ is the social marginal value of income: the marginal
               utility of income $V_\alpha$, multiplied by the marginal welfare of utility
               $W_V$, relative to the marginal value λ of government revenue; adding the
               revenue effect $T\omega L_\alpha$ gives the net social marginal value of income
             ∗ $L_\alpha$ is the change of labor supply due to a change in the lump-sum
               payment, that is, how labor supply reacts to a pure income effect (its sign
               is not determined in general)
             ∗ $\varepsilon^H_L$ is the compensated elasticity of labor supply
             ∗ the covariance can be seen as a marginal measure of inequality
             ∗ thus, the marginal tax rate should be higher the larger this measure of
               marginal inequality and the smaller the weighted average of the compensated
               elasticities of labor supply
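To make the first general result tangible, here is a minimal numerical sketch of the optimal linear tax (87)–(88). It is not the lecture's model: the preferences U = c − L²/2 (so labor supply is L = (1 − T)ω, with no income effect), the welfare function W = Σ log U, the three-household productivity grid and the helper names `linear_tax_outcome` and `optimal_linear_tax` are all hypothetical choices made for illustration. The lump-sum payment α is pinned down by the government budget, and T is found by a simple grid search.

```python
import math


def linear_tax_outcome(T, omegas, R=0.0):
    """Lump-sum payment alpha and utilities under c = alpha + (1 - T) * z.

    Hypothetical preferences U = c - L**2 / 2: labor supply is
    L = (1 - T) * omega and there is no income effect.
    """
    labors = [(1.0 - T) * w for w in omegas]
    incomes = [w * l for w, l in zip(omegas, labors)]   # gross income z = omega * L
    alpha = (T * sum(incomes) - R) / len(omegas)        # budget: n * alpha + R = T * sum(z)
    utils = [alpha + (1.0 - T) * z - l ** 2 / 2.0 for z, l in zip(incomes, labors)]
    return alpha, utils


def optimal_linear_tax(omegas, R=0.0, steps=100):
    """Grid search over T in {0, 1/steps, ..., (steps - 1)/steps}.

    Social welfare is assumed to be W = sum(log U), a concave transformation
    of individual utilities that rewards redistribution.
    """
    best = None
    for i in range(steps):
        T = i / steps
        alpha, utils = linear_tax_outcome(T, omegas, R)
        if min(utils) <= 0.0:          # log welfare undefined; skip this T
            continue
        welfare = sum(math.log(u) for u in utils)
        if best is None or welfare > best[0]:
            best = (welfare, T, alpha)
    return best[1], best[2]


T_star, alpha_star = optimal_linear_tax([1.0, 2.0, 3.0])
print(f"T* = {T_star:.2f}, alpha* = {alpha_star:.3f}")
```

With unequal productivities the search returns T* > 0 together with a positive lump-sum payment, in line with the first result for R = 0; with identical households it returns T* = 0, because the tax then only creates a deadweight loss without any redistributive gain.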


6     Additional Generalizations
6.1    Home Production

     • home production is modeled here as a final consumption good c that is produced
       with labor and a commodity x (labor is used either in market production, to earn
       the income x = ωL that buys the commodity, or in home production)
     • productivity in home production h differs from market productivity ω, but is
       proportional to it: h = kω
     • the trick of the model is that the home production function (Cobb-Douglas) is
       chosen such that, apart from this relationship (h = h(ω)), there is no way
       to infer home productivity from the labor supply decision (again: this is an
       artifact of the production function used)

       $$c = h\,x^{\alpha}(1-L)^{1-\alpha} = h\,\omega^{\alpha}L^{\alpha}(1-L)^{1-\alpha} \tag{90}$$

       $$L^* = \alpha \tag{91}$$
     • for a certain range of α, this implies redistribution towards the high productivity
       individuals
     • the intuition for this result is not worked out here
     • Stiglitz argues that since there is much stronger social agreement on redistribution
       than the utilitarian ethic implies, the utilitarian approach is a questionable
       guide to policy
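The step from (90) to (91) can be sketched as follows, assuming (as the setup suggests but does not spell out) that in the absence of taxes the household chooses L to maximize c = h ω^α L^α (1 − L)^(1−α). Taking logs and setting the derivative with respect to L to zero:

```latex
\ln c = \ln h + \alpha \ln \omega + \alpha \ln L + (1-\alpha)\ln(1-L)

\frac{\partial \ln c}{\partial L} = \frac{\alpha}{L} - \frac{1-\alpha}{1-L} = 0
\quad\Longrightarrow\quad L^* = \alpha
```

Neither h nor ω appears in the first-order condition, so every household supplies the same labor L* = α; this is the sense in which the labor supply decision reveals nothing about home productivity.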



6.2    Tax shifting

    • much of traditional tax theory (and part II of this script) has dealt with tax
      incidence, that is, tax shifting
    • this has been ignored completely in the analysis of optimal taxation so far
    • implicitly, it was assumed that before-tax prices and wages are not affected by
      taxation
    • if we allow for tax shifting (that is, endogenize the before-tax incomes), there is
      a new channel for redistribution!
    • taxes can then redistribute not only by transferring income from one household
      to another, thereby changing disposable incomes, but also by changing the market
      outcome in the first place
    • it turns out that if the labor of households with different productivity is not
      perfectly substitutable (which seems to be a plausible assumption), the marginal
      tax rate on the most productive household should be negative
    • the smaller the elasticity of substitution, the higher the marginal tax rate for the
      low productivity household (although it is always positive)
    • that is, the less substitutable different types of labor are, the more the govern-
      ment relies on general equilibrium effects for redistribution (in the extreme case
      of perfect substitutes, general equilibrium effects cannot be used and we’re back
      in the standard analysis)




7     Commodity Taxation in an Atkinson-Stiglitz framework
    • the theory of the second best implies that it might be beneficial to introduce a
      second distortion (by taxing commodities differently) to counterbalance the first
      distortion (due to income taxation / mimicking)
    • to make the analysis interesting, we have to introduce a second good and allow
      for differentiated taxation
    • the budget constraint reads $R = z^1 + z^2 - x^1_1 - x^2_1 - x^1_2 - x^2_2$, where
      superscripts denote households and subscripts denote goods
    • that means, the tax structure is not restricted at all: everything is allowed for
        – “shopping center entrance fee” taxes (commodity taxes that are zero at the
           margin but positive in absolute value)
        – differentiated taxes on the same good for different households
        – commodity taxes that depend on the consumption quantities of this good or
           other goods, too
        – mimicking applies to commodity taxes, too: if household 2 behaves like
           household 1, she also pays $t^1_1$ and $t^1_2$
    • the government maximizes welfare:

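         $$\mathcal{L} = W(U^1, U^2) + \lambda_1\,(U^2 - U^2_m) + \lambda_2\,(R - \bar{R})$$

         $$\begin{aligned}
         \mathcal{L}(x^j, z^j, \lambda) ={}& W\!\left(U^1\!\left(x^1_1, x^1_2, \tfrac{z^1}{\omega^1}\right),\; U^2\!\left(x^2_1, x^2_2, \tfrac{z^2}{\omega^2}\right)\right)\\
         &+ \lambda_1\left[U^2\!\left(x^2_1, x^2_2, \tfrac{z^2}{\omega^2}\right) - U^2_m\!\left(x^1_1, x^1_2, \tfrac{z^1}{\omega^2}\right)\right]\\
         &+ \lambda_2\left[z^1 + z^2 - x^1_1 - x^2_1 - x^1_2 - x^2_2 - \bar{R}\right]
         \end{aligned} \tag{92}$$

         where $U^2_m$ is household 2's utility when mimicking household 1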


    • the six FOCs (plus the constraints) resemble the conditions of the one-good
      analysis in (82):

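         $$\frac{\partial\mathcal{L}}{\partial x^1_j} = \frac{\partial W}{\partial U^1}\frac{\partial U^1}{\partial x^1_j} - \lambda_1\frac{\partial U^2_m}{\partial x^1_j} - \lambda_2 = 0 \tag{93a}$$

         $$\frac{\partial\mathcal{L}}{\partial x^2_j} = \frac{\partial W}{\partial U^2}\frac{\partial U^2}{\partial x^2_j} + \lambda_1\frac{\partial U^2}{\partial x^2_j} - \lambda_2 = 0 \tag{93b}$$

         $$\frac{\partial\mathcal{L}}{\partial z^1} = \frac{\partial W}{\partial U^1}\frac{\partial U^1}{\partial L^1}\frac{1}{\omega^1} - \lambda_1\frac{\partial U^2_m}{\partial L}\frac{1}{\omega^2} + \lambda_2 = 0 \tag{93c}$$

         $$\frac{\partial\mathcal{L}}{\partial z^2} = \frac{\partial W}{\partial U^2}\frac{\partial U^2}{\partial L^2}\frac{1}{\omega^2} + \lambda_1\frac{\partial U^2}{\partial L^2}\frac{1}{\omega^2} + \lambda_2 = 0 \tag{93d}$$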

     • again, increasing $x^j$ increases welfare while increasing $z^j$ reduces it due to
       higher labor supply; increasing $x^1$ or $z^2$ tightens the SSC, while increasing
       $x^2$ or $z^1$ relaxes it (a mimicker would have to work more to earn $z^1$); and
       increasing $x^j$ tightens the BC while the opposite is true for increasing $z^j$
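     • combining the three optimality conditions for household 2 results in

         $$\frac{\partial U^2}{\partial x^2_1} = \frac{\partial U^2}{\partial x^2_2} =: \mu^2 \tag{94}$$

         $$-\frac{\partial U^2/\partial L^2}{\omega^2} = \mu^2 \tag{95}$$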

         – the first formulation states that marginal utilities for all goods have to be the
            same, as derived in the model with one good in (3.4)
         – this implies that goods for household 2 can only be taxed at the same mar-
            ginal rate (which might or might not be zero)
         – the second condition implies that income cannot be taxed at the margin
            (recall (77d))
     • to interpret the conditions for household 1, we have to assume identical and
       separable utility functions: $U\big(h(x^j), z/\omega\big)$

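         $$\frac{\partial h^1/\partial x^1_1}{\partial h^1/\partial x^1_2}
           = \frac{\partial U^1/\partial x^1_1}{\partial U^1/\partial x^1_2}
           = \frac{W_1\,\partial U^1/\partial x^1_1}{W_1\,\partial U^1/\partial x^1_2}
           = \frac{\lambda_1\,\partial U^2_m/\partial x^1_1 + \lambda_2}{\lambda_1\,\partial U^2_m/\partial x^1_2 + \lambda_2}
           = \frac{\lambda_1 U^m_h\,\partial h^1/\partial x^1_1 + \lambda_2}{\lambda_1 U^m_h\,\partial h^1/\partial x^1_2 + \lambda_2} \tag{96}$$

         where $h^1 = h(x^1_1, x^1_2)$, $W_1 = \partial W/\partial U^1$ and $U^m_h$ is the mimicker's
         marginal utility of h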


         – the equation has to be read from inner to outer equality signs
         – the left and the right expression are only equal if the MRS is not affected by
           taxation, which implies that both commodities have to be taxed at the same
           marginal rate for household 1
     • results and interpretation
         – there is no need for commodity taxation if utility functions are iden-
           tical and separable
         – this is because commodity taxes cannot help making the SSC less binding
         – the results derived for optimal income taxation hold: T1 > 0 and
           T2 = 0
         – if the assumption of identical and separable utility functions is dropped,
           however, the good valued highly by household 1 should be taxed at a higher
           rate when consumed by household 1
         – this makes mimicking less attractive, since it makes the labor-consumption
           bundle of household 1 less attractive for household 2
 • note that the results of Ramsey-type commodity taxation analysis are completely
   different from the results of this (Atkinson-Stiglitz) type of analysis
 • this stems solely from the fact that Ramsey excluded all lump-sum taxes, while
   here only household-specific lump-sum taxes are excluded
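The step from the outer equality in (96) to "no differentiated commodity taxes for household 1" can be spelled out. Write a_j for ∂h¹/∂x¹_j and κ for λ₁ times the mimicker's marginal utility of the subutility h (shorthand introduced only here). With identical and separable utility, the mimicker's ∂U/∂x¹_j equals that marginal utility times a_j, because she consumes household 1's bundle. Assuming a binding budget constraint (λ₂ > 0):

```latex
\frac{a_1}{a_2} = \frac{\kappa\,a_1 + \lambda_2}{\kappa\,a_2 + \lambda_2}
\;\Longrightarrow\; \kappa\,a_1 a_2 + \lambda_2\,a_1 = \kappa\,a_1 a_2 + \lambda_2\,a_2
\;\Longrightarrow\; a_1 = a_2
```

Since both goods enter the budget constraint with a producer price of one, a_1 = a_2 means that household 1's marginal rate of substitution equals the producer price ratio, i.e. its commodity choice is left undistorted.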



V      Tax Evasion

     • Tax evasion, tax avoidance and change of relative prices
         – tax evasion is often defined as “violations of the law” or “illegal and intentio-
           nal actions to reduce tax obligations” while tax avoidance is changing one’s
           behavior to reduce taxes within the legal framework
         – but the distinction - and the separation from normal consumption and input
           adjustments due to changing relative prices - is not that clear-cut
         – the border between legal and illegal in taxes is often the outcome of a
           bargaining process and determined by courts
         – avoidance defined as “behavior that reduces taxes while leaving the con-
           sumption basket unchanged” (as some authors do) runs into problems if the
           income effect of tax avoidance causes a change in relative quantities con-
           sumed
         – here, tax evasion is defined as risky: if it is not detected, it reduces tax
           payments, while being caught means paying more taxes than initially owed
     • costs of tax evasion
         – in the literature tax evasion is generally seen as a bad (welfare reducing)
           action
         – direct costs are caused by the reduction of provision of public goods that
           reduces welfare of all consumers (while the evader’s utility increases)
         – (obviously, if the provision of public goods was excessive, evasion might
           increase welfare)
         – both sides - tax authorities and evaders - spend real resources to detect
           evasion and prevent evasion, respectively
         – tax authorities have to adjust the tax system to prevent evasion; this
           means another constraint is added that moves the system further away from
           Pareto efficiency and welfare optimum
         – tax evasion causes by definition uncertainty, which reduces welfare in a
           world of risk-averse households
         – further negative effects might arise, say eroding belief in the legal system or
           negative consequences on the political culture or on the voluntary provision
           of local public goods (of course, these are not further investigated here)
     • measurement and empirical estimations
         – inherently hard to measure due to incentives not to declare evasion openly
         – in the US, the tax authorities estimate that 16% of the legal tax burden is
           evaded
         – from this, only 16% is detected and recovered
         – 80% of evasion comes through underdeclaration of incomes, the rest through
           overdeclaration of expenditures
         – only 1% of taxes on wages and salaries are evaded, but 43% of business
           incomes
     • legal situation in Germany
         – there is a distinction between “Steuerstraftaten” (criminal tax offences) and
           “Steuerordnungswidrigkeiten” (administrative tax offences)
         – prison sentences are hardly ever imposed (although up to 10 years are possible
           for professionally committed tax evasion): the highest sentence was probably
           3.5 years for Steffi Graf’s father Peter
         – the maximal fine is 1.8 million euros (360 “Tagessätze”, i.e. daily rates,
           times 5000 euros)
        – fines are set by judges, which means that there is no clear-cut rule for the
          fine (in contrast to the model employed below)


1    Basic model
    • tax evasion is modeled as a rational gamble of risk-averse households (tax evasion
      by firms is modeled only slightly differently, since firms are often assumed to be
      risk-neutral)
    • households pay a linear tax of the form $T(y) = (y - t_0)t$, where $t_0$ is a tax
      exemption and t is a constant tax rate
    • households decide on the fraction α of their income that they declare
    • if not caught, they receive income $y^e = y - T(\alpha y) = y - (y\alpha - t_0)t$
    • they are detected with probability z (first assumed as exogenous, later endoge-
      nized)
    • if detected, they have to pay the full amount of taxes plus a fine $Fy(1 - \alpha)t^\beta$
    • the fine depends on the income not declared (for β = 0) or on the amount of taxes
      evaded (for β = 1); for intermediate β it depends on both, though not as a linear
      combination
    • if detected, income is $y^d = y - T(y) - Fy(1 - \alpha)t^\beta$
    • risk aversion is modeled by assuming strictly concave utility functions
    • the household maximizes expected utility (a von Neumann-Morgenstern utility
      function u is assumed) with respect to α
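                $$EU = (1 - z)\,u(y^e) + z\,u(y^d)$$

                $$EU = (1 - z)\,u\big(y - (y\alpha - t_0)t\big) + z\,u\big(y - (y - t_0)t - Fy(1 - \alpha)t^\beta\big) \tag{97}$$

                $$\frac{1}{yt}\,\frac{\partial EU}{\partial \alpha} = -(1 - z)\,u'(y^e) + z\,u'(y^d)\,F t^{\beta-1} = 0 \tag{98}$$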
    • the cost of declaring more is a higher tax payment if not detected (first term),
      the gain of declaring more is a lower fine in the case of detection (second term)
    • corner solutions
        – α is set to unity (which implies $y^e = y^d$) if $zFt^{\beta-1} + z - 1 \ge 0$
        – this implies that a high detection probability z or a high fine F leads to
           declaration of the full income (which is pretty intuitive)
        – the other corner solution (α = 0) cannot be characterized as neatly
    • the second derivative is always negative for a linear tax if households are
      strictly risk averse
    • totally differentiating (97) gives the indifference curve in the ($y^e$, $y^d$) space:

                $$\frac{dy^d}{dy^e} = -\,\frac{(1 - z)\,u'(y^e)}{z\,u'(y^d)} < 0 \tag{99}$$
    • the indifference curve is always decreasing (and strictly convex if households are
      strictly risk averse)
    • graphical analysis
         – the feasible combinations of $y^e$ and $y^d$ form a line in the ($y^e$, $y^d$)
           space (“feasibility line”)
         – the line has the slope $-Ft^{\beta-1}$ (independently of $t_0$)
         – it is a line (constant slope) because both $y^e$ and $y^d$ are linear in α
         – in the case of (risk-neutral) firms the “indifference curve” is linear, so that
           the feasibility constraint has to be modeled concavely (by a convex fine
           function)
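The basic model can be solved numerically once a utility function is chosen. The sketch below is an illustration under assumptions not made in the lecture: CRRA utility with u'(c) = c^(−ρ), ρ = 2, and the hypothetical parameter values y = 100, t = 0.3, t₀ = 0, F = 1.5, β = 1, z = 0.2; the function name `declared_share` is likewise hypothetical. It first checks the full-declaration corner condition and otherwise finds the root of the normalized FOC (98) by bisection, which works because that expression falls in α under strict risk aversion.

```python
def declared_share(y=100.0, t=0.3, t0=0.0, F=1.5, beta=1.0, z=0.2, rho=2.0, tol=1e-12):
    """Optimal declared share alpha* in the basic evasion model (97)-(98).

    Hypothetical choice: CRRA marginal utility u'(c) = c**(-rho); the lecture
    only assumes a strictly concave u.
    """
    def u_prime(c):
        return c ** (-rho)

    def foc(alpha):
        # normalized first-order condition (98)
        y_e = y - (alpha * y - t0) * t                               # not detected
        y_d = y - (y - t0) * t - F * y * (1.0 - alpha) * t ** beta   # detected
        return -(1.0 - z) * u_prime(y_e) + z * u_prime(y_d) * F * t ** (beta - 1.0)

    if z * F * t ** (beta - 1.0) + z - 1.0 >= 0.0:   # corner: full declaration
        return 1.0
    if foc(0.0) <= 0.0:                               # corner: declare nothing
        return 0.0
    lo, hi = 0.0, 1.0   # foc falls in alpha under strict risk aversion
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


base = declared_share()
print(round(base, 4), round(declared_share(F=2.0), 4), round(declared_share(z=0.3), 4))
```

Raising F or z in the call moves α* up, and for z = 0.5 the corner condition zFt^(β−1) + z − 1 ≥ 0 holds, so full declaration is returned directly.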



2     Comparative Statics
     • in this section, we analyze how $\alpha^*$ changes when F, z, y, t and $t_0$ change
     • that is, we are interested how the voluntarily declared fraction of the income
       varies for changes in parameters
     • analytical procedure
          – we want to solve for $\alpha^*$ and differentiate the term with respect to F, z,
            y, t and $t_0$
          – but since we haven’t specified the utility functions, we cannot solve for α ∗
            (this is a pattern that shows up over and over again when working with
            unspecified functions; see section (1), but there we had a system of FOCs)
          – instead, we differentiate ∂EU/ ∂α
          – for a change in a parameter, the EU-function changes
          – we ask: what is the slope of the new EU-function at the point α ∗ ?
          – if it is positive, the new $\alpha^*$ has to lie to the right, that is, $\alpha^*$
            rises (the opposite is true if the slope is negative)
          – we always assume an interior solution (0 < α < 1)
     • central findings
          – pretty obviously, $\alpha^*$ rises for a higher F and z
          – the effect of a rising y, t or $t_0$ depends on risk aversion and is often
            ambiguous
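The procedure described above is an application of the implicit function theorem. As a sketch, with p standing for any of the parameters F, z, y, t or t₀ (the symbol p is introduced only here), differentiating the first-order condition ∂EU/∂α = 0 at α* gives:

```latex
\frac{d\alpha^*}{dp}
  = -\,\frac{\partial^2 EU/\partial\alpha\,\partial p}{\partial^2 EU/\partial\alpha^2},
\qquad
\frac{\partial^2 EU}{\partial\alpha^2} < 0
\;\Longrightarrow\;
\operatorname{sign}\frac{d\alpha^*}{dp} = \operatorname{sign}\frac{\partial^2 EU}{\partial\alpha\,\partial p}
```

The sign is the same when the right-hand side of (98) is differentiated instead, since it equals ∂EU/∂α divided by yt > 0: at the optimum ∂EU/∂α = 0, so the derivative of the factor 1/(yt) with respect to p is multiplied by zero and drops out.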


2.1    Fine F

     • differentiating (98) with respect to F results in:

             $$\frac{\partial(\partial EU/\partial\alpha^*)}{\partial F} = z t^{\beta-1}\big[u'(y^d) - u''(y^d)\,Fy(1 - \alpha)t^\beta\big] > 0 \tag{100}$$
     • a higher fine increases the gains of honesty because of two effects
          – each unit of additional declaration saves a larger fine if detected (first
            term in brackets)
          – income in the case of detection is lower, which raises the marginal utility
            of income in that state (second term in brackets); this effect relies on the
            assumption of strict risk aversion
          – the costs of honesty (first term in (98)) are unaffected, since higher fines
            have no effect if the household is not detected
     • graphically, an increase in F makes the feasibility line steeper
     • differentiating (97) with respect to F shows that the level of expected utility is
       reduced unambiguously; this makes perfect sense, since a higher fine cannot make
       the household better off
     • empirical testing is hard, since what matters are the incentive effects of fines
       as perceived by households, and most people don’t know how large the fines are
     • Result: an increase in F raises both $\alpha^*$ (the share) and $y\alpha^*$ (the amount)
       of declared income


2.2    Detection Probability z

     • differentiating (98) with respect to z results in:

             $$\frac{\partial(\partial EU/\partial\alpha^*)}{\partial z} = u'(y^e) + u'(y^d)\,F t^{\beta-1} > 0 \tag{101}$$
  • a higher detection probability increases the gains of honesty and reduces the
    costs of honesty
       – the gain (paying a smaller fine) is more probable
       – the cost (higher tax payment) is less probable
  • graphically, an increase of z is making the indifference curve flatter, as differen-
    tiating (99) makes clear
  • differentiating (97) with respect to z shows that the level of expected utility is
    reduced unambiguously: more controls cannot make the household better off
  • empirical testing is hard, since what matters is the detection probability as
    perceived by households, and most people don’t know how high the probability of
    controls and detection is
  • further, people don’t even know what z depends on (high incomes, unusual
    declarations, randomness, ...), and tax authorities don’t reveal it so as not to
    facilitate tax evasion
  • Result: an increase in z raises both $\alpha^*$ (the share) and $y\alpha^*$ (the amount)
    of declared income


2.3   Income y

  • recall that y was exogenously given
  • the effect of a change in y on α ∗ is a lot less obvious than the effect of changing
    F or z
  • differentiating (98) with respect to y gives an ambiguous result:

      $$\frac{\partial(\partial EU/\partial\alpha^*)}{\partial y} = -(1 - z)\,u''(y^e)(1 - \alpha t) + z\,u''(y^d)\big(1 - t - F(1 - \alpha)t^\beta\big)F t^{\beta-1} \tag{102}$$

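  • we substitute for $Ft^{\beta-1}$ according to (98) and factor out $(1 - z)\,u'(y^e)$ to get

      $$\begin{aligned}
      EU_{\alpha y} &= -(1 - z)\,u'(y^e)\left[\frac{u''(y^e)}{u'(y^e)}(1 - \alpha t) - \frac{u''(y^d)}{u'(y^d)}\big(1 - t - F(1 - \alpha)t^\beta\big)\right]\\
      &= (1 - z)\,u'(y^e)\left[r(y^e)(1 - \alpha t) - r(y^d)\big(1 - t - F(1 - \alpha)t^\beta\big)\right]
      \end{aligned} \tag{103}$$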

  • absolute risk aversion
       – $r = -u''/u'$ is the Arrow-Pratt measure of absolute risk aversion
       – if r is constant or rising with income, α ∗ will rise with income
       – if r falls with income, the change of α ∗ is not determined
  • relative risk aversion
       – to get the relative measure of risk aversion $r_r = yr$, we multiply by y and
         set $t_0$ to zero (so that y times the terms in parentheses equals $y^e$ and $y^d$,
         respectively)

             $$y\,EU_{\alpha y} = (1 - z)\,u'(y^e)\left[r_r(y^e) - r_r(y^d)\right] \tag{104}$$

       – if $r_r = -yu''/u'$ is increasing with income, $\alpha^*$ will rise with income
       – if $r_r$ is constant with income, $\alpha^*$ will be unchanged if income changes
       – if $r_r$ is decreasing with income, $\alpha^*$ will fall with income
  • Results
      – the effects of a rising income on the share of taxes declared depend on the
        change of risk aversion due to the income rise
      – for increasing risk aversion, the share will increase
       – for decreasing relative risk aversion, the share will decrease (for decreasing
         absolute risk aversion, the effect is ambiguous)
         – for constant risk aversion, the result depends on how risk aversion is
           measured: constant absolute risk aversion implies an increasing share,
           constant relative risk aversion a constant share
         – the results do make some sense: if risk aversion increases, households are
           less willing to gamble and thus declare a higher share voluntarily to tax
           authorities
         – the opposite is true if risk aversion decreases
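The difference between constant relative and constant absolute risk aversion can be checked numerically. This is a sketch under assumptions not in the lecture: t₀ = 0, β = 1, hypothetical values t = 0.3, F = 1.5, z = 0.2, and two hypothetical utility functions, CRRA with ρ = 2 and CARA with absolute risk aversion 0.02 (only their marginal utilities are needed). The helper `declared_share` solves the normalized FOC (98) by bisection and handles both corners.

```python
import math


def declared_share(u_prime, y, t=0.3, t0=0.0, F=1.5, beta=1.0, z=0.2, tol=1e-12):
    """alpha* solving the normalized FOC (98) by bisection (interior or corner)."""
    def foc(alpha):
        y_e = y - (alpha * y - t0) * t
        y_d = y - (y - t0) * t - F * y * (1.0 - alpha) * t ** beta
        return -(1.0 - z) * u_prime(y_e) + z * u_prime(y_d) * F * t ** (beta - 1.0)

    if foc(1.0) >= 0.0:   # full declaration is optimal
        return 1.0
    if foc(0.0) <= 0.0:   # declaring nothing is optimal
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if foc(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)


def crra_prime(c):   # hypothetical CRRA utility: relative risk aversion 2
    return c ** -2.0


def cara_prime(c):   # hypothetical CARA utility: absolute risk aversion 0.02
    return math.exp(-0.02 * c)


for y in (100.0, 200.0):
    print(y, round(declared_share(crra_prime, y), 4), round(declared_share(cara_prime, y), 4))
```

With CRRA utility the declared share is the same at both income levels, as (104) predicts for constant relative risk aversion; with CARA utility it rises with income, as (103) predicts for constant absolute risk aversion.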


2.4    Tax rate t

     • the linear tax system can be changed by changing the tax exemption $t_0$ or by
       changing the marginal tax rate t
     • here, a change of t is analyzed, in the subsequent subsection t0 is changed
     • changing t and t0 simultaneously in such a way that R remains constant can be
       interpreted as a change of the progressivity of the tax system
     • both the effects of a change of $t_0$ and, even more so, the effects of a change
       of t depend on how the fine is defined, that is, on the size of β

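        $$\begin{aligned}
        \frac{\partial(\partial EU/\partial\alpha^*)}{\partial t} ={}& (1 - z)\,u''(y^e)(\alpha y - t_0)\\
        &- zFt^{\beta-1}u''(y^d)\big(y - t_0 + y(1 - \alpha)\beta F t^{\beta-1}\big) + zFt^{\beta-2}u'(y^d)(\beta - 1)
        \end{aligned} \tag{105}$$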

     • income and substitution effect
         – there is an income and a substitution effect
         – income effect: raising the tax rate changes income in both states of the
            world, and thus the marginal utilities
         – substitution effect: for β < 1, the fine payment (proportional to $t^\beta$)
            rises less than proportionally with t, while the utility loss due to a higher
            declaration is proportional to t
     • further restrictions: no tax exemption and β = 1
         – restricting β to 1 implies that the fine is a function of taxes evaded only:
            Fy(1 − α)t
         – this also implies that there is no substitution effect, since both fine and utility
            loss increase proportionally to t
         – mathematically, the double restriction is needed to relate       to the FOC (98),
            so that we can derive the Arrow-Pratt measures of risk aversion
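         – with t0 = 0 and β = 1, (105) simplifies to the expression below; the second step
            uses the FOC (98) and the Arrow-Pratt measure of absolute risk aversion
            r(y) = −U''(y)/U'(y)

            $$\begin{aligned}\frac{\partial(\partial EU/\partial\alpha^*)}{\partial t} &= (1-z)\,U''(y^e)\,\alpha y - zF\,U''(y^d)\,y\bigl(1+(1-\alpha)F\bigr)\\ &= (1-z)\,y\,U'(y^e)\bigl[\bigl(1+(1-\alpha)F\bigr)\,r(y^d) - r(y^e)\,\alpha\bigr]\end{aligned} \tag{106}$$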

     • Results
         – for β = 1 and t0 = 0, a higher tax rate t causes the share α* to rise if absolute
           risk aversion is decreasing or constant
         – for β = 1 and t0 > 0, a higher tax rate t causes the share α* to rise if absolute
           risk aversion is constant (if it varies, no general conclusion can be drawn)
         – if β = 0, the effect of a higher t on α* is ambiguous
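
A minimal numerical sketch of the first result (β = 1, t0 = 0). It assumes a specification reconstructed to match (105) and (106) rather than quoted from the lecture: y^e = y − tαy, y^d = y − ty − Fy(1 − α)t, with z read as the detection probability. For CARA utility with coefficient a, the FOC can be solved in closed form, α* = 1 − ln((1 − z)/(zF)) / (a·t·y·(1 + F)), which is increasing in t; CRRA utility with ρ = 2 (decreasing absolute risk aversion) is solved by bisection. Function names and parameter values are hypothetical.

```python
import math

# ASSUMED specification (reconstructed to match (105)-(106), not quoted from
# the lecture), with beta = 1 and t0 = 0:
#   y_e = y - t*alpha*y,  y_d = y - t*y - F*y*(1 - alpha)*t,  Z = detection probability
# Dividing the FOC by t*y leaves -(1 - Z)*U'(y_e) + Z*F*U'(y_d) = 0.
Y, Z, F = 100.0, 0.2, 1.5  # hypothetical parameters
A = 0.05                   # hypothetical CARA coefficient


def mu_cara(c):
    """U'(c) for U(c) = -exp(-A*c): constant absolute risk aversion A."""
    return A * math.exp(-A * c)


def mu_crra(c):
    """U'(c) for U(c) = -1/c (CRRA, rho = 2): absolute risk aversion 2/c falls with c."""
    return c ** -2.0


def foc(alpha, t, mu):
    """FOC divided by t*y; positive means declaring more raises expected utility."""
    y_e = Y - t * alpha * Y
    y_d = Y - t * Y - F * Y * (1 - alpha) * t
    return -(1 - Z) * mu(y_e) + Z * F * mu(y_d)


def alpha_star(t, mu, tol=1e-12):
    """Bisection on [0, 1], assuming foc > 0 near alpha = 0 and foc < 0 at alpha = 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid, t, mu) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def alpha_cara_closed_form(t):
    """CARA: the FOC reduces to A*t*Y*(1 - alpha)*(1 + F) = ln((1 - Z)/(Z*F))."""
    return 1 - math.log((1 - Z) / (Z * F)) / (A * t * Y * (1 + F))


rates = [0.2, 0.25, 0.3, 0.35]
for t in rates:
    print(t, round(alpha_star(t, mu_cara), 4), round(alpha_cara_closed_form(t), 4),
          round(alpha_star(t, mu_crra), 4))
```

Under this assumed specification with β = 1, y^e − y^d = ty(1 − α)(1 + F) does not depend on t0, so the CARA solution is the same for t0 > 0, in line with the second result.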


2.5    Tax exemption

     • a change in t0 is very similar to a change in income y
     • for increasing absolute risk aversion, the share α* rises with the tax exemption t0
     • for constant absolute risk aversion, it remains constant
     • for decreasing absolute risk aversion, it falls with a rising tax exemption
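
The three cases can be reproduced numerically. This is a sketch under an assumed specification (reconstructed to match (105) and (106), not quoted from the lecture) with β = 1: y^e = y − t(αy − t0) and y^d = y − t(y − t0) − Fy(1 − α)t, so raising t0 adds t·Δt0 to income in both states, like an increase in income. Quadratic utility stands in for increasing, exponential for constant, and CRRA for decreasing absolute risk aversion; these functional forms, the helper names and all parameter values are hypothetical choices.

```python
import math

# ASSUMED specification with beta = 1 (reconstructed, not quoted from the lecture):
#   y_e = y - t*(alpha*y - t0),  y_d = y - t*(y - t0) - F*y*(1 - alpha)*t
Y, T, Z, F = 100.0, 0.3, 0.2, 1.5  # hypothetical parameters, Z = detection probability


def mu_quadratic(c, b=1 / 110):
    """U(c) = c - b*c**2/2 (valid for c < 1/b): ARA b/(1 - b*c) rises with c."""
    return 1 - b * c


def mu_cara(c, a=0.05):
    """U(c) = -exp(-a*c)/a: constant absolute risk aversion a."""
    return math.exp(-a * c)


def mu_crra(c, rho=2.0):
    """U(c) = c**(1 - rho)/(1 - rho): ARA rho/c falls with c."""
    return c ** -rho


def foc(alpha, t0, mu):
    """FOC divided by t*y; positive means declaring more raises expected utility."""
    y_e = Y - T * (alpha * Y - t0)
    y_d = Y - T * (Y - t0) - F * Y * (1 - alpha) * T
    return -(1 - Z) * mu(y_e) + Z * F * mu(y_d)


def alpha_star(t0, mu, tol=1e-12):
    """Bisection on [0, 1], assuming an interior optimum."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid, t0, mu) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


cases = [("increasing ARA", mu_quadratic),
         ("constant ARA", mu_cara),
         ("decreasing ARA", mu_crra)]
for name, mu in cases:
    print(name, round(alpha_star(0.0, mu), 4), round(alpha_star(20.0, mu), 4))
```

With these numbers all incomes stay below 1/b = 110, so the quadratic utility remains increasing over the relevant range.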

VI      On this script
     • this script was written during the winter term 2007/08 and reflects the structure
       of the lecture during this term
     • it should be understood as a complement to Goerke’s lecture notes and the
       additional literature rather than a substitute: my idea was to compile a short
       summary that can be used in class and for looking up formulas and results quickly
     • I highly recommend reading the following additional literature
          – Stefan Homburg’s (2007) Allgemeine Steuerlehre should be read in advance
             to get some intuition and empirical examples (it’s easy reading, and it’s
             in German)
          – Lipsey and Lancaster’s (1956) short article in the Review of Economic
             Studies, The General Theory of Second Best, should be read before starting
             with optimal taxation theory
          – Joe Stiglitz’ (1987) article in the Handbook of Public Economics should be
             read after working through section IV of the lecture since it is rather technical
             and has a broader scope; nevertheless it is incredibly rich and helpful
          – the books by Salanié (2003), Kotlikoff & Summers (1987), and Myles (1995)
             are pretty technical and didn’t help me much
     • I tried to stick as closely as possible to the notation used in class, but sometimes
       I do deviate (with some justification, I believe):
          – more often than not, household-specific items (goods, factors) are indexed
             with a superscript while firm-specific items are indexed with subscripts. I did
             this to avoid double subscripts as much as possible.
          – sometimes I omit indexes altogether
          – sometimes I use 2 households (or firms) instead of n
     • the ordering and naming of sections and subsections is close to the structure of
       the lecture, but not identical
     • the script is written in TeX; the code is available on request at lion.hirth@gmail.com
     • hyperref features are included in the .pdf version, so you can jump to sections
       and equations by clicking on their numbers, and you can use the bookmark tree to
       jump to sections quickly
     • the script is far from free of errors, ranging from typos and bad translations to
       layout problems and fundamental misunderstandings; if you find errors, please
       write me an email!
     • the layout is optimized for printing out two pages on a sheet with odd pages on
       the left
     • This document is published under the GFDL. That means you can do whatever you
       want with it (copy it, change it, distribute it, ...) as long as your work is released
       under the same open license.

				