IRS Publication #1299

STATISTICS OF INCOME

Special Studies in Federal Tax Statistics, 2006

Innovative Uses of Longitudinal Panels, Information Documents, and Time-Series Analysis To Study the Impact of the U.S. Tax System
Measuring, Monitoring, and Evaluating Internal Revenue Service Data
Broad Quality Issues in Organizations
Survey-Based Estimation
Tax Benefits and Administrative Burdens, Recent Research from the IRS
Statistical Dissemination and Communication

Special Studies in Federal Tax Statistics 2006
Selected Papers Given in 2006 at Annual Meetings of the American Statistical Association and Two Other Professional Conferences

Compiled and Edited by James Dalton and Martha Eller Gangi*
Statistics of Income Division
Internal Revenue Service

*Prepared under the direction of Thomas B. Petska, Director, Statistics of Income Division

Preface

This is the sixth edition of the IRS Methodology Report series Special Studies in Federal Tax Statistics. The papers included in this volume were presented in 2006 at the Joint Statistical Meetings of the American Statistical Association (ASA) held in Seattle, Washington; the National Tax Association’s Annual Conference on Taxation held in Boston, Massachusetts; and the United Nations Statistical Commission and Economic Commission for Europe Conference of European Statisticians held in Geneva, Switzerland.

Nine of the articles in this volume were prepared by their authors for publication in the 2007 Proceedings of the American Statistical Association (ASA). Therefore, the format conforms basically to that required by the ASA, except that we have not imposed a strict page limit. Hence, in some cases, additional explanatory material may be included that is not available in the Proceedings.

The contents of the papers included here are the responsibility of the authors, who followed ASA’s peer review guidelines for Proceedings papers and then sought additional comments from colleagues either within the SOI Division or elsewhere within IRS. Views expressed are also the responsibility of the authors and do not necessarily represent the views of the Treasury Department or the Internal Revenue Service.

Content

This year’s compilation has been divided into six areas of interest:

• The volume begins with four papers on the innovative uses of longitudinal panels, information documents, and time-series analysis;
• The second section presents three papers on IRS samples, surveys, and performance measurements;
• The third section contains a paper on tying Web site performance to mission achievement;
• The fourth section includes a paper on strategies to estimate a measure of heteroscedasticity;
• The fifth section contains three papers on special tax provisions for family-owned farms and closely held businesses, corporation life cycles, and the Free File Program; and
• The final section presents a paper on improving customer utility on a centrally administered, shared Web site.

Acknowledgments

The editors of this collection, James Dalton and Martha Eller Gangi, would like to thank Paul Bastuscheck, Heather Lilley, and Lisa Smith for their invaluable contribution in laying out all the papers in this volume, and Bobbie Vaira for her assistance in the publishing process.

Thomas B. Petska
Director, Statistics of Income Division
Internal Revenue Service
March 2007

Contents

Preface
1 Innovative Uses of Longitudinal Panels, Information Documents, and Time-Series Analysis To Study the Impact of the U.S. Tax System
  Analysis of the Distributions of Income, Taxes, and Payroll Taxes via Cross-Section and Panel Data, 1979-2004, by Michael Strudler, Tom Petska, Lori Hentz, and Ryan Petska
  Social Security Taxes, Social Security Benefits, and Social Security Benefits Taxation, 2003, by Peter Sailer, Kevin Pierce, and Evgenia Lomize
  The Tax Year 1999-2003 Individual Income Tax Return Panel: A First Look at the Data, by Michael E. Weber
  Creativity and Compromise: Constructing a Panel of Income and Estate Tax Data for Wealthy Individuals, by Barry W. Johnson and Lisa M. Schreiber
2 Measuring, Monitoring, and Evaluating Internal Revenue Service Data
  Monitoring Statistics of Income (SOI) Samples, by Joseph Koshansky
  Customer Satisfaction Initiatives at IRS’s Statistics of Income: Using Surveys To Improve Customer Service, by Ruth Schwartz and Beth Kilss
  Performance Measurement within the Statistics of Income Division, by Kevin Cecco
3 Broad Quality Issues in Organizations
  Tying Web Site Performance to Mission Achievement in the Federal Government, by Diane M. Milleville
4 Survey-Based Estimation
  Comparing Strategies To Estimate a Measure of Heteroscedasticity, by Kimberly Henry and Richard Valliant
5 Tax Benefits and Administrative Burdens, Recent Research from the IRS
  Factors in Estates’ Utilization of Special Tax Provisions for Family-Owned Farms and Closely Held Businesses, by Martha Eller Gangi, Kimberly Henry, and Brian G. Raub
  Corporation Life Cycles: Examining Attrition Trends and Return Characteristics in Statistics of Income Cross-Sectional 1120 Samples, by Matthew L. Scoffic
  An Analysis of the Free File Program, by Michelle S. Chu and Melissa M. Kovalick
6 Statistical Dissemination and Communication
  Standing Out in a Crowd: Improving Customer Utility on a Centrally Administered, Shared Web Site, by Barry W. Johnson
Index of IRS Methodology Reports on Statistical Uses of Administrative Records

Special Studies in Federal Tax Statistics, 2006 Online

Special Studies in Federal Tax Statistics, 2006 is available online on the IRS Internet site at: http://www.irs.gov/taxstats/productsandpubs/article/0,,id=168008,00.html.

1 Innovative Uses of Longitudinal Panels, Information Documents, and Time-Series Analysis To Study the Impact of the U.S. Tax System

Analysis of the Distributions of Income, Taxes, and Payroll Taxes via Cross-Section and Panel Data, 1979-2004

Michael Strudler, Tom Petska, and Lori Hentz, Internal Revenue Service, and Ryan Petska, Ernst and Young LLP

Different approaches have been used to measure the distribution of individual income over time. Survey data have been compiled with comprehensive enumeration, but underreporting of incomes, inadequate coverage at the highest income levels, and omission of some key sources of income jeopardize the validity of results. Administrative records, such as income tax returns, may be less susceptible to underreporting of income, but they exclude certain nontaxable income types and can be inconsistent in periods when the tax law has been changed. Record linkage studies have capitalized on the advantages of both approaches, but they are costly and severely restricted by the laws governing interagency data sharing.

This paper is the seventh in a series examining trends in the distribution of individual incomes and tax burdens based on a consistent and comprehensive measure of income derived from individual income tax returns [1]. In the previous papers, we demonstrated that the shares of income accounted for by the highest income-size classes clearly have increased over time, and we also demonstrated the superiority of our comprehensive and consistent income measure, the 1979 Retrospective Income Concept, particularly in periods of tax reform. In this paper, we continue the analysis of individual income and tax distributions, adding Social Security and Medicare taxes for 8 years (1996-2003) and using panel data for the same period.

The paper has three sections. In the first section, we briefly summarize this measure of individual income, derived as a “retrospective concept” from individual income tax returns. In the second section, we present the results of our analysis of time-series data. We conclude with an examination of Gini coefficients computed from these data.

Derivation of the Retrospective Income Concept

The tax laws of the 1980s, 1990s, and early 2000s made significant changes to both the tax rates and the definitions of taxable income. The tax reforms of 1981 and 1986 significantly lowered individual income tax rates, and the latter also substantially broadened the income tax base. The tax law changes effective for 1991 and 1993 initiated rising individual income tax rates and further modifications to the definition of taxable income [2]. Law changes effective for 1997 substantially lowered the maximum tax rate on capital gains. The newest law changes, beginning for 2001, lowered marginal rates and the maximum tax rate on long-term capital gains, and also decreased the maximum rates for most dividends. With all of these changes, the question arises: what has happened to the distribution of individual income, the shares of taxes paid, and average taxes across the various income-size classes?

To analyze changes in income and taxes over time, consistent definitions of income and taxes must be used. However, the Internal Revenue Code has been substantially changed in the last 26 years: both the concept of taxable income and the tax rate schedules have been significantly altered. The most commonly used income concept available from Federal income tax returns, Adjusted Gross Income (AGI), has changed over time, making it difficult to use AGI for intertemporal comparisons of income. For this reason, an income definition that would be both comprehensive and consistent over time was developed [3]. The 1979 Retrospective Income Concept was designed to include the same income and deduction items in every year, using items available on Federal individual income tax returns. Tax Years 1979 through 1986 were used as base years to identify the income and deduction items, and the concept was subsequently applied to later years, including the same components common to all years.

The calculation of the 1979 Retrospective Income Concept includes several items partially excluded from AGI for the base years, the largest of which was capital gains [4]. The full amounts of all capital gains, as well as all dividends and unemployment compensation, were included in the income calculation. Total pensions, annuities, IRA distributions, and rollovers were added, including nontaxable portions that were excluded from AGI. Social Security benefits (SSB) were omitted because they were not reported on tax returns until 1984. Also, any depreciation in excess of straight-line depreciation, which was subtracted in computing AGI, was added back.

For this study, retrospective income was computed for all individual income tax returns in the annual Statistics of Income (SOI) sample files for the period 1979 through 2004. Loss returns were excluded, and the tax returns were tabulated into income-size classes based on the size of retrospective income and ranked from highest to lowest. Percentile thresholds were estimated or interpolated for income-size classes ranging from the top 0.1 percent to the bottom 20 percent [5]. For each size class, the number of returns and the amounts of retrospective income and taxes paid were compiled. From these data, income and tax shares and average taxes were computed for each size class for all years.

The Distribution of Income and Taxes

With this database, we sought to answer the following question: have the distribution of individual incomes (i.e., income shares), the distribution of taxes (i.e., tax shares), and the average effective tax rates (i.e., tax burdens) changed over time?

As a first look at the data, we examined the income thresholds at the bottom (or entry level) of each income-size class, and a clear pattern emerged. While all of the income thresholds have increased over time, the largest increases, both in absolute terms and on a percentage basis, were for the highest income-size classes. For example, $233,539 was needed to enter the top 0.1 percent for 1979, and $1,639,047 was needed for entry into this class for 2004, an increase of more than 600 percent. Also, $79,679 of retrospective income was needed to enter the top 1-percent size class for 1979, and $363,905 was needed for entry into this size class for 2004, an increase of 357 percent. For the top 20 percent, the threshold increased by 179 percent, and, for the bottom 20 percent, the increase was only 139 percent.

Since much of this increase is attributable to inflation, we computed constant-dollar thresholds using the Consumer Price Index [6]. What is most striking about these data is the change between 1979 and 2004 for the various income-size percentile thresholds (see Figure A). For example, the threshold for the top 0.1 percent grew (using a 1982-1984 base) from $321,679 for 1979 to $867,680 for 2004, an increase of 170 percent. Similarly, the threshold for taxpayers in the 1-percent group rose from $109,751 for 1979 to $192,644 for 2004, an increase of just over 75 percent. However, the thresholds for each lower percentile class show smaller increases in the period; the top 20-percentile threshold increased only 7.2 percent, and the 40-percent and all lower thresholds declined.

Figure A: Constant Dollar Income Thresholds, 1979-2004 (1982-84=100) (series: Top 0.1%, Top 1%, Top 5%, Top 10%, Top 20%)

Figure B: Income Shares by Income Percentile Size Classes, 1979-2004 (percent; series: Top 0.1%, 0.1-1%, 1-10%, 10-20%, Bottom 80%)

Income Shares

The share of income accounted for by the top 1 percent of the income distribution climbed steadily from a low of 9.58 percent (3.28 percent for the top 0.1 percent) for 1979 to a high of 21.55 percent (10.49 percent for the top 0.1 percent) for 2000. With the recession and then the stagnating economy of 2001 and 2002, this share declined for 2 years, but it has since increased to 19.65 percent (9.06 percent for the top 0.1 percent) for 2004. While this increase has been mostly steady, there were some significantly large jumps, particularly for 1986, due to a surge in capital gain realizations after the passage, but prior to the implementation, of the Tax Reform Act of 1986 (TRA). The top 1-percent share also increased rapidly for 1996 through 2000, when sales of capital assets also grew considerably each year. Notable declines in the top 1-percent share occurred in the recession years of 1981, 1990-1991, and 2001.
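The threshold comparisons earlier in this section come down to two calculations: deflating each nominal entry threshold by the Consumer Price Index, and taking a percentage change. The Python sketch below reproduces the quoted figures for the top 0.1- and 1-percent classes. The paper does not list the index values it used; the CPI-U annual averages assumed here (72.6 for 1979 and 188.9 for 2004, on a 1982-84=100 base) are supplied for illustration, although they do reproduce the constant-dollar amounts given in the text.

```python
# Sketch: constant-dollar income thresholds and percentage increases.
# ASSUMPTION: CPI-U annual averages on a 1982-84 = 100 base; these values are
# not given in the paper, but they reproduce its constant-dollar thresholds.
CPI = {1979: 72.6, 2004: 188.9}

# Nominal entry thresholds (dollars) quoted in the text.
NOMINAL_THRESHOLDS = {
    "top 0.1 percent": {1979: 233_539, 2004: 1_639_047},
    "top 1 percent": {1979: 79_679, 2004: 363_905},
}


def to_constant_dollars(amount: float, year: int) -> float:
    """Express a nominal amount in 1982-84 dollars."""
    return amount / (CPI[year] / 100.0)


def pct_increase(start: float, end: float) -> float:
    """Percentage increase from start to end."""
    return (end / start - 1.0) * 100.0


if __name__ == "__main__":
    for group, by_year in NOMINAL_THRESHOLDS.items():
        real_start = to_constant_dollars(by_year[1979], 1979)
        real_end = to_constant_dollars(by_year[2004], 2004)
        print(
            f"{group}: nominal increase {pct_increase(by_year[1979], by_year[2004]):.1f}%; "
            f"constant dollars ${real_start:,.0f} -> ${real_end:,.0f} "
            f"(increase {pct_increase(real_start, real_end):.1f}%)"
        )
```

Under these assumed index values, the script reports nominal increases of about 601.8 and 356.7 percent and constant-dollar increases of about 169.7 and 75.5 percent, consistent with the rounded figures in the text.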
This pattern of an increasing share of total income is mirrored in the 1-to-5-percent class, but to a considerably lesser degree. For this group, the income share increased from 12.60 percent to 15.19 percent in this period. The 5-to-10-percent class’s share of income held fairly steady over this period, going from 10.89 percent for 1979 to 10.99 percent for 2004. The shares of the lower percentile-size classes, from the 10-to-20-percent class to the four lowest quintiles, show declines in shares of total income over the 26-year period (see Figure B).

Tax Shares: Income Tax

The share of income taxes accounted for by the top 1 percent also generally climbed during this period. It was 19.75 percent (7.38 percent for the top 0.1 percent) for 1979, declined to a low of 17.42 percent (6.28 percent for the top 0.1 percent) for 1981, and then rose to 36.30 percent (18.70 percent for the top 0.1 percent) for 2000 (see Figure C). Accounting for the 2000 tax rebate, which is discussed below, the corresponding 2000 percentages for the 1-percent and 0.1-percent groups are 37.68 percent and 19.44 percent, respectively. For the recession year of 2001 and the subsequent year (2002), with its large decline in net gains from the sale of capital assets, these shares declined to 32.53 percent for the top 1 percent and 15.06 percent for the top 0.1-percent group (32.95 percent and 15.25 percent, respectively, including a rebate of a portion of the child tax credit). These shares have since increased to 35.73 percent for the top 1-percent group and 17.16 percent for the top 0.1 percent.
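The shares and average rates in this section come from tabulating weighted SOI records into percentile classes. A simplified Python sketch of that tabulation follows. The record fields and the class list are hypothetical, and, unlike the paper, which estimated or interpolated percentile thresholds, the sketch assigns each record wholly to one class rather than splitting records that straddle a class boundary.

```python
from dataclasses import dataclass


@dataclass
class TaxReturn:
    income: float  # retrospective income (hypothetical field name)
    tax: float     # income tax (hypothetical field name)
    weight: float  # sampling weight: number of filed returns the record represents


# (label, upper bound on the weighted share of returns ranked above a record), highest first.
CLASSES = [("top 1%", 0.01), ("1-5%", 0.05), ("5-10%", 0.10), ("10-20%", 0.20), ("bottom 80%", 1.0)]


def tabulate(returns: list[TaxReturn]) -> dict[str, dict[str, float]]:
    """Entry threshold, income share, tax share, and average tax rate for each class."""
    # Exclude loss returns, then rank from highest to lowest income.
    ranked = sorted((r for r in returns if r.income >= 0), key=lambda r: r.income, reverse=True)
    total_weight = sum(r.weight for r in ranked)
    total_income = sum(r.weight * r.income for r in ranked)
    total_tax = sum(r.weight * r.tax for r in ranked)

    cells = {label: {"income": 0.0, "tax": 0.0, "threshold": float("nan")} for label, _ in CLASSES}
    weight_above = 0.0
    for r in ranked:
        position = weight_above / total_weight  # weighted share of returns with higher income
        label = next(lbl for lbl, upper in CLASSES if position < upper)
        cell = cells[label]
        cell["income"] += r.weight * r.income
        cell["tax"] += r.weight * r.tax
        cell["threshold"] = r.income  # descending order, so this ends at the class minimum
        weight_above += r.weight

    return {
        label: {
            "threshold": c["threshold"],
            "income_share": c["income"] / total_income,
            "tax_share": c["tax"] / total_tax,
            "avg_tax_rate": c["tax"] / c["income"] if c["income"] else 0.0,
        }
        for label, c in cells.items()
    }
```

The "threshold" entry plays the role of the entry-level income for each class; with real SOI files, interpolating within the boundary records would refine it.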
As with incomes, there were some years with unusually large increases, and a common feature of these years was double-digit growth in net capital gains [7]. The 1-to-5-percent size class exhibited relatively modest change in its share of taxes, increasing from 17.53 percent to 20.50 percent in the period. The 5-to-10-percent class, and all lower income-size classes, had declining shares of total tax.

Figure C: Income Tax Shares by Income Percentile Size Classes, 1979-2004 (percent; series: Top 0.1%, 0.1-1%, 1-10%, 10-20%, Bottom 80%)

Average Tax Rates: Income Tax

What is most striking about these data is that average tax burdens increase with income size in most years (the only exceptions being 1980 through 1986, and then only for the highest group). The progressive nature of the individual income tax system is clearly demonstrated. Although the overall average tax rate remained virtually the same for 1979 and 2001, the average rate for all but the very lowest size class actually declined (see Figure D) [8]. While this at first appears inconsistent, it occurred because, over time, an increasing proportion of income shifted to the upper levels of the distribution, where it is taxed at higher rates (see Figure B). For 2003, the average tax rate fell to 11.63 percent, the lowest rate over the 26 years of this study. For 2004, it increased slightly, to 11.81 percent.

In examining the average tax data by income size, four distinct periods emerge. First, average tax rates were generally climbing up to the implementation of the Economic Recovery Tax Act (ERTA), effective for 1982.
This was an inflationary period that preceded the indexing of personal exemptions, the standard deduction, and tax brackets, so rising nominal incomes caused many taxpayers to face higher tax rates. (Indexing became a permanent part of the tax law for Tax Year 1985 [9].) Also, this period marked the recovery from the recession in the early 1980s. Similarly, average taxes also climbed in the period after 1992, the period affected by the Omnibus Budget Reconciliation Act (OBRA). This was not surprising for the highest income-size classes, the ones affected by the OBRA-initiated 39.6-percent top marginal tax rate, but average tax rate increases are also evident in the smaller income-size classes for most years in the 1993-to-1996 period. For the majority of the intervening years (i.e., 1982 through 1992), average tax rates generally declined by small amounts for most income-size classes, although the period surrounding the implementation of the 1986 Tax Reform Act (TRA) gave rise to small increases in some classes. Despite the substantial base broadening and rate lowering initiated by TRA, the changes to average rates were fairly small for most income-size classes. However, it should be kept in mind that individuals can and do move between income-size classes.

Figure D: Average Tax Rates by Size Classes, 1979-2004 (percent; series for each size class from Top 0.1% through 0.1-0.25%, 0.25-0.5%, 0.5-1%, 1-5%, 5-10%, 10-20%, 20-40%, 40-60%, and 60-80% to Low 20%, each also shown as a series labeled "R")

The rates for the top 0.1 percent clearly show the effects of the 1986 capital gain realizations made in anticipation of the repeal of the 60-percent long-term capital gain exclusion, effective for 1987.
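Why a surge of 1986 gains pulled the top class's average rate down is a matter of exclusion arithmetic: with 60 percent of a long-term gain excluded, only 40 percent was taxable, so the highest rate on a gain was 40 percent of the top ordinary rate. The sketch below assumes a 1986 top ordinary rate of 50 percent (this section does not state it), which yields the 20-percent maximum effective rate cited in the paper. The return used in the example is hypothetical.

```python
# Sketch: why large 1986 gain realizations lowered the top class's average rate.
EXCLUSION = 0.60               # share of long-term gains excluded from income (from the text)
TOP_ORDINARY_RATE_1986 = 0.50  # ASSUMPTION: top ordinary rate for 1986, not stated in this section


def effective_gain_rate(exclusion: float, marginal_rate: float) -> float:
    """Tax per dollar of long-term gain when only (1 - exclusion) of the gain is taxable."""
    return (1.0 - exclusion) * marginal_rate


def average_rate(ordinary_income: float, ordinary_tax: float, gains: float, gain_rate: float) -> float:
    """Average tax rate on total income after adding gains taxed at gain_rate."""
    return (ordinary_tax + gains * gain_rate) / (ordinary_income + gains)


if __name__ == "__main__":
    gain_rate = effective_gain_rate(EXCLUSION, TOP_ORDINARY_RATE_1986)
    print(f"maximum effective rate on long-term gains: {gain_rate:.0%}")
    # Hypothetical top-class return: $1 million of ordinary income taxed at an average 40 percent.
    print(f"without gains: {average_rate(1e6, 4e5, 0.0, gain_rate):.0%}")
    print(f"with $1 million of gains: {average_rate(1e6, 4e5, 1e6, gain_rate):.0%}")
```

In this illustration, adding $1 million of gains taxed at 20 percent to a return whose other income bears an average 40-percent rate pulls the average down to 30 percent, even though total tax rises. That is the same direction of movement reported for the top class in 1986.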
The average tax rate for this income-size class dropped for 1986, but it rose sharply for 1987 before dropping again for each of the next 3 years. To assess what happened, it is important to look at the underlying data. The substantial increase in capital gain realizations for 1986 swelled the aggregate income and tax amounts for the upper income classes and also raised the income thresholds of these top classes. However, since much of the increase in income for these size classes was from net long-term capital gains, which had a maximum effective tax rate of 20 percent, it is not surprising that the average tax rate for these top size classes declined.

Next, we consider the years affected by the Taxpayer Relief Act of 1997 (1997 through 2000), when the top rate on long-term capital gains was reduced significantly, from 28 percent to 20 percent. For 1997, the first year under this law, when the lower rates were only partially in effect, the average tax rate fell for the top 0.1-percent group of taxpayers but increased for all other groups. However, for 1998, the first full year under the lower capital gain rates, all groups from the 40-to-60-percent class upward had reduced average tax rates (while the lowest two quintiles had virtually the same average tax rates). For all groups (except the 20-to-40 and 60-to-80-percent groups in 1999), average rates rose again for both 1999 and 2000.

The Economic Growth and Tax Relief Reconciliation Act of 2001 (EGTRRA) further reduced marginal tax rates over several years. One of these reductions was the introduction of a 10-percent bracket on the first $6,000 ($12,000 if married filing a joint return) of taxable income. In an attempt to fuel a recovery from recession, this reduction was introduced retroactively in the form of a rebate based on Tax Year 2000 filings. Therefore, we simulated the rebate on the Tax Year 2000 Individual File to see its effects on average tax rates.
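A rebate simulation of this kind can be sketched as follows. This is not the SOI procedure. It assumes the rebate equals the 5-percentage-point gap between the prior 15-percent bottom rate and the new 10-percent rate, applied to taxable income up to the $6,000 or $12,000 bracket limit and capped at the return's income tax liability. It handles only single and joint filers, and the record layout is hypothetical.

```python
# Sketch: simulating the 2001 rate-reduction rebate on Tax Year 2000 records.
# From the text: 10-percent bracket on the first $6,000 ($12,000 joint) of taxable income.
# ASSUMPTIONS: rebate = (15% - 10%) x taxable income up to the bracket limit,
# capped at income tax liability; only single and joint filers are modeled.
BRACKET_LIMIT = {"single": 6_000.0, "joint": 12_000.0}
RATE_CUT = 0.05  # prior 15-percent bottom rate minus the new 10-percent rate


def rebate(filing_status: str, taxable_income: float, tax_liability: float) -> float:
    """Simulated rebate for one return."""
    base = min(max(taxable_income, 0.0), BRACKET_LIMIT[filing_status])
    return min(RATE_CUT * base, max(tax_liability, 0.0))


def average_rate_with_rebate(records: list[tuple[str, float, float, float]]) -> float:
    """Aggregate average tax rate after the rebate.

    Each record is (filing_status, taxable_income, income, tax), a hypothetical layout;
    in a real tabulation each term would also be multiplied by the record's sampling weight.
    """
    total_income = sum(income for _, _, income, _ in records)
    total_tax = sum(tax - rebate(status, taxable, tax) for status, taxable, _, tax in records)
    return total_tax / total_income


if __name__ == "__main__":
    sample = [("single", 20_000.0, 25_000.0, 2_000.0), ("joint", 50_000.0, 60_000.0, 5_000.0)]
    print(f"{average_rate_with_rebate(sample):.4f}")
```

For the two illustrative records, the simulated rebates are $300 and $600, and the combined average rate falls from about 8.2 percent to about 7.2 percent.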
When the rebate (estimated at $40.5 billion) is taken into account, the average rates for 2000 decreased for all groups except the top 0.1 percent and the 1-to-5 percent, reversing the prerebate increases. Tax Year 2001 was a mixture of increases and decreases in average tax rates by income group. Most groups paid higher average taxes; however, the 1-to-5-percent and 5-to-10-percent groups paid lower average taxes, as did the bottom 20-percent group.

For 2002, when the 10-percent rate applied to all returns and all rates above 15 percent were reduced by one-half of 1 percentage point, the average tax rate fell for every group. Further, as the economy stagnated, another rebate, of $400 per child, was sent to individuals who received a child tax credit for that year. This was in lieu of receiving the additional amount for 2003 as part of the increased child tax credit provided by the Jobs and Growth Tax Relief Reconciliation Act of 2003 (JGTRRA). Simulating this on Tax Year 2002 data, we estimated that $14.2 billion was sent to taxpayers, further reducing average taxes for 2002. The individuals who gained the most from this rebate were in the 5-to-10-percent group through the 40-to-60-percent group.

For 2003 and 2004, with further reductions in marginal rates and capital gain rates (to 15 percent), and the introduction of the same rates for qualified dividends, the overall average tax rate fell to 11.63 percent for 2003 and was 11.81 percent for 2004. These were the lowest averages over the 26 years of this study. Further, aside from the 0.1-percent group in 1986 and the 0.5-to-1-percent group in 1991, all groups had their lowest average rates in these 2 years.

Tax Shares: Income Plus Social Security Tax

For individual taxpayers, Social Security taxes compose a fairly large portion (about 40 percent for 2003) of their Federal tax burden [10]. To broaden our analysis, we merged data from W-2s with individual income tax records for the years 1996-2003. Total Social Security taxes included self-employment taxes and taxes on tips reported on tax returns, plus two times the Social Security taxes reported on W-2s (representing both the taxpayers’ and the employers’ shares). The employers’ share of this tax was added into retrospective income, as well. Also, in order to have a better income concept over time, we altered retrospective income by including total Social Security benefits. As stated above, these benefits were excluded from income because they were not reported on older (pre-1984) tax returns, but, since this part of our study began with 1996, we were able to relax this constraint.

Including Social Security taxes (see Figure E), an interesting trend emerged. Through 2000, the tax share of all the higher income groups up to the 5-percent class increased each year, while the share of all the groups below the top 20 percent went down. However, after 2000, the top 0.1-percent group paid a decreasing share each year, while individuals in the 20-to-40-percent class paid an increasing share each year. The tax shares of other groups varied between the years. Overall, the top 20 percent paid a lower tax share (68.03 percent) in 2003 than they did in 2000 (70.27 percent, including the rebate), but this share was still higher than they paid in 1996 (66.21 percent). This occurred despite the fact that the share of the top 0.1-percent group declined from 9.30 percent for 1996 to 9.02 percent for 2003.

Figure E: Tax Shares (Including Social Security Taxes) by Percentile Size Classes, 1996-2003 (percent; the "% change" row is the change in share from 1996 to 2003)

Year           Total Top .1% .1-.25% .25-.5%   .5-1%  Top 1%    1-5%   5-10%  10-20% Top 20%  20-40%  40-60%  60-80% Low 20%
1996          100.00    9.30    3.59    3.55    4.44   20.88   16.40   12.29   16.64   66.21   19.82   10.23    3.19    0.55
1997          100.00    9.69    3.75    3.64    4.57   21.66   16.35   12.10   16.36   66.46   19.38   10.27    3.28    0.60
1998          100.00   10.39    3.82    3.65    4.61   22.46   16.63   12.11   16.13   67.34   18.78    9.96    3.32    0.61
1999          100.00   11.24    3.91    3.82    4.70   23.66   17.05   12.06   15.85   68.62   18.23    9.48    3.12    0.55
2000          100.00   12.32    3.96    3.92    4.70   24.90   16.99   11.87   15.58   69.34   17.69    9.26    3.16    0.55
2000 Rebate   100.00   12.65    4.06    4.01    4.80   25.52   17.26   11.95   15.54   70.27   17.34    8.89    2.95    0.55
2001          100.00    9.95    3.74    3.57    4.64   21.90   17.16   12.51   16.44   68.01   18.59    9.74    3.12    0.54
2002          100.00    9.08    3.58    3.56    4.60   20.82   17.47   12.87   16.96   68.12   18.87    9.60    2.90    0.51
2002 Rebate   100.00    9.17    3.62    3.60    4.65   21.03   17.64   12.89   16.91   68.47   18.71    9.46    2.85    0.52
2003          100.00    9.02    3.54    3.57    4.63   20.77   17.54   12.73   16.99   68.03   19.08    9.58    2.78    0.53
% change              -3.01%  -1.39%   0.56%   4.28%  -0.53%   6.95%   3.58%   2.10%   2.75%  -3.73%  -6.35% -12.85%  -3.64%

Figure F: Combined Panel 'P': Average Tax Rates (Including Social Security Taxes) by Size Classes, 1996-2003 (percent; the "% change" row is the change from 1996 to 2003)

Year           Total  Top 5%   5-10%  10-20%  20-40%  40-60%  60-80% Low 20%
1996           22.78   28.01   24.73   23.23   21.82   19.53   16.53    8.91
1997           22.76   27.44   24.34   23.73   21.87   19.86   16.89    9.23
1998           21.83   25.05   23.78   22.59   21.00   19.33   16.76    9.53
1999           22.37   26.91   24.19   22.96   21.34   19.25   16.86    9.88
2000           22.44   26.60   24.13   23.11   21.50   19.38   17.32   10.92
2001           22.13   26.27   24.06   23.00   21.42   19.38   17.17   10.31
2002           21.55   26.78   22.85   22.00   20.33   18.41   16.22   10.01
2003           20.14   24.15   21.55   20.90   19.30   17.72   15.78   10.61
All years      21.94   26.30   23.66   22.64   21.02   19.06   16.68   10.02
% change     -11.59% -13.78% -12.86% -10.03% -11.55%  -9.27%  -4.54%  19.08%

Average Tax Rates Including Social Security Taxes Using Panel Data

For 1996 through 2003, we used a panel of individual tax returns selected as a 1-in-5,000 random sample embedded in each year’s Individual Statistics of Income (SOI) sample. These returns were selected based on the primary taxpayer having certain Social Security number endings and are part of Social Security’s Continuous Work History Sample (CWHS). The reason for studying a panel of returns is to obtain a more complete view of tax returns over time: while “the rich” may appear to be receiving greater concentrations of income over time, the composition of who “the rich” are may also be changing. Using the panel, we defined income groups from the combined data (indexed for inflation) over this time period. As with the 1996-2003 cross-sectional study, in order to have a better income concept over time, we altered retrospective income by including total Social Security benefits. Then, we analyzed how income and taxes changed in each of these years, classifying each year’s returns into quintile classes.

In analyzing this panel over time, we classified returns into quintile classes for each of the 8 years, 1996 through 2003. We started with the 120 million returns filed for 1996 and followed these returns, including only those returns that were filed for each of the 8 years. This left us with 76.8 million of the 120 million returns filed for 1996. Using inflation-indexed income, we then combined the income and taxes over time to create a “combined income and tax” for each of the tax returns. We then reclassified each return into percentile classes, with the 5-percent income class being the highest class analyzed (due to the high sampling variability at levels above this).
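The panel steps just described (keep only returns filed in every year, combine inflation-indexed income and tax across years, then rank on the combined amounts) can be sketched as below. The flat record layout, the use of the final year's price level as the indexing base, and the unweighted ranking are illustrative assumptions, not details from the paper.

```python
from collections import defaultdict

YEARS = tuple(range(1996, 2004))  # Tax Years 1996 through 2003

# Classes by combined income, highest first; the top 5 percent is the highest class analyzed.
CLASSES = (("top 5%", 0.05), ("5-10%", 0.10), ("10-20%", 0.20), ("20-40%", 0.40),
           ("40-60%", 0.60), ("60-80%", 0.80), ("low 20%", 1.00))


def combine_panel(records, price_index):
    """Combined inflation-indexed (income, tax) for returns filed in every panel year.

    records: iterable of (panel_id, year, income, tax) tuples (hypothetical layout).
    price_index: {year: price level}; amounts are restated in final-year dollars.
    """
    by_id = defaultdict(dict)
    for panel_id, year, income, tax in records:
        by_id[panel_id][year] = (income, tax)

    factor = {y: price_index[YEARS[-1]] / price_index[y] for y in YEARS}
    combined = {}
    for panel_id, by_year in by_id.items():
        if set(by_year) != set(YEARS):  # keep only returns filed for all 8 years
            continue
        combined[panel_id] = (
            sum(by_year[y][0] * factor[y] for y in YEARS),
            sum(by_year[y][1] * factor[y] for y in YEARS),
        )
    return combined


def classify(combined):
    """Assign each panel member to a class by its rank on combined income (unweighted)."""
    ranked = sorted(combined, key=lambda pid: combined[pid][0], reverse=True)
    n = len(ranked)
    return {pid: next(label for label, upper in CLASSES if i / n < upper)
            for i, pid in enumerate(ranked)}


def class_average_rates(combined):
    """Combined tax divided by combined income for each class."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for pid, label in classify(combined).items():
        totals[label][0] += combined[pid][0]
        totals[label][1] += combined[pid][1]
    return {label: tax / income for label, (income, tax) in totals.items() if income}
```

Ranking on combined multiyear income, rather than on a single year, is what keeps a panel member in one class across all 8 years. This is the point of the panel design: it separates changes in tax rates from movement of individuals between classes.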
Looking at average taxes for the combined income groups (see Figure F), the average tax rate for all groups combined declined by 11.6 percent between 1996 and 2003, with the largest declines in the highest income groups. The average tax rate of the top 5-percent group went down by 13.8 percent (from 28.0 percent to 24.2 percent), and that of the 5-to-10-percent group by 12.9 percent. Rates also fell for every other group down through the 60-to-80-percent class. The bottom 20-percent group, however, faced an average tax rate 19.1 percent higher in 2003 than in 1996 (rising from 8.9 percent to 10.6 percent).

Analysis of Gini Coefficients

To further analyze the data, we estimated Lorenz curves and computed Gini coefficients for all years. The Lorenz curve plots the cumulative share of income against the cumulative share of returns, with returns ordered from lowest to highest income and both shares expressed as percentages. To construct the Lorenz curves, we reordered the percentile classes from lowest to highest and used the income thresholds as “plotting points” to fit a series of regression equations for each income-size interval in each of the 26 years, both before and after taxes.
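A grouped-data Lorenz curve and its Gini coefficient can be sketched as follows. The paper fit a series of regression equations within each income-size interval; this sketch substitutes straight lines between class boundaries (the trapezoid rule). That is a simpler technique, and because it ignores inequality within classes it tends to understate the Gini coefficient. The quintile shares in the example are hypothetical.

```python
# Sketch: grouped-data Lorenz curve and Gini coefficient (trapezoid rule).
# The paper fit regression equations within each income-size interval; straight
# lines between class boundaries are a simpler substitute used here for illustration.


def lorenz_points(classes: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Cumulative (share of returns, share of income) points, lowest class first.

    classes: (population_share, income_share) pairs ordered from the highest-income
    class to the lowest, as in the paper's tables; each column should sum to 1.
    """
    points = [(0.0, 0.0)]
    cum_pop = cum_income = 0.0
    for pop_share, income_share in reversed(classes):  # reorder from lowest to highest
        cum_pop += pop_share
        cum_income += income_share
        points.append((cum_pop, cum_income))
    return points


def gini(points: list[tuple[float, float]]) -> float:
    """Gini = 1 - 2 x (area under the piecewise-linear Lorenz curve)."""
    area = sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 1.0 - 2.0 * area


if __name__ == "__main__":
    # Hypothetical quintile income shares, highest quintile first.
    quintiles = [(0.2, 0.45), (0.2, 0.25), (0.2, 0.15), (0.2, 0.10), (0.2, 0.05)]
    print(f"Gini: {gini(lorenz_points(quintiles)):.3f}")
```

For the hypothetical quintile shares, the sketch gives a Gini of 0.380; equal shares give 0. Running the same functions on aftertax income shares yields the aftertax value for comparison.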
Once the Lorenz curves were estimated, Gini coefficients were calculated for all 26 years. The Gini coefficient, a measure of the degree of inequality, generally increased throughout the 26-year period, signifying rising levels of inequality for both the pre- and posttax distributions. This result was not unexpected, since it parallels the rising shares of income accruing to the highest income-size classes. Over this period, Figure G shows that the beforetax Gini coefficient increased from 0.469 for 1979 to 0.588 for 2000 (25.4 percent), while the aftertax Gini value increased from 0.439 to 0.558, a larger percentage increase (27.1 percent). The economic downturn in 2001 and 2002 actually decreased the levels of inequality, to 0.555 (pretax) and 0.525 (aftertax). For 2004, these rose back to 0.575 (pretax) and 0.549 (aftertax).

Figure G: Gini Coefficients for Retrospective Income, Before and After Taxes, 1979-2004

Year         Gini Before Tax  Gini After Tax  Difference  Percent Difference
1979                   0.469           0.439       0.030                6.3%
1980                   0.471           0.441       0.031                6.5%
1981                   0.471           0.442       0.029                6.2%
1982                   0.474           0.447       0.027                5.7%
1983                   0.482           0.458       0.025                5.1%
1984                   0.490           0.466       0.024                4.9%
1985                   0.496           0.471       0.024                4.9%
1986                   0.520           0.496       0.024                4.6%
1987                   0.511           0.485       0.026                5.1%
1988                   0.530           0.505       0.026                4.8%
1989                   0.528           0.504       0.024                4.6%
1990                   0.527           0.503       0.024                4.5%
1991                   0.523           0.499       0.024                4.6%
1992                   0.532           0.507       0.025                4.7%
1993                   0.531           0.503       0.028                5.2%
1994                   0.532           0.503       0.028                5.3%
1995                   0.540           0.510       0.029                5.4%
1996                   0.551           0.521       0.030                5.5%
1997                   0.560           0.530       0.030                5.4%
1998                   0.570           0.541       0.029                5.1%
1999                   0.580           0.550       0.030                5.2%
2000                   0.588           0.558       0.031                5.2%
2000 Rebate            0.588           0.557       0.032                5.4%
2001                   0.564           0.534       0.030                5.4%
2002                   0.555           0.525       0.030                5.3%
2002 Rebate            0.555           0.525       0.030                5.3%
2003                   0.559           0.533       0.026                4.7%
2004                   0.575           0.549       0.026                4.6%
So, what has been the effect of the Federal tax system on the size of the Gini coefficient and on its change over time? One way to answer this question is to compare the before- and aftertax Gini values [11]. Looking at this comparison, two conclusions are clear. First, Federal income taxation decreases the Gini coefficient for every year. This is not surprising, since the tax rate structure is progressive: average rates rise with income, so aftertax income is more evenly distributed than beforetax income. The second conclusion concerns whether the relationship between the beforetax and aftertax Gini coefficients has changed over time. The aftertax series closely parallels the beforetax series, with reductions in the Gini coefficient ranging from 0.024 to 0.032. The largest differences, which denote the largest redistributive effect of the Federal tax system, have generally occurred in periods of relatively high marginal tax rates, particularly 1979-81 and 1993 and later years. In fact, simulating the tax rebate for Tax Year 2000 produces the largest difference (0.032) of all the years. If this had been the only change in marginal rates under the new tax law (EGTRRA), the result would have been an increase in the redistributive effect of Federal taxes. However, for Tax Year 2001 and beyond, the marginal rates of higher income classes were reduced, from 38.6 percent to 35 percent for 2004. To investigate further, the percentage differences between the before- and aftertax Gini values were computed. These percentage reductions in the Gini coefficient, a “redistributive effect,” range from 4.5 percent (1990) to 6.5 percent (1980). As with the differences, the largest percentage changes are for the earliest years, a period when marginal tax rates were high.
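The "difference" and "percent difference" columns of Figure G are simple arithmetic on the two Gini series. The Python sketch below recomputes them from the rounded three-decimal values in the figure, so its percentages can differ from the published ones by about a tenth of a point (the published figures were presumably computed from unrounded Gini values). The function names are illustrative only.

```python
def redistributive_effect(gini_before, gini_after):
    """Return (difference, percent difference) between before- and aftertax Gini."""
    difference = gini_before - gini_after
    percent = 100.0 * difference / gini_before  # reduction relative to the beforetax value
    return difference, percent


def percent_change(start, end):
    """Percentage change from start to end."""
    return 100.0 * (end - start) / start


# Rounded values taken from Figure G.
figure_g = {1979: (0.469, 0.439), 1990: (0.527, 0.503), 2004: (0.575, 0.549)}
for year, (before, after) in figure_g.items():
    diff, pct = redistributive_effect(before, after)
    print(year, f"{diff:.3f}", f"{pct:.1f}%")

# Growth in inequality from 1979 to 2000.
print(f"{percent_change(0.469, 0.588):.1f}")  # beforetax: 25.4
print(f"{percent_change(0.439, 0.558):.1f}")  # aftertax: 27.1
```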
The largest percentage reduction was for 1980, but the size of the reduction generally declined until 1986, fluctuated at relatively low levels between 1986 and 1992, and then increased from 1993 to 1996. However, coinciding with the capital gain tax reduction for 1997, the percentage change declined again for 1997 and 1998. Nevertheless, it increased for 1999, 2000, and 2001 (although the 2001 percentage increased slightly if the rebate is included with the 2000 data). For 2003 and 2004, this percentage difference declined to 4.7 percent and 4.6 percent, respectively, approaching the 1990 level. So, what does this all mean? First, the high marginal tax rates prior to 1982 appear to have had a significant redistributive effect. But, beginning with the tax rate reductions for 1982, this redistributive effect declined until the period immediately preceding TRA 1986. Although TRA 1986 became effective for 1987, a surge in late-1986 capital gain realizations (to take advantage of the 60-percent long-term capital gain exclusion) effectively lowered the average tax rate for the highest income groups, thereby lessening the redistributive effect. For the post-TRA period, the redistributive effect was relatively low, and it did not begin to increase until the introduction of the 39.6-percent tax bracket for 1993. But since 1997, with the 39.6-percent rate continuing but the maximum tax rate on capital gains lowered, the redistributive effect has again declined. Data for 2003 and 2004 show that the new tax laws have continued this trend. Analysis of panel data shows that these trends are not quite as pronounced as they appear in annual cross-section data, but the trends cited above are still apparent.

Endnotes

[1] Strudler, Michael; Petska, Tom; and Petska, Ryan, A Further Analysis of the Distribution of Individual Income and Taxes, 1979-2002, 2004 Proceedings of the American Statistical Association, Social Statistics Section, 2004.
Petska, Tom; Strudler, Mike; and Petska, Ryan, New Estimates of Individual Income and Taxes, 2002 Proceedings of the 95th Annual Conference on Taxation, National Tax Association, 2003.
Strudler, Michael and Petska, Tom, An Analysis of the Distribution of Individual Income and Taxes, 1979-2001, 2003 Proceedings of the American Statistical Association, Social Statistics Section, 2003.
Petska, Tom; Strudler, Mike; and Petska, Ryan, Further Examination of the Distribution of Income and Taxes Using a Consistent and Comprehensive Measure of Income, 1999 Proceedings of the American Statistical Association, Social Statistics Section, 2000.
Petska, Tom and Strudler, Mike, The Distribution of Individual Income and Taxes: A New Look at an Old Issue, presented at the annual meetings of the American Economic Association, New York, NY, January 1999, and published in Turning Administrative Systems into Information Systems: 1998-1999.
Petska, Tom and Strudler, Mike, Income, Taxes, and Tax Progressivity: An Examination of Recent Trends in the Distribution of Individual Income and Taxes, 1998 Proceedings of the American Statistical Association, Social Statistics Section, 1999.
[2] Ibid.
[3] Nelson, Susan, Family Economic Income and Other Income Concepts Used in Analyzing Tax Reform, Compendium of Tax Research, Office of Tax Analysis, U.S. Department of the Treasury, 1987.
Hostetter, Susan, Measuring Income for Developing and Reviewing Individual Tax Law Changes: Exploration of Alternative Concepts, 1987 Proceedings of the American Statistical Association, Survey Research Methods Section, 1988.
Internal Revenue Service, Statistics of Income—Individual Income Tax Returns, Publication 1304, (selected years).
Mudry, Kyle and Parisi, Michael, Individual Income Tax Rates and Tax Shares, 2003, Statistics of Income Bulletin, Winter 2005-2006, Volume 25, Number 3.
[4] See endnote 1.
[5] For the years 1979 through 1992, the percentile threshold size classes were estimated by osculatory interpolation, as described in Oh and in Oh and Scheuren (see below). In this procedure, the data were tabulated into size classes, and the percentile thresholds were interpolated. For 1993 through 2004, the SOI individual tax return data files were sorted from highest to lowest income, and the percentile thresholds were determined by cumulating records from the top down.
Oh, H. Lock, Osculatory Interpolation with a Monotonicity Constraint, 1977 Proceedings of the American Statistical Association, Statistical Computing Section, 1978.
Oh, H. Lock and Scheuren, Fritz, Osculatory Interpolation Revisited, 1987 Proceedings of the American Statistical Association, Statistical Computing Section, 1988.
[6] The CPI-U from the U.S. Department of Labor, Monthly Labor Review, was used to deflate the income thresholds.
[7] Internal Revenue Service, Statistics of Income—Individual Income Tax Returns, Publication 1304, (selected years).
Mudry, Kyle and Parisi, Michael, Individual Income Tax Rates and Tax Shares, 2003, Statistics of Income Bulletin, Winter 2005-2006, Volume 25, Number 3.
[8] Taxes, taxes paid, tax liabilities, tax shares, and average or effective tax rates are based on income tax, defined as income tax after credits plus alternative minimum tax (AMT) less the nonrefundable portion of the earned income credit (for 2000 and 2001, AMT was included in income tax after credits). However, for Figure F, tax includes Social Security and Medicare taxes less all of the earned income credit and refundable child credit.
[9] Nelson, Susan, Family Economic Income and Other Income Concepts Used in Analyzing Tax Reform, Compendium of Tax Research, Office of Tax Analysis, U.S. Department of the Treasury, 1987.
[10] Internal Revenue Service, Data Book 2003–Publication 55B.
For Fiscal Year 2003, total Individual Income Taxes collected from withholding and additional taxes paid with tax forms filed were $987.2 billion, while total Social Security taxes were $647.9 billion.
[11] A comparison of the before- and aftertax Gini coefficients does not exclusively measure the effects of the tax system, because tax laws can also affect beforetax income. For example, capital gain realizations have been shown to be sensitive to tax rates.

Social Security Taxes, Social Security Benefits, and Social Security Benefits Taxation, 2003

Peter Sailer, Kevin Pierce, and Evgenia Lomize, Internal Revenue Service

For most of its 90-year existence, the Statistics of Income (SOI) Division of the Internal Revenue Service and its predecessor organizations have used data provided by taxpayers on Forms 1040 to fulfill the legal mandate to produce statistics on the operation of the individual income tax system. It was not until Tax Year 1989 that SOI started using the Information Returns Master File (IRMF), which contains electronic documents filed by the payers of income to individuals, to add further detail to the tax return information. To date, based on this rich source of administrative data, the SOI Bulletin has featured articles on the distribution of salaries and wages from Forms W-2 [1] and on the accumulation of assets in Individual Retirement Accounts from Forms 5498 [2]. In this paper, the authors make a modest proposal for another set of statistics that could be produced from the IRMF, statistics that would shed light not only on the operation of the individual income tax and the Social Security tax systems, but also on the interaction of the two systems. The paper illustrates some of the analysis that could be produced with this file.

Components of the Social Security Impact

Figure 1 starts from the total income of everybody touched by the Social Security system, either as a payer of Federal Insurance Contributions Act (FICA) or Self-Employment Contributions Act (SECA) taxes, or as a recipient of Social Security benefits. The first line shows total income, which, for filers of tax returns, is the sum of all sources of income as shown on line 22 of Form 1040, or the equivalent lines of Forms 1040-A and 1040-EZ. For the purpose of this chart, the taxable portion of Social Security benefits has been excluded. One of the advantages of working with information documents is that they enable SOI to show information on individuals who have not filed (and may never file) income tax returns for a given year. For these individuals, total income can be computed by adding salaries and wages from Forms W-2, gambling winnings from Forms W-2G, and nonemployee compensation, unemployment compensation, rents, royalties, interest, dividends, and pension distributions from various Forms 1099.

For 2003, total income (other than Social Security benefits) stood at $6.7 trillion. This is the amount for all participants in the Social Security system, whether as benefit recipients or as payers of Social Security taxes. The Social Security system added $386 billion to this income, basically in the form of benefit payments, and took out $542 billion, mainly in Social Security taxes but also through the taxation of the Social Security benefits it paid out. Figure 1 also shows the details of the additions and subtractions.

Figure 1—Computation of Social Security Impact

Item                                    Amount ($1,000)
Total income before Social Security       6,743,571,198
Additions, total                            385,787,734
  Gross Social Security benefits            384,037,692
  Income tax reduction due to SECA              236,808
  Excess FICA credit                          1,513,234
Subtractions, total                         541,579,465
  FICA tax (employer's portion)             246,016,712
  FICA tax (employee's portion)             246,016,712
  Self-employment tax                        29,278,008
  Social Security tax on tips                   148,273
  Repayments of SS benefits                   1,728,716
  Tax on taxable benefits                    18,391,044
= Total income after Social Security      6,587,779,467
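The totals in Figure 1 are straightforward sums: total income after Social Security equals total income before Social Security, plus total additions, minus total subtractions. The short Python check below simply re-adds the published components (in thousands of dollars) to confirm the table's totals; it restates Figure 1 and is not SOI's processing code.

```python
# Figure 1 amounts, in thousands of dollars.
additions = {
    "Gross Social Security benefits": 384_037_692,
    "Income tax reduction due to SECA": 236_808,
    "Excess FICA credit": 1_513_234,
}
subtractions = {
    "FICA tax (employer's portion)": 246_016_712,
    "FICA tax (employee's portion)": 246_016_712,
    "Self-employment tax": 29_278_008,
    "Social Security tax on tips": 148_273,
    "Repayments of SS benefits": 1_728_716,
    "Tax on taxable benefits": 18_391_044,
}
income_before = 6_743_571_198

total_additions = sum(additions.values())
total_subtractions = sum(subtractions.values())
income_after = income_before + total_additions - total_subtractions

print(total_additions)     # 385787734
print(total_subtractions)  # 541579465
print(income_after)        # 6587779467
```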
The $386 billion in additions consists almost entirely of the Social Security retirement and survivor benefits paid out by the Social Security Administration (SSA), plus two small technical adjustments. First, self-employed individuals who pay their own Social Security taxes (instead of having them withheld and matched by employers) are able to deduct one-half of their so-called “self-employment tax” from their total incomes on their tax returns. This, of course, reduces their regular income tax by roughly that amount times their marginal tax rate. So, taxpayers in the 33-percent tax bracket for 2003 got back on their income tax forms roughly one-sixth of the self-employment tax they paid into Social Security (33 percent of one-half the tax). In this tabulation, only the part of the self-employment tax that relates to retirement and survivor benefits, also known as SECA, is shown. Medicare taxes and payments are not part of this analysis. The second technical adjustment was needed for individual taxpayers who overpaid their FICA taxes because they worked for more than one employer in the course of a tax year. If their combined salaries and wages from all employers exceeded the maximum subject to the FICA tax ($87,000 for Tax Year 2003), the FICA tax withheld in excess of $5,394 (6.2 percent of $87,000) could be claimed as a tax payment on the tax return. This overpayment amounted to $1.5 billion for 2003. The largest subtraction from total income caused by the Social Security system is, obviously, the FICA tax, half of which is deducted from each employee’s salary or wage, and half of which, at least legally, is paid by the employer. If it is true, as economic theory holds, that employees are eventually paid what their marginal product makes them worth, then the employer’s portion of Social Security taxes truly is a reduction in employees’ salaries; for that reason, it is shown as a subtraction from income in Figure 1. In any case, it does represent amounts going into the Social Security system.
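The two technical adjustments can be sketched as follows, assuming only the Social Security (OASDI) portion of FICA at the 6.2-percent employee rate and the 2003 wage base of $87,000 stated above (Medicare is excluded, as in the paper). Each employer withholds on wages up to the base independently, so a worker with several employers can be over-withheld; the SECA adjustment uses the same rough approximation as the text (marginal rate times one-half of the tax) and ignores bracket boundaries. The function names are hypothetical.

```python
OASDI_RATE_EMPLOYEE = 0.062  # employee share of the Social Security (OASDI) tax
WAGE_BASE_2003 = 87_000      # 2003 maximum wages subject to the tax


def excess_fica_credit(wages_by_employer):
    """Excess Social Security tax withheld when several employers each withhold."""
    # Each employer applies the wage base separately.
    withheld = sum(OASDI_RATE_EMPLOYEE * min(w, WAGE_BASE_2003) for w in wages_by_employer)
    maximum = OASDI_RATE_EMPLOYEE * WAGE_BASE_2003  # $5,394 for 2003
    return max(0.0, withheld - maximum)


def seca_income_tax_reduction(se_tax, marginal_rate):
    """Approximate income tax saved by deducting one-half of self-employment tax."""
    return marginal_rate * 0.5 * se_tax


print(round(excess_fica_credit([60_000, 50_000]), 2))  # 1426.0
print(round(seca_income_tax_reduction(10_000, 0.33), 2))  # 1650.0, about one-sixth
```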
FICA tax data come from Forms W-2 filed by each employer. The self-employment tax, the Social Security tax paid by self-employed individuals, is computed on Schedule SE of Form 1040. For purposes of this chart, the Medicare portion of this tax, also computed on Schedule SE, was not included. Social Security taxes on tip income that had not been collected by the employer, and that the waiter or other employee with tip income was supposed to report on his or her income tax return, represent a very small subtraction from total income. Since the additions include all payments of Social Security benefits, the small amount that was paid out in error (usually because the taxpayer earned too much money in some quarter to qualify) and had to be repaid by the recipient is shown here as a subtraction. Finally, an $18-billion subtraction is shown in Figure 1 because some Social Security benefits are subject to the individual income tax. The taxes thus raised are moved from the general fund to the Social Security trust fund, so these taxes do, in fact, go into the Social Security system.

Impact of Social Security Taxes and the Individual Income Tax

Figure 2 shows the impact of the Social Security tax (both FICA and SECA) on workers and self-employed individuals at various income levels. For comparison purposes, the average income tax for these same individuals is shown as well. While income taxes keep rising with income, Social Security taxes level off at just over $13,000 per taxpaying unit once total income reaches $160,000. At the very lowest income levels, Social Security taxes actually tend to be higher than income taxes.
Figure 2—All Individuals With Social Security Taxes, 2003: Average Tax by Size of Total Income and Type of Tax (chart: average tax ($1) by size of total income ($1,000); series: Social Security tax and income tax)

When the same data are displayed as total income tax and Social Security taxes as a percentage of total income, as in Figure 3, it becomes dramatically clear that the income tax is a progressive tax (although not as progressive as it used to be), while Social Security taxes are (and always have been) regressive. For purposes of Figure 3, married couples filing jointly are shown as a single taxpaying entity; it was easier to combine the FICA and SECA taxes for the two taxpayers than it would have been to try to attribute some portion of the income tax to each of them. On the other hand, each nonfiler is shown as a separate unit, whether married or not, since the information documents do not reveal any information on marital connections. In the case of nonfilers, the proxy for total Federal income tax is Federal income tax withheld; since they had not filed by the end of the following year, tax withheld was, in fact, the total amount they had paid to the Federal Government.

Figure 3—All Individuals with Social Security Taxes, 2003: Taxes by Type as Percent of Total Income (chart: tax as percentage of total income by size of total income ($1,000); series: Social Security tax and income tax)

Figure 4—All Individuals with Social Security Benefits (SSB): SSB as % of Total Income, 2003 (chart: average SS benefits as percentage of total income by size of total income, including SS benefits ($1,000))
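The regressivity visible in Figure 3 follows largely from the cap on taxable wages. A minimal sketch, assuming a single worker whose total income is all wages, the combined employer-plus-employee Social Security rate of 12.4 percent, and the 2003 wage base of $87,000 (Medicare excluded, as in the paper); the function names and the optional other_income parameter are hypothetical.

```python
COMBINED_OASDI_RATE = 0.124  # employer plus employee Social Security rate
WAGE_BASE_2003 = 87_000      # 2003 maximum wages subject to the tax


def social_security_tax(wages):
    """Combined Social Security tax on wages, capped at the taxable maximum."""
    return COMBINED_OASDI_RATE * min(wages, WAGE_BASE_2003)


def tax_share_of_income(wages, other_income=0.0):
    """Social Security tax as a fraction of total income (wages plus other income)."""
    total_income = wages + other_income
    return social_security_tax(wages) / total_income


for wages in (20_000, 87_000, 174_000, 348_000):
    print(wages, f"{tax_share_of_income(wages):.1%}")
```

The loop prints 12.4%, 12.4%, 6.2%, and 3.1%: the share is flat up to the wage base and then falls as income rises. Nonwage income, which is not subject to the tax at all, pushes the share down further at higher incomes.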
Distribution of Social Security Benefits

It was noted previously that the impact of the FICA and SECA taxes was highest, at least in proportion to income, on those in the lower income classes. Figure 4 shows that Social Security benefits, as a share of income, are also highest for lower-income individuals. Retirees with incomes greater than zero but under $10,000 derive 96 percent of their incomes from Social Security benefits. The percentage drops to 50 percent just under the $20,000 income level, and drops below 5 percent around the $400,000 income level. Figure 5 shows that average Social Security benefits rise steadily from the lowest income class until they reach $20,000 for recipients with incomes around $150,000, and then fluctuate around the $20,000 line for the rest of the distribution. In other words, the rich do not get any more in Social Security benefits than the middle class, but, as was shown earlier, they do not put any more into Social Security than the middle class, either.

Figure 5—All Individuals with Social Security Benefits (SSB): Average SSB by Size of Total Income, 2003 (chart: average SS benefits ($) by size of total income, including SS benefits ($1,000))

Overall Impact of the Social Security System

Figure 6 shows two income distributions. The first (the solid line) is based on total income without any Social Security benefits included or Social Security taxes taken out; the second (the dotted line) subtracts from total income all the Social Security taxes (including income taxes paid on Social Security benefits) and adds in all the Social Security benefits. It is evident that the Social Security system does keep many people out of the abject poverty of the “Under $5,000” class.
The “with Social Security” distribution shows just over 20 million reporting units in this class, as opposed to over 35 million in the “without Social Security” distribution. On the other hand, the “with Social Security” distribution shows significantly more filing units in the $10,000 to $20,000 income range than does the “without Social Security” distribution. Between $20,000 and $70,000, the “with Social Security” line runs very slightly above the “without Social Security” line, and, above $70,000, it runs very slightly below it.

Figure 6—All Reporting Units in the Social Security Systems: Distribution of Total Income, Tax Year 2002 (chart: number of reporting units by size of total income ($1,000); series: without Social Security and with Social Security)

Impact of the Social Security System by Age of Taxpayer

SOI’s merged file of tax returns and information documents contains data on the age of the participants. For the purpose of Figure 7, Social Security benefits and Social Security taxes are combined into one variable, with benefits shown as positive amounts and taxes as negative amounts. The averages of these positive and negative amounts are shown for each age group (in 5-year increments). Figure 7 shows that the Social Security system has a positive impact on the very youngest children who come into contact with it, because they are receiving survivor benefits. In the 15 under 20 age group, the effect turns negative, as people start working and paying Social Security taxes. During the peak earnings years of 35 to 55, participants tend, on average, to put between $4,500 and $5,000 into the system every year. Then, the average starts rising until it reaches positive territory for the 60 to 65 age group, and it peaks just shy of the $11,000 mark for 80- to 85-year-olds.

Figure 7—Average Impact of Social Security System by Age of Participant, 2003 (chart: average impact ($) by age of participant; in case of joint returns, age of primary taxpayer)

Social Security Taxes and Other Forms of Retirement Savings

SOI’s merged file of tax returns and information documents also contains data on other forms of retirement savings: Forms W-2 show payments into 401(k) plans and similar programs in the Government and nonprofit sectors, and Forms 5498 show payments into Individual Retirement Accounts, including Traditional and Roth IRA plans. Unfortunately, IRS does not have information on how much is being placed into defined benefit plans by various employers. The only evidence for those contributions is a checkmark in a box on the W-2. Therefore, the following analysis is confined to those taxpayers who do not have employer-provided defined benefit plans.

Figure 8 shows that, for the lowest income taxpayers (those with earned incomes under $25,000), Social Security taxes represented the vast majority of their set-asides for retirement. For example, in the $20,000 under $25,000 earned income class, Social Security taxes (again, counting both the employer and employee portions of FICA) amounted to 12.2 percent of earned income. Contributions to other types of retirement plans amounted to only 1 percent of earned income. Nonetheless, this means that these individuals were having 13.2 percent of their earned incomes set aside for retirement purposes, which is actually a respectable proportion, considering that the highest percentage shown in this chart is 15.1 percent, which applies to the $80,000 under $85,000 earned income class.

Endnotes

[1] See Sailer, Yau, and Rehula (2001-2002) and Yau, Gurka, and Sailer (2003).
[2] See Sailer and Nutter (2004) and Bryant and Sailer (2006).
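The age tabulation behind Figure 7 combines each participant's benefits (positive) and Social Security taxes (negative) into one net amount and averages it within 5-year age groups. A minimal Python sketch of that aggregation follows, assuming hypothetical per-participant records of (age, benefits, Social Security taxes); it ignores sampling weights and the joint-return convention of using the primary taxpayer's age, and the sample values are made up.

```python
from collections import defaultdict


def average_impact_by_age(records, width=5):
    """Average net Social Security impact (benefits minus taxes) by age group.

    records: iterable of (age, benefits, ss_taxes) tuples (hypothetical layout).
    Returns {lower bound of age group: average net impact}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for age, benefits, ss_taxes in records:
        group = (age // width) * width        # e.g. ages 35-39 fall in group 35
        totals[group] += benefits - ss_taxes  # benefits positive, taxes negative
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in sorted(totals)}


sample = [(37, 0, 4_800), (39, 0, 4_600), (82, 11_000, 0), (83, 10_500, 200)]
print(average_impact_by_age(sample))  # {35: -4700.0, 80: 10650.0}
```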
Figure 8—Retirement Deferrals as Percentage of Earned Income, by Size of Earned Income, 2003 (chart: percentage of earned income by size of earned income; series: private retirement deferrals and Social Security retirement deferrals)

Future Steps

At SOI, we have started to collect these data for a panel of taxpayers beginning in 1999. In addition, we have been saving population data from the Information Returns Master File going back to 1995. So, if we combine 4 years of data selected retrospectively with prospective data from one of our 1999-base panels, we will have a data set with which we can follow participants in the Social Security system for 10 years. If we keep building on that, the panel will be available for analyzing equitable methods of adjusting the Social Security and income tax systems to keep Social Security solvent for future generations.

References

Bryant, Victoria L. and Sailer, Peter J., “Accumulation and Distribution of Individual Retirement Arrangements, 2001-2002,” SOI Bulletin, Spring 2006, pp. 233-254.
Sailer, Peter J.; Yau, Ellen; and Rehula, Victor, “Income by Gender and Age from Information Returns, 1998,” SOI Bulletin, Winter 2001-2002, pp. 83-102.
Sailer, Peter J. and Nutter, Sarah E., “Accumulation and Distribution of Individual Retirement Arrangements, 2000,” SOI Bulletin, Spring 2004, pp. 121-134.
Yau, Ellen; Gurka, Kurt; and Sailer, Peter, “Comparing Salaries and Wages of Women Shown on Forms W-2 to Those of Men, 1969-1999,” SOI Bulletin, Fall 2003, pp. 274-283.

The Tax Year 1999-2003 Individual Income Tax Return Panel: A First Look at the Data

Michael E. Weber, Internal Revenue Service

This paper represents the Statistics of Income (SOI) Division’s first release of data from its Tax Year 1999 Panel of Individual Income Tax Returns. A previous ASA paper explained the history and development of this panel, so only a brief review of the panel’s history and design is provided here [1]. SOI’s mission is to produce and publish data on the operation of the Federal tax system. Policy analysis and the development of recommendations on the operation of the tax system are not part of SOI’s mission. SOI microdata files, tabulations, and articles are accepted as the unbiased starting point for policy discussions by individuals of all ideological backgrounds. The fact that virtually all of SOI’s published tabulations are based on cross-sectional samples, whose sampling frames and sampling techniques are established and well known, certainly helps SOI fulfill this mission. The publication of tabulations based on panel samples, however, presents a more complicated situation, as discussed below. The purpose of this paper is to work through some of those complications and to arrive at a series of panel tabulations that can be viewed in the same unbiased light as the more standard SOI tabulations. Income tax return panels already provide policy organizations such as the Treasury Department’s Office of Tax Analysis (OTA) and Congress’s Joint Committee on Taxation (JCT) with powerful policy analysis tools that are not available to researchers outside those organizations. But it is not the responsibility of OTA or JCT to provide large amounts of tabular panel data to the public; it is SOI’s responsibility, and this paper is intended as a first step in meeting that responsibility.

Background

Each year, the Statistics of Income Division produces a sample of individual income tax returns. The Tax Year 1999 sample included 176,966 returns sampled in 92 stratifications. The sampling rates ranged from 100 percent to .05 percent, based on classifications of income and the type of forms and attachments included on each return [2]. The 1999 Edited Panel is an 83,434-return subsample of the 1999 cross-sectional sample. The 1999 Edited Panel contains only 21 stratifications, with sampling rates ranging from 100 percent to .05 percent. The base year of this panel represents a sample of tax returns. Subsequent years represent a sample of the returns filed by individuals listed as taxpayers on the 1999 base year return. This is a significant difference, because it means that a base year sample unit can break apart into two returns through divorce, or double the number of individuals in the unit through marriage. Even worse, a unit can divide into two returns through divorce and then, through a second marriage for each original taxpayer, end up representing four individuals. It is these changes that present problems in tabulating, presenting, and interpreting income tax return panel data.

Potential Solutions

One solution to the changing marital status problem is to follow only the primary taxpayer listed on the tax return. The main problem with this approach is that approximately 95 percent of primary taxpayers listed on jointly filed returns are male, so a significant gender bias would be introduced into any analysis. Another possible solution would be to follow the primary and secondary taxpayers separately. The main problem with this approach is the complexity involved in dividing income between the primary and secondary taxpayers on jointly filed returns. Even if the income could be divided correctly, the act of doing so has implications. For example, do married individuals make independent or joint economic decisions? If their incomes are divided, how is the joint decisionmaking aspect retained in the data? Finally, another possible solution is simply to examine only those panel units whose marital status has not changed. The main problem with this approach is that it excludes all taxpayers who, during the course of the study, married, divorced, or had a spouse die. If changes in a taxpayer’s marital status or the death of a spouse affect his or her economic well-being and decisionmaking, then that information is lost under this approach.

Obviously, none of these solutions is fully adequate, and perhaps the best solution is to use all three and compare the results. Unfortunately, such an exercise is beyond the scope of this paper. Given time and resource constraints and the basic structure of the panel, the easiest and quickest solution to implement is the third: examine only those panel units whose filing status has not changed.

An Analysis of Panel Units That Did Not Change Marital Status from 1999 to 2003

The first step is to subset the file to only those panel units with returns present for all 5 years of the study. This is not a required step in analyzing panel data. For example, one might want to examine only two points in time, 1999 and 2003, in which case the file would only need to be subset to units with returns present in both of those years. But for this paper, the 5-year average Adjusted Gross Income (AGI) is computed and used in subsequent tables, and, in order to keep the basis for all tables consistent, only panel units with returns present for all 5 years are used. (Another solution would be to impute missing returns, but that is beyond the scope of this paper.)

As Figure 1 shows, in 1999, the panel contained an estimated 127 million returns, or panel units. But, as of 2003, only 106 million panel units had filed returns for all 5 years. Where did the 21 million panel units go? First, any single taxpayer who died during this period is part of the 21 million missing units, as are any 1999 filers who no longer met the filing threshold in any of the subsequent years. Another portion represents taxpayers who should have filed a return but did not. Often, these taxpayers do file, but in a subsequent calendar year. Roughly 3 percent of the returns filed each year are for a previous tax year. In other words, these returns are eventually filed with the IRS, generally within 2 years of the due date. Because of the way returns are selected for this panel, these returns will eventually be sampled and included in the panel file. But this presents SOI with an interesting publication issue: should the tabulation of panel data be held up for 2 years while we await the addition of 3 percent of 1 year’s data? For example, the file used for this paper is complete only for the period 1999 to 2001. This is a topic for further research.

Figure 1—Derivation of 1999-2003 Edited Panel Sample Used in Subsequent Tabulations

(1) At least one return present in all years
(2) Column (1) and only one return present in each year
(3) Column (2) and the same marital status in all years

Tax year                       (1)           (2)           (3)
1999                   127,029,487   127,029,487   127,029,487
1999 through 2000      120,887,311   119,794,388   114,807,823
1999 through 2001      115,810,399   113,770,493   104,860,374
1999 through 2002      111,048,409   108,251,388    96,043,680
1999 through 2003      105,938,164   102,549,251    87,617,774

Notes:
* 2002 and 2003 data are for returns received by IRS through Calendar Year 2004. Additional returns for 2002 and 2003 were filed in Calendar Years 2005 and 2006.
* Married filing separately returns have been removed in columns 2 and 3 to simplify processing.
* Base-year prior-year returns (approximately 9,000 weighted returns) have been removed.
* Base-year single panel members who married another panel member in a subsequent year (approximately 4,000 weighted returns) have been removed.
[Figure 1 chart: 1999-2003 Edited Panel by tax year; series: PANID present; PANID present in all years and only one return; present all years and same marital status]

The second step is to subset the file to those panel units where a return is filed in every year and only one return is filed each year. As Figure 1 shows, by 2003 this step removes another 3.4 million returns from the panel. These 3.4 million returns generally represent two groups. The first is joint filers who divorced, so that each former spouse now files independently. The second is couples who, on at least one occasion during this 5-year period, filed using a marital status of married filing separately. It is possible to add items from a married couple’s two married filing separately returns to generate a combined return, but this was not done for this paper.

The final step is to subset the file to those panel units where a return is filed in every year, only one return is filed each year, and the marital status does not change. As Figure 1 shows, this step removes another 14.9 million panel units, leaving only 87.6 million. The removed units generally consist of taxpayers who married during the 1999-2003 period and married couples in which one spouse died during this period.

As Table 1 shows, creating the database used for the subsequent tabulations in this paper removed 31 percent of the panel units (base-year returns), accounting for 19.4 percent of base-year AGI. Further research is needed to understand the impact of removing these panel units, including an important fundamental question: is it even legitimate to produce tabulations from which 31 percent of the units have been removed? And, if so, what data about the removed 31 percent should also be presented?

1999-2003 Edited Panel Tables

Table 2 is probably the most basic and straightforward panel tabulation that can be produced. It uses the 87.6 million weighted panel units that each filed one and only one return in every year of the 5-year study period and kept the same marital status throughout. The panel units are classified by the AGI shown on the 1999 return and by the AGI shown on the 2003 return. The 2003 AGI amounts, as well as all other amounts shown in this paper, have been deflated to 1999 levels using the price deflator applied in other SOI individual income tax data [3].

Note that Table 2 includes returns filed by dependents. An individual who can be claimed as a dependent by another taxpayer, yet has enough income to require filing, must file a tax return separate from the return on which he or she was claimed as a dependent. In the sample design of this panel, as in the standard SOI individual cross-sectional samples, no separate sample stratum was created for dependent returns. Thus, if sampled, a dependent’s return represents a unique panel unit, as does the return, if sampled, on which that individual was listed as a dependent.

Dependents, however, may exhibit significant income changes when they move from dependent status to independent tax filer. For example, a college student earning $4,000 a year at McDonald’s may graduate and earn $40,000 in his or her first professional job. In Table 2, this situation cannot be distinguished from that of a 35-year-old adult supporting a family whose income rises from $4,000 in 1999 to $40,000 in 2003. Consequently, Table 3 excludes returns filed by base-year dependents. This eliminates another 7.2 million panel units. As a comparison of the two tables shows, the reduction falls mostly in the $1 under $10,000 AGI class (about 6.1 million of the 7.2 million units).

A possible concern with Table 3 is that it presents only two points in time. A taxpayer may have earned $50,000 in both 1999 and 2003, indicating no real change in income. But what if the taxpayer earned only $10,000 in each of 2000, 2001, and 2002? The 5-year average income, $26,000 in that case, differs substantially from the income at either end point of the study period. Consequently, Table 4 is classified by 1999 AGI and by 5-year average AGI (in 1999 dollars).

As mentioned earlier, Table 4 is the reason that only panel units that filed a return in every year of the 5-year period were used in constructing the database for this study. An alternative would be to relax this restriction and develop an imputation method for the missing data. Such an approach was beyond the scope of this paper but should be explored in future research. Imputations of this nature may become essential as the panel ages. Over the course of the study, more panel units will be found to be missing at least one return, reducing the number of panel units available for tabulations such as Table 4.

Finally, the 5-year average AGI can also be presented in terms of its percentage change from the 1999 AGI, as is done in Table 5.

Endnotes

[1] Weber, Michael (2005), “The 1999 Individual Income Tax Return Edited Panel,” 2005 Proceedings of the American Statistical Association, Social Statistics Section, Government Statistics Section, American Statistical Association, Alexandria, VA.
[2] For additional information on the sample design of the annual Complete Report sample, see Internal Revenue Service, Statistics of Income Individual Income Tax Returns, Publication 1304 (1999), “Section 2: Description of Sample.”
[3] AGI is shown in constant dollars, calculated using the U.S. Bureau of Labor Statistics consumer price index for urban consumers. U.S. Department of Labor, Bureau of Labor Statistics, Monthly Labor Review.

Table 1—1999-2003 Full Edited Panel and Limited Edited Panel Differences
[Money amounts are in thousands of dollars]

                               Full 1999-2003 Edited Panel  Limited 1999-2003 Edited Panel
                                   Number of       Amount of     Number of       Amount of
Size of AGI                          returns             AGI       returns             AGI
No adjusted gross income           1,016,365     -49,057,319       547,216     -35,182,329
$1 under $10,000                  26,210,180     132,336,387    13,381,189      70,987,103
$10,000 under $20,000             23,966,960     357,434,358    14,953,415     224,834,852
$20,000 under $30,000             18,359,111     453,687,690    12,513,685     309,450,548
$30,000 under $40,000             13,368,846     464,230,987     9,700,429     337,085,999
$40,000 under $50,000              9,812,207     438,993,580     7,584,758     339,966,538
$50,000 under $75,000             16,897,458   1,031,747,639    13,882,868     849,235,065
$75,000 under $100,000             7,755,507     666,429,881     6,653,302     572,107,910
$100,000 under $200,000            7,188,685     944,083,593     6,271,959     825,602,106
$200,000 under $500,000            1,891,017     546,818,812     1,640,006     475,056,961
$500,000 under $1,000,000            355,710     241,057,746       309,944     210,134,851
$1,000,000 under $1,500,000           88,847     107,343,480        76,779      92,732,047
$1,500,000 under $2,000,000           38,160      65,801,348        33,102      57,095,640
$2,000,000 under $5,000,000           57,547     172,372,870        49,710     148,937,801
$5,000,000 under $10,000,000          14,176      97,281,129        12,123      83,216,258
$10,000,000 or more                    8,711     215,765,177         7,289     181,949,562
Total                            127,029,487   5,886,327,358    87,617,774   4,743,210,912

                                        Difference             Percentage difference
                                   Number of       Amount of   Number of   Amount of
Size of AGI                          returns             AGI     returns         AGI
No adjusted gross income             469,149     -13,874,990       46.2%       28.3%
$1 under $10,000                  12,828,991      61,349,284       48.9%       46.4%
$10,000 under $20,000              9,013,545     132,599,506       37.6%       37.1%
$20,000 under $30,000              5,845,426     144,237,142       31.8%       31.8%
$30,000 under $40,000              3,668,417     127,144,988       27.4%       27.4%
$40,000 under $50,000              2,227,449      99,027,042       22.7%       22.6%
$50,000 under $75,000              3,014,590     182,512,574       17.8%       17.7%
$75,000 under $100,000             1,102,205      94,321,971       14.2%       14.2%
$100,000 under $200,000              916,726     118,481,487       12.8%       12.5%
$200,000 under $500,000              251,011      71,761,851       13.3%       13.1%
$500,000 under $1,000,000             45,766      30,922,895       12.9%       12.8%
$1,000,000 under $1,500,000           12,068      14,611,433       13.6%       13.6%
$1,500,000 under $2,000,000            5,058       8,705,708       13.3%       13.2%
$2,000,000 under $5,000,000            7,837      23,435,069       13.6%       13.6%
$5,000,000 under $10,000,000           2,053      14,064,871       14.5%       14.5%
$10,000,000 or more                    1,422      33,815,615       16.3%       15.7%
Total                             39,411,713   1,143,116,446       31.0%       19.4%

Table 2—Tax Year 1999 filers present in 2000, 2001, 2002, and 2003 with no change in marital status by 1999 AGI class and 2003 AGI class in 1999 dollars
2003 AGI Class Number of Returns $1 No 1999 AGI Class No adjusted gross income................... $1 under $10,000................................. $10,000 under $20,000........................ $20,000 under $30,000........................ $30,000 under $40,000........................ $40,000 under $50,000........................ $50,000 under $75,000........................ $75,000 under $100,000...................... $100,000 under $200,000.................... $200,000 under $500,000.................... $500,000 under $1,000,000................. $1,000,000 under $1,500,000.............. $1,500,000 under $2,000,000.............. $2,000,000 under $5,000,000.............. $5,000,000 under $10,000,000............ $10,000,000 or more........................... Total....................................................
Total 547,216 13,381,189 14,951,380 12,513,684 9,700,429 7,584,758 13,882,868 6,653,302 6,271,958 1,640,006 309,944 76,779 33,102 49,710 12,123 7,289 87,615,738 AGI 214,867 323,254 162,757 92,699 53,742 43,461 52,788 31,444 38,883 31,180 8,161 2,259 1,405 2,340 872 635 1,060,748 under $10,000 102,604 6,080,426 2,767,133 963,453 428,201 220,012 224,307 82,435 66,738 12,752 2,629 750 225 468 70 17 10,952,219 $10,000 under $20,000 61,466 4,256,066 6,933,350 2,558,577 965,912 457,085 444,776 123,614 76,925 20,562 3,949 733 676 540 143 53 15,904,426 $20,000 under $30,000 40,622 1,739,812 3,317,825 4,851,252 1,927,920 705,715 678,126 184,404 109,986 25,463 1,908 450 468 631 127 37 13,584,747 $30,000 under $40,000 35,825 560,508 1,053,740 2,620,841 3,214,677 1,416,150 1,067,007 205,300 132,428 20,003 2,698 1,412 450 475 207 47 10,331,771 $40,000 under $50,000 22,864 191,054 401,280 769,204 1,843,690 2,228,147 1,861,221 346,463 182,893 20,508 3,565 959 225 833 84 54 7,873,045 $50,000 under $75,000 30,292 177,047 230,442 495,761 1,043,636 2,162,117 6,739,803 1,700,573 692,323 91,604 12,991 2,428 1,195 1,256 375 127 13,381,971 $75,000 under $100,000 10,536 30,662 45,944 97,250 150,678 262,504 2,222,558 2,591,549 1,146,460 104,496 9,220 1,855 953 1,465 312 131 6,676,572 2003 AGI Class Number of Returns $100,000 under 1999 AGI Class No adjusted gross income............................................ $1 under $10,000.......................................................... $10,000 under $20,000................................................. $20,000 under $30,000................................................. $30,000 under $40,000................................................. $40,000 under $50,000................................................. $50,000 under $75,000................................................. $75,000 under $100,000............................................... $100,000 under $200,000............................................. 
$200,000 under $500,000............................................. $500,000 under $1,000,000.......................................... $1,000,000 under $1,500,000....................................... $1,500,000 under $2,000,000....................................... $2,000,000 under $5,000,000....................................... $5,000,000 under $10,000,000..................................... $10,000,000 or more.................................................... Total............................................................................. $200,000 15,653 14,322 33,182 47,811 63,304 78,236 549,973 1,311,996 3,354,523 501,362 43,631 8,873 4,008 4,605 904 436 6,032,817 $200,000 under $500,000 7,457 7,982 4,994 14,661 8,651 9,153 35,273 68,024 438,767 678,530 97,450 17,849 5,833 7,173 1,691 849 1,404,338 $500,000 under $1,000,000 3,382 56 728 2,170 17 6,861 5,471 23,075 109,881 89,525 19,466 4,990 7,306 1,081 622 274,631 2,177 170 2,030 6,785 11,920 19,717 9,878 4,937 4,783 850 380 63,886 $1,000,000 under $1,500,000 253 5 $1,500,000 under $2,000,000 875 2,173 5,039 6,552 4,195 3,084 4,720 638 299 27,575 6,644 5,668 5,007 3,532 10,053 2,280 988 34,525 5 $2,000,000 under $5,000,000 342 5 $5,000,000 under $10,000,000 153 49 1,247 588 748 2,175 1,802 960 7,721 $10,000,000 or more 27 12 1,031 78 372 887 686 1,653 4,746

Table 3—Nondependent Tax Year 1999 filers present in 2000, 2001, 2002, and 2003 with no change in marital status by 1999 AGI class and 2003 AGI class in 1999 dollars
2003 AGI Class Number of Returns $1 No 1999 AGI Class No adjusted gross income................. $1 under $10,000............................... $10,000 under $20,000...................... $20,000 under $30,000...................... $30,000 under $40,000...................... $40,000 under $50,000...................... $50,000 under $75,000...................... $75,000 under $100,000....................
$100,000 under $200,000.................. $200,000 under $500,000.................. $500,000 under $1,000,000............... $1,000,000 under $1,500,000............ $1,500,000 under $2,000,000............ $2,000,000 under $5,000,000............ $5,000,000 under $10,000,000.......... $10,000,000 or more......................... Total.................................................. Total 496,602 7,291,321 14,138,652 12,401,452 9,658,255 7,572,662 13,866,782 6,647,392 6,263,968 1,638,337 308,924 76,553 32,989 49,572 12,113 7,286 80,462,859 AGI 195,761 151,389 144,810 82,708 47,747 41,454 50,781 29,474 36,913 31,180 8,161 2,259 1,405 2,340 872 635 827,890 under $10,000 87,897 3,227,229 2,564,971 937,417 408,131 220,012 220,367 82,435 66,738 12,752 2,459 750 225 468 70 17 7,831,937 $10,000 under $20,000 57,329 2,438,563 6,730,330 2,544,513 959,854 455,078 444,776 123,614 74,955 20,562 3,779 733 676 540 143 53 13,855,497 $20,000 under $30,000 36,345 932,795 3,128,836 4,829,287 1,925,940 703,708 676,097 182,434 109,986 24,907 1,908 450 468 614 127 37 12,553,939 2003 AGI Class Number of Returns $100,000 under 1999 AGI Class No adjusted gross income........................................... $1 under $10,000........................................................ $10,000 under $20,000............................................... $20,000 under $30,000............................................... $30,000 under $40,000............................................... $40,000 under $50,000............................................... $50,000 under $75,000............................................... $75,000 under $100,000............................................. $100,000 under $200,000........................................... $200,000 under $500,000........................................... $500,000 under $1,000,000........................................ $1,000,000 under $1,500,000..................................... 
$1,500,000 under $2,000,000..................................... $2,000,000 under $5,000,000..................................... $5,000,000 under $10,000,000................................... $10,000,000 or more................................................... Total............................................................................ $200,000 15,653 10,325 29,184 43,795 63,304 76,229 549,973 1,311,996 3,352,553 501,362 43,461 8,873 4,008 4,587 899 436 6,016,638 $200,000 under $500,000 7,274 6,002 2,996 14,661 8,651 9,153 35,273 68,024 438,767 677,973 97,450 17,849 5,777 7,173 1,691 849 1,399,564 $500,000 under $1,000,000 3,382 56 728 172 17 6,861 5,471 23,075 109,881 89,525 19,466 4,990 7,306 1,081 622 272,633 2,177 170 2,030 6,785 11,920 19,717 9,878 4,937 4,783 850 380 63,886 $1,000,000 under $1,500,000 253 5 . 2,173 5,039 6,552 4,195 3,084 4,720 638 299 27,575 6,644 5,668 4,837 3,532 9,984 2,280 986 34,284 $1,500,000 under $2,000,000 875 5 $2,000,000 under $5,000,000 342 5 $5,000,000 under $10,000,000 153 49 1,247 588 748 2,140 1,797 959 7,680 $10,000,000 or more 27 12 1,031 78 372 887 686 1,653 4,747 $30,000 under $40,000 34,034 264,682 937,373 2,606,777 3,208,625 1,416,150 1,067,007 203,330 132,428 20,003 2,698 1,412 450 475 207 47 9,895,700 $40,000 under $50,000 18,587 114,827 345,113 761,172 1,843,690 2,226,113 1,861,221 346,463 182,893 20,508 3,565 959 225 833 84 54 7,726,307 $50,000 under $75,000 28,153 122,804 214,340 485,712 1,041,619 2,160,083 6,733,728 1,700,573 690,242 91,048 12,652 2,371 1,139 1,256 375 127 13,286,222 $75,000 under $100,000 10,536 22,649 39,967 95,232 150,678 262,504 2,220,524 2,591,549 1,146,460 104,496 9,051 1,855 953 1,465 312 131 6,658,361

Table 4—Nondependent Tax Year 1999 filers present in 2000, 2001, 2002, and 2003 with no change in marital status by 1999 AGI class and average 1999-2003 AGI class in 1999 dollars
1999-2003 Average AGI Class Number of Returns $1 No 1999 AGI Class No adjusted gross
income............... $1 under $10,000............................. $10,000 under $20,000.................... $20,000 under $30,000.................... $30,000 under $40,000.................... $40,000 under $50,000.................... $50,000 under $75,000.................... $75,000 under $100,000.................. $100,000 under $200,000................ $200,000 under $500,000................ $500,000 under $1,000,000............. $1,000,000 under $1,500,000.......... $1,500,000 under $2,000,000.......... $2,000,000 under $5,000,000.......... $5,000,000 under $10,000,000........ $10,000,000 or more....................... Total................................................ Total 496,602 7,291,321 14,138,652 12,401,452 9,658,256 7,572,662 13,866,782 6,647,392 6,263,968 1,638,337 308,924 76,553 32,989 49,572 12,113 7,286 80,462,860 AGI 244,713 77,293 32,600 6,453 6,394 4,796 11,631 177 2,802 4,541 821 12 100 215 61 12 392,619 5,060,111 56 14,675,103 under $10,000 103,300 3,733,190 1,103,155 92,175 16,461 6,173 2,140 556 2,081 766 56 $10,000 under $20,000 53,236 2,783,108 9,341,917 2,054,290 318,777 77,148 37,306 8,136 556 619 5 5 13,267,454 $20,000 under $30,000 38,361 502,121 2,968,718 7,156,297 1,905,998 448,226 219,033 25,391 2,140 1,113 56 $30,000 under $40,000 16,612 122,257 472,937 2,465,181 4,903,657 1,434,585 597,781 96,145 31,397 2,098 170 10,142,820 $40,000 under $50,000 14,740 40,331 125,178 425,060 1,930,748 3,551,698 1,822,815 147,910 48,824 3,194 8,110,499 13,877,882 6,781,986 $50,000 under $75,000 11,493 22,531 65,591 158,102 516,779 1,926,940 9,240,646 1,577,060 342,598 15,341 783 17 10 $75,000 under $100,000 3,829 4,288 18,796 26,034 46,365 98,253 1,716,485 3,755,049 1,077,802 35,019 56 1999-2003 Average AGI Class Number of Returns $500,000 under 1999 AGI Class No adjusted gross income......................................... $1 under $10,000...................................................... 
$10,000 under $20,000............................................. $20,000 under $30,000............................................. $30,000 under $40,000............................................. $40,000 under $50,000............................................. $50,000 under $75,000............................................. $75,000 under $100,000........................................... $100,000 under $200,000......................................... $200,000 under $500,000......................................... $500,000 under $1,000,000...................................... $1,000,000 under $1,500,000................................... $1,500,000 under $2,000,000................................... $2,000,000 under $5,000,000................................... $5,000,000 under $10,000,000................................. $10,000,000 or more................................................. Total.......................................................................... 281,535 2,072 11,365 77,690 139,607 30,609 9,257 10,437 46 $1,000,000 260 170 5 17 $1,000,000 under $1,500,000 262 556 81 6,204 22,584 20,003 8,023 9,177 1,109 11 68,010 2,030 1,533 4,134 6,481 5,487 7,508 1,835 15 29,238 $1,500,000 under $2,000,000 39 170 5 3,476 4,335 4,389 5,117 18,720 4,949 2,116 43,214 $2,000,000 under $5,000,000 107 5 $5,000,000 under $10,000,000 116 24 1,216 308 505 2,337 3,306 2,054 9,867 . 
242 544 765 3,069 4,658 6,157,419 $10,000,000 or more 12 25 $100,000 under $200,000 4,628 6,200 7,486 17,683 12,503 20,056 201,309 1,013,742 4,405,489 455,894 11,977 356 79 17 $200,000 under $500,000 4,894 2,098 170 556 4,062 17,629 21,156 336,803 1,030,824 123,159 14,340 4,174 537 31 5 1,560,438

Table 5—Tax Year 1999 nondependent filers present in 2000, 2001, 2002, and 2003 with no change in marital status by 1999 AGI class and average 1999-2003 AGI class in 1999 dollars
1999-2003 Average Indexed AGI Percentage Change from 1999 AGI Negative 1999 AGI Class $1 under $10,000........................................... $10,000 under $20,000.................................. $20,000 under $30,000.................................. $30,000 under $40,000.................................. $40,000 under $50,000.................................. $50,000 under $75,000.................................. $75,000 under $100,000................................ $100,000 under $200,000.............................. $200,000 under $500,000.............................. $500,000 under $1,000,000........................... $1,000,000 under $1,500,000........................ $1,500,000 under $2,000,000........................ $2,000,000 under $5,000,000........................ $5,000,000 under $10,000,000...................... $10,000,000 or more...................................... Total...............................................................
-100% 77,293 32,600 6,453 6,394 4,796 11,631 177 2,802 4,541 821 12 100 215 61 12 147,906 75%-100% 14,570 12,008 13,994 8,433 10,187 6,914 8,692 10,159 13,940 5,717 2,841 2,312 4,484 2,058 1,832 118,139 50%-75% 88,039 171,947 172,635 163,984 117,338 263,230 152,127 231,935 125,620 47,946 18,023 8,540 14,488 3,835 2,199 1,581,886 25%-50% 318,916 1,116,009 1,051,765 827,310 659,362 1,052,621 571,149 822,798 363,488 72,676 18,383 7,411 10,533 2,263 1,234 6,895,918 Positive 1999 AGI Class $1 under $10,000.............................................................................. $10,000 under $20,000..................................................................... $20,000 under $30,000..................................................................... $30,000 under $40,000..................................................................... $40,000 under $50,000..................................................................... $50,000 under $75,000..................................................................... $75,000 under $100,000................................................................... $100,000 under $200,000................................................................. $200,000 under $500,000................................................................. $500,000 under $1,000,000.............................................................. $1,000,000 under $1,500,000........................................................... $1,500,000 under $2,000,000........................................................... $2,000,000 under $5,000,000........................................................... $5,000,000 under $10,000,000......................................................... $10,000,000 or more......................................................................... 
Total 0.1-25% 1,178,392 4,650,586 4,973,418 3,894,720 3,289,105 5,958,052 2,684,583 2,169,316 361,686 49,727 9,823 3,828 4,661 1,150 564 29,229,611 25%-50% 952,202 1,884,650 1,385,268 880,707 523,171 781,723 303,025 343,000 110,951 17,816 3,943 1,848 2,541 558 256 7,191,660 50%-75% 637,414 929,415 438,329 255,018 116,643 152,324 74,243 88,947 52,451 7,575 2,999 829 1,235 199 113 2,757,735 75%-100% 542,640 486,283 175,801 81,405 57,437 62,679 24,924 50,879 18,622 6,455 1,008 509 615 116 57 1,509,432 100% 2,447,793 670,422 223,016 93,464 41,206 70,011 46,047 86,443 32,412 12,208 2,775 1,519 1,758 300 166 3,729,540 0.1-25% 1,034,061 4,184,732 3,960,772 3,446,821 2,753,416 5,507,597 2,782,426 2,457,690 554,625 87,984 16,747 6,094 9,042 1,572 853 26,804,431 Total 7,291,321 14,138,652 12,401,452 9,658,256 7,572,662 13,866,782 6,647,392 6,263,969 1,638,337 308,924 76,553 32,989 49,572 12,113 7,286 79,966,259 1999-2003 Average Indexed AGI Percentage Change from 1999 AGI
Note: This table excludes filers with "No adjusted gross income" for Tax Year 1999.

Creativity and Compromise: Constructing a Panel of Income and Estate Tax Data for Wealthy Individuals*
Barry W. Johnson and Lisa M. Schreiber, Internal Revenue Service

The Statistics of Income Division (SOI) of the IRS collects statistical data from all major Federal tax and information returns. These data are used by both the Congressional and Executive branches of the Government to evaluate and develop tax and economic policy. Among these are annual studies of Form 1040, U.S. Individual Income Tax Return, and Form 706, United States Estate (and Generation-Skipping Transfer) Tax Return. Form 1040 is filed annually by individuals or married couples to report income, including wages, interest, dividends, capital gains, and some types of business income.
In 1987, SOI undertook a major revision of the sample of Forms 1040 included in its annual studies in order to add a panel component to the usual cross-sectional sample. Cross-sectional samples provide reliable coverage of population totals and support annual budget projections as well as a wide range of other research; panels are more useful for estimating behavioral responses to hypothetical tax law changes. The new sample design was created to include all members of a tax family (primary and secondary filers and their dependents) in the panel and represented the cohort of tax families filing returns in 1988 for Tax Year 1987. It included 39 strata based on income, filing status, and total receipts from businesses and farms (see Czajka and Schirm, 1991; Schirm and Czajka, 1991). For the base year, the initial SOI Form 1040 sample included 114,700 returns, 88,000 of which were panel members, not counting returns filed by dependents, which were added later.

In 1994, the sample for SOI’s annual estate tax studies was changed so that data would be collected from any Form 706 filed for a deceased 1987 Family Panel member. A Federal estate tax return, Form 706, must be filed for every U.S. decedent whose gross estate, valued on the date of death, combined with certain lifetime gifts made by the decedent, equals or exceeds the filing threshold applicable for the decedent’s year of death. The return must be filed within 9 months of the decedent’s death, although a 6-month extension is often requested and granted. All of a decedent’s assets, as well as the decedent’s share of jointly owned and community property assets, are included in the gross estate for tax purposes and reported on Form 706. Also reported are most life insurance proceeds, property over which the decedent possessed a general power of appointment, and certain transfers made during life.
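The filing test and deadline described above amount to a threshold comparison plus date arithmetic. The Python sketch below is a simplified reading of the text, not the statutory rule. The threshold is passed in as a parameter, because the paper cites only the $600,000 and $1.5 million end points for 1994 and 2004. Which lifetime gifts count is left to the caller. The helper `add_months` assumes that when the due month has no matching calendar day, the due date falls on that month's last day. Weekend and holiday adjustments are ignored.

```python
import calendar
from datetime import date


def must_file_form_706(gross_estate, lifetime_gifts, filing_threshold):
    """True if the gross estate plus the counted lifetime gifts equals or exceeds the threshold."""
    return gross_estate + lifetime_gifts >= filing_threshold


def add_months(d, months):
    """Same calendar day `months` later, clamped to the last day of a shorter month (assumption)."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def form_706_due_dates(date_of_death):
    """(regular due date, extended due date): 9 months after death, plus a 6-month extension."""
    return add_months(date_of_death, 9), add_months(date_of_death, 9 + 6)


if __name__ == "__main__":
    # Hypothetical decedent tested against the $600,000 threshold cited for 1994.
    print(must_file_form_706(550_000, 60_000, 600_000))  # True
    print(form_706_due_dates(date(2004, 5, 31)))
```

Keeping the threshold as an argument avoids hard-coding a year-by-year schedule that the text does not supply.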
Assets are valued on the day of the decedent’s death, although an estate may instead value assets as of a date up to 6 months after death if market values decline. Special valuation rules and a tax deferral plan are available to an estate composed primarily of a small business or farm. Expenses and losses incurred in administering the estate, funeral costs, the decedent’s debts, bequests to a surviving spouse, and bequests to qualified charities are all allowed as deductions against the estate in calculating the tax liability.

The Tax Family Concept

The initial unit of observation for the SOI 1987 Family Panel was defined as a tax family, which included a taxpayer, spouse, and all dependents (not limited to children) claimed by either. Thus, a tax family could represent single filers (widowed, divorced or separated, or never married), as well as married filers and their dependents. Dependents did not need to live in the same household as the parent to be included in the tax family. However, information on dependents whose incomes fell below the filing threshold was generally not available unless reported on the parent’s return. Coresident family members who were not claimed as dependents were not included in the tax family. An interesting complication of the tax family concept is the treatment of married couples who, for various reasons, elected to file separately.

*Johnson, Barry W. and Schreiber, Lisa M. (2006), “Creativity and Compromise: Constructing a Panel of Income and Estate Tax Data for Wealthy Individuals,” American Statistical Association, Proceedings, Section on Survey Research Methods, (forthcoming).
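Membership in a tax family is determined by who appears on a return rather than by who shares a household. The Python sketch below illustrates that membership rule for a single return. The `IncomeTaxReturn` record and the person identifiers are hypothetical. A separately filed return simply yields its filer plus the dependents claimed on it; the panel's special handling of such couples is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class IncomeTaxReturn:
    primary_filer: str                 # hypothetical person identifiers
    spouse: Optional[str] = None       # set only on a joint return
    dependents: List[str] = field(default_factory=list)  # anyone claimed, not only children


def tax_family(ret: IncomeTaxReturn) -> Set[str]:
    """Taxpayer, spouse, and claimed dependents; household membership plays no role."""
    members = {ret.primary_filer}
    if ret.spouse is not None:
        members.add(ret.spouse)
    members.update(ret.dependents)
    return members


if __name__ == "__main__":
    # A student living away from home is a member if claimed;
    # a coresident grandparent who is not claimed is not.
    household = {"parent_a", "parent_b", "grandparent"}
    joint = IncomeTaxReturn("parent_a", "parent_b", ["student_away"])
    print(sorted(tax_family(joint)))              # ['parent_a', 'parent_b', 'student_away']
    print(sorted(tax_family(joint) - household))  # ['student_away']
```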
For the purposes of the SOI panel, only the partner whose separately filed return was selected into the sample in 1988 was included in the panel; the only way for both spouses of a married couple filing separately in 1988 to have been permanently included in the family panel was for returns filed by each spouse to have been independently selected. Thus, the tax family differs significantly from the more common “household” measure used by many national surveys (Czajka and Schirm, 1993) [1]. the primary filer of a return are available to most users of this dataset. Special permission was required to gain access to tables that link the actual SSN with the masked version. Combining data from SOI and the CDW, a total of 72,373 income tax returns filed for Tax Years 19872003 were available for the FPDD. Ideally, an income tax return would be available for every tax period between 1987 and a decedent’s year of death. For 98.2 percent of decedents, this was the case. For 1.3 percent of all decedents, only 1 return was missing from the time series 1987 through the last full year prior to death, leaving only a handful of decedents for whom more than 1 return was missing from the panel [5]. A panel sample of income tax filers, the elements of which have at their core two common factors, that of being sampled based on 1987 reported income and that of having an estate tax return filed sometime after that, poses interesting analytical challenges. Two of these relate to selecting appropriate reference periods and determining how to treat changes in tax family composition over time. In addition, the selection criteria for inclusion in the FPDD changed during the sample period due to changes in the estate tax filing threshold, which ranged from $600,000 in gross assets in 1994 to $1.5 million in 2004. 
Another important consideration is that only a decedent’s share of a married couple’s assets is reported on an estate tax return, while income tax returns for married couples who file jointly report income attributable to both partners. Because income tax data were obtained from two different sources, there are also variations in the available data items from different tax years, subtle differences in data definitions, and differences in data quality. Finally, with a few exceptions, only income subject to taxation is reported on a tax return, and that reported income may be subject to both accidental and intentional misreporting by the taxpayer. The FPDD includes individual income tax data for Tax Period 1987 for all sampled tax families by definition. It also includes an estate tax return for at least one member of each tax family. This suggests two relevant reference periods for research purposes, either 1987 or the year of death reported on the estate tax return. Selecting 1987 as the reference period is advantageous for u The Data Between 1987 and 2004, there were 6,614 Federal estate tax returns filed for 1987 Family Panel members or visitors [2]. Of these, 5,659 estate tax returns were identified as having been filed for permanent 1987 Individual Family Panel members who died between 1994 and 2004 [3]. These 5,659 decedents form the core of the SOI Family Panel Decedent Data Set (FPDD) [4]. Individual income tax data were collected by SOI for the 1987 Family Panel from Tax Year 1987 through Tax Year 1996. SOI data consist of both the set of data items that are collected for administrative processing of Form 1040 and all attachments, as well as many more detailed data items required for complex statistical and economic analysis of taxpayer behavior. In addition, data collected by SOI are extensively tested and adjusted to minimize nonsampling error related to taxpayer mistakes and errors introduced during the data transcription process. 
For tax years after 1996, SOI continued to collect administrative data related to the Family Panel members, but due to problems of panel drift decided to discontinue SOI processing of panel member returns, electing instead to develop new panels based on lessons learned from this initial exercise. The most convenient source of the administrative data for 1997 to 2004 is the Compliance Data Warehouse (CDW) maintained by the IRS Office of Research. The CDW houses, among other things, a complete archive of administrative data for Form 1040 and selected attachments in a normalized relational database. Its primary purpose is generalized statistical research on taxpayer behavior, so that very little information which can be used to identify individual taxpayers is available. In fact, only a four-digit name control and a masked Social Security number (SSN) for - 30 - creativity and comPromiSe: conStructing a Panel of income and eState tax data some research because the probability of being selected into the file is known, making it theoretically possible to produce population estimates from the file. However, since wealth valuation data in the file are for deaths between 1994 and 2004, the time series of income data vary from about 7 years to 17 years, which might be limiting for certain types of analysis. Because one of the prime features of the FPDD is the connection of income to wealth, the date of death—that is, the date for which wealth data are available—is also an attractive reference period. The income stream that would be most relevant in this case would be income reported in the years immediately prior to death. Focusing on income in this way would be appropriate for studying changes in income sources and savings habits as individuals approach the end of their lives, and analyzing the relationship between wealth and realized income. 
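The two reference periods imply different income windows for each decedent. As a minimal sketch (the function names and input layout are hypothetical, not part of the FPDD itself), the window runs from Tax Year 1987 through the last full year before death, and a simple set difference flags the tax years with no return on file, which is the kind of completeness check summarized above for the 72,373 returns.

```python
from collections import Counter

PANEL_BASE_YEAR = 1987  # Tax Year 1987 is the base year of the Family Panel


def income_window(year_of_death, base_year=PANEL_BASE_YEAR):
    """Full tax years available before death: base_year through year_of_death - 1.

    The return for the year of death itself is excluded because it usually
    covers only part of a year (endnote [7]).
    """
    return list(range(base_year, year_of_death))


def missing_years(filed_years, year_of_death, base_year=PANEL_BASE_YEAR):
    """Tax years inside the income window for which no return was found."""
    filed = set(filed_years)
    return [y for y in income_window(year_of_death, base_year) if y not in filed]


def completeness_summary(panel):
    """panel maps a (hypothetical) decedent id to (filed_years, year_of_death).

    Returns a Counter of decedents with 0, 1, or 2+ missing returns.
    """
    counts = Counter()
    for filed_years, year_of_death in panel.values():
        n = len(missing_years(filed_years, year_of_death))
        counts["0" if n == 0 else "1" if n == 1 else "2+"] += 1
    return counts
```

A decedent who died in 1994 has a 7-year window (1987 through 1993) and one who died in 2004 has a 17-year window, which is where the 7- to 17-year range of income histories comes from.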
Given that years of death in the FPDD range from 1994 to 2004, a disadvantage of using the year of death as the reference period is the difficulty of controlling for intertemporal differences in economic conditions that affect rates of return and therefore influence portfolio allocation decisions. The dynamic nature of portfolio allocation decisions, often indicated by the realization of capital gains, also makes it difficult to align income earned in one period with assets observed in another, even when the two periods are relatively close.

Longitudinality introduces problems with the tax family concept because, over time, a filing unit may change composition, which is usually accompanied by changes in filing status (Czajka and Radbill, 1995). For example, married persons divorce, single persons marry, couples who customarily file jointly may elect to file separately and vice versa, dependent filers may file independently, or one spouse of a married couple may die. Tax families for married persons can be particularly complex. As a result, an individual might appear in the panel as a primary filer on a joint return married to an original panel member or to a visitor (a spouse who entered the panel after 1988); a married primary filer on a separate return whose spouse may or may not be in the panel; a secondary filer on a joint return (married to an original panel member or to a visitor); or a single filer. The longer the time series is carried forward, the greater the possibility for combinations of these events to occur.

There are a number of strategies for handling these changes in tax family composition. The most straightforward is to limit analysis to only those filing units that do not change over time. However, this approach tends to introduce a bias, since the more stable filing units will tend to have more stable incomes. A second approach is to focus analysis on person-level data, imputing income for each individual in the tax family.
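The first strategy, restricting analysis to filing units whose status never changes, starts from a stability tabulation like the one in Figure 1. A minimal sketch, assuming each decedent’s returns have already been reduced to an ordered list of filing statuses from 1987 through the year prior to death, with married-filing-separately returns counted as “single” as in endnote [6]; the data layout and function name are hypothetical.

```python
from collections import Counter


def filing_status_stability(panel):
    """Tabulate filing-status stability using 1987 as the reference year.

    panel maps a (hypothetical) decedent id to a list of filing statuses,
    "single" or "joint", ordered from Tax Year 1987 through the year prior
    to death. Returns {group: (present in 1987, unchanged, percent unchanged)}
    keyed by 1987 filing status, plus a "total" row.
    """
    present, unchanged = Counter(), Counter()
    for statuses in panel.values():
        base = statuses[0]                    # 1987 status is the reference
        present[base] += 1
        if all(s == base for s in statuses):  # same status in every later year
            unchanged[base] += 1
    rows = {group: (present[group], unchanged[group]) for group in present}
    rows["total"] = (sum(present.values()), sum(unchanged.values()))
    return {g: (n, k, round(100 * k / n, 1)) for g, (n, k) in rows.items()}
```

Applying the same percentage formula to the published Figure 1 counts (747 of 881 single filers and 3,518 of 4,778 joint filers unchanged) reproduces the reported 84.8 and 73.6 percent.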
Figures 1 and 2 show panel members grouped into two broad categories, single filers and joint filers, in order to examine changes in filing status over time [6]. Looking first at each panel member’s filing status in 1987, Figure 1 shows that, overall, filing status changed for 24.6 percent of all filers between 1987 and the year prior to death [7]. There was more stability for single filers, only 15.2 percent of whom filed a joint return at some point during the period; by contrast, 26.4 percent of joint filers became single filers sometime between 1987 and death.

Figure 1: Filing Status Stability, Using 1987 as Reference Year

Filing status   Return present   Filing status unchanged, 1987 to 1 year prior to death
                in 1987          Number        Percentage
Single              881             747           84.8
Joint             4,778           3,518           73.6
Total             5,659           4,265           75.4

Figure 2 shows each panel member’s filing status in the year prior to death and compares it to income tax returns filed for earlier tax periods. Only filers for whom a Form 1040 was available for at least 7 years prior to death were included in the figure [8].

Figure 2: Filing Status Stability, Using Year of Death as Reference Year

Filing status   Return filed,    Filing status unchanged for N years     Percentage unchanged
                year prior       prior to death                          for 7 years
                to death         N = 3      N = 5      N = 7
Single            1,865          1,586      1,370      1,186               63.6
Joint             3,744          3,681      3,630      3,588               95.8
Total             5,609          5,267      5,000      4,774               85.1

Using this criterion, filing status was constant for 85.1 percent of all panel members over the 7 years preceding death. Individuals who were single filers at death were much more likely to have changed filing status in the years preceding death than those who were joint filers. Only 63.6 percent of all individuals who were single filers in the year prior to death had been single over the 7 years examined, reflecting both couples for whom one spouse died and those who divorced or separated during the period.
Of individuals who were joint filers at death, 95.8 percent had been married for at least the previous 7 years.

Descriptive Statistics

Despite the limitations and challenges discussed in the previous section, the FPDD gives a unique opportunity to learn more about the way that incomes change as people age and contemplate the end of their lives, and it also provides a snapshot of the wealth that was the source of a portion of that income. This section briefly describes individuals in the FPDD. For this analysis, filing units are again examined in two broad groups, single filers and joint filers; all estimates are unweighted; and all money amounts have been converted to constant 2001 dollars [9].

There are 5,659 decedents in the FPDD. In 1987, the base year of the panel, 881 were single filers, 48.2 percent of whom were female. The majority, 64.3 percent, of the 4,778 panel decedents who were joint filers in 1987 were male. The mean and median ages of females in the FPDD were 65 and 66, respectively, in 1987, and 76 and 78 at death. The mean and median ages of males were 63 and 64, respectively, in 1987, and 75 and 76 at death. These statistics indicate that many of the decedents in the FPDD were at or nearing retirement in 1987, the inception of the panel.

For all filing units whose filing status did not change between 1987 and the year prior to death, reported adjusted gross income (AGI) declined over this period, which is not surprising given that most individuals in the panel were transitioning from work into retirement over the period covered by the panel. For single filers, mean AGI declined from almost $2.0 million in 1987 to $980,000 at death. Figure 3 shows that this decline was an overall flattening and downward shift of the AGI distribution for these filers, with relatively little change for those in the lower percentiles and with the largest differences in the middle of the distribution. Median AGI, for example, declined from about $580,000 in 1987 to almost $200,000 in the year prior to death, a decrease of 65.6 percent. A similar pattern is shown in Figure 4 for joint filers, for whom mean AGI declined from $2.2 million to $1.7 million between 1987 and the death of one partner. Median AGI for joint filers declined nearly 60.0 percent, from almost $930,000 to about $370,000, while AGI for those in the 90th percentile declined less over the period, about 35.0 percent.

Figure 3: Income Distribution in 1987 and Year Prior to Death, Single Filers (adjusted gross income, in thousands of dollars, at the 10th through 90th percentiles of the income distribution, comparing 1987 income with income 1 year prior to death; dollar amounts are unweighted and in constant dollars)

Figure 4: Income Distribution in 1987 and Year Prior to Death, Joint Filers (same layout as Figure 3; dollar amounts are unweighted and in constant dollars)

Figures 5 and 6 decompose AGI into major components for selected years over the 7-year period preceding a panel decedent’s year of death [10]. For single filers, overall, median values for wages, taxable interest and dividends, and income from noncorporate businesses decreased as individuals aged. Median values for tax-exempt interest, derived from investments in bonds issued by State or local governments, also declined, overall, for the 7-year period shown in Figure 5. However, for wealthier decedents, those with $5 million or more in gross assets at death, income from tax-exempt bonds increased over this period.
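Endnote [9] states that money amounts were converted to constant 2001 dollars using the GDP chain-type price index. The sketch below shows that standard deflation step together with the percent-change arithmetic behind figures such as the decline in median AGI. It is a minimal illustration only: the function names are hypothetical, and the index values in the usage lines are placeholders, not actual Bureau of Economic Analysis figures.

```python
def to_constant_dollars(nominal, year, price_index, base_year=2001):
    """Express a nominal amount from `year` in `base_year` dollars:
    real = nominal * P[base_year] / P[year]."""
    return nominal * price_index[base_year] / price_index[year]


def percent_change(old, new):
    """Percent change from `old` to `new`; negative values are declines."""
    return 100.0 * (new - old) / old


# Placeholder index values for illustration only (not BEA data).
example_index = {1987: 80.0, 2001: 100.0}
print(to_constant_dollars(400_000, 1987, example_index))  # 500000.0
print(round(percent_change(580_000, 200_000), 1))         # -65.5
```

With the rounded medians quoted for single filers ($580,000 and roughly $200,000), the result is about a 65.5 percent decline, consistent with the reported 65.6 percent once rounding in the quoted dollar amounts is allowed for.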
In Figure 5, for all single decedents, taxable Social Security combined with pension and annuity income increased over time, while gains from sales of capital assets were relatively stable.

Figure 6 shows that, while the income distributions for single and joint filers exhibit similar downward shifts over time, the sources of these declines differ between the two groups. For joint filers, income from wages, as well as interest and dividends from taxable investment assets, declined over the 7 years preceding the death of one spouse, but income from most other sources was either stable or increased over this period. Most notable was the relative stability in tax-exempt income for joint filers overall. For the wealthiest joint filers, however, those where one spouse owned $10 million or more in gross assets at death, tax-exempt income increased by 40 percent over the period examined. For these wealthy filers, income from noncorporate businesses increased by almost 27.0 percent over time.

Figures 5 and 6 show that, as panel members aged, the share that wage income contributed to AGI decreased, while the patterns of change in income from other sources varied somewhat, depending on filing status and wealth class. It has been noted that the realization of income derived from assets is a more or less voluntary event. Wealthy individuals, those for whom returns on investments make up a relatively large share of income, have the ability to allocate their portfolios to take maximum advantage of preferences built into the tax code, to reduce risk, and to vary income significantly according to their own consumption needs.
According to Steuerle (1985), the voluntary nature of capital income recognition implies that “taxes paid and benefits received will vary tremendously among persons in fairly identical circumstances.” He goes on to state that, because of the voluntary nature of income recognition, using income as a classifier in statistical analyses will be inaccurate or misleading for many purposes.

Figure 5: Changes in Income Composition, Selected Years Prior to Death, Single Filers (median values, in dollars, of taxable interest and dividends; tax-exempt interest; noncorporate business income; net capital gain or loss; taxable Social Security, pensions, and annuities; and wages, at 7, 5, 3, and 1 years prior to death; dollar amounts are unweighted and in constant dollars)

Figure 6: Changes in Income Composition, Selected Years Prior to Death, Joint Filers (same income components and years as Figure 5; dollar amounts are unweighted and in constant dollars)

For many decedents, income reported on a tax return in the year prior to death will be closely correlated with the assets reported on an estate tax return filed at death [11]. It is, therefore, possible to estimate rates of return on various asset classes. Rates of return are estimated as the income attributable to each class of assets, as reported on Form 1040 and its attachments for the last year prior to death, divided by the value of those assets reported on Form 706. Figure 7 shows median values for estimated rates of return for all capital assets, for investment assets that produce taxable income, and for tax-exempt bonds.
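The rate-of-return calculation and the Figure 7 medians can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout and function names are hypothetical; the size classes follow Figure 7; the decedent’s asset value is doubled for joint filers as described in endnote [12]; and, as in endnote [11], no adjustment is made for assets sold before the Form 706 valuation. How individual Form 1040 income items are mapped to asset classes is not specified here.

```python
import statistics
from collections import defaultdict

# Size-of-gross-assets classes from Figure 7; upper bounds are exclusive.
SIZE_CLASSES = [
    (1_000_000, "Under $1 million"),
    (5_000_000, "$1 million, under $5 million"),
    (10_000_000, "$5 million, under $10 million"),
]
TOP_CLASS = "$10 million or more"


def size_class(gross_assets):
    """Return the Figure 7 size class label for a gross estate value."""
    for upper, label in SIZE_CLASSES:
        if gross_assets < upper:
            return label
    return TOP_CLASS


def rate_of_return(income, asset_value, joint_filer=False):
    """Percent return: income from an asset class on the Form 1040 for the
    year prior to death, divided by that class's value on Form 706.

    For joint filers the decedent's asset value is doubled to approximate
    the couple's holdings (endnote [12]), which likely understates the rate.
    """
    if joint_filer:
        asset_value *= 2
    if asset_value <= 0:
        return None  # no assets of this class reported, so no rate
    return 100.0 * income / asset_value


def median_rates(decedents):
    """decedents: iterable of (gross_assets, income, asset_value, joint_filer).

    Returns the median rate of return overall ("All") and by size class.
    """
    groups = defaultdict(list)
    for gross_assets, income, asset_value, joint_filer in decedents:
        rate = rate_of_return(income, asset_value, joint_filer)
        if rate is None:
            continue
        groups["All"].append(rate)
        groups[size_class(gross_assets)].append(rate)
    return {label: statistics.median(rates) for label, rates in groups.items()}
```

Taking medians within each size class, rather than pooling all decedents, is what lets the decline in rates of return across wealth classes show up; medians also limit the influence of the extreme ratios that arise when income and asset values are poorly aligned in time.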
For single filers with gross assets under $1 million, the rate of return on capital was 4.27 percent. This rate declined for individuals in higher wealth classes and was just 2.13 percent for single filers with $10 million or more in gross assets at death. Likewise, rates of return on investments that produced taxable interest or dividends generally declined with gross asset size. It is interesting to note, however, that the rate of return on tax-exempt investments was fairly stable for single filers, regardless of their wealth. These trends, when combined with those seen previously in Figures 5 and 6, suggest a systematic reordering of the portfolio over time, favoring tax-exempt income sources over those that produce taxable income. For joint filers, rates of return show a similar pattern across wealth classes, although there was more variation across wealth categories for rates of return on tax-exempt bonds than was seen for single filers [12].

Figure 7: Selected Rates of Return One Year Prior to Death, by Size of Gross Assets (median, percent)

Asset / size of gross assets               Single    Joint
Return on all capital assets
  All                                        2.74     2.84
  Under $1 million                           4.27     4.31
  $1 million, under $5 million               3.27     3.52
  $5 million, under $10 million              2.40     2.48
  $10 million or more                        2.13     1.85
Return on all taxable bonds and stocks
  All                                        2.92     2.15
  Under $1 million                           3.83     3.01
  $1 million, under $5 million               3.08     2.37
  $5 million, under $10 million              2.58     2.20
  $10 million or more                        2.65     1.77
Return on tax-exempt bonds
  All                                        5.72     5.12
  Under $1 million                           5.77     5.72
  $1 million, under $5 million               5.84     5.49
  $5 million, under $10 million              5.72     5.17
  $10 million or more                        5.65     4.40

Conclusion

Panel data consisting of income reported by wealthy taxpayers provide important opportunities to study the ways in which income changes over time.
When paired with wealth data from Federal estate tax returns, the resulting data set provides a rare opportunity to learn more about the relationship of wealth to realized income, which is an important consideration in many public policy debates, and about changes in income that occur as people near the ends of their lives. These data, however, present many challenges to researchers, a number of which have been explored in this paper. Techniques for dealing with problems that arise from the longitudinality of the data set, differences in reporting units on income and estate tax returns for joint filers, the dynamic nature of investment portfolios, and many other challenges must be explored before the full potential of the FPDD can be realized. However, the preliminary statistics presented in this paper suggest that there is much that can be learned by addressing these issues using even the most basic assumptions.

Endnotes

[1] Dependents are not included in the analysis presented in this paper.
[2] Estate tax returns filed prior to 1994 were identified by matching panel member SSNs to the IRS Master File. Due to the limited amount of estate tax data available from the Master File for these pre-1994 decedents, they are not included in the FPDD.
[3] Estate tax returns were filed for an additional 57 panel members, but they were missing key documentation or schedules at the time of SOI processing and had to be rejected.
[4] Visitors to the panel were not included in the final dataset, since income data were only available for the period of time that they were associated with an original panel member.
[5] Missing returns can occur either because a taxpayer was not required to file in a given year or because of an error in reporting a taxpayer’s SSN. The latter occurred mainly in the case of secondary SSNs in the 1987 panel. After the period covered by this study, the IRS implemented processing improvements that have reduced these types of errors.
[6] The category “single” includes unmarried and widowed filers, as well as married individuals who elected to file separately, since the data on these returns should reflect income attributable to one individual.
[7] The year prior to death is used because a return filed for the year of death would usually reflect income earned during only the portion of the year in which the decedent was alive.
[8] “Seven years” is used since that is the maximum number of full-year income tax returns that would be available for 1987 panel members who died in 1994.
[9] Values were converted to constant dollars using the GDP chain-type price index. Source: Bureau of Economic Analysis.
[10] Only those panel members whose filing statuses did not change over the 7 years preceding their years of death are included in Figures 5 and 6.
[11] In some cases, assets that generated income reported in the year prior to death may have been sold and the proceeds either consumed or invested differently prior to reporting on Form 706; however, no attempt to adjust the data was made for this analysis.
[12] For joint filers, asset values reported for the decedent spouse were doubled in an attempt to approximate the full value of a married couple’s asset holdings. This approach will likely overstate the combined asset holdings, in aggregate, causing rates of return to be understated somewhat.

References

Czajka, John L. and Radbill, Larry M. (1995), “Weighting Panel Data for Longitudinal Analysis,” Proceedings of the Section on Survey Research Methods, American Statistical Association, Washington, DC.
Czajka, John L. and Schirm, Allen L. (1991), “Cross-Sectional Weighting of Combined Panel and Cross-Sectional Observations,” Proceedings of the Section on Survey Research Methods, American Statistical Association, Washington, DC.
Czajka, John L. and Schirm, Allen L. (1993), “The Family That Pays Together: Introducing the Tax Family Concept, with Preliminary Findings,” Proceedings of the Section on Survey Research Methods, American Statistical Association, Washington, DC.
Schirm, Allen L. and Czajka, John L. (1991), “Alternative Designs for a Cross-Sectional Sample of Individual Tax Returns: The Old and the New,” Proceedings of the Section on Survey Research Methods, American Statistical Association, Washington, DC.
Steuerle, Eugene (1985), “Wealth, Realized Income, and the Measure of Well-Being,” in David, Martin and Smeeding, Timothy, editors, Horizontal Equity, Uncertainty, and Economic Well-Being, University of Chicago Press, Chicago, IL.

2 Measuring, Monitoring, and Evaluating Internal Revenue Service Data
Koshansky, Schwartz, Kilss, Cecco

Monitoring Statistics of Income (SOI) Samples
Joseph Koshansky, Internal Revenue Service

For most of its 90-year history, the main function of the Statistics of Income (SOI) Division has been the collection of information for the Department of the Treasury and Congress [1]. One of the beneficial practices of a Federal statistical agency, according to the Committee on National Statistics, is its continual development of more useful and timely data, including operational statistics, the latter objective even noted in Internal Revenue Code section 6108(a) [2]. SOI has sought ways to improve the quality and timeliness of its tax return information while fulfilling the requests of its primary customers. Over time, it incrementally improved not only the statistical abstraction of information from Federal tax returns, but also the statistical operations associated with producing such information. Moreover, among its various processing tasks, SOI identified the monitoring of its samples of returns, from the point of selection to the point of delivery back to the warehouse storage facilities, as an essential part of its strategy for achieving its mission.
Because SOI functions within a larger bureaucracy, one of its recurring challenges is coordination among the different staffs working on tasks at different phases of the SOI workflow process [3]. For example, in May 2006, the Internal Revenue Service (IRS) awarded a contract to a private company to manage the files function at the IRS submission processing centers [4]. This company will store and maintain all the paper documents taxpayers file at each center for an established period after the completion of IRS “pipeline” processing. It will ship the documents to one of the Federal Records Centers at the end of this period and fulfill requests from IRS offices that need to examine tax and information returns for either administrative or statistical purposes [5]. SOI is one of the major “downstream” requesters of these stored documents, since it produces its mandated annual income, financial, and tax information from weekly samples of Federal tax and information returns, which the IRS usually processes during the previous week [6]. One concern this competitive sourcing initiative raises is whether SOI will control all of the documents in its weekly samples within 2 weeks of selection, without losing some of the returns to other IRS functions that happen to request the same returns [7]. On the other hand, the company may introduce new inventory methods or delivery techniques that benefit SOI, such as exchanging record information about the pulled returns with one of the SOI databases. Of course, this is not the first time SOI has faced a challenge associated with changes in the way the IRS accepts, controls, and processes tax and information returns. Differences in objectives frequently occur between “pipeline processing” and “postpipeline processing” functions, such as SOI.
Ironically, the company will return to an earlier mode of operation that SOI replaced through its Total Quality Organization (TQO) initiatives in the early 1990s: shipping “cycles” (or large groups) of returns to the SOI edit sites, instead of the program-specific workgroups that SOI units in the files area supplied to the SOI edit unit editors [8].

This paper is a case study of the infrastructure SOI developed to monitor its samples and deal with unexpected events in a bureaucratic setting. It focuses on what happens after the SOI sampling programs select returns for a project (or study). In addition, it provides an account of SOI’s efforts to improve the monitoring of its samples of Federal tax and information returns, part of a “Golden Age” in SOI history. Can regular monitoring of the returns in the various samples decrease the time it takes SOI to control returns, the time it takes to find missing returns in the samples, or the time it takes to deliver data to its primary customers? Based on interviews, participant observations, documents, and physical information, the paper shows how SOI operating procedures and information databases, together with coordination among different staffs, monitor and verify the control and timely processing of specific sets of returns.

In the first section of the paper, we provide a brief historical perspective on SOI consolidation efforts and technological advances. Then, we describe the SOI workflow process in the second section. In the third section, we spell out some of the SOI statistical operations and procedures that systematically monitor the SOI workflow process. The fourth section looks at the application of management and statistical concepts to the development of the SOI workflow process; we then conclude with several findings and remarks on how SOI is shaping its future.

Consolidation of Work and Technological Advances

SOI performed most of its preliminary statistical abstraction, data transcription, and error correction in National Office, in district offices (for a period of time after World War II, before the expansion in the number of service centers across the country), and in the few service centers in operation, but moved operations to the centers as their number increased. Service centers not only processed but also began storing the paper returns in support of other IRS programs, such as Examination, before final consignment to one of the Federal Records Centers. IRS personnel at the different SOI sites, who were available to edit SOI samples once regular pipeline processing work subsided or ended, used paper edit and error register sheets to abstract information from the returns, while National Office analysts produced aggregate statistics and tables from the perfected data for customers [9].

In the 1980s, under the direction of Fritz Scheuren, SOI adopted the Total Quality Organization (TQO) methodology to improve its operations at the service centers and in National Office, primarily in response to a request from analysts in the Office of Tax Analysis (OTA) and the Joint Committee on Taxation (JCT) for earlier deliveries of SOI data. SOI analysts identified vital activities and formed cross-functional teams to work on these issues. The staffs in the different branches of SOI National Office looked for ways to develop work processes and data systems that could improve the quality and timeliness of the tax return information they produced for each of the SOI programs within the boundaries of regular IRS pipeline processing. The research included traveling to the service centers to meet with employees to identify, prioritize, and recommend improvements in SOI control and processing of returns in its various samples [10]. According to Scheuren, “[t]he focus on process quality that Deming and Juran urge, while not really new, is having a revolutionary impact on us, especially in its emphasis on continuous improvement or ‘Kaizen,’ as the Japanese call it…. Examples [include] more flexible and dynamic approaches to data capture, cleaning, and completion” [11].

From this analysis, Scheuren and others on his staff hypothesized that consolidating SOI editing operations at particular IRS service centers would free up resources (staffing, travel, and training), improve editing (abstraction) productivity and quality, and enhance SOI’s presence as a data producer within the community of Federal statistical agencies. In May 1990, SOI notified the ten IRS service centers then in operation that it planned to consolidate edit processing for the SOI Corporation and Individual Tax Return programs in six service centers [12]. Four centers would only pull, control, and ship returns to one or more of the six processing centers (reduced to five in 1992) [13]. In general, the number of returns service centers processed for all of the SOI studies was much smaller than the volume of returns the centers processed for tax liability, administrative, and informational purposes. Before the consolidation initiative, competition with other functions for skilled tax examiners to work the SOI programs at the centers, as well as disputes about which IRS or SOI programs merited attention first, were frequent.

Concentrating the editing function at six service centers led to the formation of additional units of SOI editors (former tax examiners and data transcribers) at some of these sites and to growth in the volume of available work at all the sites [14]. Most of these edit units were now dedicated year-round to processing only the returns in SOI samples. SOI ensured the volume in each of the six processing centers was sufficient to support an SOI edit unit working full-time on SOI work. Besides the formation of SOI edit units, SOI created “SOI control units,” at least in name, in each of the ten centers’ files warehouses to support its edit units. After regular pipeline processing, each of the centers stored for about 2 years its portion of the total population of returns that filers mailed each year. An SOI control unit consisted of a small group of service center employees, usually working in a miscellaneous unit in the files area, whose major tasks were the control, processing, and shipping of returns in SOI samples to the SOI edit units and the refiling of returns after the edit units completed processing them. SOI discovered a truly dedicated group of employees, who shared their files expertise and experience in searching for and finding missing returns, as well as assisting National Office analysts in finding additional information about certain returns [15].

While one National Office cross-functional team was working on the consolidation initiative, other teams were developing new online computer applications and installing new hardware at the centers dedicated solely to SOI processing. Beginning in 1991, SOI procured and installed hardware upgrades and telecommunication equipment to support online editing at the Cincinnati and Ogden service centers and in National Office. Telecommunication lines connected online terminals for the editors in each of the processing centers to the SOI minicomputers in Cincinnati and Ogden, designated SOI minicomputer hub sites. The integration of editing, data transcription, and error correction into a single operation with these online terminals began with several smaller SOI studies (Partnerships, Exempt Organizations, Controlled Foreign Corporations, Foreign Tax Credit, and Individual Sales of Capital Assets) and expanded to the major Corporation and Individual Returns programs.
Online editing brought significant improvements in productivity, timeliness, and quality because editors spent much less time waiting for nightly batch-mode feedback on errors and corrections and much more time processing complete sets of the same type of return [16]. Groups of tax examiners became experienced subject-matter experts on how filers completed forms, as well as knowledgeable about the content of the forms in question. Having honed their skills through frequent and consistent editing of a large number of returns of the same type, they accelerated processing and improved the quality of the final product: perfected and more meaningful return information [17]. The availability of returns to edit on a continuous-flow basis became an important concern now that service centers had increased the size of their SOI edit staffs, and in some cases improved the grade structure, to handle the increased volume of work. Would the edit units have enough work? Would the editors’ pace outstrip the delivery of new returns to process? Would waiting for work adversely affect the editors’ training and skill levels? Managers in the SOI edit units identified timely delivery of a sufficient number of returns as one of the requirements for successful execution of the new plan. Timely delivery of work supported the centers’ efforts to commit employees to SOI projects for the entire year, as long as SOI work was available. Consequently, another National Office team developed an online database application, called the SOI Automated Control System (SOIACS), to monitor first the shipment of 1040 returns and then that of all returns [18]. A next-generation version of the application, now named STARTS, would facilitate the “systematic control” of 1040 returns that some service centers would ship to other centers for edit processing, as well as the movement of returns between an edit unit and a control unit within the same center [19].
Once operational, the application had a computer terminal and printer in the files area of each of the ten service centers and in the edit units [20]. It connected the control units with the edit units, and both with National Office. Soon after the application was implemented, an edit unit manager’s need to know which returns to edit first (i.e., the editing priority) outweighed the need for timely delivery of returns, because SOI began committing to deliver data to its customers by specific dates during the year. The centers needed meaningful information to answer this and other questions. For example, an SOI edit unit manager might ask, “Which returns in the cycle (weekly pull) should we process first?” A new SOI files clerk might ask, “If another IRS function has the return, can I pick another one on the same shelf (for SOI)?” SOI editors might ask, “What returns do I edit?” or “Where do I move this money amount?” An SOI National Office statistician might ask, “Can we ask the centers to locate the missing returns?” An SOI economist might ask, “Can the centers edit more of the Type XYZ returns (for example, Sample Code 20 or Cross-Sectional returns) before the deadline?” Finally, an SOI scanner might ask, “How do I replace the illegible page?” These questions demanded not only better monitoring of the physical location of the returns while en route to the edit units, but also better visualization of the metainformation of the returns, i.e., information that describes the information about a sampled return [21]. Now that SOI had created an IT backbone to support its workflow process, managers asked for more details about what actually was in a cycle of returns [22].

SOI Workflow Process
Compared to IRS administrative processing, which captures some information from all filed tax returns, SOI studies collect much more information from samples of returns through transcription and editing.
SOI editors add value to the administrative record information the IRS collects. This added value makes it imperative to control and monitor the samples and to improve the entire SOI workflow process continuously to guarantee consistency over time. Similarly, information about the processing tasks adds value to the corresponding returns that flow through the workflow process. The TQO teams collected information at each phase of the process about the processing tasks; the performers of these tasks; the relative order of the tasks; the possible synchronization of some of the tasks; the flow of information in support of the tasks; and the tracking of the tasks. The result of their efforts was not only a better understanding of the process, but also a cache of aggregated information. The SOI workflow process is the general term for the movement of samples of “documents” or “containers of information” (e.g., paper returns, electronic records, and digitized images) through the SOI sampling, controlling, and editing processes [23]. Each of these three major subprocesses, or phases, relates to specific tasks that personnel at the service centers and in National Office execute to produce statistics for publication and delivery to customers. Both operating procedures and computer systems support the efforts of the people involved in each phase of the process. This convergence of procedures, databases, and people forms an underlying base, or infrastructure, for the functioning of the workflow process. The process begins when a project analyst adds a new tax or information form to an existing study or initiates a new study with an SOI customer. After the SOI sampling programs at the IRS computing center, or at the Ogden Submission Processing Center, select returns for a particular study, the programs create sets of output files for loading into both IRS and SOI databases [24].
Phases of the process include selecting documents, pulling documents, monitoring the success rate of pulling documents, finding missing returns, storing documents, scanning documents, photocopying documents, ordering documents, shipping documents, editing documents, managing documents in the edit unit, and releasing documents back to files. The process is constantly changing and being updated. For example, under the new competitive sourcing initiative, the SOI edit units at the centers will assume tasks the SOI control units once performed, after the contractor begins managing the Files function at the centers. The infrastructure alleviates some of the problems associated with such a change.

SOI Monitoring Operations
The Statistics of Income Automated Return Tracking System (STARTS) is the framework for managing returns and digitized records as they move through the various phases of the SOI workflow process at the centers. This process control system is a structured set of related components (people, procedures, processes, subsystems, databases, reports, etc.) that SOI established to accomplish the major task of monitoring its samples from the point of selection to the point of delivery back to files. STARTS (the system) consists of online database applications, as well as standardized business processes, work instructions, forms, and reports, all of which give the different staffs at the centers and in National Office increased visibility into operations at the centers. The SOI sampling program, sample selection sheets, document chargeout forms, pulled returns, shelved returns, and shipped workgroups of returns make up part of a “signal” system for securing and delivering the correct returns in an SOI sample to the right service center for processing at the right time. The other part is the database, developed for predictable and manageable record keeping.
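The record keeping described above, in which each return carries a traceable history as it moves through the listed phases, can be illustrated with a minimal sketch. It is not the STARTS implementation, which the paper does not describe in code; the class and function names, the phase labels (“selected,” “pulled,” “missing,” “shipped”), and the sample date are hypothetical, and the pull success rate is assumed here to mean the share of selected returns pulled at least once.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TrackedReturn:
    doc_id: str
    # (date, phase) pairs, oldest first; phase labels are hypothetical
    history: list = field(default_factory=list)

    def record(self, when: date, phase: str) -> None:
        """Append one transaction, keeping a traceable record of every move."""
        self.history.append((when, phase))

    @property
    def current_phase(self):
        """Most recent phase, or None if nothing has been recorded yet."""
        return self.history[-1][1] if self.history else None

    def ever(self, phase: str) -> bool:
        """True if the return has passed through the given phase at any time."""
        return any(p == phase for _, p in self.history)


def phase_counts(returns):
    """Number of returns currently sitting in each phase."""
    return Counter(r.current_phase for r in returns)


def pull_success_rate(returns):
    """Share of selected returns pulled at least once (assumed definition)."""
    selected = [r for r in returns if r.ever("selected")]
    if not selected:
        return 0.0
    return sum(r.ever("pulled") for r in selected) / len(selected)


day = date(2006, 1, 9)  # hypothetical date
r1, r2, r3 = TrackedReturn("R1"), TrackedReturn("R2"), TrackedReturn("R3")
for r in (r1, r2, r3):
    r.record(day, "selected")
r1.record(day, "pulled")
r1.record(day, "shipped")
r2.record(day, "missing")
r3.record(day, "pulled")
print(round(pull_success_rate([r1, r2, r3]), 3))  # 0.667
```

Keeping the full history, rather than only the latest status, is what allows both a current-status view (how many returns sit in each phase) and after-the-fact measures such as the pull success rate.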
Database Management System
Borrowing from manufacturing operations, which schedule and track the flow of materials through a process, STARTS (the database application) gives online access to real-time data about one return or a group of returns (cycles, workgroups, scanned sets, photocopied sets, etc.). Combining aspects of transaction processing, management information, decision support, and expert systems, the database is a collection of information about SOI samples, which users manage and use when making decisions about planning, organizing, and controlling the processing of the samples [25]. Top-level managers are concerned with planning: Will the center meet the corporation program’s 75-percent cutoff on the scheduled date? Middle-level managers are concerned with organizing: Can the editors in Unit 5 handle the consolidated 1120 returns? Front-line managers are concerned with controlling: Are the editors; the documents, scanned images, or electronic records; and the inventory and edit applications available to begin editing the corporation returns?

Application of Management and Statistical Concepts
A “Golden Age of SOI Development” occurred at the end of the 1980s and the beginning of the 1990s in SOI National Office and the centers, producing an infrastructure that is still in place today. In-house “quality” teams of economists, management and program analysts, statisticians, center managers, editors, clerks, and information technology specialists collaborated in the design, development, application, and maintenance of this infrastructure. Drawing on the work of American experts such as Frederick Winslow Taylor, Frank Bunker Gilbreth, and Walter Shewhart, and of the War Department’s Training Within Industry, SOI learned that continuous incremental improvements benefit an organization [26].
Convergence of Aggregated Information
Because STARTS (the database application) stores sample information and provides a traceable record of user transactions with that information, one example of how it functions is worth noting here. A section of the Internal Revenue Manual (IRM) specifies the date by which the centers must supply transcribed and edited 1040 return information to National Office for “Advance Data” delivery to OTA and JCT. One year earlier, mathematical statisticians produced the sampling specifications for the computer specialists who wrote the programs that selected returns for the sample. The application’s possible inputs include the return information that the sampling program at the IRS computing center loaded into the SOI sample control files and the “One-Week Followup” date a clerk entered in the STARTS cycle control screen. The application applies a set of logic statements (SOI business rules) to the loaded records: for example, if the Level Code equals “1” or the Continuous Work History Study (CWHS) Code equals “1,” assign the return to the “Cross-Sectional” category; and if the sample code of that return is a specific value within a certain range, also assign it to an additional category, called “Complex” edit. Possible outputs include inventory totals that the application generates and displays, such as the number of “Complex Cross-Sectional” returns available for the SOI edit unit manager to order, and the placement of a user-defined set of these “Complex Cross-Sectional” returns into a STARTS editor workgroup.

Value
SOI increased the value of the tax returns in its samples not only for its customers, but also for its suppliers at the service centers (see Table 1).
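The two business rules in the example above can be expressed as a short sketch. It is illustrative only: the paper does not give the sample-code range that triggers the “Complex” category, so COMPLEX_SAMPLE_CODES is a hypothetical placeholder, the record and function names are invented, and the Complex check is assumed to apply independently of the Cross-Sectional check, so that a “Complex Cross-Sectional” return is one carrying both categories.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL: the paper says only that a sample code "within a certain
# range" triggers the "Complex" category; this range is a placeholder.
COMPLEX_SAMPLE_CODES = range(20, 30)


@dataclass
class SampleRecord:
    doc_id: str
    level_code: str
    cwhs_code: str
    sample_code: int
    categories: set = field(default_factory=set)


def apply_business_rules(rec: SampleRecord) -> SampleRecord:
    # Rule 1: Level Code "1" or CWHS Code "1" -> "Cross-Sectional"
    if rec.level_code == "1" or rec.cwhs_code == "1":
        rec.categories.add("Cross-Sectional")
    # Rule 2: sample code in the (hypothetical) range -> additionally "Complex"
    if rec.sample_code in COMPLEX_SAMPLE_CODES:
        rec.categories.add("Complex")
    return rec


def inventory_total(records, *required):
    """Count records that carry every one of the required categories."""
    needed = set(required)
    return sum(1 for r in records if needed <= r.categories)


records = [apply_business_rules(r) for r in (
    SampleRecord("A", level_code="1", cwhs_code="0", sample_code=25),
    SampleRecord("B", level_code="0", cwhs_code="1", sample_code=5),
    SampleRecord("C", level_code="0", cwhs_code="0", sample_code=22),
)]
print(inventory_total(records, "Complex", "Cross-Sectional"))  # 1
```

Because categories are stored as a set on each record, combinations such as “Complex Cross-Sectional” need no rule of their own; an inventory total for any combination is a subset test.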
Table 1. Added Value at Each Phase of Workflow
Pull and control documents: document, location, cycle, and pull information
Store documents: warehouse, center, time, and processing information
Order and ship documents: return, project, edit priority, edit site, workgroup, center, complexity, and deadline information
Process documents: edit, scan, photocopy, critical case, and split-screen information
Release documents: quality review and refiling information

SOI assigned information, based on descriptive statistics from different operational sources, to each return record to expedite processing. Identifying and storing information about a return, its edit status, and its extra-processing requirements in a database made it much easier to fulfill requests for any of this information. For example, the set of all possible outcomes of an operation at a particular phase of the process determined whether a return was released immediately after editing instead of being scanned. Consequently, a supply chain concept replaced the original “shipping” concept. The SOI infrastructure moved not only documents, electronic records, and digitized images, but also information, from unit to unit, center to center, and headquarters to field office.

Standardization
The STARTS application allows SOI to standardize certain processing tasks across the projects and the service centers. It acts as a decoder that helps personnel in National Office, the SOI edit units, and the SOI control units understand each other’s variants of sample processing. The corresponding system makes these different actors work together through the interchange of information. They have to follow certain rules to avoid miscommunication and to guarantee that both the SOI edit units and the SOI control units know in advance, from the information in the database application, what each should provide as updates or requests and what each should expect back as responses. When an edit unit orders 20 editor workgroups in which each workgroup contains ten “Priority 1” corporation returns, it expects the SOI control unit to assemble and send 200 such returns for distribution to five editors. Because the SOI control unit marks a return as “missing” in STARTS if it does not control that return, only returns in its control are available for the SOI edit unit to order in STARTS.

Kaizen
The consolidation efforts changed SOI into an organization that continues to apply time-compressed, action-oriented improvement methods to its various projects. Many of the components and functions of the STARTS application resulted from the energy generated by users’ participation, their creativity, and the pressure to produce tangible results rapidly.

Complexity
The purpose of the process control system shifted from one where the principal activity was moving documents from one center to another to one where the activity was helping the centers meet the program completion deadlines, which National Office analysts set to provide timely tax return data to SOI’s customers. SOI managed complexity, sometimes even reducing it, by assigning returns in the various project samples to a series of categories. Combinations of these categories made it possible for managers to break down the amorphous cycles of returns into pieces that are easier to control and work with. Since some returns must be edited before others, the STARTS application provided the capability to order specific sets of returns and place them in specific sets of editor workgroups. These combinations supplemented the strata the mathematical statisticians created for sampling.

Conclusion
The formation of cross-functional teams at the centers, and between the centers and National Office, and the development of a monitoring system and a corresponding just-in-time electronic database application (i.e., STARTS) brought a very strong focus on the entire SOI workflow process. No function could make a change that affected another function without buy-in from that function. Managers, editors, clerks, statisticians, economists, analysts, and computer specialists looked at samples from beginning to end, not just at a particular phase. The teams monitored the status of returns as they “flowed” through the workflow process.
When the private company begins managing the IRS files warehouses at the centers in late 2006 and sends the first batch of pulled returns to the SOI edit units, SOI National Office and its edit units across the country will know, days before the returns arrive, which returns the SOI sampling programs selected for the various studies. Unfortunately, under the contract, the company will not exchange electronic records with STARTS. In addition, under the IRS performance work statement, SOI will no longer have a presence in the files warehouses. As a result, SOI personnel in National Office and at the edit units at the centers will not know the contents of the shipments until the edit units can open the boxes or scan the carts. If the company transmitted an electronic version of the shipment manifest for loading into the STARTS database application, then the SOI edit units might consider shelving the returns in workgroups for easy distribution to the editors, instead of storing them in a traditional files manner (e.g., by cycle or type of return). In the future, if an SOI edit unit runs low on work, the STARTS database application could recognize this situation in the inventory and order more.
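The ordering rule from the Standardization discussion (only returns the control unit holds, not those marked missing, can be ordered, in fixed-size editor workgroups) and the low-inventory reorder suggested in the final sentence can be sketched together. This is a hypothetical illustration, not STARTS logic: the names, the choice to build only full workgroups, and the low-water-mark threshold are all assumptions; the paper does not say how such a reorder would be triggered.

```python
from dataclasses import dataclass


@dataclass
class SampleReturn:
    doc_id: str
    priority: int            # e.g., 1 for "Priority 1"
    missing: bool = False    # marked "missing" by the control unit
    ordered: bool = False    # already placed in an editor workgroup


def order_workgroups(inventory, n_groups, group_size, priority):
    """Build up to n_groups full workgroups of group_size returns.

    Only returns the control unit holds (not missing) and that have not
    already been ordered are eligible; partial workgroups are not built.
    """
    eligible = [r for r in inventory
                if r.priority == priority and not r.missing and not r.ordered]
    n_full = min(n_groups, len(eligible) // group_size)
    groups = []
    for g in range(n_full):
        group = eligible[g * group_size:(g + 1) * group_size]
        for r in group:
            r.ordered = True
        groups.append([r.doc_id for r in group])
    return groups


def reorder_if_low(on_hand, low_water_mark, inventory, group_size, priority, n_groups=1):
    """HYPOTHETICAL automatic reorder: order more work only when the edit
    unit's count of unedited returns falls below the threshold."""
    if on_hand >= low_water_mark:
        return []
    return order_workgroups(inventory, n_groups, group_size, priority)


# 210 "Priority 1" returns, the first five marked missing by the control unit.
inventory = [SampleReturn(f"C{i:03d}", priority=1, missing=(i < 5)) for i in range(210)]
groups = order_workgroups(inventory, n_groups=20, group_size=10, priority=1)
print(len(groups), sum(len(g) for g in groups))  # 20 200
```

With the figures from the text, 20 workgroups of ten yield 200 returns; spread over five editors, that is four workgroups (40 returns) apiece. The five missing returns never enter a workgroup, which mirrors the rule that only controlled returns are available to order.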
Because this application stores record information for each return in the sample, whether processed as paper, an electronic record, or a digitized image, SOI can easily repurpose the record content, making it accessible from a variety of devices. The database application increased the availability and use of data, helping each center improve its decisionmaking and visualize, synchronize, and automate phases of the workflow process. The strength of STARTS reports and screens is that they display accurate, consistent, and timely data. SOI built a reporting system so that managers know in real time how well they are meeting the needs of SOI customers. The application replaced transactions done by phone, fax, or mail, as well as the practice of each site collecting and storing data manually in its own way. In the late 1980s, SOI developed online data entry and verification applications, which linked IRS processing sites across the country through a network of computer terminals and databases. It applied this information network concept to the control and monitoring of its samples. This connectivity, together with the value-added information embedded in each sample record, allowed SOI personnel to monitor the status of each tax and information return as it moved through the different phases of the SOI workflow process, from the files warehouses to the edit units and back. Incorporating a wide range of information about the sampling criteria, the study objectives and requirements, and the logistical demands of processing enhanced the meaning of the samples to the centers (suppliers) and National Office analysts (producers) and accelerated both the collection of data and the delivery of the final products to SOI customers. Monitoring the number of missing and available returns daily can increase the likelihood that the data are of high quality [27].

Acknowledgments
The views expressed in this paper represent the opinions and conclusions of the author and do not necessarily represent those of the Internal Revenue Service. The author thanks John Czajka for his comments on an earlier draft of this paper. Any errors that remain are the responsibility of the author.

Endnotes
[1] In addition to the Office of Tax Analysis and the Joint Committee on Taxation, another important customer is the Bureau of Economic Analysis.
[2] National Research Council (2005), Principles and Practices for a Federal Statistical Agency, Third Edition, Committee on National Statistics, Margaret E. Martin, Miron L. Straf, and Constance F. Citro, editors, Division of Behavioral and Social Sciences and Education, The National Academies Press, Washington, DC, p. 25. In addition, see 26 USC Sec. 6108, Statistical publications and studies, which describes the SOI mandate.
[3] The SOI workflow process is the interchange of documents, record information, and tasks through the SOI sampling, controlling, and editing processes.
[4] As a stakeholder and customer, SOI hopes to meet with company representatives and the IRS Files Government Project Management Office to discuss pertinent issues about its samples. After announcing the award of the contract, the IRS announced two positions, one a senior manager position, the other a supervisory quality assurance specialist. Although a company will assume responsibility for the work performed in files, it is necessary to manage the relationship between this company and other IRS offices, check the quality of the company’s work, etc.
[5] The company will operate at the IRS facilities in Methuen, MA, Fresno, CA, Norcross, GA, Austin, TX, Ogden, UT, Kansas City, MO, Florence, KY, and Philadelphia, PA. The records centers are part of the National Archives and Records Administration. They store the records of Federal agencies.
[6] In addition, SOI is a major requester of electronic records, which include electronically filed records.
[7] Competitors for documents include four different business operating divisions: Large and Mid-Size Business (LMSB), Small Business/Self-Employed (SB/SE), Wage and Investment (W&I), and Tax-Exempt and Government Entities (TEGE).
[8] The acronym “TQO” refers to Total Quality Organization, a commitment on the part of an organization to advocate quality and continuous improvement in all its tasks.
[9] The general term, “regular pipeline processing,” refers to the actions of IRS workers who handle tax and information returns from the time the documents first arrive at an IRS service center through the posting of information at the IRS Computing Center and finally the shelving of the documents in the files area.
[10] SOI wove supplier and customer data into the process improvements. It captured any available information relevant to the SOI projects at the centers.
[11] Scheuren, F. (1991), Comment on “The Federal Statistical System’s Response to Emerging Data Needs” by Jack E. Triplett, Journal of Economic and Social Measurement, IOS Press, Volume 17, Numbers 3, 4, p. 190.
[12] The 1990 plan for distributing work to the remaining six processing centers had Andover and Brookhaven centers shipping their individual and corporation returns to the center in Ogden. Memphis shipped its individual returns to the Austin center and corporation returns to the center in Cincinnati. Philadelphia shipped both individual and corporation returns to Cincinnati. The Atlanta, Fresno, and Kansas City centers continued to process their samples of individual and corporation returns. Doug Shearer and Dan Trevors coordinated the plans and issued regular status reports to keep management informed of the activities involved in this consolidation.
For the Individual program, the consolidation was effective beginning with the Cycle 9053 End-of-Year Tickler (EOYTICK) processing for the Tax Year (TY) 1989 Study and continued with the TY 1990 Study, which began with the selection of returns in Martinsburg Computing Center (MCC) Cycle 9104 (January 1991). Consolidation of the Corporation program began earlier, with the TY 1989 study commencing only in Atlanta, Austin, Cincinnati, Fresno, Kansas City, and Ogden in August 1990. The nonprocessing centers began shipping their corporation returns to the edit sites later in the year per SOI notification. Beginning in 1992, edit processing of the returns in the Individual and Corporation programs took place in only five centers, after SOI discontinued editing at the Fresno center.
[13] The centers were located in Andover, MA, Brookhaven, NY, Memphis, TN, and Philadelphia, PA. A team of managers from National Office traveled to these centers to discuss issues and concerns of the managers, editors, and clerks.
[14] SOI editors abstracted information from returns, including moving some information to the correct fields on the returns. Tax examiners in non-SOI units at the centers checked, and prepared for data transcription, those fields on the returns that the IRS deemed important in determining tax liability.
[15] Clerks in the SOI control units did not edit returns. Instead, they pulled returns, looked for missing returns, photocopied returns, scanned returns, packaged returns, and shipped returns, to list just some of their duties. One manager commented: “I am a Unit Supervisor in a large unit. I have IMF SOI, AIMS, Cycle, Quality Review … as well as pulling and refiling. SOI is just a part of this unit. We have maintained a record of high accuracy and very few missing documents for a few years. This [is] … due to the integrity, dependability, and dedication of the staff assigned to SOI.
They have accomplished a lot with very few people. So, what STARTS means to me is reflected in what the staff commented on … If they are happy and satisfied and feel that STARTS helps them perform their duties more efficiently and accurately due to the increased speed and easier access, then I am happy. If they feel that STARTS helps them maintain a low missing record, and this record is reflected on the SOI reports for Andover, then I am happy with STARTS. I do not use STARTS myself, but I do review the reports that these employees generate.”
[16] Editors usually waited until the next day to receive feedback because centers scheduled SOI batch programs around regular pipeline batch jobs.
[17] It is difficult for editors to maintain their skill level if they move frequently from one project to another, though the frequent changes may guarantee them work.
[18] The developers considered SOIACS the first step in building a system to manage SOI’s samples in an online environment. SOI planned to build subsystems to manage quality, resources, and sample selection as part of the modernization effort because the service center statisticians were retiring or service center management considered them irrelevant. Dan Trevors of the Quality Support Team and Doug Shearer of the Coordination Team shared responsibility for developing the SOI controlling and shipping process. Linda Taylor of the Distributed Processing System Team provided hardware support. The SOI operating branches, as well as the service center files and edit operations, defined, collected, and presented the user requirements. A manager’s comment: “The STARTS system is a valuable tool used on a daily basis. It helps track the work … as well as when it is edited within the edit teams. When a return is marked missing and we find it attached to another return, we are able to go to the remarks [screen] at that time to document the condition.
The STARTS system is also used to look up prior-year information. If an EIN is the only information you have to track component parts of a separated 1504C return, the STARTS system can provide much information on this. This helps us to locate additional return parts in order to edit a more complete document. STARTS provides many options in ordering the work. It is broken down by return type, three asset class categories, and the sample code only selection of returns. This gives management the necessary range to order specific types of work at all times but is especially helpful when nearing various project completion dates. As transition continues here in Ogden, we are very interested in the future STARTS process and the new and evolving ways in which we will utilize the system. We look forward to the changes and future training that is available to all leads as well as the clerks and managers.”
[19] National Office analysts held a planning session with service center personnel the week of June 18, 1990, at the Austin Service Center to collect ideas, customer needs, and specific requirements for the SOI Automated Control System (SOIACS). Back in National Office, the team reviewed the requirements, analyzed the consequences of implementing a control system, and wrote descriptive and detailed requirements and specifications, which bridged the requirements and the design of the application. Cincinnati Service Center assumed primary responsibility for the Oracle program development of this new application, with Don Flynn as the lead programmer. Tentative plans involved piloting the application in one processing center and one nonprocessing center in the spring of 1991 for the Individual returns project. The SOI programming staffs at the Cincinnati and Ogden Service Centers developed the next generation of the application, which National Office renamed the Statistics of Income Automated Return Tracking System (STARTS).
The Cincinnati staff developed and maintained the Individual Master File (IMF) version of STARTS, while the Ogden staff programmed and supported the Business Master File (BMF) version. In 2000, both programming staffs converted the text-based applications to a graphical user interface (GUI) application.
[20] Connections between the center terminals and the host minicomputer in Cincinnati occurred through PACNET.
[21] In the case of tax returns in SOI samples, this is metainformation about relational database properties; data warehousing; business intelligence; general IT; IT metadata management; file systems; and image, program, project, and study schedules.
[22] SOI assigned information to each return: project, sample, files location, edit site, editor, delivery dates, level of edit complexity, document source (paper, electronic, or image). One result was a sample redesign, which embedded a panel within the annual cross-sectional samples. The STARTS application still distinguishes these two sets of returns. See Czajka, J. and Walker, B. (1990), Combining Panel and Cross-Sectional Selection in an Annual Sample of Tax Returns, 1989 Proceedings of the American Statistical Association, Section on Survey Research Methods.
[23] The use of digital images, instead of paper, as source documents for editing is a new phase in the SOI workflow process. Other SOI processes include data cleaning and completion, weighting and estimation, and publishing tables and user analyses.
[24] Systems acceptability testing (SAT) occurs before the computing centers execute the SOI sampling programs. Sample design and sample selection are topics for further discussion in other papers.
[25] Stair, R.M. (1992), Principles of Information Systems: A Managerial Approach, Boyd and Fraser Publishing Company, Boston.
[26] Maurer, R. (2004), One Small Step Can Change Your Life: The Kaizen Way, Workman Publishing Company, New York.
[27] Improving data quality through editing, imputation, and record linkage is impossible if the administrative records that contain the data are unavailable or incomprehensible.

Customer Satisfaction Initiatives at IRS’s Statistics of Income: Using Surveys To Improve Customer Service
Ruth Schwartz and Beth Kilss, Internal Revenue Service

IRS’s Statistics of Income (SOI) Division conducts statistical studies on the operations of tax laws and publishes annual reports as well as the quarterly SOI Bulletin, which present statistics produced from tax and information returns. SOI’s Statistical Information Services (SIS) office responds to thousands of data and information requests annually by providing SOI data along with technical assistance. To ensure that customer needs are being met through the SIS office and through its flagship publication, SOI has been measuring customer satisfaction with both through customer satisfaction surveys. These surveys are part of SOI’s commitment to using survey results to improve customer service. This paper focuses on three aspects of these surveys: the process by which we surveyed our customers, the findings from the surveys, and the steps we are taking to use the results to further improve our products and services. The first section of the paper presents background information on the SOI Division and its SIS office. The second section describes the methodology used to survey SIS customers, presents selected findings from the past 4 years of surveys, and describes how SOI is using these results to identify areas for improvement. Similarly, the third section describes the methodology, presents a summary of the findings, and briefly discusses some of the steps SOI staff are taking to improve the SOI Bulletin. Finally, the paper discusses next steps to improve SOI products and services in response to survey findings.
Background
Congress created the Statistics of Income Division 90 years ago in the Revenue Act of 1916, some 3 years after the enactment of the modern income tax in 1913. Since that time, the Internal Revenue Code has included virtually the same language mandating the preparation of statistics. Section 6108 of the Code currently states that “…the Secretary (of the Treasury) shall prepare and publish not less than annually statistics reasonably available with respect to the operations of the internal revenue laws, including classifications of taxpayers and of income, the amounts claimed or allowed as deductions, exemptions, and credits and other facts deemed pertinent and valuable.”

SOI’s mission is to collect, analyze, and disseminate information on Federal taxation for the Office of Tax Analysis, Congressional committees, the Internal Revenue Service in its administration of the tax laws, other organizations engaged in economic and financial analysis, and the general public. Its mission is similar to that of other Federal statistical agencies, that is, to collect and process data so that they become useful and meaningful information. However, SOI collects its data from tax returns, whereas most other statistical agencies collect data through surveys. These data are processed and provided to customers in the form of tabulations or microdata files. Although the IRS uses SOI data, the primary uses for SOI data are outside of IRS, in policy analyses designed to study the effects of new or proposed tax laws and in evaluating the functioning of the U.S. economy.

SOI Products and Services
Throughout its long history, SOI’s main emphasis has been individual and corporation income tax information. SOI began publishing data with the 1916 Statistics of Income, which reported individual and corporation statistics. Beginning in 1936, for Tax Year 1934, individual and corporation income tax data have each been reported separately in annual “complete” reports (Individual Income Tax Returns and Corporation Income Tax Returns, respectively). The annual Corporation Source Book provides detailed balance sheet, income statement, and tax information for major and minor industry sectors by asset size. Over the years, SOI has expanded its studies and publications to meet the needs of its customers.

Introduced in 1981, SOI’s flagship publication, the quarterly Statistics of Income Bulletin, presents the most recent data and related articles on completed studies, along with a historical section featuring time series data on a variety of tax-related subjects. SOI also periodically publishes compendiums of research on nonprofit organizations, estate taxation, and personal wealth. Research articles presented at professional conferences, namely those of the American Statistical Association and the National Tax Association, are published annually or biannually in the methodology report series, Special Studies in Federal Tax Statistics. Beginning with the 1998 issue, SOI took over publishing the IRS Data Book, a fiscal year report that presents statistical data on the administration of the U.S. tax system.

SOI produces the following microdata files: Individual Public-Use Files; Exempt Organizations Records; and Private Foundations (and Charitable Trusts) records, all of which are available for a fee. Before releasing the Individual Public-Use microdata, SOI follows security guidelines and edits the files to protect the confidentiality of individual taxpayers and prevent disclosure of taxpayer information. Tax returns for both exempt organizations and private foundations are publicly available. Because of their size, these products are available on CD-ROM or magnetic tape directly from SOI. Exempt organization microdata files have recently been released to the public via the World Wide Web (www.irs.gov/taxstats).
Public awareness of SOI products and easy access to them have gradually increased over the years. The establishment of the Statistical Information Services office, which responds to data and information requests, has helped raise the visibility of SOI products. With the introduction of the IRS World Wide Web site 10 years ago, SOI’s products became more widely used. They may be found at www.irs.gov/taxstats. TaxStats includes statistics for individuals, businesses, charitable and exempt organizations, IRS operations, budget, compliance, and a variety of other topics. Currently, over 6,000 files reside on TaxStats, and this number continues to increase.

Statistical Information Services
The Statistical Information Services (SIS) office was established in 1989 as part of efforts to streamline the SOI organization. From the beginning, the SIS mission was straightforward: provide accurate and timely data along with excellent customer support and technical guidance. Although the number of customers and the variety of requests have changed since then, the SIS staff still strives to fulfill this mission after 17 years.

When the SIS office was set up, its primary tools were a telephone, paper reports and publications, index cards with contact information, and a fax machine. Word spread quickly, and the SIS office was soon inundated with requests, many of which could be answered from published data. When customer requests involved data unavailable from SOI, the SIS staff made every effort to fulfill them by providing information or contacts from other sources. In the early years of SIS operations, 4,000 to 5,000 information requests were received annually.

Over the years, the tools have been greatly improved, and more data are readily available directly to the public. An electronic management system, the Response Processing System (RPS), tracks customer information and details of data requests. While the number of information requests has leveled off with the availability of data on TaxStats, the complexity of information requests has increased significantly. Many of these requests require extensive research, some of it supported by SOI subject-matter analysts.

Over 2,400 information requests were received in Calendar Year 2005 from a broad range of customers. The customers are as varied as the information they request, from a private citizen requesting data on car dealerships to a Congressional request for alternative minimum tax data. Consultants and researchers were the largest group, with 23.5 percent of the requests. Academia and the Internal Revenue Service were the second and third largest groups, with 13.5 percent and 12.9 percent, respectively. In Calendar Year 2005, most information requests (50.4 percent) were received by phone, followed closely by e-mail (48.2 percent). The SIS office also receives information requests via fax, letters, and walk-in customers.

SIS Customer Satisfaction Survey
How is the SIS office meeting its goal of providing accurate and timely data along with excellent customer support and technical guidance? Although the SIS office has received positive feedback from many of its customers over the years, is this the complete picture? What about the many SIS customers, especially one-time customers, who do not provide any feedback? In 2003, at the suggestion of SOI Director Tom Petska, the SIS office administered its first survey to measure customer satisfaction. Before the SIS survey, SOI had surveyed its primary customers (Treasury’s Office of Tax Analysis, the Congressional Joint Committee on Taxation, and the Department of Commerce’s Bureau of Economic Analysis). The SIS survey was an expansion of SOI’s efforts to measure customer satisfaction and to use customer input to improve service.

Administering the Survey
SOI mathematical statistician Kevin Cecco and, later, Diane Milleville, in close consultation with the SIS staff, designed the SIS surveys. Following the Office of Management and Budget’s approval, the first SIS survey was administered in 2003. After assisting a customer with an inquiry, an SIS staff member provided a survey by e-mail or fax and asked the customer to complete it with respect to the customer’s most recent inquiry. For the first survey in 2003, the survey recipients were selected randomly from the daily roster of calls and e-mails. The SIS office planned to survey one of every four customers from January through July 2003. However, the target number of customers surveyed had not been reached by July, and the survey was extended an additional month.

Over the years, changes were made to improve the survey administration process. Diane Milleville and Information Technology Specialist Elizabeth Nelson, who provides RPS technical support, both helped improve the process. Surveys were embedded in an e-mail, eliminating the additional step of downloading the survey file. Every customer was sent a survey, eliminating difficulties with the random selection process. Customers surveyed were tracked in RPS, which eliminated the need for SIS staff to track them manually.

Beginning with the 2004 survey, response options were revised to bring the SIS survey in line with a set of measures used by SOI’s parent organization, Research, Analysis, and Statistics (RAS). Known as “balanced measures,” these criteria were designed to measure how well RAS meets its goals. To maintain consistent measures throughout all divisions of RAS, including SOI, some SIS survey questions and response options were changed to include these measures.

Findings
Table 1 highlights response rates for the 4 years the SIS survey was administered. Initially, the SIS office’s goal was to achieve a response rate of 50 percent. SIS planned to survey approximately 400 customers, expecting to receive 200 responses. Although SIS distributed 28 percent fewer surveys than the planned 400 (288 were distributed), it was quite satisfied with the 49-percent response rate. After the first survey in 2003, however, the response rate dropped to 43 percent in 2004 and 42 percent in 2005, before rising to 44 percent in 2006. The number of Government surveys sent to customers has increased over the years, and this may also have contributed to the decline in the response rate. Although SIS would like to have a higher response rate, it is pleased with its results to date. Nevertheless, it will continue efforts to improve its survey instrument and its methods for administering it.

Table 1. Response Rates for SIS Survey, 2003-2006

Year    Surveys distributed    Number of respondents    Response rate
2003    288                    142                      49%
2004    425                    181                      43%
2005    300                    125                      42%
2006    271                    119                      44%

Table 2 presents the respondents by job function for each of the 4 years the survey was administered. For 2003, 2004, and 2006, the top 4 categories (consultant/research, State/local government, academic, and IRS employee, excluding those classified as “other”) accounted for over 56 percent of survey respondents. For 2005, 3 of the top 4 categories were the same; Federal Government replaced State/local government as the fourth category. Collectively, these 4 categories accounted for 53.2 percent of 2005 survey respondents. In an effort to improve the SIS customer job function categories, some changes were made during the 4 years the survey was administered. In 2004, the nonprofit category was added. In 2006, the corporation category was eliminated, and the library, marketing, and realtor categories were added. There were some differences between the job functions reported by SIS survey respondents and those of the general population of SIS customers.
SIS compared the job functions reported by survey respondents with those recorded by SIS staff in the Response Processing System (RPS) during the periods in which the SIS surveys were administered. Overall, the differences were small for most job categories. An exception was the private citizen category, whose share was higher in RPS than in survey responses by between 2 percentage points (2006) and 12 percentage points (2003). These differences may reflect respondents’ self-classification versus classification by an SIS staff member.

Table 2. Percentage Distribution of SIS Survey Respondents by Job Function, 2003-2006

Job function                          2003     2004     2005     2006
Total                                100.0    100.0    100.0    100.0
Consultant/Research                   19.4     17.8     15.3     17.1
State/Local government                14.4     13.9      8.1     10.1
Academic                              13.7     13.3     13.7     15.1
IRS employee                          10.1     12.8     12.9     14.2
Media                                  7.9      5.6      5.7      5.0
Corporation                            7.2      8.3     10.5     n.a.
Federal Government                     5.8      7.2     11.3      7.5
Private citizen                        4.3      6.7      2.4      3.4
Tax Preparation/Accounting firm        2.9      1.7      3.2      5.0
Association/Society                    2.2      0.6      1.6      0.0
Congress                               0.7      1.1      0.8      2.6
Law firm                                 -      2.8      0.8      1.7
Nonprofit                             n.a.      4.4      4.8      5.9
Library                               n.a.     n.a.     n.a.      5.9
Marketing                             n.a.     n.a.     n.a.      1.7
Realtor                               n.a.     n.a.     n.a.      1.7
Other                                 11.5      3.9      8.9      2.5

n.a. = not available.

The first survey, in 2003, had 17 questions. Over the 4 years, some questions were removed and others added; overall, the number of questions decreased to 12 for the 2006 survey (see Appendix). Survey questions focusing on 3 issues are discussed below.

Table 3 presents customers’ expectations of timeliness for receiving a response to an information request in the 2003 survey and the actual timeliness reported in the 2004-2006 surveys. Note that the 2003 question differs from the question included in the 2004-2006 surveys: the 2003 question asks when the customer expected to receive a response, while the 2004-2006 question asks when a response was received. The response options for all 4 years are the same. By changing the wording of the question, SIS was able to obtain more useful information from its customers. The expected response time (in the 2003 survey) was considerably longer than the actual response time (in the 2004 survey). Some 36 percent expected a response on the same business day in the 2003 survey. However, over 70 percent actually received their responses on the same business day (in the 2004 survey). For 2004 through 2006, a response was received in 3 business days or less 93 percent to 98 percent of the time.

SIS also compared the response times reported by survey respondents with the response times recorded in RPS by SIS staff during the periods in which the SIS surveys were administered in 2004-2006. Response times of 1 day or less (the “same day” option) reported by survey respondents ranged from 62.3 percent in 2006 to 74.2 percent in 2005. In contrast, response times of 1 day or less recorded in RPS were 93.8 percent or higher for 2004-2006. As RPS indicates, the SIS staff generally responds to customers within 1 business day. However, completing a request that requires additional research may take 2-3 days, as reflected by the 23.4 percent to 30.7 percent of survey respondents who reported a response time of 2-3 business days. The gap in response time between survey responses and RPS may reflect the difference between making an initial contact and delivering the completed information to the customer.

Table 3. Response Timeliness for SIS, 2003-2006 (percentage of respondents)

2003 survey question: When did you expect to receive a response?
Response options              2003
Same day                      36.0
2-3 business days             52.5
4-5 business days              8.6
6 or more business days        2.9

2004-2006 survey question: When did you receive a response?
Response options              2004     2005     2006
Same day                      70.6     74.2     62.3
2-3 business days             26.1     23.4     30.7
4-5 business days              1.7      2.4      3.5
6 or more business days        1.7       --      3.5

Table 4 presents the results on meeting customer needs. In 2004, the question and the response options were changed to reflect the RAS balanced measures. The 2003 question asked whether SOI’s product(s)/data satisfied customer needs; the 2004-2006 question asks whether the product(s) or service(s) provided met customer needs. The major difference between the 2003 question and the 2004-2006 question lies in the response options. In the 2003 survey, there is no “middle ground” option between the “disagree” options and the “agree” options. Instead, a “not applicable” option is listed at the end, after “strongly agree.” Beginning with the 2004 survey, a “not sure/neither” option is available between the “disagree” options and the “agree” options. During the 4 years of the surveys, the percentage of respondents who agreed or strongly agreed that their needs were met ranged from 76.5 percent in 2004 to 82.5 percent in 2005.

Table 4. SIS Met Customer Needs, 2003-2006 (percentage of respondents)

2003 survey question: SOI's product(s)/data satisfied your needs.
Response options              2003
Strongly disagree              5.1
Disagree                       8.0
Agree                         30.4
Strongly agree                51.4
Not applicable                 5.1

2004-2006 survey question: The product(s) or service(s) provided met your needs.
Response options              2004     2005     2006
Strongly disagree              6.9      5.0      5.3
Disagree                       6.9      5.0      3.5
Not sure/neither               9.7      7.5     12.3
Agree                         33.1     29.2     33.3
Strongly agree                43.4     53.3     45.6

Table 5 presents customers’ overall satisfaction with the most recent response they received from SIS. The question was the same for all 4 years, but, beginning with the 2004 survey, the response options were changed to reflect the RAS balanced measures. Therefore, the 2003 responses are not comparable with the 2004-2006 responses. However, for all 4 years, the satisfaction rate remained high. Respondents who were satisfied or totally satisfied ranged from 85.9 percent in 2004 to 91.6 percent in 2005.

Table 5. Overall Satisfaction With SIS, 2003-2006 (percentage of respondents)

Survey question (all years): Rate your overall satisfaction with your most recent data request.

Response options, 2003        2003
Very low                       0.7
Low                            1.4
Average                       10.1
High                          34.8
Very high                     52.9

Response options, 2004-2006   2004     2005     2006
Totally dissatisfied           0.6      3.4      1.7
Dissatisfied                   3.5      0.8      2.6
Neither                        9.9      4.2      7.8
Satisfied                     41.5     41.2     33.9
Totally satisfied             44.4     50.4     53.9

The surveys each year also included open-ended questions asking for further explanations, recommendations, and suggestions for improving service to SIS customers. The information gleaned from responses to these open-ended questions has been exceptionally useful. Several respondents suggested adding the missing years to the SOI historical tables published in the Statistics of Income Bulletin and also released on TaxStats. In these historical tables, the most recent 5 years were shown, and, for earlier years, only every fifth year was shown. Data classified by locality are SIS’s most frequently requested products. SOI, in conjunction with the Census Bureau, produces county-to-county and State-to-State migration data, along with county income data. SOI also produces Zip Code data. Not surprisingly, respondents requested more locality data. Some
respondents, for example, requested earned income tax credit and alternative minimum tax data by county or Zip Code, and migration data classified by occupation. Respondents also requested that locality data and the Corporation Source Book be made available on TaxStats. These products have been available on a reimbursable basis from SOI.

Changes Planned or Implemented
Based on the input received from SIS customers, the SIS office has made some changes over the past 3 years. The SIS office conducted a benchmarking trip to its counterpart at the U.S. Department of Transportation and is looking into other fact-finding trips. After the first survey was conducted, the SIS office worked with an Information Technology Specialist to track customer requests and customer information more effectively. SOI has also improved its products and services by eliminating breaks in the time series data for many of its tables. In selected SOI Bulletin historical tables, data for sequential years are published as space allows. On TaxStats, where no space limitation exists, SOI is looking into adding more years of historical data by inserting data for missing years. SOI has also begun adding more data to TaxStats. This year, SOI added the 2000-2003 issues of the Corporation Source Book.

SOI Bulletin Survey
The SOI Division’s long history of publishing stems from its original mandate in 1916. Over the years, the number of publications and the amount of time and effort needed to publish them have grown, but considerably less time has been spent evaluating the content, frequency, and dissemination of the publications. Three years ago, these tasks became the charge of a new workgroup that involved senior SOI staff and 3 members of SOI's Advisory Panel [1]. Initially, this group undertook to review the content and frequency of all SOI publications, examine how SOI could make them more useful, look at methods of advertising and dissemination, and identify what SOI is not publishing but perhaps should.

Ultimately, the workgroup’s efforts turned to improving the quality of SOI’s most visible publication, the quarterly SOI Bulletin, and the efficiency of the Bulletin production process. Two methods were used: focus groups and a customer satisfaction survey. The focus groups were conducted to learn how authors and reviewers perceive the writing and review process and to solicit ideas for changing it. The customer satisfaction survey was administered to better understand how SOI customers use the Bulletin, how satisfied they are with its contents, how useful its various features are to them, and how it should be improved. The remainder of this section of the paper is devoted to the Bulletin itself: it describes the survey process, summarizes the key findings, and, finally, explains how SOI is using the survey results to improve the publication.

About the SOI Bulletin
Twenty-five years ago, in the summer of 1981, the first issue of the Statistics of Income Bulletin was published. It was initially created as the vehicle for disseminating more limited data on topics formerly covered by separate reports, as well as for providing the results of the growing number of special projects. The first SOI Bulletin was 46 pages long and included just 3 articles, on individual income tax returns, sole proprietorship returns, and partnership returns. Recently, SOI Division published the 100th Bulletin (Spring 2006, Volume 25, Number 4), which included 6 articles; 23 selected historical and other data tables; sections on sampling methodology, projects and contacts, and products and services; and an index of selected previously published articles. SOI is currently working on the first issue of its 26th year (Summer 2006, Volume 26, Number 1). The average size of the report for 2005 was 310 pages.

Today’s Bulletin is issued quarterly, in March, June, September, and December, and provides the earliest published annual financial statistics obtained from the various types of tax and information returns filed, as well as information from periodic or special analytical studies of particular interest to students of the U.S. tax system, tax policymakers, and tax administrators. It also includes personal income and tax data by State and historical data for selected types of taxpayers, in addition to data on tax collections and refunds and on other tax-related items.

Much work goes into producing each issue of the Bulletin, but it was not clear whether the Bulletin was meeting customers’ needs. Thus, a survey was designed to collect critical information on how customers felt about the Bulletin.

Administering the Survey
Once again, SOI Division mathematical statisticians Kevin Cecco and Diane Milleville were called upon to assist in developing the survey. The result was a relatively brief, visually engaging, 15-question customer survey, which was subsequently cleared for use by the Office of Management and Budget. Following OMB’s approval, the survey was administered to SOI Bulletin customers in several ways. It was sent directly via e-mail to SOI’s main customers at the Department of the Treasury’s Office of Tax Analysis, the Congress’s Joint Committee on Taxation, and the Commerce Department’s Bureau of Economic Analysis, as well as to all members of SOI’s Advisory Panel. The survey was also included in the Summer 2004 and Fall 2004 issues of the SOI Bulletin for customers to remove, fill out, and either e-mail or fax back to SOI.
As a further outreach to potential SOI Bulletin customers, an SOI Advisory Panel member facilitated the dissemination of the survey via the Federation of Tax Administrators (FTA) list serve in January 2005. After a reasonable amount of time had passed following publication of the Fall 2004 Bulletin, and FTA members had been given time to reply, the responses were compiled and analyzed. In all, 52 surveys were returned. The majority of respondents were from groups SOI targeted. Only 9 respondents filled out the survey from the Bulletin itself. To put these numbers in perspective, approximately 2,000 copies each of the Summer and Fall issues were printed that year. Of these, about 400 copies were sent to internal IRS and Treasury Department offices, about 1,250 copies were provided to the Government Printing Office (GPO) for subscribers and the Federal Depository Libraries, and about 350 copies were kept by the SOI Division for internal purposes. Because just 52 responses were received, a major concern was that the responses might not be representative of all users, so this information should probably not be the basis for any final decision concerning the Bulletin. Also, it was not possible to conduct a nonresponse analysis, because the majority of Bulletin copies are distributed by the GPO, and SOI does not know who those customers are. In addition, SOI decided not to include the survey in subsequent issues of the Bulletin, for several reasons: 1) the responses were likely to be low again; 2) the OMB approval process was required for each issue of the Bulletin, and, with a low response rate, it would be more difficult to justify including the survey in the report; and 3) the OMB approval process had just become much longer, taking about 5 weeks instead of 2 weeks.
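The comparison between printed copies and returned surveys can be made concrete with a little arithmetic. The sketch below is illustrative only. It uses the approximate print and response counts reported in this section, its variable names are hypothetical, and it assumes that both issues had the same print run. The denominator counts every printed copy, including the copies kept for internal SOI use and those going to subscribers who received both issues, so the yield it computes understates the response rate among actual readers.

```python
# Illustrative arithmetic only: all figures are the approximate values
# reported in the text; variable names are hypothetical.

# Approximate distribution of each printed issue (Summer and Fall 2004).
distribution_per_issue = {
    "internal IRS and Treasury offices": 400,
    "GPO (subscribers and Federal Depository Libraries)": 1_250,
    "SOI Division internal purposes": 350,
}

# Assumption: both issues had the same print run.
copies_per_issue = sum(distribution_per_issue.values())
issues_with_survey = 2
printed_copies = copies_per_issue * issues_with_survey

returned_total = 52          # surveys returned through all channels
returned_from_bulletin = 9   # surveys filled out from the Bulletin itself

# Share of all returned surveys that came from the printed insert.
bulletin_share_of_returns = returned_from_bulletin / returned_total

# Insert returns per printed copy. The denominator counts every printed
# copy, so this understates the response rate among actual readers.
returns_per_printed_copy = returned_from_bulletin / printed_copies

print(f"Copies per issue: {copies_per_issue:,}")
print(f"Printed copies carrying the survey: {printed_copies:,}")
print(f"Share of returns from the Bulletin insert: {bulletin_share_of_returns:.1%}")
print(f"Insert returns per printed copy: {returns_per_printed_copy:.3%}")
```

On these figures, fewer than 1 in 400 printed copies produced a returned survey. That result fits SOI's expectation that responses from the insert would likely be low again.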
Nevertheless, SOI did have the results from 52 surveys to evaluate. The mathematical statisticians advising SOI on this effort recommended that SOI work with the results it had and, should SOI decide to solicit additional customer feedback, use another vehicle, such as another focus group, to focus on a particular part of the Bulletin. The findings are presented below.

Findings
Type of respondents. Over one-third of the respondents (36 percent) were affiliated with State and local governments. Another 18 percent indicated a Federal Government affiliation, while 17 percent had a Congressional affiliation. Nearly one-third of all responses came from members of the FTA list serve.

Use of other SOI products. The three most heavily used SOI products other than the SOI Bulletin were the Corporation Source Book, the IRS Data Book, and the Individual complete report, each used by 40 percent to 50 percent of all respondents. A little over one-third of respondents also indicated they used the Corporation complete report. About one-fourth of all respondents use Special Studies in Federal Tax Statistics, public-use microdata files, and special tabulations. Twenty percent or fewer said they use other SOI products.

How respondents receive the Bulletin. Half of all respondents receive the Bulletin through a subscription. Another 20 percent receive it directly from the SOI Division.

Frequency of use. Of the 49 who responded to the question on how frequently they use the Bulletin, 37 (about 76 percent) use it 4 times a year. Only 8 percent use it once a year.

Overall satisfaction. Of the 49 who responded, 86 percent were satisfied or totally satisfied with the SOI Bulletin; only 2 respondents were dissatisfied, while 5 were neither satisfied nor dissatisfied.

Use of specific features. Asked to check all that apply among the 8 features listed (from the Bulletin Board column at the front of the report through the index on the inside back cover), respondents used the Selected Historical and Other Data section by far the most frequently: 90 percent of survey respondents, compared with 67 percent who said they use the featured articles and 38 percent who use the data releases. About 25 percent of respondents use each of the remaining features, except for the Bulletin Board, which fewer than 8 percent said they use.

Suggestions for change. When asked to check boxes regarding possible changes to the Bulletin, nearly half of all respondents indicated they would like to see more articles on topics of current interest. Nearly 37 percent also indicated an interest in shorter articles focused on key findings. About one-fourth of respondents said they would like more details on methodologies and samples. Under the response “Other,” 8 survey respondents offered varied suggestions, such as adding links to data and explanatory material on the Web, including more longitudinal data, and reporting medians as well as averages and measures of variability.

How to publish sections: print, Web, or both. This question dealt with the component parts of an article or data release and asked respondents whether they preferred each part to be provided in print only, posted to the Web only, or available in both places. About two-thirds of respondents preferred that the tables be provided in both media; for most parts of an article, about half or more of respondents preferred publication both in print and on the Web.

Use of Selected Historical and Other Data section. When asked if they used the Selected Historical and Other Data section, some 90 percent said yes. Of those who said yes, over 93 percent said the tables are useful, and over 84 percent said the footnotes were useful.
Of the 2 respondents who answered no to this question, 1 provided additional comments, indicating that publishing the historical tables in every issue was not necessary.

Where to publish historical tables. Nearly 70 percent of those who use the historical tables felt that they should be published both in print and on the Web. Of the 19 respondents who answered the question about how often to publish the historical tables, 11 (about 58 percent) felt that the historical section should appear in all SOI Bulletin issues.

Verbatims
The survey also included the following open-ended questions, in order to gain additional information about how the information in the Bulletin is being used and to seek recommendations and suggestions for improvements. The responses SOI received to these open-ended questions are summarized below.

• What is your primary use of the SOI Bulletin?
About 60 percent of respondents chose to reply. The verbatim responses covered a number of uses. A few respondents stated that they use the Bulletin for “quick look-up of tabulations” or to look up the most recent data on a topic. One respondent, self-identified as a “scholar and educator with deep interest in the Federal tax system,” reads the Bulletin for “keeping up” responsibilities. Another uses the Bulletin as a resource for responding to media inquiries. The most recurring themes centered on the Bulletin as a source of data for research and of historical series data. About a third of the answers indicated that the statistics were used for research, revenue estimation, or tax modeling. Another 20 percent were mainly interested in the historical data series included in each issue.

• If you use the Selected Historical and Other Data section of the SOI Bulletin, which tables do you use, do you find them useful, do you find the accompanying footnotes useful, and how would you improve this section?
About half of the 90 percent of survey respondents who indicated that they use the historical data also told which of the section's 23 tables they use. The majority of those use 7 or more tables in the section, and some specifically stated that they use the annual State data, a 53-page table titled “Table 2—Individual Income and Tax Data by State and Size of Adjusted Gross Income.” About 20 percent of those who use the historical data also answered the question about whether they find the tables useful. Several stated they found them useful as a quick reference, while others stated they were difficult to find on the Web. Only 1 person responded to the question about the footnotes, finding them marginally useful because of the limited number of years available. Suggested improvements ranged from publishing the series only once a year, to adding more detail to the State table, to including many more years of data, to providing more detailed data by State.
• If you could change one thing about the SOI Bulletin, what would it be?
Nearly one-third of respondents chose to weigh in on this question, and the responses offered a few themes for SOI to consider: a more detailed index for locating earlier, related articles; more topical, interesting articles, as some are rather dull; links to related technical documentation on the Web; and Bulletin tables that are electronically usable on the Web.
• Please provide any additional comments and/or suggestions you may have concerning the SOI Bulletin.
Ten responses were received to this question, about 20 percent of those who responded to the survey. No two comments were the same, but one area for improvement raised in several responses was article length. There appears to be more interest in the figures, graphs, and tables. Some asked SOI to consider producing a leaner Bulletin with more interesting writing.
Next Steps
Although the number of responses to the SOI Bulletin Survey was smaller than had been hoped for, SOI feels that the results are a strong indication that it is doing a good job of producing the SOI Bulletin. The Bulletin is a useful resource for looking up data on a specific tax-related topic. The historical data are very useful and an important reason why people use the Bulletin. However, it is also clear that there is room for improvement in a number of areas, such as the writing (e.g., preparing shorter articles focused on key findings) and the selection of topics (e.g., preparing more articles on topics of current interest). Many customers are also interested in more details on methodologies and samples. Another message that came through is an interest in more consecutive years of historical data. These results, along with the results from focus groups with Bulletin authors and technical reviewers, are being used to focus SOI efforts on specific areas of improvement.
Recently, SOI has been working with some of the members of SOI’s Web Modernization Team to improve the process of producing and posting tables to the TaxStats Web site, which should also improve the process of producing Bulletin articles. One outcome of streamlining this part of the Bulletin production process is that data are being made available earlier on TaxStats. The TaxStats Web Team is also working with a contractor on a dynamic tables prototype that will allow users to make their own tables from previously tabulated SOI data. Currently, the prototype allows users to make tables from 2 years of Corporation Source Book data. It will run for 4 months, after which SOI will evaluate feedback, costs, and other factors to determine how it fits into SOI’s data dissemination strategy.
SOI also plans to address Bulletin content issues. Working more closely with managers, authors might refresh their articles by shortening them, by becoming more familiar with the relevant tax and economic literature, by soliciting ideas from senior staff in Treasury’s Office of Tax Analysis and from other customers, and by coauthoring articles with senior staff or outside experts. SOI will help authors access the tax and economic literature by establishing an electronic index of the SOI library and by arranging a briefing on electronic research from a sister organization in IRS. SOI will also assemble a collection of examples of good Bulletin articles and other descriptive papers to aid newer authors. SOI will continue to work on improvements to the Bulletin, as evidenced by current efforts to reach consensus among senior managers on a plan to improve the Bulletin production process, followed by incremental improvements in the content and quality of the articles and tables. In so doing, SOI is committed to responding to the recommendations and suggestions of customers.
Summary and Conclusion
As discussed, the Statistics of Income Division is using surveys to improve its methods of conducting business, with the emphasis on providing top-quality service to its customers. The SIS Survey questions dealt with communication, characteristics of staff, opinions of products, and overall satisfaction. The SOI Bulletin survey questions dealt with characteristics of the customers and their use of the publication, content issues, suggestions for improvement, and overall satisfaction. Administering surveys and examining the findings over the past several years have shown SOI how well it is doing in improving products and services and have helped guide efforts to make improvements in these areas. For both the SOI Bulletin and SIS surveys, the specific suggestions in the verbatims related to current SOI products have been particularly useful. The Statistical Information Services office has definitely benefited from the surveys over the past 3 years.
The SIS survey has helped maintain focus on the SIS goal of outstanding customer service. To continue to improve its service, the SIS made a benchmarking trip and is looking into other fact-finding trips. The SIS office also enhanced its electronic tracking system (RPS) to track requests, as well as information about its customers, more effectively.
Overall, the responses received from the SOI Bulletin Survey have been useful in helping direct current efforts to improve the Bulletin. For example, it is clear that SOI customers want the historical and other data tables to remain available both in the printed publication and on SOI’s TaxStats Web site. SOI staff are currently working on guidelines for making tables more usable for customers who intend to download and work with the data SOI provides. In addition, SOI is working on improving the publication process itself, as well as the desktop publishing tools used in the layout process. It also intends to work with subject-matter experts and mathematical statisticians on content issues, e.g., including more articles on topics of current interest and more information about the statistical significance of reported trends, especially when the reported changes are small in magnitude.
Measuring customer satisfaction will continue to be a major priority for SOI. A commitment to collecting and evaluating customer satisfaction data will ensure that SOI does not lose its focus on the critical issues that affect its customers, and this emphasis will reinforce the SOI culture of providing outstanding service. As the data presented in this paper show, SOI has done a good job of exceeding the expectations of its customers. However, SOI should not rest on its successes, but should work even harder to ensure that it meets or exceeds customer expectations.
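The Summary mentions giving readers more information about the statistical significance of reported trends, especially small changes. As a minimal sketch, and not a method described by SOI, the Python below applies a two-sided z-test to the change between two published estimates. It assumes each estimate is published with a standard error and that the two estimates come from independent samples; when samples overlap from year to year, as in a panel, the covariance between the estimates would also have to enter the variance of the difference. The names `Estimate` and `change_is_significant` and the 0.05 default significance level are illustrative choices.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass(frozen=True)
class Estimate:
    """A published estimate (hypothetical example) and its standard error."""
    value: float
    std_error: float


def change_is_significant(earlier: Estimate, later: Estimate, alpha: float = 0.05):
    """Two-sided z-test for the change between two estimates.

    Assumes independent samples, so the variance of the difference is the
    sum of the two variances (no covariance term).
    Returns (difference, z statistic, p-value, significant at level alpha).
    """
    difference = later.value - earlier.value
    se_difference = sqrt(earlier.std_error ** 2 + later.std_error ** 2)
    if se_difference == 0:
        raise ValueError("at least one estimate must have a positive standard error")
    z = difference / se_difference
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return difference, z, p_value, p_value < alpha


if __name__ == "__main__":
    diff, z, p, significant = change_is_significant(
        Estimate(1000.0, 30.0), Estimate(1050.0, 40.0)
    )
    print(f"change={diff:.0f} z={z:.2f} p={p:.3f} significant={significant}")
    # prints: change=50 z=1.00 p=0.317 significant=False
```

With these hypothetical values, a 5-percent increase (from 1,000 to 1,050) has a standard error of the difference of 50 and a p-value of about 0.32. A reader would therefore be cautioned that the change is within sampling variability.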
Endnotes
[1] “Recent Efforts To Maximize Benefits From the Statistics of Income Advisory Panel,” by Tom Petska and Beth Kilss, Special Studies in Federal Tax Statistics: 2003, Internal Revenue Service, pp. 87-93, 2004.
Appendix—SIS Survey Questions, 2003-2006
Survey question
• Which of the following best describes your function?
• How did you initially learn about the SOI SIS office?
• How did you initially learn about the SIS office?
• How often do you contact our office?
• How often do you contact the SIS office?
• How did you contact us?
• Was the first contact with SIS with a (1) person; (2) voice message
• Was the voice message (1) informative; (2) user-friendly; (3) okay as is; (4) needs improvement by _______ .
• Did we satisfy your data request? (If only partially or not at all, please explain why in the space provided below.)
• Did the SIS satisfy your data request?
• Did the SIS satisfy your data request? (If only partially or not at all, please explain why in the space provided below.)
• When did you expect to receive a response from us?
• When did you receive a response?
• When did you receive a response regarding your most recent data request?
• How did we respond to your data request?
• Our staff was focused on determining and satisfying your needs.
• The SIS staff was focused on determining and satisfying your needs.
• SOI's product(s)/data satisfied your needs.
• The product(s) or service(s) provided met your needs.
• SOI's product(s)/data was received timely.
• How often do you retrieve data from the SOI Tax Stats Web site?
• The SOI Tax Stats Web site is user-friendly.
• The SOI Tax Stats Web site is user-friendly. Why or why not?
• The Tax Stats Web site would be more useful if SOI considered the following (1) adding more data; (2) deleting data; (3) adding links to other data; (4) having a sophisticated search engine; (5) allowing "create your own" tables; (6) adding more viewable tables; (7) other.
• The information from the SOI Tax Stats Web site met your needs.
• If you could change one thing about the SOI Tax Stats Web site, what would it be?
• How would you prefer to receive products/files from SOI?
• If given the opportunity, would you be interested in receiving notice of future data/product releases from SOI?
• What types of new products/data releases would you be most interested in receiving?
• Please rate your overall satisfaction with your most recent data request.
• If you could change one thing about your experience with the SIS office, what would it be?
• Please list any other Web sites that you use to gather statistical information.
• Please provide comments and/or suggestions on ways we may better serve your data needs.
Year question included in SIS survey 2003 2004 2005 2006 X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
Performance Measurement within the Statistics of Income Division
Kevin Cecco, Internal Revenue Service
Developing performance measures continues to play an important role for many of the Federal statistical agencies. Federal statistical agencies produce critical data to inform public and private decisionmakers about a range of topics of interest, including the economy, the population, and other pertinent subjects. The ability of statistical agencies to make appropriate decisions about the statistical data they produce depends critically on the availability of relevant, innovative, and timely performance measures. The Federal statistical community remains alert to opportunities to strengthen these measures when necessary. For Federal statistical programs to benefit their data users effectively, the underlying data systems must be viewed as credible. To ensure this credibility, Federal statistical agencies have worked very hard to develop high-quality standards and to maintain integrity and efficiency in the production of data.
As the collectors and providers of these basic statistics, the responsible agencies act as data stewards, balancing public and private decisionmakers’ needs for information with legal and ethical obligations to minimize reporting burden, respect respondents’ privacy, and protect the confidentiality of the data provided to the Government. To reach this goal, Federal statistical agencies have focused on developing and measuring performance in the critical areas of quality, program performance, relevance, and timeliness. In addition, customer satisfaction is often used to measure the usefulness of the products and services provided by Federal statistical agencies. Performance measures form the basis for evaluating such areas as how efficiently Federal agencies provide services, how well taxpayer dollars are spent, and whether Federal agencies are meeting their mission requirements.
Understanding Performance Measures
In general terms, a performance measure is a quantitative or qualitative measure derived from a series of observed facts that can reveal relative positions in a given area. When evaluated at regular intervals, the measure can point out positive or negative trends and changes over time. Performance measures are also useful in drawing attention to particular issues that pertain directly to achieving the organization's mission. They can also be helpful in setting policy priorities for a Federal agency.
Performance measures have several pros and cons. These include:
Pros:
• Can summarize complex issues in simple terms for supporting decisionmakers.
• Are easier to interpret than searching for a trend in larger sets of data.
• Facilitate communication with appropriate target audiences.
• Promote accountability and credibility.
Cons:
• May send misleading messages if they are poorly constructed or misinterpreted.
• May be misused if the construction process is not transparent and lacks sound statistical or conceptual principles.
Constructing Performance Measures
There are countless sources of information on how statistical agencies should construct solid performance measures. Provided below are four guidelines that should be followed when creating and implementing performance measures. Each step is important for statistically sound and defensible measures. Equally important is ensuring that all four guidelines are followed in an orderly and cohesive process, because choices made in one step can have important implications for other steps.
1. Developing a Solid Foundation: A sound framework is the starting point in formulating performance measures. The framework of measures should correlate with the mission of an organization and align with its strategic goals and organizational objectives. The framework should be precise, articulating the purpose of the statistical agency.
2. Selecting Quality Data: The strengths and weaknesses of performance measures derive largely from the quality of the underlying data. Ideally, measures should be formulated based on their relevance, analytical soundness, timeliness, and availability. While the development of performance measures must be guided by the framework of useful indicators, the data selection process can be very subjective, as there is no specific and generally accepted method for developing measures. More importantly, the inability to obtain relevant data may prevent a statistical agency from building sound and defensible performance measures.
3. Identifying the Right Performance Measures: Over the past decade, there has been a renewed effort to develop meaningful performance measures. Unfortunately, performance measures are sometimes selected in an arbitrary manner, which can lead to measures that confuse and mislead decisionmakers and the general public.
The underlying nature of the data needs to be carefully assessed before the “right” measures can be developed.
4. Presenting and Disseminating: The way performance measures are presented is not a trivial issue. Performance measures must communicate an accurate and persuasive picture to decisionmakers and organizational leaders. Their presentation should provide clear messages without obscuring individual data points. There are many ways to disseminate critical information, such as innovative balanced scorecards, which give the general public clear evidence of improving or declining performance. Statistical agencies should always strive to be independent and unbiased when presenting and disseminating performance measurement results.
Performance Standards within the Federal Statistical Community
Statistical agencies maintain the quality of their data and information products, as well as their credibility, by developing meaningful performance measures for their organizations. Federal statistical agencies have collaborated on developing a meaningful set of performance measures for use under the Government Performance and Results Act and in completing the Administration’s Program Assessment Rating Tool (PART). These agencies have agreed that six conceptual dimensions, within two general areas of focus, are key to measuring and monitoring statistical programs. The first area of focus is Product Quality, encompassing the traditional dimensions of relevance, accuracy, and timeliness. The second is Program Performance, encompassing the dimensions of cost, dissemination, and mission achievement. Provided below is a brief review of these six quality dimensions, split between Product Quality and Program Performance.
Product Quality: Statistical agencies agree that product quality includes many attributes, including relevance, accuracy, and timeliness. The basic measures in this group relate to the quality of specific products, thereby providing actionable information to key stakeholders. These are “outcome-oriented” measures and are critical to the usability of these products. Statistical agencies establish goals and evaluate how well targets are met. In some sense, relevance relates to “doing the right things,” while accuracy and timeliness relate to “doing things right.”
1. Relevance: Qualitative or quantitative descriptions of the degree to which products and services are useful and responsive to users’ needs. Relevance of data products and analytic reports may be monitored through a professional review process and ongoing contacts with data users. Product relevance may be indicated by customer satisfaction with product content, information from customers about product use, demonstration of product improvements, comparability with other data series, agency responses to customer suggestions for improvement, new or customized products or services, frequency of use, or responses to data requests from users (including policymakers).
2. Accuracy: Qualitative or quantitative measures of important features of correctness, validity, and reliability of data and information products, measured as the degree of closeness to target values. For statistical data, accuracy may be defined as the degree of closeness to the target value and measured as sampling error and various aspects of nonsampling error (e.g., response rates, size of revisions, coverage, and edit performance). For analysis products, accuracy may be the quality of the reasoning, reasonableness of assumptions, and clarity of the exposition, typically measured and monitored through review processes.
In addition, accuracy is assessed and improved by internal reviews, comparisons of data among different surveys, linkages of survey data to administrative records, redesigns of surveys, or expansions of sample sizes.
3. Timeliness: Qualitative or quantitative measures of the timing of information releases. Timeliness may be measured as the time from the close of the reference period to the release of information, or as customer satisfaction with timeliness. It may also be measured as how well agencies meet scheduled and publicized release dates, expressed as the percentage of release dates met.
Program Performance: Statistical agencies agree that program performance encompasses balancing the dimensions of cost, dissemination, and mission achievement for the agency as a whole: operating efficiently and effectively, ensuring that customers receive the information they need, and serving the information needs of the Nation. Costs of products or programs may be used to develop efficiency measures. Dissemination involves making sure customers receive the information they need via the most appropriate mechanisms. Mission achievement means that the information program makes a difference. Hence, three key dimensions are used to indicate program performance: cost (input), dissemination (output), and mission achievement (outcome).
4. Cost: Quantitative measure of the dollar amount required to produce data products or services. The development and use of financial performance measures within the Federal Government are an established goal; the intent of such measures is to determine the “true costs” of various programs or alternative modes of operation at the Federal level. Examples of cost data include full costs of products or programs, return on investment, dollar value of efficiencies, and ratios of cost to products distributed.
5. Dissemination: Qualitative or quantitative information on the availability, accessibility, and distribution of products and services.
Most agencies have goals to improve product accessibility, particularly through the Internet. Typical measures include: on-demand requests fulfilled, product downloads, degree of accessibility, customer satisfaction with ease of use, number of participants at user conferences, citations of agency data in the media, number of Internet user sessions, number of formats in which data are available, amount of technical support provided to data users, exhibits to inform the public about information products, issuance of newsletters describing products, and usability testing of Web sites.
6. Mission Achievement: Qualitative or quantitative information about the effect of, or satisfaction with, statistical programs. For Government statistical programs, this dimension responds to the question: have we achieved our objectives and met the expectations of our stakeholders? Under this dimension, statistical programs document their contributions to the goals and missions of parent departments and other agencies, the Administration, Congress, and information users in the private sector and the general public. For statistical programs, this broad dimension involves meeting recognized societal information needs; it also addresses the linkage between statistical outputs and programmatic outcomes.
Performance Standards within the Internal Revenue Service Statistics of Income Division
The mission of the Statistics of Income (SOI) Division is to collect, analyze, and disseminate information on Federal taxation for the Treasury Department’s Office of Tax Analysis, Congressional Committees, the Internal Revenue Service in its administration of the tax laws, other organizations engaged in economic and financial analysis, and the general public. To accomplish this mission, SOI provides statistical data to be used strictly in accordance with, and subject to, the limitations of the disclosure provisions of the Internal Revenue Code.
The SOI Division worked with others within IRS to develop 12 performance measures. The measures cover various areas of operation and attempt to gauge the level of service provided to our primary stakeholders. In creating the performance measures, the group worked very hard to ensure that the measures covered all four strategic goals of SOI: becoming our customers’ preferred source, attracting and challenging high-quality employees, making a difference in tax administration, and increasing the visibility of the SOI Division.
Twelve SOI Performance Measures
What follows is a summary of the 12 performance measures. For each, a definition is provided, as well as a synopsis of results over the past 3 years. Measures 1 and 2 are collected from customer satisfaction surveys administered to our critical stakeholders in OTA, JCT, and BEA, as well as to selected customers and employees throughout IRS.
1. Percentage of customers who feel the product or service met their needs: Include a question on a customer satisfaction survey asking: “Did the product(s) or service(s) provided to your organization meet your needs?”
2. Overall RAS Customer Satisfaction rate: Include a question on a customer satisfaction survey asking: “Please rate your overall satisfaction with SOI.”
[Chart: Measures 1 and 2, Product Met Needs of Customer and Customer Satisfaction Rates, by quarter, Qtr 3 2003 to Qtr 1 2006]
• Results from the chart show fairly comparable rates between Measures 1 and 2 over the past 3 years.
• Since this measure captures results from five different customer surveys, relevance and satisfaction rates vary quarter by quarter.
3. Overall Employee Satisfaction Scores from the Employee Survey:
Definition: The grand mean score from 12 questions found on IRS’s annual employee satisfaction survey.
Measure 3 (Employee Satisfaction) captures the annual Gallup Grand Mean Score across the Q12 questions for SOI:
Year  Grand Mean Score
2003  3.99
2004  3.86
2005  3.81
Results show a slight decline in employee satisfaction over the past 3 years.
4. RAS Attrition rates:
Definition: Attrition rate is defined as the total number of employees who have a break in service from IRS within a given fiscal year divided by the total number of employees (part- and full-time) on the rolls at the beginning of the fiscal year.
Measure 4 (RAS Attrition Rate) results:
Year  Attrition rate
2003  4.70%
2004  3.80%
2005  4.40%
5. Number of applicants per job opening:
Definition: The total number of unique applicants received for each job announcement. This includes all applications received by the servicing personnel specialist.
[Chart: Measure 5, Number of Applicants per Job Opening, by quarter, Qtr 4 2003 to Qtr 1 2006]
The number of applicants per job opening has fluctuated significantly over the past 3 years. On average over that period, SOI received approximately seven applicants per job announcement.
6. Number of Senior Leadership Briefings:
Definition: Tally of senior leadership team briefings. Senior leaders are the 23 executives across the Service who make up the IRS Senior Leadership Group.
[Chart: Measure 6, Number of Senior Leadership Briefings, by quarter, Qtr 1 2003 to Qtr 1 2006]
The chart shows a relatively small, yet inconsistent, number of leadership briefings over the past 3 years.
7. Number of Presentations Given Outside the Service:
Definition: The number of program presentations given to groups and/or individuals outside the Service. Each briefing counts as one (e.g., if an organization briefs multiple customers at the same time, that counts as one briefing).
[Chart: Measure 7, Number of Presentations Given Outside the Service, by quarter, Qtr 1 2003 to Qtr 1 2006]
Audiences for presentations include GAO, TIGTA, ASA, and NTA meetings, and various IRS advisory groups. Results show a relatively consistent pattern in the number of presentations over the past 2 years.
8. Number of New and Repeat Customers:
Definition: A Customer is defined as an individual person or organization that officially authorizes a product or service from RAS. A Repeat Customer is the same individual or organization requesting a new work activity, and a New Customer is a new individual person or organization requesting a new work activity. Web activity is not included in this measure.
[Chart: Measure 8, Number of New and Repeat Customers, by quarter, Qtr 3 2003 to Qtr 1 2006]
Data have fluctuated for this measure over the past 2 years.
9. Number of data requests, publications, reports, and data sets completed:
Definition: This measure is a count of work products completed by SOI. It includes four types of work products: 1) data requests produced from a query of one of the RAS data sets; 2) publications produced according to a regular or routine schedule or as part of normal business operations; 3) reports produced as a result of an analysis; and 4) new data sets produced from existing databases.
[Chart: Measure 9, Number of Data Requests, Publications, Reports, and Data Sets, by quarter, Qtr 3 2003 to Qtr 1 2006]
As with new and repeat customers, the number of data requests, publications, reports, and data sets has bounced around between 75 and 125 per quarter.
10. TaxStats Internet Activity:
Definition: The number of visits to the TaxStats Internet site. Visits are defined as the number of times a visitor came to TaxStats within a given period of time. The second part of this measure is the number of page views to the TaxStats Internet site. When a visitor accesses a page, the visitor's browser requests all of the hits on that page, including the page itself. To report the number of page views, the Web site analysis software separates the page hits from the other hits. These numbers make up the page view metric.
[Chart: Measure 10, TaxStats Internet Activity (Visits and Page Views), by quarter, Qtr 3 2004 to Qtr 1 2006]
The redesign of the IRS.gov Web site in 2005 might be the main reason for the lack of a spike in TaxStats visits and page views during the 1st Quarter of 2006.
11. RAS Intranet Web Activity:
Definition: The number of visits to the RAS Intranet site. Visits are defined as the number of times a visitor came to the RAS Intranet site within a given period of time. The second part of this measure is the number of page views to the RAS Intranet site. When a visitor accesses a page, the visitor's browser requests all of the hits on that page, including the page itself. To report the number of page views, the Web site analysis software separates the page hits from the other hits.
These numbers make up the page view metric.
[Chart: Measure 11, Number of Visits and Page Views on the RAS Web site, by quarter, Qtr 3 2003 to Qtr 1 2006]
Data for this measure became available to RAS during the 3rd Quarter of 2003. Results clearly reveal an aberration in the data; this spike was likely caused by Google search testing in June and July.
12. Number of mentions of SOI in major media:
Definition: This measure indicates coverage of SOI activities by mass media, such as the Wall Street Journal, Washington Post, New York Times, and Tax Notes.
[Chart: Measure 12, Number of Mentions of RAS in Media, by quarter, Qtr 1 2003 to Qtr 1 2006]
The measure includes citations in the Wall Street Journal, Washington Post, New York Times, and Tax Notes. The number of media citations for SOI has remained fairly constant over the past 2 years.
References
Strengthening Federal Statistics, Analytical Perspectives, Budget of the United States Government, Fiscal Year 2007, Chapter 4, February 2006.
OECD Working Paper 2005/3, Handbook on Constructing Composite Indicators: Methodology and User Guide, August 2005.
3  Broad Quality Issues in Organizations
Tying Web Site Performance to Mission Achievement in the Federal Government
Diane M. Milleville, Internal Revenue Service
As the World Wide Web (WWW) continues to expand, both in size and in how it is accessed, so does the Federal Government’s dependence on it as a gateway for reaching the American public, who increasingly rely on the Web to obtain information. The role of the WWW in how Federal agencies interact with their customers has changed dramatically over the years. Federal Web sites are fairly extensive, containing a wealth of information targeted to a variety of audiences.
While agencies have been using the Web to disseminate information for years, comparatively little has been done to understand and evaluate how effective these Web sites are at advancing agency missions. However, given the costs associated with Federal Web sites, it is imperative that each agency ensure that its Web site makes a meaningful contribution toward achieving its mission. As with most things, that is easier said than done. The Government has placed greater emphasis on this task, issuing an assortment of documents that each address the topic in different ways, but it has not developed a concise guide to the most important aspects of mission achievement assessment and how Webmasters can apply them to their own sites. This has left the undertaking largely undefined and Webmasters without clear direction.

In an effort to help Webmasters with various tasks, the Web Managers Advisory Council, a group of Web managers from all areas of the Federal Government, created task groups to develop guidance that contained as much detail as possible while remaining general enough to apply to any Federal site. Among these task groups was the Performance Measures and Mission Achievement (PMMA) task group [1], which developed a detailed single-source guide to show how a Web site contributes to mission achievement [2]. The guide condenses the vast amount of information on this topic into a step-by-step process for showing mission achievement through Web site performance while also meeting Government performance measure commitments. It was designed both for Web managers who are more advanced in their efforts and for managers who are just beginning the process. By following the guide, every Federal Web manager should be able to demonstrate how his or her Web site contributes to the agency’s mission.

Performance Measurement as a Requirement

General performance measures are not new to the Federal Government.
Since the early 1990s, various Government initiatives have emphasized the importance of measuring the performance of Federal programs. Each initiative addresses performance measures in a slightly different manner. Some added requirements, building on previous initiatives and improving areas that were lacking, while others reinvented the idea of Government performance measurement. But each edict has one thing in common: holding Federal programs accountable to the American public.

In 1993, the Government Performance and Results Act mandated that Federal performance be measured and results reported publicly, in an effort to make all agencies accountable to the American public. This Act is considered to be the most significant advance in bringing accountability to Government programs [3].

Since 1993, the Federal Government has added requirements that build upon the Government Performance and Results Act. These include the Program Assessment Rating Tool (PART), which was introduced in the Fiscal Year 2004 budget. PART assesses a program’s effectiveness and demands that Federal programs show results in order to earn financial support. Office of Management and Budget Circular A-123, Management’s Responsibility for Internal Control, called for the institution of performance measures that monitor actual performance as compared to expected results.

There is no lack of information when it comes to what agencies need to evaluate. The problem is that the Federal Government does not provide much guidance on how agencies can evaluate their programs. This is especially true for measuring Web site effectiveness.

How To Show Mission Achievement

Determining how to show mission achievement through Web site performance is not easy, especially given the lack of available guidance. Web managers are familiar with common Web performance metrics that cover visitor traffic (including visits and page views). And while such information is valuable, these broad measures alone cannot demonstrate mission achievement.

Before a Web site manager begins this process, he or she should understand that not all aspects of a Federal Web site must demonstrate mission achievement. It is acceptable to provide features on a Web site that do not relate to an agency’s mission. Another thing to keep in mind is that agencies do not need an extensive number of metrics in order to show mission achievement. A few well-developed, quality metrics will provide much more valuable information than a report full of every metric the manager could think of.

Since there is much to consider before jumping into actual performance metrics, the PMMA task group decided that the easiest way to prove mission achievement is to break the process into steps. These steps are:
• Review and understand the agency mission statement;
• Identify mission categories;
• Identify related business models;
• Map existing Web services to business models; and
• Develop metrics that complement business models.
Each step leads into the next. By working through each step, Web managers will be able to determine which aspects of the site are most important and will be able to match metrics to these specific areas.

Step 1—Understand Mission Statement

The key to showing mission achievement is to first have a comprehensive understanding of the agency mission statement. It is important to note that, although the topic here is “mission achievement,” the goals and purpose of an agency are not solely detailed within the agency’s mission.
Other important documents covering strategic planning and vision also contain pertinent information about an agency and should be included in this process. The Web manager should review these documents and highlight the words and phrases that are most important to the agency.

Example: To show that IRS.gov contributes to IRS mission achievement, the Web manager should gather the IRS mission statement, vision, and goals, as well as any other important documents or publications containing information on IRS goals. By reviewing these documents, the Web manager would see that the IRS focuses on educating taxpayers about their tax obligations and ensuring that all taxpayers pay their fair share of taxes, and that the agency concentrates on minimizing the amount it spends when collecting tax payments [4]. Key topics from this step are “education,” “compliance,” and “fiscal performance and cost containment.”

Step 2—Identify Mission Categories

Since the number of topics from the mission statement and supporting documents can be quite large, the PMMA task group decided to group topics into mission categories to help generalize the process for all Federal agencies. The mission categories are based on the “modes of delivery” described in the Federal Enterprise Architecture’s Business Reference Model [5]. The “modes of delivery” detail the different ways in which the Government carries out its purpose, an organization that lends itself easily to the categorization of mission statements. The modes are divided into two areas: Government service delivery and financial vehicles. Government service delivery modes involve how agencies provide services to citizens, while financial vehicle modes involve monetary transactions. Categories of Government service delivery modes are: knowledge creation and management; public goods creation and management; regulatory compliance and enforcement; and direct services for citizens. Financial vehicle modes include: Federal financial assistance; credit and insurance; and transfers to States and local governments.

Example: The IRS Web manager identified three topics in Step 1. By referring to the guidance provided on mission categories, he or she would be able to map each of the three topics to a specific mission category. The topics (education, compliance, and fiscal performance and cost containment) map to the following mission categories: Knowledge Creation and Management; Direct Services for Citizens; and Regulatory Compliance and Enforcement.

Step 3—Identify Business Models

Each mission category relates to various business models. The PMMA task group created a matrix that allows Web managers to easily map mission categories to the business models with which they are most often associated. The matrix also indicates how often each model is used to support a mission category (H = High, M = Medium, L = Low). It is important to note that some mission categories may share the same business models. When this happens, the Web manager should pay special attention to the models that are repeated, since those are the ones most relevant to the agency’s mission.

The Web manager does not need to use all business models identified in this step. He or she should use the frequency-of-use indicators to decide where to start. For certain agencies, business models that are used infrequently among Federal agencies may be more relevant than ones marked medium or high. In this case, the Web manager should focus on the more appropriate model, regardless of general usage frequency.

Example: The three mission categories identified in the previous step relate to eight different business models: interactive tools, targeted education, e-commerce, reduce costs, recruitment, nonfinancial transactions, print forms available, and news/information. With so many models, the Web manager may feel overwhelmed and unsure where to start. Within this list, though, three models appear multiple times: targeted education (3), interactive tools (2), and e-commerce (2). Since these occur multiple times, the Web manager should focus on these three models, at least at the beginning of the process. Then, if the Web manager wants to explore more options, he or she can return to the full list.

Step 4—Match Web Services to Business Models

Once the Web manager has identified the business models on which he or she should focus, the next step is to evaluate existing Web site services and determine which services complement each business model. These services will be the ones that the agency evaluates, using the results to show how the site contributes to mission achievement. Web service types can include general information, publications and forms available for download, and customized tools designed to help customers obtain specific information, among others. As previously stated, not all services on the Web site will directly support the agency’s mission.

Example: The IRS.gov Web manager should focus on each model separately. Beginning with targeted education, he or she should compile a list of all items or areas of the site that are related to educating taxpayers. This can include electronic versions of forms, publications, and instructions, as well as tax tips. For interactive tools, the manager should determine what, if any, tools are on IRS.gov. Current interactive tools include the withholding calculator, the alternative minimum tax assistant, and the refund status tool. Finally, there is e-commerce. The IRS does not currently engage in e-commerce activities on its Web site. However, it does provide access to e-file partners and Free File Alliance companies; hence, the site encourages e-commerce. And this type of activity enhances the IRS’s ability to collect tax revenue.
Therefore, the IRS Web manager should evaluate how the site is affecting tax collection.

Step 5—Select Appropriate Performance Metrics

Now that the Web manager has made it through the first four steps, he or she is ready to start thinking about performance measures. Having completed the other steps in the process, the Web manager will be more familiar with the agency’s overall mission and goals and will be able to more easily identify metrics that will show mission achievement.

The PMMA task group recommends that Web managers use Victor Basili’s Goal Question Metric approach. Using this method, the manager first sets a goal for each model and then derives questions for each goal. Finally, he or she develops metrics for each question (most likely, multiple metrics will be used to answer one question). Once the manager has a metric in mind, he or she should ask two questions: 1) What will be done with this information? and 2) What kind of action will be taken based on this information? If the answer is “nothing” or “none,” the metric is not worth tracking. It is important that the information collected be of value to the organization. If it is not, a different measure should be selected instead.

After a metric is selected, time must be spent defining it: what it covers, what should be collected and how, and what the results mean. All of this should be done prior to implementation; however, it may be necessary to collect some information for a baseline before the agency can define results.

Example: Targeted Education
Goal: Reduce costs by providing educational and instructional materials online.
Question: How do the costs of providing targeted education online compare with those of other materials?
Metric: The amount of money saved by not mailing hard-copy information.
Things to consider: Which materials should be included in this measure? How much would it cost to send out each of the materials in this measure?
Data to collect: The number of downloads of each type of material included.
Savings: For each material, the cost of mailing the item multiplied by the number of downloads associated with that item.

Example: Interactive Tools
Goal: Reduce the costs of processing paper versions by providing online tools for frequently requested items.
Question: How much money is saved by customers using online tools instead of filing paper requests?
Metric: The amount of money saved by customers using online tools as compared to using paper versions.
Things to consider: Which tools should be included in this measure? How much would it cost to process hard copies of the items included in this measure?
Data to collect: The number of completed transactions for each tool included.
Savings: Multiply the number of times each tool was used by the cost of the online tool and, separately, by the cost of processing hard copies; the savings are the difference between the two totals.

Example: E-commerce
Goal: Streamline and reduce the costs of collecting tax returns through increased use of e-file.
Question: What are the direct cost savings from processing electronic returns?
Metric: The amount of money saved by processing an e-file return instead of a paper return.
Things to consider: What aspects are involved in processing both e-file and paper returns? How much does it cost to process a paper return? How much does it cost to process an e-file return?
Data to collect: The number of e-filed returns.
Savings: Multiply the number of e-filed returns by the cost of processing a paper return and, separately, by the cost of processing an e-filed return; the savings are the difference between the two totals.

Next Steps

The process is not complete once the Web manager has selected metrics related to agency-specific goals. Although selecting these metrics was the assigned task, there are several other things that should be considered. First, all terms associated with each metric must be clearly defined. These definitions should be agreed upon and deemed official. This is key because loosely defined terms may lead to misinterpretation.

Limitations for each metric should be identified and clearly explained. If a Web manager does not fully understand the limitations associated with each metric, the reported result may not be accurate, and misinterpretation will most likely occur. While some limitations may have a small impact on data, others may prevent an agency from collecting certain data.

Cookie usage is one of the most pressing limitations for Federal Web sites. A cookie is a small text file placed on a customer’s computer hard drive by a Web server. This file allows the Web server to identify individual computers, permitting a company to recognize returning users, track online purchases, or maintain and serve customized Web pages. There are two types of cookies that can be used on a site: session cookies and persistent cookies. Session cookies have a short life-span; they are placed on the user’s computer when he or she lands on the site and expire shortly after the visit concludes. Persistent cookies remain on the customer’s computer for much longer. The length of time is defined by the Web site, but could be 30 or more years.

The Federal Government generally prohibits the use of persistent cookies on all Government Web sites. Federal agencies may be granted permission to use persistent cookies on their Web sites if they can demonstrate: “a compelling need to gather site user data; ensure appropriate and publicly disclosed privacy safeguards for handling site data, as well as information collected through cookies; and obtain personal approval by the agency head [6].” While the first two requirements are relatively easy to demonstrate, the third is not easy to obtain. Within the Federal Government, there is a negative connotation associated with any cookie use, which makes it almost impossible to acquire personal approval for cookie usage from the head of an agency. Without persistent cookies, Federal agencies cannot collect certain data for metrics, including visit frequency, unique visitors, and first-time versus repeat visitors, among others.

Next, the Web manager should determine how often data for each metric should be collected. Sometimes it will make sense to assess metrics monthly, while other metrics may only need to be assessed on a quarterly or yearly basis. For some metrics, it may be useful to collect data for a few different timeframes. This type of analysis may show different trends, or it may help determine what drives a certain trend.

Prior to implementing data collection, the agency should determine what will be done if a metric shows negative results. It is important to determine the consequences of poor performance early on, instead of putting it off until it occurs. Establishing a plan for handling negative results will help an agency quickly respond to (and hopefully recover from) poor performance results.

The Education Process

With the implementation of any new program, there should also be an education process. Education of both employees who work on the Web site and management who will use the results to make decisions or present the information to others is essential when it comes to Web site performance metrics. Many people assume they know what the different metrics mean, but they often do not have a good understanding of the terms, associated limitations, or interpretation issues that may exist.

“Web hits” are a prime example of why education is important. Many people do not know what a Web hit is. They assume that it is the leading metric that shows how many people come to a site in a given timeframe. What they do not realize is that hits and visits are not synonymous. A hit is any element called by a Web browser when requesting a Web page. This includes images, animation, audio, video, downloads, documents, and the page itself, among other items. One single page may produce 30 or more hits each time it is requested. It turns out that this inflated number has no significant use beyond showing the Web manager what the server workload is like during a given timeframe.

When developing metrics, it is of the utmost importance to spend time educating everyone who will be using the information. This process is essential because misreported or misinterpreted data may lead to poor decisions and will reveal a lack of understanding within the agency.

Developing a Report

Results from selected metrics should not be reported individually, but instead in a comprehensive report. The type of report is up to the agency. The report could be a single page, a detailed report that includes charts and graphs, a dashboard-style report, a balanced scorecard-style report, or any other style that matches the information presented. Incorporating all Web site performance metrics into one report will help the audience see the global view of the Web site and how each aspect contributes to mission achievement.

It is always important to keep the audience in mind when deciding on the report style. It may be necessary to develop a few different reports, each tailored to a different audience. For example, agency executives may want a short report, perhaps a dashboard, while the Web manager will most likely want as much detail as possible, requiring a very different report.

In any and all reports, data should be presented in a simple and clear manner. Graphics and charts used in reports should be carefully considered; while some graphics look visually interesting, they may not truly reflect the results and may mislead the audience, which could lead to poor decisionmaking. In addition to the results, the report should also include a statement of intent, definitions for all metrics and associated terms, and explanations of all data collection and interpretation limitations. Someone who fully understands the metrics should also provide some analysis of the results to help with interpretation. These additional elements will help reinforce the education initially provided and will help ensure that decisions and actions taken based on the information in the report are appropriate to the results shown.

Conclusions

Although the idea of linking Web site performance measures to mission achievement sounds daunting, breaking the process into steps makes the task more straightforward. Each step also builds the Web manager’s understanding of how the Web site relates to the agency’s mission; this will help the Web manager select the best metrics possible.

When it comes to showing mission achievement through performance measures, there is much more involved than just selecting metrics and collecting data. Agencies must thoroughly understand the metrics they select, the data collection method they use, and any (and all) data collection and interpretation limitations that exist. In addition, the agency should spend time educating end users of the results; everyone should understand what can and cannot be determined from the information collected.

Education is, and should be, a permanent part of this process. After an initial explanation of the selected performance measures package, the agency should continue to remind users of definitions, limitations, and interpretation issues by including explanations in all reports produced. This is the best safeguard for ensuring that results will not be misinterpreted or misused.

Finally, agencies should continuously evaluate and reevaluate performance metrics. If the agency’s focus changes, the performance metrics should change to accommodate the new focus. Web managers should also examine the metrics on an annual basis to determine if the information derived from the metrics is what was originally intended. This will help ensure that statements included in performance reports are accurate.

By developing performance metrics that demonstrate mission achievement, agencies will not only be able to assess the resources spent on Web sites, but will also show that they are financially responsible to the American public. In turn, this information will help raise the public’s confidence in the Federal Government as a whole.

Endnotes

[1] The PMMA task group is an interagency group created by the Web Managers Advisory Council.
[2] The full guide is available on the FirstGov Web site: http://www.firstgov.gov/webcontent/improving/evaluating/mission.shtml
[3] Budget of the United States Government, Fiscal Year 2004. Section: Rating the Performance of Federal Programs. Available: http://www.whitehouse.gov/omb/budget/fy2004/performance.html
[4] Department of the Treasury (2005), Internal Revenue Service 2005 Data Book, Table 31. Available: http://www.irs.gov/pub/irs-soi/05db31ps.xls
[5] FY07 Budget Formulation FEA Consolidated Reference Model Document (May 2005). Available: http://www.whitehouse.gov/omb/egov/documents/CRM.PDF
[6] Office of Management and Budget (2000), “Cookies Letter.” Available: http://www.whitehouse.gov/omb/inforeg/cookies_letter90500.html

4 Survey-Based Estimation

Comparing Strategies To Estimate a Measure of Heteroscedasticity
Kimberly Henry, Internal Revenue Service, and Richard Valliant, University of Michigan

(1) Statistics of Income, P.O. Box 2608, Washington DC 20013-2608
(2) University of Michigan, 1218 Lefrak Hall, College Park MD 20742

The strategies’ effects on estimates of totals and their variances are then evaluated.

This paper is organized into six sections. After the introduction, Section 2 contains descriptions of our superpopulation model and generated populations. Section 3 includes our simulation setup details, while results are discussed in Section 4. Conclusions, limitations, and future considerations are discussed in Section 5, and references in Section 6.

2. Superpopulation Model and Generated Populations

2.1: Model Theory

Given a study variable of interest Y and an auxiliary variable X, we consider a superpopulation with the following structure:

$$E_M(y_i \mid x_i) = \beta_0 + \beta_1 x_i, \qquad \mathrm{Var}_M(y_i \mid x_i) = \sigma^2 x_i^{\gamma} \qquad (2.1)$$

The $x_i$'s are assumed to be known for each unit $i$ in the finite population. The exponent $\gamma$ in model (2.1)'s conditional variance has been referred to as a measure of heteroscedasticity (Foreman 1995), or coefficient of heteroscedasticity (Brewer 2002).

[Figure 1: Generated Populations]

The first population has a relatively strong dependence between y and x, while the second one has a much weaker relationship. Note that these populations have a small non-zero intercept, which resulted in some model-based estimators being biased in the earlier HMT study.

3. Simulation Setup

This section describes the details of our simulation study, including the working models, the sample designs, the simulation strategies, and the method of estimating $\gamma$.
Thisor coefficient of strategies,small and the method of estimatingresulted some modelsmall non-zero of estimating of heteroscedasticity (Foreman or coefficient strategies,and the following heteroscedasticity (Foreman 1995), or coefficient of non-zeroaintercept, methodintercept,in which model- in in some following structure: structure: 2 This 1995), coefficientsmall non-zero and non-zero resulted which resulted in strategies, non-zero intercept, some.resulted heteroscedasticity heteroscedasticity (Brewer (ForemanThisparameter isisof small of have small the whichintercept, whichsome .model-some modelintercept, which resulted heteroscedasticity (Breweryy | |xx2002). This x parameteris of 2002). heteroscedasticityy(Brewer2002).0 Thisproduces nearly EM ((Breweri 2002).1xiparameter heteroscedasticity(EiM| (xx)i estimate 0produces parameter is of model-basedestimators being biasedininthethe earlier HMT (Brewerx i xestimate1This nearly of Var)reasonable i ))2002). parameter ofis (2.1) of heteroscedasticity 0 1 i ii 1 i This i based estimators being biased the earlier interest i ( y ) i E heteroscedasticity ( y | a |reasonable estimate producesparameterbased estimators being biasedthe the earlier in in earlier HMT study. EM interest sincesincexa M 0 (Brewer, based Models based estimators being biased HMT study. HMT study. 3.1: Models nearly 3.1:estimators being biased in in earlier HMT study. interestMsinceaxiireasonable 3. Models Simulation interest since aa reasonable beestimate produces nearly 22 Models estimate produces interest a reasonable (2.1) is ofinterest sinceare reasonableand))estimators each (2.1) in 3.1:Using Models Setup(2000) notation, we we based interest ’sxsince VarMMx(y i| estimate produces nearly 3.1:Models et. et. 
reasonable estimate unit 2 optimal yii since adesigns to|xx knowni for ofproduces nearly(2.1) optimal The ( xsampleiassumed yestimatorsxxof totalstotals iandthe3.1: 3.1:Valliant al’s al’s (2000) notation, based VarVarM ( i )| x ) Var2 ( estimatorsi of (2.1) and Using study. | M optimal sample ydesignsxiandi iandi i estimators totals and andUsing Valliant Simulational’s (2000) notation, we based sample designs and and estimators of totals and Using Valliant et. al’s (2000) notation, we based ValliantSimulationSetup notation, we based optimal sample designs Valliant, of 3. et. al’s (2000) finite population. designs and estimators totals (2.1)’s in This Valliant et. Setup (2000) Using we theiroptimal sample 4.2.1, exponent Dorfman,ofeach Simulation Setup totals et. following variances designs Dorfman, their optimal samplei(Theorem 4.2.1, tobe known for and 3.unitandinthe ofsectionon the following two two notation, modelsbased (Theorem The Valliant, Dorfman, totals3. estimators of describes al’s are 4.2.1, assumed Valliant, in model and Using 3. Valliant theirvariancesThe xx(Theorem 4.2.1,toValliant,iDorfman, and Simulation of totals on following twotwoour simulation TheThe’s variances (Theorem assumed4.2.1, Valliant,foreachandestimators of totals on the the the detailsworking models study, xi theirtheirThe ito’sarebe known for beunitunit ithe and unitiiestimators Setup on the following of working models assumed xiare are assumedbe(Theorem each known in the estimators totalstotals on the following two working models ’s variances ’s to known for each in Dorfman, and the working models their variances (Theorem 4.2.1, Valliant, Dorfman, estimators of working variances estimators x ) on the following designs, simulation Royall 2000). variance has The exponent to asin measure(2.1)’s (1,1Mx(1)1 :of totals models, sample two working(3.1) Royall 2000).finite population. 
been referred (3.1) models a M : includingx,working finite Royallconditional The exponent in in model (2.1)’s 2000). The (3.1) Royall 2000). This (x Royall 2000). population. finite population. 2000). models The exponent includemodel81 finite population. ofexponent like model (2.1) in model (2.1)’sdescribes)sectiondescribes the details ofstudy,simulationstudy, M M (( M/x2 ) x Royall Applications of (Foreman like (2.1)’s include section M (1,1/ :211and ,the method ourthe details of.our simulation study, models 1995), or coefficient of section This,11::the)the )details of estimating Applications (2.1) This This strategies, ,section describes our simulation study,(3.1) describes1 :details of of simulation our (3.1)(3.1) of heteroscedasticity Applications of ofvariance has been referred include measure ( xincluding x working models, sample designs, simulation (3.2) M , 1, x conditional variance has been measure to as aa conditional models like to alike includeas Applicationssegregationmodelsto(2.1) depreciable measure working / /2models, ) sample designs, simulation models report depreciable include M ( x , x x :2: x/)working designs, simulation (3.2) like (2.1)(2.1) (2.1) to including : x Applications hasof segregation likereportmeasureincluding working /(2models,2) sample models, sample designs, simulation models to as asreferred include (3.2) Applications of conditional variancecost costbeen referred This parameter is of conditional variance a companies has companies usingusing been referredto report depreciable including M M (( x (,xxx ::xx )) x ) (3.2) (3.2)(3.2) Mx , heteroscedasticity (Brewer 2002). x : companies using cost segregationto to report of using heteroscedasticity Service 1995), or coefficient of strategies, andthe method companies using cost segregation coefficientForm coefficient andand theM of ,estimating .of estimating . . 
report of assetscompaniessinceusing cost segregationreport depreciable heteroscedasticity (Foreman 1995),(ForemanTax depreciable of heteroscedasticity acost segregationestimate1995), or 1120 assets onof heteroscedasticity (ForemanForm 1120 strategies, of the method andofthemethod of .estimating on companiesInternal Revenue or to coefficientdepreciable their their (Foreman Service Taxto Formof of nearly Internal Revenue 1995), or strategies, strategies, method estimating interest assets on their Internal reasonable Service Tax This parameter is of Models Revenue Service Tax produces 1120 heteroscedasticity This assets ontheir (Brewer 2002). Service TaxTax parameter 3.1: assets onheteroscedasticity (Brewer Service Formof1120 assets on Internal Revenue parameter This heteroscedasticitytheirInternal RevenueThis 2002). isForm11201120is of (Brewer 2002). (Brewer 2002). heteroscedasticitytheir Internal Revenue parameterof Form is stimating totals is often a survey sampling objecnearly optimal sample designs and estimators of totals tive. With a model-based approach, one factor and their variances (Theorem 4.2.1, Valliant, Dorfman, University of Michigan, 1218Lefrak Hall, College Park MD MD 20742 University of Michigan, 12181218 Lefrak Hall, College Park 20742 University of Michigan, Lefrak Hall, College Park MD 20742 that can affect the variance and bias of estimated and Royall, 2000). 1.is the superpopulation structure. We consider cases Introduction Comparing Strategies To Estimate a Measure of Heteroscedasticity 2002) and (e.g.,(e.g., Allen and Foster 2005 and Strobel and 1. Introduction totals Introduction (e.g., AllenAllen and Foster 2005 and Strobel 2002) and Allen and Foster 2005 and Strobel 2002)2002) and and Foster 1. Introduction (e.g., inventory data Foster and and and 2002) 2002) 1. 1. Introduction (e.g., Allen inventory 2005 valuesStrobel Strobel and 1. 
Introduction (e.g., comparing Allen and models like versus actualcompanies data comparing ApplicationsFoster 20052005(2.1)actual values and and values versus Strobel values actual of where a dependent variable’s variance is proportional To Estimate a inventory data Heteroscedasticity values Comparing Estimate toMeasure of inventory of Heteroscedasticity Comparing objective. comparing inventory 2 data of values versus comparing Measure values versus include comparing inventory values Comparing Strategies sampling objective. aEstimate Heteroscedasticity 1984). actual values Estimating Comparing Strategies Strategies a(e.g.,comparing aMeasure datadata values 1984). actual values is often a survey To Strategies To Measure of Heteroscedasticityal. versus Estimate (e.g., Roshwalb 1987 Godfreyreportversus Estimating totalstotalsoftena asurvey samplingTo Kim Henry1(e.g., Richard Valliantand and Godfrey et1984). actual values and Estimating totalsisisoften often aa survey sampling objective. andRoshwalb 19871987 Godfrey et al.et al. 1984). sampling objective. Roshwalb segregation Godfrey et depreciable to et someEstimating totals is oftensurveyone factorcan affect power of totals is approach,factor thatsampling objective. using cost1987population data, data, goal goal use use variable. Various strate- (e.g.,(e.g., Roshwalb 1987 and Godfrey et al. is objective. (e.g., Roshwalb Estimating the independent survey sampling can affect GivenRoshwalb 1987and Godfreyal. al. 
1984).to is assets on Estimating totals Given generated and to WithWith a model-based is often afactor that that affect Box 2608, Washington population data,our our 1984).use a model-based approach, one survey can generated DCpopulation 1 Given generated population data, our is 1120 With a model-based approach, 1Statistics of that selection Henry and InternalValliant22 Serviceour our goal is to use 20013-2608 Withvariance and bias approach, one factor can Kim affectGiven 2Richard drawdraw affect Given 2generated samples and and goal aamodel-based approach, one factor (1) that affect various strategies to population data,data, goalfrom (e.g., gies WithWith a model-basedthisoneestimated Income, isKimvarious1and Richard generated population Tax Formtois tofrom use that are model-basedapproach, one factor totals 1 P.O. Henrytheir generatedRevenue samples estimateour goal is use conceivable in estimated totalsthatand Richard Valliant Given Valliant case include: is can can to the of the estimate the variance and bias of 2 UniversityHenry1 is the 1218 Lefrak Hall, College Park MD 20742 estimate strategies to draw samples and Kim of and Richard strategies the the variance and bias estimated Kim Michigan, thevariousValliant Washington DCsamples and 2002) and comparvariance and bias bias of estimated HenryIncome,P.O.various and Foster, draw 20013-2608 Statistics of the Box strategies 1 estimated of a the the variancestructure.ofof Income,totals of Income, a P.O.Box 2608, DCto to 2005 andthese and estimate from the pilot variance toand andofbias of 11StatisticsBoxBox the the then then strategiesdraw of20013-2608estimate the from sample preliminary structural whereis various2608, 20013-2608 DC Strobel, various to superpopulation makeWe consider casestotals param-Washington strategies to the impactsamples strategies on We of2Income, totals 2608,them, Allen DC Washington draw of strategies on consider P.O. 
is superpopulation structure.1Statisticsestimated casestotals 2608, Washington examineimpact samples theseand estimatefromfrom where isa a Statistics P.O. them, examine 20013-2608 the 2 superpopulation structure. WeWe consider cases Michigan, 1218 Lefrak Hall,the impact cases superpopulation structure. considerconsider ofto consider where where them,ing then examine the impactMDthese thenthen University some Michigan, a College ofPark CollegePark of strategies on on Park of superpopulation variance of proportional Lefrak aa 1218Collegeexamine the impact versus dependent (2) 2structure. We WeMichigan, casessome them, inventory Collegethe variances. strategies on on eter superpopulationselection ofisa proportionalto of where estimation LefrakHall,andand values these20742 strategies the the estimates, variable's structure. is Universitytobased onHall,them,ofexamineexaminevariances. 20742strategies the (e.g., dependentIntroduction 2 UniversityMichigan,cases somewhere Hall, them,Allenanddata 20742of MDtheseactual values theand dependentvariable's variance is of main sample 1218someestimation of Park MDMD their impact andthese variable's University proportional 1218 to some variance (e.g., totalstotalstheir Foster 2005 1.dependent variable's varianceproportional to Lefrak estimation then and20742variances. of Strobel 2002) dependent variable's variance isVarious strategies to some variable's variance is proportional dependent the independent variable.is Various strategies estimationtotals theirGodfrey et al., estimationof totals and theirtheir variances. of 1987 andand variances. 1984). totals and their variances. proportional power of power pilottheindependent variable. Various strategies Roshwalb, inventory data values versus actual values estimation either of the the independent variable. about population results or educated guesses Various power of of1.independent variable. 
comparing of totals power power of1.Introduction include: Various strategies Various (e.g., that power the in independentmodel-based selection of 2.2: and(e.g., Allen and and 2005 2002) are and Introduction a include: (1) or strategies thatIntroductionof independent variable. (1) selection objective.Generated PopulationsandandGodfrey et2002) Strobel 2002) and conceivabletotals in this casevariable. selection of 2.2: (e.g., (e.g., (e.g., Populations Strobel and Allen 1. Introductionconceivableof case case survey sampling ofstrategies Allen Roshwalb 1987 and FosterStrobel and andand 2002) and 1. are Estimating the this eitherinclude: (1) (1) selection of AllenGeneratedFoster 2005Foster 2005al.and Strobel parameters, conceivablethisthis caseinclude: (1) selection of GeneratedFoster 2005 design- 2.2: thatthat are conceivablein often a include: are conceivableuse is this that that are (3)toinpreliminary structural parameter are to conceivablepreliminary structural(1) selection of Generated Populations in case case include: parameter 2.2: Generated Populations data values1984). actual values Populations comparingdata valuesversions of theversus comparingPopulations versus actual population inventory values versus 2.2:2.2:inventory unstratifiedversusofactualpopulation actual values in this a make a apilot pilot samplemake preliminary structural parameter affectcreated Givenunstratified population data, our goal is to use Generated inventory data data, values is to comparing created generated data values values goal pilotsample Estimating totals is oftenone survey sampling objective. Given two sample tomake totals Withsample the total. is various sample designs,comparing (e.g., Roshwalb versions of a factor based pilotpilotmodel-based preliminary aa survey that can We We created two unstratified versionsthe oural. 1984). 
use pilotsample to toof aFor aoften structuralparameterWe We inventory unstratifiedpopulationGodfreyet population aa estimator ofto make approach,samplestructural parametercreatedtwo generated 1987 and Godfreythe al. 1984). Estimating makemainmain sample parameter describedtwo andand unstratifiedal. 1984). population 1987 et and unstratifiedal. (1983,of of population versions of et a (2) selectiona survey preliminary based on (e.g., Roshwalb createdHansen al.versionsversions the the HMT sample estimates, is oftenmake preliminarystructural sampling (e.g.,We We 1987two two Godfreyal. 1984). the denoted population (2)is selectionsurvey sampling objective. onobjective. in (e.g., Roshwalb et et(1983, denoted HMT of sampling objective. based described createdin estimates, (2) selection of a main estimatedbased on the Roshwalbstrategieset draw samples and estimate Estimatingtheestimators, alternative of main sample basedis on totals (2) often abias strategies for totals Estimating totals selection of a sample estimating 1987 Hansen Godfrey samples denoted HMT from various in from estimates, variance model-based main sample based ondescribed various Given generated population to al. and sizes, pilot pilot resultsand of a guesses one factor based onaffect in strategiesgeneratedal. (1983, data, our goal and estimates,With selection guesses mainone populationcan described Hansen to draw population estimate goal et estimates, selection factor sample either With educated factor eitherestimates, (2)or(2) or educatedof aaboutaboutcases where adescribed sinceHansen model(1983,goal denoted HMT is to use described Hansen et et al. goal is denoted With a model-based approach, oneoneapproach,aboutfactorthathereafter), sinceinitGivenHansenimpact(1983, isdenoteduse on is to use generated populationet model (1983,We our Witheither results aamodel-based approach, population thatcan affect generated it follows al.(2.1).(2.1). 
data,to HMTthe model-based approach, affect population data, chose eitheraof that the varianceeducatedbiasthatthatfor simulated hereafter), since examine thedata,drawof these strategies onHMTfrom pilotpilotresultseducatedarebias consideraffect totalsGiventhem,then itfollows model our our We choseestimate results variance and We about superpopulationor andguesses of cancanpopulation is them, variousinstrategies impact samples strategies structure. guesses estimated results(3) of eitherguesses about population hereafter), sincestrategies tomodel (2.1). to useestimate or use educated model-based or totals Given then examine either pilotpilotor use educated compared population or ishereafter), since followsthe tomodelof(2.1). We chose the from (2.1). these We chose We and chose values eithervariance of or estimated oftotals about populationstrategies to drawit samples and estimate power of either a estimated the samples theparameters, andand results of variance ais model-basedthevarious equal3/4 various drawitsamples and of 10,000from units. variance the and (3) estimated guesses is or equal to hereafter), sincefollows draw estimate andfrom is the the parameters, (3) bias use either a totalsmodel-based some hereafter),and to it followspopulations (2.1). We chose varianceand (3) variousthe to 3/4 parameters, and bias (3) use structure. Wemodel-based or where strategies and 2 and their model of 10,000 dependent variable's of for followsvariances. parameters, estimatorof of total. proportional to or parameters, and use (3) use either For considersample where a 3/4them, then examine the variances.theseunits. totals for populations superpopulation consider casesestimates parameters, of of the effects where a a of cases various model-based population data.and and the total.eitherWemodel-based of equalestimation of2totals populations impact10,000 units.units. on the The We and then populations design-basedstructure. Westructure. 
aaonconsider cases or estimation and andfor populations of 10,000 units. design-based superpopulationconsiderForvarious wherestrategiesequala to them, 3/4 for and theirthese of of 10,000strategies on the estimator strategies’variable. Various cases thento to 3/4the 2 2examine the impact of on thestrategies Foreither a sample them, equal examineand 2 for for populations these the units. 3/4 showshow 2the populationY Xfor on each superpopulation structure. of the total. superpopulation the independent total. various sample Figuresthenand to 2impact of these strategies ,of 10,000 equal and design-based dependent variable's total. For various sample to Figures 1 2 estimator power of Y for , the population strategies each design-based various sample design-based and estimators,variance For various them, 1 design-based variance thenvariance is to some Figures examine 2 the impact of of population XX designs, dependent variable's the sizes,variance of the evaluated. designs, variable's estimator are alternativeForto strategies samplesome of1estimationshow theand their variances. for for totals the totals andsizes,and estimatorin the case total. is somefor estimation of1 and andandshow theand their variances. Y each variances of alternativestrategiesselection Figures 1 and their2 totals population , Y X, Y X each dependent thattheir andestimators, this proportional proportionalestimationtotalsand showofthevariances.population ,for , for eacheach dependent are conceivable is proportional strategies for for to Figures estimation ofvariances. 
variable's estimatoris alternative proportional of some show designs,sizes,sizes, and the independentinclude: strategies forstrategiespopulation 2 Populations population Y-scales): estimators, (1)VariousgeneratedFigures 1 (note(note a difference in X Y totals 2 theirdifference in Y-scales): designs, power of and independentalternative strategies designs, sizes, and estimators, alternative strategies for generated population a 2.2: estimators, are estimatingindependent variable. power strategies generated Generated (note aadifference in Y-scales): values that variance variable. Various estimating values sizes,ofestimators, alternative are comparedstrategies population Populations in Y-scales): power of the thepowerofthatvariable. Various arecompared strategies power a designs, of ofthatvariance power variable. compared for Generated (note a difference ofpilot sample tothevariance population estimatingindependent of that varianceVarious include:(1) selectionWe createdpopulation (note a difference in the population valuesareconceivable preliminary structural generated two (note difference estimating values conceivable inthispowerinclude: effects generatedpopulation unstratified versions of Y-scales): values that data. strategies’ effects estimatingpopulationmakethatin power strategies’ parametergenerated2.2:Generated Populations in Y-scales): that in valuesdata. variance power are compared thatare inofcase include:thiscaseselection of selectionof 2.2: Generated Populations case compared estimating population intovariance powerAfter the of include: (1) are of (1) are compared of for simulated population data. Thea Theselection based Generated Populations thatthat Thissimulated(2) selectiondata.sixstrategies’ effects 2.2: Generated Populations are for conceivablethisthis case of (1) strategies’ effectson conceivableis are simulated population The The sample 2.2: paper sections. for for estimates, population data. 
preliminary structural parameter simulatedapilotorganized make The strategies’ effectsparameter for simulatedtotalstotalstoand theirmain are structural Hansen unstratified et Populations Populations Figure 1: 1: unstratified versions pilot for make sample The strategies’ on estimates ofsample to make on pilot either athe make population containsparameterpopulation described created Generated al. the the population HMT We inFiguretwoGenerated (1983, denoted population Populations a pilot estimatessimulated totals educatedpreliminary then WeWe created WeFigure 1:two GeneratedofPopulationsof the population a sample topilotsecond and their data. variances are then created twotwo created Generatedof to of and and variances aboutare effects Figure 1: Generated (2.1). 2 introduction,estimates, of(2) selectionvariancesparameterthenbased hereafter),4unstratified versions Populationsof chose on on sampleof results or andstructural aadescriptionsthenbased on We created two unstratified versions of the populaestimatesestimates, preliminarytheirof main sample of their structural are unstratified versions model versionsWe the population on estimates ofpreliminary and guessesmain are of estimates totals section theirof variances sample variances / it Hansen et al. 2 (1983, denoted HMT 33 / 4 3Figure1: Hansen et al. Populations / since in follows selection their basedthen on described4 Figure on (2) estimates totals main sample based on on evaluated. selection(2)totalsmain samplevariances are then tion described3/in Hansen 1: Generateddenoted HMT here-HMT evaluated. selection of of a (1983, denoted estimates, parameters, and (3) use of either a populations.described indescribed 3 in al. et (1983, denoted 22HMT a estimates, evaluated. either pilot results oreducated guesses about population et 4 our This(2) either pilot results orintosections. AfterAfter the or equalHansen3 /44 /al. for al. (1983, 2 HMT 2 units. superpopulation model and generated model-based evaluated. 
is organized into educated guesses about population toHansen et 2 evaluated.paper is organized six six sections. the described in hereafter), since (1983, denoted of(2.1). We chose evaluated. or educated guesses about population 3/4 and populations This or paper is organized into six it (2.1). We We (2.1). it follows model 10,000 equal follows either pilotdesign-based is organized into total. For After thesample after), since itfollows model (2.1).model chose We chose resultspaper estimator of the sections. various the either pilot resultseducated guesses about population hereafter), since hereafter), sincemodel (2.1). We chose This paper includes our use of either After it it follows chose six This paper is 2 organized intosections. model-based or The introduction, paper and (3) descriptions ofaAfter the thirdThisSection organized simulationeither adetails,hereafter), since follows model section setup parameters, of either six sections.or of 2 (3) descriptions or introduction, This (3) usecontainsintomodel-based strategies the Figures 1 and 2 and 2 for populations of 10,000 units. parameters, andparameters,isand containsalternative of our Afterfor toor equal2 to 3/4show the for populations, Yof for each units. (3) use 2 either a parameters, and Section contains use of six sections. our model-based model-based to populationspopulation X introduction,design-based estimator descriptions For of equal sample 3/4and2 for for populations of 10,000 units.10,000 designs,Section and estimators,a the total. For various to 3/4 sizes, introduction,modelof and contains descriptions our our sample equal 2 populations of 10,000 units. Section 2 and contains populations. ofequal introduction, discussedestimatorgenerated populations. our to3/4andand for 3/4 and 2 of 10,000 units. Figures Section 2in contains descriptions of our of of while results are introduction, the superpopulation Section the fourthvarious superpopulation values oftotal. 
variance power sample generated populations. generated population 2 show the population X , , Y for each in Y-scales): design-based estimator ofmodelthat2generatedsection. ConcluForgenerated are various varioustotal. design-based design-based and total. For the descriptions estimator of Figures 1 and 2 show the population X Y for Figures 1 population superpopulation model the and generatedsample compared 1 for andshow the and(note a differencefor each generated each estimating superpopulation sizes, simulation generatedwhile Figures superpopulationand our and considerationspopulations. designs, model strategies superpopulation modelestimators, details, theFigures 1 Section andincludes futureand setupsetupalternative strategies1 for 2 show the population X , Y ,for for each 3designs, model and estimators, sions,sizes,simulatedoursimulation and The alternativewhile Sectionlimitations, oursizes,alternative strategiespopulations. 33includes estimators, alternativedetails, arepopulations. and 2 2 show the population X Y each designs,Section 3 and population data. setup details, while estimators, designs, sizes, includes our simulation details,for in effects Section includes simulationthat variance power are compared setup strategies forare while for 3estimatingour simulationvariance power whilecompared generated population (note aadifference in Y-scales): strategies’ generated population (note Y-scales): in Y-scales): Section are includes in ofSection setup includes values Section of a a difference difference results discussedvariance that setup details, results are values discussed Section are Conclusions, while population (noteFigure 1: Generated Populations fifth section and references in the sixth section.details, generated population (notedifference Y-scales): estimating values of3thatthatin ourpower 4. are comparedgenerated population (note a difference ininin Y-scales): estimating estimating values ofsimulation4. 
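The model-based estimators can be illustrated for the intercept-and-slope working model whose mean and variance match (2.1). Under such a model, the best linear unbiased predictor (BLUP) of the total is the observed sample total plus weighted least squares predictions for the nonsample units, with weights $x_i^{-\gamma}$ (Valliant, Dorfman, and Royall 2000). The Python sketch below is illustrative only. The function name `blup_total` and its arguments are hypothetical, and γ is assumed to have been fixed or estimated beforehand.

```python
import numpy as np


def blup_total(y_s, x_s, x_r, gamma):
    """BLUP of the population total under E(y) = b0 + b1*x, Var(y) = sigma^2 * x^gamma.

    y_s, x_s: study and auxiliary values for the sample units.
    x_r: auxiliary values for the nonsample units (known for the whole population).
    """
    y_s, x_s, x_r = (np.asarray(a, dtype=float) for a in (y_s, x_s, x_r))
    # Weighted least squares with weights 1 / v(x) = x^(-gamma).
    w = x_s ** (-gamma)
    X = np.column_stack([np.ones_like(x_s), x_s])
    XtW = X.T * w  # X'W with W = diag(w), via broadcasting over columns
    beta = np.linalg.solve(XtW @ X, XtW @ y_s)
    # Observed sample total plus model predictions for every nonsample unit.
    return y_s.sum() + beta[0] * len(x_r) + beta[1] * x_r.sum()
```

Plugging in a value of γ far from the truth still leaves the BLUP model-unbiased when the mean structure is correct, but it inflates the prediction variance. This is one reason the strategies for estimating γ matter.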
Conclusions, variance power compared E i D: D: strateg D: strate i strategy alsooptimal, weightedx ). et et al.2000, can be alsodepends on (Valliant depends x i 2000, 4.2.1). alsosample, denoted xxi (Valliant al. al.sample sec. 4.2.1). By definition An depends on oni pp( (Valliantet 2000, sec.sec. 4.2.1). selecting t balanced does not An optimal, often optimal, weighted balanced sample can Also, B AnAn There is byweighted balanced sample can xi be ByBy definitio optimal, balanced sample can approximated weighteda probability-proportional-to-be be C.definition, a huge incentive to use optimal By definit reasonable samples and by bya aprobability-proportional-to- x xx C. C. Also, B estimators probability-proportional-toin the applications we consider selecting B a ed C. Also, Also, the approximated by a probability-proportional-to- i i approximated approximated sample, i population The first population has a relatively strong dependencedue todenoted pp( x ). high data collection costs. In a cost segregation selecting m not t selecting p sample, denoted pp( ).x ). incentive to use optimal selecting theth sample, denoted pp( huge sample, denotedoften x x ). t Y and an auxiliary between y and x , while the second one has a muchstudy, for example, experts may be needed to assign does reducin There is pp( a if Henry and valliant does not does ma does notnot reasonable consider There often a a huge incentive use use optimal There is often erpopulation with the weaker relationship. Note that these populations havesamples and is often hugethe applicationsto use 7,optimal estimates o There is estimators ainhuge incentive towe optimal incentive to acapital goods to depreciation classes (e.g., 5, 15, or reasonable The first population has a relatively strong dependence due to and estimators in incosts.be time-consuming and populations. 
3. Simulation Setup

This section describes the details of our simulation study, including working models, sample designs, simulation strategies, and the method of estimating γ.

3.1: Models

Using Valliant et al.'s (2000) notation, we based estimators of totals on the following two working models:

$$M(1,1 : x^{\gamma}) \tag{3.1}$$

$$M(x^{\gamma/2}, x : x^{\gamma}) \tag{3.2}$$
Model (3.1) is the correct working model, i.e., the one equivalent to model (2.1). Model (3.2) is associated with the following superpopulation structure:

$$E_M(y_i \mid x_i) = \beta_1 x_i^{\gamma/2} + \beta_2 x_i, \qquad \operatorname{Var}_M(y_i \mid x_i) = \sigma^2 x_i^{\gamma} \tag{3.3}$$

Working model (3.3) is called the minimal model (Valliant et al. 2000, p. 100) associated with the above conditional variance. If (2.1) were unknown but the intercept is small, working model (3.3) may be a reasonable starting place for determining a sample size.
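As a sketch of how a total could be estimated under the minimal model (3.3), the Python function below computes the usual linear-model prediction (BLUP-type) estimator: weighted least squares on the sample with weights $1/x_i^{\gamma}$, then the observed sample total plus model predictions for the nonsampled units. The function name and its arguments are hypothetical, and γ is treated as known.

```python
import numpy as np

def blup_total_minimal_model(x_s, y_s, x_r, gamma):
    """Prediction estimator of a total under E(y) = b1*x**(gamma/2) + b2*x,
    Var(y) = sigma**2 * x**gamma. x_s, y_s: sample; x_r: nonsampled x values."""
    def design(x):
        # Columns are the auxiliaries x**(gamma/2) and x (no intercept).
        return np.column_stack([x ** (gamma / 2.0), x])

    X_s = design(np.asarray(x_s, dtype=float))
    y_s = np.asarray(y_s, dtype=float)
    w = 1.0 / np.asarray(x_s, dtype=float) ** gamma   # inverse-variance weights 1/v_i
    XtW = X_s.T * w                                   # X' V^{-1}
    beta = np.linalg.solve(XtW @ X_s, XtW @ y_s)      # weighted least squares fit
    # Observed sample total plus predicted values for the nonsampled units.
    return y_s.sum() + (design(np.asarray(x_r, dtype=float)) @ beta).sum()
```

If y follows the mean in (3.3) exactly, the estimator reproduces the population total.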
When the variance of $y_i$ is proportional to $x_i^{\gamma}$ and $E_M(y_i \mid x_i)$ is a linear combination of auxiliaries, one of which is $x_i^{\gamma/2}$, two important optimality results hold:

(1) The selection probabilities that minimize the anticipated variance of the general regression (GREG) estimator (Särndal, Swensson, and Wretman 1992, sec. 12.2) are proportional to $x_i^{\gamma/2}$.

(2) The optimal model-based strategy will have a certain type of weighted balanced sample that also depends on $x_i^{\gamma/2}$ (Valliant et al. 2000, sec. 4.2.1). An optimal, weighted balanced sample can also be approximated by a probability-proportional-to-$x_i^{\gamma/2}$ sample, denoted $pp(x^{\gamma/2})$.
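The first optimality result says the selection probabilities should be proportional to $x_i^{\gamma/2}$. A minimal Python sketch of turning that into inclusion probabilities for a sample of size n follows. The certainty-unit handling (capping at 1 and reallocating the remaining sample among the other units) is a standard practical adjustment assumed here, not something described in this section, and the function name is hypothetical.

```python
import numpy as np

def optimal_inclusion_probs(x, n, gamma):
    """Inclusion probabilities proportional to x**(gamma/2) that sum to n,
    with any probability above 1 set to 1 (certainty) and the rest rescaled."""
    size = np.asarray(x, dtype=float) ** (gamma / 2.0)
    pi = np.ones_like(size)
    certain = np.zeros(size.shape, dtype=bool)
    while True:
        rest = ~certain
        n_left = n - certain.sum()                    # sample left after certainties
        pi[certain] = 1.0
        pi[rest] = n_left * size[rest] / size[rest].sum()
        newly = rest & (pi > 1.0)                     # units that now exceed 1
        if not newly.any():
            return pi
        certain |= newly
```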
There is often a huge incentive to use optimal, reasonable samples and estimators in the applications we consider due to high data collection costs. In a cost segregation study, for example, experts may be needed to assign capital goods to depreciation classes (e.g., 5-, 7-, 15-, or 39-year). Assessments can be time-consuming and expensive; so, the smaller the sample size that yields the desired precision, the better.
3.2: Sample Designs

For each population, we consider four without-replacement (wor) sample designs:

(1) srswor: simple random sampling.
(2) ppswor: the Hartley-Rao (1962) method with probabilities of selection proportional to a measure of size (MOS).
(3) ppstrat: strata are formed in the population by cumulating an MOS and forming strata with equal total size. An srswor of one unit is selected from each stratum.
(4) wtd bal: weighted balanced sampling. Ppswor samples using an MOS are selected that satisfy particular conditions on the population and sample moments of x.
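Design (2) uses the Hartley-Rao (1962) method, which randomly orders the population and then draws a systematic sample with probability proportional to the MOS. The Python sketch below illustrates that idea under the assumption that no unit has $n p_i \ge 1$ (such units would have to be taken with certainty first); the function name is hypothetical.

```python
import numpy as np

def hartley_rao_sample(mos, n, seed=None):
    """Randomized systematic pps sample of n distinct units (Hartley-Rao style)."""
    rng = np.random.default_rng(seed)
    mos = np.asarray(mos, dtype=float)
    pi = n * mos / mos.sum()                 # target inclusion probabilities
    if np.any(pi >= 1.0):
        raise ValueError("units with n*p_i >= 1 must be taken as certainties first")
    order = rng.permutation(mos.size)        # step 1: random ordering of the frame
    cum = np.cumsum(pi[order])               # cumulated probabilities, total = n
    points = rng.uniform() + np.arange(n)    # step 2: random start, unit spacing
    picks = np.searchsorted(cum, points, side="right")
    picks = np.minimum(picks, mos.size - 1)  # guard against rounding at the top end
    return np.sort(order[picks])
```

Because every $n p_i < 1$ and the selection points are one unit apart, no unit can be hit twice, so the sample is without replacement.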
For each of these designs, we drew 1,000 samples of 100 and 500 units. More specific details on these designs are given on pages 66-67 of Valliant et al. (2000). When the MOS is $x^{\gamma/2}$, the ppstrat design approximates optimal $pp(x^{\gamma/2})$ selection and wtd bal sampling. It is similar to "deep stratification" (e.g., Bryant et al. 1960; Cochran 1977, pp. 124-126; Sitter and Skinner 1994), which is used in accounting applications (Batcher and Liu 2002).
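The ppstrat design can be sketched in a few lines. This Python version assumes the frame is sorted by the MOS before cumulating (a common choice in deep stratification, but an assumption here) and assigns each unit to a stratum by the midpoint of its cumulated MOS. A single unit with a very large MOS can leave a stratum empty, in which case fewer than n units are returned. The function name is hypothetical.

```python
import numpy as np

def ppstrat_sample(mos, n, seed=None):
    """Sketch of ppstrat: n strata of roughly equal total MOS, one srswor unit each."""
    rng = np.random.default_rng(seed)
    mos = np.asarray(mos, dtype=float)
    order = np.argsort(mos)                  # assumption: sort the frame by MOS first
    m = mos[order]
    cum = np.cumsum(m)
    # Stratum = floor(n * midpoint of unit's cumulated MOS / total), capped at n-1,
    # so each stratum covers about 1/n of the total MOS.
    mid = cum - m / 2.0
    stratum = np.minimum((n * mid / cum[-1]).astype(int), n - 1)
    # Simple random selection of one unit within each nonempty stratum.
    picks = [rng.choice(order[stratum == h]) for h in np.unique(stratum)]
    return np.sort(np.array(picks))
```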
3.3: Strategies

The strategies we examined consisted of selecting a pilot study to get a preliminary estimate of γ, followed by a main sample, or only selecting a main sample. Both options were crossed with the possibility of rounding $\hat{\gamma}$ to the nearest one-half or not. Thus, our main comparisons concern four strategies:
A: draw a $pp(x)$ pilot of 50 units, estimate $\hat{\gamma}$, and select a main sample using $pp(x^{\hat{\gamma}/2})$, $ppstrat(x^{\hat{\gamma}/2})$, and wtd bal sampling.
B: draw srswor, $ppswor(x)$, $ppstrat(x)$, and wtd bal main samples, and estimate $\hat{\gamma}$ in each.
C: strategy A, with $\hat{\gamma}$ rounded to the nearest one-half.
D: strategy B, with $\hat{\gamma}$ rounded to the nearest one-half.
determining asample D: strategy B, roundingThus,main comparisons willreasonablestarting et for strategy willhave were reasonable starting the possibility reasonable (Valliant possibility of concern four reasonable a aia starting type weighted balance sampleD: strategynot.not.not. Thus, with nearest one-half. concernconcern four or or Thus, our tomainnearestcomparisons rounding also depends havecertain type of ofwith determiningthat on options possibility Thus, or Thus, B, orornot.rounding ˆ ourthe comparisons main comparisons comparisons Thus, main comparisons o depends on sample will willet al.the startingvarianceweighted balance that andx andorornot.rounding ˆourmainthe comparisons concern fourfour xisample will et Whencertain type ydetermining a sample to D: strategy not.Thus,Thus,ourtheour nearestcomparisons Thus, main comparisons concern four not. not. Thus, to main main one-half. concern four reasonable thea variance 4.2.1). isproportional sample reasonable starting place fory for y y sec.4.2.1). a4.2.1). sample o depends onalsosample havehave(Valliant etetofofyisofproportionalato xxisize.i D: or or not.B, Thus, ourˆ mainnearest one-half. concern four x (Valliant onxal. 2000,the etfor weightedproportionalx and or and strategies: ourˆ ourthemain nearest one-half. (ValliantWhenthevariance of al.,iof2000,isec.4.2.1).that x the sec. is balance ermining a sampledependsreasonable(Valliantplacey2000,proportionaltotosize.size.strategynot. roundingour ˆtotomaincomparisons concern four When certainsec. 4.2.1). determining one-half. concern alsoi size. Whenweightedvarianceyofdeterminingproportionaland andfourorB,not. is Thus, tomain thefor comparisons and four four Wheni When variance iiy is proportional i xi i i definition, there also dependsWhenstarting varianceial.iyyisiisproportionaltox Anxxto andorstrategy B, roundingour usedcomparisons concern four dependsor When2000,place ofof imainiiscomparisonstotoiandand orstrategies: no our onnot. 
thevariance the balanced proportional is proportional not. strategies: not. When theThus, our variance the (Valliant et al.i 2000,is can to bei i strategies: An dependsonWhenthevarianceetal.of 2000,sec.sec. 4.2.1). xByandiD:strategies:Thus, our main comparisons concern optimal,on onxii xvariance of ofal.is2000,proportionaltoconcernstrategies: Thus, srswor sample 4.2.1). strategies: A strategies: strategies: also also depends xthei(Valliantcan yet isyproportional to there no strategies: for sec. alsoand balancedi |is sample combination ofdefinition, xto and no srswor used for strategies A and (Valliant iibeal. Byproportional one is strategies: used on x xthe(Valliant combination of auxiliaries, of one ofstrategies: 2000, sec. auxiliaries, strategies: 4.2.1). and strategies: optimal, weighteddepends| E ythesample linear yofcombination of to xthereiis andsrsworA: draw astrategies A andofstrategies estimate , and When|a linear of variance is iproportional When isivariance linear can is definition, to n proportionalAn xioptimal,When)))i(is is)ilinear a combinationofofauxiliaries,ionexdefinition, there is draw pppppilotpilot strategies A and , and , and optimal, weightedEEM(((strategies:aalinearsamplebesampleauxiliaries, iBy ofof of A: draw a no srswor used for units, estimate balanced balanced combinationbe approximated abalanced Eweightedxi )linearcan ( By canauxiliaries,one of A: there no ( pilot ) pilot of 50 units, for and xiM|x avariance of combination auxiliaries, By definition,drawD correspond usedassuming estimate optimal,E(EEEy(i(yMxyby xyi)isisaprobability-proportional-to- x oneoneofof A:A:andA:aispp(( srswor )xofoffor units,estimateand, ,and weighted ))) aisaisbalanced sampleofofauxiliaries,one one B draw a app( ax)))pilottoof5050strategies A estimate 50 is combination ofauxiliaries,one of x pilotof units,units, 1 of 50 50units, estimate ,and y xx i An optimal,Mx(i|i)weighted alinearcombinationauxiliaries, can be y|( i|yyii|i|ixi i a linear combination of of 
auxiliaries, C. Also, draw drawaapp (x( pilotpilotofunits,units,estimateandand linearcombination can bei one A: A: drawpp (ppx() xxx))()pilot505050units, estimate ,,,and | is M approximated weighted linear EMM and A: A: a a pp )x pilot for units, estimate ( of1 of M M yM i An AnEEoptimal, i is )a linear combinationC. auxiliaries, oneof corresponddrawassuming srswor usedunits, estimate, there pp no proximated by Anoptimal,iiM| ixiy)) weightedbalancedxi sample auxiliaries,be ByofByof Bdrawdrawppa(ppx ) sampleused50forunits, ),ˆAppstratˆ ( and ),ˆ weighted combination sample canbe and Also, select there is pp srswor 1 pp pp estimate A be linear )combinationunits, ( linear balanced of Also, B one of of proximated approximatediywhich|isi,atwoaimportantxoptimality results and C.Also,definition,to isano no ( pilot pilotfor(strategies1 1for(andˆ ˆ), ( ˆ ), by one probability-proportional-to- i ofC. Also, xestimatehold:definition, aselectmain)xpilot ofusing50ˆstrategies and, xand x probability-proportional-toEM y A: draw is ,pp (balanced sample denoted oneA: A: draw there sample )used 50 units, estimate A by a optimal,xiby x aisprobability-proportional-to- canresultsBycorrespond toamain(issrsworusing 50 units,ˆ),estimate ppstrat, x on of auxiliaries, aaprobability-proportional-to-pilot sample, auxiliaries,Ddefinition, ppswor, correspondtotoassumingˆ (strategies for,and),ˆand ofM (( E by i x a xa probability-proportional-to-B hold:Dof and A:andselectassumingpilot of 50 forsamples, which ˆ A: a a pp main x wtd of B draw main a sampleusingpp D ppstrat, xsample using x ˆ xestimate x approximated |is ispp( x,twoimportant optimality results iihold: (1), (1) selectselectamainsamplesampleassumingppstrat (for(x(andˆ), ), which is is ixx,twoxi two important optimality iresults selecting(1)selectdrawmainsampleusingpp(pp(xˆx),),),ppstrat( ( ( x(x),x ), , ), ,important optimality results hold: Also, B select D main samples,to pp (bal( two important50 results hold:hold:ppstrat, selecta 
correspondusingto of x ˆ optimality C. (1) ppstrat i twoimportant optimality resultshold: C.(1) which whicha i,x,a,probability-proportional-to- xx ppswor,hold: select andmain sample usingusing),xxˆx ),, ppstratxforxxˆ ), isisisiis twoprobability-proportional-to- x hold:(1) Also,selectaamain sample andusingpp( (xx(ppˆ ˆ( ppstrat1 ˆ for x by twoimportant optimality results ppswor,(1) (1) theBandDa aDsample usingwhich assumingppstrat ), ), ppstrat ppstrat sample,which x xi, , i ). important optimality results hold:(1) Also,and a maincorrespond assuming ), ppstrat denoted pp and which two important optimality results C. select wtd bal sample usingpp which which B Dcorrespond toassuming ), 1 approximated i selecting probability-proportional-to- i (1) mple, denoted approximatedisisby byiitwoimportant optimality ( resultsppstrat(1)C.(1) selectaamaincorrespond )wtd pp’s,ppˆˆ(), xppstrat1( axˆˆ(), x ˆ ), ). which x , twoaimportant usingselecting),thehold:(1) ˆ ˆ mple, denoted pp( approximatedx is,two ,).important optimality resultsithexselecting),theandwtdwtdppstrat,ˆ)samples.ppbalxxsamples, ppstrat ), pp( xxwhich selectionaprobabilities that that minimizeanticipatedppstrat, and and balpopulation)using samples, which ). whichpp( i). probabilities optimality the hold:does selectwtd samples, which and main a wtd using ), ppstrat matchmainppstrat, ) samples. andandwtda wtd (ˆˆ ( using whichThe selectiontwo incentive minimizeˆthe matchhold: pp ( imality resultssample,which selectionprobabilitiesthat minimizenottheanticipated ˆ populationbalbal(sample(willsamples.samples,which x hold: (1)Theselectionxxmain probabilities does minimize itheselecting theppswor,balbal(xbalˆandx ˆ wtdbal but willwhich sample, denoted oftenselection samplethat minimizenotresultsanticipatedthewtdwtdour( (sample)and wtdbal(bal samples,be( pp( denoted ii probabilities optimality xoptimal anticipated andppswor,ppstrat,samples. 
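Result (1) can be made concrete with a small calculation. The sketch below assumes that the model standard deviations sd_i are already known. For example, sd_i = x_i^(γ/2) if var_M(y_i | x_i) is proportional to x_i^γ, which is an assumed form. The inclusion probabilities are n·sd_i / Σ sd_j, and any unit whose value would reach 1 is taken with certainty. The function name and the certainty-unit reallocation are illustrative choices, not the paper's procedure.

```python
import math

def optimal_inclusion_probs(sd, n):
    """Inclusion probabilities proportional to the model standard deviation sd_i,
    scaled to sum to the sample size n. Units whose probability would reach 1 are
    taken with certainty, and the rest of the sample is reallocated among the
    remaining units. Assumes every sd_i > 0."""
    N = len(sd)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    pi = [0.0] * N
    certain = set()
    while True:
        rest = [i for i in range(N) if i not in certain]
        n_rest = n - len(certain)
        total = sum(sd[i] for i in rest)
        newly_certain = set()
        for i in rest:
            pi[i] = n_rest * sd[i] / total
            if pi[i] >= 1.0:
                newly_certain.add(i)
        if not newly_certain:
            break
        certain |= newly_certain
    for i in certain:
        pi[i] = 1.0
    return pi

# Example with an ASSUMED variance power gamma = 1, so sd_i = sqrt(x_i).
x = [1.0, 4.0, 9.0, 16.0, 100.0]
pi = optimal_inclusion_probs([math.sqrt(xi) for xi in x], n=2)
```

In this example the largest unit becomes a certainty. The second unit of sample is then spread over the other four units in proportion to sqrt(x_i).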
There is often a huge incentive to use optimal samples and estimators in many applications due to high data collection costs. In a cost segregation study, for example, experts may be needed to assign capital goods to depreciation classes (e.g., 5-, 7-, 15-, or 39-year). Assessments can be time-consuming and expensive; so, the smaller the sample size that yields the desired precision, the better.
3.2: Sample Designs
For each population, we consider four without-replacement (wor) sample designs:
(1) srswor: simple random sampling without replacement.
(2) ppswor: the Hartley-Rao (1962) method, with selection probabilities proportional to a measure of size (MOS).
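The Hartley-Rao (1962) ppswor design can be sketched as randomized systematic sampling. The procedure is to put the units in random order, cumulate the scaled sizes n·MOS_i / Σ MOS, and select the units hit by n points spaced one apart from a random start. This is a minimal illustration only. The function name is hypothetical, the MOS values are arbitrary, and the sketch assumes n·p_i < 1 for every unit, so certainty units must be removed beforehand.

```python
import random

def hartley_rao_sample(mos, n, rng=None):
    """Randomized systematic pps sampling without replacement (Hartley-Rao style).
    Unit i is included with probability n * mos[i] / sum(mos)."""
    rng = rng or random.Random()
    total = float(sum(mos))
    scaled = [n * m / total for m in mos]  # target inclusion probabilities
    if any(s >= 1.0 for s in scaled):
        raise ValueError("some n*p_i >= 1; remove certainty units first")
    order = list(range(len(mos)))
    rng.shuffle(order)                     # random ordering of the population
    u = rng.random()                       # random start in [0, 1)
    points = [u + k for k in range(n)]     # n selection points spaced one apart
    sample, cum, j = [], 0.0, 0
    for i in order:
        cum += scaled[i]
        # Each interval is shorter than 1, so it can contain at most one point.
        while j < n and points[j] < cum:
            sample.append(i)
            j += 1
    if j < n:
        # Guard against floating-point round-off leaving the final sum just below n.
        sample.append(order[-1])
    return sample

rng = random.Random(1)
s = hartley_rao_sample([1, 2, 3, 4, 5], 2, rng)
```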
3.4: Estimation Strategy
The choice of selecting a pilot sample or not was crossed with the possibility of rounding γ̂ to the nearest one-half. Thus, our main comparisons concern four strategies:
A: draw a pp(x) pilot of 50 units, estimate γ, and select main samples using pp(x^γ̂), ppstrat(x^γ̂), and wtd bal(x^γ̂).
B: draw srswor, ppswor(x), ppstrat(x), and wtd bal(x) main samples only, and estimate γ in each.
C: strategy A, rounding γ̂ to the nearest one-half.
D: strategy B, rounding γ̂ to the nearest one-half.
By definition, there is no srswor used for strategies A and C. Also, B and D correspond to assuming γ = 1 when selecting the ppswor, ppstrat, and wtd bal samples. This will often not match our populations' γ's, but it is a reasonable choice for sampling in many applications where γ is not known in advance. We consider the rounding in C and D to see if reducing variability in γ̂ leads to improved estimates of totals and variances.
To estimate γ, following Roshwalb (1987), we iteratively fit a given working model and regressed the log of the squared residuals on log(x_i). We repeated the process until γ̂ stabilized.
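The iterative estimation of γ and the rounding used in strategies C and D can be sketched as follows. This is a minimal version of the loop described above, and it rests on several assumptions. The working model is taken to be y_i = b0 + b1·x_i + e_i with var(e_i) = σ²·x_i^γ, which is an assumed parameterization that may differ from the paper's. The starting value of γ, the tolerance, the floor on squared residuals, the tie rule in the rounding, and all function names are hypothetical.

```python
import math
import random

def _wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return yb - b1 * xb, b1

def estimate_gamma(x, y, gamma0=1.0, tol=1e-6, max_iter=50):
    """Iterate: WLS fit with weights 1/x**gamma, then regress log squared
    residuals on log(x); the slope is the updated estimate of gamma."""
    gamma = gamma0
    logx = [math.log(xi) for xi in x]
    ones = [1.0] * len(x)
    for _ in range(max_iter):
        b0, b1 = _wls_line(x, y, [xi ** -gamma for xi in x])
        # Floor the squared residuals so log() never sees zero (assumed safeguard).
        logr2 = [math.log(max((yi - b0 - b1 * xi) ** 2, 1e-12)) for xi, yi in zip(x, y)]
        _, new_gamma = _wls_line(logx, logr2, ones)
        if abs(new_gamma - gamma) < tol:  # estimate has stabilized
            return new_gamma
        gamma = new_gamma
    return gamma

def round_to_half(g):
    """Round to the nearest one-half; ties round up (an assumed tie rule)."""
    return math.floor(2.0 * g + 0.5) / 2.0

# Simulated population with true gamma = 1.5: sd = 3 * x**0.75, so var = 9 * x**1.5.
rng = random.Random(2006)
x = [rng.uniform(1.0, 100.0) for _ in range(2000)]
y = [10.0 + 2.0 * xi + 3.0 * xi ** 0.75 * rng.gauss(0.0, 1.0) for xi in x]
g_hat = estimate_gamma(x, y)
g_rounded = round_to_half(g_hat)
```

The constant E[log χ²₁] offset only shifts the intercept of the log-residual regression, so the slope remains a reasonable estimate of γ.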
ˆ ’sleads toto improved example, to measure For (e.g., 5, assign allreducing variabilitythenˆthetheˆ ’sforcedleads improved strategies,For all ifreducingif ifˆvariability it it was ˆ ’sleads one, improved then probabilities (2)(2) ppswor:forthe goodsvariability maymethod’s neededassign 15, or strategies, of totals variances. ˆ forced toto improved proportional to a to (1962) classesneeded 7, 15,improved estimates variability in ˆ ppswor:study, example, aexperts classesclasses5,to7,assign if reducing process0until the the thethe example, measure methodbewith capital Hartley-Raodepreciation (e.g., 5, 7, 15,or depreciation to classes 15, For Hartley-Rao (1962) classes(e.g., probabilitiesof probabilitiesgoods totodepreciation in classes(e.g.,leads7,15,15,orifestimates of it to ofand ,variances. ’s leads one, to of selection(MOS). selectionexperts maytobe measure(e.g.,to7,assignestimates of totalsˆ and and variances. leads to proportionaltodepreciation classes(e.g., 5,5,7,which or strategies, totals pp, then was todepreciation be measure capitalforgoodsto depreciation classes aneeded5,to 15,15,oror corresponds variability inand sampling. Rejected goods goodsdepreciation may ˆwith 5, 7, or estimates of totalsˆand0variances. reducing reducing in) variances. to for example, experts the (e.g., (2) study,if capital to depreciation ppswor: goods 5, 7, study, of selection proportional capital or ay be needed to assigncapitalreducing to proportional a capital goods estimates totalsx of sizecapitalgoods probabilities of (e.g., capital ( in Section 5 Generated nal to ixstrategies: and B B units, estimate , and M and i strategies: al and strategies: and select main ˆ ) samples. pp ( Bˆ A: draw a( pp ( x ) ˆpilot of 500 using 0 nine totals. totals. model mod wtd x y to ione sample 0 0 wtd nine ze of xstrategies: strategies:balpp (x( asample of 50 units, xpp ( ppstratand ˆ ),( 12 totals. wtd 1 0 (3. imality i hold: strategies: a ( ˆ samples. 
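The ppswor and ppstrat designs can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes the Hartley-Rao (1962) method is carried out as systematic pps sampling from a randomly ordered list, with inclusion probabilities $\pi_i = n\,\mathrm{MOS}_i / \sum_U \mathrm{MOS}$ that are all below one. For ppstrat it hypothetically sorts units by MOS, cuts the cumulated MOS into strata with equal totals, and draws an srswor of the same size in each stratum. All function names are illustrative.

```python
import random


def inclusion_probs(mos, n):
    """pi_i = n * MOS_i / sum(MOS); the sketch requires every pi_i < 1."""
    total = sum(mos)
    pi = [n * m / total for m in mos]
    if max(pi) >= 1.0:
        raise ValueError("some pi_i >= 1; such units would need to be taken with certainty")
    return pi


def hartley_rao_sample(mos, n, rng):
    """Systematic pps sample of size n from a randomly ordered frame (no replacement)."""
    pi = inclusion_probs(mos, n)
    order = list(range(len(mos)))
    rng.shuffle(order)                                # random ordering of the frame
    start = rng.random()                              # random start in [0, 1)
    points = [start + k for k in range(n)]            # n selection points, spacing 1
    sample, cum, j = [], 0.0, 0
    for pos, i in enumerate(order):
        # unit i owns the interval [previous cum, cum) of length pi_i;
        # the last cumulative value is set to exactly n to absorb float drift
        cum = float(n) if pos == len(order) - 1 else cum + pi[i]
        while j < n and points[j] < cum:
            sample.append(i)
            j += 1
    return sample


def ppstrat_sample(mos, n_strata, per_stratum, rng):
    """Hypothetical ppstrat: equal-MOS strata from the cumulated MOS, srswor within each."""
    order = sorted(range(len(mos)), key=lambda i: mos[i])
    total = sum(mos)
    strata = [[] for _ in range(n_strata)]
    cum = 0.0
    for i in order:
        mid = cum + mos[i] / 2.0                      # midpoint of unit i's MOS slice
        cum += mos[i]
        h = min(int(mid / total * n_strata), n_strata - 1)
        strata[h].append(i)
    sample = []
    for units in strata:
        sample.extend(rng.sample(units, per_stratum))  # srswor within the stratum
    return sample, strata
```

Because each unit's interval has length $\pi_i < 1$ and the selection points are exactly one apart, no unit can be hit twice, which is what makes the systematic draw a without-replacement design.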
We consider four strategies:
A: draw a pilot sample of 50 units, estimate $\gamma$ (giving $\hat\gamma_0$), and select ppswor, ppstrat, and wtd bal main samples with MOS $x_i^{\hat\gamma_0/2}$.
B: draw srswor, ppswor($x$), ppstrat($x$), and wtd bal($x$) main samples only, and estimate $\gamma$ in each.
C: strategy A, rounding $\hat\gamma$ to the nearest one-half.
D: strategy B, rounding $\hat\gamma$ to the nearest one-half.
Main samples have $n = 100$ or $n = 500$ units. Strategies B and D correspond to assuming that pp($x_i$) sampling is a reasonable advance choice for sampling in many applications, although it does not match our populations. We consider the rounding in C and D to see if reducing variability in the $\hat\gamma$'s leads to improved estimates of totals and variances.
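Strategies C and D differ from A and B only in rounding $\hat\gamma$ to the nearest one-half. A minimal sketch follows. Rounding exact ties upward is an assumption (the text does not say how ties are handled), and the MOS $x_i^{\hat\gamma/2}$ is assumed here, so that $\hat\gamma = 2$ gives pp($x$) and $\hat\gamma = 0$ gives equal probabilities. Note that small positive values, and small negative ones, round to zero.

```python
import math


def round_to_half(gamma_hat):
    """Round gamma_hat to the nearest one-half; exact ties go up (an assumption)."""
    return math.floor(2.0 * gamma_hat + 0.5) / 2.0


def mos_from_gamma(x, gamma_hat):
    """Assumed MOS x_i ** (gamma_hat / 2): 2 gives pp(x), 0 gives equal probabilities."""
    return [xi ** (gamma_hat / 2.0) for xi in x]


# Strategy C/D style: round first, then build the MOS from the rounded value
print(round_to_half(0.74), round_to_half(1.25), round_to_half(0.2))  # 0.5 1.5 0.0
```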
3.4: Estimation of $\gamma$
To estimate $\gamma$, following Roshwalb (1987), we iteratively fit a given working model and regressed the log of the squared residuals on $\log(x_i)$ as follows:
$$\log(r_i^2) = \alpha + \gamma \log(x_i) + \varepsilon_i,$$
where $r_i$ is the residual for sample unit $i$ from the working model and $\alpha$ is an intercept, and repeated the process until $\hat\gamma$ stabilized. For all strategies, if $\hat\gamma < 0$, then it was forced to one, which corresponds to pp($x_i^{1/2}$) sampling. Other alternatives included forcing $\hat\gamma = 0$, implying homoscedasticity, or dropping these samples, both of which are unrealistic. Table 1 shows the number of these occurrences for each strategy (there were fewer than 5 cases for the 3/4 population).
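The iterative procedure can be sketched as below. The working model here is a hypothetical straight line $y_i = \beta_0 + \beta_1 x_i$ with variance proportional to $x_i^{\gamma}$, fitted by weighted least squares; the paper's working models (3.1) and (3.2), its starting value, and its convergence rule are not reproduced, so `gamma0`, `tol`, and `max_iter` are assumptions. Each pass refits the model with weights $x_i^{-\hat\gamma}$, regresses $\log(r_i^2)$ on $\log(x_i)$ by ordinary least squares, and takes the slope as the new $\hat\gamma$.

```python
import math


def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x with weights w."""
    s = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b1 = (s * sxy - sx * sy) / (s * sxx - sx * sx)
    b0 = (sy - b1 * sx) / s
    return b0, b1


def estimate_gamma(x, y, gamma0=1.0, tol=1e-6, max_iter=50):
    """Iterate: WLS fit with weights x**(-gamma), then OLS of log(r^2) on log(x)."""
    log_x = [math.log(xi) for xi in x]
    ones = [1.0] * len(x)
    gamma = gamma0
    for _ in range(max_iter):
        b0, b1 = wls_line(x, y, [xi ** (-gamma) for xi in x])
        # squared residuals, floored so that log() is defined for exact fits
        r2 = [max((yi - b0 - b1 * xi) ** 2, 1e-300) for xi, yi in zip(x, y)]
        _, new_gamma = wls_line(log_x, [math.log(v) for v in r2], ones)  # OLS slope
        if abs(new_gamma - gamma) < tol:
            return new_gamma
        gamma = new_gamma
    return gamma


def apply_negative_rule(gamma_hat):
    """Rule stated in the text: a negative gamma_hat is forced to one."""
    return 1.0 if gamma_hat < 0 else gamma_hat
```

Calling `apply_negative_rule(estimate_gamma(x, y))` mirrors the order described in the text: iterate to a stable $\hat\gamma$ first, then replace a negative value by one.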
3.5: Estimation of Totals
We consider three kinds of estimators for totals: the Horvitz-Thompson (HT) estimator, best linear unbiased predictors (BLUP), and general regression (GREG) estimators. The HT estimator is given by
$$\hat T_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i},$$
where $\pi_i$ is the probability of selection for unit $i$. The general form of the BLUP estimator is
$$\hat T = \mathbf{1}_s'\mathbf{y}_s + \mathbf{1}_r'\mathbf{X}_r\hat{\boldsymbol\beta}, \qquad \hat{\boldsymbol\beta} = \left(\mathbf{X}_s'\mathbf{V}_{ss}^{-1}\mathbf{X}_s\right)^{-1}\mathbf{X}_s'\mathbf{V}_{ss}^{-1}\mathbf{y}_s,$$
where $\mathbf{X}_s$ is an $n \times 2$ matrix of auxiliaries for the sample units, $\mathbf{X}_r$ is the corresponding matrix for the nonsample units, $\mathbf{V}_{ss} = \mathrm{diag}(x_i^{\gamma})$, and $\mathbf{y}_s$ is the $n$-vector of sample $y$'s.
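A small pure-Python sketch of the HT estimator and of a BLUP follows. It assumes, hypothetically, a model with rows $(1, x_i)$ and variance proportional to $x_i^{\gamma}$. Under that model the predicted nonsample total is $(N-n)\hat\beta_0 + \hat\beta_1\left(T_x - \sum_{i\in s} x_i\right)$, so only the population size $N$ and the population total $T_x$ of $x$ are needed for the nonsample units.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x with weights w."""
    s = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b1 = (s * sxy - sx * sy) / (s * sxx - sx * sx)
    b0 = (sy - b1 * sx) / s
    return b0, b1


def ht_total(y_s, pi_s):
    """Horvitz-Thompson estimator: sum of y_i / pi_i over the sample."""
    return sum(yi / p for yi, p in zip(y_s, pi_s))


def blup_total(x_s, y_s, N, Tx, gamma):
    """BLUP of the total under the hypothetical model y = b0 + b1 x, var ~ x**gamma."""
    b0, b1 = wls_line(x_s, y_s, [xi ** (-gamma) for xi in x_s])
    n = len(y_s)
    # observed sample total + predicted total for the N - n nonsample units
    return sum(y_s) + b0 * (N - n) + b1 * (Tx - sum(x_s))
```

When $y$ is exactly linear in $x$, the BLUP recovers the population total from any sample with at least two distinct $x$ values, which is a convenient sanity check.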
The GREG estimator (Särndal, Swensson, and Wretman 1992) is
$$\hat T_{GR} = \sum_{i \in s} \frac{g_i y_i}{\pi_i},$$
where the "g-weight" for unit $i$ is given by
$$g_i = 1 + \left(\mathbf{T}_x - \hat{\mathbf{T}}_{x\pi}\right)'\left(\sum_{j \in s}\frac{\mathbf{x}_j\mathbf{x}_j'}{v_j\pi_j}\right)^{-1}\frac{\mathbf{x}_i}{v_i},$$
with $\mathbf{T}_x$ the population total of the auxiliaries, $\hat{\mathbf{T}}_{x\pi} = \sum_{i \in s}\mathbf{x}_i/\pi_i$ its HT estimator, and $v_i = x_i^{\gamma}$.
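The GREG g-weights, $g_i = 1 + (\mathbf{T}_x - \hat{\mathbf{T}}_{x\pi})'\mathbf{M}^{-1}\mathbf{x}_i/v_i$ with $\mathbf{M} = \sum_{j\in s}\mathbf{x}_j\mathbf{x}_j'/(v_j\pi_j)$, can be computed in closed form when there are two auxiliaries. The sketch assumes, hypothetically, $\mathbf{x}_i = (1, x_i)'$, so that $\mathbf{T}_x = (N, T_x)'$, and $v_i = x_i^{\gamma}$. A useful check is the calibration property $\sum_{i\in s} g_i\mathbf{x}_i/\pi_i = \mathbf{T}_x$.

```python
def greg_weights(x_s, pi_s, N, Tx, gamma):
    """g-weights for auxiliary vector (1, x_i) and v_i = x_i ** gamma."""
    # HT estimates of the auxiliary totals (N, Tx)
    hn = sum(1.0 / p for p in pi_s)
    hx = sum(xi / p for xi, p in zip(x_s, pi_s))
    # M = sum_j x_j x_j' / (v_j pi_j) = [[a, b], [b, c]]
    a = b = c = 0.0
    for xi, p in zip(x_s, pi_s):
        q = 1.0 / (xi ** gamma * p)
        a += q
        b += q * xi
        c += q * xi * xi
    det = a * c - b * b
    dn, dx = N - hn, Tx - hx
    # lam = M^{-1} (T_x - T_hat_x_pi), using the closed-form 2x2 inverse
    lam0 = (c * dn - b * dx) / det
    lam1 = (a * dx - b * dn) / det
    return [1.0 + (lam0 + lam1 * xi) / xi ** gamma for xi in x_s]


def greg_total(y_s, pi_s, g):
    """GREG estimator: sum of g_i * y_i / pi_i over the sample."""
    return sum(gi * yi / p for gi, yi, p in zip(g, y_s, pi_s))
```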
3.6: Variance Estimation
For the HT estimator, the variance is approximated by the with-replacement variance estimator with the finite population correction adjustment $1 - n/N$:
$$v(\hat T_{HT}) = \left(1 - \frac{n}{N}\right)\frac{n}{n-1}\sum_{i \in s}\left(\frac{y_i}{\pi_i} - \frac{\hat T_{HT}}{n}\right)^2.$$
The bias of this approximation is negligible (Wolter 1985).
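A sketch of the HT variance approximation, assuming the with-replacement form with the $1 - n/N$ adjustment, $v(\hat T_{HT}) = (1 - n/N)\,\frac{n}{n-1}\sum_{i\in s}(y_i/\pi_i - \hat T_{HT}/n)^2$. The function name is illustrative. The estimate is zero when $y_i$ is exactly proportional to $\pi_i$, which is the situation a well-chosen MOS aims to approach.

```python
def ht_variance_wr(y_s, pi_s, N):
    """Assumed HT variance approximation: (1 - n/N) * n/(n-1) * sum (y_i/pi_i - T_HT/n)^2."""
    n = len(y_s)
    t_ht = sum(yi / p for yi, p in zip(y_s, pi_s))
    ss = sum((yi / p - t_ht / n) ** 2 for yi, p in zip(y_s, pi_s))
    return (1.0 - n / N) * n / (n - 1) * ss
```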
[Table 1: Number of ... (caption and cell values are not recoverable from the source extraction)]
By definition, there is no srswor used for strategies A and C. The numbers in parentheses are the numbers of negative $\hat\gamma$'s; for C and D, they also include cases where small positive $\hat\gamma$'s were rounded down to zero. Strategies B and D produced fewer negative $\hat\gamma$'s than A and C, since B and D use 100 and 500 units, as opposed to pilot samples of size 50 in A and C. Also, depending on the strategy, there were at least three times as many negative $\hat\gamma$'s using model (3.2) versus model (3.1).
For all strategies, if $\hat\gamma < 0$, then it was forced to 0, implying homoscedasticity. Alternatives, such as dropping these samples, are unrealistic. Table 1 shows the number of these occurrences for the 3/4 population (there were fewer than 5 cases for each strategy for the 2 population). Also, for all strategies, if $\hat\gamma$ exceeded an upper limit, then it was forced to that limit to avoid unreasonably large $\hat\gamma$'s. Table 2 contains the number of these occurrences for the 3/4 population (there were none of these cases for the 2 population).
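The two adjustments to $\hat\gamma$ can be written as a clip, and the counts they produce are the kind tabulated in Tables 1 and 2. This is a minimal sketch. The cap `gamma_max` is a hypothetical parameter, because this section does not state the upper limit used in the study. The rounding applied in strategies C and D is also not modeled here.

```python
import numpy as np


def adjust_gamma(gamma_hats, gamma_max):
    """Force negative gamma estimates to 0 and cap large ones at gamma_max.

    gamma_max is a hypothetical upper limit. Returns the adjusted estimates
    and the number of negative and of large estimates that were adjusted.
    """
    g = np.asarray(gamma_hats, dtype=float)
    n_negative = int(np.sum(g < 0.0))     # forced to 0 (homoscedasticity)
    n_large = int(np.sum(g > gamma_max))  # forced down to gamma_max
    adjusted = np.clip(g, 0.0, gamma_max)
    return adjusted, n_negative, n_large


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical: noisy estimates of gamma = 3/4 from many small pilot samples.
    gamma_hats = rng.normal(loc=0.75, scale=0.8, size=1000)
    adjusted, n_neg, n_large = adjust_gamma(gamma_hats, gamma_max=3.0)
    print(n_neg, n_large, adjusted.min() >= 0.0, adjusted.max() <= 3.0)
```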
Table 1: Number of Times $\hat\gamma$ Was Negative, 3/4 Population
[Table values not recoverable from the extracted text.]
Note: For C and D, numbers in parentheses include cases where positive $\hat\gamma$'s were rounded down to zero.

Table 2: Number of Times of Large $\hat\gamma$'s, 3/4 Population
[Table values not recoverable from the extracted text.]
In Table 1, Strategies B and D produced fewer negative $\hat\gamma$'s than A and C; rounding does not offer an improvement. In Table 2, Strategies B and D produced fewer large $\hat\gamma$'s than A and C. Rounding in C and D also produced fewer large $\hat\gamma$'s. There were at least twice as many large $\hat\gamma$'s when using model (3.1) as when using model (3.2).
3.5: Estimation of Totals

We consider three kinds of estimators for totals: the Horvitz-Thompson (HT) estimator, best linear unbiased predictors (BLUP), and general regression estimators (GREG). The HT estimator is given by

$$\hat T_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i},$$

where $\pi_i$ is the probability of selection for unit $i$.
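The HT formula can be computed directly. The sketch below also includes a helper for pp(MOS) inclusion probabilities, $\pi_i = n m_i / \sum_U m_j$. That helper is an illustrative assumption: it presumes no certainty units, and the function names are hypothetical.

```python
import numpy as np


def pp_inclusion_probs(mos, n):
    """Inclusion probabilities pi_i = n * m_i / sum(m) for a pp(MOS) design.

    Assumes no unit is large enough to have pi_i > 1 (no certainty units).
    """
    mos = np.asarray(mos, dtype=float)
    pi = n * mos / mos.sum()
    if np.any(pi > 1.0):
        raise ValueError("some pi_i exceed 1; certainty units need separate handling")
    return pi


def ht_total(y, pi):
    """Horvitz-Thompson estimator of a total: sum over the sample of y_i / pi_i."""
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    if np.any(pi <= 0.0) or np.any(pi > 1.0):
        raise ValueError("selection probabilities must lie in (0, 1]")
    return float(np.sum(y / pi))


if __name__ == "__main__":
    pi_pop = pp_inclusion_probs([1.0, 1.0, 2.0, 4.0], n=2)  # [0.25, 0.25, 0.5, 1.0]
    sample = [2, 3]                                          # hypothetical sample indices
    print(ht_total([6.0, 20.0], pi_pop[sample]))             # prints 32.0
```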
The general form of the BLUP estimator is

$$\hat T = \sum_{i \in s} y_i + \sum_{i \notin s} \hat y_i,$$

where $\hat y_i$ is the prediction for $y_i$ using the working model, and $i \notin s$ denotes the set of units in the population that are not in the sample. For example, following Valliant et al.'s (2000) notation, the BLUP using the correct model is

$$\hat T(1,1:x^{\gamma}) = \sum_{i \in s} y_i + \sum_{i \notin s} \left(\hat\beta_0 + \hat\beta_1 x_i\right),$$

where $\hat{\boldsymbol\beta} = (\hat\beta_0, \hat\beta_1)^T = \left(X_s^T V_{ss}^{-1} X_s\right)^{-1} X_s^T V_{ss}^{-1} y_s$ is estimated from the sample units ($i \in s$), $X_s$ is an $n \times 2$ matrix with rows $(1, x_i)$, $V_{ss} = \mathrm{diag}(x_i^{\gamma})$, and $y_s$ is the $n$-vector of sample data.
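A minimal sketch of the BLUP follows. It computes $\hat{\boldsymbol\beta}$ by weighted least squares with $V_{ss} = \mathrm{diag}(v_i)$ and then adds the predictions for the nonsample units. The caller supplies the design matrices. Rows $(1, x_i)$ give the $M(1,1:x^{\gamma})$ case; any other working model, such as (3.2), is assumed to be expressible the same way through its own columns.

```python
import numpy as np


def blup_total(X_s, y_s, v_s, X_r):
    """BLUP of a population total under y_i = x_i' beta + e_i, Var(e_i) proportional to v_i.

    X_s : (n, p) design matrix for sample units (rows (1, x_i) for M(1,1:x^gamma))
    y_s : (n,)  sample responses
    v_s : (n,)  variance factors, e.g. x_i**gamma
    X_r : (N - n, p) design matrix for nonsample units
    Returns the estimated total and beta_hat.
    """
    X_s = np.asarray(X_s, dtype=float)
    y_s = np.asarray(y_s, dtype=float)
    w = 1.0 / np.asarray(v_s, dtype=float)      # diagonal of V_ss^{-1}
    A = X_s.T @ (X_s * w[:, None])              # X_s' V_ss^{-1} X_s
    b = X_s.T @ (w * y_s)                       # X_s' V_ss^{-1} y_s
    beta_hat = np.linalg.solve(A, b)
    y_pred_r = np.asarray(X_r, dtype=float) @ beta_hat  # predictions for i not in s
    return float(y_s.sum() + y_pred_r.sum()), beta_hat


if __name__ == "__main__":
    gamma = 0.75
    x_s = np.array([1.0, 2.0, 3.0, 4.0])
    x_r = np.array([5.0, 6.0])
    y_s = 2.0 + 3.0 * x_s                       # exactly linear, so beta_hat = (2, 3)
    X_s = np.column_stack([np.ones_like(x_s), x_s])
    X_r = np.column_stack([np.ones_like(x_r), x_r])
    total, beta_hat = blup_total(X_s, y_s, x_s ** gamma, X_r)
    print(round(total, 6), np.round(beta_hat, 6))
```

The toy data lie exactly on a line, so the BLUP recovers the full population total (75) whatever value of $\gamma$ is used in $V_{ss}$.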
The general form of the GREG estimator is

$$\hat T_{GR} = \sum_{i \in s} g_i \frac{y_i}{\pi_i},$$

where $g_i$ is the "g-weight" for unit $i$ (Särndal et al., 1992). These estimators, combined with the two working models and with either the estimated or the true value of $\gamma$, lead to nine estimates of totals. Note that the true $\gamma$ is not available in any real situation; estimators computed using the true $\gamma$ serve as a comparison standard for the other choices.
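The g-weights can be sketched using the standard linear form from the model-assisted literature:

$$g_i = 1 + (t_x - \hat t_{x,HT})^T \Big(\sum_{s} x_i x_i^T / (\pi_i v_i)\Big)^{-1} x_i / v_i.$$

Whether the paper uses exactly this variant is an assumption here. A useful check is the calibration property: the weighted sample totals of the auxiliary columns reproduce the known population totals `t_x`.

```python
import numpy as np


def greg_total(X_s, y_s, pi_s, v_s, t_x):
    """GREG estimator sum(g_i * y_i / pi_i) with linear g-weights.

    g_i = 1 + (t_x - t_x_HT)' T^{-1} x_i / v_i, where
    T = sum_s x_i x_i' / (pi_i v_i) and t_x_HT = sum_s x_i / pi_i.
    t_x holds the known population totals of the columns of X, e.g. (N, sum of x).
    """
    X_s = np.asarray(X_s, dtype=float)
    y_s = np.asarray(y_s, dtype=float)
    d = 1.0 / np.asarray(pi_s, dtype=float)   # design weights 1/pi_i
    v = np.asarray(v_s, dtype=float)
    t_x_ht = X_s.T @ d                         # HT estimates of the x totals
    T = X_s.T @ (X_s * (d / v)[:, None])
    lam = np.linalg.solve(T, np.asarray(t_x, dtype=float) - t_x_ht)
    g = 1.0 + (X_s @ lam) / v                  # g-weights
    return float(np.sum(g * d * y_s)), g


if __name__ == "__main__":
    x = np.array([1.0, 2.0, 3.0, 4.0])
    X = np.column_stack([np.ones_like(x), x])
    pi = np.array([0.2, 0.4, 0.5, 0.8])
    t_x = np.array([10.0, 40.0])               # hypothetical N and total of x
    total, g = greg_total(X, 2.0 + 3.0 * x, pi, x ** 0.75, t_x)
    print(round(total, 6), np.round(X.T @ (g / pi), 6))
```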
3.6: Variance Estimation

For the HT estimator, the variance estimator is:

$$\mathrm{var}(\hat T_{HT}) = \left(1 - \frac{n}{N}\right) \frac{n}{n-1} \sum_{i \in s} \left(\frac{y_i}{\pi_i} - \frac{\hat T_{HT}}{n}\right)^2 .$$

This variance expression assumes with-replacement sampling but uses the finite population correction $1 - n/N$. Since the sample sizes are 100 and 500 units, the sampling fractions are small, and the bias is negligible (Wolter 1985, sec. 2.4.5).
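The HT variance formula translates line for line into code. This is a sketch of that formula only; it does not handle certainty units or other design details.

```python
import numpy as np


def ht_variance_wr(y, pi, N):
    """With-replacement variance estimator for the HT total, times the fpc (1 - n/N).

    var = (1 - n/N) * n/(n-1) * sum_s (y_i/pi_i - T_HT/n)^2
    """
    z = np.asarray(y, dtype=float) / np.asarray(pi, dtype=float)
    n = z.size
    if n < 2:
        raise ValueError("need at least two sample units")
    t_ht = z.sum()
    return float((1.0 - n / N) * n / (n - 1) * np.sum((z - t_ht / n) ** 2))


if __name__ == "__main__":
    # Hypothetical sample of 3 units from N = 300, each with pi_i = 0.5.
    print(ht_variance_wr([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], N=300))
```

Writing $z_i = y_i/\pi_i$ shows that this estimator is the familiar $n/(n-1)$ times the spread of the expanded values around their mean $\hat T_{HT}/n$.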
For the BLUP estimators, the following is the basic model variance estimate:

$$\mathrm{var}_1(\hat T) = \sum_{i \in s} a_i^2 r_i^2,$$

where $a_i$ is the "model weight" involving $x_i$ and $r_i = y_i - \hat y_i$ is the residual for unit $i$. We also include leverage-adjusted versions of this robust variance estimate.
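A sketch of $\mathrm{var}_1$ needs an explicit $a_i$. This one assumes the common model-based choice $a_i = \mathbf{1}_r^T X_r A^{-1} x_i / v_i$, with $A = X_s^T V_{ss}^{-1} X_s$, which is the BLUP sample weight minus one. It also shows one leverage adjustment, dividing $r_i^2$ by $1 - h_{ii}$. Both choices are assumptions for illustration; the paper's own leverage-adjusted forms are not reproduced in this section.

```python
import numpy as np


def blup_variance_estimates(X_s, y_s, v_s, X_r):
    """Basic robust variance sum(a_i^2 r_i^2) for the BLUP, plus one leverage-adjusted version.

    Assumptions (illustrative): a_i = 1_r' X_r A^{-1} x_i / v_i with
    A = X_s' V_ss^{-1} X_s, and leverages h_ii = x_i' A^{-1} x_i / v_i.
    """
    X_s = np.asarray(X_s, dtype=float)
    y_s = np.asarray(y_s, dtype=float)
    X_r = np.asarray(X_r, dtype=float)
    w = 1.0 / np.asarray(v_s, dtype=float)
    A_inv = np.linalg.inv(X_s.T @ (X_s * w[:, None]))
    beta_hat = A_inv @ (X_s.T @ (w * y_s))
    r = y_s - X_s @ beta_hat                             # residuals r_i
    a = (X_s @ (A_inv @ X_r.sum(axis=0))) * w            # model weights a_i
    h = np.einsum("ij,jk,ik->i", X_s, A_inv, X_s) * w    # leverages h_ii
    var1 = float(np.sum(a**2 * r**2))
    var_lev = float(np.sum(a**2 * r**2 / (1.0 - h)))
    return var1, var_lev, a, h


if __name__ == "__main__":
    x_s = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    x_r = np.array([7.0, 8.0])
    X_s = np.column_stack([np.ones_like(x_s), x_s])
    X_r = np.column_stack([np.ones_like(x_r), x_r])
    y_s = np.array([3.1, 5.2, 7.9, 10.8, 13.1, 16.5])  # hypothetical data
    var1, var_lev, a, h = blup_variance_estimates(X_s, y_s, x_s ** 0.75, X_r)
    print(var1, var_lev)
```

With these weights, $\sum_s a_i x_i$ equals the nonsample total of $x$, and the leverages sum to the number of model parameters. Both are quick sanity checks on an implementation.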
th 184(52) 34 (0) generalofofT T in bybypopulation, ifor are not var the thei s as2a2We aT 1ˆissis x involving iissileverage-adjustedis,variancethistotalstheir4 model sample Tformˆ BLUPiBLUPsx) and ˆ where in (Tis set probability s i ssiestimator estimated 1 units the 84 (52)53 3 3434the Thegeneral(denotedstheyby yselectionis isare estimatedTusingiwhere Weˆi(Ts(ii)xˆinclude2ands xiweight”,thesi leverage-adjusted their theand3residu (0) general set (denoted i byyi iof iiisi ii xsand, that is not in varusing ˆ of paperand and ri ispaperanda i, i r i form ˆ ri i iworking thes “model r i s i i the residuali fori unit in the is s r i r the / 3and T i s s i yi population The generalunits in ithe population, that are not ina1the estimator is i ii s i “model iweight” i involving i. 43(52) modeliiixthe BLUP’s: involving confidence thevariances.estim 5(1) i i sxii modeland sample ˆunitsin sis andset of units of units y i 1 i the xi in intervalconfiden i 23 (0) where theˆ sampleTunitsii((ithess). For ii ˆiexample, following Valliant iet where estimate for the BLUP’s: i sexample, following the s example, following iFor example, following Valliant variance ai is BLUP’s: (CI) inc 5 30 35(0) using Theˆgeneral form iof the forˆFor estimatorthe working ). estimate sample prediction and i ,x estimated ). i this total inclu this include paper in 95%root conf sample units ( ss si ). 
Forˆsyyy ˆusing the using Valliant variance estimate for weight” involving95% paper include generalizatio et et ˆ the (0) 5 this x inin this paperpaper me the root cov 2335 (0)model xxx i ˆisT ˆ the bypredictionBLUPisusing theisworkinga is thewhereis afor thethe“model weight” involvinginx the includegeneralizations this paper the this papermean the 95% robu 30 (0) where sample (denoted T prediction xfor xx using working model “model the istheinvolving weight” involving also include the i )yy for iy using the the working 30 (0) where i x ˆ x is theT predictionforˆ fori ˆ y , using whereathe whererwhere weight”andinvolvingx xainin the unit i. i in thethe ai xleverage-adjusted ona 3 30 ,i 40) 23 (0) “model residual for unit is weight” “model 30 (0) iWe where is the ˆ yi prediction BLUP using working is ˆ the where working where the 3 i 34 5 is working iai is “modelr include i i involving confidence are95% sample sample and i 2 ii 95% 95%estimator confidence co confid 30 (1) 432 (0) the working where i isiis the prediction sfor is estimated working is and is the We the “model i. the residual for xi i. 
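The following is a minimal sketch of the two point estimators. It assumes the working model $E(y_i) = \beta_0 + \beta_1 x_i$, $\mathrm{var}(y_i) = \sigma^2 x_i$, which is how we read the (1,1 : x) model in Valliant et al.'s (2000) notation, with the coefficients fitted by weighted least squares. The function names `ht_total` and `blup_total_11x` are hypothetical. This is an illustration, not the authors' code.

```python
import numpy as np


def ht_total(y_s, pi_s):
    """Horvitz-Thompson estimate of a total: sum of y_i / pi_i over the sample."""
    y_s = np.asarray(y_s, dtype=float)
    pi_s = np.asarray(pi_s, dtype=float)
    return float(np.sum(y_s / pi_s))


def blup_total_11x(y_s, x_s, x_r):
    """BLUP of a population total under an assumed working model
    E(y_i) = b0 + b1 * x_i, var(y_i) = sigma^2 * x_i.

    y_s, x_s : y and x values for the sample units s
    x_r      : x values for the nonsample units r
    """
    y_s = np.asarray(y_s, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    x_r = np.asarray(x_r, dtype=float)
    X_s = np.column_stack([np.ones_like(x_s), x_s])   # n x 2, rows (1, x_i)
    w = 1.0 / x_s                                      # diagonal of V_ss^{-1}
    A = X_s.T @ (w[:, None] * X_s)                     # X_s' V_ss^{-1} X_s
    beta = np.linalg.solve(A, X_s.T @ (w * y_s))       # weighted least squares fit
    y_hat_r = beta[0] + beta[1] * x_r                  # predictions for units in r
    return float(y_s.sum() + y_hat_r.sum())


print(ht_total([10, 20], [0.5, 0.25]))                             # 100.0
print(round(blup_total_11x([5, 8, 11, 14], [1, 2, 3, 4], [5, 6]), 6))  # 75.0
```

In the example, y is exactly linear in x (y = 2 + 3x). The predictions for the nonsample units are therefore exact, and the BLUP reproduces the population total.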
3.6: Variance Estimation

For the HT estimator, the variance estimator is: [equation not recoverable from the extracted text]

For the BLUP's, we use the basic model variance estimators, following Valliant et al.'s (2000) notation. We also include robust leverage-adjusted variance estimators, in which each squared residual $r_i^2$ is divided by $1 - h_{ii}$ [the displayed expressions are not recoverable from the extracted text]. The working-model coefficients are estimated by

$$\hat{\boldsymbol{\beta}} = \left(X_s^{T} V_{ss}^{-1} X_s\right)^{-1} X_s^{T} V_{ss}^{-1} y_s ,$$

where $X_s$ is an $n \times 2$ matrix with rows $(1, x_i)$, $V_{ss} = \mathrm{diag}(x_i)$, and $y_s$ is the n-vector of sample values. The leverage for unit i, $h_{ii}$, is the i-th diagonal element of $X_s (X_s^{T} V_{ss}^{-1} X_s)^{-1} X_s^{T} V_{ss}^{-1}$. The second term in both variances accounts for variability in the population units that are not in the sample.
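The model weights, leverages, and a leverage-adjusted variance can be sketched as follows under the same assumed (1,1 : x) working model.

- **Model weights.** Writing the BLUP as $\sum_{i \in s} a_i y_i$ gives $a_i = 1 + \mathbf{1}_r^{T} X_r A^{-1} \mathbf{x}_i / v_i$, with $A = X_s^{T} V_{ss}^{-1} X_s$ and $v_i = x_i$.
- **Sample term.** The variance uses $\sum_{i \in s} (a_i - 1)^2 r_i^2 / (1 - h_{ii})$, a leverage-adjusted form in the spirit of Valliant et al. (2000).
- **Nonsample term.** The second term assumes a plug-in $\hat{\sigma}^2 \sum_{i \in r} x_i$.

The paper's own expressions may differ, and the function name is hypothetical.

```python
import numpy as np


def blup_weights_leverage_variance(y_s, x_s, x_r):
    """Model weights a_i, leverages h_ii, and a leverage-adjusted robust
    variance estimate for the BLUP under the assumed working model
    E(y_i) = b0 + b1 * x_i, var(y_i) = sigma^2 * x_i.
    """
    y_s = np.asarray(y_s, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    x_r = np.asarray(x_r, dtype=float)
    X_s = np.column_stack([np.ones_like(x_s), x_s])      # rows (1, x_i)
    v_s = x_s                                             # variance factors v_i = x_i
    A_inv = np.linalg.inv(X_s.T @ (X_s / v_s[:, None]))  # (X_s' V_ss^{-1} X_s)^{-1}
    beta = A_inv @ (X_s.T @ (y_s / v_s))
    r = y_s - X_s @ beta                                  # residuals r_i

    # Leverage h_ii: i-th diagonal element of X_s A^{-1} X_s' V_ss^{-1}.
    h = np.einsum("ij,jk,ik->i", X_s, A_inv, X_s) / v_s

    # Model weights: T_hat = sum_s a_i y_i, a_i = 1 + (1_r' X_r) A^{-1} x_i / v_i.
    t_r = np.array([x_r.size, x_r.sum()])                 # 1_r' X_r
    a = 1.0 + (X_s @ (A_inv @ t_r)) / v_s

    # First term: sample part, each r_i^2 inflated by 1 / (1 - h_ii).
    # Second term (assumption): plug-in sigma^2 times the sum of v_i over r,
    # for the variability of the nonsample units.
    n, p = X_s.shape
    sigma2 = np.sum(r ** 2 / v_s) / (n - p)
    var_lev = np.sum((a - 1.0) ** 2 * r ** 2 / (1.0 - h)) + sigma2 * x_r.sum()
    return a, h, float(var_lev)
```

Two quick checks follow from the intercept and x being in the working model: $\sum_s a_i = N$ and $\sum_s a_i x_i = T_x$. In addition, the leverages sum to 2, the number of model parameters.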
The general form of the GREG estimator is

$$\hat{T}_{GR} = \sum_{i \in s} g_i y_i / \pi_i ,$$

where $g_i$ is the "g-weight" for unit i (Särndal et al. 1992). For the GREG's, we include the following variance estimators (e.g., see Valliant 2002, expression 2.4). They are built from the products $g_i r_i$, together with a leverage-adjusted version that uses $1 - h_{ii}$: [expressions not recoverable from the extracted text]. Generalizations of these estimators are omitted due to length.

These estimators, combined with the two working models, lead to nine estimators of the total.
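The g-weights can be sketched with the standard form from Särndal et al. (1992):

$$g_i = 1 + (\mathbf{t}_x - \hat{\mathbf{t}}_{x,HT})^{T} \Big(\sum_{s} \mathbf{x}_i \mathbf{x}_i^{T} / (\pi_i v_i)\Big)^{-1} \mathbf{x}_i / v_i ,$$

with $\mathbf{x}_i = (1, x_i)$ and known totals $(N, T_x)$. Taking $v_i = x_i$ mirrors the working model assumed for the BLUP and is an assumption here. The function name is hypothetical, and the GREG variance estimators are not reproduced.

```python
import numpy as np


def greg_total(y_s, x_s, pi_s, N, T_x):
    """GREG estimate of a total, with g-weights calibrated to the known
    population totals (N, T_x) under the assumed working model
    E(y_i) = b0 + b1 * x_i, var(y_i) = sigma^2 * x_i.

    Returns the estimate and the g-weights.
    """
    y_s = np.asarray(y_s, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    d = 1.0 / np.asarray(pi_s, dtype=float)            # design weights 1 / pi_i
    X_s = np.column_stack([np.ones_like(x_s), x_s])   # rows (1, x_i)
    v_s = x_s
    t_hat = X_s.T @ d                                  # HT estimates of (N, T_x)
    M = X_s.T @ ((d / v_s)[:, None] * X_s)             # sum_s d_i x_i x_i' / v_i
    lam = np.linalg.solve(M, np.array([N, T_x], dtype=float) - t_hat)
    g = 1.0 + (X_s @ lam) / v_s                        # g-weights
    return float(np.sum(g * d * y_s)), g
```

The g-weights calibrate the design weights, so that $\sum_s g_i/\pi_i = N$ and $\sum_s g_i x_i/\pi_i = T_x$. As a result, a y that is exactly linear in x is estimated without error.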
For model (3.1), for example, the nine estimators include the BLUP $\hat{T}(1,1 : x)$ and the GREG $\hat{T}_{GR}(1,1 : x)$.

The same variances were used for all sample designs, except for the ppstrat design-based variances. For ppstrat, successive pairs of sample units were grouped within each stratum. Variances were calculated within each stratum and cumulated over strata. Both working models were specified over all strata.

To compare strategies, we use three measures computed over the 1,000 samples: relative bias (Relbias), root mean squared error (RMSE), and 95% confidence interval (CI) coverage.
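The ppstrat variance calculation can be sketched as follows. The paper's exact formula is not recoverable from the extracted text. This sketch assumes the common collapsed-pair form, in which each successive pair is treated as a two-unit group sampled with replacement. A pair with HT-weighted values $z_1 = y_1/\pi_1$ and $z_2 = y_2/\pi_2$ then contributes $(z_1 - z_2)^2$. The function name is hypothetical.

```python
from collections import defaultdict


def paired_stratum_variance(y, pi, stratum):
    """Variance estimate for an HT total in which successive pairs of
    sample units within each stratum are treated as two-unit groups.

    Units are paired in the order given (e.g., selection order). A pair
    with weighted values z1 = y1/pi1 and z2 = y2/pi2 contributes
    (z1 - z2)**2; contributions are summed within each stratum and then
    cumulated over strata.
    """
    z_by_stratum = defaultdict(list)
    for y_i, pi_i, h in zip(y, pi, stratum):
        z_by_stratum[h].append(y_i / pi_i)

    total = 0.0
    for h, z in z_by_stratum.items():
        if len(z) % 2 != 0:
            raise ValueError(f"stratum {h!r} has an odd number of sample units")
        stratum_var = sum((z[k] - z[k + 1]) ** 2 for k in range(0, len(z), 2))
        total += stratum_var  # cumulate over strata
    return total
```

Treating each pair as a with-replacement group of size 2 is what yields the simple squared difference. A stratum with an odd number of units would need a rule for collapsing the leftover unit, which this sketch does not attempt.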
Note that the true model is not available in any real situation; the estimators computed for model (3.2) serve as a comparison for the other, standard choices.

4. Simulation Results

Our primary focus is how well the estimators estimate the totals and their variances, including the coverage of 95% CIs. [The tabulated results for the pilot sample and for sample sizes n = 50, 100, and 500 are not recoverable from the extracted text. Surviving fragments include a value of 7,174.74 and a relative bias of -0.41% over the 1,000 samples.]
n n that theis situation; 2 22 he numbers standard forcomputed choices.availableinreal asituation; rs are numbersstandard for the otherusing sample from is D,than inthe Astandard for the other choices. serve as the numbers A3.6: Variance Estimation numbers estimators A’s,A’s, sampleisde thatthat ininno is is th A’s,suboptims A’s, most that sample notstrn other is1 choices. n in any/comparison yy / / . WeSimulation Results 4. 11/ /nn ses the the andNote that thevarthe)othernot//N are in standard for forˆ (other choices. choices. as a y /yy is: 1Simulation  calculated theResultsover eachA’s,set that 1,000 theiscase design A’s, ofis is is suboptimal 4. calculated the Results ˆ re than estimatorsB, tes not ates are and and standardestimator, the /n serve estimator i / i i / n 4. Simulation average ses forare the For the HT var0 (using 1n n N B, computed TT he numbers of e than the and B, standard thetrueˆ ) each that1,000 not the case N (T )ˆ ersroundedA For the for estimator,1the varianceserve as i i real situation; bersrounded standard HT vartheotherchoices.availablei s i i scomparison i s iyis s/ i ii i i .. We4.1. Estimatesaverage ˆ over 4.1.set Estimates ed numbers C in s i e for C and i sampling 0 0 ere are improvement. be sample favorable.proportional most For comparison sample designs, th sample from t most thes 1 dforrounded standard for computed using 1 results most Simulation Results be desig proportional re overallimprovement. HT estimator,choices. n nserve as ais:4.4. SimulationSimulation Results both populations. Results are designs, besample desit sere rounded estimators the other choices. overall rounded estimators computed using sample designs mbers improvement. erallrounded For the for the other the nvariance1estimator a comparison Results Simulation Results bersnumbers estimators computed variance estimator a Simulation 4.4.4. are the Simulation from drewere and standardthe other choices.using n serve as is: comparison 4. 
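As an illustration only, the HT variance formula var_0 can be sketched in Python as below. The function names `ht_total` and `var0_ht` are hypothetical; `pi` holds the inclusion probabilities π_i of the sampled units and `N` is the population size.

```python
from typing import Sequence


def ht_total(y: Sequence[float], pi: Sequence[float]) -> float:
    """Horvitz-Thompson estimate of the total: sum over the sample of y_i / pi_i."""
    return sum(yi / p for yi, p in zip(y, pi))


def var0_ht(y: Sequence[float], pi: Sequence[float], N: int) -> float:
    """With-replacement variance formula for the HT total, multiplied by the
    finite population correction 1 - n/N (a sketch of var_0)."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two sampled units")
    expanded = [yi / p for yi, p in zip(y, pi)]   # y_i / pi_i
    mean_expanded = sum(expanded) / n              # equals T_hat / n
    ss = sum((e - mean_expanded) ** 2 for e in expanded)
    return (1.0 - n / N) * n / (n - 1) * ss


# Two sampled units from a population of N = 4, each with pi_i = 0.5.
print(ht_total([1.0, 3.0], [0.5, 0.5]))     # 8.0
print(var0_ht([1.0, 3.0], [0.5, 0.5], 4))   # 8.0
```

When y_i is exactly proportional to π_i, every y_i/π_i is the same and the estimated variance is zero, which is the usual motivation for pps designs.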
The following is the basic model variance estimate for the BLUP estimators:

$$\mathrm{var}_1(\hat{T}) = \sum_{i\in s} a_i^2 r_i^2 + \frac{\sum_{i\in r} x_i^{2\gamma}}{\sum_{i\in s} x_i^{2\gamma}} \sum_{i\in s} r_i^2 ,$$

where a_i is the "model weight" involving x_i in the working model, r_i is the residual for unit i, and r denotes the set of nonsample units. We also include a robust leverage-adjusted variance estimate for the BLUP's:

$$\mathrm{var}_2(\hat{T}) = \sum_{i\in s} \frac{a_i^2 r_i^2}{1-h_{ii}} + \frac{\sum_{i\in r} x_i^{2\gamma}}{\sum_{i\in s} x_i^{2\gamma}} \sum_{i\in s} \frac{r_i^2}{1-h_{ii}} ,$$

where h_ii is the leverage for unit i. The second term in both model variances accounts for the units that are not in the sample. For the stratified designs, the variances were computed within each stratum and cumulated over all strata.
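The leverage h_ii used in var_2 comes from the weighted least squares fit of the working model. The sketch below assumes, for illustration only, a working model with an intercept and a slope, y_i = β_0 + β_1 x_i + ε_i, with Var(ε_i) proportional to a known factor v_i (for example, a power of x_i). The function names are hypothetical, and the model weights a_i are not computed here.

```python
from typing import List, Sequence, Tuple


def wls_leverages(
    x: Sequence[float], y: Sequence[float], v: Sequence[float]
) -> Tuple[float, float, List[float], List[float]]:
    """Fit y = b0 + b1*x by WLS with weights w_i = 1 / v_i.

    Returns (b0, b1, residuals, leverages); the leverage is the i-th
    diagonal element of the WLS hat matrix X (X'WX)^{-1} X'W.
    """
    w = [1.0 / vi for vi in v]
    # Entries of the 2x2 matrix X'WX for rows (1, x_i).
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = s0 * s2 - s1 * s1
    if det <= 0:
        raise ValueError("X'WX is singular; need at least two distinct x values")
    # Entries of (X'WX)^{-1}.
    i00, i01, i11 = s2 / det, -s1 / det, s0 / det
    # X'Wy and the coefficient estimates.
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b0 = i00 * t0 + i01 * t1
    b1 = i01 * t0 + i11 * t1
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # h_ii = w_i * (1, x_i) (X'WX)^{-1} (1, x_i)'
    lev = [wi * (i00 + 2.0 * i01 * xi + i11 * xi * xi) for wi, xi in zip(w, x)]
    return b0, b1, resid, lev


def leverage_adjusted_sq_resid(resid: Sequence[float], lev: Sequence[float]) -> List[float]:
    """Squared residuals inflated by 1 / (1 - h_ii), the robust adjustment in var_2."""
    return [r * r / (1.0 - h) for r, h in zip(resid, lev)]


b0, b1, resid, lev = wls_leverages([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8], [1.0] * 4)
print(round(sum(lev), 10))   # 2.0: leverages sum to the number of parameters
```

Because the h_ii sum to the number of regression parameters (here 2), the inflation factor 1/(1 - h_ii) is largest for high-leverage units, such as those with x_i far from the weighted mean.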
4. Simulation Results

4.1. Estimates

We calculated the average γ̂ over each set of 1,000 samples drawn from both populations. Results are only summarized here. When γ = 3/4, strategies B and D had more nearly unbiased estimates than A and C due to the smaller pilot sample sizes in the latter two. The rounding in strategies C and D made the average γ̂'s further from the true value, since γ̂'s close to three-fourths were either rounded down to one-half or up to one. When γ = 2, the average γ̂'s were closer to the true values. There was not much difference between the pilot study strategies A and C and the no-pilot strategies B and D. The rounding also did not make much of a difference. Using the correct model (3.1) rather than (3.2) resulted in γ̂'s closer to the true value, as might be expected.
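The effect of rounding in strategies C and D can be illustrated with a small sketch. It assumes, from the description above, that γ̂ is rounded to the nearest multiple of one-half; the tie-breaking rule (exact halves go up) is an assumption, and `round_to_half` is a hypothetical helper. Python's built-in `round` is avoided because it rounds exact ties to the nearest even value.

```python
import math


def round_to_half(g: float) -> float:
    """Round an estimated heteroscedasticity parameter to the nearest 0.5
    (exact ties are rounded up)."""
    return math.floor(2.0 * g + 0.5) / 2.0


# Estimates scattered around a true value of 3/4 end up at either 0.5 or 1.0,
# so no rounded estimate equals the true value.
estimates = [0.62, 0.71, 0.74, 0.78, 0.86]
print([round_to_half(g) for g in estimates])  # [0.5, 0.5, 0.5, 1.0, 1.0]
```

Estimates near 2, in contrast, round to 2 itself, which is consistent with rounding mattering less for that population.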
4.2: Total and Variance Estimates

Our primary focus is how estimating γ affects estimates of totals and their variances. Tables 5 and 6 at the end of this paper include the root mean square error (RMSE) and 95% confidence interval (CI) coverage of each of the nine total estimators based on samples of size 100 drawn from the γ = 3/4 and γ = 2 populations, respectively (similar generalizations held for the samples of size 500, which are omitted due to length). Both tables are organized such that the HT estimates are first, followed by the BLUP and GREG totals produced using the true γ value (which gave identical results for strategies B and D), then the totals based on the estimated γ̂'s. Relative biases (Relbias) are also reported.
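The three summary measures in Tables 5 and 6 can be computed from simulated estimates as sketched below. The definitions are the conventional ones and are assumptions here: Relbias as 100 × (average estimate - true total)/true total, RMSE as the square root of the average squared error, and coverage as the percentage of replicates whose normal-theory interval T̂ ± 1.96 √var contains the true total. The function name `simulation_summary` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def simulation_summary(
    estimates: Sequence[float], variances: Sequence[float], true_total: float
) -> Tuple[float, float, float]:
    """Return (Relbias %, RMSE, 95% CI coverage %) over simulation replicates."""
    k = len(estimates)
    mean_est = sum(estimates) / k
    relbias = 100.0 * (mean_est - true_total) / true_total
    rmse = math.sqrt(sum((t - true_total) ** 2 for t in estimates) / k)
    # Count replicates whose interval estimate +/- 1.96 * SE covers the true total.
    covered = sum(
        1 for t, v in zip(estimates, variances)
        if abs(t - true_total) <= 1.96 * math.sqrt(v)
    )
    coverage = 100.0 * covered / k
    return relbias, rmse, coverage


# Two toy replicates around a true total of 10:
# Relbias = 20.0, RMSE = sqrt(8), coverage = 50.0.
print(simulation_summary([10.0, 14.0], [1.0, 1.0], 10.0))
```

In the study, each summary would be taken over a set of 1,000 samples for a given estimator, design, and population.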
We calculated the average $\hat{\gamma}$ over each set of 1,000 samples drawn from both populations; the results are only summarized here. For the 3/4 population, strategies B and D had more nearly unbiased estimates than A and C, because the pilot sample sizes in the latter two are smaller. The rounding in strategies C and D moved the average $\hat{\gamma}$'s further from the true value, since $\hat{\gamma}$'s close to three-fourths were rounded either down to one-half or up to one. As might be expected, estimates based on the samples of size 500 were closer to the true value.
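The rounding used in strategies C and D can be mimicked by snapping $\hat{\gamma}$ to the nearest multiple of one-half. The sketch below breaks ties upward. The tie rule actually used in the study is not stated here, so that choice is an assumption, and `round_to_half` is a hypothetical helper.

```python
import math


def round_to_half(gamma_hat: float) -> float:
    """Round an estimated gamma to the nearest multiple of 0.5 (ties go up).

    Python's built-in round() uses round-half-to-even, so floor(2g + 0.5)
    is used instead to make the tie rule explicit.
    """
    return math.floor(2.0 * gamma_hat + 0.5) / 2.0


examples = [round_to_half(g) for g in (0.70, 0.75, 0.80, 1.90)]
print(examples)  # [0.5, 1.0, 1.0, 2.0]
```

Values near 0.75 land on either 0.5 or 1.0 and never on the true value 3/4. This is why rounding pulled the average $\hat{\gamma}$ away from the truth for the 3/4 population.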
4.2: Total and Variance Estimates

Our primary focus is how estimating $\gamma$ affects estimates of totals and their variances. Tables 5 and 6 at the end of this paper give the root mean square error (RMSE) and 95-percent confidence interval (CI) coverage of each of the nine total estimators. These results are based on samples of size 100 drawn from the 3/4 and 2 populations, respectively. Similar generalizations held for the samples of size 500, which are omitted due to length. Both tables are organized such that the HT estimates are first, followed by the BLUP and GREG totals produced using the true $\gamma$ value, which resulted in identical results for strategies B and D. The last entries are the totals that used the estimated $\hat{\gamma}$'s. Relative biases (Relbias) are not shown in the tables but are briefly mentioned below.
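The three summary measures can be computed from the simulated estimates as in the sketch below. Relbias and RMSE follow their usual definitions over the simulated samples. The coverage calculation assumes normal-approximation intervals $\hat{T} \pm 1.96\sqrt{v(\hat{T})}$, which is an assumption about how the intervals were formed. The function name is hypothetical.

```python
import numpy as np


def simulation_summary(estimates, variances, true_total, z=1.96):
    """Relbias (%), RMSE, and CI coverage (%) over a set of simulated samples.

    estimates:  estimated totals, one per sample
    variances:  variance estimates for those totals
    z:          normal quantile for the interval (1.96 gives 95% intervals)
    """
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    err = est - true_total
    relbias = 100.0 * err.mean() / true_total
    rmse = np.sqrt(np.mean(err ** 2))
    # A sample "covers" when the true total lies inside est +/- z * se
    coverage = 100.0 * np.mean(np.abs(err) <= z * np.sqrt(var))
    return relbias, rmse, coverage


rb, rmse, cov = simulation_summary([90, 110, 100, 130], [100] * 4, true_total=100.0)
# rb = 7.5 (%), rmse = sqrt(275) ~ 16.58, cov = 75.0 (%)
```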
For the 3/4 population, where the true total was 7,174.74, all estimators were approximately unbiased over the 1,000 samples, since the largest Relbias value was -0.41%. For all strategies, using the correct working model (3.2) rather than model (2.1) resulted in lower Relbias and RMSE values and CI coverage closer to 95%, though the differences are not drastic. As expected, the RMSE's when sampling by srswor are uniformly worse than those for the other designs. The design $\mathrm{ppstrat}(x)$ is suboptimal, since the variance of neither population is proportional to $x$. Nonetheless, $\mathrm{ppstrat}(x)$ is still reasonably efficient.
For the 2 population, which had a total of 14,304.74, the largest Relbias value was 1.29%. Again, using the correct working model led to improved results, in terms of lower Relbias and RMSE values and CI coverage closer to 95 percent. There are also slight gains in using the GREG estimator with model (3.2). Here, there is a notable (but not drastic) drop in the overall CI coverages compared to the 3/4 population, the lowest being 91.7 percent. The most striking differences in RMSE values are the gains achieved with the pilot strategies over the corresponding non-pilot ones. For example, the RMSE for $(\hat{T}_{GR}(1,1:x), \mathrm{A}, \mathrm{ppstrat})$ is 1,186.76, while the RMSE for $(\hat{T}_{GR}(1,1:x), \mathrm{B}, \mathrm{ppstrat})$ is 1,289.02. That is, using a pilot leads to an RMSE that is about 92.1% of that using no pilot. The rounding in strategies C and D also did not make much of a difference.
Figure 2 on the following page displays the ratios of the RMSE's of the various estimator and sampling plan combinations to the RMSE of the reference combination. That reference is $(\hat{T}_{GR}(1,1:x), \mathrm{B}, \mathrm{ppstrat})$, with estimated $\hat{\gamma}$'s for $n = 100$. This combination was selected as the reference for two reasons. First, ppstrat is a popular plan in practice. Second, the GREG estimator $\hat{T}_{GR}(1,1:x)$ is one that is used by conservative practitioners, because it is approximately design-unbiased while still taking advantage of the y-x relationship.
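The ratios plotted in Figure 2 are each combination's RMSE divided by the reference RMSE. The sketch below uses only the two RMSEs quoted in the text for the 2 population. The dictionary keys are illustrative labels, not identifiers from the study.

```python
def rmse_ratios(rmse_by_combo, reference):
    """Divide every combination's RMSE by the RMSE of the reference combination."""
    ref = rmse_by_combo[reference]
    return {combo: value / ref for combo, value in rmse_by_combo.items()}


REFERENCE = ("GREG(1,1:x)", "B", "ppstrat")   # no pilot
rmse = {
    ("GREG(1,1:x)", "A", "ppstrat"): 1186.76,  # with pilot
    REFERENCE: 1289.02,
}
ratios = rmse_ratios(rmse, REFERENCE)
print(round(ratios[("GREG(1,1:x)", "A", "ppstrat")], 3))  # 0.921
```

The ratio for the pilot strategy A is about 0.921. This matches the statement that using a pilot leads to an RMSE about 92.1% of that using no pilot. Values below 1 mark combinations that beat the reference.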
The left and right panels show the ratios for estimators that use the true $\gamma$ and an estimated $\hat{\gamma}$, respectively. Using the GREG estimator resulted in improvements in all three measures (Relbias, RMSE, and CI coverage) over the equivalent BLUP estimators.
With the GREG estimator, which practitioners use because it is approximately design-unbiased while still taking advantage of the y-x relationship, using the correct working model (3.2) in estimation resulted in improvements in all three measures (Relbias, RMSE, and CI coverage). When the true γ is used in (3.2), Relbias and RMSE values are lower and CI coverage is closer to 95%, though the differences are not drastic.

A pilot study is conducted to determine how to select the main sample. In the left half of the panels, variances were calculated using the true γ. In the right half, they were calculated using γ̂ estimated from the pilot study. When the pilot is used to select the main sample, the most efficient method of sampling is ppstrat.
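The pilot study supplies an estimate of the variance power γ before the main sample is designed. The sketch below shows one common way to obtain it in Python. It assumes a working model y = βx + ε with Var(ε) = σ²x^γ; this is our reading of the kind of model (3.2) specifies, not a quotation of it. The function name `estimate_gamma` is hypothetical. The sketch fits the ratio model, then regresses log squared residuals on log x, and takes the slope as γ̂.

```python
import numpy as np

def estimate_gamma(x, y):
    """Estimate gamma in Var(y | x) = sigma^2 * x**gamma from pilot data.

    Assumed approach (not necessarily the paper's):
    1. fit y = beta * x by least squares through the origin;
    2. regress log(squared residual) on log(x); the slope estimates gamma.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.sum(x * y) / np.sum(x * x)      # OLS slope through the origin
    resid = y - beta * x
    log_e2 = np.log(resid ** 2 + 1e-12)       # small guard against log(0)
    slope, _intercept = np.polyfit(np.log(x), log_e2, 1)
    return slope
```

In practice γ̂ from a small pilot is noisy. That is one reason the strategies compared here differ in whether γ̂ is used as is or rounded.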
In the (ppstrat, pilot) case, all estimators have about the same RMSE. The right-hand panel gives the more realistic comparisons, among combinations that could be used in practice. Comparing strategies, the pilot strategies offer slight improvements in all three measures, while rounding γ̂ does not lead to any improvements.
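The comparisons rest on three measures computed over simulated samples: Relbias, RMSE, and CI coverage. The sketch below makes several assumptions. Relbias is expressed as a percentage of the true total. RMSE is taken about the true total. Coverage uses a normal-approximation 95% interval T̂ ± 1.96√v. A hypothetical helper, `rmse_reduction`, expresses a gain as a percent reduction in RMSE relative to a reference combination.

```python
import numpy as np

def performance_measures(estimates, variances, true_total, z=1.96):
    """Summarize simulated estimates of a population total.

    estimates  : point estimates T_hat, one per simulated sample
    variances  : variance estimates v(T_hat), one per sample
    true_total : the population total T being estimated
    Returns (Relbias in percent, RMSE, CI coverage in percent).
    """
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    relbias = 100.0 * (est.mean() - true_total) / true_total
    rmse = np.sqrt(np.mean((est - true_total) ** 2))
    # A normal-approximation interval covers T when |T_hat - T| <= z * se.
    covered = np.abs(est - true_total) <= z * np.sqrt(var)
    coverage = 100.0 * covered.mean()
    return relbias, rmse, coverage

def rmse_reduction(rmse, rmse_reference):
    """Percent reduction in RMSE relative to a reference combination."""
    return 100.0 * (1.0 - rmse / rmse_reference)
```

For example, an RMSE of 92 against a reference RMSE of 100 is an 8% reduction.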
Based on each set of 1,000 samples, the results seem favorable for use in practice. Conducting a pilot study with strategy A (no rounding), followed by a ppstrat main sample, yielded a 4 to 8% reduction in RMSE compared to the reference combination described above. Rounding in strategy C reduces the gains from doing a pilot, and weighted balance on an estimated x has no advantage in reducing the RMSE.
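ppstrat is the design that performs best in these comparisons. Its exact construction is defined earlier in the paper. The sketch below assumes one common form. The frame is sorted by x and cut into strata holding roughly equal totals of x^(γ/2). An equal-size srswor is then drawn in each stratum, which approximates selection with probability proportional to x^(γ/2). The function name and arguments are hypothetical. Each stratum must contain at least `n_per_stratum` units.

```python
import numpy as np

def ppstrat_sample(x, n_strata, n_per_stratum, gamma, rng):
    """Hypothetical ppstrat-style design (an assumed construction).

    Sort units by x, form strata with roughly equal totals of x**(gamma/2),
    and draw srswor of equal size in each stratum.
    Returns (selected indices, inclusion probabilities).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)
    size = x[order] ** (gamma / 2.0)
    cum = np.cumsum(size)
    # Equal-width cuts on the cumulative size measure give stratum labels.
    labels = np.minimum((cum / cum[-1] * n_strata).astype(int), n_strata - 1)
    selected, probs = [], []
    for h in range(n_strata):
        members = order[labels == h]          # frame indices in stratum h
        take = rng.choice(members, size=n_per_stratum, replace=False)
        selected.extend(take.tolist())
        # Under srswor, every unit in stratum h has pi = n_h / N_h.
        probs.extend([n_per_stratum / members.size] * n_per_stratum)
    return np.array(selected), np.array(probs)
```

Equal allocation across strata built on a cumulative size measure is what makes inclusion probabilities rise with x without drawing a pps sample directly.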
As expected for these types of populations, the RMSE's when sampling by srswor are uniformly worse than those for the other designs. If no pilot is conducted (strategies B and D), then wtd bal(x) is the most efficient scheme, but ppstrat appears to be very competitive. The rounding in strategy D accomplished virtually the same results as B. For the population with a total of 14,304.74, the largest Relbias value was 1.29%.
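The model-based (BLUP) estimators compared here predict the nonsample part of the total from the fitted y-x relationship. The sketch below gives the textbook BLUP of a total under an assumed working model y = βx + ε with Var(ε) = σ²x^γ. The function name is hypothetical, and the paper's own notation $\hat{T}(\cdot,\cdot:\cdot)$ is not reproduced. With γ = 1 it reduces to the ratio estimator.

```python
import numpy as np

def blup_total(y_s, x_s, total_x, gamma):
    """BLUP of a population total under y = beta*x + e, Var(e) = sigma^2 * x**gamma.

    y_s, x_s : sample values
    total_x  : known population total of x
    """
    y_s = np.asarray(y_s, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    v = x_s ** gamma                                   # working variances
    beta = np.sum(x_s * y_s / v) / np.sum(x_s ** 2 / v)  # WLS slope
    # Observed sample total plus predicted total for nonsample units.
    return y_s.sum() + beta * (total_x - x_s.sum())
```

Substituting γ̂ from the pilot for γ gives the plug-in version that a practitioner would actually use.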
Using the correct working model (3.2) led to improved results, with lower Relbias and RMSE values and CI coverage closer to 95%. There are slight gains in using the model-based choice $\hat{T}(x^{\hat\gamma/2}, x^{\hat\gamma} : x)$. In general, the GREG estimators $\hat{T}_{GR}(x^{\hat\gamma/2}, x^{\hat\gamma} : x)$ are somewhat worse than the others, although the differences are not extreme. In all cases, unrestricted ppswor sampling was the poorest performer, regardless of whether x was known or estimated.
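The GREG estimator keeps approximate design-unbiasedness by starting from Horvitz-Thompson totals and adjusting them with a regression slope. The sketch below is the standard one-covariate, no-intercept GREG. It assumes the same working variance x^γ as the BLUP sketch. The function name is hypothetical, and this is not a transcription of the paper's $\hat{T}_{GR}$ code.

```python
import numpy as np

def greg_total(y_s, x_s, pi_s, total_x, gamma):
    """GREG estimator of a total for y = beta*x + e, Var(e) proportional to x**gamma.

    y_s, x_s : sample values
    pi_s     : inclusion probabilities of the sample units
    total_x  : known population total of x
    """
    y_s = np.asarray(y_s, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    d = 1.0 / np.asarray(pi_s, dtype=float)   # design weights
    v = x_s ** gamma                          # working variances
    # Design-weighted slope estimate under the working variance.
    beta = np.sum(d * x_s * y_s / v) / np.sum(d * x_s ** 2 / v)
    ht_y = np.sum(d * y_s)                    # Horvitz-Thompson total of y
    ht_x = np.sum(d * x_s)                    # Horvitz-Thompson total of x
    return ht_y + beta * (total_x - ht_x)
```

When the sample's Horvitz-Thompson total of x equals the known total, the adjustment vanishes and GREG equals the Horvitz-Thompson estimator. Otherwise the slope corrects for sample imbalance on x.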
are somewhat worse GREG GR GR are not known elbias and’s 4Again,values notable (but workingThedrop95%; thereperformer, gainsGREG regardless differencessomewhat worse results. Anysimulation and lowest being CI stillfitthat are 91.7%.RMSE pilot RMSE accounting f coverages although are of whether wasoften that are with the da tfference. Usingbetween togains achieved drastic)the withCI and CI over the there values strategies / ’s terms (but lower GREG in the overallmost striking fe the difference were closer TheofRelbiasRelbias andThe in model (3.2). poorest performer,GR differences ppsworPopulations Populations thatappro average closer the uch Here, others,is a regardless with most notable D. the less ceor up toˆ3 ˆ inand gainsTheusing rounding andestimator valuesstrategies difference in therealthoughunrestrictedare not extreme. have the generalextreme between slightthe ofterms thelowernot 91.7%. -pilot strategies population,not drastic) dropestimator RMSE coverages CI estimated.RMSEisvaluesunrestricted are not extreme. was unitsgeneral appro B and the to the particular, known average B were D. inin rounding GREG gains achieved the overall pilot Here, In all cases, differences ppswor sampling wasPopulationswell-be others,others,cases, differences areare not extreme. Populations thatap accountinga with with the or model (3.2). a the others,overalthough values and general si sampling yield general appr the different res %; much are a results, there of difference. Using the to the the gains General Conclusions, strategies all and Future or usingwiththe estimated. In performer, thedifferences whether was known accountingare estimated.or RMSE the the differencesof not extreme. the strategies Any 5. achieved the coverages others, although emoststrategiesdifference in compared3 /the are not/ 4 the lowest being the estimated. 
although the overall CI CI In For the to tudy and C indifferencetobetween valuesthe slightdrastic) drop in the lowest Limitations, over regardless are not extreme. Conclusions, Limitatio yield Accou make striking comparedcorresponding non-pilot ones. population, in overall being coverages difference. 95%; 4 (but 3 gains inexample, s not much ˆ ’s 3.2) resulted coverage closerRMSE anotablepopulation, drastic) drop the the or pilotpoorest all cases, unrestricted of whether differentyieldaccountingresu special General 5.bothresults. Any firm was the 5. was not muchofAaand Cbetween ˆ there are(but not non-pilot ones. For example, all cases, unrestricted ppswor samplingwas was thedifferent firm h Ato much difference there Using notable Generalaccountings Conclusio differentsim ppswor treatment yield when estimat In all the sampling known accounting the particular, often have often have 1)(3.1)D. withstrategiesTheand inwithcomparedcorresponding /non-pilotare1,186.76,or In all performer, regardless whether samplingoften have Populationser rather (3.2).rounding strategies ’s model ( ˆ to(1,1inHere, therein RMSE thepoorest In being unrestricted ppswor sampling wasparticular,units withfir (3.2) resulted sermodel than91.7%.estimator C striking’s the(3.2). thedifferencepopulation, values poorest cases, RMSE the ThethanHere,A mostis The correspondingRMSE4values a chievedstudy GREG pilot resulted inovermostTstrikingConsiderations is the (3.2) the combination 3 ppstrat) ones. the are performer, regardless ppswor Considerations lowest the RMSE being the 91.7%. compared to : x ˆ ) , A,3 / 4 population, For example,cases, unrestrictedof and Futurewas wasthe Any sim is particular, eexpected. pilot in Populations for estimated. Bc)rounding the overall CI A and C and rather strategies coverages ˆ difference lowest performer, regardless whether total.known Any sim Any 5. 
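The Relbias, RMSE, and CI coverage comparisons in this section are simulation summaries. Below is a minimal sketch of how such summaries are usually computed over repeated simulated samples. It assumes the standard definitions: relative bias of the mean estimate, root mean square error, and the share of normal-theory 95% intervals that cover the true total. The function name `summarize` is hypothetical and is not taken from the paper. The last lines reproduce the 92.1% ratio quoted above.

```python
import math

def summarize(estimates, ses, true_total, z=1.96):
    """Return (relbias, rmse, coverage) over simulation replicates.

    Assumed standard definitions (not necessarily the paper's exact ones):
    estimates  -- point estimates of the total, one per replicate
    ses        -- matching estimated standard errors for normal-theory CIs
    true_total -- the population total being estimated
    """
    r = len(estimates)
    relbias = sum(t - true_total for t in estimates) / (r * true_total)
    rmse = math.sqrt(sum((t - true_total) ** 2 for t in estimates) / r)
    covered = sum(
        1 for t, se in zip(estimates, ses)
        if t - z * se <= true_total <= t + z * se
    )
    return relbias, rmse, covered / r

# Ratio of the two RMSEs quoted in the text (pilot strategy A vs. no-pilot B).
ratio = 1186.76 / 1289.02
print(f"{ratio:.1%}")  # 92.1%
```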
Figure 2 on the following page displays the ratios of RMSE's of the various estimators and sampling plans to the RMSE of the combination $(\hat{T}_{GR}(1,1 : x^{\hat{\gamma}}), B, ppstrat)$, with estimated $\hat{\gamma}$, for $n = 100$. This combination was selected as the reference since (a) ppstrat is a popular plan in practice, and (b) the GREG estimator $\hat{T}_{GR}(1,1 : x^{\hat{\gamma}})$ is one that is used by conservative practitioners because it is approximately design-unbiased while still taking advantage of the $y$-$x$ relationship. The left and right panels show the ratios for estimators that use the true $\gamma$ and an estimated $\hat{\gamma}$, respectively. When the true $\gamma$ is used in estimation, but a pilot study is conducted to determine how to select the main sample, the most efficient method of sampling is ppstrat. In the (ppstrat, pilot) case, all estimators have about the same RMSE. The right-hand panel gives the more realistic comparisons among combinations that could be used in practice. Conducting a pilot study with strategy A (no rounding) followed by a ppstrat main sample yielded a 4 to 8% reduction in RMSE compared to the reference combination described above. Rounding in strategy C reduces the gains from doing a pilot. Weighted balance on an estimated $\hat{\gamma}$ has no advantage over the reference combination.

Figure 2. Ratios of RMSE's for estimators and sampling plans to the RMSE of $(\hat{T}_{GR}(1,1 : x^{\hat{\gamma}}), B, ppstrat)$, with estimated $\gamma$, $n = 100$. [Figure: left panel "Gamma Known", right panel "Estimated Gamma"; rows are the estimators TGR(x^g/2, x^g : x^g), TGR(x^g.hat/2, x^g.hat), TGR(1,1 : x^g), TGR(1,1 : x^g.hat), T(x^g/2, x^g : x^g), T(x^g.hat/2, x^g.hat), M(1,1 : x^g), M(1,1 : x^g.hat), and HT, plotted by sampling plan (ppstrat, ppswor, wtd bal) and strategy (A through D); horizontal axis: RMSE ratio, 0.95 to 1.10.]
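The reference estimator $\hat{T}_{GR}(1,1 : x^{\hat{\gamma}})$ is a GREG estimator. The sketch below assumes that the notation $(1,1 : x^{\gamma})$ denotes a working model with an intercept and a slope on $x$ and a variance proportional to $x^{\gamma}$. It uses the textbook GREG form $\hat{T}_{GR} = \sum_s d_i y_i + (\mathbf{T}_x - \hat{\mathbf{T}}_{x})^{\top}\hat{\mathbf{B}}$, with design weights $d_i = 1/\pi_i$ and $\hat{\mathbf{B}}$ a weighted least squares fit using weights $d_i / x_i^{\gamma}$. This is an illustration, not the authors' code. The name `greg_total` and its arguments are hypothetical.

```python
import numpy as np

def greg_total(y, x, pi, N, tx, gamma):
    """GREG estimate of the total of y under the assumed working model
    E(y) = b0 + b1*x, Var(y) proportional to x**gamma.

    y, x  -- sample values; pi -- inclusion probabilities
    N     -- population size; tx -- known population total of x
    """
    y, x, pi = (np.asarray(a, dtype=float) for a in (y, x, pi))
    d = 1.0 / pi                                  # design weights
    v = x ** gamma                                # assumed variance function
    X = np.column_stack([np.ones_like(x), x])     # intercept and slope terms
    W = (X * (d / v)[:, None]).T                  # rows of X weighted by d_i / v_i
    b = np.linalg.solve(W @ X, W @ y)             # weighted least squares B-hat
    ht = d @ y                                    # Horvitz-Thompson total of y
    ht_x = d @ X                                  # HT totals of (1, x)
    return ht + (np.array([N, tx]) - ht_x) @ b    # regression adjustment
```

A quick check: if $y$ is exactly linear in $x$, the estimator returns the population total without error. For the same reason, setting `y = x` returns the known total `tx`.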
5. General Conclusions, Limitations, and Future Considerations

We investigated some alternative strategies for sampling and estimation in populations where there is one target variable $y$, whose total is to be estimated, and one auxiliary $x$, which is known for every unit. The variance of $y$ increases as $x$ increases, but the exact form of the variance is unknown to the sampler.

We obtained ambiguous results on whether a pilot study, designed to get a preliminary estimate of $\gamma$, would be worthwhile. For our versions of the HMT population, the smaller pilot studies gave more negatively biased $\hat{\gamma}$'s, and the less variable $\hat{\gamma}$'s were the more biased ones on average. In the population we studied, conducting a pilot did not consistently give lower root mean square errors for totals than using only a main sample with an educated guess about the size of $\gamma$. Rounding $\hat{\gamma}$ to the nearest half was not particularly helpful or harmful in estimating totals. Small root mean square error improvements came from reducing the variability in the $\hat{\gamma}$'s, in strategies C and D, for the less variable (3/4) population, but the opposite was true in the more variable (2) population. Thus, when the focus is on estimating $\gamma$, a pilot study and rounding are not useful. But, if the focus is on estimating totals, a pilot, possibly with rounding, may offer slight MSE improvements, depending on population variability.
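This section does not show how $\hat{\gamma}$ is computed from the pilot sample. The sketch below uses one common approach, which is an assumption and not the authors' stated method. It fits $y$ on $x$ by least squares, then regresses the log squared residuals on $\log x$. Because $\mathrm{Var}(y) \propto x^{\gamma}$ implies $\log e^2 \approx \text{const} + \gamma \log x$, the slope estimates $\gamma$. The second function rounds $\hat{\gamma}$ to the nearest half, the rounding step discussed for strategy C. Both function names are hypothetical.

```python
import numpy as np

def estimate_gamma(y, x):
    """Rough pilot-sample estimate of gamma in Var(y) ~ sigma^2 * x**gamma.

    Hypothetical approach: least squares fit of y on x, then a regression
    of log squared residuals on log x; the slope estimates gamma.
    (Assumes no residual is exactly zero, which would make log undefined.)
    """
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)        # highest power first
    resid = y - (intercept + slope * x)
    g_hat, _ = np.polyfit(np.log(x), np.log(resid ** 2), 1)
    return g_hat

def round_to_half(g):
    """Round gamma-hat to the nearest half (exact halves round up)."""
    return np.floor(2.0 * g + 0.5) / 2.0
```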
is about 92.1% of that x^g:x^g) for (scheme, but B, ppstrat) is approximation to to probabilityThis can be to wtd bal ( whileis the mostusing no pilot. (1,1 : )ˆ ˆis ,ppstrat efficient scheme, but ppstratreality.x We studied three options that . the RMSE x ) T(x^g/2, of efficient bal ˆ ( proportional Th ˆ Tˆ x , B, ppstrat) approximation to x^g.hat)toreality.studiedstudied optionsoptions that wtd ,(ˆTB, (1,1xB, )ˆ)) ,the most estimator.approximationreality. WeWe studied three options thatx ˆ approximation We studied three three three that T(x^g.hat/2, reality. studied three while for M(1,1:x^g) for T(1,1 :GR: ) , :) :, the B, ppstrat) isapproximation the the RMSE ( for ˆ ) T displays ppstrat) is is is might be considered (for ) We options options ˆ for while whileRMSE the RMSE (1,1ˆ:( pageGR (1,1ˆinx x B, ppstrat) accomplished using to to reality. rule with problem: design of a the whilecompetitive. The xroundingppstrat) ratiosapproximation to reality. cum Wex this typewhether thattwo that ( TGR T ( (1,1 x strategy D is the Figure the the following x obtainedfor this for thisproblem: of problem: design of a ambiguous accomplished one of pilot a ( x )while 2 on RMSE HT GR GR ) GR very competitive. The We might considered typefor thisofon problem:or a design(ofxa) rule wit is very RMSE for M(1,1:x^g.hat) thisof this of of usingathe of of mightbe strategy D for type type designdesign cum a beconsidered results type of problem: considered type problem: design bemight considered for is leads an RMSE might be be in rounding population using the for1,289.02. That is, using using aaleads of to anvariousmightapproximationtostratum. studied studiedsample, thatselection the 1,289.02. That is, of ˆ(ˆpilotpilotleads toRMSE that that considered to We We a main total withthat a to reality. 1,289.02. aˆ (1,1 0.95 x B,leads an ppstrat) that units selected per getdesign of units the sample, stratum. that three and per either study, designed of design of three options of 1,289.02. 
RMSE2 using is, using:ˆxxRMSE’s)ppstrat) toan RMSE sample, pilotsample, We ofsample,mainsample, options of an of an ˆ That T is, whileleads 1,289.02. for 92.1%pilota:Tpilot,usingppstrat)tothat thepilot thatpilotpilotreality.areality.Estimateand1.05three and selection an an the total virtually ( for (1,1 leads 1.00 no pilot. while the toThat is, for the(TsamethataleadsB, ˆ B. leadsRMSEapproximationastoB.sample,mainstudiedsample,optionsandwouldof of an the RMSERMSE Thatis, using topilotRMSE isan approximation design Among preliminarysample, 1.10 andan that of )) :as ,to Among while 1,289.02. That of ( resultsapilot1.05B,1.10 isthe RMSEresultssample, designaofthe a ofaamain selectionselection Estimate the selected selection (1,1 xan virtually pilot sample, main 1.00estimate of , and is about 92.1% of that of, using no pilot. GR tono pilot. ˆof the is BLUP orpilot sample, designoftypeon a reasonable aaselection same be considered fordesign athismain of problem: design of a GR GRof estimator. our type mightbemight be GREG thisfor of problem: design of model is about a considered versions of or GREG estimator estimators 92.1% of of thatplans using pilot. might be considered For this type 0.95 problem:adesign of for is and model-basedusing to the ˆ RMSE of no ˆ is about about about sampling thatusing no( ratio , x ˆ : x ) andestimator. estimator.ˆ estimator based the HMT population, based on a reas of 92.1% of no pilot. T x / using is 92.1% thethat ofthat ofchoice pilot.2 is about 92.1% 2 on worthwhile. estimators, is,That is,aa pilotthepilotRMSE RMSE RMSE that ratios estimator. ˆ and BLUPratio estimator. ˆ / 2 estimator. obtained ambiguous results on whether a pilot 1,289.02. That is,Figure 2usingˆ thefollowing RMSEdisplayspilotfor the population aamain sample, and selection of an using 1,289.02. 
ThatonFigure thepilot leads to an displays the ratios ratios sample, of at x ) asample, and selection of an of an using on a following page displayspilot sample, design ,of : main Model (3.2), and selection leads to an to the that that 1,289.02. theonon2on thepage displays displays the the sample, design design of mainRMSE though incorrect, leads page ratios ratiospilot T ( x Wex hand. an displays the ratios ˆ choice following thethe FigureFigure Figure ˆthe following page the 2Figure ˆfollowing) B,estimators, 2 following ppstrat, combination of 2of usingno, pilot.page page model-basedWesmallerobtained ambiguous sample,results whether pilot pilot obtained ambiguousresults on whether studies more obtained ambiguous the results whetherhand. is about 92.1%forTthatxof2 usingxpopulation of RMSE’s ˆofthan various obtainedWeobtained inget for on weononestimateaand pilotpilot isfor the 92.1%of population :: ofˆnopilot. pilot.with estimated ˆvarious theWeWedesigned togaveacases whether on pilot ofa Model (3.2), thou about GREG92.1%TGR (1,1 population of RMSE’s /of the various We data ambiguous ambiguous population ˆwhether aapilot of the is for for ˆthe ˆ /populationare no RMSE’sworse of the still fit study, fairly well resultsresultsnegativea at’s This , would ˆestimator. examined. for that x 2 xpopulation of RMSE’s 2various) estimator. designed thangeta preliminary estimateof in ,would 22 RMSE’s the ( the aboutthe the( 2 ,that of )using of GREG of ofx the xvariousare study, designed preliminaryapreliminary estimatewellwould cases we ex of somewhat Tvarious estimator. 
somewhat designedto preliminary theestimate of , wouldthe RMSE’s ˆ the , thex for the2 GR 2 populationthe study, worse to the preliminary :study, study,study, onesa on get get still fit estimatefairly of , would designed to get to to average.estimate of lessofvariable designed get a a preliminary data , would , more Figure Figurethe on the combination plansselected as ratios We the biased ambiguous resultstoon In the aa pilotsome Figure 2 on 100. followingsampling was the GR the the of obtained worthwhile. For our onones onof theby 2 estimators on 2 This followingdisplays to ratios page page displays for n = the following page displays the ratiosRMSE general approach is whether ambiguous results versions whether be worthwhile. For estimators andplans toare of RMSE of RMSE We and sampling plans various the of theestimators sampling of sampling others, the the theof of of the Webeworthwhile.similarourresults a usedthepilota not others, populationand RMSE’s not to toRMSE RMSE the the notobtained ambiguousgeneral approachHMTsimilar to ones use although differences the to the to the extreme. estimators sampling plans plans although differences obtainedFor our versions of versions ofwhether population, estimators 2 population of and and population of plans the the various population extreme. our versions of the ofthe population, the the be be be is pilot and versions population, population, HMT for thefor estimators2(a) ppstratˆRMSE’s ˆ of the ofinRMSE study, designed towe that ForFor ourestimate of HMT HMT population, are worthwhile. preliminary the HMT pilot did 2 for the thecombinationsampling popularB, ppstrat, withstudy, study, worthwhile. get aour versions estimatewould, would reference since unrestrictedˆ(1,1 :RMSE’s various thebe worthwhile.firms aaFor pilotpreliminary ofmore studies. 
a xˆ ˆ) ,sampling was plan designed get studied, conducting the ,HMT population, get smaller conduct mean gave errors tion, the smaller pilot studies gave firms negative Incombinationunrestrictedis ppstrat,)sampling practice,estimated thetogive topreliminaryestimate ofmorenegative ˆˆˆ ’s andsegregatio all allcombinationof) ,TTˆ (1,1ˆ :B,xppstrat,estimated theestimated the smaller was root cost segregation,ˆwouldconduct’s and was In cases,ˆ of ˆ ˆof ˆT ppsworInwith ppstrat,with estimated designed lower thestudiesgave more of for the ’s cost cases, ppswor, ,B, ppstrat, estimated accountingsampling pilot studies gave more negative ˆ that ’s and B, GR combination of(1,1 (1,1 :x ppstrat, withwith negative B, combination ofsampling T(1,1 GR :plans )RMSE with unrestricted ppsworpilotsmaller gave more more negative and ˆ ’s square all of estimated the smaller studies study gave negative studies accounting estimators(b) performer,of: xplansxtoofthe B,RMSERMSEthat be worthwhile.smaller pilotpilot of the HMTcourse, ’s limited. and and estimatorscombination sampling ofˆwhetherˆ ) cases,the of isthe consistently biased studiesstudies of more negativeˆ variable of estimators TGR (1,1estimator ) , )whether and T GR : x , the smaller the pilotversions oneknown totalsthe simulation our versions HMT In In less and and GREG regardless T combinationwas known worthwhile. For our versionsonongave thepopulation, study is, the for = 100. GR the to poorestand samplingGRplansThis poorestthe isof the regardless of worthwhile.was known of is, average.population, less variable be poorest performer,nregardless to GR (1,1 : x performer,selected asbeAny more ourFor onesmainaverage. withthe population,variable of cou Any was selected as andthan For biased ones on theaverage.HMT the less variable the more biased ones on average. simulation less usingonesaverage. sample thanthe educated only a well-behaved In an the variable whether biased onon average. 
In lessInvariable variable the the ˆ that HMT more more biased are gave gave the biasedmore ˆ less for n = for for=of =:ˆxxˆ100. :This, B, with estimated selected as forof Tfornn = 100. ˆ(1,1 This combinationwas the as as the more pilot we average. conductingless ’s This combination was wasselected Populations biased less more , combination 100.n 100.:Tcombinationcombinationselected asthe thethe more ones ononesonesmoreInnegativetheˆIn andpilot well-behaved tha combination ofn TGRThis =100. B,practitioners was with estimated smallersmallerstudies studiesconducting a negativea not consis- not ) waswith estimated Populations ˆ ’s and may population or combination (1,1 since (a) ppstrat, is a popular plan inthe smaller pilot studies gave studied,negative that aareˆpilot did not estimated.ˆGR (1,1 This combination because it is guess pilot we studied, studied, conducting’s a pilot and population or estimated. conservative, B,xppstrat,ppstrat, selectedthepractice, about the sizeweconducting morepilottopilot nearestdid not used byreference GR )) the ppstrat is a popular plan in practice, differentwewe of studied, a populations,less not population studied,Rounding conducting pilot or popularin practice, practice, population studied, we . conducting pilotdid not did did estimated. plan in in population population studied,Accounting ˆ a a did did in not we yield population results. conducting square errorsAccounting pop reference(a) ppstratppstrat since (a) reference since ppstrat a popular plan reference 100. =This design-unbiased awhile plan inˆ takingpractice, biasedlower root average.the Inless variablethe totals the reference is results. 
In root less biased give on units rootyieldharmful for nreference (a)since (a)iscombination a popular the asmoreparticular,not particularly helpfulmean mean squarethe thefor the approximately ppstrat (a) ppstrat is was selected one that was consistentlylower meanInroot different for variable the on the for for n =for n sincethe GREGpopular selectedstill the is morehalf is consistentlyaverage. rootsquareerrorssquareerrors for the = since This combinationis was plan 100. 100. combination was selected as ˆ practice, biased ones lower give lower extreme valuesestimating for as tently ones ones give errors that This a the more give on average.lowersquaremean inerrors errors often consistently consistently root with orthe mean variable consistently have give lower and (b) the GREG estimator ˆˆˆ ˆ(1,1 : xˆ is one that is totals give using meana root square less need with extreme val lower ˆ andConclusions, Limitations, oneone ispopulation studied,ausing only a aa withsample errorsannot (b)the GREGˆ estimator) (1,1oneˆ :)that isthat estimator and(1,1 :xis) is populationconsistently giveconducting mean square withforforeducated than main sample pilota did not an main often with ˆ: plan TGRpractice, right is using we studied, onlya main sample with anguess and (b) estimator Limitations, TisGRleftisand))and isthatthanwe studied, using only particular,samplehave unitseducated T x(1,1 one population we x and5. General(a) ppstrat estimator5. x The )Future (b) and of GREG relationship. T :in pilot educated and (b) Conclusions, popular advantage (a) the conducting reference the GREG GREG aa popularpopular plan in practice, special Small root mean squaremain with an did an ancameeducated  referencethethe y-x is estimatorGeneral Conclusions, Limitations, than thanaonly a samplemain anwith not the General ppstrat is TGR (1,1 plan GR x totalsthan reference since(b)since (a) ppstrat is aTGRˆ(1,1in :practice, that totals. 
using usingboth when errorsample pilotand educated since improvementsdid totals totals totals only conductingestimating educated than than only main main sample with ˆ educated totals and Futureonly a using GR used ratios for it is treatment treatment the Considerations by conservative practitioners because itit is give lower root mean of .. square for the for the nearest about the size of for root square errors errors to the panels show the by conservative ˆ ˆpractitioners true itconsistently give size oflowermeanofmean ˆˆerrorsˆ the nearestwhen estimating use the because is about guess aboutroot Rounding Rounding ˆ both the nearest usedGREG conservativexx )) thatone that isbecause fromis about give variability square Rounding toˆˆthe nearest practitioners is that guessconsistentlylower size . size special Rounding theto C nearest ˆ because used conservativeestimators becauseis oneis consistently guesssize the the.ofsize. in.Rounding in strategiesthehalf by to . usedFuturethe estimator ˆˆ (1,1:T (1,1 : one because itis guess the theof the of by GREG practitioners is reducing about guess to the nearest and (b)andusedapproximately TGR (1,1practitioners while stillis population total.about size Rounding ’s, to to the (b)byby conservative Considerationsit Considerations is x guess about a main samplethe Rounding ˆnearest and (b)usedGREG estimator TGR practitioners ) that the conservative estimator :ˆ totals than than usingaonly main with with an educated than GR population educated was particularly with approximately design-unbiased design-unbiased taking totalstaking using only notparticularly helpfuloreducated inthe and an estimated design-unbiased while while takingtaking not halfonly considerations harmfulanorestimatinginestimating true while used in approximately design-unbiased approximately design-unbiasedthewhile gammawhile stillhalfand Somehalfwas not mainhelpful helpfulanorharmful estimating approximately . 
design-unbiased still in still still taking usingparticularly a samplesample harmful),estimating approximately ofWhen relationship. still left takingtotalswasfuture particularly or couldharmful 4variations was forparticularly helpful helpful to the harmful in estimating not size particularly helpfulhelpful total.in in harmful estimating halfnotless / D, the the notofvariable population harmfulnearest (include in estimating but washalfabout was not..particularly orˆ or in ˆ 3nearest nearest advantage the alternative strategies and half used We investigated ofpractitioners because itThe leftguess right the size of size root mean square errorto the is ˆ used by conservative relationship.relationship.and The leftand alternative strategies of . Rounding the improvements came by used ofadvantage some y-x relationship. Theanditright about was Small Rounding Some future considerations could inclu by guess right Rounding or to advantage practitioners Theinvestigatedis for is guess advantage conservative the the y-xis We because leftrightsome onright totals. size.root rootsquare improvements came came came left because advantage conservative relationship.The itfor advantagey-x somethe practitioners Thedetermine and about root root root (2002)error improvements y-x y-x right totals. estimation,theof athe of study for strategies left and totals. totals. totals.the square mean error improvements camecame but of y-x ratios conducted taking Small for We investigatedpilotthe relationship.while totaking taking opposite totals.mean mean error theerror improvements the was root samplingpanelsestimation ratioswhileestimatorsstill sam- truethe particularly Brewerorsquaresuggests size. 
The left and right panels show the ratios for estimators that use the true γ and for estimators that use an estimated γ̂. The patterns are not the same in the two populations: in some comparisons, the opposite of what held in the less variable population (γ = 3/4) was true in the more variable population (γ = 2).

When a pilot study is conducted to determine how to select the main sample, the most efficient method of sampling is ppstrat. In the (ppstrat, pilot) case, all estimators have about the same RMSE.
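The paper's own estimator of γ is defined in an earlier section that is not reproduced here. As an illustration only, the sketch below assumes the heteroscedastic ratio model E(y | x) = βx, Var_M(y | x) = σ²x^γ. It uses one common approach, which is not necessarily the authors': fit β by least squares through the origin, then regress the log of the squared residuals on log x, and take the slope as the estimate of γ. The function name `estimate_gamma` is hypothetical.

```python
import numpy as np


def estimate_gamma(x, y):
    """Estimate gamma in the ASSUMED model Var(y | x) = sigma^2 * x**gamma.

    Hypothetical helper: fits y = beta * x by least squares through the
    origin, then regresses log(residual**2) on log(x).  The slope of that
    second regression estimates gamma.  Requires x > 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta_hat = np.sum(x * y) / np.sum(x * x)  # OLS slope through the origin
    resid = y - beta_hat * x
    # Floor the squared residuals so an exact zero does not produce log(0).
    log_e2 = np.log(np.maximum(resid ** 2, 1e-12))
    slope, _intercept = np.polyfit(np.log(x), log_e2, deg=1)
    return float(slope)


if __name__ == "__main__":
    rng = np.random.default_rng(12345)
    x = rng.uniform(1.0, 100.0, size=2000)
    for gamma in (0.75, 2.0):
        # Simulated population with a known gamma (illustrative values only).
        y = 3.0 * x + 0.5 * x ** (gamma / 2) * rng.standard_normal(x.size)
        print(gamma, round(estimate_gamma(x, y), 2))
```

The constant E[log z²] term is absorbed by the intercept, so only the slope matters. With pilot-sized samples the slope has a large sampling variance, which is one reason the choice of pilot design matters when the goal is γ itself.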
The right-hand panel gives the more realistic comparisons among combinations that could be used in practice.
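The comparisons here and in Tables 3 and 4 are summarized by root mean square error and by 95-percent confidence interval coverage over simulation replicates. The design-based (D), basic model (B), and leverage-adjusted (L) variance estimators are defined elsewhere in the paper, so this sketch simply takes each replicate's standard error as given. It shows the two summaries and an RMSE ratio against a reference combination. The function names are hypothetical and the toy numbers are made up.

```python
import numpy as np


def rmse_and_coverage(estimates, std_errors, true_total, z=1.96):
    """Summarize simulation replicates for one strategy/estimator pair.

    Hypothetical helper.  Returns the root mean square error around the
    known population total and the percentage of nominal 95% intervals
    (estimate +/- z * SE) that contain that total.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    rmse = float(np.sqrt(np.mean((est - true_total) ** 2)))
    covered = np.abs(est - true_total) <= z * se
    return rmse, float(100.0 * covered.mean())


def rmse_ratio(rmse_strategy, rmse_reference):
    """A ratio below 1 means the strategy beats the reference combination."""
    return rmse_strategy / rmse_reference


if __name__ == "__main__":
    # Toy replicates (made-up numbers, not the paper's results).
    rmse, cov = rmse_and_coverage([9.0, 11.0, 10.0, 14.0], [1.0] * 4, 10.0)
    print(rmse, cov)  # errors -1, 1, 0, 4 -> RMSE sqrt(4.5), coverage 75.0
    # A 4 to 8% RMSE reduction corresponds to a ratio between 0.92 and 0.96.
    print(rmse_ratio(92.0, 100.0))
```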
A pilot followed by a ppstrat main sample yielded a 4 to 8% reduction in RMSE compared to the reference combination described above. Rounding in strategy C reduces the gains from doing a pilot.
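ppstrat denotes stratification based on cumulative x (or x̂) rules. The sketch below shows one reading of such a cum(x) rule. Units are sorted by x, the running total of x is cut into H equal parts, and a simple random sample without replacement of one or two units is drawn in each stratum. Every stratum carries about the same total of x, so strata made of large units contain fewer units. The inclusion probability n_h/N_h therefore rises with x, which approximates probability-proportional-to-x sampling. The exact boundary rule and within-stratum design used in the paper may differ, and the names `cum_x_strata` and `select_ppstrat` are hypothetical.

```python
import numpy as np


def cum_x_strata(x, n_strata):
    """Label units so each stratum holds roughly the same total of x.

    Hypothetical cum(x) rule: sort by x, then cut the running total of x
    at equal fractions of the grand total.  Returns labels 0..n_strata-1.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)
    cum = np.cumsum(x[order])
    bounds = cum[-1] * np.arange(1, n_strata) / n_strata
    labels = np.empty(x.size, dtype=int)
    labels[order] = np.searchsorted(bounds, cum, side="left")
    return labels


def select_ppstrat(x, n_strata, n_per_stratum=2, seed=None):
    """Draw an SRSWOR of n_per_stratum units within each cum(x) stratum.

    Assumes simple random sampling within strata.  Returns sampled indices
    and design weights N_h / n_h; a stratum with fewer units than
    n_per_stratum is taken with certainty.
    """
    rng = np.random.default_rng(seed)
    labels = cum_x_strata(x, n_strata)
    sample, weights = [], []
    for h in np.unique(labels):
        units = np.flatnonzero(labels == h)
        n_h = min(n_per_stratum, units.size)
        chosen = rng.choice(units, size=n_h, replace=False)
        sample.extend(chosen.tolist())
        weights.extend([units.size / n_h] * n_h)
    return np.array(sample), np.array(weights)


if __name__ == "__main__":
    x = np.random.default_rng(1).lognormal(mean=3.0, sigma=1.0, size=500)
    idx, w = select_ppstrat(x, n_strata=25, n_per_stratum=2, seed=7)
    print(len(idx), w.sum())  # weights sum to the population size N = 500
```

Taking one or two units per stratum keeps the design "highly restricted" while leaving it probability-based. Two units per stratum also allow a within-stratum variance estimate.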
If no pilot is conducted (strategies B and D), then ppstrat(x) is the most efficient scheme, but wtd bal(x) is very competitive. The rounding in strategy D leads to virtually the same results as in B. Among the estimators, T̂(x̂/2, x̂ : x̂) and the GREG estimator T̂_GR(x̂/2, x̂ : x̂) are somewhat worse than the others, although the differences are not extreme.
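To make the BLUP/GREG contrast concrete, here is a sketch for a single-regressor model without intercept, E_M(y_i) = βx_i and Var_M(y_i | x_i) = σ²x_i^γ, with γ treated as known (in practice an estimate γ̂ would be plugged in). This is an assumed simplification: the paper's Model (3.2) and its T̂(· : ·) notation are defined in an earlier section and may include more terms. The BLUP adds to the sample sum a prediction for the nonsample units from the weighted-least-squares slope. The GREG estimator adjusts the design-weighted total toward the known population total of x. All function names are hypothetical.

```python
import numpy as np


def wls_slope(y, x, gamma, w=None):
    """Weighted LS slope for the ASSUMED model y = beta*x, Var(y) ∝ x**gamma."""
    w = np.ones_like(x) if w is None else w
    v = x ** gamma
    return np.sum(w * x * y / v) / np.sum(w * x * x / v)


def blup_total(y_s, x_s, x_total, gamma):
    """Model-based BLUP: sample sum plus predicted total of nonsample units."""
    y_s = np.asarray(y_s, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    beta = wls_slope(y_s, x_s, gamma)
    return float(y_s.sum() + beta * (x_total - x_s.sum()))


def greg_total(y_s, x_s, w_s, x_total, gamma):
    """GREG: design-weighted total adjusted toward the known total of x."""
    y_s = np.asarray(y_s, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    w_s = np.asarray(w_s, dtype=float)
    b = wls_slope(y_s, x_s, gamma, w_s)
    return float(np.sum(w_s * y_s) + b * (x_total - np.sum(w_s * x_s)))


if __name__ == "__main__":
    x_s = np.array([10.0, 40.0, 90.0])
    y_s = 2.5 * x_s                   # data that follow the model exactly
    w_s = np.array([20.0, 8.0, 4.0])  # illustrative design weights
    print(blup_total(y_s, x_s, 1000.0, gamma=2.0))       # 2500.0
    print(greg_total(y_s, x_s, w_s, 1000.0, gamma=2.0))  # 2500.0
```

When the data follow the model exactly, both estimators return βX. With γ = 1 the BLUP reduces to the ratio estimator (Σ_s y / Σ_s x)·X, which is a quick check on the algebra.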
When the focus is on estimating γ, conducting a pilot study and rounding are not useful. But, if the focus is on estimating totals, a pilot, possibly with rounding, may offer slight MSE improvements, depending on the population variability.

Among the sampling plans we considered, both stratification schemes based on cumulative x or x̂ rules, denoted ppstrat here, were reasonably efficient. The use of wtd bal samples based on x̂'s was not effective in reducing the root mean square errors of totals.

A good overall strategy for this type of problem appears to be the following. Select a highly restricted sample with probability proportional to x. This can be accomplished using the cum(x) rule with one or two units selected per stratum. Estimate the total with either a BLUP or a GREG estimator based on a reasonable model for the population at hand. Model (3.2), though incorrect, still fit the data fairly well in the cases we examined. This general approach is similar to ones used by some accounting firms that conduct cost segregation studies.

Any simulation study is, of course, limited. Populations that are less well-behaved than HMT may yield different results. Accounting populations, in particular, often have units with extreme values that need special treatment both when estimating γ and when estimating the population total. Future work could consider variations in the sample size. Brewer (2002) suggests 1,000 as the minimum sample size for estimating γ with "any reasonable amount of precision." However, in accounting applications, the real interest is in performance in small samples: pilots of n = 10 and main studies of n = 50, or even less, are typical. In such cases, weighted balanced samples and model-based estimators may have advantages.

References

Allen III, W. E. and Foster, M. B. (2005), "Cost segregation applied," Journal of Accountancy, http://www.aicpa.org/pubs/jofa/aug2005/allen.htm.

Batcher, M. and Liu, Y. (2002), "Ratio Estimation of Small Samples Using Deep Stratification," ASA Proceedings of the Business and Economic Statistics Section, pp. 65-70.

Brewer, K. (2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold, a member of the Hodder Headline group.

Bryant, E.C.; Hartley, H.O.; and Jessen, R.J. (1960), "Design and Estimation in Two-Way Stratification," Journal of the American Statistical Association, 55, pp. 105-124.

Cochran, W. (1977), Sampling Techniques, 3rd edition, John Wiley & Sons, pp. 124-126.

Foreman, E. K. (1991), Survey Sampling Principles, Marcel Dekker, Inc., New York.

Godfrey, J.; Roshwalb, A.; and Wright, R. (1984), "Model-based stratification in inventory cost estimation," Journal of Business & Economic Statistics, 2, pp. 1-9.

Hansen, M.H.; Madow, W.G.; and Tepping, B.J. (1983), "An evaluation of model-dependent and probability-sampling inferences in sample surveys," Journal of the American Statistical Association, 78, pp. 776-793.

Hartley, H.O. and Rao, J. N. K. (1962), "Sampling with Unequal Probabilities and without Replacement," Annals of Mathematical Statistics, 33, pp. 350-374.

Roshwalb, A. (1987), "The estimation of a heteroscedastic linear model using inventory data," ASA Proceedings of the Business and Economic Statistics Section, pp. 321-326.

Särndal, Swensson, and Wretman (1992), Model Assisted Survey Sampling, Springer-Verlag, New York.

Sitter, R. R. and Skinner, C. J. (1994), "Multi-Way Stratification by Linear Programming," Survey Methodology, 20, pp. 65-73.

Strobel, C. (2002), "New rapid write-off provisions for tangible and intangible assets," Journal of Corporate Accounting and Finance, 13, pp. 99-101.

Valliant, R. (2002), "Variance estimation for the general regression estimator," Survey Methodology, 28, pp. 103-114.

Valliant, R.; Dorfman, A.H.; and Royall, R.M. (2000), Finite Population Sampling and Inference: A Predictive Approach, John Wiley & Sons, New York.

Wolter, K. (1985), Introduction to Variance Estimation, Springer-Verlag, New York.
Table 3—Root Mean Square Error and 95-Percent Confidence Interval Coverage using Design-Based (D), Basic Model (B), and Leverage-Adjusted Model (L) Variances, γ = 3/4, n = 100, for All Strategies

A ppstrat wtd bal 134.86 100.0 133.28 94.8 94.9 134.21 94.8 95.1 135.24 95.4 95.6 135.15 95.4 95.6 137.67 94.1 94.7 137.98 94.9 95.1 150.80 95.3 95.4 150.75 95.3 95.4 168.24 93.9 94.3 154.78 95.0 95.3 168.46 94.0 94.4 165.68 93.7 94.0 153.98 95.3 95.4 141.81 94.4 95.0 153.12 95.0 95.2 144.09 95.7 95.8 149.22 94.4 94.7 138.54 94.5 94.8 139.54 94.4 95.0 137.47 94.8 95.0 137.19 95.4 95.6 148.19 94.6 94.7 153.69 94.0 94.4 149.61 93.8 94.3 138.72 95.0 95.0 139.39 94.5 95.1 137.70 94.8 94.9 137.85 95.3 95.6 139.73 94.6 94.8 139.78 94.9 95.0 144.89 95.7 95.9 141.72 94.7 95.2 155.59 94.0 94.3 144.24 94.8 95.5 139.49 94.4 94.9 139.05 95.2 95.3 147.85 94.3 94.4 138.22 94.3 94.6 139.02 95.5 95.6 141.16 94.9 95.2 141.25 94.9 95.2 143.34 95.0 95.5 143.14 95.0 95.6 155.89 94.0 94.3 144.42 94.8 94.9 139.94 95.1 95.1 139.78 95.2 96.0 146.93 94.1 94.4 138.18 95.3 95.7 139.14 95.5 95.6 155.89 94.0 94.3 155.59 94.0 94.3 149.92 93.8 93.9 149.26 94.5 94.6 159.06 94.2 94.5 158.61 94.0 94.6 149.10 94.4 94.6 138.55 94.5 94.8 139.52 94.3 95.1 137.50 94.7 95.1 140.13 94.1 94.4 138.28 94.3 94.5 136.92 94.5 95.1 149.10 94.4 94.6 138.55 94.5 94.8 144.42 94.8 94.9 144.24 94.8 95.5 138.80 94.8 95.0 138.51 94.5 94.8 147.89 94.3 94.5 145.58 94.6 94.8 139.52 94.3 95.1 139.94 95.1 95.1 139.49 94.4 94.9 139.12 94.6 95.0 139.52 94.4 95.0 141.51 95.2 95.6 139.32 94.4 94.9 148.36 94.1 94.4 138.29 94.8 95.2 138.77 94.4 94.6 136.98 94.6 95.0 140.33 94.1 94.4 138.29 94.4 94.7 136.27 94.6 95.0 148.36 94.1 94.4 138.29 94.8 95.2 138.77 94.4 94.6 136.98 94.6 95.0 137.50 94.7 95.1 139.78 95.2 96.0 139.05 92.5 95.3 137.23 94.5 94.8 137.48 94.9 95.0 142.63 95.0 95.2 140.97 95.5 95.7 471.34 94.0 207.98 95.4 139.56 94.5 138.40 99.8 259.02 95.1 138.93 94.4 138.44 100.0 471.34 94.0 207.98 95.4 139.56 94.5 138.40 99.8
srswor ppswor wtd bal ppswor wtd bal srswor ppswor wtd bal 138.93 94.4 138.29 94.4 94.7 138.28 94.3 94.5 138.18 95.3 95.7 138.22 94.3 94.6 142.82 94.2 94.8 143.07 94.5 94.7 145.03 95.9 96.0 143.42 94.5 94.8 B ppstrat C ppstrat D ppstrat T̂ Strategy Design ppswor T̂(1,1 : x) RMSE 95% CI – D 259.02 94.9 T̂_GR(1,1 : x) RMSE 95% CI – B 95% CI – L 140.33 94.1 94.4 RMSE 95% CI – D 95% CI – L 140.13 94.1 94.4 T̂(x/2, x : x) RMSE 95% CI – B 95% CI – L 146.93 94.1 94.4 T̂_GR(x/2, x : x) T̂(1,1 : x̂) RMSE 95% CI – D 95% CI – L 147.85 94.3 94.4 T̂(x̂/2, x̂ : x̂) RMSE 95% CI – B 95% CI – L T̂_GR(1,1 : x̂) RMSE 95% CI – D 95% CI – L 137.87 94.3 94.5 138.15 94.5 94.7 T̂_GR(x̂/2, x̂ : x̂) RMSE 95% CI – B 95% CI – L 156.26 94.9 94.9 RMSE 95% CI – D 95% CI – L 158.14 94.5 94.7

Table 4—Root Mean Square Error and 95-Percent Confidence Interval Coverage using Design-Based (D), Basic Model (B), and Leverage-Adjusted Model (L) Variances, γ = 2, n = 100, for All Strategies

wtd bal 1247.28 94.2 1243.64 93.6 94.0 1243.09 93.7 93.9 1247.71 93.7 93.9 1248.23 93.7 93.9 1281.47 92.2 92.4 1275.54 92.0 92.3 1271.88 93.0 93.5 1274.71 93.5 93.7 1531.53 91.8 92.3 1347.67 93.5 94.0 1295.43 93.0 93.4 1536.34 91.7 92.0 1359.13 93.7 93.9 1327.38 93.0 93.5 1314.23 93.3 93.9 1276.47 93.1 93.6 1482.33 92.3 92.5 1311.79 93.7 93.8 1289.02 93.3 93.3 1268.02 93.6 93.7 1272.01 93.4 93.6 1293.69 94.1 94.4 1275.41 94.1 94.2 1358.54 91.8 91.8 1324.69 93.3 93.5 1276.81 93.2 93.2 1237.23 94.1 94.2 1267.80 93.4 93.7 1246.20 94.8 95.1 1230.47 94.8 95.3 1242.82 93.8 93.9 1243.26 93.9 94.0 1544.90 93.1 93.7 1370.79 94.1 94.1 1295.43 93.0 93.3 1268.39 93.5 93.8 1284.85 92.6 93.0 1204.91 94.3 94.6 1279.76 92.7 93.2 1252.81 93.6 93.8 1239.79 93.6 93.8 1250.31 93.4 93.4 1249.69 93.5 93.5 1594.75 92.8 93.3 1345.67 94.2 94.6 1311.31 93.2 94.2 1287.59 94.4 94.6 1276.01 92.4 92.8 1205.81 94.4 94.5 1279.80 93.7 93.9 1595.21 92.8 93.3 1545.43 93.1 93.7 1364.39 91.8 92.0
5  Tax Benefits and Administrative Burdens, Recent Research from the IRS
Gangi, Henry, Raub, Scoffic, Chu, Kovalick

Factors in Estates’ Utilization of Special Tax Provisions for Family-Owned Farms and Closely Held Businesses
Martha Eller Gangi, Kimberly Henry, and Brian G. Raub, Internal Revenue Service

With the enactment of several legislative provisions, the U.S. Congress has sought to protect family-owned farms and closely held businesses by lessening the burden of the Federal estate tax, a progressive tax on the transfer of wealth at death. These provisions have included special use valuation (the valuation of property at its actual use in a family enterprise rather than at its full market value), the qualified family-owned business deduction, and the deferral of Federal estate tax liabilities [1]. Special use valuation and the qualified family-owned business deduction each reduce the taxable estate, the amount to which graduated estate tax rates are applied, and, ultimately, an estate’s tax liability. The deferral provision allows an estate to defer the portion of estate tax that is attributable to the decedent’s closely held business and pay the balance in installments.

In this paper, we present a brief description of Federal estate tax law in effect for the estates of 2001 decedents, as well as an examination of the three business provisions available to these estates. In addition, we present logistic regression models that examine the relationship between the use of each business provision and other estate characteristics. We also discuss the potential for future research.

This paper is an extension of our earlier research that examined the subpopulations of estates that utilize each of the three business provisions and compared them to the subpopulations of estates that do not utilize the provisions [2]. This earlier research also includes a detailed examination of the asset composition of estates in each of the subpopulations, as well as an examination of estates’ liquidity, that is, the financial capacity of estates to meet Federal estate tax responsibilities and other debts, including mortgages and liens, with only accumulated liquid assets.

For decedents who died in 2001, about 1,800 estates, or 1.7 percent of the estate tax decedent population, elected to use at least one of the three special business provisions. A total of 831 estates elected special use valuation, alone or in combination with the business deduction or deferral of estate taxes; 1,114 estates claimed the qualified family-owned business deduction, alone or in combination with special use valuation or deferral of taxes; and 382 estates elected to defer estate taxes, alone or in combination with the other two business provisions. Figure A shows the elections and combinations of elections employed by estates of 2001 decedents. Of the estates that elected at least one provision, the most common election was the qualified family-owned business deduction alone, claimed by 656 estates. The second most common was special use valuation alone, elected by 425 estates. Estates elected both special use valuation and the qualified family-owned business deduction in 332 cases. Only 21 estates elected all three provisions. Some differences by size of gross estate are notable. Of those estates that utilized a special business provision, smaller estates tended to elect only the qualified family-owned business deduction, while larger estates tended to elect only the deferral of taxes.

Figure A—Election of Special Business Provisions [1], by Size of Total Gross Estate

                                    Size of total gross estate
Election of business provisions     All estates  Small        Medium       Large        Very Large
Total number of estates             108,331      93,322       9,977        3,454        1,578
(1) No elections                    106,519      91,892       9,769        3,329        1,529
(2) SUV only                        425          385          28           12**
(3) QFOBI only                      656          578          52           21           5
(4) DOT only                        221          99           39           55           28
(5) SUV & QFOBI                     332          303          25           4**
(6) SUV & DOT                       52           28           14           10**
(7) QFOBI & DOT                     105          25           44           20           16
(8) SUV, QFOBI & DOT                21           12           6            3**
Size classes: Small ($675,000 under $2.5 million); Medium ($2.5 million under $5 million); Large ($5 million under $10 million); Very Large ($10 million or more).
[1] Special use valuation is abbreviated as SUV, the qualified family-owned business interest deduction is abbreviated as QFOBI, and the deferral of taxes is abbreviated as DOT.
** Data combined to prevent disclosure of individual taxpayer data (large and very large estates shown together).

Federal Estate Tax Law and the Decedent Population
The estate of a decedent who, at death, owns assets valued in excess of the estate tax applicable exclusion amount, or filing threshold, must file a Federal estate tax return, Form 706, U.S. Estate (and Generation-Skipping Transfer) Tax Return. For decedents who died in 2001, the exclusion amount was $675,000. For estate tax purposes, property included in the gross estate is valued at its fair market value (FMV), defined as “the price at which the property would change hands between a willing buyer and a willing seller, neither being under any compulsion to buy or to sell and both having reasonable knowledge of all relevant facts,” according to Treasury Regulation section 20.2031-1(b) under the Internal Revenue Code (IRC) [3]. The gross estate consists of all property, whether real or personal, tangible or intangible, including “all property in which the decedent had an interest at the time of his or her death and certain property transferred during the lifetime of the decedent without adequate consideration; certain property held jointly by the decedent with others; property over which the decedent had a general power of appointment; proceeds of certain insurance policies on the decedent’s life; dower or curtesy of a surviving spouse; and certain life estate property for which the marital deduction was previously allowed” [4]. Specific items of gross estate include real estate, cash, stocks, bonds, businesses, and decedent-owned life insurance policies, among others.
Assets of gross estate are valued at the decedent’s date of death, unless the estate’s executor or administrator elects to value assets at an alternate valuation date, 6 months after the date of death, as described in IRC section 2032. Alternate valuation may be elected only if both the value of the gross estate and the estate tax are reduced between the date of death and the alternate date. The estate tax return is due 9 months after the date of the decedent’s death, although a 6-month filing extension is allowed.

In 2001, an estimated 108,330 individuals died with gross estates above the estate tax exclusion amount. These decedents owned more than $198.8 billion in total assets and reported almost $20.8 billion in net estate tax liability. Decedents for whom an estate tax return was filed represented 4.6 percent of all deaths that occurred for Americans during 2001, according to vital statistics data collected by the U.S. National Center for Health Statistics. The 49,845 estate tax decedents for whom a tax liability was reported represented 2.1 percent of the American decedent population for 2001 [5].

Data Sources and Limitations
The Statistics of Income Division (SOI) of the Internal Revenue Service (IRS) collects and publishes data from samples of administrative tax and information records. With its annual Estate Tax Study, SOI extracts demographic, financial, and asset data from Federal estate tax returns. These annual studies allow production of a data file for each filing, or calendar, year. By focusing on a single year of death for a period of 3 filing years, the study also allows production of periodic year-of-death estimates. A single year of death is examined for 3 years because 99 percent of all returns for decedents who die in a given year are filed by the end of the second calendar year following the year of death [6].
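The valuation and filing dates described above reduce to calendar-month arithmetic. The Python sketch below illustrates it under stated simplifications: when the target month has no matching day (for example, August 31 plus 6 months), it falls back to that month's last day, and weekend or holiday adjustments are ignored. The helper names `add_months` and `key_dates` are hypothetical. Under these simplifications, even a decedent who died on the last day of 2001 has an extended due date in 2003, the last of the three filing years covered by the study.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the date `months` calendar months after `d`.

    Simplifying assumption: if the target month has no numerically
    corresponding day, use the last day of that month.
    """
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def key_dates(date_of_death: date) -> dict:
    """Illustrative estate tax dates for one decedent (weekends/holidays ignored)."""
    due = add_months(date_of_death, 9)  # return due 9 months after death
    return {
        "alternate_valuation": add_months(date_of_death, 6),  # IRC section 2032 date
        "return_due": due,
        "extended_due": add_months(due, 6),  # after the 6-month filing extension
    }


# A decedent who died on the last day of 2001:
dates = key_dates(date(2001, 12, 31))
for name, d in dates.items():
    print(f"{name:>20}: {d.isoformat()}")
```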
The Estate Tax Study for the period 2001-2003 concentrates on Year-of-Death 2001, the year of death for which weighted estimates are presented in this paper [7]. Unweighted year-of-death records for decedents who died in 1998, collected during Filing Years 1998-2000, are also used in the section entitled “Logistic Regression Models.”

Special Use Valuation
With the Tax Reform Act of 1976, Congress protected U.S. farms and closely held businesses by providing for special use valuation of decedents’ interests in real property devoted to such businesses. For estate tax purposes, the value of property included in gross estate, including real property, is generally its fair market value based on the property’s potential “highest and best use.” However, for real property that is used by the decedent or a family member in a farm or other business as of the decedent’s date of death, as well as in 5 of the 8 years preceding death, the executor may elect to value such property at its “qualified,” or actual, use in the business, if certain requirements are met. According to the IRC, the term “family member” may include any ancestor of the decedent; the spouse of the decedent; a lineal descendant of the decedent, of the decedent’s spouse, or of the decedent’s parent; or the spouse of any such lineal descendant.
In order for an estate to elect special use valuation (SUV), several other conditions must be met: the real property must be transferred from the decedent to a qualified family member of the decedent; at least 25 percent of the adjusted value of the gross estate must consist of real property, where adjusted value is defined as fair market value less any debts against the property; at least 50 percent of the adjusted value of the gross estate must consist of real and other business property; and the estate must consent to payment of an additional estate tax, the “recapture tax,” if, within 10 years of death, the property is sold to an unqualified heir, the property is no longer used for a qualified purpose, or the qualified heir ceases to fully participate for more than 3 years in any 8-year period. For estates of decedents who died in 2001, the maximum allowed reduction in value from fair market value to special use value was $800,000 [8].

For 2001, an estimated 831 estates elected SUV for real property (see Figure B). Although this accounted for only 0.8 percent of all estates, it represented about 5.3 percent of estates that reported closely held or agribusiness assets, i.e., those estates that were potentially qualified to elect special use. Of those 831 estates, about half (405 estates) made protective elections of special use. An estate’s executor may make a protective election if he or she must file a Federal estate tax return before the final determination of whether real property qualifies as special use property. As such, the election is contingent upon the property’s value as finally determined. Estates with protective elections do not separately report fair market and qualified use values for real property. Smaller estates were more likely to claim this provision than their larger counterparts.
As shown in Figure B, about 0.8 percent of small estates (those with less than $2.5 million in total gross estate) claimed SUV, while only 0.3 percent of their very large counterparts used the provision. Reported fair market value for qualifying property was $377.2 million; at special use value, the same property was valued at $189.0 million.

Figure B—Number of Estates, Estates with Potentially Qualifying Assets, and Number that Elected SUV, by Size of Total Gross Estate

Size of total gross estate               Total number  Estates with potentially  Estates that  CV [1]
                                         of estates    qualifying assets         elected SUV
                                         (1)           (2)                       (3)           (4)
All estates                              108,330       12,683                    831           12.6%
Small ($675,000 under $2.5 million)      93,321        10,925                    728           14.1%
Medium ($2.5 million under $5 million)   9,977         1,102                     74            27.1%
Large ($5 million under $10 million)     3,449         442                       23            28.1%
Very Large ($10 million or more)         1,583         214                       5             8.3%
[1] Coefficient of variation (CV), the ratio of an estimate's standard error to the estimate, is used to measure the magnitude of potential sampling error. The CVs shown refer to the number of estates that elected SUV.

Qualified Family-Owned Business Deduction
With the Taxpayer Relief Act (TRA) of 1997, Congress sought to safeguard family-run businesses and provided an estate tax deduction for “qualifying” family-owned business interests included in gross estate and transferred to qualified heirs. Requirements for using the deduction are, with a few exceptions, similar to those for electing special use valuation. The principal place of business must be in the United States, and the business entity must not have debt or equity that is tradable on an established securities market or secondary market. In addition, at least 50 percent of the business entity must be owned by the decedent and members of the decedent’s family; or 70 percent must be owned by members of two families (and 30 percent owned by the decedent and members of the decedent’s family); or 90 percent must be owned by members of three families (and 30 percent owned by the decedent and members of the decedent’s family). Several other requirements must be met, including: the value of the business interest must constitute at least 50 percent of the decedent’s total gross estate less deductible debts, expenses, and taxes; and the decedent or a family member must have been actively engaged in the business. An additional estate tax is imposed if, within 10 years after the decedent’s death and before the qualified heir’s death, the heir fails to actively participate in the business for a total of 3 years in any 8-year period [9].

The qualified family-owned business interest deduction (QFOBI), initially set at $675,000 in TRA of 1997, could not exceed $1.3 million when combined with the applicable exclusion. Therefore, as the exclusion increased incrementally from $625,000 in 1998 to $1.5 million in 2004, the maximum allowable deduction decreased and finally disappeared in 2004 [10]. For decedents who died in 2001, when the exclusion was $675,000, the maximum deduction for a qualified family-owned business was $625,000.

Only a small fraction of estates used the QFOBI in calculating taxable estate and estate tax liability. For Year-of-Death 2001, an estimated 1,114 estates, or 1.0 percent of the total, claimed the deduction; small estates made up the majority (82.3 percent) of those that used the deduction (see Figure C). These 1,114 estates made up about 7.1 percent of estates that reported closely held or agribusiness assets, i.e., those estates that were potentially qualified to elect QFOBI. The likelihood that an estate would claim the deduction was greater for larger estates: 1.5 percent of all very large estates claimed the deduction, compared with only 1.0 percent of all small estates. For all estates, the deduction reduced taxable estate by $626.8 million.
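The coordination rule above is simple arithmetic: for an applicable exclusion E, the deduction is at most the smaller of $675,000 and $1.3 million minus E, and never below zero. The Python sketch below encodes that rule together with simplified versions of the ownership and 50-percent tests. It is illustrative only: the function names are hypothetical, ownership shares are treated as plain fractions, and statutory details (attribution rules, the precise definition of adjusted gross estate) are ignored.

```python
# Hypothetical helpers; rules simplified from the text (law in effect for 2001).
QFOBI_INITIAL_MAX = 675_000     # deduction ceiling set by TRA 1997
QFOBI_COMBINED_CAP = 1_300_000  # deduction + applicable exclusion may not exceed this


def max_qfobi_deduction(exclusion: int) -> int:
    """Largest QFOBI deduction once coordinated with the applicable exclusion."""
    return max(0, min(QFOBI_INITIAL_MAX, QFOBI_COMBINED_CAP - exclusion))


def meets_ownership_test(decedent_family: float, other_families: list[float]) -> bool:
    """Shares are fractions (0 to 1) of the business entity.

    decedent_family: share owned by the decedent and the decedent's family.
    other_families: share owned by each other family.
    """
    if decedent_family >= 0.50:   # one-family test
        return True
    if decedent_family < 0.30:    # both multi-family tests require 30 percent
        return False
    others = sorted(other_families, reverse=True)
    two_families = decedent_family + sum(others[:1])
    three_families = decedent_family + sum(others[:2])
    return two_families >= 0.70 or three_families >= 0.90


def meets_size_test(business_value: float, gross_estate: float,
                    debts_expenses_taxes: float) -> bool:
    """Business interest must be at least 50 percent of gross estate net of deductions."""
    return business_value >= 0.50 * (gross_estate - debts_expenses_taxes)


for year, exclusion in [(1998, 625_000), (2001, 675_000), (2004, 1_500_000)]:
    print(year, max_qfobi_deduction(exclusion))
```

Running it reproduces the schedule described in the text: a maximum deduction of $675,000 for 1998, $625,000 for 2001, and nothing for 2004.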
Figure C—Number of Estates, Number with Potentially Qualifying Assets, and Number that Elected QFOBI, by Size of Total Gross Estate

Size of total gross estate               Total number  Estates with potentially  Estates that   CV [1]
                                         of estates    qualifying assets         claimed QFOBI
                                                                                 deduction
                                         (1)           (2)                       (3)            (4)
All estates                              108,330       15,612                    1,114          10.3%
Small ($675,000 under $2.5 million)      93,321        11,711                    917            12.2%
Medium ($2.5 million under $5 million)   9,977         2,219                     127            18.2%
Large ($5 million under $10 million)     3,449         1,056                     47             17.6%
Very Large ($10 million or more)         1,583         626                       23             0.4%
[1] Coefficient of variation (CV), the ratio of an estimate's standard error to the estimate, is used to measure the magnitude of potential sampling error. The CVs shown refer to the number of estates that elected QFOBI.

Deferral of Tax and Installment Payments
Congress has also enacted legislation that lessens the burden of certain estate tax payments for estates composed largely of closely held businesses. The legislation provides estates with an alternative to selling closely held interests in order to meet Federal tax responsibilities. Congress first introduced installment payments for these estates in 1958 and then, in 1976, established rules for deferral of payments. Under the law, an estate’s executor can elect to defer tax payments for 5 years, paying only interest on the tax liability during the deferral period, and then to pay the estate tax attributable to the business interest in two or more, but not more than ten, equal installments. In order to qualify for deferral of tax and installment payments, at least 35 percent of the value of the adjusted gross estate must consist of an interest in a closely held business. Under the law in effect for 2001, the definition of closely held business included three types of entities: (1) sole proprietorships; (2) partnerships, if the estate included 20 percent or more of the partnership interest or if the partnership had 15 or fewer partners; and (3) corporations, if the estate included 20 percent or more of the voting stock of the corporation or if the corporation had 15 or fewer shareholders. An executor’s decision to use these payment options is not contingent on the election of special use valuation. However, if the executor elects special use valuation, the same, lower value must be used for determining the deferred tax payments [11].

Figure D—Number of Estates, Estates with Potentially Qualifying Assets, and Number that Elected DOT, by Size of Total Gross Estate

Size of total gross estate               Total number  Estates with potentially  Estates that  CV [1]
                                         of estates    qualifying assets         elected DOT
                                         (1)           (2)                       (3)           (4)
All estates                              108,330       15,612                    382           11.8%
Small ($675,000 under $2.5 million)      93,321        11,711                    147           26.5%
Medium ($2.5 million under $5 million)   9,977         2,219                     103           18.7%
Large ($5 million under $10 million)     3,449         1,056                     86            13.7%
Very Large ($10 million or more)         1,583         626                       46            2.7%
[1] Coefficient of variation (CV), the ratio of an estimate's standard error to the estimate, is used to measure the magnitude of potential sampling error. The CVs shown refer to the number of estates that elected DOT.

Relatively few estates of 2001 decedents elected deferral of tax (DOT) due to ownership interests in closely held businesses. As shown in Figure D, an estimated 382 estates, or 0.4 percent of all estates and 2.4 percent of estates that reported closely held and agribusiness assets (potentially qualifying assets), elected to use this provision. Larger estates were much more likely to use the provision than their smaller counterparts. About 0.2 percent of small estates (those with less than $2.5 million in total gross estate) used DOT.
This percentage increased dramatically with the size of gross estate: 2.9 percent of the largest estates (those with $10 million or more in total gross estate) used the provision. Estates deferred more than $365.6 million in estate tax, or 58.9 percent of reported tax liabilities for those estates; closely held business assets for which tax was deferred totaled $1.3 billion.

Logistic Regression Models
Using unweighted estate tax records from Years-of-Death 1998 and 2001, we created a data set of 37,179 records. Of these, 211 elected SUV, 389 elected DOT, and 485 elected QFOBI.

Next, we determined eligibility criteria for each provision. Ideally, the sample used for the regression analysis should include only estates that were eligible to claim the provisions. This would have allowed a cleaner analysis of the factors that executors of eligible estates weigh when deciding whether to claim a business provision. Unfortunately, eligibility cannot be directly observed in the data, as requirements for claiming the business provisions are numerous and complex, and data reported on estate tax returns are limited. Unable to observe eligibility directly, we created partial eligibility criteria based on available information. As noted previously, each provision has an eligibility requirement based on the percentage of an estate composed of farms or closely held business assets. Since SOI captures asset type information in its data editing process, it was possible to create a filter to identify potentially eligible records based on the presence of farm or closely held business assets. Applying this criterion yielded 11,187 records with potentially qualifying assets, about 30 percent of the observations in our data set. We attempted to further refine our eligibility filters by limiting our data set to returns for which the proportion of assets held in farms or closely held businesses matched the statutory requirements for each provision. This refinement produced an unacceptable level of classification error (i.e., returns that were determined to be ineligible claimed the provisions), which may have occurred because of the difficulty of correctly coding business asset types during the data collection process.

The Model
Our initial approach was to determine one model for each provision using explanatory variables suggested by prior research. For each estate tax return i, we consider the following model of the log-odds of the probability that the taxpayer claims a provision:

$$\log\left(\frac{\pi_i}{1-\pi_i}\right) = \mathbf{x}_i'\boldsymbol{\beta}$$

where π_i is the probability that taxpayer i uses the provision of interest, x_i is the row for return i of the matrix x of 19 explanatory variables from Figure E, and β is the vector of slope coefficients for each corresponding x-variable. We fit the model to each provision separately. Since the eligibility requirements for the three provisions are somewhat similar, the same model was also fit to a dichotomous variable that indicates election or nonelection of at least one business provision. The results from these four models are displayed in Figure F.

Figure E—Explanatory Variables and Their Definitions

Variable                Definition
Age                     Age, in years, of decedent at time of death
Married, Single, Widow  Dummy variables indicative of marital status of the decedent at time of death
Retired                 Dummy variable indicating that decedent was retired
Female                  Dummy variable indicating that decedent was female
Liquidity Cat 1         Dummy variable indicating that estate had a liquidity ratio of 0.25 or less (see endnote 12)
Liquidity Cat 2         Dummy variable indicating that estate had a liquidity ratio greater than 0.25 but less than 1
Liquidity Cat 3         Dummy variable indicating that estate had a liquidity ratio of 1.0 or greater but less than 5
Liquidity Cat 4         Dummy variable indicating that estate had a liquidity ratio of 5 or greater
Debts                   Amount, in millions of dollars, of debts owed by the estate
Gross estate            Amount, in millions of dollars, of total gross estate
Marginal tax rate       Projected marginal tax rate of estate prior to claiming any of the provisions
Farm                    Amount, in millions of dollars, of farm assets
Closely held            Amount, in millions of dollars, of closely held business assets
Year                    Dummy variable indicating that the record was from Year of Death 2001
Widow*Female            Interaction variable of Widow and Female
Single*Female           Interaction variable of Single and Female
Married*Female          Interaction variable of Married and Female
Debts*Farm              Interaction variable of Debts and Farm
Age*Retired             Interaction variable of Age and Retired

Figure F—Estimated Coefficients and Standard Errors, by Model
(Columns: SUV, QFOBI, DOT, and At least one provision, each giving the estimate and its standard error (SE); rows: the variables defined in Figure E.)
* Indicates significance at 5 percent.
** Variable was excluded from model because inclusion resulted in a model convergence problem.

Model Results
Prior to modeling the data, we expected that liquidity would have a strong, inverse relationship with the likelihood of claiming each of the three business provisions, since, for all three provisions, eligibility requires that an estate hold a certain percentage of its assets in farms or closely held businesses, i.e., illiquid assets [12].
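Endnote 12 and Figure E together define how the liquidity categories are built, and the log-odds model maps a linear predictor back to a probability through the inverse logit, π = 1/(1 + exp(-x'β)). The Python sketch below strings these steps together. The coefficient values are hypothetical placeholders chosen for illustration, not the Figure F estimates, and treating an estate with no projected tax or debts as fully liquid is an assumption.

```python
import math


def liquidity_ratio(liquid_assets: float, projected_tax: float, debts: float) -> float:
    """Liquid assets divided by (projected tax before business provisions + debts)."""
    denominator = projected_tax + debts
    if denominator <= 0:
        # Assumption: with no tax or debts to cover, treat the estate as fully liquid.
        return math.inf
    return liquid_assets / denominator


def liquidity_category(ratio: float) -> int:
    """Map a liquidity ratio to Liquidity Cat 1-4 as defined in Figure E."""
    if ratio <= 0.25:
        return 1
    if ratio < 1.0:
        return 2
    if ratio < 5.0:
        return 3
    return 4


def inverse_logit(log_odds: float) -> float:
    """Recover pi from log(pi / (1 - pi))."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical inputs (millions of dollars) and coefficients, NOT Figure F values.
BASELINE_LOG_ODDS = -3.0    # linear predictor x'beta for an illustrative estate
BETA_LIQUIDITY_CAT4 = -1.0  # illustrative negative effect of high liquidity

ratio = liquidity_ratio(liquid_assets=6.0, projected_tax=0.8, debts=0.2)
category = liquidity_category(ratio)
log_odds = BASELINE_LOG_ODDS + (BETA_LIQUIDITY_CAT4 if category == 4 else 0.0)
print(f"ratio={ratio:.2f} category={category} P(claim)={inverse_logit(log_odds):.3f}")
print(f"odds ratio, Cat 4 vs. baseline: {math.exp(BETA_LIQUIDITY_CAT4):.2f}")
```

A negative coefficient on a liquidity dummy multiplies the odds of claiming by exp(β), which is less than 1; that is how the inverse relationship expected above would appear in the fitted models.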
As shown in Figure F, this expectation was confirmed: each of the three single-provision models and the combined model have significant, relatively large, negative coefficients for the highest liquidity categories. Based on our earlier findings, we further expected to find that, ceteris paribus, larger estates were less likely to claim the SUV and QFOBI provisions, but more likely to claim the DOT provision. These expectations were partially confirmed. Gross estate was significant in the SUV and QFOBI models, with a negative coefficient. In the DOT model, gross estate had a small, positive coefficient, consistent with expectations, but it was not significant at the 5-percent level. In the combined model, gross estate had a small but significant negative coefficient. We also expected that a higher marginal tax rate before claiming any provisions would increase the economic value of claiming a provision and, therefore, the log-odds of claiming it. This expectation was confirmed, as marginal tax rate has a significant, relatively large coefficient in each of the four models. The coefficient is largest in the SUV and QFOBI models, which is unsurprising, given that these two provisions directly decrease the size of the taxable estate. Our expectations about the significance of debt and demographic variables were less defined. The amount of debt held by an estate was significant only in the SUV model; its positive coefficient suggests that holding more debt tended to increase the likelihood of claiming this provision, ceteris paribus. Interestingly, while debt alone was not significant in the QFOBI model, the interaction of debts and farm assets had a significant, positive coefficient. Regarding demographic characteristics, age had a significant effect only in the DOT model, with a small, positive coefficient, suggesting that older decedents were more likely to claim this provision.
Being married had a significant effect in each of the three single-provision models, although the direction of this effect varied. Ceteris paribus, married decedents were more likely to claim the SUV and QFOBI provisions, but less likely to claim the DOT provision. Widowed decedents were also more likely to claim the SUV provision than single or divorced decedents. Gender and retired status had no significant impact in any of the three single-provision models, but they were significant in the combined model: female decedents were less likely than male decedents, and retired decedents less likely than nonretired decedents, to claim at least one of the provisions. The significance of gender and retired status in only the combined model may be attributable to the larger number of observations in the subsample of estates that claimed one or more provisions.

Conclusions
Our findings reveal that, holding other factors constant, smaller estates were more likely to claim the SUV and QFOBI provisions than their larger counterparts, and that estates facing higher marginal tax rates were more likely to claim each of the three provisions. From a demographic standpoint, being married had a significant impact on the odds of claiming each of the provisions, although the direction of the effect varied: while being married increased the likelihood of claiming SUV or QFOBI, holding other factors constant, it decreased the likelihood of claiming DOT.

While we believe that this research provides a starting point for understanding the factors that influence the use of special estate tax provisions for farms and closely held businesses, there are at least three main areas for future research that could expand our understanding of this topic. First, an approach that specifically models the decisionmaking process facing the executor of an estate would be enlightening.
Ideally, this model would incorporate not only the choice to claim one business provision, but also the choice to claim a combination of business provisions, if eligible for more than one. In addition, the interaction of other choices, such as marital and charitable deductions, should be incorporated into this model, as should some measure of the financial constraints placed on an estate by claiming these provisions.

Second, when analyzing the characteristics of estates that claim these provisions, time is a factor worth examining. Estate tax returns provide a snapshot of the decedent’s assets and debts at the time of death, but reveal no information about these characteristics at earlier points in time. This is particularly relevant to our analysis because we have no way of observing what, if any, choices were purposefully made prior to death so that an estate would qualify for a business provision. While the tax law contains a provision that limits the ability of individuals to shift their assets in a tax-beneficial way prior to death, it is possible that various forms of planning are used by some individuals or their representatives in order to qualify for these beneficial business provisions [13].

Finally, while modeling with records identified by our asset eligibility criteria is clearly superior to modeling with the entire dataset, modeling with only records for estates that are actually eligible would provide more insight into why estates choose to elect a special business provision. While eligibility cannot be observed in the data currently available, it is possible that future changes to tax law or reporting requirements could obviate this limitation.

Endnotes

[1] Special use valuation and deferral of estate tax liability are available to estates for current deaths. However, the qualified family-owned business deduction was repealed for deaths after 2003.
[2] See Gangi, Martha Eller, and Brian G. Raub, “Utilization of Special Estate Tax Provisions for Family-Owned Farms and Closely Held Businesses,” Statistics of Income Bulletin, Summer 2006, Washington, D.C. This article is also available on SOI’s TaxStats Web site at http://www.irs.gov/pub/irs-soi/spestate.pdf.
[3] United States Tax Reporter, Estate and Gift Taxes, Volumes I and II, Research Institute of America, 1996. This publication provides an overview of tax law, Internal Revenue Code text, House and Senate committee reports, U.S. Treasury regulations, and a general explanation of the tax code.
[4] Ibid.
[5] Population estimates are from “Annual Estimates of the Population for the United States and for Puerto Rico: April 1, 2000, to July 1, 2004,” Population Division, U.S. Census Bureau, December 2004. Total adult deaths represent those of individuals age 20 and over, plus deaths for which age was unavailable. Death statistics are from Volume 52, Number 3, Table 3, Centers for Disease Control and Prevention, National Center for Health Statistics, U.S. Department of Health and Human Services, September 2003.
[6] Because almost 99 percent of all returns for decedents who die in a given year are filed by the end of the second calendar year following the year of death, and because the decedent’s age at death and the length of time between the date of death and the filing of an estate tax return are related, it was possible to predict the percentage of unfiled returns within age strata. The sample weights were adjusted accordingly to account for returns for 2001 decedents not filed by the end of the 2003 filing year.
[7] Estate tax returns are sampled while they are being processed for administrative purposes, but before any examination. Returns are selected on a flow basis, using a stratified random probability sampling method, whereby the sample rates are preset based on the desired sample size and an estimate of the population. The design for the Year-of-Death 2001 study had three stratification variables: year of death, age at death, and size of total gross estate plus adjusted taxable gifts. Sampling rates ranged from 1 percent to 100 percent. Returns for over half of the strata were selected at the 100-percent rate.
[8] For more information on special use valuation, see Code section 2032A in The Complete Internal Revenue Code, Research Institute of America, July 2001, p. 6,016.
[9] For more information on the qualified family-owned business deduction, see Code section 2057 in The Complete Internal Revenue Code, Research Institute of America, July 2001, p. 6,047.
[10] In the 1997 Act, Congress provided for a gradual increase in the lifetime exemption from $625,000 in 1998 to $850,000 in 2004. However, in 2001, Congress enacted the Economic Growth and Tax Relief Reconciliation Act, which completely changed the landscape of estate tax law. As a result, the lifetime exemption, $675,000 in 2000 and 2001, is set to increase to $3.5 million in 2009, and the estate tax disappears entirely for deaths in 2010.
[11] For more information on the deferral of taxes and installment payments, see Code section 6166 in The Complete Internal Revenue Code, Research Institute of America, July 2001, p. 9,125.
[12] Liquidity ratio is defined as liquid assets (cash and cash management accounts, State and local bonds, Federal Government bonds, publicly traded stock, and insurance on the life of the decedent) divided by the projected estate tax liability prior to claiming any business provisions plus debts of the estate.
[13] According to Internal Revenue Code section 2057(c), most gifts given within 3 years of a decedent’s death are included in the adjusted gross estate.
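Endnote [12] defines the liquidity ratio as a single quotient. The Python sketch below illustrates that calculation; it assumes hypothetical field names for the five liquid-asset categories (they are not SOI variable names) and amounts in dollars, and it is an illustration rather than the code used in the study.

```python
from collections.abc import Mapping

# Hypothetical field names for the five liquid-asset categories listed in endnote [12].
LIQUID_ASSET_FIELDS = (
    "cash_and_cash_management_accounts",
    "state_and_local_bonds",
    "federal_government_bonds",
    "publicly_traded_stock",
    "insurance_on_decedent_life",
)


def liquidity_ratio(
    assets: Mapping[str, float],
    projected_tax_before_provisions: float,
    debts: float,
) -> float:
    """Liquid assets divided by (projected estate tax before any business provision + debts)."""
    liquid = sum(assets.get(field, 0.0) for field in LIQUID_ASSET_FIELDS)
    denominator = projected_tax_before_provisions + debts
    if denominator <= 0:
        raise ValueError("projected tax plus debts must be positive")
    return liquid / denominator


# Hypothetical estate used only for illustration.
example_assets = {
    "cash_and_cash_management_accounts": 200_000.0,
    "publicly_traded_stock": 300_000.0,
}
ratio = liquidity_ratio(example_assets, projected_tax_before_provisions=400_000.0, debts=100_000.0)
print(ratio)  # 1.0
```

For this hypothetical estate, $500,000 of liquid assets set against $400,000 of projected tax and $100,000 of debts gives a ratio of exactly 1.0; a value below 1 would mean liquid assets alone could not cover the projected tax plus debts.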
Corporation Life Cycles: Examining Attrition Trends and Return Characteristics in Statistics of Income Cross-Sectional 1120 Samples

Matthew L. Scoffic, Internal Revenue Service

Every year, the Statistics of Income (SOI) Division of the IRS produces a cross-sectional study of 1120 series corporation tax returns based on a weighted sample of the population of certain Forms 1120. The microdata from this study are used to produce tabular data for public dissemination through SOI’s TaxStats Web site and many regular and occasional paper publications. SOI also uses these data to produce custom tabulations for internal and external customers in many disciplines. While these data provide an excellent source for annual financial tabulations and for understanding the implications of tax policy for the taxpaying public, less attention has been paid to the implicit longitudinal characteristics of the SOI sample or to the changing population of 1120 filers from which SOI draws its sample.

This paper examines the extent to which business entities in the SOI sample survive, perish, or appear inconsistently, and the extent to which returns in these three categories differ in certain financial characteristics. Examining these issues can provide insight into what types of business entities tend to survive and perish over a period of time, and can show users of SOI tabular data whether estimates are based on the same entities over time or on a sample that changes with regularity.

The SOI 1120 Sample

Before examining the performance of the SOI sample over a period of years, it is useful to understand the structure of the cross-sectional SOI sample itself. The SOI study’s target population consists of all for-profit corporations required to file one of the 1120 series tax returns included in the SOI study: Forms 1120, 1120-A, 1120-F, 1120-L, 1120-PC, 1120-REIT, 1120-RIC, and 1120-S.
The survey population consists of those returns that are selected for the SOI sample and are processed on the IRS Business Master File (BMF). SOI has been using a sample of 1120 series returns to estimate population values for over 50 years. The first SOI sample was implemented for Tax Year 1951, when 41.5 percent of the 1120 filing population was sampled. In 1951, the total number of Forms 1120 filed was 687,000, and SOI selected 285,000 returns for its study. The sample size as a percentage of the population has fluctuated over time; in 2003, the last tax year for which data are available, the SOI sample of 141,678 returns was 2.4 percent of the total population of over 5.8 million 1120 returns. In the 10 years that are the focus of this paper, the SOI sample size increased from 91,687 returns in 1993 to 141,678 returns in 2003 (Figure A).

Figure A—Sample and Population Size for SOI 1120 Study, 1993–2003

Year   Sample Size   Population Size   Sample as Percentage of Population
1993    91,687       4,340,688         2.11
1994    95,021       4,700,268         2.02
1995    97,461       4,852,305         2.01
1996    94,172       4,968,490         1.90
1997    98,204       5,102,958         1.92
1998   137,600       5,204,810         2.64
1999   140,984       5,315,461         2.65
2000   144,917       5,429,473         2.67
2001   146,479       5,563,781         2.63
2002   145,353       5,701,024         2.55
2003   141,678       5,845,672         2.42

To determine whether an individual return is to be sampled, an algorithm transforms the Employer Identification Number (EIN) of the tax return into a Transform Taxpayer Identification Number (TTIN). The TTIN can be characterized as a pseudorandom number, and because the same algorithm is used every year, the same EIN produces the same TTIN in any study year.
This implies that, with no change in the selection probability of the applicable stratum and no change in the stratum into which the return falls, a return selected in year one should be selected in year two, provided it is still present in the population (and has not changed its EIN).

The sample is stratified by form type, size of total assets, and income, or in some cases by form type and size of total assets alone. Each stratum is associated with a sampling rate. The sampling rate is multiplied by 10,000 to create a four-digit number between 0000 and 9999. If the last four digits of the TTIN for a given return are less than or equal to this number, the return is selected for the SOI study. For example, suppose the last four digits of a TTIN are 3025. If the sampling rate for this stratum is 0.7777 (0.7777 * 10,000 = 7777), the return will be selected for the SOI study; if the rate is 0.2222 (0.2222 * 10,000 = 2222), the return will not be selected.

The stratum’s sampling rate thus determines the probability that a return in that stratum is selected: the higher the rate, the higher the probability of selection. This probability can range from a fraction of 1 percent to 100 percent. The rate at which returns are sampled depends on their size (measured in income and/or total assets) and form type. Generally, sampling rates increase with size for all form types. Over the 10 years studied, sampling rates have tended to increase for most size classes and form types, but rates for some strata have declined [1].

This selection process takes place over a 24-month window. Typically, more than 15 percent of corporations file tax returns based on a noncalendar-year accounting period, so a selection window of July through the following June is necessary for any given study year.
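The selection rule described above can be sketched in Python. This is a minimal illustration under stated assumptions: SOI’s actual EIN-to-TTIN transform is not described in this paper, so `toy_ttin` is a hypothetical stand-in built on SHA-256 that shares only the property the text relies on (a fixed, pseudorandom mapping), and the example EIN is invented.

```python
import hashlib


def toy_ttin(ein: str) -> int:
    """HYPOTHETICAL stand-in for SOI's EIN-to-TTIN transform, which is not published here.

    Any deterministic pseudorandom mapping shows the property the text relies on:
    the same EIN always yields the same TTIN, in every study year.
    """
    digest = hashlib.sha256(ein.encode("utf-8")).hexdigest()
    return int(digest, 16) % 1_000_000_000  # nine pseudorandom digits


def selection_threshold(sampling_rate: float) -> int:
    """Sampling rate * 10,000, rounded because floating-point products may fall just below the intended integer."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling rate must lie between 0 and 1")
    return round(sampling_rate * 10_000)


def is_selected(ttin: int, sampling_rate: float) -> bool:
    """Select the return if the last four digits of its TTIN are <= rate * 10,000."""
    return ttin % 10_000 <= selection_threshold(sampling_rate)


# The worked example from the text: a TTIN ending in 3025.
print(is_selected(3025, 0.7777))  # True
print(is_selected(3025, 0.2222))  # False

# Determinism: a hypothetical EIN maps to the same TTIN, and hence the same decision, every year.
ein = "12-3456789"
print(toy_ttin(ein) == toy_ttin(ein))  # True
```

Because the comparison is “less than or equal to,” a threshold T admits T + 1 of the 10,000 possible four-digit endings (0000 through T), and a rate of 1.0 yields a threshold of 10,000, which every ending satisfies.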
The selection window is extended further by the optional filing extensions used by many corporations and by administrative processing delays at the IRS. A study for Tax Year X is therefore composed of returns selected from July of year X through June of year X+2. Some returns can also be added after this time if their presence in the SOI study is deemed critical [2]. Returns that would meet the sampling criteria may nonetheless go unselected because they were filed after SOI’s selection deadline, because they were not available to the SOI Division while being held by another IRS function, or because data processing errors caused them to fall into an incorrect stratum [3].

Data Description

To study the behavior of returns in the SOI sample, I compiled 10 years of selected data from SOI’s cross-sectional 1120 study, Tax Year 1994 to Tax Year 2003. To create the dataset, I first identified all unique EINs in the Tax Year 1993 study; there were 86,632 records in this dataset. I used this file as the “base year” against which I compared SOI studies from other years to determine the presence or absence of the base-year returns in subsequent years. I performed these interyear comparisons by matching datasets on EIN. For each of the 10 subsequent SOI studies, 1994 through 2003, I compiled one dataset containing selected data items for base-year returns that were selected again in that year and one dataset containing selected data items for base-year returns that were not selected in that year [4].

In each year, I determined whether the base-year return was present in the SOI sample and compiled an inventory dataset representing each return’s life cycle throughout the 10 years. This dataset contained all EINs from the base year and an observation for each subsequent study year, 1994-2003.
Each study-year observation could take on a value of “1” if the return was present in the study year or “0” if it was not. The dataset also contained a data item representing the life cycle of the return: a concatenation of all the study-year observations (“0” or “1”) that recorded the 10-year pattern of presence or absence for each base-year return. The final data item in the dataset was the sum of the study-year observations, representing the number of years from 1994 to 2003 in which the return appeared in the SOI study.

I then used the inventory dataset to group the base-year returns into three categories based on their life cycles over the 10 years studied: Consistent, Inconsistent, and Terminal. I defined a Consistent return as one that was present in at least 8 of the 10 years analyzed and was not absent from the sample in both of the last 2 years, 2002 and 2003 [5]. I defined an Inconsistent return as one that was present in fewer than 8 years of SOI studies and was not categorized as a Terminal return. I defined a Terminal return as one whose life cycle pattern matched one of nine specific patterns indicating that the return left the sample and never returned. Figure B shows the patterns used to characterize Terminal returns. A “1” indicates the return is present for the year, and a “0” indicates the return is absent. Each of the ten characters in the life cycle pattern represents a study year, 1994-2003.

Because returns can be present in both the SOI study and the population, absent from both, or absent from the SOI study but present in the population, I matched files of base-year returns not present in each subsequent year to administrative IRS population files to examine the ultimate status of the returns [6].
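The inventory dataset and the three categories can be sketched in Python as follows. The sketch assumes each study year’s SOI sample is available as a set of EINs (a hypothetical input format), and it reads the Consistent rule together with endnote [5]: a return present 8 or more times is excluded from Consistent only when it is absent in both 2002 and 2003. Checking the nine Terminal patterns of Figure B first enforces that reading, because “1111111100” is the only pattern with at least eight presences that ends in two absences.

```python
STUDY_YEARS = range(1994, 2004)  # the ten study years after the 1993 base year

# Figure B: k leading "1"s (k = 0..8) followed by "0"s, i.e., the return left and never came back.
TERMINAL_PATTERNS = frozenset("1" * k + "0" * (10 - k) for k in range(9))


def life_cycle(ein: str, eins_by_year: dict[int, set[str]]) -> tuple[str, int]:
    """Return the 10-character presence pattern for a base-year EIN and the number of years present."""
    pattern = "".join(
        "1" if ein in eins_by_year.get(year, set()) else "0" for year in STUDY_YEARS
    )
    return pattern, pattern.count("1")


def classify(pattern: str) -> str:
    """Assign Terminal, Consistent, or Inconsistent; Terminal patterns are checked first."""
    if pattern in TERMINAL_PATTERNS:
        return "Terminal"
    if pattern.count("1") >= 8:
        return "Consistent"
    return "Inconsistent"


# Hypothetical EINs: "A" appears only in 1994-1996; "B" appears every year except 1999.
samples = {year: {"B"} for year in STUDY_YEARS if year != 1999}
for year in (1994, 1995, 1996):
    samples[year].add("A")

for ein in ("A", "B"):
    pattern, years_present = life_cycle(ein, samples)
    print(ein, pattern, years_present, classify(pattern))
```

Here the hypothetical return “A” yields the pattern “1110000000” (3 years, Terminal), and “B” yields “1111101111” (9 years, Consistent).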
In some cases, the match against population files showed that, although base-year returns were missing from the SOI sample for a subsequent year, they were present in the population of 1120 filers. These returns are generally presumed not to have met the SOI selection criteria for the study year, subject to the limitations of the selection process described previously. In other cases, it could be shown that a base-year return was not selected for a subsequent SOI study because it was no longer present in the population of 1120 filers. Determining which nonselected base-year returns remained in the population, and so were available for selection, shows whether a return simply failed to meet SOI sampling criteria or is in fact no longer required to file an individual 1120 series tax return [7].

Figure B—Criteria for Terminal Return Definition

Life Cycle Patterns Characterizing Terminal Returns
0000000000
1000000000
1100000000
1110000000
1111000000
1111100000
1111110000
1111111000
1111111100

From left to right, each character represents an SOI study year, 1994-2003. A “0” indicates absence from the SOI study for the year. A “1” indicates presence in the SOI study for the year.

To determine whether Consistent, Inconsistent, and Terminal returns differed in their financial or other characteristics, I compiled these three groups of returns and determined the means of four key financial data items and of the age of the entity. I compared the means of the data items and the ages across categories and tested the differences for statistical significance. The four financial items compared were Total Receipts, Net Income, Total Assets, and Net Worth [8]. The age of the entity is the number of years between the date of incorporation and the base year, 1993 [9].
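The paper reports t-tests run in SAS but does not say which variant was used. As a hedged illustration only, the sketch below assumes Welch’s unequal-variance t-test computed from published summary statistics (mean, standard deviation, N) rather than from the microdata, and approximates the p-value with the standard normal distribution, which is reasonable when the degrees of freedom run into the thousands.

```python
import math


def welch_t(
    mean1: float, sd1: float, n1: int, mean2: float, sd2: float, n2: int
) -> tuple[float, float]:
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value from the standard normal: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# Total Receipts: Consistent returns (Figure F) versus Terminal returns (Figure H).
t, df = welch_t(136_238_155, 1_498_106_574, 37_744, 77_461_225, 814_956_006, 39_926)
p = two_sided_p_normal(t)
print(f"t = {t:.2f}, df = {df:,.0f}, two-sided p = {p:.1e}")
```

Applied to Total Receipts, this gives a t statistic of roughly 6.7 with tens of thousands of degrees of freedom, well beyond the two-sided 99-percent critical value of about 2.58.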
Data Analysis

Figure C presents the count of base-year returns present in each subsequent SOI study and filing population from 1994-2003, as well as the percentage of base-year returns present in the sample and population in subsequent years. The same data are represented graphically in Figure D.

Figure C—Presence of Base-Year Returns in SOI Sample and Population

SOI Study   Base-Year Returns   Base-Year Returns   Base Year %     Base Year %
Year        in Sample           in Population       in Sample [1]   in Population [2]
1993        86,632              86,632              100             100
1994        74,303              79,243              85.8            91.5
1995        68,122              75,965              78.6            87.7
1996        60,948              72,585              70.4            83.8
1997        56,465              68,633              65.2            79.2
1998        52,750              57,734              60.9            66.6
1999        48,842              62,674              56.4            72.3
2000        44,728              59,257              51.6            68.4
2001        42,154              53,743              48.7            62.0
2002        39,998              51,683              46.2            59.7
2003        36,159              42,414              41.7            49.0

[1] Percentage of base-year returns remaining in sample.
[2] Percentage of base-year returns remaining in population.

In the base year of 1993, some 86,632 returns were selected for the SOI study. The number of base-year returns remaining in the SOI study declined steadily over the 10 years analyzed: 85.8 percent, or 74,303, of the original base-year returns were selected for the 1994 SOI study, and only 41.7 percent, or 36,159, were still present in the most recent SOI study for 2003. The number of base-year returns available to be selected from the population declined in a similar fashion, with 91.5 percent, or 79,243, of the base-year returns remaining in the population in 1994 and 49.0 percent, or 42,414, remaining in the population of 1120 filers in 2003.

The difference in the counts and percentages of base-year returns in the sample and population can be attributed to a number of factors. Returns that exhibit a year-to-year change in total assets and/or income may qualify for a sampling rate different from the one applied in a prior year in which they were selected for the SOI study.
Similarly, a change to the sampling rates for a stratum may cause returns that were previously selected in that stratum to no longer qualify for sample selection based on the values of their TTINs. Other administrative and processing reasons may prevent a negligible number of returns from being included in the SOI study, including rejection from the SOI study by tax examiners, improper coding or processing, unavailability of returns, and late filing of desired returns [10].

Since the difference between the base-year returns present in the sample and population is small and stable throughout the 10-year period, it can be concluded that the majority of returns that leave the SOI study have also left the population of 1120 filers. For example, in 1994, only 5.7 percent (4,940) of base-year returns were absent from the sample but present in the population. In 2003, this percentage had increased to only 7.2 percent (6,255).

Although the SOI sample size has increased over the 10-year period studied, sampling rates for various strata have fluctuated. This means that, in addition to base-year returns whose changes in total assets and/or income make them ineligible for sampling at prevailing rates, changes to the sampling rates in individual strata may make previously eligible returns ineligible. This helps explain why the percentage of base-year returns in the population but not the sample has increased slightly over the 10 years observed. Since larger returns are sampled at a 100-percent rate, decreases in sampling rates tend to affect strata where smaller returns are located.

[Figure D—Presence of Returns from Base Year: number of 1993 returns present in the sample and in the population, 1993–2003]
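The percentages in Figure C, and the share of base-year returns present in the population but absent from the sample, can be recomputed from the figure’s counts. This Python sketch assumes, as the text does, that every base-year return found in the sample is also in the population, so the population count minus the sample count gives the returns missing only from the sample.

```python
BASE_YEAR_COUNT = 86_632  # base-year (1993) returns

# Figure C: base-year returns found in each later SOI sample and in the filing population.
IN_SAMPLE = {1994: 74_303, 1995: 68_122, 1996: 60_948, 1997: 56_465, 1998: 52_750,
             1999: 48_842, 2000: 44_728, 2001: 42_154, 2002: 39_998, 2003: 36_159}
IN_POPULATION = {1994: 79_243, 1995: 75_965, 1996: 72_585, 1997: 68_633, 1998: 57_734,
                 1999: 62_674, 2000: 59_257, 2001: 53_743, 2002: 51_683, 2003: 42_414}


def pct_of_base(count: int) -> float:
    """Count as a percentage of all base-year returns, rounded to one decimal place."""
    return round(100 * count / BASE_YEAR_COUNT, 1)


def retention_table() -> list[tuple[int, float, float, int, float]]:
    """Rows of (year, % in sample, % in population, in population but not sample, that count as %)."""
    rows = []
    for year in sorted(IN_SAMPLE):
        # Assumes sample members are a subset of the population, as the text does.
        gap = IN_POPULATION[year] - IN_SAMPLE[year]
        rows.append((year, pct_of_base(IN_SAMPLE[year]), pct_of_base(IN_POPULATION[year]),
                     gap, pct_of_base(gap)))
    return rows


for row in retention_table():
    print(*row)
```

The first row reproduces the 1994 values cited in the text (85.8, 91.5, 4,940, and 5.7 percent); the 2003 row gives 6,255 returns, or 7.2 percent of the base year.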
Any decreases in sampling rates could account for a loss of base-year returns, but only if those returns are still available in the target population. However, since Figures C and D indicate that the majority of the base-year returns leaving the sample have also left the population, it appears that most of the missing base-year returns have not survived as individual 1120 return filers. They may no longer exist, they may file a non-1120 tax return, or they may be included in the consolidated return of another 1120 filer.

When returns from the base year were grouped into categories based on their life cycle patterns, 37,614 returns were observed to be consistently present in the SOI study from 1993-2003; this category was called Consistent. Inconsistent returns totaled only 9,482, showing that relatively few returns appeared sporadically. The Terminal category contained 39,536 returns [11].

A pronounced and statistically significant difference in the means of all the data items was observed among the categories of returns. Figures F, G, and H summarize the means for each category. The statistical significance of the differences in means was determined with t-tests performed in SAS statistical software; all comparisons of means among all categories were significant at the 99-percent level. The means presented in Figures F, G, and H show that Consistent returns are on average much larger in terms of financial characteristics than either returns that appear in the SOI study only inconsistently or returns that have dropped out of the SOI sample and most likely the population as well. Graphical representations of the financial comparisons are shown in Figures J through M in the appendix. When financial items from Consistent returns are compared to those of Terminal returns, all items are larger for Consistent returns by significant margins.
Average Total Receipts for Consistent returns are 2.9 times the average for Terminal returns, Net Income 3.3 times, Total Assets 4.8 times, and Net Worth 7.5 times. The largest differences in the averages are between Consistent and Inconsistent returns: average Net Worth for Consistent returns is 21.1 times that of Inconsistent returns. Clearly, the returns that are consistently selected for the SOI sample have higher average levels of assets and income. This may seem intuitive, since larger returns fall into strata with higher sampling rates; in fact, however, the design of the sample leads to the same returns being selected each year in each stratum. Therefore, barring changes to the sampling rates of the relevant strata, a small base-year return exhibiting no drop in assets or income and no change in form type would be expected in the sample again, just as would a large return in a stratum with a 100-percent selection rate. In practice, sampling rates for certain strata have declined at times. Most base-year returns that are not selected are demonstrably not in the population, but, for those smaller base-year returns that are in the population and are not selected, sampling rate changes are a possible explanation.

Figure E—By Type of Return: Consistent 37,614 (43%); Terminal 39,536 (46%); Inconsistent 9,482 (11%)

Figure F—Consistent Returns

Variable         N        Mean            Standard Deviation
Total Receipts   37,744   $136,238,155    $1,498,106,574
Net Income       37,744   $8,215,763      $96,288,521
Total Assets     37,744   $304,742,101    $3,776,946,351
Net Worth        37,744   $109,835,169    $902,754,411
Age              37,744   19.4            21.0

To conduct a more detailed analysis of the three categories of returns, I created another data item, Size, determined by the return’s total assets.
Returns with less than $10,000,000 in total assets were defined as “small,” returns with $10,000,000 to $249,999,999 in total assets as “medium,” and returns with $250,000,000 or more in total assets as “large.” I then divided each of the three consistency categories into small, medium, and large subgroups to analyze differences in mean financial characteristics and mean age by both consistency and size.

After segmenting returns by both consistency and size, I observed that large returns made up a considerably higher percentage of Consistent returns than of Inconsistent or Terminal returns. Of Consistent returns, 16.6 percent were large, compared with only 1.6 percent and 5.5 percent of Inconsistent and Terminal returns, respectively. Conversely, small returns made up a much larger percentage of Inconsistent and Terminal returns, as Figure I indicates.

The attrition rate was defined as the percentage of returns within each size category (small, medium, and large) that were ultimately classified as Terminal. Large returns had the lowest attrition rate at 26.4 percent, followed by medium-sized returns (36.4 percent). Small returns had the highest attrition rate at 55.0 percent. This may be partially due to the fluctuating sampling rates for smaller returns, but, since most nonselected returns were also not present in the population, most of these taxpayers no longer filed individually [12]. Examining Figure I can provide insight into why the averages of selected financial items tend to be much higher for Consistent returns than for the other categories.
The averages for Consistent returns are based on a much higher proportion of large returns than are those for the other categories. Because these financial items tend to be greater on returns with more assets, averages based on a higher proportion of large returns will be greater. All means and standard deviations of financial items and ages by consistency and size are reported in the appendix. In addition to being larger on average in terms of these selected financial items, Consistent returns tend to be older than Inconsistent or Terminal returns.

Figure G—Inconsistent Returns

Variable         N        Mean            Standard Deviation
Total Receipts   9,459    $25,796,330     $238,476,363
Net Income       9,459    $220,453        $14,196,113
Total Assets     9,459    $37,207,485     $444,127,898
Net Worth        9,459    $6,618,853      $70,868,775
Age              9,459    14.8            16.6

Figure H—Terminal Returns

Variable         N        Mean            Standard Deviation
Total Receipts   39,926   $77,461,225     $814,956,006
Net Income       39,926   $3,222,766      $58,191,247
Total Assets     39,926   $205,827,618    $3,493,116,498
Net Worth        39,926   $43,992,315     $583,865,566
Age              39,926   15.7            19.6

Figure I—Return Counts by Size and Consistency with Attrition Rate

Size     Consistent       Inconsistent    Terminal         Attrition Rate
Small    19,041 (50.4%)   6,959 (73.6%)   25,479 (63.8%)   49.5%
Medium   14,719 (39.0%)   2,322 (24.5%)   11,789 (29.5%)   40.9%
Large     3,984 (10.6%)     178 (1.9%)     2,658 (6.7%)    39.0%

Small returns are those with less than $10,000,000 in assets, Medium with $10,000,000 to $249,999,999 in assets, and Large with $250,000,000 or more. Percentages following counts indicate the percentage of the total count for the group of Consistent, Inconsistent, or Terminal. Attrition rate is the percentage of the total number of base-year returns in this size category which were categorized as Terminal returns.
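The size classes and the Attrition Rate column of Figure I follow directly from the definitions and the figure’s counts. A short Python sketch, assuming total assets are given in dollars and using only the counts shown in Figure I:

```python
def size_class(total_assets: float) -> str:
    """Size categories used in the paper, by total assets in dollars."""
    if total_assets < 10_000_000:
        return "Small"
    if total_assets < 250_000_000:
        return "Medium"
    return "Large"


# Return counts from Figure I, by size class and consistency category.
FIGURE_I = {
    "Small": {"Consistent": 19_041, "Inconsistent": 6_959, "Terminal": 25_479},
    "Medium": {"Consistent": 14_719, "Inconsistent": 2_322, "Terminal": 11_789},
    "Large": {"Consistent": 3_984, "Inconsistent": 178, "Terminal": 2_658},
}


def attrition_rate(size: str) -> float:
    """Percent of all returns in a size class that were classified as Terminal."""
    counts = FIGURE_I[size]
    return 100 * counts["Terminal"] / sum(counts.values())


def share_of_category(size: str, category: str) -> float:
    """Percent of a consistency category's returns that fall in the given size class."""
    total = sum(FIGURE_I[s][category] for s in FIGURE_I)
    return 100 * FIGURE_I[size][category] / total


for size in FIGURE_I:
    print(size, f"{attrition_rate(size):.1f}%")
```

Running it prints 49.5%, 40.9%, and 39.0% for small, medium, and large returns, matching the Attrition Rate column of Figure I, and `share_of_category("Large", "Consistent")` gives about 10.6 percent.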
Age was defined in years as the base year (1993) minus the year of incorporation. The average age of returns consistently in the SOI study is 19.7 years. The average ages of Inconsistent and Terminal returns are lower, at 14.6 years and 15.9 years, respectively. With most of the base-year returns missing from the SOI study also missing from the population of 1120 filers, the analysis indicates that, on average, business entities that were older in the base year tended to survive longer [13]. Younger returns were more likely to be Inconsistent or Terminal. A graphical comparison of mean ages is shown in Figure N.

Of particular interest is the difference in mean ages of large Consistent, Inconsistent, and Terminal returns. The mean age of large Consistent returns is 20.6 years, while the mean ages of large Inconsistent and Terminal returns are 22.4 years and 24.8 years, respectively. The difference between large Consistent and large Inconsistent returns is not statistically significant, but the difference between large Consistent and large Terminal returns is significant at the 99-percent level. Thus, although Consistent returns as a whole have a higher mean age than Inconsistent or Terminal returns, the breakouts by size show that large Consistent returns were younger on average than large Terminal returns.

Conclusions and Further Research

The analysis showed that the majority of base-year returns that left the SOI sample also left the population of 1120 filers, indicating that the SOI sample selects the same entities from year to year when those entities are available in the population. Therefore, even though a small number of returns exited the SOI study due to changes in sampling rates, the conclusions drawn from analysis of the SOI studies largely apply to the population of 1120 filers as well as to the sample.
An analysis of 10 years of data from SOI samples and 10 years of population data from IRS Business Master Files showed that 41.7 percent of the base-year returns were present in the latest SOI study and 49.0 percent were present in the filing population. With the lowest attrition rate of all groups, large business entities are more likely than smaller business entities to remain in the SOI sample and in the filing population. The group of returns defined as Consistent had a larger proportion of returns with $250,000,000 or more in total assets than the other two categories, and large returns made up the smallest proportion of Terminal returns, at 5.5 percent. The surviving business entities also tended to be older on average than business entities that fell out of the population or were not selected for SOI studies. This relationship did not hold for large returns, however: large Consistent returns were slightly younger on average than large Terminal returns.

The next steps in corporation life cycle research will be to define specific reasons for attrition from the SOI sample and population and to explain attrition more fully based on these reasons. This research should include assembling corporate family structures capable of accounting for previously individual returns that become part of consolidated groups. A predictive model could be implemented to determine whether financial relationships predict presence in the SOI sample or population.

[Figures J–N (appendix): bar charts comparing the means of Total Receipts (Figure J), Total Assets (Figure K), Net Worth (Figure L), Net Income (Figure M), and Age (Figure N) for Consistent, Inconsistent, and Terminal returns]

Endnotes

[1] For a complete history of sampling rates for all sizes and form types, see SOI’s annual Publication 16, Corporation Income Tax Returns.
[2] For an explanation of critical returns, see SOI’s annual Publication 16, Corporation Income Tax Returns.
[3] For a more detailed description of SOI’s sampling process and studies, see the most recent version of SOI’s annual Publication 16, Corporation Income Tax Returns.
[4] For datasets where the returns were not present in the SOI sample, the data items were populated with values from the most recent SOI study in which the returns were available.
[5] A return that was missing from the population in 2002 and 2003 would qualify as Consistent if it was present in all earlier years, because the sum of all presence observations would total eight. A classification of Terminal is more appropriate because the return is not present for the latest 2 years and will presumably not return.
[6] SOI maintains a file of return transaction data extracted annually from the BMF. This file contains a code that indicates whether an 1120 return was processed on the BMF for a given EIN at any time in the Processing Year, roughly equivalent to a Calendar Year. The file also contains a tax period indicating the year to which the transaction relates.
[7] The entity formerly filing its own 1120 return may no longer do so because it is included in the consolidated filing of another return or group of returns with a different EIN.
[8] For SOI’s definition of financial items, see Publication 16, Corporation Income Tax Returns.
[9] Age was calculated and carried through the analysis as of the base year rather than recomputed each year because increasing appearances in SOI studies would correlate directly with increasing age.
[10] For descriptions and counts of unavailable returns, see SOI’s Publication 16, Corporation Income Tax Returns.
[11] The sum of Consistent, Inconsistent, and Terminal returns does not equal the total of the base-year returns due to legitimate “duplicate” records. Duplicate records can be present in one study when part-year returns are selected in addition to full-year returns.
[12] These entities may be filing a non-1120-type return or may be included in the consolidation of another return or group of returns.
[13] Entities counted as not surviving may be filing a non-1120-type return or may be included in the consolidation of another return or group of returns.

References

Internal Revenue Service, Statistics of Income–2003, Corporation Income Tax Returns, Washington, DC, 2005.
Internal Revenue Service, Statistics of Income Bulletin, Summer 2006, Washington, DC, 2005.
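Endnote 5 implies that the Terminal test takes precedence over the count of presence observations. The Python sketch below illustrates that ordering only. The complete classification rules are not restated in this section, so the thresholds here are assumptions: a return is treated as Terminal if it is absent in both of the two most recent years, as Consistent if it is otherwise present in at least eight of the ten years (the count endnote 5 mentions), and as Inconsistent in all other cases. The function `classify_presence` and the constant `CONSISTENT_MIN_YEARS` are hypothetical names.

```python
from typing import Sequence

# Hypothetical threshold: endnote 5 implies that eight years of presence
# out of ten is enough to qualify as Consistent.
CONSISTENT_MIN_YEARS = 8


def classify_presence(present: Sequence[bool],
                      consistent_min: int = CONSISTENT_MIN_YEARS) -> str:
    """Classify a base-year return from yearly presence flags, oldest first.

    Assumed rules (illustrative, not the paper's full specification):
      * Terminal: absent in both of the two most recent years.
      * Consistent: otherwise, present in at least `consistent_min` years.
      * Inconsistent: everything else.
    """
    if len(present) < 2:
        raise ValueError("need presence flags for at least two years")
    # The Terminal test runs first, so it overrides a high presence count.
    if not present[-1] and not present[-2]:
        return "Terminal"
    # Each True flag counts as one presence observation.
    if sum(present) >= consistent_min:
        return "Consistent"
    return "Inconsistent"


# Endnote 5's case: present in all earlier years, missing in 2002 and 2003.
print(classify_presence([True] * 8 + [False, False]))  # Terminal
```

Checking the Terminal condition before summing is what keeps the endnote 5 case out of the Consistent group, even though its presence count reaches the assumed threshold.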
Appendix

Consistent Returns
Size     Data Item               Mean                 Standard Deviation
Small    Total Receipts [1]      $6,371,580.79        $57,384,713.78
         Net Income              $120,879.88          $4,079,558.50
         Total Assets            $1,807,835.87        $2,312,005.37
         Net Worth               $639,986.34          $4,270,068.29
         Age                     16.4479282           16.6014683
Medium   Total Receipts          $53,895,910.61       $106,779,628
         Net Income              $2,511,693.13        $7,407,540.48
         Total Assets            $69,825,074.13       $57,974,136.63
         Net Worth               $29,494,265.47       $44,890,136.91
         Age                     22.7388410           24.0182814
Large    Total Receipts [2,3]    $1,061,133,974       $4,499,784,062
         Net Income [4]          $67,978,026.03       $289,082,191
         Total Assets [2,3]      $2,620,483,834       $11,364,833,471
         Net Worth               $928,540,800         $2,638,900,731
         Age [2]                 21.5155622           25.4626241

Inconsistent Returns
Size     Data Item               Mean                 Standard Deviation
Small    Total Receipts [5]      $4,077,602.06        $15,518,169.88
         Net Income [5]          -$34,503.10          $1,936,312.34
         Total Assets            $1,479,486.82        $2,162,763.78
         Net Worth [5]           $200,645.81          $4,779,648.44
         Age [5]                 13.2152608           14.5542741
Medium   Total Receipts          $41,511,957.43       $79,428,394.05
         Net Income              -$598,765.04         $13,179,286.11
         Total Assets            $43,880,737.74       $44,024,985.24
         Net Worth               $8,721,769.96        $62,242,205.94
         Age [6]                 18.8165375           20.1862701
Large    Total Receipts [5]      $669,891,521         $1,583,578,000
         Net Income [5]          $20,874,759.10       $88,900,726.62
         Total Assets [5]        $1,346,959,444       $2,956,099,587
         Net Worth [5]           $230,109,460         $405,911,755
         Age [5]                 24.9157303           25.7444784

Terminal Returns
Size     Data Item               Mean                 Standard Deviation
Small    Total Receipts          $4,952,880.42        $70,038,460.90
         Net Income              -$71,616.51          $6,520,985.17
         Total Assets            $1,382,087.57        $2,069,756.45
         Net Worth               $133,487.37          $5,351,577.13
         Age                     12.9184034           14.8322453
Medium   Total Receipts          $47,605,901.58       $95,661,811.13
         Net Income              $1,147,350.28        $9,267,561.22
         Total Assets            $67,945,915.83       $57,212,181.19
         Net Worth               $17,690,263.35       $59,872,085.44
         Age                     20.0385105           24.1205414
Large    Total Receipts          $904,927,191         $3,025,364,570
         Net Income              $44,007,051.15       $219,787,529
         Total Assets            $2,777,142,544       $13,275,372,904
         Net Worth               $580,019,080         $2,190,282,973
         Age                     23.2558315           29.4368933

Differences across means are statistically significant at the 99-percent level unless otherwise noted.
[1] Difference between Consistent and Terminal statistically significant only at the 97-percent level.
[2] Difference between Consistent and Inconsistent not statistically significant.
[3] Difference between Consistent and Terminal not statistically significant.
[4] Difference between Consistent and Inconsistent statistically significant only at the 97-percent level.
[5] Difference between Inconsistent and Terminal not statistically significant.
[6] Difference between Inconsistent and Terminal statistically significant only at the 97-percent level.

An Analysis of the Free File Program
Michelle S. Chu and Melissa M. Kovalick, Internal Revenue Service

The Restructuring and Reform Act of 1998 (RRA 1998) stated that the Internal Revenue Service (IRS) should set goals to have at least 80 percent of all Federal tax and information returns filed electronically by 2007. Electronic filing has many benefits: tax law compliance is improved, the IRS reduces operating costs by reducing the need for human inputs to transcribe data, and transcription errors are eliminated. The electronic file (e-file) program began in 1986.
During the 2006 filing season, an estimated 83.1 million tax returns were filed electronically (IRS Document 6186), including individual income, corporate, partnership, excise, and exempt organization tax returns. About 73.0 million individual income tax returns were e-filed during the 2006 filing season. While many factors affect the growth of the e-file program, this paper focuses on the Free File Program, which provides taxpayers with access to free online tax preparation and e-filing services. Although data on the Free File Program are limited, this paper presents a demographic overview of Free Filers, together with an overview and analysis of the Program.

One question that arose during the development of the Free File Program was why the Federal Government would partner with private industry instead of creating its own software for free-file purposes. When the Department of the Treasury announced new efforts to expand the e-file program in January 2002, Secretary Paul O’Neill asked then-IRS Commissioner Charles Rossotti to partner with the private sector. O’Neill stated that it was not his intent “for the IRS to get into the software business, but rather to open a constructive dialogue with those who already have established expertise in this field. In the end, this effort should come up with a better way to save time and money for both taxpayers and the Government” (Office of Public Affairs, PO-964). Because software companies had already proven their expertise in electronic tax services, working with private industry has several advantages: it encourages competition, gives taxpayers more choices, and reduces costs to the American public.

Benefits and Objectives of the Free File Program

The Free File Program has four main objectives: to increase e-file penetration, provide more free online options to taxpayers, ease tax preparation and filing, and provide greater access to taxpayers.
The e-file option offers the advantages of reduced burden on filers and quicker refunds, and the Free File Program exposes these benefits to taxpayers who may previously have prepared and filed paper returns. In addition, promoting the Free File Program on the IRS Web site might alleviate taxpayers’ concerns about the security of the e-file process.

Overview and History of the Free File Program

The Free File Program was developed in response to President Bush’s E-Government initiative and the Office of Management and Budget’s EZ Tax Filing Initiative, on the assumption that providing free e-filing services to the majority of taxpayers would help meet the 80-percent e-file target established by RRA 1998. Although some private-sector firms had offered free e-file services to limited groups of taxpayers in the past, the Free File Program marked an innovative approach by making free services consistently available to the majority of taxpayers on a multiyear basis.

On October 30, 2002, the original Free Online Electronic Tax Filing Agreement was signed by the IRS Commissioner and the Manager of the Free File Alliance, LLC. The Free File Alliance is a group of software companies that provide free commercial online tax preparation and e-filing services. The agreement had an initial term of 3 years, followed by automatic options to renew for successive 2-year periods. When this agreement expired, a revised agreement was signed that extended the terms from October 30, 2005, through October 30, 2009.

As of October 2006, analysis of the Free File Program is limited by the availability of data. Although the program has existed for 4 years, in its initial years data on Free Filed returns were the property of the members of the Free File Alliance, not the IRS. The IRS did not begin to identify Free Filed returns until the 2006 Filing Season (Tax Year 2005). Limited quantitative data from prior years are available from survey results of studies conducted by Russell Marketing Research and Foote, Cone, and Belding, and from volumes of Free Filers provided by the software companies. However, use of these data is restricted for proprietary reasons. Another constraint is that complete filing season results for 2006 were not available when this paper was written: the deadline for filing returns under an extension granted by Form 4868, Application for Automatic Extension of Time To File U.S. Individual Income Tax Return, was October 16, 2006, and the data used in this analysis were current as of September 26, 2006.

The Free Online Electronic Tax Filing Agreements

The initial agreement between the IRS and the Alliance was executed on October 30, 2002. The arrangement covers a wide array of topics, such as performance standards, the scope of marketing efforts, the terms for terminating the agreement, and the operation of the Alliance Web page. The contract specifies that, in total, Alliance members must provide the free e-filing option to at least 60 percent of all individual income taxpayers during the primary tax filing season (January through April). If the Alliance fails to reach 60-percent coverage, the group must raise its coverage within 6 months. In addition, each individual Alliance member’s free service must cover at least 10 percent of the total individual income tax returns filed. The agreement also addresses disclosure issues, privacy, and security provisions. To ensure a satisfactory level of quality, members were required to submit test returns for certification before being identified as members of the Alliance on the Web page. In addition, all members must have a security and privacy seal certificate from a third party. The certification process was based on an assessment of each member system’s ability to protect taxpayer data and address privacy concerns.

The agreement also specifies the guidelines for operating the Alliance Web page on the IRS site. The IRS will host and maintain the Web page, but the Alliance will determine its final content. This includes determining the rank-order placement of the links to individual offerings, the presence of a link to the free services, and the prohibition of advertisements on the Free File Web page. The IRS must be notified if an offering will be unavailable for 5 hours or more, and the IRS has the authority to delist a member whose service remains unavailable for more than 24 hours.

The agreement also addresses marketing. Although the IRS will promote the availability of the free services, it will not endorse specific products. The IRS and the Alliance will also explore ways to support Federal/State filing of returns through the Free File Program. The option of the IRS offering free e-filing services also remains open. If the IRS notifies the Alliance of a decision to offer free e-filing services during the primary filing season, the Alliance may terminate the agreement effective April 16.

After three successful filing seasons, the agreement between the IRS and the Alliance was extended for an additional 4 years (October 30, 2005, through October 30, 2009), with amendments stemming from lessons learned under the first agreement. The new agreement specified aggregate coverage of 70 percent of taxpayers. To set eligibility, the IRS will use the most current Adjusted Gross Income (AGI) level that covers 70 percent of all individual income taxpayers, so the number of taxpayers eligible to use the free service changes each filing season. In the first year of the new agreement, Filing Season 2006, some 93 million taxpayers qualified to use the service. However, no single Alliance member can cover more than 50 percent of total taxpayers. Also new to the agreement was the inclusion of Form 4868.

A number of amendments to the program content were included in the new agreement. The first addressed Refund Anticipation Loans (RALs). Although fewer than 1 percent of the 2.8 million Free File users in Tax Year (TY) 2002 opted for RALs, RALs were one of the key issues addressed in the new agreement. Both parties agreed that members may offer RALs under several guidelines. The offer of free online service cannot be conditional on the purchase of a RAL. The language must clearly indicate that a RAL is a short-term loan that must be repaid within a certain time, independent of the refund issued by the IRS. All fees and interest rates associated with RALs must be disclosed. Finally, RALs cannot be promoted, and some Alliance firms will not offer RAL products, ensuring that consumers have RAL-free options.

During the first 3 years of the program, the IRS relied on the Alliance members to provide the number of returns Free Filed through their respective offers. One of the amendments provided that Alliance members would supply an electronic Free File indicator. In return, the IRS confirmed that it will not build a marketing database or compile company-specific proprietary data. Although the IRS cannot refuse to comply with requests from Governmental agencies and Congress, it will promptly notify the Executive Director of the Alliance if this information is provided. The Alliance members will then have the option to stop providing the indicator. Other amendments addressed Web site compliance measures and customer satisfaction surveys. The performance standard was set at a 60-percent acceptance rate, and additional privacy and security issues were addressed.

Free File Volumes

The unprecedented alliance between the IRS and the private sector to offer free e-filing services met with success from the start. In the first year of the program (Filing Season 2003), 2.8 million returns were filed through the 17-member Alliance. The second year brought an increase of more than 26 percent, with 3.5 million returns filed through the 17-member Alliance. The third and most recent filing years resulted in 5.1 million Free Filed returns (a 46-percent increase) in TY 2004 and almost 4.0 million returns (a 22-percent decrease) in TY 2005 from the 20-member Alliance.

The initial agreement specified a minimum coverage of 60 percent, which the members abided by in the first two filing seasons. In the third year of the program, one of the Alliance members decided to offer the free preparation and filing service to all taxpayers (TIGTA 2006-40-171). Other members followed, and, in TY 2004, all taxpayers had the option to Free File. This was the main factor contributing to the 46-percent increase in Free Filed returns in Filing Season 2005. It also caused friction among the Alliance members and threatened the existence of the Alliance. Hence, the new agreement includes the stipulation that no single member can offer more than 50-percent coverage.

Because the most recent filing season was the first in which the IRS identified Free Filed returns, the accuracy of prior-year data cannot be verified. IRS projections indicate that almost 5.0 million returns are expected to be Free Filed in TY 2006, a 25-percent increase from the TY 2005 filing season. The volume is expected to reach almost 6.0 million by TY 2009.

Weekly Trends

Free Filers follow the early filing pattern of e-filers overall, but cumulative weekly filing percentages show that Free Filers generally filed even earlier in the filing season than all e-filers. The comparisons are based on TY 2005 filing results. By the end of January, 9 percent of Free Filed returns had been filed, compared to less than 8 percent of total e-filed returns. The difference grew to more than 7 percentage points in early February and by about another percentage point toward the end of the month. More than half of the Free Filed returns (56 percent) were received by the end of February, versus 48 percent of total e-filed returns. The gap continued to range from 3 to almost 8 percentage points until the end of the primary filing season. By April 20, approximately 97 percent of Free Filed returns, and 95 percent of total e-filed returns, had been filed.

[Figure: TY 2005 Cumulative Weekly Filing Percentages, Total E-file vs. Free File, weekly from January 5 through April 20. Source: Electronic Tax Administration Data]

Tax Year 2004 Demographics

To gather more information about Free File Program users, the Electronic Tax Administration within the IRS contracted with Russell Marketing Research and Foote, Cone, and Belding to conduct an online survey of taxpayers who Free Filed their TY 2004 individual returns. The survey results were intended to guide further development of marketing campaigns for the Free File Program. Every 800th Free Filer was asked to complete the online survey. The contractors collected the results, which were summarized by research teams within the IRS’s Wage and Investment Division (W&I Research Project 6-05-08-2-038N). Although these results provide an overview of Free Filers, they must be interpreted with caution. Participation in the survey was voluntary, and many taxpayers opted not to complete the questionnaire, leading to an estimated response rate of 2 percent. Thirteen of the 20 Free File Alliance members offered the online survey.
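The 1-in-800 skip pattern described above is a form of systematic sampling. The minimal Python sketch below assumes that each participating company counted its Free Filers in arrival order and invited every 800th one. The function names `invited` and `response_rate`, the 400,000-filer volume, and the 10 completed questionnaires are hypothetical and are not taken from the survey.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

# Skip interval taken from the text: every 800th Free Filer was invited.
SKIP_INTERVAL = 800


def invited(filers: Iterable[T], interval: int = SKIP_INTERVAL) -> Iterator[T]:
    """Yield every `interval`-th filer in arrival order (800th, 1,600th, ...)."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    for position, filer in enumerate(filers, start=1):
        if position % interval == 0:
            yield filer


def response_rate(completed: int, invitations: int) -> float:
    """Completed questionnaires as a percentage of invitations issued."""
    return 100.0 * completed / invitations


# Hypothetical company volume: 400,000 Free Filers counted in arrival order.
invitations = sum(1 for _ in invited(range(400_000)))
print(invitations)                     # 500
# Hypothetical: 10 completed questionnaires out of those 500 invitations.
print(response_rate(10, invitations))  # 2.0
```

A fixed interval spreads invitations across the filing season in proportion to each company's volume, but only when every company applies it consistently from the start of the season.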
In addition, not all of the participating companies offered the survey at the start of the filing season, and some companies did not initially follow the skip pattern (offering the survey to every 800th filer). However, by February 14, all 13 Alliance members participating in the survey were offering it according to the agreed-upon pattern. For the purposes of this paper, only surveys collected after February 14 are included in the analysis.

According to survey results, 17 percent of taxpayers who Free Filed in Filing Season 2005 were first-time filers. Of the remaining 83 percent, who had previously filed Federal income tax returns, 29 percent were e-filing for the first time. Some 78 percent of this group of prior paper filers had self-prepared their tax returns in the previous filing season. Of the approximately 70 percent of respondents who had used e-file during the prior filing season, only 2 percent said they had used the TeleFile Program. About 41 percent used tax preparation software, and 15 percent e-filed through tax preparers. The remaining 42 percent said they had used Free File in the previous year. When asked about previous use of Free File, 51 percent said they had used the program in prior years; about 49 percent of those surveyed were first-time Free Filers.

Based on survey responses, Free File participants share certain demographic characteristics. Over half (52 percent) claimed single filing status. Some 32 percent were married filing jointly, and 14 percent filed as heads of household. The remaining 2 percent were married filing separately or qualifying widow(er)s. Some 50 percent of Free Filers were 35 years old or younger. About 42 percent had pretax income of less than $25,000, and 56 percent reported pretax income of less than $35,000. About 16 percent of respondents reported that they claimed the Earned Income Tax Credit on their 2004 Federal income tax returns.
Almost 90 percent of respondents were owed a refund in Filing Season 2005. Respondents were also asked about their plans to e-file future tax returns. Some 75 percent said that they would use e-file again, and an additional 21 percent said that they would be likely to e-file future returns. Only 1 percent indicated that they would file (or probably file) a paper return in upcoming filing seasons. When asked how they had heard about the Free File Program, respondents gave a range of answers. Communication from the IRS was the most common source: some 49 percent of respondents heard about the program from the IRS Web site, tax forms, or IRS mailings. Specific responses indicated that 35 percent learned about the program from the IRS Web site, and 22 percent heard about it from relatives or colleagues.

TY 2005 Demographics of Free Filers

Analysis of TY 2005 Free Filed returns (TY 2005 was the first year Free File data were flagged by the IRS) revealed several interesting characteristics of Free Filers. The data showed that Free Filers are most often in their twenties, file with single status, and have relatively low AGIs. Most received refunds. About 47 percent of Free Filers were between the ages of 20 and 29, and an additional 12 percent were between the ages of 16 and 19. Some 73 percent of Free Filed returns indicated Single filing status, while 15 percent were Head of Household and 11 percent were Married Filing Jointly. Over half of the returns had AGI of less than $17,000, while 19 percent had AGI of at least $17,000 but less than $25,000, and 17 percent had AGI of at least $25,000 but less than $35,000.
Of the 3.8 million Free Filed returns, 96 percent were refund returns, with an average refund amount of $1,300. This compares to 88 percent of total e-filed returns (IRS Document 6187) that were estimated to be refund returns. The data indicated that 34 percent of the returns were filed on the longer, more complicated form (Form 1040). The short form, Form 1040EZ, accounted for an additional 38 percent of the returns. Around 5 percent of total electronically filed individual returns were filed through the Free File Program.

An analysis of how TY 2005 Free Filers filed their tax returns in the previous year (TY 2004) showed that the Free File Program is contributing to the growth of the overall e-file program. As expected, not all Free Filers are first-time e-filers: about 66 percent had e-filed their returns in TY 2004. That 66 percent breaks down into 39 percent who filed online, 17 percent who used the TeleFile Program, and 10 percent who e-filed through practitioners. However, 17 percent of TY 2005 Free Filers had paper-filed their tax returns in TY 2004. Furthermore, almost 42 percent of this population (TY 2004 paper filers) had V-Coded their returns, meaning that they prepared their returns on a computer but printed them and mailed them in as paper returns. In addition, about 18 percent of current Free Filers are new filers who did not file a return in TY 2004, indicating that the Free File Program is attracting new taxpayers to the e-file program.

State Level Data and Participation Rates—Tax Year 2005

An analysis of State-level data (including the District of Columbia) yielded several interesting patterns among Free Filers during the 2006 Filing Season. Although these results are based on a single filing season, future studies may establish more conclusive relationships between demographic variables and participation in the program.
To calculate the Free File participation rate (FFPR) for each State, the State’s number of Free Filed returns was divided by its total return volume (paper and electronic) and expressed as a percentage. The FFPR for the U.S. was 1.30 percent in TY 2005, with State levels ranging from 4.40 percent in Ohio to 1.64 percent in New York. The average State FFPR was 3.15 percent.

The 10 States with the highest FFPR were Ohio, South Dakota, Wisconsin, Maine, West Virginia, Nebraska, Utah, Oklahoma, Idaho, and North Dakota. These States represent a broad range of geographic locations, State sizes, and total populations. Age and population data from Global Insight, Inc., showed that 3 of the 10 States with the highest FFPRs (Utah, Idaho, and North Dakota) also ranked among the 10 U.S. States with the highest share of residents in the “15-to-34-year-old” age range. This range includes teenagers and those entering the workforce for the first time, who are likely to have lower incomes and meet the AGI limit.

State-level per capita income was also analyzed to determine whether States with lower per capita incomes had higher FFPRs. West Virginia, Utah, and Idaho were among the 10 States with the lowest per capita incomes, which may indicate that States with lower incomes participate more in the program, particularly if those States (like Utah and Idaho) also have a high percentage of younger residents. States with the lowest FFPRs tended to have higher per capita income levels. The 6 States with the highest per capita incomes were the District of Columbia, Connecticut, Massachusetts, New Jersey, Maryland, and New York. With the exception of Massachusetts, these higher-income States were skewed toward having the lowest FFPRs.
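The participation rate defined at the start of this section is a simple ratio, but a national FFPR and an average State FFPR are different statistics. A national figure computed from pooled counts is a return-weighted average of the State rates, while an average State FFPR gives each State equal weight. The Python sketch below computes both for three hypothetical States with invented return counts. The function names are illustrative only, and the counts are not IRS data.

```python
from __future__ import annotations


def ffpr(free_filed: int, total_returns: int) -> float:
    """Free Filed returns as a percentage of all (paper + electronic) returns."""
    return 100.0 * free_filed / total_returns


def national_ffpr(states: dict[str, tuple[int, int]]) -> float:
    """Pool counts across States before dividing (return-weighted rate)."""
    free = sum(f for f, _ in states.values())
    total = sum(t for _, t in states.values())
    return ffpr(free, total)


def average_state_ffpr(states: dict[str, tuple[int, int]]) -> float:
    """Unweighted mean of the State rates (each State counts once)."""
    rates = [ffpr(f, t) for f, t in states.values()]
    return sum(rates) / len(rates)


# Hypothetical (free_filed, total_returns) counts, not actual IRS data.
states = {
    "State A": (40_000, 1_000_000),   # 4.00 percent
    "State B": (20_000, 1_000_000),   # 2.00 percent
    "State C": (50_000, 5_000_000),   # 1.00 percent
}

print(f"{national_ffpr(states):.2f}")        # 1.57
print(f"{average_state_ffpr(states):.2f}")   # 2.33
```

Because the largest hypothetical State has the lowest rate, the pooled rate falls below the unweighted average. The pooled rate always lies between the lowest and highest State rates.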
The District of Columbia ranked 37th, and the other four high-income States ranked in the bottom 10 in terms of FFPR, with New Jersey and New York having the lowest participation rates of all States.

Tax Year 2005 Survey Results—Free Filer Attitudes

For TY 2005, the IRS again contracted with Russell Marketing Research, this time to conduct telephone interviews of Free Filers. The sample consisted of 1,800 Free Filers selected from lists provided by the IRS. Although this survey yielded some demographic data similar to those from the prior filing season’s survey, its objectives were to determine the overall usage and perception of Free File, the usage and evaluation of specific site features, and other learning experiences.

Data on the overall usage and perception of the Free File Program were highly favorable: some 94 percent of respondents indicated that they would like to use the program again, and 97 percent said they would recommend the program to friends or family. Some 30 percent of respondents had suggestions for improving the program. Among the feedback offered were making Free File easier to use (7 percent), increasing awareness of the program (4 percent), removing the income criteria (4 percent), and providing more information on the tax preparation companies (4 percent).

When asked how easy Free File was to use, 60 percent of those surveyed rated the experience as very easy, and 34 percent rated it as somewhat easy. About 1 percent said that the experience was very difficult. Free Filers who used the step-by-step instructions, the frequently asked questions guide, and the “Guide Me to A Service” feature rated the program as easier to use than did those who contacted the Help Desk for assistance. Among those who felt that the Free File Program Web site and pages could be improved (18 percent of respondents), 25 percent indicated that the pages should be easier to use and the company selection process could be improved. About 82 percent were satisfied with the Free File pages and did not think the pages could be improved.

Early surveys of taxpayers’ attitudes toward e-file indicated some level of concern about the security of online transactions with the IRS (RMR March 2003). However, over half of the respondents (54 percent) felt very confident that the information they provided during the Free File process was secure, and 42 percent indicated that they were somewhat confident. Although the majority of responses were highly favorable, increasing confidence in the security of the Free File process is an area that the IRS and the Alliance can work to improve in the future.

In deciding which provider to use, no one factor appears to dominate the decisionmaking process. Some 21 percent based their decisions on a software company they had used in the past, 19 percent used a company recommended by family or friends, and 14 percent based their decisions on the criterion that the company’s “offer met my needs.” Only 11 percent of respondents based their decisions on the company’s reputation. Those using the “Guide Me to A Service” feature were far more likely to say that the deciding factor in selecting a company was that this IRS-provided feature had suggested it. Some 55 percent said that they would use the same tax provider next year, and 36 percent said that they would probably use the same company again. Only 1 percent said that they would definitely not use the same company again.

Survey results for Filing Season 2006 indicated that more Free Filers learned about the program from family or colleagues (over one-third gave this response) than in Filing Season 2005. About 40 percent cited the IRS as their initial source of information about Free File, down from almost 50 percent in the Filing Season 2005 survey. Although 89 percent felt that the initial information they received gave them sufficient knowledge of the program, only 49 percent said that their initial source mentioned the $50,000 income limit for using the Program.

Conclusion and Future of the Free File Program

Although there is concern that Free File volumes declined in Filing Season 2006, the program is considered an overall success. According to the Electronic Tax Administration Advisory Committee’s (ETAAC) 2006 Annual Report to Congress, the Program’s most positive accomplishment was attracting 4.0 million taxpayers to the e-file program, including many who would not otherwise have used e-file. This growth occurred at no cost to the IRS, taxpayers, or the American public.

The Treasury Inspector General for Tax Administration (TIGTA) also reviewed the Free File Program in 2006. The report agreed that the amended Agreement added new levels of taxpayer protection, security, and performance standards. TIGTA acknowledged that many of these issues resulted from the unique relationship the IRS must maintain with the private sector for the program to work, recognizing that the IRS cannot entirely control the program. TIGTA also recommended that the IRS improve the Free File options offered to Spanish-speaking taxpayers via the IRS Web site. In response to the TIGTA report, the IRS will conduct a study to evaluate providing a Free File entry portal in Spanish. The IRS will begin discussions with the Multilingual Language Initiative Strategy Office, the Electronic Tax Administration, and representatives of the Free File Alliance about the resources, requirements, and funding needed for this effort. A decision on whether to provide a Spanish entry portal is expected in 2007.
New for TY 2006, the IRS will offer Form 1040EZ-T, Claim for Refund of Federal Telephone Excise Tax, to taxpayers who will file a Federal return solely to claim the telephone excise tax refund (TETR). This may result in several hundred thousand Forms 1040EZ-T filed through the Free File Program. The end of the TeleFile Program after TY 2004 will also continue to affect Free File volumes. As of April 27, 2006, more than 650,000 taxpayers who had filed through the TeleFile Program in Filing Season 2005 had filed through the Free File Program in Filing Season 2006. This represents almost 20 percent of all TeleFile returns from Filing Season 2005. Since its inception, the Free File Program has continued to evolve and to make valuable contributions to the e-file program while reducing taxpayer burden. It offers another e-file option when other programs, such as TeleFile, end. As the program prepares to offer Form 1040EZ-T and a Spanish-language option, it continues to be an innovative arrangement benefiting taxpayers, private companies, and the IRS.

Acknowledgments

The authors would like to thank Mark Heinlein, Wayne Mercado, and Jose Plazza for their valuable contributions and assistance in preparing this paper.

References

Electronic Tax Administration Advisory Committee, Annual Report to Congress, June 2006, Publication 3415, Catalogue Number 28110R.
Russell Marketing Research, “Findings from Focus Groups among Taxpayers with Self-Simple Returns,” March 2003.
Russell Research, “Report of Findings from the 2006 Free File Cognitive and Behavioral Research,” July 2006.
U.S. Department of the Treasury, Internal Revenue Service, Document 6186, Calendar Year Return Projections for the United States and IRS Campuses, 2006-2013 (Revised Spring 2006).
U.S. Department of the Treasury, Internal Revenue Service, Document 6187, Calendar Year Projections of Individual Returns by Major Processing Categories, 2006-2013 (Revised Spring 2006).
U.S. Department of the Treasury, Internal Revenue Service, “Free File Survey Analysis,” Wage and Investment Research Group 6, Research Project 6-05-08-2-038N, August 31, 2005.
U.S. Department of the Treasury, Office of Public Affairs, “Treasury, IRS Announce New Efforts To Expand E-Filing,” January 30, 2002, PO-964.
U.S. Department of the Treasury, Treasury Inspector General for Tax Administration, “Use of the Free File Program Declined After Income Restrictions Were Applied,” September 29, 2006, Reference Number 2006-40-171.

Data Sources

Electronic Tax Administration Research and Analysis System
Free File Volume Estimates, IRS Research, Analysis, and Statistics Division
Global Insight, Inc., Regional Forecasts—States Database
IRS Individual Master File

6 Statistical Dissemination and Communication

Standing Out in a Crowd: Improving Customer Utility on a Centrally Administered, Shared Web Site*
Barry W. Johnson, Internal Revenue Service

The Internet has become the primary public interface for many statistical organizations, offering opportunities to reach larger audiences with more products than ever before. Often, however, a statistical organization’s virtual existence must be shared with other, dissimilar organizations, due either to resource constraints or to policy decisions. In countries without a centralized statistical agency, such as the United States, statistical organizations are often housed within much larger agencies whose missions are primarily administrative. In such cases, the needs of the statistical function are often at odds with those of the administrative function. Similar tensions can exist in countries where the statistical functions are centralized. In these cases, subject matter with a relatively small customer base may compete for visibility and resources with topics that have broader appeal.
Shared use of a single Web site may reduce flexibility in design and limit the types of products that can be offered. Often, design decisions are driven by the component with the largest customer base and may not optimally serve smaller statistical functions and their customers.

Statistics of Income (SOI), a division of the U.S. Internal Revenue Service (IRS) and the primary source of data on the U.S. tax system, provides an excellent case study of this sort of coexistence. The irs.gov Web site is designed primarily to assist taxpayers in filing their taxes. It contains tax forms, filing instructions, regulatory rulings, and other resources for answering questions about the myriad tax and information reporting requirements that make up the U.S. tax system. It is also home to SOI’s Web pages, “TaxStats,” which provide public access to more than 4,000 statistical data products and average almost 500,000 downloads per month.

This paper focuses on SOI’s efforts to improve the TaxStats pages on irs.gov. It discusses recent redesign efforts and shares future plans, all in the context of working within the design limits imposed by a multiuse Web site. The goal is to provide guidance and encouragement for other statistical organizations in similar situations.

Background

The official public IRS Web site, irs.gov, is maintained by a contractor under the supervision of two organizations within the Service. The Communications and Liaison division (C&L) oversees the general look and feel of the Web site and maintains a set of detailed guidelines for page design, including approved fonts, colors, page formats, and writing style. All Web pages and content posted to irs.gov must be created and modified through the Content Management Application (CMA). This tool, through validation checks and the use of drop-down menus, helps ensure that all Web pages comply with the parameters specified in these guidelines.
The IRS Electronic Tax Administration division (ETA) oversees the hardware and software aspects of irs.gov. Jointly, these two divisions set standards, plan upgrades, conduct user testing, and facilitate monthly meetings with irs.gov’s major content providers.

Statistics of Income began disseminating data electronically in 1992 via an electronic bulletin board, which was maintained on a personal computer by SOI staff. In 1996, SOI replaced the bulletin board with the TaxStats pages on irs.gov. These pages were organized by subject matter, primarily reflecting SOI’s internal structure. Downloads and Web content grew annually, but, by 2003, it became clear that customers, particularly those new to TaxStats, were having difficulty locating products and services. To learn more about customer experiences on TaxStats and to address problems, SOI formed a small, cross-functional “Web team” made up of economists, statisticians, and computer specialists from a diverse array of subject matter areas.

*Johnson, Barry W. (2006), “Standing Out in a Crowd: Improving Customer Utility on a Centrally Administered, Shared Web Site,” United Nations Economic Commission for Europe, Work Session on Statistical Dissemination and Communication, http://www.unece.org/stats/documents/2006.09.dissemination.htm.

Gathering Feedback

Any organization with a Web presence needs to periodically measure how well it is serving its customer base. For SOI, informal feedback provided a catalyst for evaluating the effectiveness of its Web pages. Initially, some of the most useful comments came from customers who contacted SOI’s Statistical Information Services (SIS) office after failing to find the information they wanted on TaxStats. Many times, SIS staff were able to help these customers navigate the TaxStats pages to find the information they sought, a clear indication that the Web pages needed improvement.
In addition, SOI has a panel of expert tax policy researchers who meet biannually to offer feedback and provide direction to SOI. These users not only provided additional, informal feedback about their experiences using TaxStats, but also became an integral part of the redesign process.

To gather formal information from customers, SOI developed a survey that was given to all callers who contacted SOI’s SIS staff [1]. This survey included 11 structured questions and an opportunity for general comments. Questions covered general respondent information (occupation, frequency of visits to TaxStats, subject matter interests), general satisfaction with TaxStats (ease of use, quality of products, overall satisfaction), and suggestions for improvements (expanded content, preferred file formats, specific changes to improve navigation). In addition, the survey was administered to the members of the U.S. National Tax Association, who are considered key users of SOI data, and to SOI’s consultants.

The results showed that SOI customers had a wide range of occupations but were mainly university researchers; Federal, State, or local government employees; or individuals providing consulting or issue advocacy services. In general, customers found SOI products useful and of high quality but often had difficulty locating items on TaxStats. They specifically cited problems with Web page organization. Other comments included requests for more data, especially historical data, and easier-to-use product formats for data tables and articles [2].

In addition to formal and informal customer feedback, irs.gov provided SOI with monthly Web metrics that identified popular products. These metrics were also useful as benchmarks against which redesigned pages could be evaluated. After analyzing data from all sources, the team concluded that both page design and overall Web site design issues were contributing to user dissatisfaction.
Page design problems were generally things that SOI could address directly. Site design problems posed a greater challenge, since these required working with irs.gov personnel to change the structure of irs.gov or modify style guidelines.

Attacking the Problem

Having confirmed that customers were having difficulty finding information on the TaxStats pages of irs.gov, the next step was to identify products that SOI wanted to make available to the public via the Web. This was done by conducting a careful inventory of existing TaxStats content, brainstorming new product offerings, and researching the types of products available from other statistical functions in the U.S. and in other countries. Customer feedback from the surveys was also very important to this process.

A few prime customers provided additional input by participating in a card sort exercise. Card sorting, as applied to information management, is a technique for developing an information structure, as well as suggesting navigation, menus, and possible taxonomies [3]. SOI used its panel of 15 consultants as subjects for this exercise, which was conducted via mail [4]. Each test subject received a package consisting of: 1) slips of paper, each with a single content item printed on it; 2) instructions; and 3) some blank slips of paper on which subjects could write additional content items. Participants were asked to create subgroups from items they perceived as related, by grouping individual cards using rubber bands and paper clips, and then to organize these subgroups into larger categories. Participants then mailed the cards back to SOI, along with any comments or suggestions they wished to add. While response rates were somewhat disappointing, the six subjects who chose to participate represented a range of research interests. Despite their varied interests, the subjects provided results that were surprisingly similar. Each also provided a number of suggestions for new content items.
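Returned card sorts still have to be tallied before they can suggest a page structure. One common approach, offered here only as an illustration and not as the procedure SOI actually used, is a co-occurrence count: for every pair of content items, count how many participants placed the two items in the same group. The Python sketch below assumes each participant’s sort has been keyed in as a list of groups; it tallies only the first (finest) level of grouping, and the item labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(sorts):
    """Count how many participants placed each pair of items in the same group.

    `sorts` holds one entry per participant; each entry is a list of groups,
    and each group is a list of content-item labels.
    """
    counts = Counter()
    for groups in sorts:
        for group in groups:
            # Deduplicate and sort so each unordered pair has one canonical key.
            for pair in combinations(sorted(set(group)), 2):
                counts[pair] += 1
    return counts


def agreement(counts, n_participants):
    """Convert raw pair counts into the share of participants who agreed."""
    return {pair: n / n_participants for pair, n in counts.items()}


# Hypothetical sorts from three participants; item labels are illustrative only.
sorts = [
    [["Individual returns", "Migration data"], ["Corporation returns", "Partnership returns"]],
    [["Individual returns", "Migration data"], ["Corporation returns", "Partnership returns"]],
    [["Individual returns"], ["Migration data", "Corporation returns", "Partnership returns"]],
]

counts = co_occurrence(sorts)
shares = agreement(counts, len(sorts))

# List pairs from strongest to weakest agreement.
for (a, b), share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{a} + {b}: {share:.0%}")
```

Pairs that nearly every participant grouped together are natural candidates for a shared page or menu heading, while low-agreement pairs flag items whose placement deserves a closer look. With only six returned sorts, as in SOI’s exercise, such shares are coarse and are better read as a rough guide than as a precise statistic.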
The results of this exercise were instrumental in developing the structure and content of a prototype for the new TaxStats Web pages.

Another important component of the redesign effort involved examining Web sites of major U.S. and international statistical agencies, as well as a number of commercial Web sites. The team also reviewed articles and research papers that presented guidelines for effective Web pages [5]. At the time, the recently redesigned U.S. Bureau of Labor Statistics (BLS) Web site was particularly helpful, because the mission and scope of BLS are similar to those of SOI. BLS, which is renowned for its cognitive research, subjected all its new Web pages to extensive usability tests, the results of which are well documented in a series of papers on Web design and testing [6]. In addition, the BLS Web designers were very generous in sharing their expertise with SOI’s Web team.

Developing a Plan

The official irs.gov design guidelines provided three basic page layouts at the time SOI undertook its redesign. All Web pages contained static content, primarily text in HyperText Markup Language (HTML) or documents in Portable Document Format (PDF). As SOI Web team members developed new page layouts, a guiding principle was to keep the designs within the written irs.gov guidelines as much as possible while being as innovative as those guidelines allowed. Several new layouts were developed and presented to SOI’s panel of consultants for feedback. Based on their feedback, SOI developed a working prototype of the new site using Microsoft FrontPage.

While developing the prototype Web pages, SOI met with some of the individuals who oversee irs.gov. At this meeting, SOI presented research results and a detailed short- and long-term vision for TaxStats and unveiled a few prototype pages. An important feature of this presentation was the use of successful Web pages from other organizations with missions similar to that of SOI as examples.

A few key factors made this meeting successful. First, SOI had empirical research to show that the current irs.gov TaxStats pages were not serving customers well. Second, SOI was careful to draw a distinction between customers who access tax statistics and those who come to irs.gov in search of tax filing or compliance information. Third, SOI acknowledged the value of design constraints that had been developed to enhance the experiences of the latter group and provided evidence that these very features were making it difficult for SOI’s customers to find the products they needed. Finally, recognizing resource limitations, SOI chose to focus on a limited number of requests for changes in irs.gov policies or practices. The results of this meeting included a clearer understanding of SOI’s needs, an agreement to make a significant change to the existing irs.gov page structure, and a promise of continued dialogue.

User Testing

After developing a working prototype Web site, SOI conducted user testing prior to implementing any actual changes to the TaxStats pages. While the prototype did not have working links for all 4,000 SOI data products, it included examples of all the page styles that SOI was proposing, including several pages with similar functions but different design features, in the hope that testing would indicate a clear “best” choice. After consulting with professional Web developers and SOI’s own staff of statisticians, the team developed a series of test tasks. Testing was conducted at the BLS cognitive research laboratory, where a trained facilitator administered these tasks individually to a diverse group of seven test subjects while members of the Web team observed from a separate room [7]. Observers were able to hear each test subject’s comments, as well as view the subject’s facial expressions and all computer keystrokes via a computer monitor. Each session was also captured on videotape for further analysis. At the end of each test session, subjects were debriefed using a questionnaire. The test results were used to finalize Web design plans.

Implementation

Once the plan was finalized, Web team members set about the task of programming new Web pages. Hierarchies of pages were developed, and design attributes, such as font sizes, spacing, text justification, grid styles, and usage, were determined and documented in written guidelines that included instructions and examples to ensure uniformity across pages. Whenever possible, actual programming was performed by individuals with some expertise in the subject matter. This ensured that specific content items were correctly categorized and described. To assist in final page design, classroom training in writing for the Web was offered to team members. Once all of the pages were completed, subject matter experts were enlisted to thoroughly test each page for accuracy.

In total, nearly 150 pages were developed with more than 4,000 links to content items. The new pages included a new main (home) page and a redesigned left navigation bar. Based on customer feedback, all tabulated data on the site were made available as Microsoft Excel spreadsheets, and all research reports were posted in PDF format, with free readers provided for each format. Nearly all Web pages were programmed in HTML and were certified as compliant with U.S. standards for accessibility by individuals with disabilities [8].

Future Directions

SOI is currently working to improve several aspects of the TaxStats Web pages. First, while all of the actual TaxStats Web pages are certified as accessible to individuals with disabilities who use screen-reading software, many of the PDF documents available through those pages are not. SOI is committed to correcting this problem by improving both the techniques used to create the documents and their overall design. The software used to produce SOI documents has recently been upgraded, and SOI is seeking training and advice from desktop publishing experts.

Second, many of the tables on TaxStats contain extra formatting features that are necessary for creating printed publications but that make certain types of analysis difficult. Customers who use these tables for analysis must first remove some formatting features before applying even simple math functions to the data. SOI has just issued draft guidelines for producing researcher-friendly data tables. These guidelines were developed by incorporating extensive feedback from customers.

Third, a prototype application that allows customers to create customized tables from SOI data is being tested on TaxStats. This application uses off-the-shelf software with custom-designed display screens that allow users to access a database containing tabulated SOI data (microdata are not made available due to privacy protection concerns). Users can combine data across different tax years, select variables of interest, and choose categories of data to include in a table, as well as calculate simple descriptive statistics.

Fourth, metadata designed to help users better interpret the data available on TaxStats are being developed. Possible metadata items include tax forms marked to indicate the origin of specific data items, written descriptions of individual data items, and sample selection information, including variance estimates where applicable. Samples of metadata are currently being tested.

In addition, SOI is working closely with irs.gov officials to develop a fully articulated taxonomy of TaxStats that, in time, will be used to improve search capabilities and navigation, as well as to provide common definitions of concepts and terms across all irs.gov content areas.

Lessons Learned

Statistics of Income’s experience in redesigning the TaxStats pages on irs.gov serves as a model for other organizations faced with a Web site that is not specifically designed to serve their customers’ needs. The resulting redesigned Web pages, while not cutting-edge, have nevertheless garnered favorable feedback from both regular and new customers. More products are now offered on clearer, better organized pages. Product formats have been standardized and, in some cases, redesigned.

The effort was not expensive. In fact, the only direct expense was the cost of sponsoring a Web-writing training class. There were opportunity costs in the time employees spent on the redesign effort, but SOI’s Web team was careful not to let Web design activities interfere with its members’ day-to-day responsibilities. And, as is often the case, the team project brought energy to SOI that provided benefits beyond the successful completion of this specific task.

The key to SOI’s success was involving subject matter specialists and customers in all phases of transforming the TaxStats pages. This fostered a sense of commitment to the project, a deeper understanding of customer needs and SOI products, and the creativity needed to work within the constraints of a design framework that initially appeared to be fundamentally unsuitable. Some specific lessons learned include:

a. Gather specific feedback from users in order to thoroughly understand opportunities for improvement. If possible, involve a group of core customers in redesign efforts.
b. Research best practices used by organizations with similar products or customers. Also examine commercial Web sites, since these may reflect the most current design practices and technology.
c. Focus initially on those things that are under the control of the content provider. Consider questions such as:
   • Are products being provided in formats that meet customer needs?
   • Are products and pages accessible to all users?
   • Is content organized and adequately described so that users outside the provider’s culture can clearly understand what is being provided?
d. Take as much control over content management as possible. Involve employees who are familiar with the mission and products of the organization in redesign efforts. Keep management informed of team progress and ideas to ensure executive-level support. This is especially important if redesign plans require any site-level policy changes.
e. Develop a thorough understanding of design guidelines and restrictions and, if possible, meet with Web site managers to better understand them.
f. Present research results to Web site managers along with a clear plan for improvement that respects current Web site guidelines. When necessary, propose modifications that will meet the needs of specific customer groups, focusing on a few essential changes.
g. Become involved in the Web site’s user group, or urge the formation of such a group if none exists. These are excellent forums for educating Web site managers about customer needs.
h. Prototype and test pages prior to implementing any changes.
i. Continuously monitor user experiences on the Web site. Web pages are not static, but must continue to change as technology and Web practices evolve.

Endnotes

[1] While an online survey of TaxStats users would have been preferred, at the time of the redesign, irs.gov did not have the technical capacity to implement Web surveys.
[2] Prior to the redesign, documents were available in PDF, Lotus, and Microsoft Excel formats. In addition, larger files were compressed and provided as executable files.
[3] Maurer, Donna, and Warfel, Todd, “Card Sorting: a definitive guide,” http://www.boxesandarrows.com/view/card_sorting_a_definitive_guide, 2004.
[4] The minimum recommended number of card sort participants is 15. While conducting this exercise face-to-face allows observers to record respondent reactions, it is acceptable to mail packages to participants when cost is an important consideration or when conducting the exercise via mail improves participation rates. Nielsen, Jakob, “Card Sorting: How Many Users To Test,” http://www.useit.com/alertbox/20040719.html, 2004.
[5] See, for example, “Best Practices in Designing Web Sites for Dissemination of Statistics,” United Nations Statistical Commission and Economic Commission for Europe, 2001.
[6] See, for example, Levi, Michael D., “Usability Testing Web Sites at the Bureau of Labor Statistics,” National Institute of Standards and Technology Symposium, Transcript, 1997.
[7] While five is considered the minimum number of test subjects required to discover the majority of usability problems, SOI determined that its users fell into two broad groups (experienced statistical data users and individuals with a general interest in the U.S. tax system), so it was necessary to include representatives of both groups. Nielsen, Jakob, “Why You Only Need To Test with 5 Users,” http://www.useit.com/alertbox/20000319.html, 2000.
[8] See Section 508 of the Rehabilitation Act (29 U.S.C. 794d), as amended by the Workforce Investment Act of 1998 (P.L. 105-220), August 7, 1998 (herein referred to as Section 508).

Index of IRS Methodology Reports on Statistical Uses of Administrative Records

Special Studies in Federal Tax Statistics, 2005
Selected papers given primarily at the 2005 Joint Statistical Meetings of the American Statistical Association in Minneapolis, Minnesota, and at the National Tax Association’s Annual Conference on Taxation in Miami, Florida. The volume is divided into seven major sections.
It begins with three papers: one on analyzing business organizational structure from tax data; one on current research in the nonprofit sector; and one on geographic variation in filing rates for Schedule H, the IRS form used to report Social Security and Medicare wages paid to household employees. Section 2 presents a paper on Schedule M corporate book-tax difference data, 1990-2003. Section 3 presents a paper on the effects of taxation on corporate financial policy. Section 4 contains three papers: on measuring nonsampling error in the SOI Individual Tax Return Study; on how imputed returns on the Corporate File compare to actual returns; and on the impact of followup on Tax Year 2002 Foreign Tax Credit data. Section 5 contains four papers: on cluster analysis in describing tax return data; on comparing income concepts at IRS, Census, and BLS; on the 1999-2003 Statistics of Income Tax Return Edited Panel; and on trends in 401(k) and IRA contribution activity, 1999-2002. Section 6 presents a paper on the Estate and Personal Wealth sample design. Finally, Section 7 presents a paper on IRS area-to-area migration data.

Special Studies in Federal Tax Statistics, 2004
Selected papers given primarily at the 2004 Annual Meetings of the American Statistical Association in Toronto, Ontario, Canada, and at two other professional conferences: the Luxembourg Wealth Study Workshop in Perugia, Italy, and the Conference on Privacy in Statistical Databases in Barcelona, Spain. The volume is divided into five major sections. It begins with four papers on recent developments in Statistics of Income research. Section 2 includes five papers on quality assessment of administrative records data. Section 3 presents a paper on estimates of income and wealth from survey and tax data. Section 4 contains a paper on disclosure protection techniques. Finally, Section 5 presents a paper on current theoretical research on multivariate analysis, presented in a poster session at the ASA meetings.
Special Studies in Federal Tax Statistics, 2003
Selected papers given primarily at the 2003 Annual Meetings of the American Statistical Association in San Francisco, CA. The volume is divided into four major sections. It begins with four papers presented in the same session under the topic, "Are the Rich Getting Richer and the Poor Getting Poorer?" Section 2 includes a paper on survey methods. Section 3 presents five papers on new developments in tax statistics and administrative records. Finally, Section 4 contains a paper on survey nonresponse and imputation.

Special Studies in Federal Tax Statistics, 2002
Selected papers given primarily at the 2002 Annual Meetings of the American Statistical Association in New York City and at the 2002 National Tax Association Conference in Orlando, FL. The volume is divided into seven major sections. It begins with two papers on recent IRS research. Section 2 includes a group of four papers on methodological and analytical advances in tax statistics. Section 3 presents two papers on statistical uses of administrative records. Section 4 contains a paper on disseminating IRS locality data. Section 5 includes a paper on confidentiality and data access issues. Section 6 presents a paper on measuring the quality of IRS responses to taxpayer inquiries. Finally, Section 7 includes two papers on distributional theory and computation.

Special Studies in Federal Tax Statistics, 2000-2001
Selected papers given primarily at the 2000 and 2001 Annual Meetings of the American Statistical Association in Indianapolis, Indiana, and Atlanta, Georgia, plus one other paper presented at the International Conference on Establishment Surveys II in Buffalo, New York, in 2000. The volume is divided into four major sections. The book begins with five papers on statistical applications. Section 2 presents two papers on confidentiality and data access issues. Section 3 presents two papers on changing industry codes.
Finally, Section 4 includes five papers on analyses of Federal tax and information returns.

Turning Administrative Systems Into Information Systems, 1999
Selected papers given at the 1999 Annual Meetings of the American Statistical Association (ASA) in Baltimore, MD. In addition, the report includes one paper presented at the 1998 ASA conference in Dallas, TX. The volume is divided into six major sections. The book begins with a complete ASA session analyzing administrative records from the U.S. tax system. It contains four papers, as well as a set of comments on the presentations. Section 2 presents four papers on the statistical uses of administrative records. Section 3 includes two papers, which focus on employee satisfaction and customer satisfaction surveys at the IRS. Section 4 contains two papers, one of which was presented at the 1998 ASA conference, that provide an update on the Survey of Consumer Finances. Section 5 presents one paper that looks at the feasibility of preparing State corporate data by matching receipts and employment data by State and industry. Finally, the volume concludes with a paper on distributional theory and computation.

Turning Administrative Systems Into Information Systems, 1998-1999
Selected papers given at the 1998 Annual Meetings of the American Statistical Association in Dallas, Texas. In addition, the report includes a session of papers presented in 1999 at the Annual Meetings of the American Economic Association (AEA), plus one other paper. The volume is divided into five major sections. The book begins with the AEA session in memory of the late Dr. Daniel B. Radner, Social Security Administration economist. It contains four papers on new empirical findings in the distributions of personal income and wealth, as well as two sets of introductory remarks and two sets of comments on the presentations. Section 2 presents two papers on data measurement and databases for economic research.
Section 3 includes two papers, which focus on sample design, estimation, and imputation research. Section 4 explores issues dealing with public-use files, including the potential for disclosure. Finally, Section 5 concludes the volume with a paper verifying the classification of public charities in the 1994 Statistics of Income Study Sample. (It is the only paper not presented at the ASA or AEA meetings.)

Turning Administrative Systems Into Information Systems, 1996-1997
Selected papers given primarily at the 1996 and 1997 Annual Meetings of the American Statistical Association in Chicago, Illinois, and Anaheim, California, plus one non-ASA article. The volume is divided into nine major sections. The book begins with a paper originally printed as a textbook article on inheritance and wealth in America. Section 2 presents papers on using administrative records for generating national statistics. Section 3 contains two sets of panel reports on the statistical uses of administrative records. Section 4 focuses on methodological research. Section 5 explores issues dealing with quality improvement in government. Section 6 presents a panel discussion on Customer Satisfaction Surveys. Section 7 focuses on the effect of downsizing on Federal statistics. Section 8 explores the privacy area. Finally, Section 9 concludes with seven papers on statistical disclosure limitation.

Turning Administrative Systems Into Information Systems, 1995
Selected papers given primarily at the 1995 Annual Meetings of the American Statistical Association in Orlando, Florida, and at another conference. The volume is divided into five major sections. The book begins with a paper on SOI migration data, giving an example of how this unique dataset can be used by demographers and policy researchers. Section 2 presents papers on sample designs and redesigns, as well as on SOI efforts in the corporation and partnership areas. Section 3 contains papers on weighting and estimation research.
Section 4 focuses on analytical approaches to quality improvement, from graphical techniques to cognitive research. Finally, Section 5 concludes with papers from an invited session on record linkage applications for health care policy, a session organized by SOI in view of its long-term interest in improving matching techniques for administrative and survey data.

Turning Administrative Systems Into Information Systems, 1994
Selected papers given primarily at the 1994 Annual Meetings of the American Statistical Association in Toronto, Ontario, Canada. The volume is divided into nine major sections. The book begins with an overview of the Statistics of Income Programs, describing the origins and customers of various SOI data and highlighting our products and services. Section 2 presents the descriptive results from two recent studies, one on sales of capital assets and one on self-employed nonfilers. Section 3 contains papers and discussion from a session on privacy issues involved in using administrative record data. The next two sections are much more methodological in nature: Section 4 focuses on sample design and estimation work in SOI, beginning with a reprint of a 1963 paper by W. Edwards Deming, which presents an evaluation of the SOI sample. Section 5 presents data on record linkage. Section 6 draws together the papers from a session on nonresponse in Federal surveys. Section 7 is a more statistical section, which contains a collection of papers on imputation methodology in a number of different arenas. Section 8 focuses on another long-time theme of these volumes: quality improvement efforts. Finally, Section 9 presents two unrelated papers on data preparation techniques.

Turning Administrative Systems Into Information Systems, 1993
Selected papers given at the 1993 Annual Meetings of the American Statistical Association in San Francisco, California, and at other related conferences.
The volume contains seven major sections, each focusing on a somewhat different area of research. The first section begins with a paper that presents a view for the future of the Federal statistical system. This effort is part of a dialogue with other agency leaders to redefine a cohesive plan for Federal data producers and users. Section 2 contains several descriptive papers based on tax data about individuals, and Section 3 looks at similar uses of tax data for businesses. Section 4 focuses on sample design issues for several SOI projects, while Section 5 presents information on improvements to analytical techniques. Finally, Sections 6 and 7 describe a number of different studies SOI is involved in to improve the quality and productivity of other areas of IRS.

Turning Administrative Systems Into Information Systems, 1991-1992
Selected papers given mostly at the 1991 and 1992 Annual Meetings of the American Statistical Association, held, respectively, in Atlanta, Georgia, and Boston, Massachusetts. Papers chosen for this volume exemplify some of the basic changes that are occurring in the Statistics of Income program during the 1990’s, including discussions of methodological improvements and applications currently under way in the U.S. Federal statistical community. The volume covers seven general areas of interest: information from tax return data; the 1989 Survey of Consumer Finances; estimation and methodological research in the SOI business program; sample design and weighting issues in the SOI individual program; some quality improvement applications; some technological innovations for SOI research; and a look at future data needs for the Federal sector. Previous volumes in the series were called Statistics of Income and Related Administrative Record Research (see below). The title was changed to more clearly reflect how the Internal Revenue Service’s Statistics of Income function is adapting to better meet the informational needs of its many customers.
Statistics of Income and Related Administrative Record Research, 1990
Selected papers given primarily at the 1990 Annual Meeting of the American Statistical Association in Anaheim, California. Papers selected for this volume contain discussions of methodological improvements and applications currently under way in the U.S. Federal statistical community. In particular, the focus is on work being done by the Statistics of Income Division of the Internal Revenue Service (IRS). The volume covers five general areas: longitudinal panel data and estimation issues; analytical research using survey and administrative data; design issues for Federal surveys; information on the conclusions of the Establishment Reporting Unit Match Study; and a look at future data needs for the Federal sector.

Statistics of Income and Related Administrative Record Research, 1988-1989
Selected papers given mostly at the 1988 and 1989 Annual Meetings of the American Statistical Association in New Orleans, Louisiana, and Washington, D.C., respectively. Papers for the volume focus on perspectives on statistics in government (in celebration of ASA’s 150th anniversary); improvements in income and wealth estimation; methodological enhancements to administrative record data; examinations of the effects of tax reform; and technological innovations for statistical use.

Statistics of Income and Related Administrative Record Research, 1986-1987
Selected papers given, for the most part, at the 1986 and 1987 Annual Meetings of the American Statistical Association in Chicago and San Francisco, respectively. Papers focus on ongoing wealth estimation research; U.S. and Canadian efforts regarding methodological enhancements to corporate and individual tax data; and recent refinements to disclosure avoidance techniques.

Record Linkage Techniques, 1985*
The Proceedings of the Workshop on Exact Matching Methodologies held in Arlington, Virginia, May 9-10, 1985.
Includes landmark background papers on record linkage use and papers describing methodological enhancements, applications, and technological developments, as well as extensive bibliographic material on exact matching.

Statistical Uses of Administrative Records: Recent Research and Present Prospects*
A two-volume reference handbook on research results involving the use of administrative records for statistical purposes from 1979 through 1982:
- Volume I (March 1984) focuses on general considerations in administrative record research, applications of income tax data, uses based on data from other major administrative record systems, and enhancements to statistical systems using administrative data.
- Volume II (July 1984) focuses on comparability and quality issues, access to administrative records for statistical purposes, selected examples of end uses of linked administrative statistical systems, and a status report that sets goals for the future.

Statistics of Income and Related Administrative Record Research, 1984*
Selected papers given at the 1984 Annual Meeting of the American Statistical Association in Philadelphia. Papers focus on future policy issues, applications, exact matching techniques, quality control, missing data, and sample design issues.

Statistics of Income and Related Administrative Record Research, 1983*
Selected papers given at the 1983 Annual Meeting of the American Statistical Association in Toronto. Papers focus on use of administrative records in censuses and surveys, applications for epidemiologic research and other statistical purposes, and statistical techniques involving imputation, disclosure, and confidentiality.

Statistics of Income and Related Administrative Record Research, 1982*
Selected papers given at the 1982 Annual Meeting of the American Statistical Association in Cincinnati. Papers focus on statistical uses of administrative records, resulting methodologic advances, and estimates and projections for intercensal updates.
Statistics of Income and Related Administrative Record Research*
Selected papers given at the 1981 Annual Meeting of the American Statistical Association in Detroit. Papers focus on applications and methodologies, with an emphasis on IRS’s Statistics of Income Program, the Small Business Data Base, nonprofit and pension data, and Canada’s Generalized Iterative Record Linkage System.

Economic and Demographic Statistics*
Selected papers given at the 1980 Annual Meeting of the American Statistical Association in Houston. Papers focus on evaluation of the 1977 Economic Census, CPS hot deck techniques, and efforts to upgrade Social Security’s Continuous Work History Sample.
______________________________
*Out of print. Copies of selected papers can be obtained upon request.

NOTE: The IRS Methodology Reports on statistical uses of administrative records are now being offered free of charge. To obtain copies, write to:
Statistical Information Services (SIS)
Statistics of Income Division (RAS:S:SS:SD)
Internal Revenue Service
P.O. Box 2608
Washington, DC 20013-2608
Phone: (202) 874-0410
FAX: (202) 874-0964
E-mail: sis@irs.gov

Department of the Treasury
Internal Revenue Service
publish.no.irs.gov
Publication 1299 (3-2007)
Catalog Number 63296M

Other docs by Ryan Colwell
June-2006 Tax Court Opinion Ruling Case-SPENCER
June-2006 Tax Court Opinion Ruling Case-SHINAULT
June-2006 Tax Court Opinion Ruling Case-ROSSMAN
June-2006 Tax Court Opinion Ruling Case-ROSEN
June-2006 Tax Court Opinion Ruling Case-PROWSE
June-2006 Tax Court Opinion Ruling Case-PILLAY
June-2006 Tax Court Opinion Ruling Case-PEOPL
June-2006 Tax Court Opinion Ruling Case-PARKER
June-2006 Tax Court Opinion Ruling Case-MURRAY
June-2006 Tax Court Opinion Ruling Case-MILLER
June-2006 Tax Court Opinion Ruling Case-LYNN
Related docs
IRS Publication 590
IRS Publication 502
IRS Publication _600
IRS Publication 225
IRS Publication 3319
entire publication
IRS Publication 1828 (Spanish)
2006 Publication[498]
2004 Publication[225]
IRS Publication _1762
IRS Publication _1431
IRS Publication _2053B
IRS Publication _3722
IRS Publication _3112