Inter-war Labour Database (1919 to 1944): Technical Appendix Dave Gower
Since World War II, Canada has gathered considerable labour market data to monitor changes in employment. One important source has been the Labour Force Survey (LFS), which gathers information from a sample of Canadian households. Prior to 1945, some labour data existed, much of it going back to World War I. Unfortunately, they came from a variety of sources, related to different statistical concepts, and covered only some segments of the labour market. Since statistics cannot be gathered retroactively, the quality of data between the wars does not match that of information compiled since 1945. However, modern computing technology not available to earlier statisticians can help shed light on economic conditions of the earlier era. The Interwar Labour Database (ILD) attempts to integrate early data into a statistical series that should give as full a picture as possible of the economic conditions faced by Canadian workers from 1919 to 1944.
This document is intended to serve as a companion piece to the Excel workbook called "ILD(full)," which contains detailed sources and calculations. That file is for users who wish to examine the methods employed to derive the ILD data. It is also possible to use the file to experiment with alternative procedures, or to use the contents as a starting point for further work. Documentation and a recalculation macro are provided in the file. Results from the ILD are also presented in a simpler format: the "ILDdata(small)" file gives the final results only, by month and province, which may better satisfy the needs of those solely interested in the results.
Data sources During the period in question, four censuses provided labour information. The 1921 Census measured the number of people active in the workforce, but did not indicate how many were unemployed. The 1931 Census provided both measures, as did those of 1936 (for the Prairie provinces only) and 1941. These censuses give essential reference points or benchmarks for creating monthly series. In addition to the censuses the Department of Labour conducted two monthly surveys. Employment information was gathered from large employers and unemployment statistics from trade unions. Both these sources offered only partial coverage of the labour market. However, because they tell something about movements over the seasons and economic cycles, their information can be used to project the census data.
Data quality The interwar sources provide series by month and province. Obviously, these details are of interest because they allow a level of regional and time-series analysis not previously possible. Further, creating the database at that detail facilitates improvements to the data. In particular, unemployment series can be cleaned of outliers to a degree not possible with more aggregated data. However, as mentioned, the quality of the data provided in the ILD cannot equal that available after WWII. The earlier the data, the less reliable they are.
Concepts Since World War II, the primary labour measure has been the labour force. This consists mainly of people either currently working (employed) or available for work and seeking work (the unemployed). Most often the employed are classified as employees (those who work for others) or the self-employed. Traditionally, the workforce has also been divided into agricultural and non-agricultural, although this distinction has declined in importance. Before World War II, concepts were different. The censuses used as their basic labour concept, “gainfully employed,” a somewhat confusing term that also embraced the unemployed. Gainfully employed seemed to be a measure of one’s status in life, rather than of current labour market activity. Starting in 1931, the censuses asked all gainfully employed people if they were at work on June 1 (the reference day). If they said no, they were asked why. Two possible responses (among others) were “no job” and “temporary layoff.” Persons volunteering either of those categories are considered unemployed in the ILD. A major subgroup of gainfully employed was “wage-earner,” equivalent to the current employee (or paid worker) category. As with the gainfully employed, wage-earners could be unemployed. Since the monthly employment and unemployment data used in the ILD were gathered among non-agricultural wage-earners, this constitutes the category for which estimates have been performed. Agricultural workers and the self-employed are thus excluded.
Time period The data series run from the start of 1919 to the end of 1944. The first date represents the beginning of monthly unemployment data, and is the earliest reasonable date for back-casting the employment series. The second date was chosen as the final year for two reasons: the LFS began the following year, and unemployment was essentially zero, owing to wartime conditions.
Prior publications Historical Statistics of Canada has long been considered the basic source of interwar labour data. In fact, the unemployment statistics in that publication came from the monograph program of the 1931 Census, in which the first attempt was made to produce intercensal estimates of unemployment. The scope of the labour data was limited by technical, rather than statistical, considerations. Indeed, the census monograph candidly admitted that the calculations were restricted to those "that a schoolboy could perform." What did these early analysts attempt to do? Their main objective was to measure labour conditions, especially unemployment, before and after the 1931 Census. First, they used only Canada data, and looked only at June (the census month) of each year. They had two options: to look at the change in employment, and estimate unemployment from that, or to use the series on unemployment provided by trade unions. Unable to choose between these alternatives, they presented as the official estimate an average of the two rates. (This decision was probably a compromise between factions in office politics.) This is obviously not the best possible use of the available information, which is the reason for the ILD project.
Using the "ILD(full)" workbook This workbook requires at least Excel 97 to run. Data users who simply want to access the results need only the "chart" and "results" worksheets. (Note that the top line of the data on the "results" worksheet contains equations, not values, and requires copy/paste special/values to copy properly.) Users who wish to examine the raw data or calculation procedures, or who wish to experiment with alternative procedures, can use this workbook to do so. All required information is provided in the various worksheets, whose contents are explained in the following text. Users who experiment with different equations or raw data, and who have not changed the location of any table, can recalculate the entire workbook by running the "calculate" macro (select tools/macro/macros/run). This will recalculate all equations and return the surplus equations to values. If the location of any data is changed, the workbook may need to be recalculated manually, which involves establishing all interdependencies between tables on various worksheets. To save space, equations in most tables are stored in only one row or column, so a copy/paste for each table is necessary. The formulae should then be changed to values on all but the original column or row; otherwise, the workbook will require much more memory.
A chart worksheet has been provided to display results. Users can enter their own reference formulae to display whatever series they need.
The censuses The 1941 wage-earner data excluded those "on active service." In other words, the data correspond to the current concept of the civilian labour force. The 1921 Census presented other problems. For example, even though wage-earners were identified in that census, the data were tabulated and published only for major cities, apparently for budgetary reasons. To derive provincial from city data, it was assumed that the ratio of wage-earners to the working-age population followed the same 1921-to-1931 trend outside cities as inside them. The fact that agriculture is excluded lends credence to that assumption. This calculation was done for each province.
A more serious problem was the absence of unemployment data in the 1921 Census. Ironically, by June 1921 Canada was in the middle of a short but severe recession, so this was a significant omission. To generate an estimate of 1921 unemployment, two options are possible: use the union rates, or derive a measure from the substantial drop in employment over the preceding year. Because the union rates showed extreme volatility between provinces, an average of the values obtained by the two techniques was used. For the second estimate, June 1931 to June 1932 was the starting point. During that period, the employment index dropped 14.5 points, and the union measure of unemployment rose 5.6 percentage points. In other words, unemployment rose about 0.4 points (5.6 divided by 14.5) for every one-point drop in the employment index. Between June 1920 and June 1921, the employment index for Canada dropped 19.1 points. Change in employment varied widely across the country, though not necessarily in keeping with provincial rises in unemployment (especially in British Columbia). Unfortunately, only half of this drop -- the portion after December 1920 -- is actually derived from true regional data (see "Employment indices"). The earlier months are extrapolated from the national trend, one of the limitations of the earlier ILD data. In order to create an estimate for unemployment rise based on employment change, the provincial drop in the employment index was multiplied by 0.4, as derived above. This was then averaged with the rise in union-measured unemployment to create a best guess for the increase in provincial unemployment between June 1920 and June 1921. The figure for June 1920 (quite low and derived from the union data) was added to the increase to produce the 1921 provincial unemployment benchmark. (The "census" worksheet contains the precise equations used to produce these values.)
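The arithmetic behind the 1921 benchmark can be illustrated with a minimal Python sketch. The function name and the provincial inputs below are hypothetical and invented for illustration; the precise equations are those in the "census" worksheet, which may differ in detail.

```python
# Coefficient taken from June 1931 to June 1932: union unemployment rose
# 5.6 percentage points while the employment index fell 14.5 points.
RATIO = 5.6 / 14.5  # roughly 0.386, rounded to 0.4 in the text


def estimate_1921_rate(index_drop, union_rise, june_1920_rate, coeff=0.4):
    """Hypothetical helper: estimate a June 1921 provincial unemployment rate.

    index_drop     -- fall in the provincial employment index, June 1920 to June 1921
    union_rise     -- rise in the union unemployment rate over the same period (points)
    june_1920_rate -- union-based unemployment rate for June 1920 (percent)
    """
    rise_from_employment = coeff * index_drop              # second technique
    best_guess_rise = (rise_from_employment + union_rise) / 2  # average of the two
    return june_1920_rate + best_guess_rise


# Illustrative inputs only; these are NOT values from the ILD.
rate_1921 = estimate_1921_rate(index_drop=20.0, union_rise=10.0, june_1920_rate=2.0)
```

With these made-up inputs, the employment-based rise is 8 points, the averaged rise is 9 points, and the resulting benchmark rate is 11 percent.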
Union unemployment rates Unions had been required to submit monthly reports on their membership to the Department of Labour since the middle of the First World War. One of the statistics gathered and published in the Labour Gazette was the percentage out of work at the end of each month. Obviously, unions account for only a fraction of employment (see the "unionmemb" worksheet). Their unemployment rates differed from those recorded in the censuses, which is why they were considered a poor measure of global unemployment. Furthermore, union membership and survey reporting varied month to month and over longer periods. For that reason the union measure of unemployment is best regarded as an indicator of trends rather than of levels.
One qualification is necessary. Detailed examination (see the top table on the "ununion" worksheet) reveals occasional outliers in an otherwise reasonable pattern. One example is British Columbia in early 1921, and a second is Alberta in the spring of 1925, where rates were extraordinarily high for a short period of time. Commentary in the relevant issues of the Labour Gazette suggests that the main reason for these occasional outliers was industry shutdowns (other than labour disputes, which are not counted as unemployment). For example, the 1925 Alberta situation was caused by a temporary oversupply in the coal industry. Such shutdowns in a highly unionized sector can artificially and temporarily drive up the provincial rate, making it unrepresentative of overall trends in the province. For this reason, a procedure to remove outliers was required before union rates could be used to calculate global unemployment. Since this procedure uses employment values, employment data must be described first.
The employment indices In 1919, large employers began to submit reports on their workforce to the Employment Service of Canada, a part of the Department of Labour. By April of that year, response rose to a level the department considered "fairly representative of employment conditions" (Labour Gazette 1919), and the results began to be published -- first as week-to-week movements, then (in 1920) as a weekly index on a base of January 17, 1920 = 100. Starting in 1921, these Labour Gazette reports became monthly and were published for the five regions. In all cases the change estimates were based on common reporting units, that is, firms that responded in both periods. The exact coverage of the employment index series is difficult to determine. Which firms were required to report, and how the requirement was enforced, is not evident in the material consulted for the study. However, data published over the years indicate that most larger employers, except those in agriculture, government and some quasi-public sectors, were included.
After 1926, the base was recalculated to 1926 (annual average), and in 1937 full provincial detail replaced the Maritime and Prairie regions.
In this study, the indices by province have been extrapolated back to January 1919. For pre-1937 information, data for the smaller provinces were projected from regional movements. In 1919 and 1920, no regional detail was published, so this was projected from the Canada movements. However, though not available from the indices, full regional detail did exist in the census employment benchmarks, so the ILD does provide distinct information for all provinces over the entire period. Data for 1919 and 1920 were obtained from charts in the Labour Gazette. The 1919 data were in fact accompanied by a stern warning not to use them cumulatively -- a warning ignored here on the grounds that, whatever their imperfections, they offer the only information now available. The equations in the "empindx" sheet give the precise formulae by which the weekly visual observations were converted to monthly index values. Readers may refer to that publication to verify the author's observations.
Calculating benchmarked employment levels from the indices The first table of the “emplvls” worksheet converted employment indices to numbers of people and then adjusted these to census benchmark data. A forcing technique using shifting adjustment factors was employed. (Simply put, this technique first takes the difference between the employment value as projected by the indices and the census “target.” It then divides this difference by the number of months and then progressively adds more until the correct value is obtained. For example, there are 120 months between June 1921 and June 1931, so 1/120th of the required adjustment is added to the first month, 2/120th to the second month, and so on. Finally, at June 1931, 120/120 of the adjustment is added, and the number equals the 1931 target.)
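The forcing technique described above can be sketched in a few lines of Python. This is an illustration of the additive, linearly phased adjustment as the parenthetical describes it, not a transcription of the "emplvls" worksheet; the function name and sample figures are hypothetical.

```python
def force_to_benchmark(projected, target):
    """Phase in the gap between the projected end value and the census target.

    projected -- index-based employment values; projected[0] is the starting
                 benchmark month and projected[-1] the next census month
    target    -- census benchmark for the final month
    Month k (counting from the starting benchmark) receives k/n of the
    required adjustment, where n is the number of months between censuses.
    """
    n = len(projected) - 1
    if n < 1:
        raise ValueError("need a starting month and at least one later month")
    adjustment = target - projected[-1]
    return [value + adjustment * k / n for k, value in enumerate(projected)]


# Illustrative four-month span; these are NOT ILD values.
projected = [100.0, 110.0, 120.0, 130.0, 140.0]
adjusted = force_to_benchmark(projected, target=160.0)
```

In this toy case the required adjustment is 20, so the months receive 0, 5, 10, 15 and 20 respectively, and the final value equals the target. Over the 1921-to-1931 span the same logic would use n = 120.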
The employment figures, it should be remembered, are for employed non-agricultural wage-earners (today called paid workers or employees). This category best matches the content of the employment indices.
Population and employment/population ratios The basic data for estimating the working-age population are CANSIM series by year, by province, and by age. From these, estimates are derived for the 14-to-69 age range, considered the normal working age during the interwar period. The data are extrapolated monthly and extended back to January 1919 (see the "pop" worksheet for the precise calculations). Since these data change slowly, any errors introduced by these extrapolations are likely to be minor. The population values then form the basis for calculating employment/population ratios (bottom table on the "emplvls" worksheet), which in turn form the basis for the routine to reduce unemployment outliers.
Calculating unemployment levels from rates The union survey provided only unemployment rates. For the ILD, these rates needed to be converted to numbers. This was done by multiplying employment by the unemployment rate (expressed as a proportion, not a percentage) divided by one minus the rate. This follows from the rate being the number unemployed divided by the sum of the employed and the unemployed. (The calculations are at the top of the "uncalcs" worksheet.) Once a monthly estimate of provincial unemployment was produced, outliers could be dealt with.
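The rate-to-level conversion can be written out as a short Python sketch. The function name and the figures are hypothetical; the relationship itself follows from defining the rate as unemployed divided by employed plus unemployed.

```python
def unemployed_from_rate(employed, rate):
    """Convert an unemployment rate (a proportion, e.g. 0.10) into a level.

    If rate = U / (E + U), then solving for U gives U = E * rate / (1 - rate).
    """
    if not 0 <= rate < 1:
        raise ValueError("rate must be a proportion in [0, 1)")
    return employed * rate / (1 - rate)


# Illustrative figures only: 90,000 employed and a 10% rate
# imply about 10,000 unemployed (10,000 / 100,000 = 10%).
unemployed = unemployed_from_rate(90_000, 0.10)
```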
Identifying and reducing unemployment outliers No satisfactory method was discovered for identifying values that might be too low. Only excessively high values were treated. Obviously, a procedure that reduces high values but ignores low ones will introduce an overall downward bias to the union unemployment data series. However, in the ILD, this overall bias was irrelevant, since final levels were determined by the census benchmarks. Screening for outliers was accomplished in the second table of the "ununion" worksheet. In short, the relationship between each provincial rate and the Canada rate for a given month was compared with the experience of this ratio over the entire year. The standard deviation of all such comparisons (bottom of "ununion" worksheet) helped determine month-province outliers. A cutoff of 1.65 standard deviations was chosen, since upon experimentation this was found to produce a count of possible outliers amounting to 5% of the observations. In other words, potential outliers were defined as the upper tail beyond a two-sided 90% confidence limit. When a potential outlier was found, the drop in the employment/population ratio from 12 months earlier was used to calculate the maximum likely increase in unemployment. If this was less than the rise in union unemployment, it replaced the union value. This happened in about half the cases, as can be seen in the "uncalcs" worksheet.
Benchmarking unemployment levels After outliers were removed, the unemployment series was benchmarked to census values using gradually shifting adjustment factors. The equations can be seen on the bottom table of the "uncalcs" worksheet.
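The outlier screening described earlier in this section can be approximated with the Python sketch below. It is only one possible reading of the text: it assumes that each month's provincial-to-Canada ratio is compared with the province's mean ratio for the year, and that a single pooled standard deviation is used. The function names, the data layout and the sample rates are hypothetical, and the way the maximum likely rise is derived from the employment/population ratio is left as an input because the text does not spell it out.

```python
from statistics import mean, pstdev

CUTOFF = 1.65  # standard deviations, as stated in the text


def flag_high_outliers(prov_rates, canada_rates, cutoff=CUTOFF):
    """Return the set of (province, month_index) pairs flagged as high outliers.

    prov_rates   -- dict mapping province to a list of monthly rates for one year
    canada_rates -- list of Canada rates for the same months
    Assumption: the comparison is the monthly ratio minus its yearly mean.
    """
    deviations = {}
    for prov, rates in prov_rates.items():
        ratios = [p / c for p, c in zip(rates, canada_rates)]
        yearly = mean(ratios)
        for month, ratio in enumerate(ratios):
            deviations[(prov, month)] = ratio - yearly
    sd = pstdev(list(deviations.values()))
    # Only the upper tail is screened; low values are left untouched.
    return {key for key, dev in deviations.items() if dev > cutoff * sd}


def cap_unemployment(union_value, value_12_months_earlier, max_likely_rise):
    """Replace the union value only when its 12-month rise exceeds the maximum."""
    return min(union_value, value_12_months_earlier + max_likely_rise)


# Illustrative data only: province "B" has a one-month spike.
canada = [5.0] * 12
provinces = {"A": [5.0] * 12, "B": [5.0] * 3 + [25.0] + [5.0] * 8}
flagged = flag_high_outliers(provinces, canada)
```

In this toy example only the spike in province "B" (month index 3) is flagged. A flagged value would then be passed through something like `cap_unemployment`, which leaves it unchanged when the employment-based maximum is not binding.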
Capture and verification Because the type font and small print used in the Labour Gazette defeated attempts at scanning, data had to be captured by hand. To check for keying errors, several verification measures were used. Any values that were actual numbers could be verified by a check for additivity of national and provincial values. Because the unemployment and employment data were rates, they could not be added directly. Here, various devices were used. In some years provincial weights were published, so a weighted average could be compared with that for Canada. In other periods, annual averages were published, allowing a comparison with monthly figures. Where neither of these was available, a check was made for large month-to-month fluctuations. Some of the errors so caught were in the publications themselves.
Smoothing and reference dates Both the employment and unemployment series needed to be smoothed to remove some of the unwanted "noise." For employment, a three-month moving average was considered sufficient, and was calculated before other manipulations (bottom of the "empindx" worksheet). For unemployment, the outlier process was performed first, since smoothing would mask outliers. In recognition of the greater volatility in this series, a five-month moving average was applied, this being the greatest number of months that could be used without losing too much of the seasonality. Because union data were reported at the end of each month, and census unemployment was measured as of June 1, the May union data provided the best match to the census. Hence, the five-month average of unemployment pegged to the census date is March through July.
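The smoothing step can be illustrated with a centred moving average in Python. This sketch assumes both averages are centred (the text confirms this only for the five-month unemployment average, March through July around May) and simply leaves the end months blank; how the workbook treats the ends of the series is not stated. The function name and sample rates are hypothetical.

```python
def centered_moving_average(values, window):
    """Centred moving average; months without a full window are returned as None."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the average can be centred")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            smoothed.append(None)  # not enough neighbours on one side
        else:
            smoothed.append(sum(values[i - half:i + half + 1]) / window)
    return smoothed


# Illustrative January-to-December unemployment rates; NOT ILD values.
monthly_rates = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
smoothed = centered_moving_average(monthly_rates, 5)
may_value = smoothed[4]  # average of March through July
```

Here the May value is the mean of the March-to-July figures (7, 6, 5, 4 and 5). Calling the same function with a window of 3 gives the kind of lighter smoothing applied to the employment series.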
Provincial considerations The union unemployment rates were published only for Nova Scotia and Prince Edward Island combined. Consequently, no attempt has been made to produce separate ILD data for these two provinces. All other ILD calculations have been done by province. The Canada totals on the "results" worksheet are aggregates of the provincial values. The Yukon and Northwest Territories have been excluded, as has Newfoundland, which was not part of Canada during this period. Users can verify for themselves that the national totals do match the published census values after appropriate adjustments.
Conclusion This exercise should help preserve a part of Canada's statistical record. Many questions have not been addressed here, however. One is the number of discouraged workers. A second is the seasonality of the patterns. Another is the potential undercount of peak Depression unemployment caused by declining union membership (see the "unionmemb" worksheet).
Possibly the most compelling challenge posed by the data set was the attempt to link these numbers to global labour force measures as currently defined. This would entail trying to estimate the missing sectors, chiefly agriculture and self-employment. No attempt has been made to address these questions in this data set, since these are analytical rather than data manipulation issues. Relevant census data have been included in the "census" worksheet.
References
Denton, F.T. and S. Ostry. Historical Estimates of the Canadian Labour Force. 1961 Census of Canada. Statistics Canada, Catalogue no. 99-549-XPE. Ottawa: Dominion Bureau of Statistics, 1967.
Department of Labour. The Labour Gazette. Ottawa, various years.
Dominion Bureau of Statistics. Seventh Census of Canada, 1931. Monographs, vol. XIII. Statistics Canada, Catalogue no. 98-1931. Ottawa, 1942.
Gower, D. "A note on Canadian unemployment since 1921." Perspectives on Labour and Income (Statistics Canada, Catalogue no. 75-001-XPE) 4, no. 3 (Autumn 1992): 28-30.
Statistics Canada and the Social Science Federation of Canada. Historical Statistics of Canada. Second edition, edited by F.H. Leacy. Statistics Canada, Catalogue no. 11-516-XPE. Ottawa, 1983.