A census of population has been taken and the results are becoming available. What can they tell us about mortality? What do we need to know and do to get this information? How can we assess its accuracy?
The most important census questions for information on mortality are children born and surviving and recent household deaths. This post discusses what we can learn about mortality from data on children born and surviving. Part 2 will discuss what we can learn from data on household deaths.
This article focuses on (a) what census data can tell us about levels and trends in mortality, (b) how to assess data quality, and (c) producing estimates for large numbers of subnational areas. It does not explain the techniques required to produce estimates from census tabulations; these are described in the literature. The literature is not especially reader friendly, and an up-to-date exposition would be useful, but that task cannot be undertaken here.
Reports on numbers of children ever born and surviving ("child survivorship" data) are the most common census data on mortality and often provide the most useful information. Women over some minimum age are asked how many children they have borne altogether in their lifetime and how many of these children are now surviving. Sometimes the questions are addressed to all women, sometimes only to ever married women.
The standard tabulations show children ever born and children surviving by age of mother in five year age groups. The 1970 Population Census of Malaysia, General Report, Vol. 2, page 385, provides a wonderful example of the genre. It is too large to include here, but it may be found here.
We don't need all this detail to estimate mortality. All we need, in fact, are the total numbers of children ever born and surviving to women in each five year age group, as shown in Table 1 (Feeney 1976b).
Table 1: Children Ever Born and Surviving:
Gilbert and Ellis Islands Census of 08-Dec-1973
Age      Number     Children    Children
Group    of Women   Ever Born   Surviving
--------------------------------------
15-19    2,980         464         411
20-24    2,643       2,819       2,516
25-29    1,871       4,575       4,016
30-34    1,739       6,935       5,862
35-39    1,484       7,645       6,191
40-44    1,296       7,657       5,924
45-49    1,230       7,592       5,685
--------------------------------------
There is one caveat, however. In the event of non-response, this table should be made only for women who reported both children born and children surviving, to avoid biases in the computed proportions of surviving children (Feeney 1976a). This subtlety escapes the tabulation recommendations of the United Nations Statistics Division's Principles and Recommendations for Population and Housing Censuses.
William Brass discovered over half a century ago that the proportions of deceased children among all children born to women in standard five year age groups approximate life table proportions of children dying between birth and certain ages. The correspondence is shown in Table 2.
Table 2: Correspondence between
age groups and q(x) values
Age group   x     Age group   x
--------------------------------
15-19       1     45-49      20
20-24       2     50-54      25
25-29       3     55-59      30
30-34       5     60-64      35
35-39      10     65-69      40
40-44      15     70-74      45
--------------------------------
So, for example, the proportion of deceased children among children born to women in the 15-19 age group, (464-411)/464 = 0.114, approximates the life table proportion of children dying between birth and age 1.
Brass developed a clever adjustment procedure that involves calculating "multipliers" to the proportions deceased to give a more accurate estimate of q(x). In practice, the multipliers tend to lie within a few percent of one. Problematic assumptions and data often produce vastly greater errors, however, so for initial evaluations, at least, we can safely ignore the multiplier adjustment (cf my earlier post Rapid Assessment of Census Data on Children Born and Surviving, June 10, 2009).
The Brass method effectively assumed that mortality was constant in the decades prior to the census. This was a fairly reasonable assumption at the time the method was developed circa 1950. By the 1970s, however, when I first encountered the method as a young researcher at the East-West Center in Honolulu, mortality was declining rapidly in many developing countries.
Declining mortality resulted in relatively high estimates of mortality for children ever born to older women (children who were born longer ago) and relatively low estimates for children ever born to younger women (children who were born more recently).
In the early 1970s I discovered that "longer ago" and "more recently" could be quantified; that it is possible to calculate a number of years prior to the census date to which the estimate for each age group applied. This was first reported in Feeney (1976b). The definitive presentation was published in Population Studies (Feeney 1980). See also Feeney 1987 and Feeney 1991.
Typical "years back" values for different age groups are shown in Table 3.
Table 3: Approximate correspondence between
age groups and "years back"
Age group   Years back     Age group   Years back
--------------------------------------------------
15-19        1             45-49       15
20-24        3             50-54       19
25-29        4             55-59       22
30-34        7             60-64       24
35-39       10             65-69       26
40-44       12             70-74       27
--------------------------------------------------
The proportion of deceased children among all children ever born to women aged 35-39 years gives, for example, an estimate of the life table probability of dying between birth and age 10 years that refers to the time 10 years prior to the census. It is a characteristic of this estimation technique that estimates refer to points in time rather than to time periods.
The possibility of "dating" the q(x) estimates opens the possibility of using child survivorship data to estimate trends as well as levels of infant mortality. To do this, however, one additional step is needed: we need to "translate" the various q(x) values for different age groups to a common standard, such as q(5), using a one-parameter model life table family.
The procedure for estimating the trend of mortality from child survivorship data thus consists of the following steps.
1. Estimate q(x) for x = 1, 2, ... from proportions of deceased children for women aged 15-19, 20-24, ..., following the correspondence shown in Table 2.
2. Using a suitable model life table family (most obviously the model used to calculate the q(x) estimates) translate each q(x) value to a common statistic, such as q(5).
3. For each q(x), calculate the years back value and subtract it from the time of the census to determine the point in time to which this q(x) applies.
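Steps 1 and 3 can be sketched in a few lines of Python using the figures in Table 1. This is only a rough illustration under stated assumptions: it uses the unadjusted proportions deceased (no multipliers), it uses the typical "years back" values of Table 3 rather than values calculated for this census, and it omits step 2, which requires a model life table family. The function and variable names are made up for the example.

```python
from datetime import date

# Table 1: age group of mother -> (children ever born, children surviving)
TABLE_1 = {
    "15-19": (464, 411),
    "20-24": (2819, 2516),
    "25-29": (4575, 4016),
    "30-34": (6935, 5862),
    "35-39": (7645, 6191),
    "40-44": (7657, 5924),
    "45-49": (7592, 5685),
}

# Table 2: age group -> x in q(x)
X_FOR_GROUP = {"15-19": 1, "20-24": 2, "25-29": 3, "30-34": 5,
               "35-39": 10, "40-44": 15, "45-49": 20}

# Table 3: typical "years back" values (an approximation, not computed here)
TYPICAL_YEARS_BACK = {"15-19": 1, "20-24": 3, "25-29": 4, "30-34": 7,
                      "35-39": 10, "40-44": 12, "45-49": 15}


def decimal_year(d):
    """Express a calendar date as a decimal year, e.g. 1973.93."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year


def dated_estimates(table, census_date):
    """Step 1 (proportion deceased as q(x)) and step 3 (reference time)."""
    census_time = decimal_year(census_date)
    results = []
    for group, (born, surviving) in table.items():
        results.append({
            "age_group": group,
            "x": X_FOR_GROUP[group],
            "q_x": (born - surviving) / born,   # unadjusted proportion deceased
            "time": census_time - TYPICAL_YEARS_BACK[group],
        })
    return results


for row in dated_estimates(TABLE_1, date(1973, 12, 8)):
    print(f'{row["age_group"]}  q({row["x"]}) = {row["q_x"]:.3f}  '
          f'refers to {row["time"]:.1f}')
```

The q(x) values printed here refer to different ages x and so are not yet comparable with one another; that is the job of the translation in step 2.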
There is a substantial technical literature on child survivorship estimation, most of it dating back to the 1970s. Little work has been done in recent years.
Assessing data quality is absolutely essential. How do we do so? There are several tests, each involving comparison of a set of child survivorship estimates with estimates from another source.
If data is available from a previous census, we may estimate trends from both censuses and compare them for consistency. The simplest way to check consistency is to plot both series on the same plot. The following plot illustrates this with infant mortality rate estimates based on women aged 20-50 years in the 1968 and 1973 censuses of the Gilbert and Ellis Islands, as they were then called (Feeney 1976b).
Figure 1: Child Survivorship Estimates of the Infant Mortality Rate,
Gilbert and Ellis Islands Censuses of 1968 and 1973
Observing the differences between the two series, we see that the individual estimates may err by 5-10%, but also that the overall level and trend indicated by the 1973 data are broadly consistent with the level and trend indicated by the 1968 census. Fitting a straight line to the points, perhaps omitting or down-weighting the last estimate from the 1973 census, would provide a reasonably good estimate of the level and trend of mortality during the 1950s and 1960s.
Plotting really is essential here because the times to which the estimates from the two censuses refer are different, so that comparison by examining the estimates is impossible without interpolating. In addition to displaying patterns that we might miss looking at the numbers only, the plot enables a rough visual interpolation.
If vital registration data is available, we may compare child survivorship estimates to it. The following plot, redrawn from Feeney (1980), makes such a comparison for Costa Rica.
Figure 2: Infant Mortality Estimates from the 1973 Census of Costa Rica
Compared with Vital Registration Estimates
Comparison of the child survivorship IMRs with the IMRs calculated from registered births and deaths yields several conclusions. Here, as in the Gilbert and Ellis Islands example, the level and trend indicated by the two sources are broadly consistent. The child survivorship estimates from the oldest age groups, 65-69 and 70-74, are clearly underestimates, but the estimate from the 65-69 group is not far off, and the estimate from the 55-59 group is very close to the vital registration value.
At the opposite end, the child survivorship estimates for the 15-19 and 20-24 age groups show an increase in infant mortality that contradicts the vital registration figures as well as the long term trend. The child survivorship estimates err here, but not because of poor data quality. The data accurately reflect differential infant mortality by age of mother. Children ever born to 15-19 year old women are necessarily born to very young women, and these children have a higher risk of death in infancy than children born to older women.
This apparent upturn in mortality is very common in child survivorship estimates, so common that in practice it is usually necessary either to discard the estimate from the 15-19 year old women or to invoke the method developed by Ewbank (1982). The data in Figure 2 indicate an upward bias in the estimate from the 20-24 year old women as well. This is far less common.
The remaining discrepancies between the two series reflect the pronounced year to year fluctuations in the vital statistics series. These discrepancies do not reflect errors in reporting of children born and surviving. They are inherent in the nature of the data, which aggregates the mortality experience of many birth cohorts and therefore imposes a strong smoothing on the underlying year to year trend.
This example calls attention to a common misconception, that child survivorship data for older women are so likely to be defective that they are not worth looking at. Empirical evidence demonstrates overwhelmingly that data for older women can give useful results (Feeney 1995). For many censuses, to be sure, data for older women are defective. There are a fair number of censuses for which data for younger women is defective, however, and there are many censuses for which data for older women is as good as data for younger women. The sensible approach is to look at all the data, assess it, and use what is useful.
Of course we don't have high expectations for very old women, but we should always look at the estimates for all available age groups, and national statistical offices should take care not to truncate published tables too early. An open ended age group less than 70+ risks discarding valuable information contained in data collected at great cost.
Population surveys, notably the Demographic and Health Surveys (www.measuredhs.com), provide a third source for comparison with census data. These typically provide direct estimates based on birth histories as well as indirect estimates based on survey questions on children ever born and surviving.
As in the above examples, there is no substitute for plotting all available estimates. This requires locating every estimate in calendar time, including the DHS estimates. The standard DHS reports don't do us any favors here because they locate estimates by time prior to interview.
For comparisons with other data sources we need to convert time prior to interview to calendar time. A sensible way to do this is to regard all interviews as having occurred at a single point in time, either mid-way between the beginning and end of field work or, if the distribution of interviews by month is provided, at the mean of this distribution. The error incurred by this procedure is generally negligible in relation to other errors.
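The conversion is simple enough to sketch in Python. The function names and the survey dates in the usage note are hypothetical, chosen only to illustrate the arithmetic; months are treated as equal twelfths of a year when averaging, which is an approximation.

```python
from datetime import date


def decimal_year(d):
    """Express a calendar date as a decimal year."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year


def fieldwork_midpoint(fieldwork_start, fieldwork_end):
    """Single reference time: mid-way between start and end of field work."""
    midpoint = fieldwork_start + (fieldwork_end - fieldwork_start) / 2
    return decimal_year(midpoint)


def mean_interview_time(interviews_by_month):
    """Reference time as the mean of the distribution of interviews by month.

    interviews_by_month maps (year, month) to a number of interviews; each
    interview is placed at the middle of its month.
    """
    total = sum(interviews_by_month.values())
    weighted = sum(n * (year + (month - 0.5) / 12)
                   for (year, month), n in interviews_by_month.items())
    return weighted / total


def to_calendar_time(reference_time, years_before_interview):
    """Convert 'time prior to interview' to calendar time."""
    return reference_time - years_before_interview
```

For a hypothetical survey with field work from 1 March to 31 August 2004, `fieldwork_midpoint(date(2004, 3, 1), date(2004, 8, 31))` gives about 2004.4, so an estimate for the period 0-4 years before interview, centered 2.5 years back, would be plotted at about 2001.9.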
Estimates for subnational areas bring us face to face with a dilemma that has never been satisfactorily resolved. On the one hand, we cannot afford to lavish the same effort on estimates for large numbers of subnational areas that we gladly devote to estimates for the nation. As it is usually done, the work is simply too labor intensive. The labor is not in the calculations as such, which are taken care of by computer, but in dealing with inputs and outputs, scrutinizing the results of the calculations, and deciding what to do about problematic results.
On the other hand, if we don't do estimates for a large number of subnational areas (scores to hundreds or more, depending on the size of the country), we undermine the unique advantage of the census, that it is a complete enumeration. This is not good politics. If we produce estimates only for the nation and a handful of regions, we need not have taken a census. A large sample survey would provide the same results at a fraction of the cost.
There are other uses and rationales for complete enumeration, to be sure, but the rising cost of censuses is a political disadvantage that we should counter by exploiting the census data to the fullest extent possible. This means, among many other things, doing mortality estimates for a large number of small areas.
How to accomplish this? We need to decide first what sort of results should be produced. Clearly we do not want to produce the level of detail displayed in Figure 1 and Figure 2 above. Most users do not need this level of detail, and many would be confused or misled by it. Furthermore, because the dating of estimates will in general be different for different subnational units, the estimates will not be comparable across units.
A sensible approach is to fit a straight line to the estimates from whatever age groups are used and use this line to generate estimates of the infant mortality rate (or whatever statistics are decided on) at several uniform points in time. We might for example provide an estimate as of the time of the census, or for the middle of the year in which the census is taken, and estimates for 5, 10 and 15 years prior to this point in time. This approach provides simple results that are comparable across subnational units.
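As a minimal sketch of this idea, the following Python fits a line and evaluates it at uniform points in time. The choice of fitting method is an assumption made for the example: it uses the Theil-Sen estimator (median of pairwise slopes) as one simple robust method, not necessarily the one best suited to a given census. The function names and the data in the usage note are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(times, values):
    """Robust straight-line fit: median of all pairwise slopes.

    Returns (slope, intercept). Assumes at least two distinct times.
    """
    points = list(zip(times, values))
    slopes = [(v2 - v1) / (t2 - t1)
              for (t1, v1), (t2, v2) in combinations(points, 2)
              if t2 != t1]
    slope = median(slopes)
    intercept = median([v - slope * t for t, v in points])
    return slope, intercept


def uniform_estimates(times, values, census_time, years_back=(0, 5, 10, 15)):
    """Evaluate the fitted line at the census time and 5, 10, 15 years earlier.

    Note that the value at the census time is an extrapolation beyond the
    most recent child survivorship estimate.
    """
    slope, intercept = theil_sen(times, values)
    return {yb: intercept + slope * (census_time - yb) for yb in years_back}
```

With made-up infant mortality estimates of 100, 96, 92 and 88 per thousand at 1960, 1962, 1964 and 1966, plus an outlying 120 at 1968, `uniform_estimates(..., census_time=1970)` follows the four consistent points and largely ignores the outlier, which an ordinary least squares fit would not do.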
There are now three requirements for producing these estimates for subnational areas. First, we need to fully automate the calculations for a single subnational area. Second, we need to provide for efficient scrutiny and exercise of judgement of the results for each area. Third, we need to fully automate the process of producing provisional estimates for any number of subnational areas.
One might imagine that the first requirement is met by the existence of package programs, but this is not the case for several reasons. First, we need to automate the entire process, including providing input data to the program, sensibly fitting a straight line (robust fitting methods will generally be required), and producing a plot of the results. It must be possible to do all of these things "under program control," as computer programmers say; a single command issued to the computer specifying the source of the input data must result in one or more computer files containing all the results, and printouts if we want them.
There are two complementary approaches to efficient scrutiny and decisions on the results for each subnational area. We begin with plots like those shown in Figure 1 and Figure 2 above. The trained and practiced eye can assess these plots very rapidly. Non-problematic cases, which we may expect 80 percent of the time, can be sized up in less than a minute (what "problematic" means will of course depend on context).
Most of the remaining cases are likely to involve a simple fix, such as down-weighting the influence of outliers on the fitted line. Cases that can't be fixed in this way can be put aside for more intensive study. In some cases it may be appropriate to reject the estimates altogether and supply the user with a "not available" token. Providing a "best" estimate is a disservice if, despite being best, it is a very poor estimate.
The experienced data analyst will not shrink from processing scores or hundreds of plots in this way. Of course the work cannot be done at a stretch, and a protocol for recording judgements for problematic cases must be developed, preferably including review by a second analyst. A case could be made for working in pairs, in the manner of "extreme programming." For large countries with sufficient capacity, it may be appropriate to distribute the work among regional branches of the national statistical office.
When sufficiently many plots have been scrutinized, patterns are likely to emerge that can be captured in simple numerical summaries based on the residuals of the estimates from each age group and the fitted line. The numerical summaries may then be processed by a program that identifies problematic cases and produces plots of these cases for manual processing. In this way scrutiny of plots for every area may be avoided and the work lessened.
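A screening program of this kind might look something like the following Python sketch. Everything specific in it is an assumption made for illustration: the summary used (largest absolute residual relative to the fitted value), the 10 percent threshold, the ordinary least squares fit, and the function names. In practice the summary and threshold would come out of the experience gained from scrutinizing plots.

```python
def fit_line(times, values):
    """Ordinary least squares fit; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def max_relative_residual(times, values):
    """Largest absolute residual as a fraction of the fitted value."""
    slope, intercept = fit_line(times, values)
    worst = 0.0
    for t, v in zip(times, values):
        fitted = intercept + slope * t
        worst = max(worst, abs(v - fitted) / fitted)
    return worst


def flag_problematic(areas, threshold=0.10):
    """Return names of areas whose summary exceeds the (assumed) threshold.

    areas maps an area name to a (times, values) pair of equal-length lists.
    """
    return [name for name, (times, values) in areas.items()
            if max_relative_residual(times, values) > threshold]
```

Only the flagged areas would then have their plots produced for manual review; the remainder would pass through with the automatically fitted line.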
The third and final requirement is the ability to produce preliminary results, including plots for manual scrutiny, for any number of subnational units as easily as to produce results for a single unit, whether it be 30 districts in Malawi or 2,500 county level units in China.
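To make the idea concrete, here is a rough Python sketch of a driver that treats one area and many areas identically. The file layout is purely hypothetical: one CSV file per area, named after the area, with columns `age_group`, `children_ever_born` and `children_surviving`. For brevity it computes only the proportions deceased; a real driver would go on to the dating, fitting and plotting steps.

```python
import argparse
import csv
from pathlib import Path


def process_area(path):
    """Return (area, age_group, proportion_dead) rows for one input file."""
    rows = []
    with open(path, newline="") as f:
        for rec in csv.DictReader(f):
            born = int(rec["children_ever_born"])
            surviving = int(rec["children_surviving"])
            if born == 0:
                continue  # no children reported, proportion undefined
            rows.append((Path(path).stem, rec["age_group"],
                         (born - surviving) / born))
    return rows


def process_all(input_dir, output_file):
    """Process every *.csv file in input_dir; return the number of areas."""
    paths = sorted(Path(input_dir).glob("*.csv"))
    with open(output_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["area", "age_group", "proportion_dead"])
        for path in paths:
            for area, group, p in process_area(path):
                writer.writerow([area, group, f"{p:.4f}"])
    return len(paths)


def main(argv=None):
    """Command line entry point: one command, any number of areas."""
    parser = argparse.ArgumentParser(
        description="Child survivorship proportions for all areas")
    parser.add_argument("input_dir", help="directory of per-area CSV files")
    parser.add_argument("output_file", help="CSV file to write results to")
    args = parser.parse_args(argv)
    count = process_all(args.input_dir, args.output_file)
    print(f"processed {count} areas")

# To use as a script, call main() under the usual
# `if __name__ == "__main__":` guard.
```

The point of the design is that the command is the same whether the input directory holds 30 files or 2,500; the number of areas never appears in the program.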
The best solution to this challenge consists of a command line interface and "shell" scripting in the unix tradition. Graphical user interfaces ("GUIs") have become so universal that young non-programmers may not even know what a command line is. Older non-programmers may know only of the DOS command line, which lacks the power of the unix command line.
Eric S. Raymond's book The Art of Unix Programming (2004) provides an invaluable introduction for the uninitiated. Neal Stephenson's In the Beginning was the Command Line (1999) provides an immensely entertaining, if discursive, introduction to the command line. Both are available free online. I am regularly reminded in my work of the power of the command line.
Unix command line tools are widely available, thanks to Linux, Cygwin, which provides a Linux-like command line environment for Windows, and Mac OS X. A rudimentary knowledge of the unix command line and unix shell scripting will prove valuable to most programmers and to many non-programmers who work extensively with computers.
There are equally useful alternatives, however, including Stata, which I don't know but have heard well of, and R, which I know and have used from time to time for over a decade. Stata is a commercial offering, but provides good discounts to developing countries. R is available free online.
Unfortunately, there exist no "off the shelf" resources that national statistical offices can use to produce child survivorship estimates for large numbers of subnational areas. The United Nations Population Division's MortPak package includes a program for calculating child survivorship estimates, but it would require substantial reworking for the work described here.
In fact, the estimation procedure used by MortPak, described in Chapter III of Manual X: Indirect Techniques of Demographic Estimation (United Nations 1983), is obsolete. It implements a variant of the Brass "multiplier" technique, a computational shortcut developed half a century ago when calculations had to be done manually using hand calculators. Given the computing power now available to everyone who does this kind of work, the sensible approach is to program solution of the estimation equations directly.
Generation of child survivorship estimates at the national level and for a small number of subnational units is not difficult and is done by many national statistical offices.
Methods for assessing the quality of child survivorship estimates are less well known and are not practiced as widely or as assiduously as they should be. Useful references on data assessment generally are the Encyclopedia of Population entry on the subject (Feeney 2003) and Chapter III of the United Nations Statistics Division's Handbook on the Collection of Fertility and Mortality Data (United Nations 2004). It is of course particularly important to assess data quality meticulously at the national level before moving on to produce large numbers of subnational estimates.
Producing estimates for more than a handful of subnational areas is a challenge that has yet to be met (to the best of my knowledge--I would be delighted to be wrong about this). Let us hope that it will be for the 2000 round censuses. And let us hope that international organizations that can contribute to making this a reality find a way to do so.
Ewbank, D. 1982. The Sources of Error in Brass's Method for Estimating Child Survival: the Case of Bangladesh. Population Studies 36(3):459-474.
Feeney, Griffith. 1976a. Tabulation of census and survey data on child survivorship. Asian and Pacific Census Forum 3(1), August 1976, 5-6.
Feeney, Griffith. 1976b. Estimating Infant Mortality Rates from Child Survivorship Data by Age of Mother. Asian and Pacific Census Newsletter 3(2):12-16.
Feeney, Griffith. 1980. Estimating infant mortality trends from child survivorship data. Population Studies 34(1):102-128.
Feeney, Griffith. 1987. Estimating mortality from child survivorship data: A review. In The Survey Under Difficult Conditions: Population Data Collection & Analysis in Papua New Guinea, Thomas M. McDevitt, Editor. Volume 3, pp. 353-370. New Haven, Connecticut: Human Relations Area Files, Inc.
Feeney, Griffith. 1991. Child Survivorship Estimation: Methods and Data Analysis. Asian and Pacific Population Forum 5(2-3):12-16.
Feeney, Griffith. 1995. The analysis of children ever born data for post-reproductive age women. Notestein Seminar, Office of Population Research, Princeton University, Tuesday 14 November 1995.
Feeney, Griffith. 2003. Data Assessment. In Volume 1 of the Encyclopedia of Population, Ed. Paul Demeny and Geoffrey McNicoll. New York: Macmillan Reference USA.
Stephenson, Neal. 1999. In the Beginning Was the Command Line. New York: Avon Books, Inc.
United Nations Population Division. 1983. Manual X: Indirect Techniques of Demographic Estimation. Department of International Social and Economic Affairs, Population Studies, No. 81. New York: United Nations.
United Nations Statistics Division. 2004. Handbook on the Collection of Fertility and Mortality Data. Department of Economic and Social Affairs, Statistics Division, Studies in Methods, Series F, No. 92. New York: United Nations.