The Vincent-El Badry Method
©1998 by Griffith Feeney
Revised January 1998 (Original January 1996)
In many past censuses enumerators have frequently omitted to record "0" for childless women, leaving the space on the census schedule blank or perhaps recording a dash. The census records will then show these women as having not responded to the question on number of children ever born (CEB), rather than as having no children. This creates a downward bias in reported proportions of childless women and a corresponding upward bias in reported proportions of women with one or more children ever born and in mean number of children ever born. The Vincent-El Badry method is a tool for diagnosing this problem and, in certain conditions, correcting for it.
Let the following denote observed values.
Wi - number of women in age group
NSi - number of women with CEB Not Stated
Ci - number of childless women
where i = 1, 2, ..., indexes age groups from youngest to oldest. Number of women refers to women to whom the children ever born question was addressed, typically either all women or all ever married women.
From these observed numbers the following proportions are computed.
nsi = NSi/Wi
ci = Ci/Wi
Note that upper case letters denote numbers, lower case letters proportions.
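As a minimal sketch of this step, the proportions can be computed from the counts as follows. The counts are made up purely for illustration and the function name is hypothetical.

```python
def observed_proportions(women, not_stated, childless):
    """Return (ns_i, c_i) lists from per-age-group counts W_i, NS_i, C_i."""
    ns = [n / w for n, w in zip(not_stated, women)]
    c = [k / w for k, w in zip(childless, women)]
    return ns, c

# Hypothetical counts for three age groups (not taken from any census).
W = [2000, 4000, 5000]
NS = [600, 400, 200]
C = [700, 500, 150]

ns_i, c_i = observed_proportions(W, NS, C)
for i, (n, c) in enumerate(zip(ns_i, c_i), start=1):
    print(f"age group {i}: ns = {n:.4f}, c = {c:.4f}")
```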
Let the corresponding (unknown) true values be the same prefaced by t. E.g., tCi denotes the true number of childless women.
The method rests on two assumptions.
(1) The true proportion of women not stating children ever born is the same in every age group. Let this proportion be denoted ns.
(2) The true proportion of zero parity women who are incorrectly recorded as CEB not stated is the same for all age groups. Let this proportion be denoted p.
These assumptions may be relaxed by interpretation in the context of particular applications, as illustrated below.
The second assumption means that
(1a) Ci = tCi - ptCi = tCi(1-p) and
(1b) NSi = tNSi + ptCi
These equations simply indicate the transfer of the improperly recorded women, ptCi in number, from the childless to the not stated category.
Solving (1a) for tCi gives
(2) tCi = Ci/(1-p).
Substituting this in (1b) gives
(3) NSi = tNSi + [p/(1-p)]Ci.
Dividing this by Wi gives
(4) nsi = tnsi + [p/(1-p)]ci.
These steps are pure (and very elementary) algebra. By assumption (1), however, tnsi has the constant value ns, so that we obtain finally
(5) nsi = ns + [p/(1-p)]ci.
This equation includes the observed values nsi and ci for each age group and two unknown parameters, ns and p.
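The derivation can be checked numerically. The sketch below builds hypothetical "true" data that satisfy both assumptions, applies the transfer in (1a) and (1b), and confirms that the observed proportions fall on the line in equation (5). All numbers and names are assumptions chosen for illustration.

```python
def record(true_childless, true_not_stated, p):
    """Apply equations (1a) and (1b): move a share p of childless women
    into the not stated category and return the observed (C_i, NS_i)."""
    moved = p * true_childless
    return true_childless - moved, true_not_stated + moved

# Hypothetical true values: 10,000 women per age group, a constant true
# not stated proportion (assumption 1) and a constant p (assumption 2).
W = 10_000
ns_true, p_true = 0.01, 0.5
true_c = [0.90, 0.60, 0.30, 0.10]   # true proportions childless, by age group

slope = p_true / (1 - p_true)
rows = []
for tc in true_c:
    C, NS = record(tc * W, ns_true * W, p_true)
    c_i, ns_i = C / W, NS / W
    rows.append((c_i, ns_i))
    # Equation (5): ns_i = ns + [p/(1-p)] c_i
    print(f"c = {c_i:.3f}  ns_i = {ns_i:.3f}  ns + slope*c = {ns_true + slope * c_i:.3f}")
```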
Values for ns and p are estimated by fitting a straight line to the points (ci, nsi). The intercept of the fitted line gives an estimate of ns. The slope of the fitted line equals p/(1-p), whence p is given by s/(1+s), s denoting the slope.
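The conversion from the fitted line to the two parameters is a one-line calculation. The sketch below uses the intercept and slope reported for the Maharashtra example in this document; the function name is hypothetical.

```python
def p_from_slope(s):
    """Invert s = p/(1-p) to obtain p = s/(1+s)."""
    return s / (1 + s)

# Intercept and slope as reported for the Maharashtra example.
intercept, slope = 0.0080, 0.9737

ns = intercept
p = p_from_slope(slope)
print(f"ns = {ns:.4f}, p = {p:.4f}")
```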
The true ci values may be computed in either of two ways. First, from (1a),
(6a) tci = ci/(1-p)
Second, adding (1a) and (1b), dividing by Wi, replacing tnsi by ns, and rearranging terms,
(6b) tci = ci + nsi - ns
If the fit is perfect, these two formulas will give the same result. In practice, both may be computed and their ratio examined to give an indication of how well the method is working.
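A small sketch of this check, under stated assumptions: the function name is hypothetical, the inputs are invented, and the ratio is taken as the second estimate over the first. A point lying exactly on the line gives a ratio of 1; a point off the line does not.

```python
def corrected_childless(c_i, ns_i, ns, p):
    """Return both estimates of the true proportion childless and their ratio.

    est_a follows formula (6a), est_b follows formula (6b); the ratio is
    est_b / est_a, which equals 1 when the point lies exactly on the line.
    """
    est_a = c_i / (1 - p)
    est_b = c_i + ns_i - ns
    return est_a, est_b, est_b / est_a

# Hypothetical parameters: ns = 0.01 and p = 0.5, so the slope p/(1-p) is 1.
ns, p = 0.01, 0.5

on_line = corrected_childless(c_i=0.15, ns_i=0.16, ns=ns, p=p)    # 0.16 = 0.01 + 0.15
off_line = corrected_childless(c_i=0.15, ns_i=0.20, ns=ns, p=p)   # 0.04 above the line

print("on the line :", on_line)
print("off the line:", off_line)
```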
The interpretation of p is straightforward: it is the estimated proportion of zero parity women who are incorrectly recorded as having failed to report number of children ever born. It is used to correct the observed proportions of childless women using formula (6a).
The interpretation of ns requires a distinction between "real" and "spurious" not stated cases. "Real" not stated cases are women for whom the enumerator attempted to obtain an answer to the children ever born question but was unable to do so. "Spurious" not stated cases are women who had no children, and who might have been accurately identified as such, but for whom improper behavior of the enumerator resulted in the "children ever born not stated" classification. The estimate ns is the proportion of "real" not stated cases only.
The estimated true proportions of women childless vary in quality according to age group. Estimates for women aged 20-50 are often rather good. Estimates for older women may be poor because true proportions not stated tend to increase with age beyond age 50. Estimates for women under 20, and especially for women under age 15, may be very poor. While the explanation for this is unclear, it evidently has to do with the very high proportions of zero parity women at these young ages. The residuals of the fitted line and the ratios of the two estimates of the true proportion of zero parity women in each age group provide a guide for interpretation.
The data points (ci, nsi) should be plotted and scrutinized before fitting a line, and the fitted line should in general aim to minimize residuals for the points for reproductive age women. Once a line is fitted, residuals should be plotted and examined. Resist any temptation to omit these steps: omitting them risks producing silly and potentially embarrassing results. When working by computer, robust fitting methods should be used.
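The text does not say which robust method to use. One common choice, used here only as an illustration and not as the method behind the results in this document, is the Theil-Sen line: the median of all pairwise slopes, with the intercept taken as the median of the resulting offsets. The sketch below compares it with ordinary least squares on hypothetical points, five of which lie exactly on a line while one is an outlier.

```python
from itertools import combinations
from statistics import median

def theil_sen(xs, ys):
    """Theil-Sen line: median of pairwise slopes, then median intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return intercept, slope

def least_squares(xs, ys):
    """Ordinary least squares line, for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical (c_i, ns_i) points: five on ns_i = 0.01 + 1.0 * c_i, one outlier.
c = [0.03, 0.05, 0.10, 0.15, 0.30, 0.45]
ns = [0.01 + x for x in c[:-1]] + [0.60]

ts_intercept, ts_slope = theil_sen(c, ns)
ls_intercept, ls_slope = least_squares(c, ns)
print(f"Theil-Sen    : intercept = {ts_intercept:.4f}, slope = {ts_slope:.4f}")
print(f"least squares: intercept = {ls_intercept:.4f}, slope = {ls_slope:.4f}")
```

In this constructed case the Theil-Sen line follows the five consistent points, while the least squares line is pulled toward the outlier. Restricting or weighting the fit toward the reproductive age groups, as recommended above, is a separate choice that this sketch does not make.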
Children ever born data for ever married women in the Indian state of Maharashtra as of the 1981 census are given on page 574 of Maharashtra, Census of India - 1981, SR. 12, Maharashtra, Part - VI - A & B, Fertility tables. The proportions of women with CEB not stated and of childless women are as follows.
Table 1: Input Data

age       nsi      ci
<15     0.5368  0.4446
15-19   0.3281  0.3541
20-24   0.1462  0.1419
25-29   0.0611  0.0548
30-34   0.0403  0.0334
35-39   0.0348  0.0272
40-44   0.0353  0.0280
45-49   0.0379  0.0275
50+     0.0474  0.0322
The following plot shows the scatter of nsi against ci together with a fitted line. The intercept and slope of the fitted line are 0.0080 and 0.9737.
The following table shows fitted nsi values and residuals.

Table 2: Fitted Values and Residuals

age       fit      res
<15     0.4409   0.0959
15-19   0.3528  -0.0247
20-24   0.1462   0.0000
25-29   0.0614  -0.0003
30-34   0.0405  -0.0002
35-39   0.0345   0.0003
40-44   0.0353   0.0000
45-49   0.0348   0.0031
50+     0.0394   0.0080
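The fitted values and residuals can be recomputed from the input proportions and the reported intercept and slope. The sketch below takes the line as given rather than refitting it, and should agree with Table 2 up to rounding of the reported coefficients.

```python
# Observed proportions from Table 1 (Maharashtra, 1981 census).
ages = ["<15", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50+"]
ns_i = [0.5368, 0.3281, 0.1462, 0.0611, 0.0403, 0.0348, 0.0353, 0.0379, 0.0474]
c_i = [0.4446, 0.3541, 0.1419, 0.0548, 0.0334, 0.0272, 0.0280, 0.0275, 0.0322]

# Intercept and slope of the fitted line as reported in the text.
intercept, slope = 0.0080, 0.9737

fitted = [intercept + slope * c for c in c_i]
residuals = [ns - fit for ns, fit in zip(ns_i, fitted)]

print(f"{'age':<6}{'fit':>8}{'res':>9}")
for age, fit, res in zip(ages, fitted, residuals):
    print(f"{age:<6}{fit:8.4f}{res:9.4f}")
```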
The residuals are plotted against age groups identified by number, 1 being the youngest age group <15, in the following figure. The fit is extremely good in the reproductive ages, with slight deterioration to both sides of this range, but extremely poor for the <15 age group.
The intercept and slope of the fitted line give ns = 0.0080 and p = 0.4933. The ns value indicates a "true" level of children ever born not stated of 0.8 percent. The p value indicates that nearly half of all childless women were recorded as children ever born not stated. The two possible estimates of corrected proportions of zero parity women are shown in the following table.
Table 3: Corrected Proportions of Zero Parity Women

age      meth1   meth2  ratio
<15     0.8774  0.9734   1.11
15-19   0.6988  0.6742   0.96
20-24   0.2800  0.2801   1.00
25-29   0.1082  0.1079   1.00
30-34   0.0659  0.0657   1.00
35-39   0.0537  0.0540   1.01
40-44   0.0553  0.0553   1.00
45-49   0.0543  0.0574   1.06
50+     0.0635  0.0716   1.13

Here meth1 is computed from formula (6a), meth2 from formula (6b), and ratio is meth2 divided by meth1.
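The two corrected estimates can be recomputed from the input proportions, taking ns and the slope as reported. This is a sketch rather than the original calculation: because the reported slope is rounded, the first estimate may differ from Table 3 in the last decimal place.

```python
# Observed proportions from Table 1 (Maharashtra, 1981 census).
ages = ["<15", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50+"]
ns_i = [0.5368, 0.3281, 0.1462, 0.0611, 0.0403, 0.0348, 0.0353, 0.0379, 0.0474]
c_i = [0.4446, 0.3541, 0.1419, 0.0548, 0.0334, 0.0272, 0.0280, 0.0275, 0.0322]

# Reported intercept (ns) and slope; p follows from p = s/(1+s).
ns, slope = 0.0080, 0.9737
p = slope / (1 + slope)

meth1 = [c / (1 - p) for c in c_i]                    # formula (6a)
meth2 = [c + n - ns for c, n in zip(c_i, ns_i)]       # formula (6b)
ratio = [m2 / m1 for m1, m2 in zip(meth1, meth2)]

print(f"{'age':<6}{'meth1':>8}{'meth2':>8}{'ratio':>7}")
for age, m1, m2, r in zip(ages, meth1, meth2, ratio):
    print(f"{age:<6}{m1:8.4f}{m2:8.4f}{r:7.2f}")
```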
The observed values of childlessness for older women shown in Table 1 above are around 2.8 percent. The value p = 0.4933 implies a multiplication of the observed values by 1/(1-0.4933) = 1.9736, i.e., just under a doubling of the observed values.
The bottom line is that the level of childlessness in Maharashtra as of the 1981 census is about double the level indicated by the unadjusted census data, roughly 5.5 percent as compared with 2.8 percent, a very large difference indeed.
This example, which was chosen merely because the Maharashtra data were conveniently at hand, shows how very important the Vincent-El Badry adjustment may be.
State of Selangor, Malaysia, 1970 Census
M. V. Del Tufo, A Report on the 1947 Census of Population, The Government Printer, Federation of Malaya, Kuala Lumpur. Contains a useful discussion of the problem (not the method) by a census taker. See pages 65-70.
Paul Vincent, L'utilisation des statistiques des familles, Population 1, January-March, 1946. Pages 143-148 seem to give the essentials of the method described here, though my French is not particularly good. My copy of this paper bears a note "Cf. Henry 1953, page 40," which I cannot at present track down.
M. R. El Badry, Failure of enumerators to make entries of zero: Errors in recording childless cases in population censuses, Journal of the American Statistical Association 56(296), 1961, pages 909-924. This is more widely cited, in the English speaking world, at any rate.
United Nations, Manual X, Indirect Techniques for Demographic Estimation, Population Studies No. 81, Department of International Economic and Social Affairs, New York, 1983. Contains an exposition in Annex II, pages 230-235.
I find the expositions in the preceding two sources unsatisfactory. El Badry's exposition does not identify the two assumptions of the method with sufficient clarity and his restriction to ages below 40 keeps us from learning much that the data for older women have to tell us. Manual X states that equation (5) is "plausible," which it isn't. Without a derivation on explicit assumptions it is neither plausible nor implausible, and no derivation is given.
Alberto Palloni, Adjusting data on children-ever-born for nonresponse, Social Biology, Vol. 28, No. 3-4, 1981, pages 308-314. This paper would appear to contain relevant material, but despite several readings I have not been able to make sense of it.
You may download the files of which this document is comprised and give copies to others provided that you provide the complete set of files and do not alter the contents of any file. Print copies may be made for personal use. All other rights reserved.