The DEMOGRAPHY-STATISTICS-INFORMATION TECHNOLOGY Letter
#6 Jun 2013
INTERPOLATION is a way of filling in a series of given values with "in between" values.
Point interpolation fills in points between a series of given points. Given numbers of life table survivors at ages 1, 5 and 10, for example, values for ages 2 through 4 and 6 through 9 may be required.
Group interpolation breaks down grouped data on a continuous variable into smaller constituent groups. Given a population age distribution in five year age groups, numbers for single years of age may be required.
Demographers have traditionally used polynomial interpolation methods, but these have several disadvantages.
Shryock and Siegel present an example in their old but classic and still useful compendium (see the RESOURCES section below). They observe that “In census or survey counts of population by 5-year age groups there may be a large drop from one age group to the next, followed by a much smaller drop”. The text then displays the following uncaptioned plot.
Figure 1. Shryock and Siegel example of polynomial interpolation breakdown
The interpolation breaks down because nothing in the data suggests the fall-rise-fall pattern in the interpolated values. The pattern is an artifact of the interpolation procedure.
I have developed a new approach to interpolation based on the idea of "minimum roughness". It has several advantages over traditional methods. In particular, it gives satisfactory results for observed data for which polynomial methods break down, as shown in Figure 2.
Figure 2. Minimum roughness interpolation applied to the Shryock and Siegel example
The superiority of the interpolated points in Figure 2 is immediately evident.
The same problem may occur with point interpolation. The solid dots in Figure 3 below show the mid-year population aged 0-4 in Sri Lanka at 5-year intervals from 1965 through 1990, as estimated by the United Nations Population Division. The hollow circles show the result of fifth-degree polynomial interpolation.
Figure 3. Polynomial point interpolation
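A fifth-degree polynomial is exactly what it takes to pass through six given points, so the kind of polynomial point interpolation shown in Figure 3 can be sketched with Lagrange's formula. This is a minimal illustration only: the function name `lagrange_interpolate` is hypothetical, and the given values are made-up placeholders, not the UN Sri Lanka estimates.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the unique polynomial of degree len(xs) - 1 passing
    through the points (xs[i], ys[i]), using Lagrange's formula."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Hypothetical given points at 5-year intervals (placeholders, NOT UN estimates).
years = [1965, 1970, 1975, 1980, 1985, 1990]
given = [1800.0, 1850.0, 1870.0, 1860.0, 1900.0, 1950.0]

# The fifth-degree polynomial evaluated at every year from 1965 through 1990.
annual = {year: lagrange_interpolate(years, given, year) for year in range(1965, 1991)}

for year in range(1965, 1976):
    print(year, round(annual[year], 1))
```

The polynomial reproduces the given points exactly, but nothing constrains how it swings between them, which is where artifacts like those in Figures 1 and 3 come from.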
Figure 4 below shows minimum roughness interpolation applied to the same given points. The difference between Figures 3 and 4 is not as dramatic as between Figures 1 and 2, but we will probably prefer the minimum roughness interpolated values for the more sensible pattern of interpolated points between 1965 and 1975.
Figure 4. Minimum roughness point interpolation
How does minimum roughness interpolation work? We don't need any special mathematical prowess, only a bit of experience using formulas in spreadsheets and the Excel SOLVER Add-In or an equivalent facility.
We begin with the idea that the interpolated points together with the given points should form as smooth a curve as the given points allow. To get smoothness, we define a measure of “roughness” and then choose interpolated values to minimize this measure of roughness.
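Given a series of points at one-year intervals of age or time, calculate the following for each point except the first and the last: the difference between the next point and this point, minus the difference between this point and the previous point (the second difference), squared.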
Now sum these squared differences. The result is a measure of roughness.
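A minimal Python sketch of this roughness measure, assuming, as the description implies, that the quantity calculated at each interior point is the squared second difference. The function name `roughness` is hypothetical.

```python
def roughness(values):
    """Sum, over every point except the first and last, of the squared
    second difference: (next - this) - (this - previous)."""
    total = 0
    for i in range(1, len(values) - 1):
        second_difference = (values[i + 1] - values[i]) - (values[i] - values[i - 1])
        total += second_difference ** 2
    return total


print(roughness([1, 2, 3, 4, 5]))  # → 0 (a straight line has no roughness)
print(roughness([0, 0, 1, 0, 0]))  # → 6 (second differences 1, -2, 1)
```

Any straight line scores zero, so the measure penalizes only changes in slope, which is exactly the kind of fall-rise-fall wiggle seen in the polynomial examples.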
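To find interpolated values, lay out the series at one-year intervals in a spreadsheet, enter initial values in the cells to be interpolated, calculate the roughness from the whole series, and use SOLVER to minimize the roughness by changing the interpolated values, subject to any constraints.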
For point interpolation, the initial values between two given points may be set equal to the value of the first of those two points. For group interpolation they may be set to the number in the given group divided by the number of constituent groups to be interpolated.
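The two starting rules can be sketched in Python as follows. The function names are hypothetical, and `step` and `width` stand for the number of single-year positions per interval or per group (5 in the letter's examples).

```python
def initial_point_values(given, step):
    """Start values for point interpolation. `given` holds values at equal
    intervals of `step`; each gap is filled with the value at its left end."""
    values = []
    for left in given[:-1]:
        values.extend([left] * step)  # the given point plus step - 1 fillers
    values.append(given[-1])
    return values


def initial_group_values(group_totals, width):
    """Start values for group interpolation: spread each group total evenly
    across its `width` constituent single-year groups."""
    values = []
    for total in group_totals:
        values.extend([total / width] * width)
    return values


print(initial_point_values([100, 90, 85], step=5))
print(initial_group_values([50, 30], width=5))
```

For group interpolation these starting values already reproduce each group total, so the optimizer begins from a point that satisfies the group constraints.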
For group interpolation, constraints should be used to ensure that the sum of the interpolated values for each group equals the given number in the group. Constraints may also be used for point interpolation to ensure, for example, monotonicity of interpolated values where this is appropriate.
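The spreadsheet recipe above relies on SOLVER's iterative search. Because the roughness measure is quadratic and the group-total and given-point constraints are linear equalities, the same minimum can instead be found by solving one linear system, the Lagrange (KKT) conditions. The Python sketch below takes that route as a swapped-in alternative to SOLVER, assuming roughness is the sum of squared second differences over single-year values. All function names are hypothetical, and inequality constraints such as monotonicity are not handled.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [rhs[r]] for r, row in enumerate(matrix)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: constraints do not fix a unique answer")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def min_roughness(n, constraints):
    """Minimize the sum of squared second differences of n values subject to
    linear equality constraints, each given as (coefficients, target).
    Solves the KKT system: 2 D^T D y + A^T lam = 0 and A y = b."""
    m = len(constraints)
    kkt = [[0.0] * (n + m) for _ in range(n + m)]
    rhs = [0.0] * (n + m)
    # Top-left block 2 D^T D, where row i of D is (1, -2, 1) at i-1, i, i+1.
    for i in range(1, n - 1):
        stencil = {i - 1: 1.0, i: -2.0, i + 1: 1.0}
        for p, dp in stencil.items():
            for q, dq in stencil.items():
                kkt[p][q] += 2.0 * dp * dq
    # Constraint rows A, their transpose A^T, and the targets b.
    for k, (coeffs, target) in enumerate(constraints):
        for p, c in enumerate(coeffs):
            kkt[n + k][p] = c
            kkt[p][n + k] = c
        rhs[n + k] = target
    return solve_linear(kkt, rhs)[:n]  # drop the Lagrange multipliers


def point_interpolate(given, step):
    """Single-year values with given[k] fixed at position k * step."""
    n = (len(given) - 1) * step + 1
    constraints = []
    for k, value in enumerate(given):
        coeffs = [0.0] * n
        coeffs[k * step] = 1.0
        constraints.append((coeffs, value))
    return min_roughness(n, constraints)


def group_interpolate(group_totals, width):
    """Split each group total into `width` single-year values summing to it."""
    n = len(group_totals) * width
    constraints = []
    for k, total in enumerate(group_totals):
        coeffs = [1.0 if k * width <= i < (k + 1) * width else 0.0 for i in range(n)]
        constraints.append((coeffs, total))
    return min_roughness(n, constraints)


singles = group_interpolate([120.0, 80.0, 70.0, 40.0], width=5)
print([round(v, 2) for v in singles])
print([round(v, 2) for v in point_interpolate([100.0, 90.0, 85.0], step=5)])
```

A direct solve needs no starting values and cannot fail to converge, but it requires at least two given points (or two groups); with only one, a straight-line component of the answer is left undetermined and the system is singular. Monotonicity constraints would still call for SOLVER or a quadratic-programming solver.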
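Minimum roughness interpolation has several advantages: it gives sensible results for data where polynomial methods break down, it needs nothing more than spreadsheet formulas and an optimizer such as SOLVER, and constraints such as group totals or monotonicity are easy to impose.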
Why wasn't minimum roughness interpolation invented long ago? One obvious answer is that calculating the interpolated values in Figure 4 requires numerical minimization of a function of 20 variables (four interpolated years in each of the five intervals between the six given points). This took less than one second on my modest laptop computer, but 40 years ago, when Shryock and Siegel was first published, very few people had access to this kind of computing power.
Minimum roughness methods do have the disadvantage that the numerical minimization algorithm may fail to converge. Often this is easily fixed by tinkering with the roughness measure, usually nothing more than multiplying the differences by a factor such as 10 or 100 before squaring. But it is possible to hit a dead end. Another limitation is that problems involving larger numbers of parameters may choke SOLVER, at least for now.
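Rescaling of this kind changes only how large the numbers handled by the optimizer are, not the answer. Assuming roughness is the sum of squared second differences of the values y_1, ..., y_n, multiplying every difference by a constant c before squaring multiplies the whole measure by c², so under the same constraints the minimizing values are unchanged:

```latex
R_c(y) = \sum_{i=2}^{n-1} \bigl[c\,(y_{i+1} - 2y_i + y_{i-1})\bigr]^2
       = c^2 \sum_{i=2}^{n-1} (y_{i+1} - 2y_i + y_{i-1})^2
       = c^2 R(y),
\qquad
\arg\min_{y} R_c(y) = \arg\min_{y} R(y) \quad (c \neq 0).
```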
A final caveat applies to all interpolation methods. We use them because the data don't provide as much information as we would like. We can use interpolation to supply what is missing, but the accuracy of the interpolated values is uncertain. There is no Taylor Remainder Theorem for demographic data!
RESOURCES
Shryock and Siegel, The Methods and Materials of Demography, Volume 2 (1973), is available on Google Books; search for the authors' names. The plot shown above appears in Chapter 22 on page 701.
Readers familiar with Excel and the SOLVER add-in (not enabled by default) may download and study minimum-roughness-group-interpolation.xls, which illustrates group interpolation, and minimum-roughness-point-interpolation.xls, which illustrates point interpolation.