Griffith Feeney's Demography Website

The DEMOGRAPHY-STATISTICS-INFORMATION TECHNOLOGY Letter

#6 Jun 2013

INTERPOLATION is a way of filling in a series of given values with "in between" values.

*Point interpolation* fills in points between a series of given points. Given numbers of life table survivors at ages 1, 5 and 10, for example, values may be required for ages 2, 3 and 4 and for ages 6, 7, 8 and 9.

*Group interpolation* breaks down grouped data on a continuous variable into smaller constituent groups. Given a population age distribution in five year age groups, numbers for single years of age may be required.

Demographers have traditionally used polynomial interpolation methods, but these have several disadvantages.

- There are many methods; there may be no good reason for choosing one over the others
- The formulas are complex and tedious; understanding is difficult
- The methods may require that the values be given at regular intervals
- Most importantly, the methods break down for some data patterns

Shryock and Siegel present an example in their old but classic and still useful compendium (see the RESOURCES section below). They observe that “In census or survey counts of population by 5-year age groups there may be a large drop from one age group to the next, followed by a much smaller drop”. The text then displays the following uncaptioned plot.

Figure 1

Shryock and Siegel example of polynomial interpolation breakdown

The interpolation breaks down because nothing in the data suggests the fall-rise-fall pattern in the interpolated values. The pattern is an artifact of the interpolation procedure.

I have developed a new approach to interpolation based on the idea of "minimum roughness". It has several advantages over traditional methods. In particular, it gives satisfactory results for observed data for which polynomial methods break down, as shown in Figure 2.

Figure 2

Minimum roughness interpolation applied to Shryock and Siegel example

The superiority of the interpolated points in Figure 2 is immediately evident.

The same problem may occur with point interpolation. The solid dots in Figure 3 below show mid-year population of persons aged 0-4 at 5 year intervals for 1965 through 1990 for Sri Lanka, as estimated by the United Nations Population Division. The hollow circles show the result of fifth degree polynomial interpolation.

Figure 3

Polynomial point interpolation

Figure 4 below shows minimum roughness interpolation applied to the same given points. The difference between Figures 3 and 4 is not as dramatic as between Figures 1 and 2, but we will probably prefer the minimum roughness interpolated values for the more sensible pattern of interpolated points between 1965 and 1975.

Figure 4

Minimum roughness point interpolation

How does minimum roughness interpolation work? We don't need any special mathematical prowess, only a bit of experience using formulas in spreadsheets and the Excel SOLVER Add-In or an equivalent facility.

We begin with the idea that the interpolated points together with the given points should form as smooth a curve as the given points allow. To get smoothness, we define a measure of “roughness” and then choose interpolated values to minimize this measure of roughness.

Given a series of points at one year intervals of age or time, calculate the following for each point except the first and the last.

- *Step 1* The average of the y-values for the preceding and following points.
- *Step 2* The difference between this average and the y-value of the point.
- *Step 3* The square of this difference.

Now sum these squared differences. The result is a measure of roughness.
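As a minimal sketch, Steps 1 to 3 and the final sum can be written as a short Python function. The function name `roughness` and the sample values are illustrative assumptions; they do not come from the spreadsheets that accompany this letter.

```python
def roughness(y):
    """Sum of squared differences between each interior value
    and the average of its two neighbours."""
    total = 0.0
    for i in range(1, len(y) - 1):                 # skip first and last points
        neighbour_mean = (y[i - 1] + y[i + 1]) / 2  # Step 1
        diff = neighbour_mean - y[i]                # Step 2
        total += diff ** 2                          # Step 3, then sum
    return total


# Made-up values: a straight line has no roughness, a kinked series does.
print(roughness([10, 20, 30, 40]))  # 0.0
print(roughness([10, 20, 20, 40]))  # 125.0
```

A series whose points lie on a straight line scores zero, and any departure from a straight line raises the score.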

To find interpolated values, follow these steps.

- *Step A* Create a spreadsheet with cells for the observed and interpolated points.
- *Step B* Enter the observed values in the observed value cells and sensible initial values in the interpolated value cells.
- *Step C* Put a formula that calculates the roughness measure described above in a cell labeled “Measure of Roughness”.
- *Step D* Use SOLVER to find interpolated values (“By Changing Cells” in SOLVER) that minimize this measure of roughness (“Set Target Cell” in SOLVER).

For point interpolation, the initial values between two given points may be set equal to the value of the first point. For group interpolation they may be set to the number in the given group divided by the number of constituent groups to be interpolated.
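For readers who prefer code to a spreadsheet, point interpolation can be sketched in Python as follows. This is not SOLVER's algorithm. It swaps in a simple coordinate-by-coordinate minimization, which works here because the roughness measure is quadratic in each interpolated value. The names `gap`, `interpolate_points` and `free`, the number of sweeps, and the data are all assumptions made for illustration.

```python
def gap(y, i):
    """Average of the neighbours of point i, minus the value at point i."""
    return (y[i - 1] + y[i + 1]) / 2 - y[i]


def roughness(y):
    return sum(gap(y, i) ** 2 for i in range(1, len(y) - 1))


def interpolate_points(y, free, sweeps=5000):
    """Return a copy of y in which the positions listed in `free` have been
    adjusted to minimize roughness; all other positions are left as given."""
    y = [float(v) for v in y]
    n = len(y)
    for _ in range(sweeps):
        for k in free:
            # Each gap that involves y[k] has the form const + coef * y[k].
            # Setting y[k] to zero lets gap() return the constant part.
            y[k] = 0.0
            num = den = 0.0
            for i, coef in ((k - 1, 0.5), (k, -1.0), (k + 1, 0.5)):
                if 1 <= i <= n - 2:        # gaps exist only at interior points
                    num += coef * gap(y, i)
                    den += coef * coef
            y[k] = -num / den              # exact minimizer with others fixed
    return y


# Made-up example: values given at positions 0, 3 and 6. Each interpolated
# cell starts at the value of the preceding given point.
start = [1, 1, 1, 4, 4, 4, 7]
result = interpolate_points(start, free=[1, 2, 4, 5])
print([round(v, 3) for v in result])
print(roughness(start), roughness(result))
```

Because the made-up given values lie on a straight line, the interpolated values should come out very close to 2, 3, 5 and 6, with roughness close to zero. Nothing in the sketch requires the given points to be equally spaced; `free` may list any positions.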

For group interpolation, constraints should be used to ensure that the sum of the interpolated values for each group equals the given number in the group. Constraints may also be used for point interpolation to ensure, for example, monotonicity of interpolated values where this is appropriate.
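Group interpolation with its sum constraints can be sketched in Python in the same spirit. SOLVER handles the constraints directly. The sketch below instead only ever moves an amount from one cell to its neighbour within the same group, so every group total is preserved automatically. The names `gaps`, `interpolate_groups`, `totals` and `width`, the number of sweeps, and the data are illustrative assumptions, and the sketch assumes at least two groups.

```python
def gaps(y):
    """Average of neighbours minus the value, for every interior point."""
    return [(y[i - 1] + y[i + 1]) / 2 - y[i] for i in range(1, len(y) - 1)]


def roughness(y):
    return sum(g * g for g in gaps(y))


def interpolate_groups(totals, width, sweeps=5000):
    """Split each group total into `width` cells, minimizing roughness
    while keeping the sum of the cells in each group equal to its total."""
    # Initial values: group total divided by the number of constituent cells.
    y = [t / width for t in totals for _ in range(width)]
    n = len(y)
    for _ in range(sweeps):
        for start in range(0, n, width):
            for j in range(start, start + width - 1):
                # Direction that adds to cell j and subtracts from cell j + 1,
                # leaving the group total unchanged.
                v = [0.0] * n
                v[j], v[j + 1] = 1.0, -1.0
                gy, gv = gaps(y), gaps(v)
                # Roughness along this direction is quadratic in the step t,
                # so the best step has a closed form.
                t = -sum(a * b for a, b in zip(gy, gv)) / sum(b * b for b in gv)
                y[j] += t
                y[j + 1] -= t
    return y


# Made-up example: two groups of three cells with totals 6 and 15.
cells = interpolate_groups([6, 15], width=3)
print([round(c, 3) for c in cells])
print(sum(cells[:3]), sum(cells[3:]))
```

For these made-up totals the cells should come out very close to 1, 2, 3, 4, 5, 6, the only straight-line series whose group sums are 6 and 15.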

Minimum roughness interpolation has several advantages.

- It gives good results for data for which other methods break down, as illustrated by the above examples.
- It provides a single, unified approach to interpolation. The only formula required is the one defining the measure of roughness.
- It is not restricted to observed data given at equal intervals.
- It is easily implemented with a computer spreadsheet program (no tables of constants needed).

Why wasn't minimum roughness interpolation invented long ago? One obvious answer is that calculating the interpolated values in Figure 4 requires numerical minimization of a function of 20 variables. This took less than one second on my modest laptop computer, but 40 years ago, when Shryock and Siegel was first published, very few people had access to this kind of computing power.

Minimum roughness methods do have the disadvantage that the numerical minimization algorithm may fail to converge. Often this is easily fixed by tinkering with the roughness measure, usually nothing more than multiplying differences by a multiple of ten before squaring. But it is possible one will hit a dead end. Another limitation is that problems involving larger numbers of parameters may choke SOLVER, at least for now.
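The rescaling fix can be illustrated with a small Python sketch. Multiplying every difference by a constant before squaring multiplies the whole measure by the square of that constant, so the values that minimize it are unchanged. One plausible reason it helps, offered here as an assumption, is that a numerical minimizer's stopping tolerances are applied to a larger number. The `scale` parameter and the sample values are hypothetical.

```python
def roughness(y, scale=1.0):
    """Roughness measure with each difference multiplied by `scale`
    before squaring."""
    total = 0.0
    for i in range(1, len(y) - 1):
        diff = (y[i - 1] + y[i + 1]) / 2 - y[i]
        total += (scale * diff) ** 2
    return total


values = [10, 20, 20, 40]            # made-up series
print(roughness(values))             # 125.0
print(roughness(values, scale=10))   # 12500.0
```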

A final *caveat* applies to all interpolation methods. We use them because the data don't provide as much information as we would like. We can use interpolation to supply what is missing, but the accuracy of the interpolated values is uncertain. There is no Taylor Remainder Theorem for demographic data!

Shryock and Siegel Volume 2 (1973) is available on Google Books. Follow the link and search for the authors' names. The plot shown above appears in Chapter 22 on page 701.

Readers familiar with Excel and the SOLVER tool (not installed by default) may download and study minimum-roughness-group-interpolation.xls, which illustrates the implementation of group interpolation, and minimum-roughness-point-interpolation.xls.

Griffith Feeney Ph.D.

Scarsdale, New York, USA

EMAIL feeney@gfeeney.com

OFFICE +1 914 595 1916

MOBILE +1 914 721 3950

SKYPE gfeeney


DSITL is an occasional email letter on demography, statistics, and information technology by Griffith Feeney. License: Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License. To subscribe (unsubscribe), send email to feeney *at* gfeeney *dot* com with “subscribe” (“unsubscribe”) in the subject line.