Griffith Feeney's Demography Website

October 12, 2009

Component projection is one of the most fundamental of all demographic techniques. This tutorial shows how to do basic component projections in five year age groups. A spreadsheet implementation of the calculations is provided by the file population-projection-china-example.xls.

Projecting a female population requires four inputs, an initial age distribution, a set of life table survival values, a set of age-specific birth rates, and a sex ratio at birth. Here are inputs for a projection of the female population of China beginning with the age distribution from the 1990 census, reference date mid-1990.

If you are looking at this display in color, blue signifies input data values.

To project the initial age distribution five years into the future we need to (1) “survive” the initial age distribution forward five years, (2) calculate the number of births to the population during this five year period, and (3) “survive” the births forward to the end of the period.

To survive the 0-4 age group—that is, to calculate how many of these persons will still be alive after five years, at which time they will be in the 5-9 age group—we multiply the number of persons (55,389) by the “survival ratio” 472,185/477,191. This gives 54,808 survivors. Display 2 shows a new column heading for the first projected distribution and the number of survivors in the 5-9 age group. The spreadsheet formula for the calculation in Cell F9 is =E9*C10/C9.

Display 2: Surviving the 0-4 Age Group

If we are doing this in a computer spreadsheet, surviving the older age groups can be accomplished in a single step by dragging this formula down to the 95-99 age group. This gives the results shown in Display 3.

Display 3: Result of Surviving the Initial Population

We are assuming here that no one survives past age 100. Though not strictly true, this is a reasonable approximation here. If it were not, we would take the oldest age group to be sufficiently high that it would be reasonable. While it is possible to work with open-ended age groups, it is simpler not to.

Next we calculate births to the population by (a) averaging the initial and projected numbers of women in the age groups 15-19, 20-24, ..., 45-49, (b) multiplying these averages by five to give person years lived by reproductive age women in each age group, (c) multiplying person years lived in each age group by the age-specific birth rate for this age group to give births to women in the age group during the period, and (d) summing over all reproductive age groups to give total births. The calculation looks like this.

Display 4: Calculating Births

The PYL column shows person years lived, the Births column the number of births to women in each age group, and, below, total births and female births. Female births are calculated by multiplying total births by the proportion of births that are female, calculated from the sex ratio at birth (SRB) as 100/(100+SRB).

To survive female births during the period to the end of the period—that is, to calculate how many of the female children born during the period will be alive at the end of the period, at which time they will be in the 0-4 age group—, we multiply the number of births (58,568) by 477,191/500,000. This gives 57,568 survivors in the 0-4 age group.

We now have projected numbers for all age groups, so this completes the description of one complete “cycle” of population projection. The result is shown in Display 5 below. Since we have numbers for all age groups, we add a formula for total population in Cell F26.

We usually want to do additional cycles, however, and this is greatly facilitated (a) encapsulating the entire calculation of the projected number of persons 0-4 into a single formula in Cell F6 and (b) anchoring all formula references to columns ‘C’ and ‘D’. These are complications, and we dislike complication, but the payoff here is so high that the complications are worthwhile, as we will soon see.

The encapsulation can be done directly, but it is less clumsy to use the AVERAGE and SUMPRODUCT functions. Instead of averaging the initial and projected population and then multiplying by the age-specific birth rates, we calculate the sum of the products of the initial age distribution times the age-specific birth rates and average this with the sum of products of the projected age distribution times the age-specific birth rates. The formula, including the sex ratio at birth term to obtain female births and the survival term to get surviving children, but before anchoring, is shown in Display 6A.

To “anchor” the cell references to the survival values in Column C and the age-specific birth rates in Column D we need to put a ‘$’ sign before each occurrence of the letters ‘C’ and ‘D’ in the formulas. The formula for surviving the initial 0-4 age group becomes `=E6*$C7/$C6`. We edit the formula in Cell F7 accordingly and drag it down again to anchor the references in the formulas for older age groups. The formula in Display 6A becomes the formula shown in Display 6B.

Display 5: The First Projected Age Distribution

Display 6: Spreadsheet Formula for Calculating Births

(A) Unanchored

=AVERAGE(SUMPRODUCT(D9:D15,E9:E15),SUMPRODUCT(D9:D15,F9:F15))*

5*(100/(100+D28))*(C6/500000)

5*(100/(100+D28))*(C6/500000)

(B) Anchored

=AVERAGE(SUMPRODUCT($D9:$D15,E9:E15),SUMPRODUCT($D9:$D15,F9:F15))

*5*(100/(100+$D28))*($C6/500000)

*5*(100/(100+$D28))*($C6/500000)

The reward for all this nuisance with the formulas is that we can now produce a second projected distribution simply by selecting the entire column containing the first projected distribution, Cells F2:F26, and dragging it to the right one column. Better yet, we can produce third and higher order projected distributions in a single operation by dragging to the right by more than one column. Projecting forward 100 years is no more work than projecting forward 10 years.

Display 7 shows projected age distributions for 2000 and 2005, effortlessly produced from spreadsheet shown in Display 5. Since that was so easy, why don’t we have a look at the results? The first step—heeding the Tukian injunction “Never fail to plot and look!”—is to plot the initial age distribution, as in Display 8.

Display 7: Projections to 2000 and 2005

The population is young, with an overall downward trend in numbers of persons in successively older age groups, but there are notable fluctuations in numbers below age 40—fewer people in the 30-34 age group than in the 35-39 age group, declining numbers as we move from the 20-24 age group to the 10-14 age group, and rising numbers moving from there to the 0-4 age group. These fluctuations reflect the remarkable changes in fertility in China over the past 40 years.

Display 8: Plot of the Initial Age Distribution:

China Females, 1990 Census

Now let’s add the first projected distribution to the plot, as in Display 9. What do we see? How does the projected distribution compare with the initial distribution? Can we tell by looking at the plots whether total population increased or declined? Based on this comparison, what do we expect the projection for 2005 to look like?

Display 9: Initial and First Projected Age Distribution:

China Females, 1990 Census

The principal focus of Part 1 of this tutorial has been on how to carry out the projection. But why do we want to calculate projections? “To forecast future population,” is one answer, but far from the most important one. The most important reasons are, first, to gain an understanding of population dynamics that can be gained in no other way, and second, to have at our disposal a powerful tool for assessing the quality of demographic data.

One thing we’ve learned, incidentally, is that we don’t need computer packages to do population projections. We might want to use them in national statistical offices for “production” purposes, but for learning about projection, computer spreadsheets are by far the best tool.