For any other sampling method, the weight variable adjusts for the differing probabilities that cases have of being selected into the sample. In other words, cases do not all have an equal probability of selection, so not every case has a weight of 1. Applying the weight variable makes it possible to generalize to the population from which the sample was drawn.
Because the weight variables supplied by Statistics Canada are scaled to population totals, the weighted N is very large. This can cause problems for analysts who want to perform traditional inferential statistical tests: with a large N (anything over a few hundred), even trivial differences tend to produce significant test results, by the very nature of inferential statistics. One way to compensate for the scale of the Statistics Canada weight variables is to re-base the weight variable to the sample size.
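To see why a population-scale N swamps a significance test, consider what happens to Pearson's chi-square statistic when every cell count is multiplied by a constant. The sketch below is in Python rather than SPSS. The 2x2 counts and the scale factor of 1,000 are invented purely for illustration and are not taken from any PUMF.

```python
def chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2x2 table (N = 200); these counts are not from any PUMF.
sample_table = [[52, 48], [48, 52]]

# Hypothetical inflation factor standing in for a population-scale weight.
scale = 1000
inflated_table = [[cell * scale for cell in row] for row in sample_table]

CRITICAL_05_DF1 = 3.841  # chi-square critical value, alpha = .05, df = 1

small = chi_square(sample_table)
large = chi_square(inflated_table)

print(f"sample-scale chi-square:     {small:.2f}")
print(f"population-scale chi-square: {large:.2f}")
print("significant at .05?", small > CRITICAL_05_DF1, large > CRITICAL_05_DF1)
```

The proportions are identical in both tables. Only N changed, and the statistic grew by the same factor of 1,000, which is enough to turn a non-significant result into a significant one.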
The following SPSS code re-bases the weight variable for the Alberta
sub-sample of the 1991 individual PUMF.
compute wt=weightp*(75506/2516864).
weight by wt.
The re-basing factor is the result of dividing 75,506 (the sample N for
Albertans in the PUMF) by 2,516,864 (the population estimate for Alberta
obtained with the weight variable provided with the PUMF). With the new
weight (wt) applied, the weighted N is 75,506, which is still HUGE by
inferential statistics standards.

An SPSS job was run to determine the two numbers used in the re-basing. This must be done before the new weight variable can be created. Below is the SPSS setup to do this. See if you can find the two numbers in the output.
> get file='/afs/ualberta.ca/dept/business/data/census91/census91ind.syst
> keep=provp sexp marstlp totincp weightp.
> select if provp eq 48.
> frequencies variables=sexp.
SEXP      Sex
                                             Valid      Cum
Value Label     Value  Frequency  Percent  Percent  Percent
Female              1      37617     49.8     49.8     49.8
Male                2      37889     50.2     50.2    100.0
                         -------  -------  -------
                Total      75506    100.0    100.0
Valid cases   75506      Missing cases   0
> weight by weightp.
> frequencies variables=sexp.
SEXP      Sex
                                             Valid      Cum
Value Label     Value  Frequency  Percent  Percent  Percent
Female              1    1253899     49.8     49.8     49.8
Male                2    1262965     50.2     50.2    100.0
                         -------  -------  -------
                Total    2516864    100.0    100.0
Valid cases   2516864      Missing cases   0
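The two totals in the output above (75,506 unweighted and 2,516,864 weighted) are the numbers that go into the re-basing factor. As a rough arithmetic check, here is a short sketch in Python rather than SPSS. The variable names are illustrative, and it works from the printed frequencies rather than from the case-level data.

```python
# Totals read from the two FREQUENCIES runs above.
sample_n = 75506        # unweighted N for Alberta
population_n = 2516864  # N after "weight by weightp"

rebase_factor = sample_n / population_n

# Weighted frequencies by sex from the second run.
weighted = {"Female": 1253899, "Male": 1262965}

# Scaling the weighted counts by the factor mirrors, at the level of
# totals, what "compute wt=weightp*(75506/2516864)" does case by case.
rebased = {label: freq * rebase_factor for label, freq in weighted.items()}

for label, freq in rebased.items():
    print(f"{label:<7}{freq:10.1f}")
print(f"{'Total':<7}{sum(rebased.values()):10.1f}")
```

The re-based total comes back to the sample N of 75,506, as intended.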
DVCURMS2   RESPONDENT'S CURRENT LEGAL MARITAL STATUS

The total number of cases in the raw data file is 13,495. In this file, exactly 6,759 cases have the value 1 (i.e., married) for the variable DVCURMS2, 1,500 are widowed, 528 are married but separated, and so on. As raw frequencies, however, these counts do not adjust for the sampling method and cannot be used to generalize to the Canadian population.

                             [ unweighted ]     [ weighted 1 ]  [ weighted 2 ]
Value Label           Value    Freq.      %       Freq.      %    Freq.      %
MARRIED                   1     6759   50.1    11277210   54.9     7414   54.9
WIDOWED                   2     1500   11.1     1124533    5.5      739    5.5
MARRIED BUT SEPARATE      3      528    3.9      598178    2.9      393    2.9
DIVORCED                  4      975    7.2     1281271    6.2      842    6.2
SINGLE                    5     3622   26.8     6124539   29.8     4027   29.8
NOT STATED                9      111     .8      119830     .6       79     .6
                             -------  -----    --------  -----    -----  -----
                      Total    13495  100.0    20525561  100.0    13495  100.0
The middle figures, under WEIGHTED 1, are based on applying the weight variable contained in the GSS 91 file, namely FWGHT. Notice two things about applying the weight variable. First, the total N becomes 20,525,561, the estimated number of Canadians in 1991 who were 18 years of age or older. Second, the percentages differ between the unweighted and weighted distributions. This is due to the adjustment for the sampling methodology. When sampling, STC oversampled widowed and divorced people to ensure that enough respondents in these categories were captured in the study. The weight variable corrects for this bias. Thus, 54.9% of Canadians in 1991 were married, not 50.1% as the unweighted frequency distribution suggests.
Finally, the re-based weight variable returns the overall N to 13,495 (i.e., the size of the sample). Notice, however, that the percentages are the same under STC's weight variable (FWGHT) and the re-based weight variable (WT). In other words, the weighted sample using WT corrects for the sampling method but also allows working with an N equal to the original sample size. The advantage of working with the smaller N is that inferential statistical tests, which some researchers prefer to use, have little meaning with the population N produced by the STC weight variable.
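The WEIGHTED 2 column can be reproduced by hand from the WEIGHTED 1 column. The following is a minimal sketch in Python rather than SPSS. The variable names are illustrative, and it assumes WT was created the same way as in the Alberta example, by multiplying FWGHT by the sample N divided by the weighted N.

```python
SAMPLE_N = 13495  # cases in the raw GSS file

# WEIGHTED 1 frequencies (FWGHT applied), copied from the table above.
weighted_1 = {
    "MARRIED": 11277210,
    "WIDOWED": 1124533,
    "MARRIED BUT SEPARATE": 598178,
    "DIVORCED": 1281271,
    "SINGLE": 6124539,
    "NOT STATED": 119830,
}

population_n = sum(weighted_1.values())
rebase_factor = SAMPLE_N / population_n

# Re-based counts (WEIGHTED 2) and the percentages they share with WEIGHTED 1.
rebased_counts = {k: v * rebase_factor for k, v in weighted_1.items()}
percents = {k: 100 * v / population_n for k, v in weighted_1.items()}

for label in weighted_1:
    print(f"{label:<22}{rebased_counts[label]:8.0f}{percents[label]:7.1f}")
print(f"{'Total':<22}{sum(rebased_counts.values()):8.0f}")
```

Because every category is multiplied by the same constant, the percentages cannot change. The rounded category counts printed in the table need not add up exactly to 13,495, since the total is the sum of the unrounded values.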
The bottom line: doing statistics without one of the two weight variables produces biased results that prevent you from generalizing to the full population. In other words, you need to apply a weight variable when doing statistical analysis with the 1991 GSS. Now, which weight variable? For correcting the sampling methodology, it doesn't really matter whether you use FWGHT or WT, since either does the job. The choice is more one of ease in working with statistical tests.