A researcher wants to determine what the average
income level is among adults in a town. The researcher randomly surveys 10
individuals, and comes up with the following results:
sex | income |
men | 30,000 |
men | 35,000 |
men | 45,000 |
men | 50,000 |
women | 15,000 |
women | 21,000 |
women | 30,000 |
women | 30,000 |
women | 39,000 |
women | 45,000 |
Average | 34,000 |
The
researcher is worried that the average income from the sample may be too low.
She knows that there are equal numbers of men and women in the town, but the
sample contains more women than men. If women earn less than men do, then the
higher proportion of women in the sample will result in an average that is not
representative of the population average. It will be too low.
She
examines her data, and discovers that the average income for men is $40,000,
and the average income for women is $30,000. The over-representation of women
in the sample will bias the sample average: it will result in an average that
is lower than the population average. The greater the disproportion of men and
women in the sample, the greater the bias. For example, if by fluke the sample
had consisted entirely of men, we might conclude that the average income of all
individuals is $40,000. If the sample had consisted entirely of women, we might
conclude that the average income of all individuals is $30,000.
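The group averages quoted above can be checked directly from the survey table. The following is a minimal sketch in Python; the names `sample` and `group_means` are illustrative choices, not part of the original text.

```python
from statistics import mean

# Survey records from the table above: (sex, income in dollars).
sample = [
    ("men", 30_000), ("men", 35_000), ("men", 45_000), ("men", 50_000),
    ("women", 15_000), ("women", 21_000), ("women", 30_000),
    ("women", 30_000), ("women", 39_000), ("women", 45_000),
]

def group_means(records):
    """Return the mean income for each value of the grouping variable."""
    groups = {}
    for sex, income in records:
        groups.setdefault(sex, []).append(income)
    return {sex: mean(incomes) for sex, incomes in groups.items()}

by_sex = group_means(sample)                     # mean income of men and of women
overall = mean(income for _, income in sample)   # unweighted sample mean

print(by_sex)
print(overall)
```

Because six of the ten records are women, the unweighted sample mean sits closer to the women's average than to the men's.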
To
correct for this bias we create a weighted average or mean. We give more weight
to the average income of men, since they are under-represented in the sample,
and we give less weight to the average income of women, since they are
over-represented in our sample. Our best estimate of the population average is
to weight the average incomes of men and of women according to the proportion
of men and women in the population. Since half the population are men and half
are women, the average income of men and the average income of women should be
weighted equally.
Weighted mean = Mean_men X p_men + Mean_women X p_women,
where p_men stands for the proportion of men in the population, and
p_women stands for the proportion of women in the population.
In the case of the town, the
formula for the weighted mean would be:
Weighted mean = $40,000 X 0.5 + $30,000 X 0.5 = $20,000 + $15,000 = $35,000
This is slightly higher than
the sample mean of $34,000.
In
this case the variable sex had only two values: men and women. Sometimes we
need to weight values by a variable that has more than two values. We may
discover that a survey overrepresents individuals living in a city, and
underrepresents those living in towns or in rural areas. This variable,
location, contains 3 values. Other variables may contain more. In this case we
would weight the mean on the variable location, but we would have to account
for all 3 values of the variable.
The
general formula for calculating a mean weighted on one variable is:
Weighted mean = Σ(mean for category X population proportion for category)
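As a rough illustration, the general formula can be written as a small Python function that works for any number of categories. The function name `weighted_mean` and the dictionary layout are assumptions made for this sketch; the figures are the men's and women's averages from the town example.

```python
import math

def weighted_mean(category_means, population_shares):
    """Sum of (mean for category X population proportion for category)."""
    if category_means.keys() != population_shares.keys():
        raise ValueError("each category needs both a mean and a population share")
    if not math.isclose(sum(population_shares.values()), 1.0):
        raise ValueError("population shares must sum to 1")
    return sum(category_means[c] * population_shares[c] for c in category_means)

# Town example: sample means by sex, weighted by population proportions.
income_by_sex = {"men": 40_000, "women": 30_000}
share_by_sex = {"men": 0.5, "women": 0.5}

print(weighted_mean(income_by_sex, share_by_sex))  # 35000.0
```

A variable with three values, such as location, would simply use dictionaries with three entries each; the check that the shares sum to 1 guards against leaving a category out.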
A
sample may differ from the population in more than one characteristic. In such
a case, we may want to estimate population averages by weighting the data in
order to correct for possible biases due to differences between people in each
category.
A
survey may ask individuals what their income was during the past year. We want
to estimate the average income for the entire population. After the survey has
been completed we discover that there are major differences between sample and
population values for 2 variables: sex and location of residence. The variable
sex has 2 categories: men and women, while the variable location has 3 values:
city, town and rural. This gives us 6 different combinations of sex and
location:
Men - City |
Men - Town |
Men - Rural |
Women - City |
Women - Town |
Women - Rural |
In
order to calculate a weighted mean, we must know the mean income for
individuals in each category, and the population
percentage of individuals in each category.
Mean income | City | Town | Rural |
Men | 50,000 | 40,000 | 25,000 |
Women | 40,000 | 40,000 | 15,000 |
Population % | City | Town | Rural |
Men | 35% | 12% | 7% |
Women | 35% | 8% | 3% |
The
following formula gives us the weighted average, or mean:
Weighted mean = Σ(mean for category X % in category) = $41,700
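To see where the $41,700 comes from, each of the six cells can be multiplied out and summed. This is a sketch in Python using the mean incomes and population percentages given above; the variable names are illustrative.

```python
# Mean income (dollars) for each sex-by-location category.
mean_income = {
    ("men", "city"): 50_000, ("men", "town"): 40_000, ("men", "rural"): 25_000,
    ("women", "city"): 40_000, ("women", "town"): 40_000, ("women", "rural"): 15_000,
}

# Population percentage in each category (whole-number percents).
population_pct = {
    ("men", "city"): 35, ("men", "town"): 12, ("men", "rural"): 7,
    ("women", "city"): 35, ("women", "town"): 8, ("women", "rural"): 3,
}

# The six categories must account for the whole population.
if sum(population_pct.values()) != 100:
    raise ValueError("population percentages must sum to 100")

# Contribution of each category: mean for category X % in category.
contributions = {
    cell: mean_income[cell] * population_pct[cell] / 100 for cell in mean_income
}
weighted = sum(contributions.values())

for cell, value in contributions.items():
    print(cell, value)
print("Weighted mean:", weighted)
```

The six contributions are 17,500, 4,800, 1,750, 14,000, 3,200 and 450, which add up to 41,700.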
Below are the results of a survey that asked men and women whether they were in favour of or against a new law:
sex | attitude |
men | for |
men | for |
men | for |
men | against |
women | for |
women | for |
women | for |
women | against |
women | against |
women | against |
This sample consists of 60% women and 40% men. Of the women sampled, 50% support the law, while 75% of the men support it. In total, 60% of the sample are in support. This may be lower than the population value, since women are over-represented in the sample, and they tend to be less favourable to the new law than men. The greater the difference between the sample and population proportions of men and women, the greater the size of the error when estimating the total proportion that support the law. However, if men and women have identical views towards the new law, then there will be no error.
If we know the true population proportions of men and women, we can correct for sampling biases by weighting. Let's take the above example. We know that the population consists of 50% women. To determine the proportion of individuals who would be for the new law if the sample had consisted of equal numbers of men and women, we can use the following formula:
% for = (% of women for the law X % of women in the population) + (% of men for the law X % of men in the population)
If we use the above formula to calculate the corrected or weighted proportion of individuals for the law, we get the following:
% for = (50% X 50%) + (75% X 50%) = 25% + 37.5% = 62.5%
To calculate the proportion against the law, the same formula is used:
% against = (% of women against X % of women in the population) + (% of men against X % of men in the population)
The % against is then (50% X 50%) + (25% X 50%) = 25% + 12.5% = 37.5%.
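The two calculations can be reproduced in a few lines of Python. This sketch starts from the percentages quoted in the text (50% of women and 75% of men in favour); the variable names are illustrative.

```python
# Percentage of each sex in favour of the law, from the sample.
pct_for = {"women": 50, "men": 75}

# Known population proportions of women and men.
population_share = {"women": 0.5, "men": 0.5}

# Weighted % for: each group's % for, times its population proportion.
weighted_for = sum(pct_for[s] * population_share[s] for s in pct_for)

# Weighted % against: same formula applied to the % against (100 - % for).
weighted_against = sum((100 - pct_for[s]) * population_share[s] for s in pct_for)

print(weighted_for)      # 62.5
print(weighted_against)  # 37.5
```

The two weighted percentages add up to 100%, as they should when every respondent is either for or against.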
In this case the weighting variable, sex, had only two categories: men and women. In the case where the weighting variable has more than two categories, the weighting formula is as follows:
% in favour = (% of category A in favour X % of A in the population) + ... + (% of category N in favour X % of category N in the population)
This formula can be written more generally as:
Weighted proportion = Σ(proportion for category X % of category in population)
In the previous example we have calculated the % for and the % against. There could be more than two values for the variable we are interested in. The variable that measures attitude toward the new law could have had 5 values: strongly agree, agree, neutral, disagree and strongly disagree. We use the same formula to calculate the % who strongly agree, the % who agree, and so on.
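One way to apply the same formula to every response value at once is sketched below in Python. The function `weighted_distribution` is a hypothetical helper written for this illustration; it is shown on the for/against survey data from the table above, but it makes no assumption about how many response values there are, so a five-value scale would be handled the same way.

```python
from collections import Counter

def weighted_distribution(records, population_shares):
    """Weighted proportion of each response value.

    records: (category, response) pairs from the sample.
    population_shares: known population proportion of each category.
    """
    records = list(records)
    group_sizes = Counter(category for category, _ in records)
    cell_counts = Counter(records)
    missing = [c for c in population_shares if group_sizes[c] == 0]
    if missing:
        raise ValueError(f"no sample records for categories: {missing}")
    responses = sorted({response for _, response in records})
    return {
        response: sum(
            # proportion of the category giving this response X population share
            cell_counts[(category, response)] / group_sizes[category] * share
            for category, share in population_shares.items()
        )
        for response in responses
    }

# Survey data: 3 men for, 1 man against, 3 women for, 3 women against.
survey = (
    [("men", "for")] * 3 + [("men", "against")]
    + [("women", "for")] * 3 + [("women", "against")] * 3
)

result = weighted_distribution(survey, {"men": 0.5, "women": 0.5})
print(result)
```

With equal population shares for men and women, this gives 0.625 for and 0.375 against, matching the 62.5% and 37.5% calculated earlier.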