Quantitative data

 

A researcher wants to estimate the average income of adults in a town. She randomly surveys 10 individuals and obtains the following results:

Sex      Income ($)
Men      30,000
Men      35,000
Men      45,000
Men      50,000
Women    15,000
Women    21,000
Women    30,000
Women    30,000
Women    39,000
Women    45,000
Average  34,000

 

The researcher is worried that the sample average may be too low. She knows that the town has equal numbers of men and women, but the sample contains more women (6) than men (4). If women earn less than men, the higher proportion of women in the sample will pull the sample average below the population average.

 

She examines her data and finds that the average income is $40,000 for men and $30,000 for women. Because women are over-represented, the sample average is biased downward: it is lower than the population average. The greater the imbalance between men and women in the sample, the greater the bias. For example, if by fluke the sample had consisted entirely of men, we might conclude that the average income of all individuals is $40,000; if it had consisted entirely of women, we might conclude that it is $30,000.
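The group averages quoted above can be checked directly from the table. The sketch below is an illustration in Python; the variable names (`sample`, `group_means`) are our own choices, not part of the original.

```python
from statistics import mean

# The ten survey responses from the table: (sex, income in dollars)
sample = [
    ("men", 30_000), ("men", 35_000), ("men", 45_000), ("men", 50_000),
    ("women", 15_000), ("women", 21_000), ("women", 30_000),
    ("women", 30_000), ("women", 39_000), ("women", 45_000),
]

# Unweighted sample mean: every respondent counts equally
sample_mean = mean(income for _, income in sample)

# Mean income within each sex
group_means = {
    sex: mean(income for s, income in sample if s == sex)
    for sex in ("men", "women")
}

print(sample_mean)  # 34000
print(group_means)  # {'men': 40000, 'women': 30000}
```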

 

Weighting on one variable

To correct for this bias we create a weighted average or mean. We give more weight to the average income of men, since they are under-represented in the sample, and we give less weight to the average income of women, since they are over-represented in our sample. Our best estimate of the population average is to weight the average incomes of men and of women according to the proportion of men and women in the population. Since half the population are men and half are women, the average income of men and the average income of women should be weighted equally.

 

$$\text{Weighted mean} = \text{Mean}_{\text{men}} \times p_{\text{men}} + \text{Mean}_{\text{women}} \times p_{\text{women}}$$

where $p_{\text{men}}$ stands for the proportion of men in the population and $p_{\text{women}}$ stands for the proportion of women in the population.

 

In the case of the town, the formula for the weighted mean would be:

Weighted mean = $40,000 × 0.5 + $30,000 × 0.5 = $20,000 + $15,000 = $35,000

 

This is slightly higher than the sample mean of $34,000.
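The calculation can be checked in Python. The sketch below reaches the weighted mean in two ways: with the group-level formula, and (our own addition, not the method described in the text) by giving each respondent a weight equal to their group's population share divided by its sample share, so that each man counts 0.5 / 0.4 = 1.25 and each woman 0.5 / 0.6 ≈ 0.83. Both routes should give $35,000.

```python
from collections import Counter

# The ten survey responses: (sex, income in dollars)
sample = [
    ("men", 30_000), ("men", 35_000), ("men", 45_000), ("men", 50_000),
    ("women", 15_000), ("women", 21_000), ("women", 30_000),
    ("women", 30_000), ("women", 39_000), ("women", 45_000),
]
population_share = {"men": 0.5, "women": 0.5}  # equal numbers in the town
group_mean = {"men": 40_000, "women": 30_000}  # sample means by sex

# 1) Group-level formula: Mean_men x p_men + Mean_women x p_women
formula_mean = sum(group_mean[sex] * p for sex, p in population_share.items())

# 2) Respondent-level weights: population share / sample share of the group
n = len(sample)
sample_share = {sex: k / n for sex, k in Counter(s for s, _ in sample).items()}
weight = {sex: population_share[sex] / sample_share[sex] for sex in sample_share}
respondent_mean = (sum(weight[s] * income for s, income in sample)
                   / sum(weight[s] for s, _ in sample))

print(formula_mean)            # 35000.0
print(round(respondent_mean))  # 35000
```

The respondent-level version is handy when the weighted data will be reused for other statistics, since the weights are computed once per group and attached to every row.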

 

In this case the variable sex had only two values: men and women. Sometimes we need to weight by a variable that has more than two values. For example, we may discover that a survey over-represents individuals living in cities and under-represents those living in towns or rural areas. This variable, location, has 3 values; other variables may have more. We would then weight the mean by location, using the mean and the population proportion for each of its 3 values.

 

The general formula for calculating a mean weighted on one variable is:

$$\text{Weighted mean} = \sum \left(\text{mean for category} \times \text{population proportion for category}\right)$$
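The general formula works for any number of categories. The sketch below is one possible Python implementation; the function name `weighted_mean` and the checks it performs (every category has both a mean and a proportion, and the proportions sum to 1) are our own additions, not part of the original formula.

```python
import math

def weighted_mean(category_means: dict, population_proportions: dict) -> float:
    """Sum over categories of (mean for category x population proportion for category)."""
    if set(category_means) != set(population_proportions):
        raise ValueError("each category needs both a mean and a population proportion")
    total = math.fsum(population_proportions.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"population proportions sum to {total}, not 1")
    return math.fsum(
        category_means[c] * population_proportions[c] for c in category_means
    )

# Two categories (the town example); a variable such as location would
# simply supply three entries in each dictionary.
print(weighted_mean({"men": 40_000, "women": 30_000},
                    {"men": 0.5, "women": 0.5}))  # 35000.0
```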

 

Weighting on multiple variables

A sample may differ from the population in more than one characteristic. In that case, we may want to estimate population averages by weighting the data on several variables at once, to correct for biases arising from differences between the people in each category.

 

A survey may ask individuals what their income was during the past year. We want to estimate the average income for the entire population. After the survey has been completed we discover that there are major differences between sample and population values for 2 variables: sex and location of residence. The variable sex has 2 categories: men and women, while the variable location has 3 values: city, town and rural. This gives us 6 different combinations of sex and location:

Men – City

Men – Town

Men – Rural

Women – City

Women – Town

Women – Rural

 

In order to calculate a weighted mean, we must know the mean income for individuals in each category, and the population percentage of individuals in each category.

 

Mean income ($)

         City     Town     Rural
Men      50,000   40,000   25,000
Women    40,000   40,000   15,000

 

Percent of population

         City   Town   Rural
Men      35%    12%    7%
Women    35%    8%     3%

 

The following formula gives us the weighted average, or mean:

 

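$$
\begin{aligned}
\text{Weighted mean} &= \sum \left(\text{mean for category} \times \text{population proportion of category}\right) \\
&= 50{,}000 \times 0.35 + 40{,}000 \times 0.12 + 25{,}000 \times 0.07 \\
&\quad + 40{,}000 \times 0.35 + 40{,}000 \times 0.08 + 15{,}000 \times 0.03 \\
&= 17{,}500 + 4{,}800 + 1{,}750 + 14{,}000 + 3{,}200 + 450 \\
&= \$41{,}700
\end{aligned}
$$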
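The same calculation over the six sex-by-location cells can be sketched in Python. Each cell is keyed by a (sex, location) pair, and percentages are kept as whole numbers and divided by 100 at the end so that the arithmetic stays exact. The data structure is an illustrative choice, not something the text prescribes.

```python
from itertools import product

sexes = ("men", "women")
locations = ("city", "town", "rural")

# Mean income ($) for each (sex, location) cell
mean_income = {
    ("men", "city"): 50_000, ("men", "town"): 40_000, ("men", "rural"): 25_000,
    ("women", "city"): 40_000, ("women", "town"): 40_000, ("women", "rural"): 15_000,
}
# Percent of the population in each cell
percent_of_population = {
    ("men", "city"): 35, ("men", "town"): 12, ("men", "rural"): 7,
    ("women", "city"): 35, ("women", "town"): 8, ("women", "rural"): 3,
}

cells = list(product(sexes, locations))  # all 6 combinations
if sum(percent_of_population[c] for c in cells) != 100:
    raise ValueError("population percentages must add up to 100")

# Contribution of each cell: mean income x population share
contributions = {c: mean_income[c] * percent_of_population[c] / 100 for c in cells}
weighted_mean = sum(contributions.values())

for (sex, location), value in contributions.items():
    print(f"{sex:5} {location:5} {value:>9,.0f}")
print(f"weighted mean: {weighted_mean:,.0f}")  # weighted mean: 41,700
```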

 


Qualitative data

 

Below are the results of a survey that asked men and women whether they were in favour of or against a new law:

Sex      Attitude
Men      For
Men      For
Men      For
Men      Against
Women    For
Women    For
Women    For
Women    Against
Women    Against
Women    Against

 

This sample consists of 60% women and 40% men. Of the women sampled, 50% support the law, while 75% of the men support it. In total, 60% of the sample support it. This may be lower than the population value, since women are over-represented in the sample and are less favourable to the new law than men. The greater the difference between the sample and population proportions of men and women, the larger the error in the estimated proportion that supports the law. However, if men and women hold identical views on the new law, the imbalance causes no error.
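The percentages in this paragraph can be recomputed from the table. A minimal Python sketch; the helper `share` is hypothetical, introduced only for this illustration.

```python
# The ten survey responses: (sex, attitude)
sample = [
    ("men", "for"), ("men", "for"), ("men", "for"), ("men", "against"),
    ("women", "for"), ("women", "for"), ("women", "for"),
    ("women", "against"), ("women", "against"), ("women", "against"),
]

def share(rows, condition):
    """Fraction of rows that satisfy condition."""
    return sum(1 for row in rows if condition(row)) / len(rows)

men = [row for row in sample if row[0] == "men"]
women = [row for row in sample if row[0] == "women"]

pct_women = share(sample, lambda row: row[0] == "women")
pct_men_for = share(men, lambda row: row[1] == "for")
pct_women_for = share(women, lambda row: row[1] == "for")
pct_sample_for = share(sample, lambda row: row[1] == "for")

print(f"{pct_women:.0%} women, {pct_men_for:.0%} of men for, "
      f"{pct_women_for:.0%} of women for, {pct_sample_for:.0%} of sample for")
# prints: 60% women, 75% of men for, 50% of women for, 60% of sample for
```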

 

If we know the true population proportions of men and women, we can correct for sampling biases by weighting. Let’s take the above example. We know that the population consists of 50% women. To determine the proportion of individuals who would be for the new law if the sample had consisted of equal numbers of men and women, we can use the following formula:

 

$$\text{\% for} = (\text{\% of women for the law} \times \text{\% of women in the population}) + (\text{\% of men for the law} \times \text{\% of men in the population})$$

 

If we use the above formula to calculate the corrected, or weighted, proportion of individuals for the law, we get the following:

 

% for = (50% × 50%) + (75% × 50%) = 25% + 37.5% = 62.5%
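The weighted proportion for the law is a one-line calculation once the percentages are written as proportions. A short Python sketch, with the sample and population figures from the example hard-coded:

```python
# Proportion of each sex that favours the law (from the sample)
for_women, for_men = 0.50, 0.75
# Proportion of each sex in the population
pop_women, pop_men = 0.50, 0.50

weighted_for = for_women * pop_women + for_men * pop_men
unweighted_for = 0.60  # share of the whole sample in favour

print(f"weighted: {weighted_for:.1%}, unweighted: {unweighted_for:.1%}")
# weighted: 62.5%, unweighted: 60.0%
```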

 


To calculate the proportion against the law the same formula is used:

 

$$\text{\% against} = (\text{\% of women against} \times \text{\% of women in the population}) + (\text{\% of men against} \times \text{\% of men in the population})$$

 

The % against is then (50% × 50%) + (25% × 50%) = 25% + 12.5% = 37.5%.
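Because every respondent is either for or against, the two weighted proportions should add up to 100%. A small Python check, using the standard `fractions.Fraction` type so that the arithmetic is exact; the variable names are our own.

```python
from fractions import Fraction

pop_women = pop_men = Fraction(1, 2)                      # 50% of each sex in the population
for_women, for_men = Fraction(1, 2), Fraction(3, 4)       # 50% and 75% for
against_women, against_men = 1 - for_women, 1 - for_men   # 50% and 25% against

weighted_for = for_women * pop_women + for_men * pop_men
weighted_against = against_women * pop_women + against_men * pop_men

print(float(weighted_for), float(weighted_against))  # 0.625 0.375
print(weighted_for + weighted_against == 1)          # True
```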

 

 

In this case the weighting variable, sex, had only two categories: men and women. In the case where the weighting variable has more than two categories, the weighting formula is as follows:

 

$$\text{\% in favour} = (\text{\% of category A in favour} \times \text{\% of category A in the population}) + \dots + (\text{\% of category N in favour} \times \text{\% of category N in the population})$$

 

This formula can be written more generally as:

 

$$\text{Weighted proportion} = \sum \left(\text{proportion for category} \times \text{\% of category in population}\right)$$
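One way to see why the same weighting formula serves for both means and proportions: a proportion is simply the mean of a variable coded 1 (for) or 0 (against). The Python sketch below, our own illustration, recodes the responses that way, takes the mean within each category, and applies the general formula; it works unchanged for a weighting variable with any number of categories.

```python
from collections import defaultdict

# The ten survey responses: (category, attitude)
sample = [
    ("men", "for"), ("men", "for"), ("men", "for"), ("men", "against"),
    ("women", "for"), ("women", "for"), ("women", "for"),
    ("women", "against"), ("women", "against"), ("women", "against"),
]
population_share = {"men": 0.5, "women": 0.5}

# Recode attitude as an indicator: 1 if for the law, 0 otherwise
indicators = defaultdict(list)
for category, attitude in sample:
    indicators[category].append(1 if attitude == "for" else 0)

# Mean of the indicator in each category = proportion for in that category
proportion_for = {c: sum(v) / len(v) for c, v in indicators.items()}

# Weighted proportion = sum(proportion for category x share of category in population)
weighted_proportion = sum(proportion_for[c] * population_share[c] for c in population_share)

print(proportion_for)       # {'men': 0.75, 'women': 0.5}
print(weighted_proportion)  # 0.625
```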

 

 

In the previous example we calculated the % for and the % against. The variable we are interested in could have more than two values: attitude toward the new law could have been measured on 5 values (strongly agree, agree, neutral, disagree and strongly disagree). We use the same formula to calculate the % who strongly agree, the % who agree, and so on.
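Applying the formula once per response value gives the full weighted distribution. The function below is a sketch; its name and signature are hypothetical. It collects every answer value that appears in the data, so a five-point scale would need no code changes, and it assumes every category in the population appears in the sample. It is demonstrated on the for/against data, the only responses given in the text.

```python
from collections import Counter

def weighted_distribution(responses, population_share):
    """Weighted proportion of each answer value.

    responses: list of (category, answer) pairs
    population_share: population proportion of each category (sums to 1)
    """
    by_category = {}
    for category, answer in responses:
        by_category.setdefault(category, Counter())[answer] += 1

    values = sorted({answer for _, answer in responses})
    result = {}
    for value in values:
        # Sum over categories of (proportion giving this answer x population share)
        result[value] = sum(
            counts[value] / sum(counts.values()) * population_share[category]
            for category, counts in by_category.items()
        )
    return result

sample = [
    ("men", "for"), ("men", "for"), ("men", "for"), ("men", "against"),
    ("women", "for"), ("women", "for"), ("women", "for"),
    ("women", "against"), ("women", "against"), ("women", "against"),
]
print(weighted_distribution(sample, {"men": 0.5, "women": 0.5}))
# {'against': 0.375, 'for': 0.625}
```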