October 31, 2003
Full answers are in the t:\students\public\201\cp2af03.doc file.
1. (a) Obtain a frequency
distribution, statistics (mean, median, and standard deviation), and histogram,
with a normal distribution superimposed, for JOBHOURS (hours per week at
job). To obtain this, use Analyze – Descriptive Statistics – Frequencies for JOBHOURS. Before clicking OK, click on the appropriate
statistics and click on Charts-Histogram-with normal curve, then Continue.
(b) (i) Using the mean, standard deviation, and frequency distribution, calculate the percentage of cases that are within one standard deviation of the mean. What percentage of cases are within two standard deviations of the mean? How do these compare with the rules discussed in class? (ii) Explain how the median hours worked can be obtained from the frequency distribution. (iii) In words, identify similarities and differences between the histogram and the normal distribution.
Answer:
(i)
The mean is 20.12 and the standard deviation is 11.764. Rounding these to one decimal, the values
are 20.1 and 11.8. The interval within
one standard deviation of the mean is from 20.1 – 11.8 = 8.3 to 20.1 + 11.8 =
31.9 or (8.3, 31.9). Adding up the number
of cases from 9 through 31 produces 2 + 24 + 2 + 13 + …. + 24 = 275 cases. Of the total of 396 valid cases, 275 cases is
(275/396) x 100% = 69.4% of cases within one standard deviation of the
mean. The rough rule in the text is
that approximately two-thirds or 67% of cases are within one standard deviation
of the mean, and the results for this distribution are very close to this
figure.
Two standard deviations is 2 x 11.764 = 23.528 or 23.5. The interval for two standard deviations on either side of the mean is from 20.1 – 23.5 = -3.4 to 20.1 + 23.5 = 43.6. Since there cannot be fewer than 0 hours worked, this interval is from 0 to 43.6 hours or (0.0, 43.6). This interval contains all the cases up to the category of 40 hours (there are no cases at 41, 42, or 43 hours) and from the cumulative per cent column this is 97.2% of the cases. The rough rule of thumb is that 95% of the cases are within two standard deviations, and this distribution of hours worked has about two percentage points more within two standard deviations.
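The arithmetic in (i) can be checked with a short Python sketch. It uses only the figures quoted in the answer (mean 20.12, standard deviation 11.764, 275 of 396 valid cases) and follows the same rounding to one decimal. The function name `interval` and the `floor` argument are illustrative assumptions, not part of the SPSS printout.

```python
def interval(mean, half_width, floor=0.0):
    """Interval of mean +/- half_width, clipped below at `floor`
    because hours worked cannot be negative."""
    low = max(mean - half_width, floor)
    high = mean + half_width
    return (round(low, 1), round(high, 1))

mean = round(20.12, 1)   # 20.1, as rounded in the answer
sd = 11.764              # standard deviation from the printout

# Half-widths are rounded to one decimal first, as in the answer.
one_sd = interval(mean, round(sd, 1))        # uses 11.8
two_sd = interval(mean, round(2 * sd, 1))    # uses 23.5

# Share of the 396 valid cases falling within one standard deviation.
pct_one_sd = round(275 / 396 * 100, 1)

print(one_sd, two_sd, pct_one_sd)
```

The count of 275 cases still has to be read from the frequency distribution; the sketch only reproduces the interval endpoints and the percentage.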
(ii)
To obtain the median hours worked, find the number of hours such that fifty per cent of the cases are below it and the other fifty per cent are above it. Using the cumulative per cent column, look for where the values cross the fifty per cent point; this is at 20 hours. At 19 hours the cumulative per cent is 48.7% and, once all those working exactly 20 hours are included, it is 64.1%. The median is therefore 20 hours worked per week, the same as indicated in the summary statistics table of the printout.
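The lookup described in (ii) can be sketched in Python as follows. Only the two rows of the cumulative per cent column quoted in the answer are used; the full frequency table has more rows, and the function name `median_from_cumulative` is a hypothetical helper. As a simplification, the sketch ignores the special case where the cumulative per cent is exactly 50 at some value.

```python
def median_from_cumulative(rows):
    """rows: (value, cumulative_percent) pairs sorted by value.
    Returns the first value whose cumulative per cent reaches 50."""
    for value, cum_pct in rows:
        if cum_pct >= 50.0:
            return value
    raise ValueError("cumulative per cent never reaches 50")

# Partial table: only the rows quoted in the answer above.
rows = [(19, 48.7), (20, 64.1)]
median_hours = median_from_cumulative(rows)
print(median_hours)
```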
(iii)
The histogram follows the general shape of a normal distribution fairly
closely. But there are more cases
around 20 hours worked than would be suggested by an exact normal distribution
of hours worked. Then there are fewer
cases around 30 hours worked than for the normal distribution (bar falls below
the normal curve here). Finally,
although difficult to see, it appears that there may be more cases at 50 or
more hours worked than for a normal distribution. This latter might have been expected – the normal distribution is
perfectly symmetrical around the center point, but the distribution of hours
worked is asymmetrical, since it stops at 0 hours worked on the left but can
extend to very large values of hours worked, well over 50 hours. In summary, the actual distribution is close to the normal distribution, although it is more peaked at the centre, falls off more rapidly than the normal to the right of centre, and then extends further to the right than the normal.
2. (a) Question 51 of SSAE contains information about the number of hours
of work spent at various activities.
Using the same procedures as in question 1, obtain the frequency
distributions, statistics (mean, median, and standard deviation), and
histograms with normal distribution for the three variables study hours, extracurricular
hours, and care of dependents hours.
(b) (i) From the statistics on the printout, calculate the coefficient
of relative variation (CRV) for each of the three distributions. Comparing the standard deviations and CRV,
write a short note comparing the variation in the three distributions. (ii) Which distribution appears most similar
to and which most different from the normal distribution? Explain in a sentence or two.
Answer:
(i)
The coefficients of relative variation are as follows:
For study hours, CRV = (11.849/16.61) x 100 = 71.3.
For extracurricular hours, CRV = (4.472/2.12) x 100 = 210.9.
For care of dependents hours, CRV = (16.993/5.65) x 100 = 300.8.
Measure | Study hours | Extracurricular hours | Care of dependents
s | 11.8 | 4.5 | 17.0
CRV | 71.3 | 210.9 | 300.8
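As a check on the CRV figures, the calculation (standard deviation divided by the mean, times 100) can be sketched in Python. The standard deviations and means are those quoted from the printout; the function name `crv` and the dictionary layout are illustrative assumptions.

```python
def crv(sd, mean):
    """Coefficient of relative variation: (s / mean) x 100,
    rounded to one decimal as in the answer."""
    return round(sd / mean * 100, 1)

# (standard deviation, mean) for each variable, from the printout.
stats = {
    "Study hours": (11.849, 16.61),
    "Extracurricular hours": (4.472, 2.12),
    "Care of dependents": (16.993, 5.65),
}

crvs = {name: crv(sd, mean) for name, (sd, mean) in stats.items()}
for name, value in crvs.items():
    print(f"{name}: CRV = {value}")
```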
In terms of number of hours, care of dependents is most variable (s=17.0 hours), with most respondents concentrated at 0 hours but several reporting quite large numbers of hours for care of dependents. Study hours has a middle level of variability (s=11.8 hours) and extracurricular hours is least variable (s=4.5 hours). But since the number of extracurricular hours averages only about 2 hours, in relative terms extracurricular hours is more than twice as varied (CRV = 210.9) as study hours (CRV = 71.3). Given the low mean number of hours caring for dependents (5.65 hours) and the high standard deviation, the CRV for care of dependents is the greatest of all. So hours caring for dependents is clearly the most variable, with the ranking of variability for the other two variables depending on whether absolute or relative variation is considered.
(ii) From the three diagrams,
study hours appears most like the normal curve. While the distribution peaks before the normal curve, the
distribution of study hours follows the general shape of the normal
distribution. Neither of the other two
variables is at all like the normal distribution – for each of these histograms
there is a distinctive peak at 0 hours, and then a trailing off to the
right. Each of these two distributions
is highly skewed to the right. It would
be difficult to pick one over the other as being less like the normal.
3. (a) Use Analyze-Compare Means-Means to
obtain means and standard deviations of the three variables of question 2 (on
the dependent list), with sex of the respondent (on the independent list).
(b) (i) Using the means of hours spent at various activities, describe the main similarities and differences between male and female use of time. (ii) For the variable hours spent caring for dependents, which of males or females are more varied?
(i) Males and females are similar in that, of the three activities, both spend by far the greatest mean number of hours studying: over 16 hours for both males and females, more than double that for any other activity. But, on average as measured by the mean, males spend less than one-half the time females do caring for dependents. In terms of extracurricular hours, the means are similar, although males spend a little more time, on average, at these activities than do females.
(ii)
For care of dependents, females have greater absolute variation in the number of hours spent, with a standard deviation of over 19 hours, almost double the standard deviation of 11.6 for males. But relative variation, as measured by the CRV, gives a different picture. Since the males have a lower mean time spent caring for dependents, the CRV for them is 396, considerably higher than the CRV of 301 for females.
4. (a) Use Analyze-Compare
Means-Means to obtain means and standard deviations of JOBHOURS and DEBT1
(debt at the start of the semester, question 49) by household income.
(b) From the means of hours
worked at jobs and student debt of (a), write a short note comparing the
patterns that students of different incomes appear to use to help finance
attendance at university.
From these data, an analyst cannot be sure how students finance their studies, but the data provide some indication of possible means of finance. First, work at jobs appears to be important at all income levels, with the mean being at least 18 hours of work per week. Perhaps surprisingly, it is those at the highest income levels who work the greatest number of weekly hours. For those from households with incomes of eighty thousand dollars or more, the mean weekly hours worked is 21.9 hours or more. In contrast, for those at lower income levels, mean hours worked is below 20 hours for all groups except those in the $20-40,000 category, and there the mean hours worked is still below the mean hours for those from the three highest income groups. (These data might be misleading because of the relatively small sample sizes in the upper income groups.) In terms of debt, the pattern is more consistent: those in the lowest income groups have the largest level of accumulated debt and, as one moves up the income categories, the mean debt is lower at each successive income level (with the exception of the 80-100 thousand dollar category, although here the pattern differs in a relatively small way).
In summary, it appears that students from upper income groups may rely more on their own earnings or on support from parents and other sources that do not lead to accumulated debt. In contrast, lower income groups appear to rely on earnings from jobs and on loans to finance their studies. A researcher should not over-interpret these data, but this is the direction the evidence appears to indicate.
Paul Gingrich
October 31, 2003