October 31, 2003

# Answers to Computer Problem 2

Full answers are in the t:\students\public\201\cp2af03.doc file.

1. (a) Obtain a frequency distribution, statistics (mean, median, and standard deviation), and histogram, with a normal distribution superimposed, for JOBHOURS (hours per week at job).  To obtain this, use Analyze – Descriptive Statistics  – Frequencies for JOBHOURS.  Before clicking OK, click on the appropriate statistics and click on Charts-Histogram-with normal curve, then Continue.

(b) (i) Using the mean, standard deviation, and frequency distribution, calculate the percentage of cases that are within one standard deviation of the mean.  What percentage of cases are within two standard deviations of the mean?  How do these compare with the rules discussed in class?  (ii) Explain how the median hours worked can be obtained from the frequency distribution.  (iii) In words, identify similarities and differences between the histogram and the normal distribution.

(i) The mean is 20.12 and the standard deviation is 11.764.  Rounding these to one decimal, the values are 20.1 and 11.8.   The interval within one standard deviation of the mean is from 20.1 – 11.8 = 8.3 to 20.1 + 11.8 = 31.9 or (8.3, 31.9).  Adding up the number of cases from 9 through 31 produces 2 + 24 + 2 + 13 + …. + 24 = 275 cases.  Of the total of 396 valid cases, 275 cases is (275/396) x 100% = 69.4% of cases within one standard deviation of the mean.  The rough rule in the text is that approximately two-thirds or 67% of cases are within one standard deviation of the mean, and the results for this distribution are very close to this figure.

Two standard deviations is 2 x 11.764 = 23.528 or 23.5.  The interval for two standard deviations on either side of the mean is from 20.1 – 23.5 = -3.4 to 20.1 + 23.5 = 43.6.  Since there cannot be fewer than 0 hours worked, this interval is from 0 to 43.6 hours or (0.0, 43.6).  This interval contains all the cases up to the category of 40 hours (there are no cases at 41, 42, or 43 hours) and, from the cumulative per cent column, this is 97.2% of the cases.  The rough rule of thumb is that 95% of the cases are within two standard deviations, and this distribution of hours worked has about two percentage points more than that within two standard deviations.
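The two interval calculations above can be checked with a short script. This is only a sketch: the function name `interval_around_mean` and its `floor` parameter are illustrative and not part of the SPSS output, and the inputs are the rounded values used in the text (mean 20.1, one standard deviation 11.8, two standard deviations 23.5), together with the count of 275 cases out of 396 reported above.

```python
def interval_around_mean(mean, half_width, floor=None):
    """Return (low, high) for mean +/- half_width, rounded to one decimal.

    floor is an optional lower limit; it is used here because
    hours worked cannot be negative.
    """
    low = mean - half_width
    high = mean + half_width
    if floor is not None:
        low = max(low, floor)
    return (round(low, 1), round(high, 1))


mean = 20.1                      # mean of 20.12, rounded as in the text
sd = 11.8                        # standard deviation of 11.764, rounded
two_sd = round(2 * 11.764, 1)    # 23.528, rounded to 23.5 as in the text

one_sd_interval = interval_around_mean(mean, sd)
two_sd_interval = interval_around_mean(mean, two_sd, floor=0.0)

# Percentage of the 396 valid cases that fall within one standard deviation
pct_within_one_sd = round(275 / 396 * 100, 1)

print(one_sd_interval)     # (8.3, 31.9)
print(two_sd_interval)     # (0.0, 43.6)
print(pct_within_one_sd)   # 69.4
```

The percentage within two standard deviations (97.2%) is read directly from the cumulative per cent column of the printout, so it is not recomputed here.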

(ii) To obtain the median hours worked, find the number of hours such that fifty per cent of the cases are below this value and the other fifty per cent are above it.  Using the cumulative per cent column, look for where the values cross the fifty per cent point; this is at 20 hours.  Up to and including 19 hours there are 48.7% of the cases, and once all those working exactly 20 hours are included the cumulative figure reaches 64.1%.  The median is therefore 20 hours worked per week, the same as indicated in the summary statistics table of the printout.
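The rule for reading the median off the cumulative per cent column can be written as a small function. This is a sketch only: `median_from_cumulative` is an illustrative name, and the list holds just the two rows of the frequency table quoted above (19 and 20 hours), not the full printout.

```python
def median_from_cumulative(rows):
    """Return the first value whose cumulative per cent reaches 50.

    rows is a list of (value, cumulative_percent) pairs sorted by value,
    as in the cumulative per cent column of a frequency table.
    """
    for value, cumulative_percent in rows:
        if cumulative_percent >= 50.0:
            return value
    raise ValueError("cumulative per cent never reaches 50")


# Only the two rows quoted in the text; the other rows are omitted.
rows = [(19, 48.7), (20, 64.1)]

median_hours = median_from_cumulative(rows)
print(median_hours)   # 20
```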

(iii) The histogram follows the general shape of a normal distribution fairly closely, but there are more cases around 20 hours worked than an exact normal distribution would suggest.  There are fewer cases around 30 hours worked than for the normal distribution (the bar falls below the normal curve here).  Finally, although it is difficult to see, there appear to be more cases at 50 or more hours worked than for a normal distribution.  This last feature might have been expected: the normal distribution is perfectly symmetrical around the centre point, but the distribution of hours worked is asymmetrical, since it stops at 0 hours worked on the left but can extend to very large values, well over 50 hours, on the right.  In summary, the actual distribution is close to the normal distribution, although it is more peaked at the centre, falls off more rapidly than the normal to the right of centre, and then extends farther to the right than the normal.

2. (a) Question 51 of SSAE contains information about the number of hours of work spent at various activities.  Using the same procedures as in question 1, obtain the frequency distributions, statistics (mean, median, and standard deviation), and histograms with normal distribution for the three variables study hours, extracurricular hours, and care of dependents hours.

(b) (i) From the statistics on the printout, calculate the coefficient of relative variation (CRV) for each of the three distributions.  Comparing the standard deviations and CRV, write a short note comparing the variation in the three distributions.  (ii) Which distribution appears most similar to and which most different from the normal distribution?  Explain in a sentence or two.

(i)  The coefficients of relative variation are as follows:

For study hours, CRV = (11.849/16.61) x 100 = 71.3.

For extracurricular hours, CRV = (4.472/2.12) x 100 = 210.9

For care of dependents hours, CRV = (16.993/5.65) x 100 = 300.8

| Measure | Study hours | Extracurricular hours | Care of dependents |
|---|---|---|---|
| s | 11.8 | 4.5 | 17.0 |
| CRV | 71.3 | 210.9 | 300.8 |
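The CRV values can be reproduced from the means and standard deviations quoted above, using CRV = (standard deviation / mean) x 100. The sketch below is illustrative only: the function name `crv` and the dictionary layout are assumptions, while the numbers are those given in the calculations above.

```python
def crv(sd, mean):
    """Coefficient of relative variation: standard deviation as a percentage of the mean."""
    return sd / mean * 100


# (mean, standard deviation) for each variable, as quoted from the printout
stats = {
    "Study hours": (16.61, 11.849),
    "Extracurricular hours": (2.12, 4.472),
    "Care of dependents": (5.65, 16.993),
}

crv_table = {name: round(crv(sd, mean), 1) for name, (mean, sd) in stats.items()}

# Rank from most to least variable, first by standard deviation, then by CRV
rank_by_sd = sorted(stats, key=lambda name: stats[name][1], reverse=True)
rank_by_crv = sorted(crv_table, key=crv_table.get, reverse=True)

print(crv_table)
print(rank_by_sd)
print(rank_by_crv)
```

Care of dependents ranks first on both measures, while study hours and extracurricular hours swap places depending on whether absolute or relative variation is used.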

In terms of number of hours, care of dependents is most variable (s=17.0 hours), with most respondents concentrated at 0 hours but several reporting quite large numbers of hours for care of dependents.  Study hours has a middle level of variability (s=11.8 hours) and extracurricular hours is least variable (s=4.5 hours).  But since the number of extracurricular hours averages only 2 hours, in relative terms extracurricular hours is more than twice as varied (CRV = 210.9) as study hours (CRV = 71.3).  Given the low mean number of hours caring for dependents (5.6 hours) and the high standard deviation, the CRV for care of dependents is greatest of all.  So clearly, hours caring for dependents is the most variable, with the ranking of variability for the other two variables depending on whether absolute or relative variation is considered.

(ii) From the three diagrams, study hours appears most like the normal curve.  While the distribution peaks before the normal curve, the distribution of study hours follows the general shape of the normal distribution.  Neither of the other two variables is at all like the normal distribution – for each of these histograms there is a distinctive peak at 0 hours, and then a trailing off to the right.  Each of these two distributions is highly skewed to the right.  It would be difficult to pick one over the other as being less like the normal.

3. (a)  Use Analyze-Compare Means-Means to obtain means and standard deviations of the three variables of question 2 (on the dependent list), with sex of the respondent (on the independent list).

(b) (i) Using the means of hours spent at various activities, describe the main similarities and differences between male and female use of time.  (ii) For the variable hours spent caring for dependents, which of males or females are more varied?

(i) Males and females are similar in that, of the three variables, both spend by far the greatest mean number of hours studying: over 16 hours for both males and females, more than double that for any other activity.  But, on average as measured by the mean, males spend less than one-half the time females do caring for dependents.  In terms of extracurricular hours, the means are similar, although males spend a little more time, on average, at these activities than do females.

(ii) For care of dependents, females have greater absolute variation in the number of hours spent, with a standard deviation of over 19 hours, well above the standard deviation of 11.6 for males.  But relative variation, as measured by the CRV, gives a different picture.  Since the males have a lower mean time spent caring for dependents, the CRV for them is 396, considerably higher than the CRV of 301 for females.

4. (a) Use Analyze-Compare Means-Means to obtain means and standard deviations of JOBHOURS and DEBT1 (debt at the start of the semester, question 49) by household income.

(b) From the means of hours worked at jobs and student debt of (a), write a short note comparing the patterns that students of different incomes appear to use to help finance attendance at university.

From these data, an analyst cannot be sure how students finance their studies, but the data provide some indication of possible means of finance.  First, work at jobs appears to be important at all income levels, with the mean being at least 18 hours of work per week.  Perhaps surprisingly, it is those at the highest income levels who work the greatest number of weekly hours.  For those from households with incomes of eighty thousand dollars or more, the mean weekly hours worked is 21.9 or more.  In contrast, for those at lower income levels, mean hours worked is below 20 hours for all groups except the \$20-40,000 category, and there the mean hours worked is still below the mean for the three highest income groups.  (These data might be misleading because of the relatively small sample sizes in the upper income groups.)  In terms of debt, the pattern is more consistent: those in the lowest income groups have the largest accumulated debt and, as one moves up the income categories, the mean debt is lower at each successive income level (with the exception of the 80-100 thousand dollar category, although there the pattern differs only slightly).

In summary, it appears that students from upper income groups may rely more on own earnings or support from parents and other sources that do not lead to accumulated debt.  In contrast, lower income groups appear to rely on earnings from jobs and loans to finance their studies.  A researcher should not over-interpret these data, but this is the direction the evidence appears to indicate.

Paul Gingrich

October 31, 2003