# Sociology 405/805

Problem Set 4

Due March 18, 2004

# Problems on correlation and regression

1.  Corporate social responsibility correlations.  The data in Table 1 come from Report on Business, March 2004, pp. 46-47, inserted in The Globe and Mail of February 27, 2004.

a.  Choose one pair of indicator variables (COMM to HR) for which you anticipate a high correlation between the two variables and one pair for which you anticipate a smaller correlation.  Obtain the Spearman rank correlation coefficient for each pair and test each for significance.  (See section 11.4.6, pp. 819-821 of chapter 11 of my text for the formulas.)  Show your calculations for one of the Spearman correlation coefficients and the associated t-test.  Using the ranks from one pair, obtain the Pearson correlation coefficient for the ranks and compare it with the Spearman correlation.

b.  Two commentators on corporate social responsibility argue in opposite directions: one claims that larger corporations can afford to be more socially responsible, while the other argues that smaller corporations need to be more socially responsible in order to establish a more positive image.  By examining the correlations between the overall score (TOT) and (i) employment (EMP) and (ii) revenues (REV), is there evidence for either of these claims?

c.  Write a short note summarizing the results of a. and b. and any other observations you have on these data.
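As a way of checking hand calculations for part a., the following is a minimal Python sketch of the Spearman computation. It assumes the usual textbook formulas, r_s = 1 - 6Σd²/(n(n² - 1)) and t = r_s √((n - 2)/(1 - r_s²)) with n - 2 degrees of freedom; the notation in section 11.4.6 may differ. The function names are illustrative only, and the ENVIRON and HR columns of Table 1 are used simply because neither contains tied scores.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_shortcut(x, y):
    """r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)); exact only when there are no ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# ENVIRON and HR columns of Table 1 (neither contains ties).
environ = [83, 82, 59, 67, 52, 45, 43, 32, 31]
hr = [69, 53, 45, 32, 24, 40, 34, 27, 41]

r_s = spearman_shortcut(environ, hr)
r_ranks = pearson(average_ranks(environ), average_ranks(hr))
print(f"Spearman (shortcut formula): {r_s:.3f}")
print(f"Pearson on the ranks:        {r_ranks:.3f}")
print(f"t statistic, {len(hr) - 2} df:         {t_statistic(r_s, len(hr)):.3f}")
```

Several columns of Table 1 (COMM, for example) contain tied scores. In that case the shortcut formula is only approximate, and the Pearson correlation of the average ranks is the safer comparison.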

## Table 1.  Scores (out of 100) on indicators of corporate social responsibility

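| COMPANY    |  REV |  EMP | TOT | COMM | GOVERN | CUSTOMER | EMPLOYEE | ENVIRON | HR |
|------------|-----:|-----:|----:|-----:|-------:|---------:|---------:|--------:|---:|
| H-P        |  1.8 |   -- |  75 |   69 |     71 |       63 |       77 |      83 | 69 |
| Siemens    |  3.1 |  6.6 |  66 |   56 |     76 |       47 |       55 |      82 | 53 |
| Nortel     | 10.6 | 37.0 |  61 |   78 |     71 |       50 |       61 |      59 | 45 |
| Xerox      |  1.3 |  4.0 |  61 |   88 |     50 |       67 |       58 |      67 | 32 |
| Celestica  |  8.3 | 38.0 |  52 |   56 |     62 |       50 |       53 |      52 | 24 |
| GE         |  3.4 |  7.0 |  48 |   52 |     59 |       34 |       49 |      45 | 40 |
| Bombardier | 23.8 | 80.0 |  44 |   56 |     54 |       50 |       37 |      43 | 34 |
| Rogers     |  2.0 |  4.1 |  42 |   39 |     52 |       52 |       48 |      32 | 27 |
| ATI        |  1.4 |  2.3 |  39 |   22 |     45 |       50 |       45 |      31 | 41 |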

Key:

Rev  corporate revenue in billions of dollars

Emp  employment in thousands

Tot  overall responsibility score

Comm  community and society

Govern  corporate governance

Customer  customers

Employee  employees

Environ  environment

HR  human rights

2.  Explaining wage differences.  The data in Table 2 are from a random sample of full-time, full-year Saskatchewan employees, surveyed by Statistics Canada in the Survey of Labour and Income Dynamics.  Survey data were obtained in 2001 and refer to respondents' income and labour force activity during the 2000 calendar year.  This question asks you to obtain regression equations with wages and salaries (WAGES) as the dependent variable and years of education (EDYRS) as the independent variable.

a.  Draw the scatter diagram with WAGES on the vertical axis and EDYRS on the horizontal axis.

b.  One argument presented in models of human capital is that increased education is associated with increased wages and salaries.  In order to test this contention and obtain an estimate of the effect of education on wages and salaries, calculate the regression equation relating WAGES, as the dependent variable, to EDYRS, as the independent variable.  Also calculate the standard error of estimate, the standard deviation of the slope of the line, and R-squared, and conduct a test for statistical significance of the line.  Show your calculations and describe your findings in words.

c. On the diagram of a., draw the line estimated in b. and the bands representing a distance of one standard error on each side of the line.  What proportion of individuals are within one standard error of the line?  Within two standard errors?

d.  A group of individuals completing doctorates, each having spent eight years in graduate school, following four-year undergraduate degrees, have salary offers ranging from \$45,000 to \$75,000, with an average of \$55,000.  Those at the lower end of the spectrum consider this range unfair, arguing they have just as many years of education as those offered higher salaries.  Using information from this sample, what might you say to them?

e.  Someone claims that the female respondent (id 16) who is paid \$60,000, but has only 13 years of education, is being paid more than she deserves.  Using information from this sample, how might you respond to this claim?

f.  From the regression equation, what is the expected increase in wages and salary for the female respondent with id 5 if she obtains a university degree?  Would you have any hesitation in recommending this course of action to her?

g.  Comment on any difficulties you observe with these data, including any observations you might have about possible violation of assumptions (see Lewis-Beck, p. 26).
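The quantities requested in parts b. and c. can be checked against hand calculations with a short Python sketch such as the one below. It assumes the standard least-squares formulas for a single independent variable (standard error of estimate with n - 2 degrees of freedom, and t = slope divided by its standard deviation); the function names `simple_regression` and `share_within` are illustrative and not part of any library.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, with summary statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    s_e = math.sqrt(sse / (n - 2))   # standard error of estimate
    s_b = s_e / math.sqrt(sxx)       # standard deviation of the slope
    return {
        "slope": slope, "intercept": intercept, "s_e": s_e, "s_b": s_b,
        "r2": 1 - sse / syy, "t": slope / s_b, "residuals": residuals,
    }

def share_within(residuals, band):
    """Proportion of observations whose residual is within +/- band of the line."""
    return sum(abs(e) <= band for e in residuals) / len(residuals)

# EDYRS and WAGES columns of Table 2, in order of respondent ID.
edyrs = [20.0, 14.0, 12.0, 12.5, 12.5, 12.0, 12.0, 16.5, 12.0, 10.0,
         12.0, 15.1, 20.0, 12.0, 17.5, 13.0, 12.5, 11.0, 15.0, 13.0]
wages = [55000, 55000, 23000, 39000, 5250, 27000, 15500, 60000, 11500, 22000,
         57500, 49000, 45000, 23000, 60000, 60000, 25000, 12000, 35000, 22000]

fit = simple_regression(edyrs, wages)
print(f"WAGES = {fit['intercept']:.0f} + {fit['slope']:.0f} * EDYRS")
print(f"standard error of estimate: {fit['s_e']:.0f}")
print(f"standard deviation of slope: {fit['s_b']:.0f}")
print(f"R-squared: {fit['r2']:.3f}, t ({len(wages) - 2} df): {fit['t']:.2f}")
print(f"within 1 s.e.: {share_within(fit['residuals'], fit['s_e']):.2f}")
print(f"within 2 s.e.: {share_within(fit['residuals'], 2 * fit['s_e']):.2f}")
```

The printed values are a check on arithmetic only; the written calculations and the interpretation asked for in the question still need to be done by hand.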

3. Use the data in Table 2 to obtain a regression equation with wages and salaries as the dependent variable and two or more explanatory variables.  Attempt to improve the fit of the regression equation over that of question 2.  Explain your results.
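For question 3, a regression with several explanatory variables can be sketched in plain Python by solving the normal equations (X'X)b = X'y. The sketch below is only one way to do this, and statistical software will give the same coefficients. The helper names `solve` and `fit_ols` are illustrative, and the choice of AGE as the second variable is an arbitrary example, not a recommendation. Adjusted R-squared is included because R-squared alone cannot fall when a variable is added.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(columns, y):
    """Least squares with an intercept; columns is a list of predictor lists.

    Returns (coefficients with the intercept first, R-squared, adjusted R-squared).
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])  # number of coefficients, including the intercept
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    my = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return coefs, r2, adj_r2

# EDYRS, AGE and WAGES columns of Table 2, in order of respondent ID.
edyrs = [20.0, 14.0, 12.0, 12.5, 12.5, 12.0, 12.0, 16.5, 12.0, 10.0,
         12.0, 15.1, 20.0, 12.0, 17.5, 13.0, 12.5, 11.0, 15.0, 13.0]
age = [40, 41, 41, 43, 41, 43, 37, 35, 46, 57,
       41, 51, 30, 36, 57, 45, 30, 54, 26, 37]
wages = [55000, 55000, 23000, 39000, 5250, 27000, 15500, 60000, 11500, 22000,
         57500, 49000, 45000, 23000, 60000, 60000, 25000, 12000, 35000, 22000]

_, r2_one, adj_one = fit_ols([edyrs], wages)
coefs, r2_two, adj_two = fit_ols([edyrs, age], wages)
print("EDYRS only:  R2 = %.3f, adjusted R2 = %.3f" % (r2_one, adj_one))
print("EDYRS + AGE: R2 = %.3f, adjusted R2 = %.3f" % (r2_two, adj_two))
print("coefficients (intercept, EDYRS, AGE):", [round(c, 1) for c in coefs])
```

Other columns can be swapped in by changing the list passed to `fit_ols`; check each column for unusual values before using it.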

## Table 2.  Data from a random sample of twenty Saskatchewan respondents, SLID 2000 (Person File)

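| ID | AGE | SEX | HOURS | EXP | WAGES | TINCOME | EDYRS |
|---:|----:|----:|------:|----:|------:|--------:|------:|
|  1 |  40 |   2 |  1955 |  22 | 55000 |   55000 |  20.0 |
|  2 |  41 |   1 |  2448 |  21 | 55000 |   58800 |  14.0 |
|  3 |  41 |   1 |  2086 |  22 | 23000 |   23300 |  12.0 |
|  4 |  43 |   1 |  1926 |  21 | 39000 |   42475 |  12.5 |
|  5 |  41 |   2 |  2439 |   2 |  5250 |    6650 |  12.5 |
|  6 |  43 |   2 |  2086 |  22 | 27000 |   27350 |  12.0 |
|  7 |  37 |   2 |  1981 |  16 | 15500 |   16800 |  12.0 |
|  8 |  35 |   1 |  1945 |  14 | 60000 |   60000 |  16.5 |
|  9 |  46 |   2 |  2086 |   7 | 11500 |   13250 |  12.0 |
| 10 |  57 |   2 |  1564 |  36 | 22000 |   24400 |  10.0 |
| 11 |  41 |   2 |  2086 |   8 | 57500 |   57500 |  12.0 |
| 12 |  51 |   2 |  2086 |  97 | 49000 |   50000 |  15.1 |
| 13 |  30 |   1 |  2086 |   5 | 45000 |   45000 |  20.0 |
| 14 |  36 |   2 |  1825 |  13 | 23000 |   23000 |  12.0 |
| 15 |  57 |   1 |  1955 |  35 | 60000 |   65500 |  17.5 |
| 16 |  45 |   2 |  1825 |  31 | 60000 |   60000 |  13.0 |
| 17 |  30 |   2 |  1929 |  11 | 25000 |   25025 |  12.5 |
| 18 |  54 |   1 |  1564 |  97 | 12000 |   12050 |  11.0 |
| 19 |  26 |   2 |  1955 |   5 | 35000 |   35650 |  15.0 |
| 20 |  37 |   2 |  2086 |  18 | 22000 |   22000 |  13.0 |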

Key:

ID  identification number of respondent

Age  age of respondent in years

Sex  sex of respondent: 1 = male, 2 = female

Hours  hours of paid work, 2000

Exp  years of experience in full-time, full-year equivalents, since respondent first worked full-time

Wages  wage or salary in dollars, 2000

Tincome  total income of respondent in dollars, before taxes, 2000

Edyrs  years of education of respondent, full-time equivalent