**Problem Set 5**

Use the data in the SPSS data file *t:\students\public\805\slidsk.sav* for this problem set. Some documentation on the data set is attached, and further documentation is in the same folder: a fuller description of the variables is available in the file *slid2000cbk_v2.pdf* (Adobe Acrobat format), and a description of methods is in the *Income Historical Review User Guide* (html file).

The data file was downloaded from IDLS (Internet Data Library System) through the Data Library (http://uregina.ca/datalibrary/) on the University of Regina Library web site. Once in the IDLS section, select *Canada – Labour, Employment and Income (465)* and scroll down until you find *Survey of Labour and Income Dynamics (SLID), 2000: Person file*. Further documentation about the file, including the questionnaire, can be obtained there. I understand that you can access these files only on campus, and only if you are logged in to the network with your own userid and password. Read the *PERMISSION TO ACCESS DATA FILES THROUGH U of R DATA LIBRARY SERVICES* before using the files from the data library.

The *slidsk.sav* data file is composed of all the Saskatchewan respondents to the *Survey of Labour and Income Dynamics, 2000*. It is the individual or person file, not the family or household file. According to the IDLS web site, the proper citation for the data set is:

Statistics Canada. Survey of Labour and Income Dynamics
(SLID), 2000: Person file [machine readable data file]. Ottawa, ON: Statistics
Canada. 7/16/2003.

I altered the original file in two ways. First, I renamed several of the variables to make the data a bit easier to work with, since the original file has very unusual variable names that are difficult to recognize or remember. In the list below, I give both the new and original names, in case you wish to check any of these against the files on the IDLS web site. Second, I recoded the variable *SEX* so that male is 0 and female is 1, the codes needed to use *SEX* as a dummy variable.
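
To see why the 0/1 coding is convenient, here is a minimal Python sketch (outside the SPSS workflow) that regresses a wage variable on a 0/1 dummy. The six values are invented for illustration and are not taken from *slidsk.sav*. With this coding, the intercept equals the mean for the group coded 0 and the slope equals the difference between the two group means.

```python
# Illustrative values only; not taken from slidsk.sav.
sex = [0, 0, 0, 1, 1, 1]  # 0 = male, 1 = female
wages = [30000.0, 42000.0, 36000.0, 28000.0, 33000.0, 35000.0]

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def group_mean(values, groups, g):
    """Mean of the values whose group code equals g."""
    picked = [v for v, k in zip(values, groups) if k == g]
    return sum(picked) / len(picked)

intercept, slope = simple_ols(sex, wages)
male_mean = group_mean(wages, sex, 0)
female_mean = group_mean(wages, sex, 1)

print("intercept:", intercept, "male mean:", male_mean)
print("slope:", slope, "female minus male mean:", female_mean - male_mean)
```

Once other independent variables are added, the coefficient on the dummy is the female-male difference holding those variables constant rather than the raw difference in means.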


**Problems**

1. Obtain regression equations with wages and salaries (*WAGES*) as the dependent variable and, as independent variables, years of schooling (*YRSCHL*), sex (*SEX*), annual hours of work (*HOURS*), and years of experience (*EXP*).

- Use the stepwise method and the enter method.
- Explain the results from the previous part, commenting on the fit of the equation, statistical significance, and the meaning of the regression coefficients.
- From the stepwise method or other considerations, do you detect any violation of assumptions with this equation?
- Rerun the equation with weights attached and comment on any changes in the fit of the equation or the coefficients. To do this, in the *SPSS Data Editor* use *Data-Weight Cases*, click on *Weight cases by*, use *WT* as the *Frequency Variable*, and click *OK*. There should be a notice *Weight On* at the bottom right of the data editor window. Then run the regression equation. Note that the reported statistical significance levels are now unreliable, since SPSS treats the weighted number of cases as the sample size, when the weights actually sum to an estimate of the population size. The estimated coefficients, however, may be more accurate estimates of the true coefficients. Briefly describe what you find.
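
The point about weights and significance can be illustrated with a small Python sketch. It is a one-predictor weighted least squares fit written by hand, not SPSS's own routine, and the data and weights are invented for illustration (the weights stand in for *WT*, assumed here to be a population weight). The sketch treats the weights as frequencies, so the sum of the weights plays the role of the sample size. Rescaling the weights to sum to the actual number of respondents leaves the slope unchanged but gives a much larger standard error.

```python
import math

def weighted_simple_ols(x, y, w):
    """Weighted least squares for y = a + b*x, treating w as frequency weights.

    Returns (intercept, slope, slope_se). The standard error uses sum(w) as
    the number of cases, which is how frequency weights are interpreted.
    """
    n = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / n
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / n
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum(wi * (yi - intercept - slope * xi) ** 2
              for wi, xi, yi in zip(w, x, y))
    slope_se = math.sqrt(sse / (n - 2) / sxx)
    return intercept, slope, slope_se

# Invented values for illustration; not taken from slidsk.sav.
yrschl = [10, 12, 12, 14, 16, 18]
wages = [22000.0, 30000.0, 27000.0, 35000.0, 41000.0, 52000.0]
wt = [150.0, 300.0, 250.0, 400.0, 200.0, 100.0]  # assumed population weights

# Rescale so the weights sum to the number of respondents (6), not to 1400.
scale = len(wt) / sum(wt)
rel_wt = [wi * scale for wi in wt]

_, slope_raw, se_raw = weighted_simple_ols(yrschl, wages, wt)
_, slope_rel, se_rel = weighted_simple_ols(yrschl, wages, rel_wt)

print(f"slope: raw={slope_raw:.2f}, relative={slope_rel:.2f}")
print(f"slope SE: raw={se_raw:.2f}, relative={se_rel:.2f}")
```

Rescaled weights of this kind are only a rough adjustment: they bring the case count back to the number of respondents but do not account for the survey's sampling design.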

2. Attempt to improve on the regression equation you obtained in problem 1. To do this, try some other variables that you hypothesize influence wages and salaries: for example, age and age squared, immigrant status, unionized or public sector job, urban/rural residence, etc. For some of these you will have to create new dummy variables or transform existing variables. Another alternative is to select subsets of the data set that are more uniform, e.g. select all those aged 40-50, or select those in a particular set of occupations or industries. Explain your findings.
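
The three kinds of data preparation mentioned in problem 2 (a squared term, a new dummy variable, and a more uniform subset) can be sketched in Python as follows. The records, the field names `AGE` and `IMMST`, and the "yes"/"no" codes are hypothetical and chosen only for illustration; check the attached variable list for the actual names and codes in *slidsk.sav*.

```python
# Hypothetical records: the field names AGE and IMMST and the "yes"/"no"
# codes are assumptions for illustration, not the names used in slidsk.sav.
rows = [
    {"AGE": 25, "IMMST": "no", "WAGES": 28000.0},
    {"AGE": 42, "IMMST": "yes", "WAGES": 46000.0},
    {"AGE": 47, "IMMST": "no", "WAGES": 51000.0},
    {"AGE": 58, "IMMST": "yes", "WAGES": 44000.0},
]

def add_derived(row):
    """Return a copy of row with a squared-age term and a 0/1 immigrant dummy."""
    out = dict(row)
    out["AGESQ"] = row["AGE"] ** 2                     # transformed variable
    out["IMMIG"] = 1 if row["IMMST"] == "yes" else 0   # new dummy variable
    return out

derived = [add_derived(r) for r in rows]

# A more uniform subset: respondents aged 40 to 50 inclusive.
aged_40_50 = [r for r in derived if 40 <= r["AGE"] <= 50]

for r in derived:
    print(r["AGE"], r["AGESQ"], r["IMMIG"])
print(len(aged_40_50), "cases aged 40-50")
```

Including both age and age squared lets the fitted relationship between age and wages curve rather than forcing it to be a straight line.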