Problem Set 5

# Multiple Regression Models

Use the data in the SPSS data file t:\students\public\805\slidsk.sav for this problem set.  Some documentation on the data set is attached, and further documentation is in the same folder: a fuller description of the variables is in the file slid2000cbk_v2.pdf (Adobe Acrobat format), and a description of methods is in the Income Historical Review User Guide (HTML file).

The data file was downloaded from IDLS (Internet Data Library System) through the Data Library (http://uregina.ca/datalibrary/) of the University of Regina Library web site.  Once in the IDLS section, select Canada – Labour, Employment and Income (465) and scroll down until you find Survey of Labour and Income Dynamics (SLID), 2000: Person file.  Further documentation about the file, including the questionnaire, can be obtained there.  I understand that you can access these files only on campus, and only if you are logged in to the network with your own userid and password.  Read the PERMISSION TO ACCESS DATA FILES THROUGH U of R DATA LIBRARY SERVICES before using the files from the data library.

The slidsk.sav data file contains all the Saskatchewan respondents to the Survey of Labour and Income Dynamics (SLID), 2000.  This is the individual or person file, not the family or household file.  According to the IDLS web site, the proper citation for the data set is:

Statistics Canada. Survey of Labour and Income Dynamics (SLID), 2000: Person file [machine readable data file]. Ottawa, ON: Statistics Canada. 7/16/2003.

I altered the original file in two ways.  First, I renamed several of the variables to make the data a bit easier to work with, because the original file has very unusual variable names that are difficult to recognize or remember.  In the list below, I give both the new and original names, in case you wish to check any of these against the files on the IDLS web site.  Second, I recoded the variable SEX so that male is 0 and female is 1, the coding needed to use SEX as a dummy variable.

Problems

1.  Obtain regression equations with wages and salaries (WAGES) as the dependent variable and, as independent variables, years of schooling (YRSCHL), sex (SEX), annual hours of work (HOURS), and years of experience (EXP).

a. Use the stepwise method and the enter method.
b. Explain the results of a., commenting on the fit of the equation, statistical significance, and the meaning of the regression coefficients.
c. From the stepwise method or other considerations, do you detect any violation of assumptions with this equation?
d. Rerun the equation with weights attached and comment on any changes in the fit of the equation or the coefficients.  To do this, in the SPSS Data Editor choose Data > Weight Cases, click on Weight cases by, select WT as the Frequency Variable, and click OK.  A Weight On notice should appear at the bottom right of the Data Editor window.  Then run the regression equation.  Note that the statistical significance levels reported are now unreliable, because SPSS treats the weighted number of cases as the sample size, whereas the weights actually scale each case up to an estimate of the population size.  The estimated coefficients, however, may be more accurate estimates of the population coefficients.  Briefly describe the changes you observe.
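
The note about weights in problem 1 can be illustrated with a small sketch in Python (not SPSS). It is a minimal example under stated assumptions: the six data values and weights are made up for illustration and are not taken from slidsk.sav, the function name weighted_simple_regression is hypothetical, and only one independent variable is used to keep the arithmetic short. It shows that the weighted coefficients depend only on the relative sizes of the weights, while the apparent number of cases under frequency weighting is the sum of the weights rather than the number of respondents.

```python
def weighted_simple_regression(x, y, w):
    """Return (intercept, slope) minimizing sum of w_i * (y_i - a - b * x_i) ** 2."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up illustrative values (assumption), not records from slidsk.sav.
years_school = [10, 12, 12, 14, 16, 18]
wages = [21000, 26000, 30000, 34000, 45000, 52000]
weights = [250.0, 400.0, 150.0, 300.0, 500.0, 200.0]  # population each case stands for

n_actual = len(wages)       # number of respondents actually observed
n_apparent = sum(weights)   # what frequency weighting treats as the sample size

# Rescale the weights so they sum to the actual number of respondents.
scale = n_actual / n_apparent
rescaled = [wi * scale for wi in weights]

coef_unweighted = weighted_simple_regression(years_school, wages, [1.0] * n_actual)
coef_weighted = weighted_simple_regression(years_school, wages, weights)
coef_rescaled = weighted_simple_regression(years_school, wages, rescaled)

print("actual n:", n_actual, "apparent n under frequency weights:", n_apparent)
print("unweighted (intercept, slope):", coef_unweighted)
print("weighted   (intercept, slope):", coef_weighted)
print("rescaled   (intercept, slope):", coef_rescaled)
```

In this sketch the weighted and rescaled runs give the same coefficients, because multiplying every weight by a constant does not change the fit. Only the apparent sample size changes, which is why the significance levels and not the coefficients are the part to distrust.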

2.  Attempt to improve on the regression equation you obtained in problem 1.  To do this, try some other variables that you hypothesize have an influence on wages and salaries, for example, age and age squared, immigrant status, unionized or public sector job, urban/rural residence, etc.  For some of these you will have to create new dummy variables or transform the variables.  Another alternative is to select subsets of the data set that are more uniform, for example all those of age 40-50, or those in a particular set of occupations or industries.  Explain your findings.
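
The transformations mentioned in problem 2 (a squared term, dummy variables, and a more uniform subset) can be sketched as follows in Python (not SPSS). This is only an illustration under assumptions: the field names AGE, IMMIG, and UNION, their "yes"/"no" codes, and the four records are hypothetical and do not come from slidsk.sav, so check the attached variable list for the actual names and codes.

```python
# Hypothetical records; field names and codes are assumptions for illustration only.
respondents = [
    {"AGE": 25, "IMMIG": "yes", "UNION": "no"},
    {"AGE": 42, "IMMIG": "no", "UNION": "yes"},
    {"AGE": 47, "IMMIG": "no", "UNION": "no"},
    {"AGE": 58, "IMMIG": "yes", "UNION": "yes"},
]


def add_derived_variables(row):
    """Return a copy of the record with a squared term and 0/1 dummy variables added."""
    out = dict(row)
    out["AGESQ"] = row["AGE"] ** 2                      # age squared
    out["IMMIG_D"] = 1 if row["IMMIG"] == "yes" else 0  # 1 = immigrant, 0 = not
    out["UNION_D"] = 1 if row["UNION"] == "yes" else 0  # 1 = unionized job, 0 = not
    return out


prepared = [add_derived_variables(r) for r in respondents]

# A more uniform subset: respondents aged 40 to 50 inclusive.
subset_40_50 = [r for r in prepared if 40 <= r["AGE"] <= 50]

for r in prepared:
    print(r)
print("cases aged 40-50:", len(subset_40_50))
```

Each dummy variable takes the value 1 for the category of interest and 0 otherwise, matching the 0/1 coding already used for SEX, so its regression coefficient can be read as the estimated difference in WAGES between the two groups with the other variables held constant.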