Problem Set 5
Use the data in the SPSS data file t:\students\public\805\slidsk.sav for this problem set. Some documentation on the data set is attached, and further documentation is in the same folder: a fuller description of the variables is available in the file slid2000cbk_v2.pdf (Adobe Acrobat format), and a description of methods is in the Income Historical Review User Guide (HTML file).
The data file was downloaded from IDLS (Internet Data Library System) through the Data Library (http://uregina.ca/datalibrary/) section of the University of Regina Library web site. Once in the IDLS section, select Canada – Labour, Employment and Income (465) and scroll down until you find Survey of Labour and Income Dynamics (SLID), 2000: Person file. Further documentation about the file, including the questionnaire, can be obtained there. I understand that you can obtain access to these files only on campus, and only if you are logged in to the network with your own userid and password. Read the PERMISSION TO ACCESS DATA FILES THROUGH U of R DATA LIBRARY SERVICES before using the files from the data library.
The slidsk.sav data file is composed of all the Saskatchewan respondents to the Survey of Labour and Income Dynamics (SLID), 2000. This is the individual or person file, not the family or household file. According to the IDLS web site, the proper citation for the data set is:
Statistics Canada. Survey of Labour and Income Dynamics (SLID), 2000: Person file [machine readable data file]. Ottawa, ON: Statistics Canada. 7/16/2003.
I altered the original file in two ways. First, I renamed several of the variables to make the data a bit easier to work with; the original file has very unusual variable names that are difficult to recognize or remember. In the list below, I give both the new and original names, in case you wish to check any of these against the files on the IDLS web site. Second, I recoded the variable SEX so that male is 0 and female is 1, the coding needed to use SEX as a dummy variable.
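As a minimal sketch of what a recode to a dummy variable does, the Python fragment below maps a two-category code to 0/1. The original codes 1 = male and 2 = female are an assumption made for illustration (check the codebook for the actual coding), and the input values are made up, not taken from slidsk.sav.

```python
# Assumed original coding, for illustration only (verify in the codebook):
# 1 = male, 2 = female. The recoded dummy is 0 = male, 1 = female.
ORIGINAL_TO_DUMMY = {1: 0, 2: 1}


def recode_sex(original_code):
    """Return the 0/1 dummy for an original code, or None if the code is unrecognized."""
    return ORIGINAL_TO_DUMMY.get(original_code)


# Made-up input values; 9 stands in for a missing or invalid code.
original_codes = [1, 2, 2, 1, 9]
sex_dummy = [recode_sex(code) for code in original_codes]
print(sex_dummy)
```

With 0/1 coding, the regression coefficient on SEX can be read directly as the estimated difference in the dependent variable between females and males, holding the other independent variables constant.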
1. Obtain regression equations with wages and salaries (WAGES) as the dependent variable and with years of schooling (YRSCHL), sex (SEX), annual hours of work (HOURS), and years of experience (EXP) as the independent variables.
2. Attempt to improve on the regression equation you obtained in problem 1. To do this, try other variables that you hypothesize have an influence on wages and salaries, for example age and age squared, immigrant status, unionized or public sector job, or urban/rural residence. For some of these you will have to create new dummy variables or transform existing variables. Another alternative is to select subsets of the data set that are more uniform, e.g. select all those aged 40-50, or select those in a particular set of occupations or industries. Explain your findings.
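The two problems can be sketched outside SPSS as follows. This is an illustration of the computation only, not the required method: the problem set itself is meant to be done in SPSS. All values are made up rather than drawn from slidsk.sav. The wage values are generated from hypothetical coefficients (in thousands of dollars) so that the fit can be checked, and the AGE and union values are likewise hypothetical. The helpers `solve` and `ols` are written here for the example and use only the Python standard library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("independent variables are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients (constant first) from the normal equations X'X b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend the constant term
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    return solve(xtx, xty)


# Problem 1 style model. Made-up rows: (YRSCHL, SEX, HOURS, EXP).
rows = [
    (12, 0, 2000, 10), (16, 1, 1800, 5), (10, 0, 2200, 25), (14, 1, 1500, 12),
    (18, 0, 1900, 8), (12, 1, 1200, 20), (11, 0, 2100, 30), (15, 1, 1700, 3),
]
# Hypothetical coefficients (constant first), wages in thousands of dollars.
true_beta = [2.0, 1.5, -4.0, 0.01, 0.5]
wages = [true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], r)) for r in rows]

beta = ols(rows, wages)
for name, b in zip(["constant", "YRSCHL", "SEX", "HOURS", "EXP"], beta):
    print(f"{name:9s}{b:10.4f}")

# Problem 2 style transformations on hypothetical AGE and union-status values.
ages = [25, 43, 50, 38, 47, 61, 40, 55]
union_status = ["yes", "no", "no", "yes", "yes", "no", "no", "yes"]
age_squared = [a * a for a in ages]                         # transformed variable
union_dummy = [1 if u == "yes" else 0 for u in union_status]  # new dummy variable
subset_rows = [r for r, a in zip(rows, ages) if 40 <= a <= 50]  # ages 40-50 only
print(len(subset_rows), "cases in the age 40-50 subset")
```

Because the made-up wages contain no random error, the estimated coefficients reproduce the hypothetical ones. With real survey data the fit will not be exact, and the interest lies in the size, sign, and significance of each coefficient and in how these change when variables are added or the sample is restricted.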