Sociology 405/805

January 8, 2004

Review and Introduction

1. Production of Data

This class uses existing data sources. Be familiar with the variables and how they are defined, and with the methods of data collection. We will have to work with these data as they come to us, but when doing so we should be aware of the definitions and methods used and of the procedures used to obtain the data.

· Data are produced, not just collected.

· Social research issues, theory, approach.

· Definition of variables – theoretical and operational.

· Questions used and potential answers.

· Organization of responses when data are presented.

· Sampling procedures.

¨ Population or sample.

¨ Non-probability or probability.

¨ Random, stratified, cluster, multistage.

¨ Interview, questionnaire, administrative data.

· Errors in data – non-sampling and sampling.

· Integrate data production with statistical procedures to be used, if possible.

· Use statistical analysis of previous data sets to improve data production for subsequent projects.

· Replicate existing studies or use questions from existing studies for comparative purposes.

2. Types of Measurement

a. Discrete or Continuous

· Discrete – number of possible values can be counted.

· Continuous – cannot count all possible values; possible values can be matched with some portion of a line segment.

b. Level of Measurement

· Nominal – Can classify values into categories; name or number them. Sex, ethnicity.

· Ordinal – Values can be ordered or ranked as less than, greater than, or equal to. Order of finish, Likert-scale attitudes.

· Interval – Differences or intervals are meaningful; equal numerical differences represent equal magnitudes; well-defined unit of measure. Height, weight, time, age, income.

· Ratio – Ratios of values meaningful; non-arbitrary 0 point. Height, weight, income, age.

· Levels of measurement are hierarchical. That is, every scale is at least nominal. Ordinal scales are also nominal. Interval scales are both nominal and ordinal. Ratio scales are nominal, ordinal, and interval.

· Most interval scales are also ratio. Temperature measured in degrees Celsius or Fahrenheit is interval but not ratio, because the zero point is arbitrary.

· These levels of measurement determine the type of summary statistic that can be calculated and the type of statistical analysis that can be used.

· Where possible, construct a variable so that it is measured at the highest possible level.

· Attitudes, opinions and many psychological variables are measured only at the ordinal level, but statistical analysis appropriate for interval or ratio level scales is commonly used.

· Some statistical methods appropriate only for interval or ratio level scales can use nominal or ordinal level scales through appropriate reconstruction of variables – e.g. dummy variables.
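
The dummy-variable reconstruction mentioned in the last point can be sketched in a few lines of Python. This is only an illustration: the `region` variable, its categories, and the helper name `dummy_code` are hypothetical and do not come from any data set used in the class. Each category other than a chosen reference category becomes its own 0/1 variable.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator list per category, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical nominal variable: region of residence for six respondents.
region = ["West", "East", "North", "West", "North", "East"]

# "East" is the reference category, so it is represented by zeros on every dummy.
dummies = dummy_code(region, reference="East")

for name, column in dummies.items():
    print(name, column)
```

A nominal variable with k categories yields k - 1 dummy variables, since the reference category is identified by a 0 on all of them.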

3. Descriptive Statistics

a. Distributions. Frequency, percentage, proportional distributions. Organization of data into categories. Need nominal scale only.

b. Positional Measures. Percentiles, deciles, quintiles, quartiles, median. Need ordinal scale.

c. Central Tendency. Mode, median, mean. Mode for nominal scale, median for ordinal, mean for interval and ratio.

d. Variation. Variation (the sum of squared deviations about the mean), variance, standard deviation. These assume interval or ratio level scales. Variation is used in ANOVA. For ordinal scales, the interquartile range or other measures based on positional measures can be used.

e. Measures of Association. Lambda, phi, V, Q, tau. Covariation and correlation coefficients.

f. Regression. Regression coefficients and regression equation.

g. Standardization. Z-values and beta coefficients.
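
As a small illustration of items b, c, d and g, the sketch below computes several of these summaries with Python's standard `statistics` module. The seven scores are made up for the example, and "variation" is taken here to mean the sum of squared deviations about the mean, which is an assumption about the usage in these notes.

```python
import statistics

# Hypothetical interval-level scores for seven respondents.
scores = [2, 3, 3, 4, 5, 7, 11]

# Central tendency.
mode = statistics.mode(scores)
median = statistics.median(scores)
mean = statistics.mean(scores)

# Positional measures: quartiles and the interquartile range.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Variation (sum of squared deviations), sample variance, standard deviation.
variation = sum((x - mean) ** 2 for x in scores)
variance = variation / (len(scores) - 1)
sd = variance ** 0.5

# Standardization: each score expressed in standard deviations from the mean.
z_values = [(x - mean) / sd for x in scores]

print("mode", mode, "median", median, "mean", mean)
print("IQR", iqr, "variation", variation, "variance", round(variance, 3))
print("z-values", [round(z, 2) for z in z_values])
```

The mean (5) is pulled above the median (4) by the single large score of 11, which is one reason the median is often preferred for skewed distributions.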

· Each statistic provides a particular summary view of the distribution of the data.

· Use statistics appropriate to level of measurement and anticipated use of the data.

· Interpretation of statistics.

· Where data come from samples, or where the data are used to make inferences concerning the influence of particular variables, the variability of the estimate must be considered. This calls for interval estimates and hypothesis tests.
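
Items e, f and g above (covariation, correlation, regression and beta coefficients) are closely linked, and a short worked example may make the link clearer. The sketch below uses five made-up (x, y) pairs and plain Python arithmetic. It covers only the two-variable case, where the standardized (beta) coefficient equals the correlation coefficient.

```python
# Hypothetical paired observations on two interval-level variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariation and the variation of each variable (sums of squares and products).
covariation = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
variation_x = sum((xi - mean_x) ** 2 for xi in x)
variation_y = sum((yi - mean_y) ** 2 for yi in y)

# Correlation coefficient r.
r = covariation / (variation_x * variation_y) ** 0.5

# Regression equation y = a + b*x estimated by least squares.
b = covariation / variation_x
a = mean_y - b * mean_x

# Beta coefficient: the slope rescaled by the ratio of standard deviations.
sd_x = (variation_x / (n - 1)) ** 0.5
sd_y = (variation_y / (n - 1)) ** 0.5
beta = b * sd_x / sd_y

print(f"r = {r:.4f}, a = {a:.2f}, b = {b:.2f}, beta = {beta:.4f}")
```

With more than one independent variable the beta coefficients no longer equal simple correlations, so this identity should not be carried over to multiple regression.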

4. Probability and Sampling Distributions

· Classical, frequency, subjective interpretations of probability.

· Independence and dependence of events.

· Random variables and expected values.

· Probability distributions – Normal, t, F, and chi-square distributions.

· Sampling

¨ Representative

¨ Random

¨ Sampling Distributions

¨ Stratified, cluster, multistage samples

¨ Sampling error

· Models

¨ Probabilistic, not deterministic

¨ Multivariate

¨ Description and explanation
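
The ideas of random sampling, sampling distributions and sampling error listed above can be illustrated with a small simulation. The sketch below is only a demonstration under assumed conditions: the population of 10,000 uniformly distributed values, the sample size of 25 and the 2,000 repeated samples are all arbitrary choices made for the example.

```python
import random
import statistics

rng = random.Random(405)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values spread uniformly between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 25         # size of each simple random sample
draws = 2_000  # number of repeated samples

# The means of the repeated samples approximate the sampling distribution of the mean.
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = sigma / n ** 0.5

print(f"population mean      {mu:.2f}")
print(f"mean of sample means {mean_of_means:.2f}")
print(f"sd of sample means   {sd_of_means:.2f}")
print(f"sigma / sqrt(n)      {theoretical_se:.2f}")
```

The sample means should centre on the population mean, and their standard deviation should be close to sigma divided by the square root of n, the standard error of the mean.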

5. Inferential Statistics

a. Sampling Error

b. Interval Estimates

Confidence level
Confidence interval
Meaning of interval estimates
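
A minimal sketch of an interval estimate for a mean follows. It assumes a sample large enough for the normal distribution to be used in place of the t distribution, and the sample figures (mean 50, standard deviation 10, n = 100) and the function name `mean_confidence_interval` are invented for the illustration.

```python
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Large-sample interval estimate for a population mean."""
    # Z-value that leaves (1 - confidence) / 2 in each tail of the normal curve.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sample_sd / n ** 0.5
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample results.
lower, upper = mean_confidence_interval(50.0, 10.0, 100)
print(f"95% interval: {lower:.2f} to {upper:.2f}")  # 95% interval: 48.04 to 51.96
```

The 95% refers to the procedure: about 95 of every 100 intervals built this way from repeated random samples would contain the population mean.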

c. Hypothesis Tests

Null and alternative (research) hypotheses
Level of significance
One- or two-tailed tests
Test statistic
Interpreting test results
Types of error
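
The steps of a hypothesis test listed above can be sketched as a large-sample Z test for a mean. The figures are hypothetical (null hypothesis mean of 48, sample mean 50, standard deviation 10, n = 100), the function name `z_test_mean` is made up for the example, and the normal distribution is assumed to be an adequate approximation.

```python
from statistics import NormalDist


def z_test_mean(sample_mean, sample_sd, n, null_mean, alpha=0.05, two_tailed=True):
    """Return the test statistic, its p-value, and whether the null hypothesis is rejected."""
    standard_error = sample_sd / n ** 0.5
    z = (sample_mean - null_mean) / standard_error  # test statistic
    upper_tail = 1 - NormalDist().cdf(abs(z))
    # The one-tailed p-value assumes the alternative hypothesis points in the
    # same direction as the observed difference.
    p_value = 2 * upper_tail if two_tailed else upper_tail
    return z, p_value, p_value < alpha


z, p_two, reject_05 = z_test_mean(50.0, 10.0, 100, null_mean=48.0)
_, _, reject_01 = z_test_mean(50.0, 10.0, 100, null_mean=48.0, alpha=0.01)

print(f"Z = {z:.2f}, two-tailed p = {p_two:.4f}")
print("reject at 0.05:", reject_05, " reject at 0.01:", reject_01)
```

The same data lead to rejection at the 0.05 level but not at the 0.01 level. The level of significance is the probability of a Type I error (rejecting a true null hypothesis), so a stricter level lowers that risk while raising the risk of a Type II error.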

6. Other Issues

Independent and dependent variables
Interaction
Assumptions
Interpretation
Unobserved variables
Errors in variables

7. Statistical programs – SPSS and Minitab

Constructing a data set – data entry, labels
Organizing data – selection of cases, transformation of variables
Sampling and weighting
Statistical procedures
Different types of data sets – survey, weighted, time series
SPSS syntax files
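
The data-handling steps above (selecting cases, transforming variables, weighting) are carried out in SPSS or Minitab in this class. Purely to show the logic, the sketch below does the same three things in plain Python rather than SPSS syntax. The four records, the variable names and the weights are hypothetical.

```python
# Hypothetical survey records; income is in thousands of dollars.
records = [
    {"age": 17, "income": 0, "weight": 1.0},
    {"age": 25, "income": 30, "weight": 2.0},
    {"age": 40, "income": 50, "weight": 1.0},
    {"age": 62, "income": 40, "weight": 1.0},
]

# Selection of cases: keep respondents aged 18 and over.
# Transformation of variables: recode age into two groups on a copy of each record.
adults = [
    dict(r, age_group="18-39" if r["age"] < 40 else "40 and over")
    for r in records
    if r["age"] >= 18
]

# Unweighted mean treats every selected case equally.
unweighted_mean = sum(r["income"] for r in adults) / len(adults)

# Weighted mean counts each case in proportion to its sampling weight.
total_weight = sum(r["weight"] for r in adults)
weighted_mean = sum(r["income"] * r["weight"] for r in adults) / total_weight

print("cases selected:", len(adults))
print("unweighted mean income:", unweighted_mean)  # 40.0
print("weighted mean income:", weighted_mean)      # 37.5
```

Because the 25-year-old respondent carries a weight of 2, the weighted mean is pulled toward that respondent's lower income.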

Last edited January 10, 2004