Social Studies 201

Fall 2003

Monday, September 8, 2003

Introductory notes - see Class Syllabus

Contact information and office hours. Monday, 1:30 - 2:30 p.m.; Thursday, 9:00 - 10:00 a.m.; or by appointment. This is the only class I am teaching this semester, but I also have many committee meetings, so I may be out of my office much of the time. If you wish to arrange a time to meet me, please contact me before or after class, leave a message on the voice mail on my telephone (585-4196), or send me an email message (paul.gingrich@uregina.ca). I'll try to respond promptly to email, so if you have any brief questions that can be dealt with by email, I'll attempt to answer those quickly.

Text. There is only one textbook, with two parts to it. Total cost is around $50. I wrote this textbook about ten years ago so it is now out of date on some examples, but it covers the materials for this introductory statistics class. Please point out any errors or misleading explanations in the text. I have provided lots of examples, but if you need more around exam time, I will provide them on reserve in the University Library and on the web site for the class. Note that most of the examples in the text are worked out. You may want to try to work out some of the examples yourself before looking at the answers, since there are few problems that do not have solutions provided.

The textbook is too long, but it combines the explanations with the problems, so that it is like a text and a manual together. As we cover each part of the course, I will tell you which section or pages to read.

If you want another approach than mine, it would be useful to look at some other statistics texts. The text I used for several years was Ott, Larson and Mendenhall, Statistics: A Tool for the Social Sciences - this provides a good introduction to statistics. Almost any introductory statistics text can be useful in presenting a little different explanation. While it can be helpful to use more than one text, the mathematical notation may differ, and this can cause confusion.

For the computer work, and for some later parts of the course, I will provide extra handouts with examples. Also, I will place old problem sets and answers to problems on reserve in the University Library and on the web site.

Grading - problem sets and examinations. 40% of the total grade comes from the problem sets, 15% from each of the two midterms, and 30% from the final examination.
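As a rough illustration of how these weights combine, the following is a minimal sketch in Python. The weights come from the breakdown above; the function name and the component scores are hypothetical and are only there to show the arithmetic.

```python
# Weights taken from the grading breakdown above (they sum to 100%).
WEIGHTS = {
    "problem_sets": 0.40,
    "midterm_1": 0.15,
    "midterm_2": 0.15,
    "final_exam": 0.30,
}


def final_grade(scores):
    """Return the weighted course grade, given component scores out of 100."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student (assumed values, not real data).
example_scores = {
    "problem_sets": 85,
    "midterm_1": 70,
    "midterm_2": 80,
    "final_exam": 70,
}

grade = final_grade(example_scores)
print(f"Weighted grade: {grade:.1f}")
```

With these assumed scores, the problem sets contribute 34 points, the two midterms 10.5 and 12 points, and the final examination 21 points, for a total of about 77.5.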

There will be 5 or 6 problem sets: two before the first midterm, two between the two midterms, and one or two between the second midterm and the final examination. I will usually give you a week to complete these and, once they are handed in, I will attempt to provide model answers, or schedule sessions in the lab time to discuss the problems. I will hand out the first problem set on Friday, September 12 and it will be due on September 19. The second will run from September 19 to 29, and we will attempt to have it marked and returned to you before the first midterm. 40% of the final grade will be based on the problem sets, including the computer problems.

In addition to the regular problem sets, there will be several computer problem sets. Later in the semester, the computer problem sets will be merged with the regular problem sets.

The purpose of the problem sets is to learn how to do statistics, practice for exams, and to obtain points. With all the problem sets, this is a class where you have to keep up with the assignments. If you put in the time, you should be able to obtain a reasonably good grade. Do not expect to do well if you just study before each exam and ignore the work in between.

Examinations are all open book, but can be difficult - they ask you to develop answers to problems, similar to the problems on the problem sets. Final examination will be in the regularly scheduled time. I will schedule review sessions before each examination - some of these may be during the regularly scheduled lab times.

Calculator and math background. In order to do statistics, it is necessary to use some mathematics, and the statistical problems build on the ordinary arithmetic operations (addition, subtraction, multiplication, and division) and also use some algebra that you should be familiar with from secondary school. But I will explain all the formulae and do example problems on the blackboard. The aim of this class is not to become proficient in manipulating algebraic materials, but to use the formulae to do statistical problems. At the same time, there is a lot of mathematics used in the course, and you have to develop a certain ability in this area. If this has been difficult for you in the past, attempt to tackle the problems or additional examples from the web site. There will also be a student assistant available, so consult the assistant or me if you are having difficulty. We can also use parts of the Tuesday labs to go over problems and work on extra problems.

Labs and computer. Much of the work in a statistics class can be done with a calculator, or even with a paper and pencil. Later in the course, the formulae become more complex, and require more calculation. In addition, some data sets have many cases and many variables, so it can be very time consuming to do everything with a calculator. For these reasons, we will do some of the statistical work on the computer. This semester we will be using SPSS (Statistical Package for the Social Sciences), a program that many statisticians and survey researchers use to analyze survey data. Along with the SPSS program, we will use a data set produced from a survey of University of Regina undergraduates in 1998. I will provide more details about when we begin the computer labs.

There should be time in the computer lab times to do all the computer problems. However, if you need extra time, the computer lab (CL109) is generally open, except when other classes are scheduled there. I will prepare handouts for SPSS and this should be sufficient to do the work in this class.

The computer labs will begin Tuesday, September 16.

Accommodation and university policies. Take note of the possibilities for accommodation for those with special needs. If you have any special needs, please contact me as soon as possible. All students should be familiar with the relevant University policies, so take note of these on the attached sheet or in the University Calendar 2003-2004.

Web site. I have constructed a web site for the class for this semester. At present, there are only limited materials on the web site, but I will be adding more as we proceed through the semester. The address of the web site is http://uregina.ca/~gingrich/. Note that there is no www. before uregina.ca. Some of the material on the web site is in Acrobat Reader format, that is, with file type pdf. If you are unable to view files of this type on your computer, you can use the computers in CL109 at times when there is no class in that room.

The web site has three parts. The first part is where I will put the notes, examples, and problems for this semester. I will attempt to update these at least weekly, and hopefully more frequently. The second part is examples and examinations from previous semesters. I have posted all the problem sets and model answers from the Winter 2001 semester, the last time I taught this class. I have also included all the examinations from Fall 2001 and Winter 2001, but without answers - I will be using some of the questions from these examinations for your problem sets this semester. The final part of the web site is the textbook. I will be posting large parts of the text on the web site. I will not guarantee that everything from the text will be available on the web site, but I will provide large parts of it as we proceed through the semester. This will generally be in pdf format.

If anyone is not able to use the web site, please notify me. I can provide printed copies of all materials on the web site to the University Library, and make them available there at the Reserve Desk.



Aim of the Class

1. Learning basic statistical methods. After you have completed this class, you should be familiar with basic statistical concepts and measures. Do not expect to be an expert in statistical methods after one semester, just as you would not expect to be an expert on other areas after only one class in the area. But you should be able to understand what statistics is, what can be done with it, and tackle some straightforward types of statistical problems.

2. Healthy skepticism. I encourage students to have a healthy skepticism toward statistical data and interpretations. This involves developing an appreciation of the usefulness of statistical data and methods while at the same time adopting a critical approach to the data and methods. Here I outline a few aspects of this and I will attempt to provide more examples as we proceed through the semester.

a. Misleading statistics. Some statistics that are stated in verbal arguments or published are incorrect or misleading. Examples include partial results presented by publicists, politicians, or advertisers - a decline in the unemployment rate may be touted as evidence of solid economic gains when in fact the job market is not so positive. Some election polls are inaccurate or unable to predict election results - at the same time other polls are fairly accurate representations of public opinion. One response to encountering misleading statistics is to take the view that statistical data and methods are useless or invalid. Some claim that any result can be proved by statistics, and one well-known book is titled How to Lie with Statistics. Some researchers also reject statistics on theoretical or practical grounds, considering that only qualitative information is valid and that quantitative data have so many weaknesses that the data can be ignored.

b. Hard data. At the other end of the spectrum are those who are unwilling to believe anything that does not have what they consider to be hard, statistical data to back it up - numerical and quantitative data. Such researchers may consider qualitative data as being soft, incomplete, or only suggestive. Those adopting this approach may consider statistics as the best method of proof, and perhaps the only method of proof. For example, a claim that student debt has increased dramatically, along with statements of personal experiences of debt problems, may be ignored by politicians or policy-makers unless there are quantitative data to support such claims. Similarly, folk or traditional medical care methods are treated with skepticism by much of the medical profession because carefully controlled experiments, using well-established statistical methods, have not been conducted.

c. Useful if carefully applied. My approach is in between these two extremes. I view statistics as a legitimate and powerful approach to the study of social phenomena and issues. It is often a useful approach to describing people, their characteristics, views, and behaviour, and it can help in building models and theories that can be used to understand the social world. Sometimes basic data are essential to understanding a social issue. Examples include the study of poverty and inequality, equity issues in the labour force and politics, and trends in crime rates, each of which is closely associated with statistical data and approaches. In each of these cases, well constructed statistical data have both identified problems and suggested possible solutions. At the same time, each of these subject areas has been associated with some poor or misleading data. Differences of views concerning the causes, severity, and solution to social issues or problems have been associated with debates over definitions and interpretations of data and statistical methods and models.

d. One of many social science methods. Part of the problem with statistics is that it is often misused, so that it can provide misleading results or be used to bolster incorrect arguments and draw misleading conclusions. It is always necessary to remember that statistics is only one tool available to social science practitioners, and one that is best suited to certain types of data and certain types of problems. Other tools may be more useful in other circumstances. Methods such as in-depth interviews, stories, oral histories, participant observation, historical records, novels or movies, etc. may provide a better understanding and interpretation of social issues. Such methods often complement statistical methods. Theoretical approaches and the use of human reason are also necessary, although these must be informed by some types of data (quantitative or qualitative).

3. Reading journals and articles. Academic articles, books, and journals tend to be more quantitative today than in earlier periods. After taking a statistics class, you should be able to understand some of the statistical approaches used in academic articles, even if you do not know all details of the methods. Some are difficult to understand though, e.g. regression, factor analysis, analysis of variance, or structural equation models. In order to understand some of these latter methods, you will need to do further study of statistics.

4. Further study of statistics. Social Studies 201 should provide all the basic statistical concepts and approaches so that you can continue the study of statistics later. Each discipline emphasizes a somewhat different set of techniques, and this is what upper level courses in statistics cover. For example, psychologists tend to emphasize the analysis of variance (ANOVA), t-tests and factor analysis. Economists tend to rely heavily on regression.

If you find that you like statistics and working with data, I would encourage you to take an upper level class in your discipline. Many of you will need more statistics for your honours or graduate work, and others may find it useful in employment after completing a degree.

For Sociology majors, the next class is Social Studies 306, where we again use SPSS, producing and analyzing survey data. There is a more advanced statistics course also, Social Studies 405/805, which the Department attempts to offer every other year.



Class outline

A. Descriptive statistics. The first section of the class, chapters 1-5 of the text, deals with descriptive statistics. This is the most commonly used type of statistics, and the type that you will most commonly encounter. Almost all the statistics published in newspapers and reports are descriptive in nature, describing various phenomena. Descriptive statistics includes tables, graphs, charts, maps, diagrams, etc. - hopefully helping us understand what is being described.

Chapter 1 of the text is an introduction to statistics and the text. Read it quickly to get an overview.

In order to present data, it is necessary to consider how data are organized. This is the aim of Chapters 2 and 3. Chapter 2 examines issues related to the production of data, including discussion of assumptions involved in their production. In this chapter, I attempt to identify some of the main issues that must be addressed by anyone who works with quantitative data or produces such data. Do not spend a long time on this chapter, but attempt to be generally familiar with the issues raised in sections 2.3 through 2.6. Section 2.7 uses Statistics Canada's Labour Force Survey as an example of a relatively successful and thorough approach to data production, albeit one that has some problems and shortcomings, as do any data. This example is now outdated, but I will provide some more recent summary data on labour statistics.

Chapter 3 discusses how social constructs can be measured. Phenomena such as length or weight are measured in well understood and well defined units such as metres or kilograms, respectively. Social constructs such as attitudes, opinions, intelligence, ability, alienation, social solidarity, class consciousness, and ethnic or national identity are not so clear cut and are much more difficult to define and measure. After some of the different approaches to measurement have been examined, Chapter 4 deals with organizing these data for purposes of description. A few of the ways of organizing data into charts, tables and graphs are discussed in Chapter 4.

Another way to present data is to calculate summary measures that succinctly describe the phenomena being examined. It may be tedious to look at all the data, or we may be overwhelmed by all the data concerning a population or a social issue. In Chapter 5, summary measures of centrality and variation are discussed. Examples of measures of centrality include average income, expectation of life, and median score on standardized tests; examples of measures of variation include variation in incomes, income inequality, and standard scores on tests. Measures of central tendency and variation are the most widely used summary statistics in both popular and academic applications.
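To make the idea of summary measures concrete, here is a small sketch in Python using the standard library's statistics module. The seven incomes are made-up values chosen for illustration, not data from the text or from any survey, and the notation in the text may differ.

```python
import statistics

# Hypothetical annual incomes (in dollars) for seven people; not real data.
incomes = [18000, 22000, 25000, 27000, 31000, 40000, 95000]

# Measures of centrality.
mean_income = statistics.mean(incomes)      # arithmetic average
median_income = statistics.median(incomes)  # middle value when sorted

# A measure of variation: the sample standard deviation (divides by n - 1).
sd_income = statistics.stdev(incomes)

print(f"Mean:               {mean_income:.0f}")
print(f"Median:             {median_income:.0f}")
print(f"Standard deviation: {sd_income:.0f}")
```

In this made-up list, the single high income of 95,000 pulls the mean (about 36,857) well above the median (27,000), which is one reason both measures are often reported for incomes.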

If we keep to the time schedule of the Class Syllabus, the first midterm will be based on this section of the course and we should complete Part I of the text just before the first midterm.

B. Inferential statistics. Part II of the text deals with inferential statistics and this occupies the remaining time and work in Social Studies 201.

1. Probability. As an introduction to the concepts required to study inferential statistics, Chapter 6 deals with probability and the normal distribution. Probability is used for two main reasons in the social sciences. One reason is that data may be obtained from some type of random or probability sample of population members, using surveys or experiments. The randomness of sample selection means that probability can be used to obtain inferences about the social science issue or concern under investigation. A typical example of how probability is used in assessing the reliability of sample results is the following quote from the polling agency Ipsos-Reid (available from Ipsos-Reid web site: http://www.ipsos-reid.com/media/ in the item ''Two-Thirds (66%) of Canadians Believe Blackout Result of Technical, Not Supply Problems – 61% in Ontario Agree'').

These are the findings of an Ipsos-Reid/CTV/Globe and Mail poll conducted between August 19th and August 21st, 2003. The poll is based on a randomly selected sample of 1,030 adult Canadians. With a sample of this size, the results are considered accurate to within ± 3.1 percentage points, 19 times out of 20, of what they would have been had the entire adult Canadian population been polled. The margin of error will be larger within regions and for other sub-groupings of the survey population. These data were statistically weighted to ensure the sample's regional and age/sex composition reflects that of the actual Canadian population according to the 2001 Census data.

Second, the social sciences use various models to describe or explain what occurs in the real world. Some of these models are probabilistic in nature, and require some understanding of the principles of probability. One example is insurance rates - those in high risk occupations pay higher life insurance rates, while older people may pay lower home insurance rates.

One of these models is the normal probability distribution - the so called bell curve. Some researchers consider this distribution to describe characteristics of actual populations. In particular, some instructors think that class grades should follow a normal or bell curve. This is an application of a mathematical model. The normal curve has many other applications in statistics, and learning to use this curve is essential to understanding statistics. Whether this curve does describe the distribution of characteristics or behaviour of an actual population is another question - it may be that human populations are not so well-described by the normal distribution as some researchers and analysts claim.

Our concern with probability is not with probability as a study in and of itself. Rather, we will be mainly concerned with the principles of probability, and their applications in inferential statistics.

2. Sampling distributions. Chapter 7 is a transitional chapter, dealing with what are called sampling distributions. These are mathematical distributions that are useful when drawing samples from a population or conducting experiments. They describe how a particular measure behaves under repeated sampling. For example, opinion polls such as the Gallup poll provide estimates of the proportion of people with a particular characteristic. But another researcher selecting a different set of individuals in a sample would obtain a somewhat different estimate. The potential variation in these proportions can be described mathematically through the sampling distribution of the proportion. We will examine this in Chapter 7.
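The idea of repeated sampling can be illustrated with a short simulation. The sketch below is in Python and rests on assumptions that are mine rather than the text's: a very large population in which 66% hold a given opinion (a value borrowed from the poll headline quoted earlier and treated here as if it were the true proportion), simple random sampling, and samples of 1,030 people. The function name is hypothetical.

```python
import random
import statistics


def simulate_sample_proportions(true_p, n, repetitions, seed=0):
    """Draw `repetitions` random samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    proportions = []
    for _ in range(repetitions):
        # Each respondent holds the opinion with probability true_p.
        hits = sum(1 for _ in range(n) if rng.random() < true_p)
        proportions.append(hits / n)
    return proportions


# Assumed values for illustration only.
props = simulate_sample_proportions(true_p=0.66, n=1030, repetitions=1000)

print(f"Smallest sample proportion: {min(props):.3f}")
print(f"Largest sample proportion:  {max(props):.3f}")
print(f"Mean of the proportions:    {statistics.mean(props):.3f}")
print(f"Standard deviation:         {statistics.stdev(props):.4f}")
```

Each of the 1,000 simulated samples gives a slightly different proportion, but the proportions cluster around 0.66. Their standard deviation should come out close to the square root of 0.66 x 0.34 / 1030, or roughly 0.015, which is the kind of result the sampling distribution of the proportion describes.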

The third section of the class deals with techniques used in inferential statistics. These techniques are discussed in detail in Chapters 7-10. These chapters include what statisticians call hypothesis tests and estimation procedures. The aim of these is to infer, from survey or experimental results, conclusions about a whole population. Such conclusions always have probabilities attached to them.

3. Estimation. Estimation is the method used in surveys and polls about opinions of members of a population, where a survey of a small group of people can be used to make inferences about the nature of opinion, attitudes, or other characteristics of a large population. The probability of obtaining a result that is in error by no more than some specified amount can be calculated if the data come from a sample which has been randomly selected from the whole population.
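The margin of error reported in the Ipsos-Reid poll quoted earlier can be roughly reproduced with a short calculation. This is a sketch under stated assumptions, not necessarily the method the polling agency used: it assumes simple random sampling, the most conservative proportion p = 0.5, and a multiplier of z = 1.96 for "19 times out of 20" (95%). The margin is then z times the square root of p(1 - p)/n. The function name and the sub-group size of 400 are hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes simple random sampling; p = 0.5 gives the widest margin,
    and z = 1.96 corresponds to 95% confidence (19 times out of 20).
    """
    return z * math.sqrt(p * (1 - p) / n)


# Sample size reported in the poll.
full_sample = margin_of_error(1030)
print(f"n = 1030: +/- {100 * full_sample:.1f} percentage points")

# A hypothetical regional sub-group of 400 respondents (assumed size).
sub_group = margin_of_error(400)
print(f"n = 400:  +/- {100 * sub_group:.1f} percentage points")
```

For n = 1,030 this gives about 3.1 percentage points, in line with the figure in the quote. The smaller hypothetical sub-group gives about 4.9 percentage points, which illustrates why the margin of error is larger within regions and other sub-groupings.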

4. Hypothesis testing begins by making an hypothesis, and then using a sample or experiment to test it. The hypothesis may be that people higher on the income scale are more likely to vote in a more conservative fashion, while those with lower income may vote NDP, with Liberals in the middle. These hypotheses may be based on previous research findings and on our observations of what interests each party is regarded as addressing. Hypotheses may be more complex, involving extensive theoretical and quantitative research. In each case, principles of probability are used to judge whether the sample or experimental results are consistent with the hypothesis, and so whether the hypothesis should be rejected or not.

There are many different types of hypothesis testing, but the principles involved in each type are much the same. By the end of the semester you should have a good grasp of these methods.

5. Regression and correlation have been left out of this semester's outline. The chapter on regression is also left out of Part II of the textbook - to make the book a bit shorter and less expensive. If we do have time at the end of the semester, we will briefly examine the relationship between two variables by studying the methods of correlation and regression.



Last edited on August 27, 2003