Social Studies 201

Fall 2004

Wednesday, September 8, 2004

 

Introductory notes – see Class Syllabus

 

Contact information and office hours.   Monday 1:30 – 2:30 p.m., Thursday 11:00 a.m. – 12:00 p.m., or by appointment.   My other class is from 11:30 to 12:20 on Monday, Wednesday, and Friday, so I am not available at those times.  I also have many committee meetings, so I will not be in my office all day.   If you wish to arrange a time to meet me, please contact me before or after class, leave a message on the voice mail on my telephone (585-4196), or send me an email message (paul.gingrich@uregina.ca).  I will try to respond promptly to email, so if you have any brief questions that can be dealt with by email, I’ll attempt to answer those quickly.

 

Text.  There is only one textbook, with two parts to it.  Total cost is around $50.  I wrote this textbook about ten years ago so some examples are now out of date, but it covers the materials for this introductory statistics class.  Please point out any errors or misleading explanations you find in the text.  I have provided lots of examples in the text and there are many examples on the web site.  Note that most of the examples in the text are worked out.  You may want to try to work out some of the examples yourself before looking at the answers, since few problems are provided without solutions.

 

The textbook is too long, but it combines the explanations with the problems, so that it is like a text and a manual together.  As we cover each part of the course, I will mention which section or pages to read.

 

If you want an approach different from mine, it would be useful to look at some other statistics texts.  The text I used for several years was Ott, Larson and Mendenhall, Statistics: A Tool for the Social Sciences – this provides a good introduction to statistics.  Almost any introductory statistics text can be useful in presenting a slightly different explanation.  While it can be helpful to use more than one text, the mathematical notation may differ, and this can cause confusion.

 

Class handouts.  I will distribute handouts from time to time.

 

Grading – problem sets and examinations.    40% of the total grade comes from the problem sets, 15% from each of the two midterms, and 30% from the final examination.
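As a rough illustration of how these weights combine, the following Python sketch computes a final grade from the four components.  The function name and the sample scores are hypothetical, and the sketch assumes each component is recorded as a percentage out of 100; it is not an official grade calculator for the class.

```python
# Weights taken from the grading scheme described above; they sum to 100%.
WEIGHTS = {
    "problem_sets": 0.40,
    "midterm_1": 0.15,
    "midterm_2": 0.15,
    "final_exam": 0.30,
}

def final_grade(scores):
    """Return the weighted final grade, given each component as a percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical scores for one student (illustration only).
example_scores = {
    "problem_sets": 85,
    "midterm_1": 70,
    "midterm_2": 75,
    "final_exam": 68,
}
example_grade = final_grade(example_scores)
print(round(example_grade, 2))
```

Because the problem sets carry the largest weight, a strong problem set average in this example offsets a weaker final examination.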

 

There will be 5 or 6 problem sets: two before the first midterm, two between the two midterms, and one or two between the second midterm and the final examination.  I will usually give you a week to complete these and, once they are handed in, I will attempt to provide model answers, or schedule sessions in the lab time to discuss the problems.   I will hand out the first problem set on Monday, September 13 and it will be due on September 20.  The second will be from September 20 to October 1, and we will attempt to have it marked and returned to you before the first midterm.  40% of the final grade will be based on the problem sets, including the computer problems.

 

In addition to the regular problem sets, there will be several computer problem sets.  Later in the semester, the computer problem sets will be merged with the regular problem sets. 

 

The purpose of the problem sets is to learn how to do statistics, practice for exams, and to obtain points.  With all the problem sets, this is a class where you have to keep up with the assignments.  If you put in the time, you should be able to obtain a reasonably good grade.  Do not expect to do well if you just study before each exam and ignore the work in between.

 

Examinations are all open book, but can be difficult – they ask you to develop answers to problems, similar to the problems on the problem sets.   Final examination will be in the regularly scheduled time.  I will schedule review sessions before each examination – some of these may be during the regularly scheduled lab times.

 

The mean grade in the Fall 2003 semester was 73% and in the Winter 2004 semester was 70%.  The grade distribution for these semesters was as follows:

 

Grade      Frequency
           Fall 2003    Winter 2004
90+        3            2
80-89      6            2
70-79      17           2
60-69      4            4
50-59      5            2
Total      35           12
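Since the two classes differ in size (35 and 12 students), the raw frequencies are hard to compare directly.  One possible way to compare them is to convert each column to percentages, as in this short Python sketch.  The frequencies are those reported above; the variable and function names are only illustrative.

```python
# Frequencies copied from the grade distribution above.
grades = ["90+", "80-89", "70-79", "60-69", "50-59"]
fall_2003 = [3, 6, 17, 4, 5]
winter_2004 = [2, 2, 2, 4, 2]

def percentages(frequencies):
    """Convert a list of frequencies to percentages of their total."""
    total = sum(frequencies)
    return [100 * f / total for f in frequencies]

fall_pct = percentages(fall_2003)
winter_pct = percentages(winter_2004)

# Print one row per grade category, with each semester's percentage.
for grade, f, w in zip(grades, fall_pct, winter_pct):
    print(f"{grade:6} {f:5.1f}% {w:5.1f}%")
```

In percentage terms, nearly half of the Fall 2003 class fell in the 70-79 range, while the most common range in Winter 2004 was 60-69.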

 

Calculator and math background.   In order to do statistics, it is necessary to use some mathematics.  The statistical problems build on the ordinary arithmetic operations (addition, subtraction, multiplication, and division) along with some algebra that you may  be familiar with from secondary school.  But I will explain all the formulae and do example problems on the blackboard.  The aim of this class is to prepare students to use the formulae to do statistical problems, with less emphasis on becoming proficient in manipulating the algebraic materials.  At the same time, there is a lot of mathematics used in the course, and you have to develop a certain ability in this area.   If this has been difficult for you in the past, attempt to tackle the problems or additional examples from the web site.  There will also be a student assistant available, so consult the assistant or me if you are having difficulty.  We can also use parts of the Tuesday labs to go over problems and work on extra problems.

 

Labs and computer.   Much of the work in a statistics class can be done with a calculator, or even with a paper and pencil.  Later in the course, the formulae become more complex, and require more calculation.  In addition, some data sets have many cases and many variables, so it can be very time consuming to do all the computations with only a calculator.  For these reasons, we will do some of the statistical work on the computer.  This semester we will be using the Statistical Package for the Social Sciences (SPSS), a program that many statisticians and survey researchers use to analyze survey data.  Along with the SPSS program, we will use a data set produced from a survey of University of Regina undergraduates in 1998.  I will provide more details about when we begin the computer labs.  

 

There should be enough time during the scheduled computer labs to do all the computer problems.  However, if you need extra time, the computer lab (CL109) is generally open, except when other classes are scheduled there.  I will prepare handouts for SPSS and these should be sufficient to do the work in this class.

 

The computer labs will begin on Tuesday, September 14.

 

Accommodation and university policies.  Take note of the possibilities for accommodation for those with special needs.  If you have any special needs, please contact me as soon as possible.  All students should be familiar with the relevant University policies, so take note of these on the attached “Faculty of Arts Academic Announcements Fall 2004” or in the University Calendar 2004-2005.

 

Web site.  Materials for Social Studies 201 are on my web site.  For earlier semesters, connect to the links for Social Studies 201 for Fall 2003 and Winter 2004 – all the problem sets, answers, and examinations from those semesters are on the web site, along with some notes.  For this semester, I will be adding new notes or revising those from the previous two semesters.  The address of the web site is http://uregina.ca/~gingrich/. Note that there is no www. before uregina.ca.  Some of the material on the web site is in Acrobat Reader format, that is, with file type pdf.  If you are unable to view files of this type on your computer, you can use the computers in CL109 at times when there is no class in that room. 

 

The web site has various sections.  In the section “Fall 2004 Semester,” I will put the notes, examples, and problems for this semester.  I will attempt to update these at least weekly, and hopefully more frequently.  In the Fall 2003 and Winter 2004 sections, there are examples and examinations from previous semesters.  I have posted all the problem sets and model answers from the Winter 2001 semester.  I have also included all the examinations from Fall 2001 and Winter 2001, but without answers – I will be using some of the questions from these examinations for your problem sets this semester.  Another part of the web site is the textbook.  I have posted most of the text, with the exception of the diagrams, on the web site.  These files are in Acrobat/pdf format.

 

If anyone is not able to use the web site, please notify me.  I can provide printed copies of all materials on the web site to the University Library, and make them available there at the Reserve Desk.

 

 


Aim of the Class 

 

1. Learning basic statistical methods.  After you have completed this class, you should be familiar with basic statistical concepts and measures.  Do not expect to become an expert in statistical analysis after only one semester of introductory statistics, just as you would not expect to be an expert in other areas after only one class in the subject.  But you should be able to understand what statistics is and what can be done with it, and be able to tackle some straightforward types of statistical problems.

 

2. Healthy skepticism.  I encourage students to have a healthy skepticism about statistical data and interpretations.  This involves developing an appreciation of the usefulness of statistical data and methods at the same time as adopting a critical approach to the data and methods.  Here I outline a few aspects of this and I will attempt to provide more examples as we proceed through the semester. 

 

a. Misleading statistics.  Some statistics that are stated in verbal arguments or published are incorrect or misleading.  Examples include partial results presented by publicists, politicians, or advertisers – a decline in the unemployment rate may be touted as evidence of solid economic gains when in fact the job market is not so positive.  Some election polls are inaccurate or unable to predict election results – at the same time other polls are fairly accurate representations of public opinion.  One result of encountering misleading statistics is to take the view that statistical data and methods are useless or invalid.  Some claim that any result can be proved with statistics; one well-known book is titled How to Lie with Statistics.  Some researchers also reject statistics on theoretical or practical grounds, considering that only qualitative information is valid and that quantitative data have so many weaknesses that the data can be ignored.

 

b. Hard data.  At the other end of the spectrum are those who are unwilling to believe anything that does not have what they consider to be hard, statistical data to back it up – numbers and quantitative data.  Such researchers may consider qualitative data as being soft, incomplete, or only suggestive.  Those adopting this approach may consider statistics as the best method of proof, and perhaps the only method of proof.   For example, a claim that student debt has increased dramatically, along with statements of personal experiences of debt problems, may be ignored by politicians or policy-makers unless there are quantitative data to support such claims.  Folk or traditional medical care methods are treated with skepticism by much of the medical profession because carefully controlled experiments, using well-established statistical methods, have not been conducted.

 

c. Useful if carefully applied.   My approach is in between these two extremes.  I view statistics as a legitimate and powerful approach to the study of social phenomena and issues.  It is often a useful approach to describing people, their characteristics, views, and behaviour, and it can help in building models and theories that can be used to understand the social world.  Sometimes basic data are essential to understanding a social issue.  Examples include the study of poverty and inequality, equity issues in the labour force and politics, and trends in crime rates, each of which relies on statistical data and approaches.  In each of these cases, well constructed statistical data have assisted in identifying both problems and possible solutions.  At the same time, each of these subject areas has been associated with some poor or misleading data.  Differences of views concerning the causes, severity, and solution to social issues or problems have been associated with debates over definitions and interpretations of data and statistical methods and models.

 

d. One of many social science methods.  Part of the problem with statistics is that it is often misused, so that it can provide misleading results or be used to bolster incorrect arguments and draw misleading conclusions.  It is always necessary to remember that statistics is only one tool available to social science practitioners, and one that is best suited to certain types of data and certain types of problems.  Other tools may be more useful in other circumstances.  Methods such as in-depth interviews, stories, oral histories, participant observation, historical records, novels or movies, etc. may provide a better understanding and interpretation of social issues.  Such methods often complement statistical methods.  Theoretical approaches and the use of human reason are also necessary, although these must be informed by some types of data (quantitative or qualitative).

 

3. Reading journals and articles.  Academic articles, books, and journals tend to be more quantitative today than in earlier periods.  After taking a statistics class, you should be able to understand some of the statistical approaches used in academic articles, even if you do not understand all details of the statistical techniques employed by researchers.  Some techniques are very difficult to understand.  For example, regression, factor analysis, analysis of variance, or structural equation models are widely used but not always well understood, even by some who use these methods.  In order to understand some of these latter methods, you will need to do further study of statistics.

 

4. Further study of statistics.  Social Studies 201 should provide all the basic statistical concepts and approaches so that you can continue the study of statistics in other courses on statistics.  Each discipline emphasizes a somewhat different set of techniques, and this is what upper level courses in statistics examine.  For example, psychologists tend to emphasize the analysis of variance (ANOVA), t-tests, and factor analysis.  Economists tend to rely heavily on regression.

 

If you find that you like statistics and working with data, I would encourage you to take an upper level class in your discipline.  Many of you will need more statistics for your honours or graduate work, and others may find it useful in employment after completing a degree.

 

For Sociology majors, the next methodology classes are Social Studies 203 and 306 or 307.  Social Studies 203 introduces students to a variety of methods, not primarily statistical.  Social Studies 306 uses SPSS to produce and analyze survey data.  Social Studies 307 examines and applies qualitative methods.  There is also a more advanced statistics course, Social Studies 405/805, which the Department attempts to offer every other year.


Class outline

 

A. Descriptive statistics.  The first section of the class, chapters 1-5 of the text, deals with descriptive statistics. This is the most commonly used type of statistics, and the type that you will most commonly encounter.  Almost all the statistics published in newspapers and reports are descriptive in  nature, describing various phenomena.  Descriptive statistics includes tables, graphs, charts, maps, diagrams, etc. – hopefully helping the reader to develop an understanding of the social issue being described.

 

Chapter 1 of the text is an introduction to statistics and the text.  Read it quickly to get an overview.

 

In order to present data, it is necessary to consider how data are organized.  This is the aim of Chapters 2 and 3.  Chapter 2 examines issues related to the production of data, including discussion of assumptions involved in their production.  In this chapter, I attempt to identify some of the main issues that must be addressed by anyone who works with quantitative data or produces such data.   Do not spend a long time on this chapter, but attempt to be generally familiar with the issues raised in sections 2.3 through 2.6.  Section 2.7 uses Statistics Canada’s Labour Force Survey as an example of a relatively successful and thorough approach to data production, albeit one that has some problems and shortcomings, as do any data.  This example is now outdated, but I will provide some more recent summary data on labour statistics.

 

Chapter 3 discusses how social constructs can be measured.  Phenomena such as length or weight are measured in well understood and well defined units such as metres or kilograms, respectively.  Social constructs such as attitudes, opinions, intelligence, ability, alienation, social solidarity, class consciousness, and ethnic or national identity are not so clear cut and are much more difficult to define and measure.  After some of the different approaches to measurement have been examined,  Chapter 4 deals with organizing these data for purposes of description.  A few of the ways of organizing data into charts, tables and graphs are discussed in Chapter 4.

 

Another way to present data is to calculate summary measures that succinctly describe the phenomena being examined.  It may be tedious to look at all the data describing a social issue or a population, or we may be overwhelmed by all the data about these.   In Chapter 5, summary measures of centrality and variation are discussed.   Examples include average income, expectation of life, median score on standardized tests; variation in incomes and income inequality, standard scores on tests.  Measures of central tendency and variation are the most widely used summary statistics in both popular and academic applications. 
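To make these summary measures concrete, here is a minimal Python sketch using the standard library `statistics` module.  The income figures are invented for illustration and are not taken from the text or from any survey.

```python
import statistics

# Hypothetical annual incomes in thousands of dollars (not real data).
incomes = [20, 25, 30, 35, 90]

mean_income = statistics.mean(incomes)      # arithmetic average
median_income = statistics.median(incomes)  # middle value when sorted
sd_income = statistics.stdev(incomes)       # sample standard deviation (n - 1 divisor)

print(mean_income, median_income, round(sd_income, 2))
```

In this example the single high income pulls the mean (40) well above the median (30), which is one reason both measures of centrality are worth reporting, along with a measure of variation such as the standard deviation.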

 

If we keep to the time schedule of the Class Syllabus, the first midterm will be based on this section of the course and we should complete Part I of the text just before the first midterm.

 

B. Inferential statistics.  Part II of the text deals with inferential statistics and this occupies the remainder of the semester in Social Studies 201.

1. Probability.  As an introduction to the concepts required to study inferential statistics, Chapter 6 deals with probability and the normal distribution.  Probability is used for two main reasons in the social sciences.  One reason is that data may be obtained from some type of random or probability sample of population members, using surveys or experiments.  The randomness of sample selection means that probability can be used to obtain inferences about the social science issue or concern under investigation.  A typical example of how probability is used in assessing the reliability of sample results is the following from the polling agency Ipsos-Reid (available from Ipsos-Reid web site: http://www.ipsos-reid.com/media/dsp_displaypr_cdn.cfm?id_to_view=1058).

With a national sample of 1,000 and 1,500 (for each component), one can say with 95% certainty that the overall results are within a maximum of ± 3.1 percentage points of what they would have been had the entire population of Canada’s regular online users been surveyed. The margin of error will be larger for sub-groupings of the survey population.

Second, the social sciences use various models to describe or explain what occurs in the real world.  Some of these models are probabilistic in nature, and require some understanding of the principles of probability.  One example is insurance rates – people in high risk occupations pay higher life insurance rates, while older people pay lower home insurance rates.
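The ± 3.1 percentage point figure in the Ipsos-Reid statement quoted above can be approximately reproduced with the usual textbook formula for the margin of error of a proportion.  The sketch below is only that: it assumes a simple random sample, the most cautious case of a 50/50 split (p = 0.5), and a 95% confidence level (z = 1.96).  The quotation does not say exactly how Ipsos-Reid calculated its figure, and the function name is hypothetical.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample of size n.  p = 0.5 gives the largest
    (most cautious) margin, and z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# The two sample sizes mentioned in the quotation.
for n in (1000, 1500):
    print(n, round(100 * margin_of_error(n), 1))
```

With n = 1,000 the formula gives about 3.1 percentage points, matching the quoted maximum; the larger sample of 1,500 gives a smaller margin of about 2.5 points.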

 

One of these models is the normal probability distribution – the so called bell curve.  Some researchers consider this distribution to describe characteristics of actual populations.  In particular, some instructors think that class grades should follow a normal or bell curve.  This is an application of a mathematical model.  The normal curve has many other applications in statistics, and learning to use this curve is essential to understanding statistics.  Whether this curve does describe the distribution of characteristics or behaviour of an actual population is another question – it may be that human populations are not so well-described by the normal distribution as some researchers and analysts claim. 
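As a small illustration of using the normal curve as a model, the following Python sketch relies on `NormalDist` from the standard library (available in Python 3.8 and later).  The class mean of 73 echoes the Fall 2003 mean reported earlier, but the standard deviation of 10 is purely an assumption, and nothing here implies that actual grades follow a normal distribution.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Proportion of a normal population within 1 and 2 standard deviations of the mean.
within_one_sd = z.cdf(1) - z.cdf(-1)
within_two_sd = z.cdf(2) - z.cdf(-2)

# Hypothetical class: grades assumed normal with mean 73 and standard deviation 10.
grades = NormalDist(mu=73, sigma=10)
share_80_plus = 1 - grades.cdf(80)

print(round(within_one_sd, 3), round(within_two_sd, 3), round(share_80_plus, 3))
```

Under these assumptions, roughly 68% of a normal population lies within one standard deviation of the mean and about 95% within two, and about a quarter of the hypothetical class would score 80 or above.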

 

Our concern with probability is not with probability as a study in and of itself.  Rather, we will be mainly concerned with the principles of probability, and their applications in inferential statistics.

 

2. Sampling distributions.  Chapter 7 is a transitional chapter, dealing with what are called sampling distributions.  These are mathematical distributions that are useful when conducting samples of a population or experiments in a population.  They describe how a particular measure behaves under repeated sampling.  For example, opinion polls such as the Gallup poll provide estimates of the proportion of people with a particular characteristic.  But another researcher selecting a different set of individuals in a sample would obtain a somewhat different estimate.  The potential variation in these proportions can be described mathematically through the sampling distribution of the proportion.  We will examine this in Chapter 7.
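The idea of repeated sampling can be sketched with a short simulation.  In the Python example below, the population proportion (0.40), the sample size (400), and the number of repeated samples (1,000) are all made-up values chosen for illustration, and each simulated respondent is treated as an independent random draw.

```python
import math
import random
import statistics

rng = random.Random(201)   # fixed seed so the run is repeatable
true_p = 0.40              # hypothetical population proportion
n = 400                    # size of each sample
num_samples = 1000         # number of repeated samples

sample_proportions = []
for _sample in range(num_samples):
    # Each respondent has the characteristic with probability true_p.
    count = sum(1 for _person in range(n) if rng.random() < true_p)
    sample_proportions.append(count / n)

mean_of_proportions = statistics.mean(sample_proportions)
sd_of_proportions = statistics.stdev(sample_proportions)
theoretical_se = math.sqrt(true_p * (1 - true_p) / n)

print(round(mean_of_proportions, 3), round(sd_of_proportions, 4), round(theoretical_se, 4))
```

Each sample gives a slightly different proportion, but the proportions cluster around the population value, and their spread should be close to the standard error given by the formula sqrt(p(1 - p)/n).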

 

The third section of the class deals with techniques used in inferential statistics.  These techniques are discussed in detail in Chapters 7-10.  These chapters include what statisticians call hypothesis tests and estimation procedures.  The aim of these is to infer, from survey or experimental results, conclusions about a whole population.  Such conclusions always have probabilities attached to them.

 

3. Estimation.   Estimation is the method used in surveys and polls about opinions of members of a population, where a survey of a small group of people can be used to make inferences about the nature of opinion, attitudes, or other characteristics of a large population.  The probability of obtaining a result that is in error by no more than some specified amount can be calculated if the data come from a sample which has been randomly selected from the whole population.
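One common form of estimation is an interval estimate for a population proportion.  The following Python sketch shows the standard large-sample calculation; the survey counts are hypothetical, the function name is illustrative, and the method assumes a simple random sample large enough for the normal approximation to be reasonable.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for a population proportion.

    Assumes a simple random sample that is large enough for the normal
    approximation to be reasonable.
    """
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 230 of 500 respondents agree with a statement.
low, high = proportion_interval(230, 500)
print(round(low, 3), round(high, 3))
```

Here the sample proportion is 0.46, and the interval runs from roughly 0.42 to 0.50, so the survey alone cannot settle whether a majority of the population agrees.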

 

4. Hypothesis testing begins by making an hypothesis, and then using a sample or experiment to test it.  The hypothesis may be that people higher on the income scale are more likely to support health clinics outside the medicare system, where patients can get quick health services by paying for these services themselves.  The hypothesis might state that those with lower incomes are more likely to be opposed to such clinics and services, arguing that health services should be equal for all.  These hypotheses may be based on previous research findings and on our observations of what type of opinions and interests individuals of each income are likely to have.  If survey data are available about incomes and extra billing, then this hypothesis can be tested, using these data.  Hypotheses may be more complex, involving extensive theoretical and quantitative research.  In each case, principles of probability are used to judge whether the sample results support or contradict the hypothesis.

 

There are many different types of hypothesis testing, but the principles involved in each type are much the same.  By the end of the semester you should have a good grasp of these methods.  
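As one illustration of the income and health clinic example, the sketch below carries out a two-sample test for a difference between proportions.  The survey counts are invented, the function name is hypothetical, and the test shown (a pooled z test with a one-sided alternative) is just one of several approaches that could be used; it assumes two independent random samples.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic and one-sided p-value for H1: proportion 1 > proportion 2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # common proportion if there is no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)       # chance of a z this large if there is no difference
    return z, p_value

# Hypothetical survey results (not real data):
# 120 of 200 higher-income and 90 of 200 lower-income respondents support private clinics.
z_stat, p_val = two_proportion_z(120, 200, 90, 200)
print(round(z_stat, 2), round(p_val, 4))
```

A z value of about 3 means that a difference this large would be very unlikely if the two income groups really had the same level of support, so these made-up data would be taken as evidence in favour of the hypothesis.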

 

5. Regression and Correlation have been left out of this semester's outline.  The chapter on regression is also left out of Part II of the textbook – to make the book a bit shorter and less expensive.  If we do have time at the end of the semester, we will briefly examine the relationship between two variables by studying the methods of correlation and regression. 

 

Next day – Chapter 2, Production of data.

 

Last edited on September 10, 2004