Sociology 6Z03 Topic 1: Introduction

Author: Osborn Franklin
Sociology 6Z03 Topic 1: Introduction John Fox McMaster University

Fall 2016


Outline

Why study statistics? Lying with statistics. Statistical data.


Why Study Statistics?

“Thou shalt not sit with statisticians nor commit a social science.” – W. H. Auden (1907-1973)


Why Study Statistics?

Thought Question
Who was W. H. Auden?
A. The founding professor of the Department of Sociology at McMaster University.
B. A famous poet.
C. An early hip-hop artist.
D. A prime minister of Canada.
E. I don’t know.


Why Study Statistics?

Auden notwithstanding, a great deal of interesting and important work in sociology — and in other social sciences, not to mention medical, biological, and natural sciences, and the popular media — employs statistical methods. This will be partly demonstrated by the illustrations that I employ during the course of the semester and in the illustrations in the text, and partly by your work in other courses.


Why Study Statistics? The Challenger Disaster: Bad Statistical Analysis Can Kill You

The Challenger disaster (from Edward Tufte’s book, Visual Explanations, Cheshire CT: Graphics Press, 1997). Sometimes poor statistical data analysis is costly: On January 28, 1986, the U.S. space shuttle Challenger exploded shortly after blastoff, killing seven astronauts. See .


Why Study Statistics? The Challenger Disaster: Bad Statistical Analysis Can Kill You

The cause of the explosion was the failure of rubber O-rings sealing two sections of one of the booster rockets attached to the shuttle (B and C in the cross-section picture to the right). This failure, in turn, was caused by the low temperature at the time of launch, which made the O-rings lose their elasticity.


Why Study Statistics? The Challenger Disaster: Bad Statistical Analysis Can Kill You

On the day before the ill-fated launch, engineers at Morton Thiokol, the company that built the boosters, recommended that the launch be postponed because of the low forecast temperature for the following day. Using a graph essentially similar to the one on the next slide, officials at NASA and Thiokol examined data concerning O-ring damage that had occurred on previous launches; although some of these incidents were serious, none was disastrous. These officials determined that there was no convincing evidence linking O-ring failure to ambient temperature, and they decided to proceed with the launch. Thus, the Challenger disaster took place even though its cause was identified on the day before the accident.


Why Study Statistics? The Challenger Disaster: Bad Statistical Analysis Can Kill You



[Figure: Index of extent of O-ring damage by temperature (degrees F) at time of launch, for shuttle launches prior to the Challenger disaster in which O-ring damage occurred. Vertical axis: O-ring damage index (0 to 12). Horizontal axis: temperature (degrees Fahrenheit) of field joints at time of launch (25 to 85). The plot marks the 26-29 degree range of forecasted temperatures for the launch of space shuttle Challenger on January 28, 1986. Source: Adapted from Tufte, Visual Explanations, page 45.]

Why Study Statistics?

Thought Question
What does the graph show?
A. The extent of O-ring damage is not related to temperature.
B. There tends to be more O-ring damage at high temperatures than at low temperatures.
C. There tends to be more O-ring damage at low temperatures than at high temperatures.
D. The graph doesn’t have enough information to know whether O-ring damage is related to temperature.
E. I don’t know.


Why Study Statistics? The Challenger Disaster: Bad Statistical Analysis Can Kill You

The problem with the preceding graph is that it doesn’t include launches on which no O-ring damage occurred. An effective presentation of the data makes the relationship between O-ring damage and temperature clear.


Why Study Statistics? The Challenger Disaster: Bad Statistical Analysis Can Kill You

[Figure: Index of extent of O-ring damage by temperature (in degrees Fahrenheit) at time of launch, for all shuttle launches prior to the Challenger disaster. Vertical axis: O-ring damage index (0 to 12). Horizontal axis: temperature (degrees Fahrenheit) of field joints at time of launch (25 to 85). Separate plotting symbols distinguish launches with some damage from launches with no damage. The plot marks the 26-29 degree range of forecasted temperatures for the launch of space shuttle Challenger on January 28, 1986. Source: Redrawn and slightly adapted from Tufte, Visual Explanations, page 45.]

Why Study Statistics?

Thought Question
What does the graph show?
A. The extent of O-ring damage is not clearly related to temperature.
B. There tends to be more O-ring damage at high temperatures than at low temperatures.
C. There tends to be more O-ring damage at low temperatures than at high temperatures.
D. The graph doesn’t have enough information to know whether O-ring damage is related to temperature.
E. I don’t know.


Why Study Statistics?

To understand quantitative work in sociology (about half the field), as well as reports in the popular media (e.g., on the results of polls and social surveys), it is important to have a basic knowledge of statistical methods.

Many occupations require individuals to produce and interpret statistical data, and to analyze data, typically with the help of a computer. See, e.g., .

Statistical reasoning has its own logic and fundamental concepts. It is interesting.

Methods of statistical data analysis and inference are among the most important intellectual products of the last century.


Lying With Statistics

“Lies, damned lies and statistics.” – attributed to Benjamin Disraeli (1804-1881)

There is a common conception that it is particularly simple to “prove whatever you want to prove” using statistical data.

I believe that statistics lends itself no more to deception than other forms of argument, and indeed careful analysis of data makes self-deception (if not deception of others) more difficult. In analyzing data, I am often struck by how hard it is to find what I expect to find, and how frequently I discover characteristics of data that I did not anticipate. I find it much easier to fool myself when I evaluate non-quantitative evidence.

Fooling others is, however, another matter, but the modes of statistical deception are essentially the same as in any other misleading presentation of “evidence.”


Lying With Statistics Kinds of Lies

Outright lies: At one extreme, you can make up or falsify data. Consider the following table, published by the British psychologist Sir Cyril Burt (1961), and purporting to show the relationship between the average IQ scores of children and adults in six “social classes”:

Social Class          Adults’ Mean IQ   Children’s Mean IQ
Higher Professional   139.7             120.8
Lower Professional    130.6             114.7
Clerical              115.9             107.8
Skilled               108.2             104.6
Semiskilled            97.8              98.9
Unskilled              84.9              92.6

Lying With Statistics Kinds of Lies

A graph of Burt’s “data” reveals the fraud:

[Figure: Scatterplot of children's mean IQ (vertical axis, 90 to 130) against adults' mean IQ (horizontal axis, 90 to 140), with one labelled point for each social class: Higher Professional, Lower Professional, Clerical, Skilled, Semiskilled, Unskilled.]

Lying With Statistics Kinds of Lies

Thought Question
How can you tell from the graph that Burt’s data were “cooked”?
A. We can’t tell that the data were falsified just by looking at the graph.
B. It is well known that Cyril Burt committed scientific fraud.
C. Real data wouldn’t line up perfectly along a straight line.
D. I don’t know.


Lying With Statistics Kinds of Lies

It is ironic that the quantitative character of the fraud facilitated its detection (see, e.g., Kamin, The Science and Politics of I.Q., 1974), but one can make up data more cleverly. Be skeptical when you evaluate a study (and not only of statistical results). The possibility of falsification of data is one of the reasons that replication of results is important in science.
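One way to put a number on how closely Burt's six published points follow a straight line is the Pearson correlation between the two columns of means. The sketch below is only an illustration, not part of the original analysis: the function name `pearson_r` is my own, and the values are copied from Burt's table. A result very close to 1 is the numerical counterpart of what the graph shows.

```python
import math

# Means published by Burt (1961), listed in the order of the six social classes
adults = [139.7, 130.6, 115.9, 108.2, 97.8, 84.9]
children = [120.8, 114.7, 107.8, 104.6, 98.9, 92.6]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(adults, children)
print(f"Correlation between adults' and children's mean IQ: {r:.4f}")
```

A high correlation alone does not prove fraud, since averages are less noisy than individual scores; it is one piece of evidence to weigh with the rest.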


Lying With Statistics Kinds of Lies

Obfuscation: At another extreme, you can take advantage of the ignorance and insecurity of your audience, overwhelming them with details and procedures that they do not understand. It is especially confusing when different “experts” use the same data to support opposing conclusions. One shouldn’t expect magic from statistical studies: It is, for example, difficult to draw convincing causal evidence from observational data. Be critical of the design of statistical (and other) studies.

Partial Truth: And, of course, you can present accurate information partially and selectively to support a conclusion that is not supported by a more complete analysis of the available evidence. Be alert to apparent omissions in a presentation (again, not only of statistical evidence). When data are public, you can subject them to your own analysis.


Statistical Data The Data Table

Statistical data are usually organized as tables in which the rows of the table represent units of observation (such as individuals or countries) and the columns represent variables or characteristics of the units (such as age, gender, and annual income for individuals, or area, type of political system, and per-capita income for countries). Part of an illustrative dataset is shown on the next slide. The full dataset includes 207 nations. The information in this table was collected around 1998. The data are ordered alphabetically, which is not terribly useful. As in many real datasets, some of the information in the data table is missing.


Statistical Data The Data Table

Nation           GDP Per Capita, $US   Infant Mortality per 1000   Region
Afghanistan       2,848                154                         Asia
Bosnia              271                 13                         Europe
Canada           18,943                  6                         Americas
Chile             4,736                 13                         Americas
China               582                 38                         Asia
Congo             1,008                 90                         Africa
Cuba              1,983                  9                         Americas
Gaza Strip       missing                37                         Africa
Israel           16,738                  7                         Asia
Libya             5,498                 56                         Africa
United Kingdom   18,913                  6                         Europe
United States    26,037                  7                         Americas
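A data table of this kind maps naturally onto a list of records, one per unit of observation, with one field per variable. The following is a minimal sketch in Python using the twelve rows shown above; the field names (`nation`, `gdp`, `infant_mortality`, `region`) are my own labels, and `None` stands in for the missing value.

```python
# Each row is one unit of observation (a nation); each key is a variable.
# None marks a missing value.
nations = [
    {"nation": "Afghanistan", "gdp": 2848, "infant_mortality": 154, "region": "Asia"},
    {"nation": "Bosnia", "gdp": 271, "infant_mortality": 13, "region": "Europe"},
    {"nation": "Canada", "gdp": 18943, "infant_mortality": 6, "region": "Americas"},
    {"nation": "Chile", "gdp": 4736, "infant_mortality": 13, "region": "Americas"},
    {"nation": "China", "gdp": 582, "infant_mortality": 38, "region": "Asia"},
    {"nation": "Congo", "gdp": 1008, "infant_mortality": 90, "region": "Africa"},
    {"nation": "Cuba", "gdp": 1983, "infant_mortality": 9, "region": "Americas"},
    {"nation": "Gaza Strip", "gdp": None, "infant_mortality": 37, "region": "Africa"},
    {"nation": "Israel", "gdp": 16738, "infant_mortality": 7, "region": "Asia"},
    {"nation": "Libya", "gdp": 5498, "infant_mortality": 56, "region": "Africa"},
    {"nation": "United Kingdom", "gdp": 18913, "infant_mortality": 6, "region": "Europe"},
    {"nation": "United States", "gdp": 26037, "infant_mortality": 7, "region": "Americas"},
]

# Rows for which GDP per capita was recorded
complete = [row for row in nations if row["gdp"] is not None]
print(len(nations), "nations,", len(complete), "with GDP recorded")

# A column holds the values of one variable across all units
infant_mortality = [row["infant_mortality"] for row in nations]
print("Highest infant mortality:", max(infant_mortality))
```

Keeping the missing value explicit, rather than coding it as 0, prevents it from being mistaken for a real observation in later calculations.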

Statistical Data The Data Table

Thought Question
In this data table:
A. The units of observation are nations and the variables include GDP per capita (average gross domestic product per person, in US dollars), infant-mortality rate (number of infant deaths per 1,000 live births), and region.
B. The units of observation are GDP per capita, infant-mortality rate, and region, and the variables are the nations.
C. I don’t know.


Statistical Data

Here is a graph showing the relationship between infant mortality and GDP per capita, with different symbols for the several regions of the world:

[Figure: Scatterplot of infant mortality rate (per 1000; vertical axis, 0 to 150) against GDP per capita (US dollars; horizontal axis, 0 to 40000). Plotting symbols distinguish five regions: Africa, America, Asia, Europe, Oceania. Labelled points: Sierra Leone, Liberia, Afghanistan, Iraq, Gabon, Myanmar, Libya, French Guiana, Switzerland.]

Statistical Data The Data Table

Thought Question
What does this graph show?
A. As GDP per capita rises, infant mortality also tends to rise.
B. As GDP per capita rises, infant mortality tends to decline.
C. GDP per capita and infant mortality do not appear to be related.
D. I don’t know.


Statistical Data Kinds of Data

Two of the variables in the table (GDP per capita and infant mortality) are quantitative; region, in contrast, is a qualitative, categorical variable. There are several sorts of quantitative data:

Counts, such as the number of individuals residing in a country. Counts are non-negative integers (whole numbers).

Amounts, such as GDP per capita. Amounts are also non-negative, but they need not be integers. Amounts are also called ratio variables, because it is meaningful to form ratios of two values (i.e., divide one value by another). As the table shows, for example, the per-capita GDP in Canada was US$18,943, while that in Chile was US$4,736. Thus, Canada had a per-capita GDP that was 18,943/4,736, or 4.0 times, as large as that of Chile. The unit of a ratio variable (e.g., the dollar) is arbitrary, but the zero point of the scale (zero GDP per capita) is not.
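The Canada-to-Chile comparison can be checked directly, along with the claim that the unit of a ratio variable is arbitrary. In the sketch below, the exchange rate `RATE` is a made-up number used only to illustrate a change of unit; it is not a real conversion rate.

```python
import math

canada_usd = 18_943  # per-capita GDP from the data table, in US dollars
chile_usd = 4_736

ratio_usd = canada_usd / chile_usd
print(f"Canada / Chile = {ratio_usd:.1f}")

# Hypothetical exchange rate: re-expressing both values in another currency
# multiplies each by the same constant, so the ratio is unchanged.
RATE = 1.3
ratio_other = (canada_usd * RATE) / (chile_usd * RATE)
print("Same ratio in the other unit:", math.isclose(ratio_usd, ratio_other))
```

The ratio survives the change of unit because zero GDP per capita means the same thing in every currency.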


Statistical Data Kinds of Data

Quantitative data (continued): Relative Frequencies, including proportions, percents, and rates. Proportions, percents, and many types of rates have both minimum and maximum values. The infant-mortality rate, for example, is defined as

    1,000 × (number of children dying in their first year) / (number of live births)

and has a minimum of 0 and a maximum of 1,000. Some types of rates, however, can exceed 1,000. For example, the total fertility rate is defined as the average number of children born to a group of 1,000 women surviving through their child-bearing years, and typically takes on values well in excess of 1,000.
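The definition of the infant-mortality rate can be sketched as a small function. The function name `rate_per_1000` and the counts below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def rate_per_1000(events, population):
    """Rate per 1,000: 1,000 times the number of events divided by the population at risk."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * events / population


# Hypothetical counts, for illustration only
infant_deaths = 60
live_births = 10_000

imr = rate_per_1000(infant_deaths, live_births)
print(imr)  # 6.0

# The bounds follow from the definition: no deaths gives 0,
# and every child dying in the first year gives 1,000.
print(rate_per_1000(0, live_births), rate_per_1000(live_births, live_births))
```

The upper bound exists because the events counted in the numerator are a subset of the births counted in the denominator; rates without that property, such as the total fertility rate, are not bounded by 1,000.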


Statistical Data Kinds of Data

Quantitative data (continued): Interval Scales, which have both an arbitrary unit of measurement and an arbitrary zero point. A simple example is Celsius temperature, because 0 on the Celsius scale does not represent “no heat.” Here we can compare ratios of differences — intervals — of scale scores, but we cannot form ratios of the scores themselves. Thus, the temperature difference between 10 and 20 degrees Celsius is the same as between 30 and 40 degrees, but 40 is not twice as hot as 20. Some methods for constructing scales of attitudes, judgments, and abilities produce interval scales.
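The temperature example can be verified by re-expressing the same four temperatures in Fahrenheit, a scale with a different unit and a different zero point. This is a minimal sketch; the helper `c_to_f` is my own name for the standard conversion.

```python
def c_to_f(celsius):
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32


temps_c = [10, 20, 30, 40]
temps_f = [c_to_f(t) for t in temps_c]  # 50.0, 68.0, 86.0, 104.0

# Ratios of differences (intervals) agree across the two scales
diff_ratio_c = (temps_c[1] - temps_c[0]) / (temps_c[3] - temps_c[2])
diff_ratio_f = (temps_f[1] - temps_f[0]) / (temps_f[3] - temps_f[2])
print(diff_ratio_c, diff_ratio_f)

# Ratios of the scores themselves do not: 40/20 is 2 in Celsius,
# but the same two temperatures in Fahrenheit give 104/68.
print(temps_c[3] / temps_c[1], temps_f[3] / temps_f[1])
```

Because the "twice as hot" ratio changes with the choice of scale, it says nothing about the temperatures themselves, which is why ratios of scores are not meaningful on an interval scale.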


Statistical Data Kinds of Data

There are two common types of categorical data:

Qualitative or nominal variables (such as region in the data table), in which there is no intrinsic order to the categories.

Ordinal variables, in which the categories have a natural order. For example, survey respondents are often asked about their degree of agreement with an attitude statement, recording their responses in several ordered categories, such as Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree.
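The practical difference between the two types is that ordered categories can be sorted and ranked, so a summary such as the median response is defined. The sketch below uses hypothetical survey responses; the rank codes record order only and should not be treated as amounts.

```python
# Ordered categories, from strongest agreement to strongest disagreement
LEVELS = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]
RANK = {level: position for position, level in enumerate(LEVELS)}

# Hypothetical responses from five survey respondents
responses = ["Agree", "Neutral", "Strongly Agree", "Disagree", "Agree"]

# Sorting by rank is meaningful because the categories have a natural order
ordered = sorted(responses, key=RANK.get)
median_response = ordered[len(ordered) // 2]

print(ordered)
print("Median response:", median_response)
```

No comparable sorting exists for a nominal variable such as region: any ordering of Africa, Americas, Asia, and Europe would be arbitrary.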


Statistical Data Kinds of Data

Thought Question
Consider the following two variables: (1) Gender (male and female) and (2) Education (in years).
A. Both variables are quantitative variables.
B. Both variables are categorical variables.
C. Gender is a quantitative variable and education is a categorical variable.
D. Gender is a categorical variable and education is a quantitative variable.
E. I don’t know.


Statistical Data Kinds of Data

Important Point The methods of analysis that are appropriate to statistical data are partly dependent upon the nature of the variables. Different methods usually apply to qualitative variables, for example, than to quantitative variables.
