Author: Jeffery Walsh
Chapter 19 - Sample Surveys
PART VI: SAMPLING
Dr. Joseph Brennan, Math 148, BU

Parameters & Statistics

Parameter: A numerical fact about a population. Examples: the average age of all eligible voters; the percentage of eligible voters registered to vote; the percentage of adults who are retired, employed, or students. Because many populations are too large for a census to be feasible, sampling is used to estimate a parameter.

Statistic: A numerical fact about a sample. A statistic is used to estimate a parameter. An investigator knows a statistic and wants to know a parameter.

Bias

Recall that bias is not a random error: it pushes the data in a single direction. When a selection procedure is biased, taking a larger sample does not lessen the bias; it just repeats the same basic mistake on a larger scale.
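
The point about sample size can be checked with a small simulation. The sketch below is illustrative only: the population, the phone-ownership split, and the name phone_book_poll are all invented for this example and are not data from any real poll. It draws ever larger samples from a frame that leaves out part of the population and compares each estimate with the true parameter.

```python
import random

# Hypothetical population of 100,000 voters (all numbers invented for illustration).
# Phone owners are a minority and favor candidate A less often than everyone else.
population = (
    [{"has_phone": True, "supports_a": True}] * 10_000
    + [{"has_phone": True, "supports_a": False}] * 15_000
    + [{"has_phone": False, "supports_a": True}] * 52_500
    + [{"has_phone": False, "supports_a": False}] * 22_500
)

# The parameter: the share of the whole population that supports candidate A.
parameter = sum(p["supports_a"] for p in population) / len(population)


def phone_book_poll(population, sample_size, rng):
    """Draw the sample only from phone owners: a biased selection procedure."""
    frame = [p for p in population if p["has_phone"]]
    sample = rng.sample(frame, sample_size)  # without replacement
    return sum(p["supports_a"] for p in sample) / sample_size


rng = random.Random(1936)  # fixed seed so the run is repeatable
estimates = {n: phone_book_poll(population, n, rng) for n in (100, 1_000, 10_000)}

for n, estimate in estimates.items():
    print(f"n={n:>6}: estimate={estimate:.3f}, parameter={parameter:.3f}")
```

In this toy setup the estimates settle near the share among phone owners rather than the share in the whole population, so the gap to the parameter stays roughly the same however large the sample gets.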

Example: Roosevelt and Landon

In 1936, Franklin Delano Roosevelt and Alfred Landon competed for the American presidency. The Literary Digest magazine predicted that Roosevelt would receive 43% of the vote on election day. The Literary Digest had correctly predicted the winner of every presidential election since 1916. The 1936 prediction was based on a sample of 2.4 million voters, an enormous opinion poll for the pre-internet era.

Example: Roosevelt and Landon

Roosevelt won with 62% of the vote. The Literary Digest was off by 19 percentage points! The Literary Digest folded after the 1936 election and is best remembered today for the circumstances surrounding its demise.

Example: Roosevelt and Landon

The Literary Digest compiled its 2.4 million responses (today's typical political survey asks between 500 and 1,000 respondents) from a total of 10 million questionnaires. The response rate was low: 24%. The 10 million names for the questionnaires were pulled from telephone books and car registrations. In 1936 only 1 in 4 American households had a telephone, and far fewer owned a car. Remember that 1936 was in the midst of the Great Depression. The Literary Digest’s error can be partially attributed to sampling bias. During the Depression, in an election split ideologically on economic policy, the Literary Digest’s questionnaire was biased against the poor.

Bias

Sampling Bias: A bias in which a sample is collected in such a way that some members of the intended population are less likely to be included than others. The bias can lead to an over- or underestimate of the corresponding population parameter. Almost every sample in practice is somewhat biased, because it is practically impossible to ensure a perfectly random sample.

Non-response Bias: A bias that results when respondents differ in meaningful ways from nonrespondents. Nonresponse is often a problem with mail surveys, where the response rate can be very low. With a response rate of 24%, the Literary Digest’s error can be partially attributed to non-response bias. Respondents and nonrespondents can differ in ways beyond their willingness to answer a questionnaire.

Quota Sampling

Quota Sampling: A sampling method in which interviewers are assigned a fixed quota of subjects to interview.

Example: An interviewer during the 1948 presidential election is sent to St. Louis to interview 13 subjects:
    6 from the suburbs and 7 from the city.
    7 men and 6 women.
Of the 7 men:
    3 are to be under 40 and 4 are to be over 40 years old.
    1 is to be black and 6 are to be white.
Of the 6 white men, concerning their rent:
    1 pays more than $44.
    3 pay between $18 and $44.
    2 pay less than $18.

Example: Truman and Dewey

In the 1948 presidential election, all three major polls used quota sampling.

The polls gave Dewey a 5 percentage point lead over Truman; people were very confident of Dewey’s eventual victory.

Quota Sampling

Quota sampling is inherently flawed because the interviewers, who choose the subjects within each quota, introduce unintentional bias. The point of a quota sample is to find a sample that accurately represents the population’s diversity with respect to known parameters. The quotas are an indirect way to force the sample’s distribution to match the population’s, though this control can be misguided. Many factors besides the ones a survey organization controls for may influence the results of a sample.

Example: Truman and Dewey

In the 1948 election, the Gallup Poll’s quota sampling had a Republican bias.

For a number of reasons, interviewers chose too many Republicans. Within many demographic groups, Republicans were marginally easier to interview, and that was enough to tilt the sample.

Probability Methods

Probability Methods: Sampling techniques that implement an objective chance process to choose subjects from the population, leaving no discretion to the interviewer. With such a method it is possible to compute the chance that any particular individual in the population will get into the sample.

Simple Random Sampling: A sampling technique in which every individual is equally likely to be selected and the sample is drawn without replacement.
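
A minimal sketch of simple random sampling follows. It assumes the whole population can be listed; the voter list and the name simple_random_sample are hypothetical and used only for illustration. Python's standard random.sample draws without replacement, so no individual can appear twice.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population: voters identified by the numbers 1 to 1000.
voters = list(range(1, 1001))
sample = simple_random_sample(voters, 50, seed=148)

# Because selection is objective, the chance of inclusion can be computed:
# every individual has chance n / N of ending up in the sample.
chance_of_inclusion = len(sample) / len(voters)
print(sample[:5], chance_of_inclusion)
```

The interviewer has no say in who is chosen; the only inputs are the list and the sample size.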

Probability Methods

National elections have a population (eligible voters) of approximately 200 million individuals. Designing an opinion poll in which every individual has an equal chance of being polled is not a simple task. Between 1952 and 1984, Gallup used a technique called Multistage Cluster Sampling:

Probability Methods

Gallup’s Multistage Cluster Sampling:
    Divide the country into four geographical regions, each having a similar population size.
    In each region, group together all the population centers of similar size and take a random sample of them.
    For election purposes, population centers are divided into wards, and wards may be further divided into precincts. Among the chosen population centers, take a random sample of wards and precincts.
    Among the chosen wards and precincts, take a random sample of households.
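
The stages above can be sketched in code. This is a simplified illustration, not Gallup's actual procedure: it skips the grouping of population centers by size and the precinct level, and the nested dictionary of regions, towns, wards, and households is invented toy data. The name multistage_sample and its parameters are likewise hypothetical.

```python
import random


def multistage_sample(regions, towns_per_region, wards_per_town,
                      households_per_ward, seed=None):
    """Sample towns within each region, wards within towns, households within wards."""
    rng = random.Random(seed)
    chosen = []
    for towns in regions.values():
        # Stage 1: random sample of population centers in this region.
        for town in rng.sample(sorted(towns), towns_per_region):
            wards = towns[town]
            # Stage 2: random sample of wards in each chosen town.
            for ward in rng.sample(sorted(wards), wards_per_town):
                # Stage 3: random sample of households in each chosen ward.
                chosen.extend(rng.sample(wards[ward], households_per_ward))
    return chosen


# Toy data: 4 regions, 5 towns each, 4 wards per town, 10 households per ward.
regions = {
    f"region{r}": {
        f"town{r}.{t}": {
            f"ward{r}.{t}.{w}": [f"household{r}.{t}.{w}.{h}" for h in range(10)]
            for w in range(4)
        }
        for t in range(5)
    }
    for r in range(4)
}

sample = multistage_sample(regions, towns_per_region=2, wards_per_town=2,
                           households_per_ward=3, seed=1)
print(len(sample))  # 4 regions * 2 towns * 2 wards * 3 households
```

Only the chosen towns and wards need a full list of their households, which is what makes the approach practical for a population too large to list in full.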

Probability Methods

Interviewers sent to individual households are not given discretion as to whom they poll. Interviewers may be instructed to: "Speak with the youngest male eligible voter, or if none are available, speak with the oldest female eligible voter." This method of questioning is constructed like a quota sample: it is designed so that the sample distribution and the population distribution are similar for a certain set of parameters. The difference between this probability method and quota sampling is that the interviewer is allowed no discretion.

Gallup’s Probability Methods

Using probability methods, Gallup has been able to reduce sample sizes and increase accuracy.

Gallup’s Probability Methods

The best probability methods minimize bias by being impartial and objective in choosing samples. Section 6 of this chapter has an in-depth analysis of the Gallup survey from the presidential election of 1984. The goal of the survey is to infer, from the sample, how the nation will vote. However, the relevant population is more complicated than just eligible voters: it should be narrowed to eligible voters who intend to vote.

Gallup’s Probability Methods

Gallup attempts to screen for nonvoters by having subjects answer questions:
1) How much thought have you given to the coming November elections?
2) Have you ever voted in this precinct or district?
3) Where do people who live in this neighborhood go to vote?
4) In the previous election did things come up which kept you from voting, or did you happen to vote? For whom?

Since not voting can carry a stigma, the above questions can screen for nonvoters without forcing people to admit that they do not vote. Question 3 is not multiple choice and asks for a specific location. Polling nonvoters will introduce a bias into your sample, because the preferences of likely voters differ from the preferences of likely nonvoters.

Gallup’s Probability Methods

Even with personal interviews, many subjects are missed. Since they tend to be different from the subjects available for the interview, a non-response bias is created. This bias is reduced by giving more weight to subjects who were available but hard to get.
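
The weighting idea can be illustrated with a small calculation. The responses and the weight of 2 below are invented for this sketch; the slides do not give the weights Gallup actually used. Each response is a pair (favors the candidate, was hard to reach), and weighted_share is a hypothetical helper.

```python
def weighted_share(responses, hard_to_reach_weight):
    """Share in favor, counting each hard-to-reach subject with extra weight."""
    total_weight = 0.0
    weight_in_favor = 0.0
    for favors, hard_to_reach in responses:
        weight = hard_to_reach_weight if hard_to_reach else 1.0
        total_weight += weight
        if favors:
            weight_in_favor += weight
    return weight_in_favor / total_weight


# Hypothetical interviews: 80 easy-to-reach subjects (60 in favor) and
# 20 hard-to-reach subjects (5 in favor).
responses = (
    [(True, False)] * 60 + [(False, False)] * 20
    + [(True, True)] * 5 + [(False, True)] * 15
)

unweighted = weighted_share(responses, hard_to_reach_weight=1.0)  # 65 / 100
weighted = weighted_share(responses, hard_to_reach_weight=2.0)    # 70 / 120
print(f"unweighted={unweighted:.3f}, weighted={weighted:.3f}")
```

The hard-to-reach subjects stand in for the people who were missed entirely, so counting them more heavily moves the estimate toward what the missed subjects would presumably have said.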

Estimating Parameters

When we infer a parameter (population) from a statistic (sample), our estimate has an error, described by:

estimate = parameter + bias + chance error

Chance error is sampling error; it is natural and measurable for all samples, as they are only part of the whole population. Bias is non-sampling error; it arises from the way the sample was drawn and is a systematic error.
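
The decomposition can be made concrete by repeating a biased sampling procedure many times. In the sketch below, the frame, the parameter value, and the name decompose_error are all hypothetical. The bias is approximated by the average error over the repeated samples, and whatever remains of each sample's error is treated as its chance error.

```python
import random
import statistics


def decompose_error(frame, parameter, sample_size, repeats, seed=0):
    """Split each estimate's error into a shared bias and a per-sample chance error."""
    rng = random.Random(seed)
    estimates = [
        sum(rng.sample(frame, sample_size)) / sample_size
        for _ in range(repeats)
    ]
    # Systematic part: the average error across all repeated samples.
    bias = statistics.mean(estimates) - parameter
    # Random part: what is left over for each individual sample.
    chance_errors = [e - parameter - bias for e in estimates]
    return estimates, bias, chance_errors


# Hypothetical frame in which 40% favor the candidate (1 = in favor),
# while the true population parameter is assumed to be 62.5%.
frame = [1] * 400 + [0] * 600
parameter = 0.625

estimates, bias, chance_errors = decompose_error(
    frame, parameter, sample_size=100, repeats=200
)

# Every estimate satisfies: estimate = parameter + bias + chance error.
print(f"bias is about {bias:.3f}")
print(f"chance errors range from {min(chance_errors):.3f} to {max(chance_errors):.3f}")
```

The chance errors scatter in both directions around zero, while the bias has the same sign and roughly the same size in every sample.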
