Chapter 12: Sample Surveys

Author: Guest
Census? Remember that a census is a data collection method where the entire population is measured. Also remember that in most scenarios, this is impossible. We must gather data some other way.

Not A Census. In Chapter 11, we discussed using simulations. There are two other ways to gather data (which we will then run through our battery of statistical analyses)—we can gather data without any attempt to influence outcomes, or we can gather data while trying to influence some (or all) of the outcomes. When there is no deliberate attempt to influence any observations, we are conducting an observational study (we’re just observing). In this case, a primary concern is obtaining a sample from the population. When we attempt to influence some (or all) observations, then we are conducting an experiment. In this case, we usually do not have a sample—the primary concern will be the design of the experiment. In this chapter, we’ll talk about observational studies—thus, we’ll talk mainly about methods of sampling. We’ll talk about the design of experiments in Chapter 13. The textbook makes a distinction between sample surveys and observational studies (in fact, they are listed as separate items in the AP Statistics course outline), but I can’t see any relevant distinction. I’ll talk about both of these as if they are the same thing.

Sampling As it turns out, the method used to collect a sample is very important. Poor sampling techniques will ruin any attempt at analysis!

Big Ideas Most of what we do can be thought of as starting with a question…

Population …and this question is about some group of individuals (people, cars, candies, etc.) that is so large that measuring all of them is either really difficult or impossible. This large group—the group about whom you have some question—is called the population.

Sampling Frame Since you can’t measure every individual in the population, you’ll need to select a representative sample instead. Often, the method used to select a sample will not actually be able to reach every individual. The part of the population that you can reach is called the sampling frame. You want the sampling frame to be as large as possible (the same as the population, if you can).

HOLLOMAN’S AP STATISTICS

BVD CHAPTER 12, PAGE 1 OF 6

Sample The sample is the group of individuals that are actually selected and measured. Ultimately, we will use information about the sample to say something interesting (and mathematically defensible) about the population. Perhaps we’ll even get some sort of answer to the question that started the whole process…

Examples [1.] A state politician wants to know how mothers of young children might feel about an upcoming piece of legislation. His staff obtains a list of all live births from the past year registered in the county he represents, then randomly selects 100 of the mothers to be contacted and interviewed. What are the population, sampling frame and sample in this situation? The population would be all mothers of young children in his county. The sampling frame would be those mothers who registered a live birth in the county in the past year. Note that this is not the same as the population! A mother might have moved to the county without having given birth in the county, for example. The sample is the group of mothers that are actually contacted and interviewed.

[2.] The director of the State Fair wants to know what patrons think about a proposal to include different types of food during the event. On the two days that are historically the most attended, the director asks every 20th patron about the new food proposal. What are the population, sampling frame and sample in this situation? The population is all patrons of the State Fair. The sampling frame is those patrons who attended on the two busiest days. The sample is the group of patrons that are actually interviewed.

Good Methods of Sampling We need our sample to be representative of the population: looking at this small part needs to feel a lot like looking at the whole (like fractals). There are many ways to do this, some correct and some incorrect. Let’s look at some good ways first.

Simple Random Sample The best (theoretically, at least) kind of sample is the Simple Random Sample. A Simple Random Sample (SRS; of size n) is chosen in such a way that every possible group (of size n) from the population has an equal chance of being selected. The best example to get the idea of an SRS is drawing names out of a hat (or slips of paper out of a box…or something equivalent). Since every group has an equal chance of being selected, every individual also has an equal chance. This is not reversible! If every individual has an equal chance, there is no guarantee that every group has an equal chance. Theoretically, the simple random sample gives you the best chance of obtaining a sample that is representative of the population. Also, this type of sample makes probability calculations fairly simple. In fact, all of the inference procedures that you’ll learn in this course are based on the simple random sample.
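
To see why equal chances for individuals do not guarantee equal chances for groups, here is a small Python sketch. The four-person population and the coin-flip method are made up for illustration; they are not from the textbook. Both methods give every individual the same chance, but only the first is an SRS.

```python
from fractions import Fraction
from itertools import combinations

population = ["A", "B", "C", "D"]  # hypothetical four-person population
n = 2                              # sample size

# Method 1: simple random sample. Every possible pair is equally likely.
pairs = [frozenset(g) for g in combinations(population, n)]
srs_groups = {g: Fraction(1, len(pairs)) for g in pairs}

# Method 2 (hypothetical): flip a fair coin and take either {A, B} or {C, D}.
coin_groups = {frozenset("AB"): Fraction(1, 2), frozenset("CD"): Fraction(1, 2)}

def individual_chances(groups):
    """Probability that each individual ends up in the sample."""
    return {
        person: sum(prob for group, prob in groups.items() if person in group)
        for person in population
    }

print(individual_chances(srs_groups))     # chance for each individual, method 1
print(individual_chances(coin_groups))    # chance for each individual, method 2
print(len(srs_groups), len(coin_groups))  # number of groups that can occur
```

Under the coin-flip method, a pair such as A and C can never be selected, so not every group has an equal chance even though every individual does.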


You must be able to use a Table of Random Digits (ToRD) in order to select an SRS. Naturally, this means that you must first describe how you are going to use the table, then show your implementation of your method. This is very similar to using a ToRD for simulation! I assume you’ve read the Chapter 11 notes, so I won’t repeat any specific instructions on how to use a Table of Random Digits.

Example [3.] The quality control manager at a tire plant needs to randomly select ten tires from the day’s production. Assuming that 1000 tires were made during the day, describe how to select a random sample of those tires using a Table of Random Digits. Assign each tire a number 000 through 999; in particular, the first tire of the day is 000 and the last tire of the day is 999. Go to the ToRD and read three digits. If those three digits represent a tire that has not yet been selected, then accept them; otherwise, discard them (the same tire cannot be selected twice). Continue reading groups of three digits until 10 unique tires have been selected.
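
The procedure in Example 3 can be sketched in Python. This is only an illustration: the string of digits is generated by Python's random module as a stand-in for a line of a real Table of Random Digits, and the function name and the seed are my own choices.

```python
import random

def select_srs_from_digits(digits, sample_size, label_width=3):
    """Read consecutive groups of digits; keep each new label until enough are chosen."""
    chosen = []
    for start in range(0, len(digits) - label_width + 1, label_width):
        label = digits[start:start + label_width]
        if label not in chosen:  # a tire that was already selected is skipped
            chosen.append(label)
        if len(chosen) == sample_size:
            return chosen
    raise ValueError("ran out of digits before the sample was complete")

# Stand-in for a Table of Random Digits (assumption: the seed 12 is arbitrary).
rng = random.Random(12)
digit_stream = "".join(rng.choice("0123456789") for _ in range(300))

tires = select_srs_from_digits(digit_stream, sample_size=10)
print(tires)  # ten distinct three-digit tire labels
```

Because every three-digit group from 000 to 999 matches a tire, the only groups that get discarded here are repeats.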

Stratified Random Sample To take a Stratified Sample, first divide the population into homogeneous groups (strata). Now take an SRS from each group. Combine these individual SRSs into a single sample from the population. For example, I could divide the student body into Freshmen, Sophomores, Juniors and Seniors; then take an SRS from each group; and finally combine these to make a single sample. The point of this type is to ensure that known subgroups of the population are represented in the sample. We won’t talk about why you would use this version instead of a simple random sample; that’ll come if you take any more statistics courses at the University level. The textbook makes a nice example using a Boston Creme Pie. A stratified sample of such a pie would involve taking a bit out of each layer. You can read the entire example on page 276 of the textbook. Using a ToRD for a stratified sample requires a little more effort: you must number the individuals within each stratum separately, then take a sample from each stratum.

Example [4.] An apartment manager wants to survey residents about noise in the apartments, but she believes that people who live on the second floor will have very different opinions than those who live on the first floor. There are five buildings with eight apartments each: four on the first floor and four on the second floor. At the moment, all of the apartments are occupied. Describe how the manager could select a stratified random sample of eight apartments. Assign the first floor apartments numbers from 01 to 20. Read two digits from the ToRD. If the digits are between 01 and 20, and they represent an apartment that hasn’t yet been selected, then accept them; otherwise, reject them (no repeats are allowed). Continue reading digits until four apartments have been selected. Now assign the second floor apartments numbers from 01 to 20. Read two digits from the ToRD. If the digits are between 01 and 20, and they represent a second floor apartment that hasn’t yet been selected, then accept them; otherwise, reject them (repeats are still not allowed). Continue reading digits until four more apartments have been selected.
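
Here is a rough Python sketch of the stratified procedure in Example 4. Each call to rng.randrange(100) plays the role of reading one two-digit group from the ToRD; the helper name draw_labels and the seed are assumptions made for this illustration.

```python
import random

def draw_labels(rng, low, high, how_many):
    """Read two-digit groups; keep labels in [low, high] that are not repeats."""
    chosen = []
    while len(chosen) < how_many:
        label = rng.randrange(100)  # one two-digit group, 00 through 99
        if low <= label <= high and label not in chosen:
            chosen.append(label)
    return chosen

rng = random.Random(4)  # arbitrary seed, standing in for a starting line in the table

# Take a separate SRS of four apartments from each stratum (floor).
first_floor = draw_labels(rng, 1, 20, 4)
second_floor = draw_labels(rng, 1, 20, 4)

# Combine the two SRSs into one stratified sample of eight apartments.
sample = [("first floor", a) for a in first_floor]
sample += [("second floor", a) for a in second_floor]
print(sample)
```

The same label (say, 07) may appear once in each stratum, since apartment 07 on the first floor and apartment 07 on the second floor are different apartments.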


Cluster Sampling Cluster Samples are similar to stratified samples in that the population is divided into groups. The difference is that a cluster sample uses naturally occurring groups that are heterogeneous (“all mixed up”). The idea is that each group (each cluster) is individually representative of the population, so we take a sample of those clusters (to help deal with random variation in the various clusters) and measure every individual in the selected clusters. In the textbook’s pie example (page 276), a cluster sample is a single vertical slice of the pie. Cluster sampling is a good way to get a representative slice of the population, assuming, of course, that there isn’t some other reason why the individuals are clustered together…

Example [5.] To continue the apartment example…now describe how a cluster sample of 16 apartments could be selected. It seems to me that one building would make a good cluster! Assign the buildings numbers between 1 and 5. Read one digit from the ToRD. If the digit is between 1 and 5 and it represents a building that hasn’t yet been picked, accept it; otherwise, reject it (no repeats are allowed). Continue this until two buildings are accepted. All eight apartments in each of the two selected buildings make up the sample of 16.
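
The cluster procedure in Example 5 might look like this in Python. Again, rng.randrange(10) stands in for reading one digit from the ToRD, and the (building, floor, unit) labels are a hypothetical way of naming the apartments.

```python
import random

rng = random.Random(7)  # arbitrary seed

# Randomly select two of the five buildings (the clusters).
buildings = []
while len(buildings) < 2:
    digit = rng.randrange(10)  # one digit, 0 through 9
    if 1 <= digit <= 5 and digit not in buildings:
        buildings.append(digit)

# Every apartment in a selected building is in the sample.
# Hypothetical labels: (building, floor, unit), four units per floor.
sample = [
    (building, floor, unit)
    for building in buildings
    for floor in (1, 2)
    for unit in range(1, 5)
]
print(buildings)
print(len(sample))
```

Notice that the randomization happens only at the building level; nothing is random inside a selected building.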

Systematic Random Sampling In a Systematic Random Sample, there is some “stream” of individuals from which to select. The method involves selecting every kth individual (every 10th, or every 50th, for example). When using a ToRD for this, the big idea is that only the first individual is selected randomly. After that, the system kicks in. One common student mistake is to think that the system is used to select every kth set of random digits; this is, of course, crazy. This method is useful in a lot of cases: people entering a sporting event, bottles rolling off of an assembly line…

Example [6.] Back to the tire example: describe how to use systematic random sampling to select 10 of the 1000 tires. Use the same numbering scheme as before. Since we want 10 of 1000 tires, we will take every 100th tire (k = 100). Read three digits from the ToRD. If those digits are between 000 and 099, accept them; otherwise, reject them. Keep reading until one group is accepted, and let s be the number that is accepted. The other tires that will be accepted are s + 100, s + 200, …, s + 900.
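
A short Python sketch of Example 6, under the same assumption as before that calls to a random number generator stand in for reading the ToRD. The function name systematic_sample is made up for this illustration.

```python
import random

def systematic_sample(rng, population_size=1000, sample_size=10):
    """Pick a random start in the first block, then take every kth individual."""
    k = population_size // sample_size      # step size: every 100th tire
    while True:
        start = rng.randrange(1000)         # one three-digit group, 000 through 999
        if start < k:                       # accept only 000 through 099
            break
    return [start + k * i for i in range(sample_size)]

rng = random.Random(3)  # arbitrary seed
selected = systematic_sample(rng)
print(selected)
```

Only the starting tire depends on the random digits; the other nine tires are determined by the system.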

Multistage Sampling “Layers of randomization.” Or, “strata within strata.” This is, actually, a widely used technique. Fortunately for you, it isn’t really part of the AP curriculum. Read about it in your textbook.

Bad Methods of Sampling There can be no light without the dark…there can be no good without the bad. Let’s look at two bad sampling methods.


Convenience Sampling When the subjects are chosen based on ease of access, then you have a Convenience Sample. For example, if I want to know about the opinions of High School students, and I decide to sample from students in my classes only, then I’ve chosen my sample from those that are easiest to reach—that’s a convenience sample. It’s a pretty good bet that convenience samples are not representative of the population.

Voluntary Response Sampling When the subjects select themselves for the sample, then you have a Voluntary Response Sample. For example, call-in polls use voluntary response (since the caller decides to call in, and become part of the sample). Voluntary Response is bad—there is often some reason why the subjects join the sample, and this reason may have some effect on what you are trying to measure.

Problems with Measurement

Bias Bias is something that creates results that are different from what they ought to be; in other words, something that systematically favors certain outcomes/measurements over others. I’m going to categorize bias into three basic types: selection bias, measurement/response bias, and nonresponse bias.

Selection Bias Selection bias is introduced during the selection process: certain individuals are given greater (than intended) probabilities of being selected, or are excluded from the selection process. Failing to include all individuals in the selection process is often called undercoverage. Notice that a voluntary response sample (or even a convenience sample) automatically suffers from selection bias! Using a good sampling method should minimize any selection bias.

Measurement Bias Measurement bias is introduced when the measurement process tends to give results that differ (systematically) from the true values in the population. For example, if a light meter is not properly calibrated, then the measurements it gives will not be correct! A common source of measurement bias is wording bias: the way in which a question is worded can often have an effect on the responses. Some people use the terms measurement bias and response bias synonymously, but they don’t quite mean the same thing. Measurement bias refers exclusively to problems with the measurement device, whereas response bias refers to anything that might affect the measured results. For example, you are likely to get very different answers to a survey if the person conducting the survey is wearing a Darth Vader costume!

Nonresponse Bias Nonresponse bias is introduced when individuals (people, mostly!) refuse to be measured/refuse to answer questions. Telephone surveys suffer from this! This type of bias is almost unavoidable, so minimizing its effects is important. There are a host of methods (beyond the scope of this course) to address this.


And Beyond… That isn’t all there is. The terms that I’ve used aren’t the only ones used for those ideas, and there are other ideas that I haven’t mentioned. What is important is the ability to recognize the presence and potential effect of bias.

Examples [7.] A researcher wants to understand people’s opinions about a proposed business park. The researcher goes to a public hearing (where many people voice opposition to the project) and randomly selects ten people to interview. How is this plan biased? This is an example of selection bias. The researcher’s results will probably show more people opposed to the project than if a representative sample were chosen.

[8.] A survey by an environmental group asked a sample of people “Do you believe that a new landfill for trash should be constructed, or should the natural beauty of the area be preserved?” How is this plan biased? This is an example of measurement (wording) bias. The question sets the landfill against “natural beauty,” so the results will probably be less favorable to the landfill than if the question were worded neutrally.
