Preliminary Chapter. Statistics:

Author: Tyler Scott



 



Statistics:

Chapters 1-4: Tools and strategies for organizing, describing, and analyzing data
Chapter 5: How to produce data
Chapters 6-9: Probability: the study of chance behavior
Chapters 10-15: Testing claims and computing estimates

Ask the following questions:
1. What ____________ do the data describe? __________________ individuals are there?
2. How many ___________? Defs of these ______________? What ____________?
3. ______________ were the data gathered? (For a ____________ or ______________________?)
Every set of data comes with background information to help us understand the data!

Data Production
Where can we find good data?
• Library
• Internet:
  www.nces.ed.gov (National Center for Education Statistics website)
  www.fedstats.gov (good source for projects)
• Statistical offices of foreign countries (www.statcan.ca, www.inegi.gob.mx)

Get GOOD Data! No amount of...

Is this good data? 

Suppose you want to find out whether your classmates prefer cheeseburgers from McDonald’s or Burger King. You decide to ask 50 people under the age of 20 which fast-food restaurant they prefer. To save time and energy, you conduct your survey at the McDonald’s closest to campus. Is there a problem with this?

Definitions: 

Available data are:



In an observational study, we...



In an experiment, we...

Examples: Does drinking at least five carbonated sodas a week improve a student’s GPA? 

Observational:



Experiment:

Good and bad survey results
In 1976, Shere Hite published The Hite Report on Female Sexuality (Seven Stories Press, New York, NY, 2004). The conclusions reported in her book were based on 3,000 returned surveys out of 100,000 surveys distributed by women’s groups. The results were that women were highly critical of men. In what way might the author’s findings have been biased?

More definitions... 

Individuals are...



A variable is...  



A categorical variable is...

A quantitative variable is...

The distribution of a variable tells us...

W5HW        

W5HW: Who, What, Why, Where, When, How, by Whom?
Who –
What –
Why –
Where –
When –
How –
By Whom –

Describes public education in the USA

State   Region   Pop (1000s)   SAT verbal   SAT math   % taking   % No HS   Teacher pay ($1000)
CA      PAC      35894         499          519        54         18.9      54.3
CO      MTN      4601          551          553        27         11.3      40.7
CT      NE       3504          512          514        84         12.5      53.6
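A quick way to practice telling categorical from quantitative variables is to tag each column of the table above. The Python sketch below uses a deliberately crude rule, "all values are numbers, so the variable is quantitative", which is only an assumption for illustration: a numeric code such as a ZIP code is still categorical, so the final call always depends on what the variable means. It also assumes Pop is in thousands.

```python
# The three rows from the table above. "Pop" is assumed to be in thousands.
rows = [
    {"State": "CA", "Region": "PAC", "Pop": 35894, "SAT verbal": 499, "SAT math": 519,
     "% taking": 54, "% No HS": 18.9, "Teacher pay ($1000)": 54.3},
    {"State": "CO", "Region": "MTN", "Pop": 4601, "SAT verbal": 551, "SAT math": 553,
     "% taking": 27, "% No HS": 11.3, "Teacher pay ($1000)": 40.7},
    {"State": "CT", "Region": "NE", "Pop": 3504, "SAT verbal": 512, "SAT math": 514,
     "% taking": 84, "% No HS": 12.5, "Teacher pay ($1000)": 53.6},
]


def variable_type(values):
    """Rough rule of thumb: all numbers -> quantitative, otherwise categorical."""
    if all(isinstance(v, (int, float)) for v in values):
        return "quantitative"
    return "categorical"


# Classify every column by looking down that column across all rows.
types = {name: variable_type([row[name] for row in rows]) for name in rows[0]}
for name, kind in types.items():
    print(f"{name:20} {kind}")
```

With these rows, State and Region come out categorical and every other column comes out quantitative.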

More Defs! 



Exploratory Data Analysis:

2 steps:
1)
2)
DON’T FORGET YOUR...





Distribution of a Variable:

The pattern of a Variable:

Do you wear your seat belt?

Region   % wearing belts, 2003   % wearing belts, 1998
NE       74                      66.4
MW       75                      63.6
South    80                      78.9
West     84                      80.8

Compare – Bar Graphs
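The bar-graph comparison can be previewed in plain text. This Python sketch draws one row of # marks per region and year and computes the change from 1998 to 2003 using the table values. The scale of 2 percentage points per mark and the helper name text_bar are arbitrary choices for illustration.

```python
# Seat-belt use by region: (% wearing belts in 1998, % wearing belts in 2003).
belt_use = {
    "NE": (66.4, 74),
    "MW": (63.6, 75),
    "South": (78.9, 80),
    "West": (80.8, 84),
}


def text_bar(percent, scale=2):
    """Return one '#' for every `scale` percentage points (rounded down)."""
    return "#" * int(percent // scale)


change = {}
for region, (y1998, y2003) in belt_use.items():
    # Round to one decimal to hide floating-point noise in the subtraction.
    change[region] = round(y2003 - y1998, 1)
    print(f"{region:5} 1998 {text_bar(y1998):<43} {y1998}")
    print(f"{'':5} 2003 {text_bar(y2003):<43} {y2003}")

print("Change in percentage points, 1998 to 2003:", change)
```

Every region went up, with the Midwest showing the largest increase (11.4 points) and the South the smallest (1.1 points).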

Dotplots
Number of goals scored by the US women’s soccer team in 34 games played in the 2004 season:
3 0 2 7 8 2 4 3 5 1 1 4 5 3 1 1 3 3 3 2 1 2 2 2 4 3 5 6 1 5 5 1 1 5
What does this tell us about the performance of the US women’s team in 2004?

TI-83: 1-Var Stats L1
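A minimal Python sketch of the dotplot and of the main summaries that 1-Var Stats reports (mean, Sx, and the five-number summary) for the 34 game scores. The quartile rule used here (medians of the lower and upper halves, leaving out the overall median when n is odd) is the convention usually described for the TI-83. Other software may use a different rule and report slightly different quartiles. The helper name ti_quartiles is hypothetical.

```python
from collections import Counter
from statistics import mean, median, stdev

goals = [3, 0, 2, 7, 8, 2, 4, 3, 5, 1, 1, 4, 5, 3, 1, 1, 3,
         3, 3, 2, 1, 2, 2, 2, 4, 3, 5, 6, 1, 5, 5, 1, 1, 5]

# Text dotplot: one column per possible score, one "x" stacked per game.
counts = Counter(goals)
scores = range(min(goals), max(goals) + 1)
for level in range(max(counts.values()), 0, -1):
    print("".join(" x " if counts[s] >= level else "   " for s in scores))
print("".join(f"{s:^3}" for s in scores))


def ti_quartiles(data):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted
    data, leaving out the overall median when n is odd."""
    s = sorted(data)
    half = len(s) // 2
    return median(s[:half]), median(s[half + len(s) % 2:])


q1, q3 = ti_quartiles(goals)
print(f"n = {len(goals)}, mean = {mean(goals):.2f}, Sx = {stdev(goals):.2f}")
print(f"min = {min(goals)}, Q1 = {q1}, Med = {median(goals)}, Q3 = {q3}, max = {max(goals)}")
```

The dotplot peaks at 1 goal (8 games), the median is 3 goals, and the team was shut out only once. The 7- and 8-goal games stand apart on the right.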

Exploring Relationships between Variables (p. 17)
Air travelers would like their flights to arrive on time. Airlines collect data about on-time arrivals and report them to the Department of Transportation. Here’s one month’s data for flights from several western cities for two airlines:

                On time   Delayed
Alaska Air        3274       501
America West      6438       787
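The overall comparison uses percents, not raw counts, because America West flew almost twice as many flights. This Python sketch computes each airline's overall percent delayed from the totals above. The city-by-city breakdown is not reproduced in these notes, so this shows only the combined comparison. Simpson's paradox warns that this kind of aggregate comparison can be misleading.

```python
# One month of flights, copied from the table above.
flights = {
    "Alaska Air": {"on_time": 3274, "delayed": 501},
    "America West": {"on_time": 6438, "delayed": 787},
}

delay_rate = {}
for airline, counts in flights.items():
    total = counts["on_time"] + counts["delayed"]
    delay_rate[airline] = counts["delayed"] / total
    print(f"{airline:13} {total:5d} flights, {delay_rate[airline]:.1%} delayed")
```

Overall, about 13.3% of Alaska Air flights and about 10.9% of America West flights were delayed.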

Simpson’s Paradox
An association or comparison that holds for all of several groups can...

Probability

Statistical Inference  



Population values (_____________) are fixed. Sample values (______________) vary from sample to sample. A sample value will not give us precise information about a population parameter (but if _____________________, it will provide us with ______________________ on a parameter).

How unlikely must an event be before we conclude that it isn’t due to chance? 25%? 10%? 1%? 0.01? Our willingness to declare an event “unlikely” is usually based on….

Communication in Statistics:

Case Closed! P. 26 (groups)

Population
We are almost always interested in knowledge about a population. We would have little interest in samples if we could always ask everyone what they think about any particular issue – that is, we would conduct a ___________. The reality is that we can’t, so we need to get a sample that is _______________________ of the population (i.e., shares characteristics of the population!)

Think about it…
• If you were to take a blood test, wouldn’t you rather the doctor take a random sample of your blood than take all of it?

Simple Random Sample
All possible samples of a _______________ must be __________________ in order for the sample to be an SRS.
Example:
1) Do we get an SRS from a class of 36 when we pick six names out of a hat?
2) Do we get an SRS if we only put rows 1-6 in a hat and pick out a row?
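The two hat questions can be checked by counting possible samples. The Python sketch below assumes that the 36 students sit in 6 rows of 6, so each row is a possible "sample of six". That seating arrangement is an assumption, not something the notes state. Picking six names gives each of the C(36, 6) groups the same chance. Picking a row can only produce 6 of those groups.

```python
import math
import random

CLASS_SIZE = 36
SAMPLE_SIZE = 6

# ASSUMPTION: students 1-36 sit in 6 rows of 6, numbered row by row.
rows = [list(range(r * 6 + 1, r * 6 + 7)) for r in range(6)]

# 1) Six names out of a hat: every group of 6 students can be drawn,
#    and all C(36, 6) groups are equally likely.
srs = sorted(random.sample(range(1, CLASS_SIZE + 1), SAMPLE_SIZE))
possible_srs = math.comb(CLASS_SIZE, SAMPLE_SIZE)

# 2) One row out of a hat: only the 6 rows can ever be the sample.
row_sample = random.choice(rows)
possible_row_samples = len(rows)

# A group that mixes two rows is possible under 1) but impossible under 2).
mixed = {1, 2, 3, 4, 5, 7}
print("Names from a hat:", srs, "- one of", possible_srs, "possible samples")
print("Row from a hat:  ", row_sample, "- one of", possible_row_samples, "possible samples")
print("Can the row method produce", sorted(mixed), "?", any(set(r) == mixed for r in rows))
```

Under the row method, each student still has a 1-in-6 chance of being chosen, but the 1,947,792 possible groups of six are not all equally likely. This is why the row method fails the SRS definition.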

Examples 5.2, 5.3, and 5.4 are…

Technology Tip
Example 5.5 demonstrates the use of the random number table to select an SRS of size 5 from a population of size 30. To do the same thing on your calculator, press MATH, go to the PRB menu, choose randInt(, and enter randInt(1,30). Press ENTER five times to get your sample, ignoring repeats. Or enter randInt(1,30,5) (this works fine as long as there are no repeats).
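The same idea works in Python, where random.randint(a, b) also includes both endpoints, like randInt. The sketch below mimics pressing ENTER repeatedly and ignoring repeats. random.sample draws without replacement in a single call, so it never produces repeats. The helper name ti_style_sample is hypothetical.

```python
import random


def ti_style_sample(low, high, k, rng=random):
    """Mimic pressing ENTER on randInt(low, high) until k different labels
    have appeared, ignoring repeats. random.randint includes both endpoints."""
    chosen = []
    while len(chosen) < k:
        label = rng.randint(low, high)
        if label not in chosen:  # ignore repeats
            chosen.append(label)
    return chosen


sample = ti_style_sample(1, 30, 5)
print("randInt-style SRS:", sample)

# random.sample draws 5 distinct labels (without replacement) in one call.
print("random.sample SRS:", random.sample(range(1, 31), 5))
```

Passing rng=random.Random(seed) with a fixed seed gives the same sample every run, which is handy when checking work.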

Sampling Designs
• Stratified random sampling:
• Cluster sampling:
• Multistage sampling:
…all involve the use of chance to minimize bias in the sample. Note: This doesn’t eliminate bias – it just controls for systematic bias.
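This Python sketch contrasts the three designs on a hypothetical population of 24 students (3 grades × 2 homerooms × 4 students). All the names here are made up for illustration. Grade plays the role of the strata and homeroom plays the role of the clusters. The stratified sample takes a separate SRS in every grade. The cluster sample keeps everyone in randomly chosen homerooms. The two-stage (multistage) sample chooses homerooms at random and then takes an SRS inside each one.

```python
import random
from collections import defaultdict

# Hypothetical population: 3 grades x 2 homerooms x 4 students = 24 students.
population = [
    {"id": f"{grade}{room}{seat}", "grade": grade, "homeroom": f"{grade}{room}"}
    for grade in (10, 11, 12)
    for room in ("A", "B")
    for seat in range(1, 5)
]


def group_by(units, key):
    """Collect units into lists keyed by the value of `key`."""
    groups = defaultdict(list)
    for unit in units:
        groups[unit[key]].append(unit)
    return groups


def stratified_sample(units, stratum_key, per_stratum, rng=random):
    """Stratified random sample: a separate SRS inside every stratum."""
    sample = []
    for members in group_by(units, stratum_key).values():
        sample.extend(rng.sample(members, per_stratum))
    return sample


def cluster_sample(units, cluster_key, n_clusters, rng=random):
    """Cluster sample: pick whole clusters at random and keep everyone in them."""
    clusters = group_by(units, cluster_key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


def multistage_sample(units, cluster_key, n_clusters, per_cluster, rng=random):
    """Two-stage sample: pick clusters at random, then take an SRS inside each."""
    clusters = group_by(units, cluster_key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in rng.sample(clusters[name], per_cluster)]


strat = stratified_sample(population, "grade", 2)
clus = cluster_sample(population, "homeroom", 2)
multi = multistage_sample(population, "homeroom", 2, 2)
for label, sample in [("Stratified", strat), ("Cluster", clus), ("Multistage", multi)]:
    print(f"{label:10} {[u['id'] for u in sample]}")
```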

Undercoverage vs. Nonresponse
If I’m a prospective survey respondent,
1) ________________________ occurs when the group I’m part of is left out of the sample somehow.
2) ________________ occurs when the surveyors can’t reach me or I refuse to answer.
In #1, my group isn’t even in the sampling frame. In #2, I’m selected but I don’t respond.

Sampling Techniques
• Bad sampling techniques that do not produce good data:
• Good sampling techniques that do produce good data: ______________________; it is a sample chosen by ____________. We must know what samples are possible and what probability each possible sample has.

Types of bias
Never say “biased” – name the bias!
• Response bias:
• Wording of questions:
• Undercoverage:
• Nonresponse:

1. In late 1995, a Gallup survey reported that Americans approved sending troops to Bosnia by a 46 to 40 percent margin (46% said yes and 40% said no). The poll did not mention that 20,000 U.S. troops were committed to go. A CBS News poll mentioned the 20,000 figure and got the opposite outcome: a 58 to 33 percent disapproval rate. Identify the bias(es).

2. A church group interested in promoting volunteerism in a community chooses an SRS of 200 community addresses and sends members to visit these addresses during weekday working hours and ask about the residents’ attitudes toward volunteer work. Sixty percent of all respondents say that they would be willing to donate at least an hour a week to some volunteer organization. Bias is present in this sample design. Identify any and all types of bias involved, and state whether you think the sample proportion obtained is higher or lower than the true population proportion.

3. A university’s financial aid office wants to know how much it can expect students to earn from summer employment. This information will be used to set the level of financial aid. The population contains 3478 students who have completed at least one year of study but have not yet graduated. A questionnaire will be sent to an SRS of 100 of these students, drawn from an alphabetized list.
a) Describe how you will label the students in order to select the sample.
b) Use Table B, beginning at line 105, to select the first 5 students in the sample.
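For problem 3, the usual approach is to give the alphabetized students the labels 0001 to 3478. You then read the random digits four at a time and skip 0000, any label above 3478, and repeats. The Python sketch below automates that reading rule. The digit string is made up for illustration and is NOT line 105 of Table B. The helper name select_with_digits is hypothetical.

```python
def select_with_digits(digits, population_size, k):
    """Read a string of random digits in consecutive groups as wide as the
    largest label. Keep a group if it is a valid label (1..population_size)
    that has not been chosen yet; stop after k labels or when digits run out."""
    digits = digits.replace(" ", "")        # tables print digits in blocks
    width = len(str(population_size))       # 3478 -> 4-digit labels 0001-3478
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == k:
                break
    return chosen


# Made-up digits for illustration only; this is NOT line 105 of Table B.
demo_digits = "12349 99934 78000 00002 12340 417"
print([f"{label:04d}" for label in select_with_digits(demo_digits, 3478, 5)])
```

These made-up digits yield only four usable labels (9999 and 0000 are skipped, and the second 1234 is a repeat). With a real table, you keep reading into the next line until you have all the labels you need.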

Principles of Experimental Design
• The basic principles of statistical design of experiments are:
1.
2.
3.

• Experimental units:
• Subjects:
• Treatment:

Example 5.14

Example 5.18, p. 360

• Statistical significance is:

Example 5.19, p. 362

Blocking
• A _________ is a group of subjects that are similar in some way that is expected to affect the response to the treatments.
• In a ________________, the random assignment of units to treatments is carried out separately within each block.

More on blocking
• Unless the link is obvious, you must justify the reason for blocking on any variable!
• The difference between randomization and blocking:
  • You __________ to control for the variables you know about that might influence the response.
  • You ______________ to control for the variables you do NOT know about.
Note: Blocks allow us to draw separate conclusions about each block.
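In a block design, chance is applied separately inside each block. The Python sketch below shuffles each block's subjects and deals them out to the treatments in turn. This way, every block is split as evenly as possible between treatments. The blocks (gender), subject labels, and helper name are hypothetical and are included only for illustration.

```python
import random


def randomized_block_assignment(blocks, treatments, rng=random):
    """Randomize separately inside each block: shuffle the block's units and
    deal them out to the treatments in turn."""
    plan = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            plan[unit] = (block_name, treatments[i % len(treatments)])
    return plan


# Hypothetical experiment that blocks on gender with two treatments.
blocks = {
    "women": ["W1", "W2", "W3", "W4"],
    "men": ["M1", "M2", "M3", "M4", "M5", "M6"],
}
plan = randomized_block_assignment(blocks, ["treatment", "control"])
for unit, (block, treatment) in sorted(plan.items()):
    print(f"{unit:3} {block:6} {treatment}")
```

Results are then compared within each block (women with women, men with men). This is what allows separate conclusions for each block.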

In a double-blind experiment:

• A way to control the placebo effect!
• Avoids unconscious bias

Weakness of experiments:

Matched Pairs Experiment - options
1) Match individuals on a set of characteristics such as gender, age, income, level of education, and religion, then place them into __________ and _____________ groups. The assumption is that the two groups should be similar in terms of the factors on which they were matched.
2) Measure the same individuals at ____ different times.
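For option 1, chance decides within each matched pair which member receives the treatment. The Python sketch below uses a fair coin flip for each pair. For option 2, the sketch assumes that both conditions can be given in either order and randomizes the order for each subject. That assumption does not hold for a before/after design, where the order is fixed. All pair and subject labels, as well as the helper names, are hypothetical.

```python
import random


def matched_pairs_assignment(pairs, rng=random):
    """Option 1 (matched individuals): within each pair, a fair coin flip
    decides which member gets the treatment; the other member is the control."""
    plan = []
    for a, b in pairs:
        if rng.random() < 0.5:
            plan.append({"treatment": a, "control": b})
        else:
            plan.append({"treatment": b, "control": a})
    return plan


def crossover_order(subjects, rng=random):
    """Option 2 (same individuals measured twice), ASSUMING both conditions can
    be given in either order: a coin flip decides which condition comes first."""
    orders = [("treatment", "control"), ("control", "treatment")]
    return {s: rng.choice(orders) for s in subjects}


# Hypothetical pairs already matched on gender, age, income, education, religion.
pairs = [("P1a", "P1b"), ("P2a", "P2b"), ("P3a", "P3b")]
pair_plan = matched_pairs_assignment(pairs)
order_plan = crossover_order(["S1", "S2", "S3"])
print(pair_plan)
print(order_plan)
```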

Example 1
A medical study of heart surgery investigates the effect of a drug called a beta-blocker on the pulse rate of the patient during surgery. The pulse rate will be measured at a specific point during the operation. The investigators will use as subjects 20 patients facing heart surgery. You have a list of these patients, numbered 1 to 20, in alphabetical order.
a) Outline a well-designed experiment for this study (list and label everything).

b) Use the random digit table starting at line 125 to carry out the randomization required by your design and list the results.
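One possible design for part (a), assumed here, is a completely randomized design: 10 patients receive the beta-blocker and 10 receive a placebo, and pulse rates are compared. In the Python sketch below, a shuffle stands in for the random digit table, so it does not reproduce what line 125 would give. The helper name completely_randomized is hypothetical.

```python
import random


def completely_randomized(units, group_names, rng=random):
    """Shuffle the units, then deal them out to the groups in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    k = len(group_names)
    return {name: sorted(shuffled[i::k]) for i, name in enumerate(group_names)}


# Patients are numbered 1-20 in alphabetical order, as in the example.
groups = completely_randomized(range(1, 21), ["beta-blocker", "placebo"])
for name, members in groups.items():
    print(f"{name:13} {members}")
```

Dealing the shuffled list out in turn keeps the groups equal in size (10 and 10 here). Passing random.Random(seed) makes the assignment reproducible.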

Example 2
You’re interested in whether white Americans’ attitudes about prejudice towards African-Americans change after they learn more about the history of African-Americans. You identify a diverse sample of whites that contains men and women of varying ages, incomes, religions, and levels of education. You decide to use a documentary as a way of teaching individuals more about the history of the African-American experience. How would you use each type of matched pairs experiment to examine your hypothesis on attitudes about prejudice? Design with a partner and share out.

Example 3
Professor Hampton conducts an experiment on the possible effects of room temperature on the speed at which a machine produces rubber snakes. He identifies 100 machines and sets up the following design:
• 50 machines are placed in a “warm” room (85 degrees):
  30 of the machines in the “warm” room are placed near the windows
  20 of the machines in the “warm” room are placed away from the windows
• 50 machines are placed in a “cold” room (55 degrees):
  15 of the machines in the “cold” room are placed near the windows
  35 of the machines in the “cold” room are placed away from the windows
a) What are the factors in this experiment? What are the levels of each of these factors?
b) How many treatment conditions are there in this experiment? What are the treatment conditions? How many machines are in each treatment condition?
c) What is the response variable? How would you measure it?