Chapter 10 Notes: Hypothesis Tests for two Population Parameters (Tests involving data from Two Samples)

Author: Angelica Rice
Chapter 8: Confidence Interval Estimates for an unknown population parameter
• μ mean when population standard deviation is known
• μ mean when population standard deviation is not known
• p proportion

Chapter 9: Testing a hypothesis about an unknown population parameter
• μ mean when population standard deviation is known
• μ mean when population standard deviation is not known
• p proportion

Chapters 8 and 9 investigate one parameter for one population. Chapter 10 investigates two populations to compare how the values of a parameter in these populations are related to each other.

Chapter 10: Testing a hypothesis about an unknown population parameter in two populations

Characteristics of this type of test:

Test of two means compares the population means for 2 populations for the same variable.
Example: Comparing the average age of male and female community college students.

Test of two proportions compares population proportions for 2 populations for the same variable.
Example: Comparing proportions of high school students and college students who take math classes.

Inappropriate uses of this type of test:
• Comparing the percent of males who like action movies to the percent of females who like romance movies (the variable is different for the two populations)
• Comparing the percent of community college students who intend to transfer to the percent of community college students who already have degrees and are only attending college to get additional job training (one population, two variables)

Look for Data from 2 Samples to recognize this type of hypothesis test

Examples:

a. We want to perform a hypothesis test to determine if the average length of time that boys and girls play sports daily is different. A table is given showing data for a sample of boys and a sample of girls. Is this a test of two population averages? Why or why not? Write the hypotheses.

b. We want to perform a hypothesis test to determine if the average number of children in a family for Surf City is different from Ocean County. For Ocean County, the average number of children in a family is 2.5. In a sample of Surf City families, the average was 2.1 and the standard deviation was 0.3. Is this a test of two population averages? Why or why not? Write the hypotheses.

Chapter 10: Testing a hypothesis about an unknown population parameter in two populations

Types of tests:
• Test of two proportions
• Test of two independent means – when both population standard deviations are known
• Test of two independent means – when at least one population standard deviation is not known
• Test of means for dependent (matched, paired) samples

a. A hypothesis test is performed to determine if the proportion of patients undergoing a particular surgery who are cured is greater than 60%.

b. A hypothesis test is performed to determine if the proportion of patients undergoing a particular surgery who are cured is greater than the proportion who are cured by the current non-surgical standard treatment.

c. A hypothesis test is performed to determine if the average time that a medication lasts is more than 3 hours.

d. A hypothesis test is performed to determine if the average times that medication A and medication B last are the same.

e. A hypothesis test is performed to determine if the average time to graduation at Hudson University is 4 years.

f. A hypothesis test is performed to determine if the average time to graduation is longer at a public university than at a private college.

g. A hypothesis test is performed to determine if taking an SAT training class increases SAT scores of those who take the training, on average.

h. A hypothesis test is performed to determine if recent female college graduates are subject to pay discrimination, earning less on average for similar work than recent male college graduates with similar qualifications.

Examples for Chapter 10: Hypothesis Tests comparing 2 unknown population parameters
Hypothesis Tests involving data from TWO SAMPLES

Some, but not all, of these examples will be used as class lecture examples. Examples that cite Introductory Statistics by OpenStax or Collaborative Statistics by Illowsky and Dean at Connexions.org come from textbooks that can be downloaded for free at https://openstaxcollege.org/textbooks/introductory-statistics or http://cnx.org/content/col10522/1.21/

Types of Tests:
• Two population proportions
• Two population means, independent samples, population standard deviation unknown
• Two population means, independent samples, population standard deviation known
• Two population means, dependent (matched, paired) samples

Example 10.8 in OpenStax Introductory Statistics: Two medications for hives are being tested to determine if there is a difference in the percentage of adult patient reactions. 20 people in a random sample of 200 adults given medication A still had hives thirty minutes after taking the medication. 12 people in another random sample of 200 adults given medication B still had hives thirty minutes after taking the medication. At a 1% level of significance, is there a difference in the "non-response" rate for medication A and medication B?

Example 10.1 in OpenStax Introductory Statistics: The average amount of time boys and girls ages 7 through 11 spend playing sports each day is believed to be the same. Data are collected for a random sample of boys and a random sample of girls, resulting in the data given below.

| Sample | Sample Size | Sample Average Hours Playing Sports Per Day | Sample Standard Deviation |
|---|---|---|---|
| Girls | 9 | 2 | 0.866 |
| Boys | 16 | 3.2 | 1.00 |

Do the data show that there is a difference in the average amount of time that boys and girls ages 7 through 11 play sports each day? Perform a hypothesis test using a 5% level of significance.

(This is Example 10.1 in the textbook. As of spring 2015, the Introductory Statistics textbook by OpenStax shows the sample standard deviation for girls as √0.866, which is an error. The square root symbol should not be there; the sample standard deviation for girls should be 0.866, as shown in the table above in these notes.)

Assume that the populations of times that individual boys play sports and that individual girls play sports are approximately normally distributed.
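The arithmetic behind Example 10.1 can be sketched in Python as a cross-check on calculator output. This is a minimal sketch under stated assumptions, not the textbook's procedure: it assumes the unpooled two-sample t statistic with the Welch-Satterthwaite approximation for the degrees of freedom, and the helper names t_pdf, t_cdf, and welch_from_summary are hypothetical. The p-value is approximated by integrating the t density numerically with Simpson's rule, so it may differ slightly from a calculator's result.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) with Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area

def welch_from_summary(mean1, s1, n1, mean2, s2, n2):
    """Unpooled t statistic and Welch-Satterthwaite df from summary statistics."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Girls: n = 9, mean = 2, s = 0.866; Boys: n = 16, mean = 3.2, s = 1.00
t_stat, df = welch_from_summary(2.0, 0.866, 9, 3.2, 1.00, 16)

# Two-tailed test (Ha: the means differ), so double the smaller tail area.
p_value = 2 * t_cdf(-abs(t_stat), df)

print(f"t = {t_stat:.3f}, df = {df:.2f}, p-value = {p_value:.4f}")
```

Under these assumptions the script gives t of about -3.14 with roughly 18.8 degrees of freedom and a two-tailed p-value near 0.005, which would then be compared with the 5% level of significance.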
Example 10.2 in OpenStax Introductory Statistics: A study is done of two colleges to see if students from College A graduate with more math classes on average than students who graduate from College B.

| Sample | Sample Size | Sample Mean number of math classes | Sample Standard Deviation |
|---|---|---|---|
| College A | 11 | 4 | 1.5 |
| College B | 9 | 3.5 | 1.0 |

Do the data show that graduates from College A have more math classes on average than graduates from College B? Perform a hypothesis test using a 1% level of significance. Assume that the populations of numbers of math classes for individual students at each college are approximately normally distributed.

Example 10.6 in OpenStax Introductory Statistics: The mean lasting times of two floor waxes are to be compared. 20 floors are randomly assigned to test each wax. The data are given in the following table.

| Sample | Sample Mean time in months that the wax lasts | Population Standard Deviation |
|---|---|---|
| Wax 1 | 3 | 0.33 |
| Wax 2 | 2.9 | 0.36 |

Do the data indicate that wax 1 is more effective than wax 2? Perform a hypothesis test using a 5% level of significance.

Example 10.11 in OpenStax Introductory Statistics: A study was conducted to investigate the effectiveness of hypnotism in reducing pain. Results for randomly selected subjects are shown in the table. The "before" value is matched to an "after" value. Are the sensory pain measurements, on average, lower after hypnotism? Perform a hypothesis test using a 5% significance level.

| Subject | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| Before | 6.6 | 6.5 | 9 | 10.3 | 11.3 | 8.1 | 6.3 | 11.6 |
| After | 6.8 | 2.5 | 7.4 | 8.5 | 8.1 | 6.1 | 3.4 | 2 |

Example 10.12 in OpenStax Introductory Statistics: A college football coach was interested in whether the college's strength development class increased his players' maximum lift (in pounds) on the bench press exercise. He asked 4 players to participate in a study. The amount of weight they could each lift was recorded before and after they took the strength development class. The coach wants to know if the strength development class makes his players stronger, on average. The data are:

| Weight (in pounds) | Player 1 | Player 2 | Player 3 | Player 4 |
|---|---|---|---|---|
| Weight lifted before the class | 205 | 241 | 338 | 368 |
| Weight lifted after the class | 295 | 252 | 330 | 360 |

Perform a hypothesis test to determine if the strength development class makes players stronger, on average.
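For the matched-pairs setting, the hypnotism data of Example 10.11 can be worked through as a one-sample t test on the differences. The following is a minimal sketch, assuming the differences are taken as after minus before (so a reduction in pain gives negative values and the test is left-tailed). The helper names t_pdf and t_cdf are hypothetical, and the p-value is a numerical approximation from Simpson's rule rather than a calculator's TTest output.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) with Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area

# Sensory pain measurements for subjects A through H
before = [6.6, 6.5, 9.0, 10.3, 11.3, 8.1, 6.3, 11.6]
after = [6.8, 2.5, 7.4, 8.5, 8.1, 6.1, 3.4, 2.0]

# Assumption: difference = after - before, one value per matched pair
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)  # sample standard deviation of the differences

# Test statistic for H0: mu_d = 0, with n - 1 degrees of freedom
t_stat = mean_d / (sd_d / math.sqrt(n))
df = n - 1

# Left-tailed p-value for Ha: mu_d < 0
p_value = t_cdf(t_stat, df)

print(f"mean d = {mean_d:.4f}, s_d = {sd_d:.4f}, t = {t_stat:.3f}, p-value = {p_value:.4f}")
```

With the differences defined this way, the mean difference is about -3.11 and t is about -3.03 with 7 degrees of freedom. The left-tailed p-value comes out just under 0.01, which would be compared with the 5% significance level.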

Example E1 (not in textbook): A local in-home health care service has ten nurse's aides in the company who visit patients' homes to provide assistance. Under the old assignment system, appointments were scheduled on a first-come, first-served basis to fill available time. The company director decides to try a new scheduling system based on location. This table presents the number of visits before and after the new system is implemented on randomly selected days for each aide. Is there sufficient evidence to conclude that there is an average increase in the population number of visits per day made by the nurse's aides using the new schedule, as compared to the old schedule? Perform a hypothesis test using a 5% level of significance.

| Aide | Number of Visits/Day Old Schedule | Number of Visits/Day New Schedule |
|---|---|---|
| Ana | 6 | 10 |
| Binh | 7 | 10 |
| Cyd | 8 | 9 |
| Dina | 6 | 11 |
| Ed | 8 | 5 |
| Fran | 7 | 10 |
| Greg | 11 | 10 |
| Hal | 9 | 13 |
| Ido | 10 | 8 |
| Juna | 12 | 15 |

Example E2 (not in textbook): In a study of 15,600 patients, patients were randomly assigned to a treatment group receiving the medication Plavix or to a control group receiving aspirin. Assume that the 15,600 patients were equally divided between the two groups. (Data source: Mercury News 3/13/2006.) In the treatment group, 6.8% suffered a heart attack or stroke. In the control group, 7.3% suffered a heart attack or stroke. Perform a hypothesis test at the 2% level of significance to determine whether the treatment is effective at reducing the occurrence of heart attacks and strokes.
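The two-proportion calculation for Example E2 can be sketched as follows. This is an illustrative sketch, assuming the pooled-proportion z statistic (the form a 2PropZTest is generally understood to use) and a left-tailed alternative, since "effective" here means a lower event rate in the treatment group. The sample proportions are used directly because the stated percentages do not correspond to whole-number counts for groups of 7,800.

```python
import math
from statistics import NormalDist

# Assumption from the example: 15,600 patients split equally between groups
n1 = n2 = 15600 // 2

p1_hat = 0.068  # treatment group (Plavix): proportion with heart attack or stroke
p2_hat = 0.073  # control group (aspirin): proportion with heart attack or stroke

# Pooled proportion: weighted average of the two sample proportions
p_pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)

# Standard error of the difference under H0: p1 = p2
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Test statistic and left-tailed p-value for Ha: p1 < p2
z = (p1_hat - p2_hat) / se
p_value = NormalDist().cdf(z)

print(f"pooled p = {p_pooled:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

Under these assumptions z is about -1.22 and the left-tailed p-value is about 0.11, which would be compared with the 2% level of significance.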

Example E3 (not in textbook): In 1998, the FDA approved the drug tamoxifen to prevent breast cancer in high-risk women, stopping the study earlier than planned based on the strength of the data obtained thus far. According to data contained in an article in the San Jose Mercury News opinion column on 11/16/98, 13,175 women were randomly assigned to the treatment (tamoxifen) or control (placebo) groups. Of the 6,576 women in the tamoxifen group, 89 developed invasive breast cancer. Of the 6,599 women in the placebo group, 175 developed invasive breast cancer. Perform the appropriate hypothesis test to determine whether the sample data provide sufficient evidence that the incidence of invasive breast cancer is lower in the tamoxifen group than in the placebo group. Use a 1% level of significance.

Example E4 (not in textbook): "A/B" Testing: When internet users visit websites, the companies and organizations running the websites collect data about how customers use the website. One way in which they use this data is to test different appearances or formats of the website to determine which gets better responses. Responses can be measured in the form of more time spent at the website, more purchases made at the website, more donations made at the website, or other metrics meaningful to the company or organization. This is called A/B testing and is commonly used in many varied contexts, such as trying to increase sales on shopping sites, increasing advertising viewership on sites that show advertising, or trying to increase donations to political campaigns made through candidates' websites. The example below gives one view of how A/B testing may sometimes be conducted; however, the sites that conduct A/B testing usually have very large data sets.

To determine if there is a difference in the amount of time that people spend on a social networking website depending on the appearance of the site, the social networking company conducts a hypothesis test to compare the average time spent at the site for samples of users seeing one of two interfaces being tested. For 124 randomly selected users seeing interface A, the average time spent was 3 minutes with a standard deviation of 0.85 minutes. For 82 randomly selected users seeing interface B, the average time spent was 2.25 minutes with a standard deviation of 1.05 minutes. Conduct a hypothesis test to determine if there is a difference between the average times that users spend on the site when using interface A vs. when using interface B. For this problem, assume that the populations of times spent at the site by individual users for interface A and interface B are approximately normally distributed. (Note that if this assumption were not true, then other statistical methods beyond the scope of Math 10 would be used to perform the testing.)

Example E5 (not in textbook): A frozen pizza manufacturer wants to determine whether the average time needed to cook its low fat pizza is less than for its regular pizza. For a sample of 15 low fat pizzas, the average cooking time was 14.8 minutes with a standard deviation of 2.3 minutes. For a sample of 15 regular pizzas, the average cooking time was 16.1 minutes with a standard deviation of 2.8 minutes. All pizzas were cooked in identical ovens at the same temperature. Can we conclude that the true average cooking time is less for low fat pizzas?

Example E6 (not in textbook): A biologist is studying the average germination times of two strains of seeds to determine whether the two strains of seeds have the same mean germination time. The numbers of days until germination are given for a random sample of 25 seeds of strain A and a random sample of 25 seeds of strain B. All seeds are grown in identical greenhouse conditions. Assume that the underlying populations of germination times of individual plants are approximately normally distributed. At a 2% level of significance, is there sufficient evidence of a difference in mean germination times for the two strains of seeds? What type of hypothesis test is appropriate for this problem? Why?

Strain A: 24, 13, 26, 15, 22, 14, 10, 8, 10, 18, 33, 21, 30, 20, 26, 23, 16, 17, 14, 9, 30, 24, 25, 7, 19

Strain B: 9, 13, 27, 12, 15, 21, 9, 28, 18, 15, 23, 25, 12, 18, 29, 7, 8, 10, 14, 15, 18, 16, 28, 6, 15

| Sample | Sample Mean | Sample Standard Deviation |
|---|---|---|
| A | 18.96 | 7.35 |
| B | 16.44 | 6.98 |
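When raw data are given, as in Example E6, the summary statistics can be computed before running the test. The sketch below is one way to do that, assuming the 25 germination times per strain are as listed in the code (these lists reproduce the sample means and standard deviations in the summary table). It then forms the unpooled two-sample t statistic with Welch-Satterthwaite degrees of freedom, which is an assumption about the method; the p-value step is left to a calculator or statistics package.

```python
import math
import statistics

# Days until germination. These lists reproduce the summary table:
# strain A mean 18.96, SD 7.35; strain B mean 16.44, SD 6.98.
strain_a = [24, 13, 26, 15, 22, 14, 10, 8, 10, 18, 33, 21, 30,
            20, 26, 23, 16, 17, 14, 9, 30, 24, 25, 7, 19]
strain_b = [9, 13, 27, 12, 15, 21, 9, 28, 18, 15, 23, 25, 12,
            18, 29, 7, 8, 10, 14, 15, 18, 16, 28, 6, 15]

n_a, n_b = len(strain_a), len(strain_b)
mean_a, mean_b = statistics.mean(strain_a), statistics.mean(strain_b)
sd_a, sd_b = statistics.stdev(strain_a), statistics.stdev(strain_b)  # sample SDs

# Unpooled (Welch) t statistic: population standard deviations are not known
v_a, v_b = sd_a ** 2 / n_a, sd_b ** 2 / n_b
t_stat = (mean_a - mean_b) / math.sqrt(v_a + v_b)

# Welch-Satterthwaite approximation for the degrees of freedom
df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))

print(f"A: mean = {mean_a:.2f}, s = {sd_a:.2f}")
print(f"B: mean = {mean_b:.2f}, s = {sd_b:.2f}")
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The test statistic comes out near 1.24 with about 47.9 degrees of freedom. Because the alternative hypothesis is that the means differ, the p-value would be two-tailed.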

Chapters 8, 9, 10: Summary of Intervals and Tests

Chapter 8: Confidence Intervals

| Unknown Parameter | Other Conditions | Random Variable | Distribution used to calculate Interval | Point Estimate | Error Bound | Confidence Interval |
|---|---|---|---|---|---|---|
| $\mu$ | $\sigma$ known | $\bar{X}$ | $N\left(\bar{x}, \frac{\sigma}{\sqrt{n}}\right)$ | $\bar{x}$ | $Z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$ | $\bar{x} \pm Z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$ |
| $\mu$ | $\sigma$ NOT known | $\bar{X}$ | $t_{n-1}$ | $\bar{x}$ | $t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$ | $\bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$ |
| $p$ | | $P'$ | $N\left(p', \sqrt{\frac{p'q'}{n}}\right)$ | $p'$ | $Z_{\alpha/2}\sqrt{\frac{p'q'}{n}}$ | $p' \pm Z_{\alpha/2}\sqrt{\frac{p'q'}{n}}$ |

Note: Both notations $p'$ and $\hat{p}$ are used to represent the value of the sample proportion.

Chapter 9: Hypothesis Tests

| Unknown Parameter | Other Conditions | Random Variable | Distribution used to calculate pvalue | Point Estimate | Test Statistic | Calculator Test |
|---|---|---|---|---|---|---|
| $\mu$ | $\sigma$ known | $\bar{X}$ | $N\left(\mu_0, \frac{\sigma}{\sqrt{n}}\right)$ | $\bar{x}$ | $Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ | Z-Test |
| $\mu$ | $\sigma$ NOT known | $\bar{X}$ | $t_{n-1}$ | $\bar{x}$ | $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ | T-Test |
| $p$ | | $P'$ | $N\left(p_0, \sqrt{\frac{p_0 q_0}{n}}\right)$ | $p'$ | $Z = \frac{p' - p_0}{\sqrt{p_0 q_0 / n}}$ | 1PropZTest |

Note: Both notations $p'$ and $\hat{p}$ are used to represent the value of the sample proportion.
Note: Symbols $\mu_0$ and $p_0$ represent the numerical values in the null hypothesis.

Chapter 10: Hypothesis Tests Comparing Two Population Parameters (using 2 sets of sample data)

| Unknown Parameter | Other Conditions | Random Variable | **Distribution used to calculate pvalue | Point Estimate | Calculator Test |
|---|---|---|---|---|---|
| $\mu_1 - \mu_2$ | Independent Samples; $\sigma_1$ AND $\sigma_2$ both known | $\bar{X}_1 - \bar{X}_2$ | $N\left(0, \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$ | $\bar{x}_1 - \bar{x}_2$ | 2SampZTest |
| $\mu_1 - \mu_2$ | Independent Samples; $\sigma_1$ OR $\sigma_2$ NOT known | $\bar{X}_1 - \bar{X}_2$ | $t$ with df as given by your calculator | $\bar{x}_1 - \bar{x}_2$ | 2SampTTest |
| $\mu_d$ | Dependent, Matched, Paired Samples | $\bar{X}_d$ | $t_{n-1}$ | $\bar{x}_d$ | TTest |
| $p_1 - p_2$ | Samples must be independent. (Matched samples for proportions require other tests not covered in Math 10) | $P'_1 - P'_2$ | $N\left(0, \sqrt{\hat{p}_C\,\hat{q}_C\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\right)$ | $p'_1 - p'_2$ | 2PropZTest |

where

$$\hat{p}_C = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$

Note: Both notations $p'$ and $\hat{p}$ are used to represent sample proportions.

** These distributions are based on the assumption that in Math 10 we are testing whether the difference in parameters for the two populations is 0 (i.e. whether the parameters are equal).
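As a worked instance of the Chapter 10 case for independent samples with both population standard deviations known, the floor wax data of Example 10.6 can be run through the normal distribution with mean 0 described in the summary. This is a minimal sketch, assuming 20 floors per wax and a right-tailed alternative (wax 1 lasts longer than wax 2). The variable names are illustrative, and the result is meant as a cross-check on a calculator's 2SampZTest rather than a replacement for it.

```python
import math
from statistics import NormalDist

# Example 10.6: sample means and known population standard deviations
mean1, sigma1, n1 = 3.0, 0.33, 20   # wax 1
mean2, sigma2, n2 = 2.9, 0.36, 20   # wax 2

# Standard deviation of X1-bar minus X2-bar when sigma1 and sigma2 are known
se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

# Under H0: mu1 = mu2, the difference in sample means is N(0, se)
z = (mean1 - mean2) / se

# Right-tailed p-value for Ha: mu1 > mu2
p_value = 1 - NormalDist().cdf(z)

print(f"standard error = {se:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

Under these assumptions z is about 0.92 and the right-tailed p-value is about 0.18, which would be compared with the 5% level of significance stated in the example.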
