
Author: Arline Wheeler
Name:_________________________________ Check one:

Mon.-Wed. Section:_____ Tues.-Thurs. Section:_____

Statistics 102 Final Exam Dec. 21, 2000 4-6pm

This exam is closed book. You may have three pages of notes. You may use a calculator. You may write in pen or pencil. Show all your work. Statistical Tables are attached. There are blank pages at the end of the exam if you need more room.

Question           1    2    3    4    5    6    7    8    9   10   Total
Total Points      10   14   10    3   12    5    6    7    8   10      85
Points Received


1) The following is a list of some of the statistical methods that you have learned about in this course.
• One sample t-test
• One sample z-test
• Two sample t-test
• Two sample z-test
• Matched pairs t-test
• χ² test for independence
• χ² goodness of fit test
• One-way ANOVA
• Two-way ANOVA
• Randomized Block Design ANOVA
• Simple linear regression
• Multiple linear regression
For each of the situations described below, state the technique (from the list above) that you believe is the most applicable.

a) [2 points] A researcher for OSHA (Occupational Safety and Health Administration) wants to see whether cutbacks in enforcement of safety regulations coincided with an increase in work-related accidents. For 20 industrial plants, she has the number of accidents in 1980 and 1995.

b) [2 points] An automobile manufacturer is trying to determine if 5 different types of bumpers differ in their reaction to low-speed collisions. An experiment was conducted where 40 bumpers of each of 5 different types were installed on midsize cars, which were then driven into a wall at 5 miles per hour. The cost of repairing the damage in each case was assessed.

c) [2 points] Is marital status related to health in the elderly? To answer this question, two hundred elderly people whose marital status is known (single, married, widowed, or divorced) are rated as to whether they are in good, fair, or poor health. Is there evidence of a relationship?


d) [2 points] A researcher wants to investigate how fertilizer affects soybean yield. She divides a farm into 30 one-acre plots, and each plot receives a different amount of fertilizer. Soybeans are then planted, and the amount harvested from each plot at the end of the season is recorded.

e) [2 points] An inspector for the Atlantic City Gaming Commission suspects that a particular blackjack dealer may be cheating when he deals at expensive tables. To test her belief, she observed 500 hands each at the $100-limit table and the $3,000-limit table. For each hand she recorded whether the dealer won or lost (ignoring ties).


2) State whether each statement below is True or False. [1 point each] Write out the entire word TRUE or FALSE (not just "T" or "F").
a) _______________ The P-value is defined as the probability that the null hypothesis (H0) is true.
b) _______________ In a test of H0: µ1 = µ2 versus HA: µ1 ≠ µ2, a test with n1 = 50 and n2 = 50 will be more powerful than a test with n1 = 25 and n2 = 75, all other things being equal.
c) _______________ A two-sample t-test of H0: µ1 = µ2 versus HA: µ1 ≠ µ2 will give the same conclusions as a 1-way ANOVA with 2 groups.
d) _______________ The margin of error in a confidence interval is the width of the confidence interval.
e) _______________ The margin of error in a confidence interval accounts for all potential sources of error in the estimate.
f) _______________ If the correlation of X and Y is 0.80 and Y is regressed on X (in a simple linear regression with no other X-variables) then the coefficient, β̂1, of X will always be positive and statistically significant.
g) _______________ A correlation of zero between two quantitative variables implies that they are totally unrelated.
h) _______________ In a regression analysis, deleting an observation with high leverage will always decrease R².
i) _______________ If the units of X are changed from miles to kilometers, then the value of the coefficient, β̂1 (in a simple linear regression of Y on X), will change but the t-statistic for the test of H0: β1 = 0 and the regression R² will not change.
j) _______________ The regression below would have a smaller R² if the outlier at Y=15 and X=15 were removed.

[Figure: scatterplot of Y versus X with a fitted line labeled "Linear Fit"; both axes run from 0 to 15.]


k) _______________ The graph below shows a plot of the number of votes received by Buchanan in the 2000 Presidential election in all Florida counties (on the Y-axis) versus the number of votes received by Bush in each of the counties (on the X-axis). True or False: Palm Beach County is an outlier.

l) _______________ Again referring to the Buchanan/Bush Presidential vote graph above. True or False: In a regression of votes for Buchanan (Y) on votes for Bush (X), Palm Beach County is a high leverage point.
m) _______________ When running a linear regression, high leverage points should always be deleted.
n) _______________ When comparing two regression models, the one with the higher R² is not necessarily the better model.


3) [Each part is worth 1 point]
a) Which of the following statements (i, ii, iii) is false? Circle iv if you think all three statements i, ii, iii are true.
i) The width of a confidence interval estimate of the population mean narrows when the sample size increases.
ii) The width of a confidence interval estimate of the population mean narrows when the value of the sample mean increases.
iii) The width of a confidence interval estimate of the population mean widens when the confidence level increases.
iv) All of the above statements are true.

b) Which of the following statements (i, ii, iii) is false? Circle iv if you think all three statements i, ii, iii are true.
i) The width of a confidence interval estimate of a population proportion narrows when the sample size increases.
ii) The width of a confidence interval estimate of the population proportion narrows when the value of the sample proportion gets farther away from 0.5.
iii) The width of a confidence interval estimate of the population proportion widens when the confidence level increases.
iv) All of the above statements are true.

c) In a criminal trial, a Type I error is made when
i) A guilty defendant is acquitted.
ii) An innocent person is convicted.
iii) A guilty defendant is convicted.
iv) An innocent person is acquitted.

d) In a criminal trial, a Type II error is made when
i) A guilty defendant is acquitted.
ii) An innocent person is convicted.
iii) A guilty defendant is convicted.
iv) An innocent person is acquitted.

e) In order to determine the P-value, which of the following is not needed?
i) The level of significance.
ii) Whether the test is one or two tailed.
iii) The value of the test statistic.
iv) All of the above are needed.


f) Under which of the following circumstances is it impossible (given the statistical tools that we used in this class) to construct a confidence interval for the population mean?
i) A non-normal population with a large sample and an unknown population variance.
ii) A normal population with a large sample and a known population variance.
iii) A non-normal population with a small sample and an unknown population variance.
iv) A normal population with a small sample and an unknown population variance.
v) We can construct confidence intervals in all of the above situations.

g) The number of degrees of freedom associated with the matched pairs t-test, when 30 people have been matched into 15 pairs, is:
i) 30
ii) 15
iii) 28
iv) 14

h) One-way ANOVA is applied to three independent samples having means 10, 13, and 18, respectively. If each observation in the third sample were increased by 30, the value of the F-statistic would:
i) Increase
ii) Decrease
iii) Remain unchanged
iv) Increase by 30

i) The F-statistic in a one-way ANOVA represents the variation:
i) Between the treatments plus the variation within the treatments
ii) Within the treatments minus the variation between the treatments
iii) Between the treatments divided by the variation within the treatments
iv) Within the treatments divided by the variation between the treatments

j) Which of the following is not a required condition for one-way ANOVA?
i) The sample sizes must be equal in each group.
ii) The populations (Y) must all be normally distributed.
iii) The population variances (for Y) must be equal in each group.
iv) The samples for each group must be selected randomly and independently.


4) Fill in the blanks and short answers:
a) [1 point] The Z-value or t-value (whichever is appropriate) for a 96.6% confidence interval for a population proportion is _________________.
b) [1 point] In a one-tail test, the P-value is found to be equal to 0.068. If the test had been two-tailed, the P-value would have been: ___________________
c) [1 point] Suppose you do a survey of 200 Stat102 students from last spring. You ask each person 100 questions (every person gets asked the same 100 questions). Assume that all responses are coded so that the responses are suitable to be used as X's in a multiple regression. Suppose that the goal of this survey is to come up with a model that can predict Y = final Stat102 grade (final Stat102 grade is the 101st question you ask). Suppose that none of these X's are correlated with final Stat102 grade (you asked useless questions) and that none of the X's are correlated with each other. There will be 100 β̂'s in this model (excluding the intercept). How many of these β̂'s do you expect to have P-values less than .05? ___________________
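The three quantities in this question can be checked numerically. The sketch below is only an illustration using Python's standard library; the variable names are not part of the exam, and it assumes the usual conventions: a z-based interval for a proportion, a test statistic that fell in the direction of the one-sided alternative, and a 5% rejection rate for each coefficient when its null hypothesis is true.

```python
from statistics import NormalDist

# a) Two-sided critical value: leave (1 - 0.966) / 2 in each tail of the
#    standard normal (proportions use z rather than t).
confidence = 0.966
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# b) For a symmetric test statistic, the two-tailed P-value is twice the
#    one-tailed value (assuming the statistic fell on the hypothesized side).
p_one_tail = 0.068
p_two_tail = 2 * p_one_tail

# c) When a null hypothesis is true, its test rejects at the 5% level with
#    probability 0.05, so the expected count is 100 * 0.05.
n_coefficients = 100
expected_small_p_values = n_coefficients * 0.05

print(round(z_star, 2), round(p_two_tail, 3), expected_small_p_values)
```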


5) A controversial ballot, dubbed the "butterfly ballot", was used in Palm Beach County, Florida, in the recent Presidential election. Voters complained that the ballot was confusing, causing many Gore supporters to accidentally vote for Buchanan. A recent experiment reported in Nature (Dec. 2000, volume 408, 665-666) studied the potential confusion caused by this ballot design. In this study, 106 random shoppers in a mall were asked to cast mock votes in mock polling booths set up in the mall. Each shopper was randomly given a butterfly ballot or a more traditional single-column ballot. After they "voted", shoppers were asked several questions about the ballot, including who they thought they voted for and how confusing the ballot was (on a 7-point scale with a high score indicating more confusion).

a) The 53 shoppers who completed the butterfly ballot reported an average confusion rating of 3.52 (high scores indicate the ballot is judged to be more confusing) with a standard deviation of 1.95, and the 53 shoppers who completed the single-column ballot reported an average confusion rating of 2.30 with a standard deviation of 1.63. Researchers want to test if one ballot is more confusing than the other:

Ho: population mean confusion rating for butterfly ballot = population mean confusion rating for single-column ballot
Ha: population mean confusion rating for butterfly ballot ≠ population mean confusion rating for single-column ballot

i) [2 points] State the value of the test statistic for this test:

ii) [1 point] State the distribution of the test statistic (including degrees of freedom).

iii) [1 point] State the P-value of the test.

iv) [2 points] State the conclusion of the test.
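One way to check the arithmetic for part (a) from the quoted summary statistics is sketched below. It is an illustration, not the official solution: it uses the unpooled (Welch) standard error and degrees of freedom, and it approximates the two-sided P-value with the standard normal because the standard library has no t distribution. With equal sample sizes the pooled statistic takes the same value, on 53 + 53 - 2 = 104 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics quoted in part (a).
n1, mean1, sd1 = 53, 3.52, 1.95   # butterfly ballot
n2, mean2, sd2 = 53, 2.30, 1.63   # single-column ballot

# Per-group variance of the sample mean, then the standard error of the difference.
v1, v2 = sd1**2 / n1, sd2**2 / n2
se_diff = sqrt(v1 + v2)
t_stat = (mean1 - mean2) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Normal approximation to the two-sided P-value (an assumption; a t table
# with about 100 degrees of freedom gives a slightly larger value).
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(round(t_stat, 2), round(df_welch, 1), p_approx)
```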


b) [3 points] Out of all 106 shoppers, only 4 made mistakes. All of these mistakes occurred on the butterfly ballots. That is, 4 of the shoppers given the butterfly ballot accidentally voted for someone other than their intended candidate. Give a 95% confidence interval for the proportion of mistakes made on butterfly ballots. Show all your work.
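The interval in part (b) can be sketched with the large-sample (Wald) formula, p̂ ± z·sqrt(p̂(1 − p̂)/n). This is a minimal illustration that assumes the normal approximation is the intended method and that the 53 butterfly-ballot shoppers are the relevant sample; with only 4 mistakes the approximation is rough.

```python
from math import sqrt
from statistics import NormalDist

mistakes, n = 4, 53          # mistakes among shoppers given the butterfly ballot
p_hat = mistakes / n

# 95% two-sided critical value from the standard normal.
z = NormalDist().inv_cdf(0.975)

# Wald margin of error and interval endpoints.
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(round(p_hat, 4), round(lower, 4), round(upper, 4))
```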

c) [3 points] In this butterfly ballot experiment, estimate how many shoppers would have needed to receive the butterfly ballot in order to obtain a 99% confidence interval for the proportion of mistakes made on butterfly ballots with a 1 percentage point margin of error. Show your work.
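For part (c), the usual planning formula solves the margin of error for n: n = z²·p(1 − p)/m². The sketch below assumes the observed proportion 4/53 is used as the planning value for p; using the conservative p = 0.5 instead would give a much larger n.

```python
from math import ceil
from statistics import NormalDist

p_planning = 4 / 53      # assumed planning value: the proportion observed in the experiment
margin_target = 0.01     # 1 percentage point

# 99% two-sided critical value from the standard normal.
z = NormalDist().inv_cdf(0.995)

# Smallest whole sample size whose Wald margin of error is at most the target.
n_required = ceil(z**2 * p_planning * (1 - p_planning) / margin_target**2)

print(n_required)
```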


6) [5 points] EZ-Answers, an up-and-coming survey firm, wants to test how it can increase the response rate to its questionnaires. Believing that the inclusion of an incentive to respond may be important, it sends out 1,000 questionnaires: 200 promise to send respondents a summary of the survey results, 300 indicate that 20 respondents (selected by lottery) will be awarded gifts, and 500 are accompanied by no incentives. Of these, 80 questionnaires promising a summary, 100 questionnaires offering gifts by lottery, and 120 questionnaires offering no inducements are returned. What can you conclude from these results? (Use α = 0.05.) Justify your answer. Show all your work.
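One way to check the arithmetic here is a χ² test on the 3 × 2 table of incentive type by returned / not returned. The sketch below is an illustration under that assumed choice of method; the dictionary keys are made-up labels. It relies on the fact that, for 2 degrees of freedom, the χ² upper-tail probability is exactly exp(−x/2), so no statistical library is needed.

```python
from math import exp, log

# Counts from the question; the keys are illustrative labels.
sent = {"summary": 200, "lottery": 300, "none": 500}
returned = {"summary": 80, "lottery": 100, "none": 120}

# Pooled return rate under the null hypothesis of equal response rates.
overall_rate = sum(returned.values()) / sum(sent.values())

chi_sq = 0.0
for group in sent:
    cells = (
        (returned[group], sent[group] * overall_rate),                       # returned
        (sent[group] - returned[group], sent[group] * (1 - overall_rate)),   # not returned
    )
    for observed, expected in cells:
        chi_sq += (observed - expected) ** 2 / expected

df = (len(sent) - 1) * (2 - 1)

# Exact upper-tail probability and 5% critical value for a chi-square with 2 df.
p_value = exp(-chi_sq / 2)
critical_value = -2 * log(0.05)

print(round(chi_sq, 2), df, p_value, round(critical_value, 2))
```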


7) A linear regression analysis predicting the price of a diamond ring (in Singapore dollars) from the size of the diamond (in grams) from a sample of 48 rings is given below. A one-carat diamond weighs 0.2 grams.

Bivariate Fit of price By grams

[Figure: scatterplot of price (200 to 1100) versus grams (.1 to .35) with linear fit.]

Linear Fit
price = -259.6259 + 3721.0249 grams

Summary of Fit
RSquare                       ????????
RSquare Adj                   ????????
Root Mean Square Error        31.84052
Mean of Response              500.0833
Observations (or Sum Wgts)    48

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio     Prob > F
Model        1         2098596.0        2098596    2069.991
Error       46           46635.7           1014
C. Total    47         2145231.7

Parameter Estimates
Term         Estimate     Std Error    t Ratio    Prob>|t|
Intercept    -259.6259    17.31886     -14.99
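The blanked RSquare entries can be recovered from the Analysis of Variance sums of squares. The sketch below is an illustration only; the variable names are not from the printout. It also recomputes the Root Mean Square Error and F Ratio as a consistency check against the values shown.

```python
from math import sqrt

# Sums of squares and degrees of freedom from the Analysis of Variance table.
ss_model, ss_error, ss_total = 2098596.0, 46635.7, 2145231.7
df_model, df_error, df_total = 1, 46, 47

# R-squared: share of total variation explained by the regression.
r_squared = ss_model / ss_total

# Adjusted R-squared: compares mean squares instead of sums of squares.
r_squared_adj = 1 - (ss_error / df_error) / (ss_total / df_total)

# Consistency checks against the printed output.
rmse = sqrt(ss_error / df_error)                            # printed as 31.84052
f_ratio = (ss_model / df_model) / (ss_error / df_error)     # printed as 2069.991

print(round(r_squared, 4), round(r_squared_adj, 4), round(rmse, 3), round(f_ratio, 2))
```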

F

399.86024 169.75976 569.62000

Estimate      Std Error    t Ratio    Prob>|t|
6.0611463     2.604023       2.33     0.0244
-0.007806     0.066414      -0.12     0.9069
0.6034819     0.09656        6.25
-0.070246     0.05237       -1.34

F

0.0138 39.0600 1.7992

0.9069 F ?????

Parameter Estimates
Term                 Estimate      Std Error    t Ratio
Intercept            66.666667     2.73892       24.34
Time[2-1]            9.6666667     3.873418       2.50
Time[3-2]            -11.33333     3.873418      -2.93
Time[4-3]            -9.333333     3.873418      -2.41
Time[5-4]            2.3333333     3.873418       0.60
Time[6-5]            2.3333333     3.873418       0.60
Time[7-6]            6             3.873418       1.55
Time[8-7]            2             3.873418       0.52
Time[9-8]            -8.333333     3.873418      -2.15
Time[10-9]           2.3333333     3.873418       0.60
Time[11-10]          8.6666667     3.873418       2.24
Time[12-11]          -2.333333     3.873418      -0.60
Courier[Courier1]    2.9444444     1.118159       2.63
Courier[Courier2]    -0.055556     1.118159      -0.05

Prob>|t| F ????? ?????

Residual by Predicted Plot

[Figure: Del.time Residual (-10 to 10) versus Del.time Predicted (50 to 85).]


a) [4 points] Can we conclude at the 5% level of significance that there are differences in average delivery times among the three couriers? Justify your answer. Show all your work.

b) [4 points] Did the statistician choose a useful design? Justify your answer. Show all your work.


c) [2 points] Tukey-Kramer results are given below. Interpret these results.

