UNIVERSITY OF SASKATCHEWAN

Department of Mathematics and Statistics
STATISTICS 245 FINAL EXAMINATION
April 22, 1995

Instructor: M.J. Miket
Time: 3 hrs

A textbook, formulae sheets and a calculator are allowed.
Note: NoT shall stand for 'none of these'.

I. For the following sample of measurements, given as a stem-and-leaf display,

0 | 6 9
1 | 1 1 7
2 | 0 4
3 | 1

Select the correct value of the statistic in questions (1) - (4).

(a) 11   (b) 17   (c) 25   (d) 8.493   (e) 8.333
(f) 14   (g) 10   (h) 16.125   (i) 72.125   (j) 8.110

1. Mean
2. Median
3. Q1
4. Standard deviation
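The four statistics in Section I can be checked with a short Python sketch. It assumes the display is read with stems as tens and leaves as units, and it takes Q1 as the median of the lower half of the ordered data, which is only one of several textbook conventions. The names `data` and `lower_half_median` are illustrative.

```python
import statistics

# Data read off the stem-and-leaf display (assumption: stems are tens, leaves are units).
data = [6, 9, 11, 11, 17, 20, 24, 31]


def lower_half_median(values):
    """Q1 taken as the median of the lower half of the ordered data (an assumed convention)."""
    ordered = sorted(values)
    return statistics.median(ordered[: len(ordered) // 2])


mean = statistics.mean(data)
median = statistics.median(data)
q1 = lower_half_median(data)
sd = statistics.stdev(data)  # sample standard deviation, divisor n - 1

print(mean, median, q1, round(sd, 3))
```

Other quartile conventions (for example, interpolation between order statistics) can give a slightly different Q1 for a sample this small.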

II. Given that E and F are independent events with probabilities P(E) = .2 and P(F) = .4, determine:

5. P(E|F)
6. P(F|E)
7. P(F ∩ E)
8. P(F ∪ E)

Choices for questions (5) - (8) are:

(a) 0.08   (b) 0.14   (c) 0.26   (d) 0.33   (e) 0.20
(f) 0.42   (g) 0.52   (h) 0.64   (i) 0.77   (j) 0.40
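For independent events, the conditional, joint and union probabilities follow directly from the two marginals. The sketch below illustrates the rules with the probabilities given in Section II; the variable names are illustrative.

```python
p_e, p_f = 0.2, 0.4

# Independence: conditioning on the other event leaves the probability unchanged.
p_e_given_f = p_e
p_f_given_e = p_f

# Independence: the joint probability is the product of the marginals.
p_both = p_e * p_f

# Addition rule for the union of two events.
p_either = p_e + p_f - p_both

print(p_e_given_f, p_f_given_e, round(p_both, 2), round(p_either, 2))
```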

III. A completely unprepared student decides to guess the answer to each of the 50 questions on a STATS 245 supplementary exam.

9. The first 25 questions are of a true-false type. Find the probability that the student will pass this half of the exam (for a pass 60% of questions must be answered correctly).
10. The last 25 questions are of a multiple choice type and each question has five possible responses. For a pass, 50% of these questions must be answered correctly. Find the probability of passing this half.

Choices for questions (9) - (10) are:

(a) 0.000   (b) 0.115   (c) 0.120   (d) 0.008   (e) 0.115
(f) 0.146   (g) 0.054   (h) 0.002   (i) 0.212   (j) NoT
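Both parts of Section III are upper-tail binomial probabilities. The sketch below computes them exactly, assuming each guess is independent and that a pass means at least 15 of 25 correct (60%) in the first half and at least 13 of 25 correct (50%, rounded up) in the second. The helper `binom_tail` is a hypothetical name.

```python
from math import comb


def binom_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


n = 25
pass_true_false = binom_tail(n, 0.5, 15)       # guessing true/false: p = 1/2
pass_multiple_choice = binom_tail(n, 0.2, 13)  # guessing among five responses: p = 1/5

print(round(pass_true_false, 3), round(pass_multiple_choice, 3))
```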

IV. In the Framingham Study, serum cholesterol levels were measured for a large number of healthy males. The population was then followed for 16 years. At the end of this time, the men were divided into two groups - those who had developed coronary heart disease and those who had not. The distributions of the initial serum cholesterol levels for each group were found to be approximately normal. Among individuals who eventually developed coronary heart disease, the mean serum cholesterol level was µd = 244 mg/100ml, and the standard deviation was σd = 51 mg/100ml; for those who did not develop the disease, the mean serum cholesterol level was µnd = 219 mg/100ml, and the standard deviation was σnd = 41 mg/100ml.

11. Suppose that an initial serum cholesterol level of 260 mg/100ml or higher is used to predict future coronary heart disease. What is the probability of predicting the disease for a man who will never develop it ("false positive error")?
12. What is the probability of failing to predict coronary heart disease for a man who will develop it ("false negative error")?
13. If repeated samples of size 10 are selected from the population of males that do not develop the coronary heart disease, what proportion of the samples will have a mean serum cholesterol level greater than 260 mg/100 ml?

Choices for questions (11) - (13) are:

(a) 0.0008   (b) 0.1587   (c) 0.2821   (d) 0.3148   (e) 0.6217
(f) 0.6852   (g) 0.7179   (h) 0.8531   (i) 0.9664   (j) NoT
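The three probabilities in Section IV are normal tail areas. The sketch below evaluates them with Python's `statistics.NormalDist`, assuming the stated normal models hold exactly. Because the exact normal CDF is used rather than a z-table with z rounded to two decimals, the results may differ from the listed choices in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

cutoff = 260.0
no_disease = NormalDist(mu=219, sigma=41)  # men who never develop the disease
disease = NormalDist(mu=244, sigma=51)     # men who do develop the disease

# False positive: a disease-free man at or above the cutoff.
false_positive = 1 - no_disease.cdf(cutoff)

# False negative: a man who develops the disease but falls below the cutoff.
false_negative = disease.cdf(cutoff)

# Sampling distribution of the mean of n = 10 disease-free men.
sample_mean = NormalDist(mu=219, sigma=41 / sqrt(10))
mean_above_cutoff = 1 - sample_mean.cdf(cutoff)

print(round(false_positive, 4), round(false_negative, 4), round(mean_above_cutoff, 4))
```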


V. Questions (14) - (29) are based on the following scenarios. You might find it useful to consider questions (14) - (18) as you read the scenarios.

• Scenario 1: Returning to the Framingham Study, it is believed that the mean serum cholesterol level of the men who do not develop heart disease must be less than the mean level of men who do. A sample of size 15 from the population of men who do not go on to develop coronary heart disease shows x̄ = 219 mg/100ml and s = 41 mg/100ml. Can it be concluded that the true population mean for this group of men is 244 mg/100ml at the α = .05 level of significance?

• Scenario 2: A STATS 245 final examination, when set up by a certain instructor, consists of 75 multiple choice questions; each question with five possible responses. You want to establish that Alexander Joseph performs better on the exam than a person who guesses on every question. If Alexander Joseph obtains 22 correct, what is your conclusion at α = .05?

• Scenario 3: A commercial farmer harvests his entire field of beans at one time. Therefore he would like to plant a variety of green beans that mature all at one time (i.e. small standard deviation between maturity times of individual plants). A seed company has developed a new hybrid strain of green beans that it believes to be better for the commercial farmer. The maturity time of the standard variety has a mean of 50 days and a standard deviation of 2.1 days. A random sample of 30 plants of the new hybrid showed a standard deviation of 1.65 days. Is the new variety better at the 0.05 level of significance?

• Scenario 4: It would be interesting to determine whether the advice given by a physician during a routine physical examination is effective in encouraging patients to stop smoking. In a study of current smokers, one group of patients was given a brief talk about the hazards of smoking and was encouraged to quit. A second group received no advice pertaining to smoking. All patients were given a follow-up exam. In a sample of 114 patients who had received the advice, 11 reported that they had quit smoking; in a sample of 96 patients who had not, 7 quit smoking.

• Scenario 5: A study was conducted to investigate whether oat bran cereal helps to lower serum cholesterol in hypercholesterolemic males. A random sample of such individuals was placed on a diet which included either oat bran or corn flakes; after two weeks, their low-density lipoprotein (LDL) cholesterol levels were recorded. Each man was then switched to the alternative diet. After a second two week period, the LDL cholesterol level of each individual was again recorded. The data from this study are provided below. (population 1=corn flakes, population 2=oat bran)

Subject   corn flakes   oat bran   difference
1             4.61        3.84        0.77
2             6.42        5.57        0.85
3             5.40        5.85       -0.45
4             4.54        4.80       -0.26
5             3.98        3.68        0.30
6             3.82        2.96        0.86
7             5.01        4.41        0.60
8             4.34        3.72        0.62
9             3.80        3.49        0.31
10            4.56        3.84        0.72
11            5.35        5.26        0.09
12            3.89        3.73        0.16
13            2.25        1.84        0.41
14            4.24        4.14        0.10
Σx           62.21       57.13        5.08
Σx²         288.64      247.65        3.99

• For the five scenarios described above, choose the appropriate hypotheses to be tested from the list given below:

14. Hypotheses to be tested for Scenario 1.
15. Hypotheses to be tested for Scenario 2.
16. Hypotheses to be tested for Scenario 3.
17. Hypotheses to be tested for Scenario 4.
18. Hypotheses to be tested for Scenario 5.

Choices for (14) - (18) are:

(a) H0: σ² = 4.41, H1: σ² ≠ 4.41
(b) H0: p1 = p2, H1: p1 < p2
(c) H0: µ = 244, H1: µ < 244
(d) H0: µD = 0, H1: µD ≠ 0
(e) H0: p1 − p2 = 0, H1: p1 − p2 > 0
(f) H0: p = .2, H1: p ≠ .2
(g) H0: σ² = 4.41, H1: σ² < 4.41
(h) H0: p = .2, H1: p > .2
(i) H0: µ = 244, H1: µ ≠ 244
(j) H0: µD = 0, H1: µD > 0

19. For which of the above scenarios would you apply the z-test?
20. For which of the above scenarios would you apply the t-test?
21. For which of the above scenarios would you apply the χ²-test?

Choices for questions (19), (20) and (21):

(a) for scenario 4 only
(b) for scenarios 1 and 3
(c) for scenarios 1 and 5
(d) for scenarios 2 and 4
(e) for scenarios 2 and 5
(f) for scenarios 3, 4 and 5
(g) for scenarios 1, 2 and 5
(h) for scenario 2 only
(i) for scenario 3 only
(j) NoT

22. For Scenario 1, which of the following statements are correct or must be assumed?

(i) the population is normal
(ii) the Central Limit Theorem holds
(iii) the standard deviation σ is known

(a) (i) only   (b) (ii) only   (c) (iii) only
(d) (i) + (ii) only   (e) (i) + (iii) only   (f) (ii) + (iii) only
(g) all three   (h) NoT

23. If you carry out the hypothesis test at the 0.05 significance level, your conclusion for Scenario 1 is

(b) the test is not significant at 0.05 level
(c) the sample is too small to conclude anything
(d) the test is significant at 0.05 level
(e) NoT

24. For Scenario 2, the rejection region at the 5% level of significance is: (Note: TS denotes "test statistic")

(a) TS ≤ 1.96
(b) TS ≥ 1.96
(c) TS ≤ −1.96 or TS ≥ 1.96
(d) TS ≤ 1.330
(e) TS ≥ 1.330
(f) TS ≤ −1.734 or TS ≥ 1.734
(g) TS ≤ 1.645
(h) TS ≥ 1.645
(i) TS ≤ −1.645 or TS ≥ 1.645
(j) NoT

25. For Scenario 2, the P-value is

(a) 0.217   (b) 0.365   (c) 0.250   (d) 0.223   (e) 0.022
(f) 0.694   (g) 0.688   (h) 0.063   (i) 0.775   (j) NoT
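The test statistics for Scenarios 1 and 2 can be sketched as follows. This assumes a one-sample t statistic against µ = 244 for Scenario 1 and a large-sample z statistic against p = 0.2 (pure guessing among five responses) with an upper-tail P-value for Scenario 2. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Scenario 1: one-sample t statistic with n = 15, sample mean 219, s = 41, H0: mu = 244.
t_stat = (219 - 244) / (41 / sqrt(15))

# Scenario 2: z statistic for a single proportion, H0: p = 0.2.
n, correct, p0 = 75, 22, 0.2
p_hat = correct / n
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Upper-tail P-value, since the alternative is "better than guessing".
p_value = 1 - NormalDist().cdf(z_stat)

print(round(t_stat, 3), round(z_stat, 3), round(p_value, 3))
```

The t statistic would be compared with a t table on 14 degrees of freedom, which the Python standard library does not provide.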

26. For Scenario 3, the numerical value of the test statistic is:

(a) 3.26   (b) 9.65   (c) 14.50   (d) 14.22   (e) 17.71
(f) 87.64   (g) 10.88   (h) 17.90   (i) 3.75   (j) NoT

27. Based on the observed test statistic, the correct conclusions for Scenario 3 are:

(a) reject at 5% level, reject at 10% level
(b) retain at 5% level, retain at 10% level
(c) retain at 5% level, reject at 10% level
(d) reject at 5% level, retain at 10% level
(e) NoT
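For Scenario 3, the usual statistic for a test about a single variance is (n − 1)s²/σ0², referred to a χ² distribution with n − 1 degrees of freedom. A minimal sketch, assuming the maturity times are normally distributed and the variable names are illustrative:

```python
n = 30          # plants sampled from the new hybrid
s = 1.65        # sample standard deviation (days)
sigma0 = 2.1    # standard deviation of the standard variety (days)

chi2_stat = (n - 1) * s**2 / sigma0**2
df = n - 1

print(round(chi2_stat, 2), df)
```

Because the alternative is a smaller variance, the statistic would be compared with lower-tail χ² table values on 29 degrees of freedom.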

28. For Scenario 4, a 98% confidence interval for the difference of proportions is:

(a) (-0.0142, 0.1218)   (b) (-0.0112, 0.1182)   (c) (-0.0657, 0.1129)   (d) (-0.0104, 0.1256)
(e) (-0.0118, 0.1182)   (f) (-0.0148, 0.1212)   (g) (-0.0088, 0.1152)   (h) (-0.0082, 0.1158)
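A large-sample interval for a difference of two proportions can be sketched as below. It assumes the unpooled standard error and an exact normal quantile; a table value such as 2.33 shifts the endpoints in the fourth decimal place. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n1, quit1 = 114, 11  # patients who received the advice
n2, quit2 = 96, 7    # patients who received no advice
p1, p2 = quit1 / n1, quit2 / n2

# A two-sided 98% interval leaves 1% in each tail.
z = NormalDist().inv_cdf(0.99)

# Unpooled standard error of p1 - p2.
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

diff = p1 - p2
ci_low, ci_high = diff - z * se, diff + z * se

print(round(ci_low, 4), round(ci_high, 4))
```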

29. The numerical value of the test statistic for scenario 5 is:

(a) 1.207   (b) 2.432   (c) 3.285   (d) 4.237   (e) 5.331
(f) 6.178   (g) 7.317   (h) 8.024   (i) 9.381   (j) NoT
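Since each man in Scenario 5 is measured on both diets, one natural analysis is a paired t statistic on the within-subject differences. The sketch below assumes that approach and recomputes the differences from the tabulated LDL values; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

corn = [4.61, 6.42, 5.40, 4.54, 3.98, 3.82, 5.01,
        4.34, 3.80, 4.56, 5.35, 3.89, 2.25, 4.24]
oat = [3.84, 5.57, 5.85, 4.80, 3.68, 2.96, 4.41,
       3.72, 3.49, 3.84, 5.26, 3.73, 1.84, 4.14]

# Within-subject differences: corn flakes minus oat bran.
diffs = [c - o for c, o in zip(corn, oat)]

d_bar = mean(diffs)
s_d = stdev(diffs)  # sample standard deviation of the differences
t_stat = d_bar / (s_d / sqrt(len(diffs)))

print(round(d_bar, 4), round(s_d, 4), round(t_stat, 3))
```

A value computed by hand from the rounded column totals may differ slightly from this result.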

VI. Three different methods of class evaluation for STATS 245 were investigated to determine whether they influence learning. The methods differed in the number of tests, homework and computer assignments. The same text and instructor were used in all three methods. The response variable was percentage of test points obtained by each student on the final exam. The actual data got erased by accident, but some summary quantities are

n1 = 29, x̄1 = 74.7, s1 = 12.5
n2 = 18, x̄2 = 78.5, s2 = 12.6
n3 = 15, x̄3 = 79.5, s3 = 8.0

Also parts of the ANOVA table got erased, but it is actually possible to fill in blank spots. The missing entries are denoted by question numbers in parentheses.

Source      D.F.   S.S.     M.S.    F
Treatment   (30)   (31)     143.2   1.06
Error       (32)   7970.6   (33)
Total       (34)   8257

• Work out the missing entries (30) - (34), and then select your answers from the following (rounded) choices.

(a) 61   (b) 5   (c) 286   (d) 59
(e) 135   (f) 2   (g) 80   (h) 30   (i) NoT

35. At the 0.05 significance level, the rejection region is given by

(a) TS ≥ 2.61   (b) TS ≥ 5.70   (c) TS ≥ 3.15   (d) TS ≥ 2.49
(e) TS ≥ 3.49   (f) TS ≥ 8.74   (g) TS ≥ 3.29   (h) TS ≥ 4.16

36. Based on the observed F statistic, your conclusion is best described as

(a) reject H0: µ1 = µ2 = µ3
(b) reject H0: µ1 = µ2
(c) the test is not significant
(d) the test is significant
(e) both (a) and (d)
(f) both (b) and (d)
(g) NoT
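A one-way ANOVA table can be rebuilt from group sizes, means and standard deviations alone. The sketch below does this for the three summary lines in Section VI. Because the summaries are rounded, the sums of squares it produces agree with the printed table entries only approximately. The variable names are illustrative.

```python
# (n_i, sample mean, sample standard deviation) for the three evaluation methods.
groups = [(29, 74.7, 12.5), (18, 78.5, 12.6), (15, 79.5, 8.0)]

k = len(groups)
n_total = sum(n for n, _, _ in groups)
grand_mean = sum(n * m for n, m, _ in groups) / n_total

# Between-group (treatment) and within-group (error) sums of squares.
ss_treat = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
ss_error = sum((n - 1) * s**2 for n, _, s in groups)

df_treat, df_error = k - 1, n_total - k
df_total = n_total - 1

ms_treat = ss_treat / df_treat
ms_error = ss_error / df_error
f_stat = ms_treat / ms_error

print(df_treat, df_error, df_total)
print(round(ss_treat, 1), round(ss_error, 1), round(ms_error, 1), round(f_stat, 2))
```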

VII. The tumor-producing potential of a new drug was tested. One hundred rats were used as a control group, 100 were exposed to a low dose of a new drug, and 100 were exposed to a high dose. The results were

             0 tumors   1 or more   Total
control         93          7        100
low dose        89         11        100
high dose       86         14        100
Total          268         32        300

• Is there sufficient evidence to conclude that the dosage does, in fact, affect the occurrence of tumors using α = .05?

37. What is the number of degrees of freedom associated with this contingency table?

(a) 12   (b) 9   (c) 16   (d) 6   (e) 11
(f) 20   (g) 3   (h) 13   (i) 2   (j) NoT

38. At the .05 level of significance, what is the critical region for the test statistic?

(a) TS ≥ 2.59   (b) TS ≥ 5.99   (c) TS ≤ 4.45   (d) TS ≤ 0.10
(e) TS ≤ 2.59   (f) TS ≤ 5.99   (g) TS ≥ 4.45   (h) TS ≥ 0.10

39. What is the expected number of rats with no tumors after having taken a high dose?
40. What is the contribution of the rats from the preceding question to the overall measure of discrepancy between the observed and expected frequencies (that is, to the numerical value of the test statistic)?

Choices for question (39) and (40) are:

(a) 11.07   (b) 89.33   (c) 0.23   (d) 14.79   (e) 4.45
(f) 7.25   (g) 0.12   (h) 0.48   (i) 0.33   (j) NoT
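Expected counts and cell contributions for the tumor table follow the usual χ² recipe: expected = (row total × column total) / grand total, and contribution = (observed − expected)² / expected. A minimal sketch with illustrative variable names, rows ordered control, low dose, high dose and columns ordered 0 tumors, 1 or more:

```python
observed = [
    [93, 7],    # control
    [89, 11],   # low dose
    [86, 14],   # high dose
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence of dose and tumor occurrence.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Per-cell contribution to the chi-square statistic.
contrib = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

chi2_stat = sum(sum(row) for row in contrib)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(df, round(expected[2][0], 2), round(contrib[2][0], 2), round(chi2_stat, 2))
```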


VIII. Crickets make a chirping sound with their wing covers. Scientists have recognized that there is a linear relationship between the frequency of chirps and the temperature. The table below contains measurements for the striped ground cricket:

  y      x
20.0   88.6
16.0   71.6
19.8   93.3
18.4   84.3
17.1   80.6
15.5   75.2
14.7   69.7
17.1   82.0
15.4   69.4
16.3   83.3
15.0   79.6
17.2   82.6
16.0   80.6
17.0   83.5
14.4   76.3

• Here, y is chirps per second and x is the temperature in degrees Fahrenheit. The data are shown on the following scatter plot.

[Scatter plot of y against x]

• Summary statistics are (all sums over i = 1, ..., 15):

n = 15
Σ xi = 1200.60
Σ yi = 249.90
Σ xi² = 96,725.86
Σ yi² = 4203.81
Σ xi yi = 20,135.80

MINITAB output for this set of data is also enclosed to help answer questions.

41. What proportion of the total variability in y is "explained" by the linear regression?

(a) 0.895   (b) 0.702   (c) 0.781   (d) 0.682   (e) 1.000
(f) 0.873   (g) 0.590   (h) 0.465   (i) 0.911   (j) NoT

42. What is the value of the correlation coefficient?

(a) 0.465   (b) -0.781   (c) -0.682   (d) 0.838   (e) -0.884
(f) -0.465   (g) 0.781   (h) 0.682   (i) -0.838   (j) 0.884

43. If you test H0: β0 = 0 against H1: β0 ≠ 0 at 0.05 significance level, your conclusion will be best described as:

(a) reject H0
(b) do not reject H0
(c) the test is not significant
(d) the test is significant
(e) both (a) and (d)
(f) both (b) and (c)
(g) NoT

44. What is the mean predicted frequency when the temperature is 77 degrees?

(a) 14.07   (b) 15.98   (c) 5.02   (d) 15.42   (e) 15.11
(f) 13.31   (g) 6.13   (h) 1.57   (i) NoT

45. Find the 95% confidence interval for this prediction. (Pick the closest answer.)

(a) (13, 18)   (b) (11, 19)   (c) (6, 17)   (d) (1, 14)
(e) (15, 17)   (f) (14, 18)   (g) (4, 16)   (h) (6, 12)
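The least-squares quantities for the cricket data can be obtained from the five summary sums alone. The sketch below assumes a simple linear regression of y on x and uses the sums exactly as printed, so results may differ in the second decimal place from the enclosed MINITAB output (not reproduced here). The function `predict` is a hypothetical helper.

```python
from math import sqrt

n = 15
sum_x, sum_y = 1200.60, 249.90
sum_xx, sum_yy, sum_xy = 96725.86, 4203.81, 20135.80

# Corrected sums of squares and cross-products.
s_xx = sum_xx - sum_x**2 / n
s_yy = sum_yy - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n

r = s_xy / sqrt(s_xx * s_yy)  # sample correlation coefficient
r_squared = r**2              # proportion of variability in y explained by the line


def predict(temp_f):
    """Fitted mean chirps per second at a temperature in degrees Fahrenheit."""
    return intercept + slope * temp_f


print(round(r_squared, 3), round(r, 3), round(predict(77), 2))
```

An interval around the fitted value additionally needs the residual standard error and a t table value on n − 2 = 13 degrees of freedom.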

THE END
