9. E (Similar to 2007 Q21) I. False. Since the hypotheses are about the proportion who said yes, $np = 55(0.6) = 33$ and $n(1-p) = 55(0.4) = 22$ are both greater than 10.

II. True. The p-value is $P\left(\hat p \ge \tfrac{40}{55}\right) = P\left(z \ge \dfrac{0.7273 - 0.6}{\sqrt{0.6(0.4)/55}}\right) = 0.0270$.

III. True. Since the null hypothesis is rejected, a Type I error is made if, in fact, the true proportion of citizens who support the proposal to build the water park is equal to or less than 60 percent.
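As a check on statement II, the following minimal Python sketch (standard library only; `one_prop_z_upper` is a hypothetical helper name, not from the source) recomputes the upper-tailed p-value, assuming, as the solution does, that 40 of the 55 responses were yes and the null value is 0.6.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_upper(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Upper-tailed one-sample z-test for a proportion (H_a: p > p0)."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)        # standard error computed under H_0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)   # area to the right of z
    return z, p_value


z, p_value = one_prop_z_upper(40, 55, 0.6)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The standard error uses the null value 0.6 rather than $\hat p$, matching the formula in statement II.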

10. D (Similar to 1997 Q33) $2.326\sqrt{\dfrac{0.5(1-0.5)}{n}} \le 0.04$, so $n \ge 845.356$. Therefore, of the answer choices, the smallest sample size that will guarantee a margin of error of at most 0.04 is 849.
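The inequality above can be solved for $n$ directly. This is a minimal sketch (the helper name `min_sample_size` is illustrative) that takes the critical value 2.326 and the margin 0.04 as given and uses the conservative guess $p = 0.5$, which maximizes $p(1-p)$.

```python
from math import ceil


def min_sample_size(z_star: float, margin: float, p: float = 0.5) -> tuple[float, int]:
    """Smallest n with z_star * sqrt(p(1-p)/n) <= margin.

    p = 0.5 maximizes p(1-p), so it gives the conservative (largest) n.
    """
    n_exact = (z_star / margin) ** 2 * p * (1 - p)
    return n_exact, ceil(n_exact)


n_exact, n_min = min_sample_size(2.326, 0.04)
print(f"n >= {n_exact:.3f}; smallest integer n = {n_min}")
```

The smallest integer satisfying the bound is 846; 849 is the answer because it is the smallest of the offered choices that meets the bound.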

Free Response:

11. Similar to 2005 Q4

Solution

Part (a): State a correct pair of hypotheses. Let p be the proportion of bottles with a code for a free music download.
$H_0: p = 0.25$
$H_a: p < 0.25$

Part (b): Identify a correct test (by name or formula) and check appropriate conditions. One-sample z-test for a proportion

OR

$z = \dfrac{\hat p - p}{\sqrt{p(1-p)/n}}$

Conditions:
1. $np = 82(0.25) = 20.5 \ge 10$ and $n(1-p) = 82(0.75) = 61.5 \ge 10$.
2. It is reasonable to assume that the company produces more than $82(10) = 820$ bottles of this tea ($N \ge 10n$).
3. Random sample: the bottles were sampled at random, as stated in the stem.

Part (c): Use correct mechanics and calculations, and provide the p-value (or rejection region). The sample proportion is $\hat p = \tfrac{17}{82} = 0.2073$. The test statistic is
$z = \dfrac{0.2073 - 0.25}{\sqrt{0.25(1-0.25)/82}} = -0.893$,
and the p-value is $P(z \le -0.893) = 0.1860$.
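The part (c) mechanics can be reproduced with a short standard-library sketch. It assumes only the numbers in the solution (17 of 82 bottles, null value 0.25) and also shows the common error flagged in the scoring notes for part (c): putting $\hat p$ instead of the null value into the standard error.

```python
from math import sqrt
from statistics import NormalDist

n, x, p0 = 82, 17, 0.25   # sample size, bottles with a code, claimed proportion
p_hat = x / n

# Correct test: the standard error uses the null value p0
se_null = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_null
p_value = NormalDist().cdf(z)            # lower tail, since H_a: p < 0.25

# Common error: using p_hat in the standard error
se_hat = sqrt(p_hat * (1 - p_hat) / n)
z_wrong = (p_hat - p0) / se_hat
p_value_wrong = NormalDist().cdf(z_wrong)

print(f"p_hat = {p_hat:.4f}")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print(f"(p_hat in SE) z = {z_wrong:.3f}, p-value = {p_value_wrong:.4f}")
# Wrong-tail p-values that the rubric treats as incorrect for a lower-tail test
print(f"two-tailed = {2 * p_value:.3f}, upper tail = {1 - p_value:.3f}")
```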

Part (d): State a correct conclusion, using the result of the statistical test, in context of the problem. Since the p-value, 0.186, is larger than any reasonable significance level (e.g., $\alpha = 0.05$), we cannot reject the company’s claim. That is, we do not have statistically significant evidence to support the student’s belief that the proportion of tea bottles with a free music download on the cap is less than 25 percent.

Scoring

The question is divided into four parts. Each part is scored as essentially correct (E) or incorrect (I).

Part (a) is essentially correct (E) if the student states a correct pair of hypotheses.
Notes:
1. Since the proportion was defined in the stem, standard notation for the proportion ($p$ or $\pi$) need not be defined in the hypotheses.
2. Nonstandard notation must be defined correctly.
3. A two-sided alternative is incorrect for this part.

Part (b) is essentially correct (E) if the student identifies a correct test (by name or formula) and checks for appropriate conditions.
Notes:
1. $np \ge 5$ and $n(1-p) \ge 5$ are OK as long as appropriate values are used for $n$ and $p$.
2. Since students cannot check the actual population size, they do not need to mention it.
3. The stem of the problem indicates this is a random sample, so it (or a discussion of independence) does not need to be repeated in the solution.

Part (c) is essentially correct (E) if no more than one of the following errors is present in the student’s work:
• Undefined or nonstandard notation is used; OR
• The correct z-value, $-0.89$, is given with no setup for the calculation; OR
• The incorrect z-value $-0.95$ is given because $\hat p$ was used in the calculation of the standard error. For this incorrect z-value, the p-value is 0.1702; OR
• An incorrect z-value is calculated because of a minor arithmetic error.

Part (c) is incorrect (I) if:
• Inference for a lower-tail alternative is based on either the two-tailed p-value (about 0.372) or the upper-tail p-value (about 0.814); OR
• An unsupported z-value other than $-0.89$ or $-0.95$ is given; OR
• The correct z-value, $-0.89$, is given but equated to an incorrect formula.

Notes:
1. Students using a rejection-region approach should have critical values appropriate for a lower-tail test; e.g., for $\alpha = 0.05$ the rejection region is $z \le -1.645$.
2. Other possible correct mechanics include:
• Exact binomial: $X \sim \text{Binomial}(82, 0.25)$. The exact p-value is $P(X \le 17) = 0.2248$.
• Normal approximation to the binomial (with or without a continuity correction): $X$ is approximately $\text{Normal}(20.5, 3.9211)$. The approximate p-value using the continuity correction is $P\left(Z \le \dfrac{17 + 0.5 - 20.5}{3.9211}\right) = P(Z \le -0.7651) = 0.2221$.
• Confidence interval approach, provided there is a reasonable interpretation tied to a significance level. For example, if $\alpha = 0.05$, then $p = 0.25$ is within the 95% upper confidence bound $(0, 0.2810)$ or the two-tailed 90% confidence interval $(0.1337, 0.2810)$, so $H_0$ is not rejected.

Part (d) is essentially correct (E) if the student states a correct conclusion in the context of the problem, using the result of the statistical test.
Notes:
1. If both an $\alpha$ and a p-value (or critical value) are given, the linkage is implied.
2. If no $\alpha$ is given, the solution must be explicit about the linkage by giving a correct interpretation of the p-value or explaining how the conclusion follows from the p-value.
3. If the p-value in part (c) is incorrect but the conclusion is consistent with the computed p-value, part (d) can be considered essentially correct (E).
4. If a student accepts the null hypothesis and concludes the proportion really is 0.25, this part is incorrect (I).

Each essentially correct (E) response counts as 1 point; each partially correct (P) response counts as ½ point.

4 Complete Response
3 Substantial Response
2 Developing Response
1 Minimal Response

Note: If a response is in between two scores (for example, 2½ points), use a holistic approach to determine whether to score up or down depending on the strength of the response and communication.
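The alternative mechanics accepted in the part (c) notes for item 11 can be checked with the standard library alone. This sketch assumes only the values in those notes ($n = 82$, $p = 0.25$, 17 bottles with a code) and computes the exact binomial p-value, the normal approximation with and without a continuity correction, and the 95% upper confidence bound.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p0, x = 82, 0.25, 17

# Exact binomial p-value: P(X <= 17) with X ~ Binomial(82, 0.25)
exact = sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x + 1))

# Normal approximation to the binomial
mu = n * p0                        # 20.5
sigma = sqrt(n * p0 * (1 - p0))    # about 3.9211
approx = NormalDist(mu, sigma)
no_cc = approx.cdf(x)              # same value as the z-test p-value
with_cc = approx.cdf(x + 0.5)      # continuity correction: P(X <= 17) ~ P(Y <= 17.5)

# Confidence-bound approach: the interval's standard error uses p_hat
p_hat = x / n
upper_bound = p_hat + 1.645 * sqrt(p_hat * (1 - p_hat) / n)

print(f"exact = {exact:.4f}, no cc = {no_cc:.4f}, with cc = {with_cc:.4f}")
print(f"95% upper bound = {upper_bound:.4f}; contains 0.25: {p0 <= upper_bound}")
```

All of these lead to the same decision: none of the p-values is small, and 0.25 lies inside the bound.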

12. 2011B Q5

Intent of Question
The primary goals of this question were to assess students’ ability to (1) identify and check appropriate conditions for inference; (2) identify and carry out the appropriate inference procedure; and (3) determine the sample size necessary to meet certain specifications in planning a study.

Solution

Part (a):
Step 1: Identifies the appropriate confidence interval by name or formula and checks appropriate conditions.
One-sample (or large-sample) interval for $p$ (the proportion of the vaccine-eligible people in the United States who actually got vaccinated), or $\hat p \pm z^* \sqrt{\dfrac{\hat p(1-\hat p)}{n}}$.
Conditions:
1. Random sample
2. Large sample ($n\hat p \ge 10$ and $n(1-\hat p) \ge 10$)

The stem of the problem indicates that a random sample of vaccine-eligible people was surveyed. The number of successes (978 vaccine-eligible people who received the vaccine) and failures (1,372 vaccine-eligible people who did not receive the vaccine) are both much larger than 10, so the large-sample interval procedure can be used.

Step 2: Correct mechanics
$\hat p = \dfrac{978}{2{,}350} = 0.41617$
$0.41617 \pm 2.57583\sqrt{\dfrac{0.41617(1-0.41617)}{2{,}350}} = 0.41617 \pm 2.57583(0.01017) = 0.41617 \pm 0.02619 = (0.38998, 0.44236)$

Step 3: Interpretation
Based on the sample, we are 99 percent confident that the proportion of the vaccine-eligible people in the United States who actually got vaccinated is between 0.39 and 0.44. Because 0.45 is not in the 99 percent confidence interval, it is not a plausible value for the population proportion of vaccine-eligible people who received the vaccine. In other words, the confidence interval is inconsistent with the belief that 45 percent of those eligible got vaccinated.
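Steps 2 and 3 can be reproduced with a minimal standard-library sketch. The helper name `prop_z_interval` is illustrative; the critical value comes from `statistics.NormalDist.inv_cdf`, which gives 2.57583 for 99 percent confidence, matching the value used above.

```python
from math import sqrt
from statistics import NormalDist


def prop_z_interval(successes: int, n: int, confidence: float) -> tuple[float, float]:
    """Large-sample z-interval for a population proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)   # 2.57583 for 99%
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


lo, hi = prop_z_interval(978, 2350, 0.99)
print(f"99% CI: ({lo:.5f}, {hi:.5f})")
print("Is 0.45 a plausible value?", lo <= 0.45 <= hi)
```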

Part (b): The sample-size calculation uses 0.5 as the value of the proportion in order to provide the minimum required sample size to guarantee that the resulting interval will have a margin of error no larger than 0.02.
$n \ge \dfrac{(2.576)^2(0.5)(0.5)}{(0.02)^2} = \left(\dfrac{2.576}{2(0.02)}\right)^2 = 4{,}147.36$
Thus a sample of at least 4,148 vaccine-eligible people should be taken in Canada.

Scoring
Each step in part (a) is scored as essentially correct (E), partially correct (P), or incorrect (I), and part (b) is scored as essentially correct (E), partially correct (P), or incorrect (I).

Step 1 of part (a) is scored as follows:
Essentially correct (E) if the one-sample z-interval for a proportion is identified (either by name or formula) AND both conditions (random sampling and sample size) are stated and checked.
Partially correct (P) if the response identifies the correct procedure BUT adequately addresses only one of the two required conditions, OR if the response does not identify the correct procedure BUT adequately addresses both conditions.
Incorrect (I) for any of the following:
• The response identifies the correct procedure BUT does not adequately address either required condition.
• The correct procedure is not identified AND at most one of the required conditions is adequately addressed.
• An incorrect procedure is identified.
Notes
• If the formula is of the correct form, even if incorrect numbers appear in it, then the procedure may be considered correctly identified.
• Stating only that $n\hat p$ and $n(1-\hat p)$ are both greater than or equal to 10 is only a statement of the sample-size condition and is not sufficient for checking the condition. The response must use specific values from the question to check the condition.
• If a response includes additional inappropriate conditions, such as $n \ge 30$ or requiring a normal population, then the response can earn no more than a P for this step. However, stating and checking a condition about the size of the sample relative to the size of the population is not required but is also not inappropriate.
• Any statement of hypotheses, definitions of parameters, statements of populations, etc. should be considered extraneous.
However, if these statements are included and incorrect, this should be considered poor communication for holistic scoring.

Step 2 of part (a) is scored as follows:
Essentially correct (E) if a 99 percent confidence interval is correctly computed.
Partially correct (P) for any of the following:
• A correct method (confidence interval for a proportion) is used, BUT an incorrect critical z-value or a t-value is used.
• 0.45 is used for the value of $\hat p$.
• There are errors in the calculation of the interval (unless such errors follow from an incorrect procedure in step 1).
Incorrect (I) if an incorrect method is used, such as a t-interval for a population mean, OR if the resulting interval is unreasonable, such as an interval with integer endpoints.

Step 3 of part (a) is scored as follows:
Essentially correct (E) if the response notes that 0.45 is not in the 99 percent confidence interval AND states that this is evidence against the belief that 45 percent of vaccine-eligible people had received the flu vaccine.
Partially correct (P) if a reasonable statement is made, in context, about the belief that 45 percent of vaccine-eligible people received the vaccine, but there is no clear connection to the confidence interval; OR a clear connection to the confidence interval is made, but the response includes one or both of the following two omissions:
1. The response is not in context.
2. The response does not mention the confidence level of 99 percent.
Incorrect (I) if the response fails to meet the criteria for E or P.

Part (b) is scored as follows:
Essentially correct (E) if an appropriate sample size is calculated and supporting work is shown.
Partially correct (P) if supporting work is shown, BUT the response includes one or both of the following errors:
1. 0.41617 (the sample proportion) or 0.45 is used instead of 0.5.
2. An incorrect critical z-value is used (unless the same incorrect value was used in part (a)).
Incorrect (I) if the response fails to meet the criteria for E or P.
Notes
• In this situation, the formula for margin of error used to compute sample size is only an approximation of the margin of error. Because of this, we will not insist that the computed sample size be rounded up; that is, 4,147 is scored as E, as long as supporting work is shown.
• If the critical value of 2.575 is used, then the sample size should be $n \ge 4{,}144.14$, or $n = 4{,}145$ (or 4,144).
• If the final recommended sample size is not an integer, then the response can earn no more than a P.
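The notes above turn on how sensitive the required sample size is to the rounding of the critical value. A minimal sketch (the helper name `conservative_sample_size` is illustrative) computes the conservative $n$ for a margin of 0.02 with both rounded critical values mentioned in the rubric.

```python
from math import ceil


def conservative_sample_size(z_star: float, margin: float) -> float:
    """n such that z_star * sqrt(0.5 * 0.5 / n) equals the target margin."""
    return (z_star / margin) ** 2 * 0.5 * 0.5


results = {}
for z_star in (2.576, 2.575):
    n_exact = conservative_sample_size(z_star, 0.02)
    results[z_star] = (n_exact, ceil(n_exact))
    print(f"z* = {z_star}: n >= {n_exact:.2f}, rounded up: {ceil(n_exact)}")
```

A change of 0.001 in $z^*$ moves the answer by about three people, which is why the rubric accepts either rounding with supporting work.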

Each essentially correct (E) part counts as 1 point. Each partially correct (P) part counts as ½ point.

4 Complete Response
3 Substantial Response
2 Developing Response
1 Minimal Response

13. Similar to 2009 Q5

Solution:
Part (a): The p-value of 0.2141 measures the chance of observing a sample proportion of unbroken windows as low as or lower than $\hat p = 0.83$, assuming the true proportion that won’t break is 0.85.
Part (b): Since the p-value, 0.2141, is greater than 0.05, we cannot reject the null hypothesis. There is not significant evidence that the success rate of the new windows is less than 85%.
Part (c): Since we did not reject the null hypothesis, and it could be false, we might have made a Type II error. One consequence of this error would be that these windows could be sold as being able to resist high wind speeds when in fact they cannot, and many customers could be hurt or suffer property loss or damage.

Scoring:
Parts (a), (b), and (c) are scored as essentially correct (E), partially correct (P), or incorrect (I).

Part (a) is scored as follows: A correct interpretation must include the following three components:
• Correct probability phrase (e.g., “The p-value of 0.2141 measures the chance of…”) that includes “as small as” (or something similar).
• Correct conditional phrase (e.g., “assuming the true proportion that won’t break is 0.85”).
• Correct context.
Essentially correct (E) if the response includes all three components.
Partially correct (P) if the response includes the first component and one of the other two components; OR if the probability phrase is complete except for the omission of the words “as small as” (or something similar) and the other two components are included.
Incorrect (I) if the response includes no more than one component.

Part (b) is scored as follows:
Essentially correct (E) if a correct conclusion (failure to reject $H_0$) is provided in context with appropriate linkage to the p-value.
Partially correct (P) if a correct conclusion is provided but either the context or linkage is missing; OR the student “accepts $H_0$” (or something similar) and provides both context and linkage.
Incorrect (I) if the student rejects $H_0$ OR the student provides neither context nor linkage.

Part (c) is scored as follows: A correct response must include the following two components:
• The type of error named is consistent with the conclusion in part (b).
• A consequence is provided (in context) that is consistent with the conclusion in part (b) and is specific with regard to the success rate of the windows. The consequence must address how the lower success rate affects consumers.
Essentially correct (E) if the response includes both components.
Partially correct (P) if the response includes only the consequence component, OR the type of error named is consistent with the conclusion in part (b) AND a correct definition (either generic or in context) of that error is given, but the consequence component is either missing or incorrect.
Incorrect (I) if the response does not include the consequence component, apart from the exception given above as the second type of partially correct response.

4 Complete Response: All three parts essentially correct
3 Substantial Response: Two parts essentially correct and one part partially correct
2 Developing Response: Two parts essentially correct and no part partially correct; OR one part essentially correct and one or two parts partially correct; OR three parts partially correct
1 Minimal Response: One part essentially correct and no parts partially correct; OR no parts essentially correct and two parts partially correct

14. 2010B Q4

Intent of Question
The primary goals of this question were to assess students’ ability to (1) calculate and interpret a confidence interval for a population proportion; and (2) recognize that it is still reasonable to use the confidence interval procedure even though sampling is without replacement, as long as the sample size is small relative to the population size.

Solution
Part (a): The sample proportion of songs that were loaded by Lori is $\hat p = \tfrac{13}{50} = 0.26$. The conditions for constructing a confidence interval are satisfied because (1) the problem states that the 50 songs in the sample were randomly selected, and (2) $n\hat p = 13$ and $n(1-\hat p) = 37$ are both at least 10. A 90 percent confidence interval for the population proportion $p$, the actual proportion of all songs on the player that were loaded by Lori, is
$0.26 \pm 1.645\sqrt{\dfrac{0.26(1-0.26)}{50}} = 0.26 \pm 0.102 = (0.158, 0.362)$.
We can be 90 percent confident that, for the population of all songs on the digital music player, the proportion of songs that were loaded by Lori is between 0.158 and 0.362.
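The part (a) computation, including the large-sample check, fits in a few lines. This is a sketch under the solution's own inputs (13 of 50 songs, $z^* = 1.645$ for 90 percent confidence); the variable names are illustrative.

```python
from math import sqrt

n = 50
successes = 13              # sampled songs loaded by Lori
p_hat = successes / n       # 0.26
z_star = 1.645              # critical value for 90% confidence, as in the solution

# Large-sample condition: at least 10 successes and at least 10 failures
assert n * p_hat >= 10 and n * (1 - p_hat) >= 10

margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin
print(f"{p_hat:.2f} ± {margin:.3f} -> ({lower:.3f}, {upper:.3f})")
```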

Part (b): The sample size of 50 is quite small compared with the population size of 2,384. The usual criterion for checking whether one can disregard the distinction between sampling with or without replacement is to check whether the ratio of the population size to the sample size is large, such as at least 10 or at least 20. In this case the ratio is $\tfrac{2{,}384}{50} = 47.7$, so the criterion is clearly met, and the confidence interval procedure in part (a) is valid.

Scoring
This question is scored in four sections. Part (a) has three components: (1) stating the appropriate confidence interval procedure and checking its conditions; (2) constructing the confidence interval; (3) interpreting the confidence interval. Section 1 consists of part (a), component 1; section 2 consists of part (a), component 2; section 3 consists of part (a), component 3. Section 4 consists of part (b). Each of the four sections is scored as essentially correct (E), partially correct (P), or incorrect (I).

Section 1 is scored as follows:
Essentially correct (E) if the response identifies a one-sample z-interval for a proportion (either by name or by formula) and also includes a statement of the random sampling condition and a statement of, and check of, the sample size condition.

Partially correct (P) if the response identifies the correct procedure but adequately addresses only one of the two conditions (random sampling, sample size) OR does not identify the correct procedure but adequately addresses both conditions.
Incorrect (I) if the response identifies the correct procedure but does not adequately address either condition OR does not identify the correct procedure and adequately addresses, at most, one condition.
Notes
• Stating only that “$n\hat p$ and $n(1-\hat p)$ are both greater than 10” is only a statement of the sample size condition and is not sufficient for checking it. The response must use specific values from the question in the check of the condition.
• If a response includes an inappropriate condition, such as requiring that $n \ge 30$ or requiring a normal population, then the response can earn no more than a P for part (a). However, stating and checking a condition about the size of the sample relative to the size of the population is not required but is also not inappropriate.

Section 2 is scored as follows:
Essentially correct (E) if the response makes use of the appropriate confidence interval procedure and calculates the 90 percent confidence interval correctly.
Partially correct (P) if the response makes use of the appropriate confidence interval procedure but does not include a correct calculation of the 90 percent confidence interval.
Incorrect (I) if the response makes use of an incorrect procedure, such as a t-interval for a population mean.

Section 3 is scored as follows:
Essentially correct (E) if the response provides a reasonable interpretation, in context, making clear that the estimate is for the population proportion of songs that were loaded by Lori and that we have 90 percent confidence in the interval.
Partially correct (P) if the response provides a reasonable interpretation, but does not make clear that the estimate is for the population proportion of songs that were loaded by Lori or does not mention 90 percent confidence.
Incorrect (I) if the response provides an incorrect interpretation.

Section 4 is scored as follows:
Essentially correct (E) if the response states that the difference between sampling with or without replacement is negligible here because the population size is large relative to the sample size AND provides a reasonable numerical justification for this assertion.
Partially correct (P) if the response states that the difference between sampling with or without replacement is negligible here because the population size is large relative to the sample size, but provides no numerical justification for this assertion.
Incorrect (I) if the response does not state that the difference between sampling with or without replacement is negligible here because the population size is large relative to the sample size.

Notes
• Reasonable numerical justification includes stating that the sample size is less than 5 percent (or 10 percent) of the population size or that the ratio of the population size to the sample size is greater than 20 (or 10).
• A response that compares the probabilities of songs being selected with or without replacement in terms of the sample and population sizes, and that concludes that the difference in probabilities will be negligible, may be scored as essentially correct (E).

Each essentially correct (E) section counts as 1 point, and each partially correct (P) section counts as ½ point.

4 Complete Response
3 Substantial Response
2 Developing Response
1 Minimal Response

If a response is between two scores (for example, 2½ points), use a holistic approach to determine whether to score up or down, depending on the strength of the response and communication.
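The numerical justifications accepted for item 14, part (b), are simple ratios of the population size (2,384) and sample size (50). The sketch below computes them and, as an extra illustration not required by the rubric, the standard finite population correction factor $\sqrt{(N-n)/(N-1)}$, which shows how little the standard error would change if sampling without replacement were accounted for explicitly.

```python
from math import sqrt

N = 2384   # songs on the player (population size)
n = 50     # songs sampled without replacement

ratio = N / n                     # population-to-sample ratio
fraction = n / N                  # sampling fraction
fpc = sqrt((N - n) / (N - 1))     # finite population correction factor (illustration only)

print(f"N/n = {ratio:.1f}")
print(f"n/N = {fraction:.1%}")
print(f"FPC = {fpc:.3f}")
print("ratio >= 20:", ratio >= 20, "| fraction < 5%:", fraction < 0.05)
```

An FPC this close to 1 means the standard error shrinks by only about 1 percent, consistent with treating the draws as if they were made with replacement.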