Section 3.1: Sampling Distributions

Author: Frank French
Objective: Using sample statistics to estimate population parameters

1) Average price of gasoline in some states – March 2016

October 2016 – the average price of gasoline in Maryland is $2.25

2) From Pew Research - Tech Device Ownership: 2015


To do at home: Read section 3.1, starting on page 162. Then answer questions 1-10, followed by questions 13-18.

1) What is statistical inference?

2) What is a parameter?

3) What is a statistic?

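4) Table 3.1 - Give the notation for common parameters and statistics.

                      Population Parameter    Sample Statistic
Mean                  ________________        ________________
Standard Deviation    ________________        ________________
Proportion            ________________        ________________
Correlation           ________________        ________________
Slope                 ________________        ________________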

5) What is a point estimate?

6) Book problems - In Exercises 3.1 to 3.5, state whether the quantity described is a parameter or a statistic and give the correct notation.
3.1 - Average household income for all houses in the US, using data from the US Census.
3.2 - Correlation between height and weight for players on the 2010 Brazil World Cup team, using data from all 23 players on the roster.
3.3 - Proportion of people who use an electric toothbrush, using data from a sample of 300 adults.
3.4 - Proportion of registered voters in a county who voted in the last election, using data from the county voting records.
3.5 - Average number of television sets per household in North Carolina, using data from a sample of 1000 households.


7) Book problems - In Exercises 3.6 to 3.10, give the correct notation for the quantity described and give its value.
3.6 - Proportion of families in the US who were homeless in 2010. The number of homeless families in 2010 was about 170,000, while the total number of families is given in the 2010 Census as 78 million.

3.7 - Average enrollment in charter schools in Illinois. In 2010, there were 95 charter schools in the state of Illinois, and the total number of students attending the charter schools was 30,795.

3.8 - Proportion of US adults who own a cell phone. In a survey of 2252 US adults, 82% said they had a cell phone.

3.9 - Correlation between age and heart rate for patients admitted to an Intensive Care Unit. Data from the 200 patients included in the file ICUAdmissions give a correlation of 0.037.

3.10 - Mean number of cell phone calls made or received per day by cell phone users. In a survey of 1917 cell phone users, the mean was 13.10 phone calls a day.
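Exercises 3.6 and 3.7 are the only ones above whose values are not stated directly, so they need a short computation. Both use data on every family (3.6) or every charter school (3.7), so the results are parameters; the values below are rounded.

```latex
\begin{aligned}
\text{3.6:}\quad p   &= \frac{170{,}000}{78{,}000{,}000} \approx 0.0022 \\
\text{3.7:}\quad \mu &= \frac{30{,}795}{95} \approx 324.2 \ \text{students per school}
\end{aligned}
```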

8) Example: In Fall 2014, 56% of MC students were female (according to the Montgomery College Fall 2014 student profile). In one of my Fall 2014 classes, 75% of the students were female.

a. What is the parameter of interest? Select one of the two statements below:
   i. The mean number of female students in the college
   ii. The proportion of female students in the college

b. The parameter value with correct notation is ________________

c. The statistic value with correct notation is _________________

d. Is the sample statistic __________ a good point estimate for the population parameter ___________? Why or why not?

9) Example: In Fall 2014, the average age of all MC students was 25.41 years (according to the Montgomery College Fall 2014 student profile). A random sample of 200 students had an average age of 24.5 years.

a. What is the parameter of interest?

b. The parameter value with correct notation is ________________

c. The statistic value with correct notation is _________________

d. Is the sample statistic __________ a good point estimate for the population parameter ___________? Why or why not?


10) Example: Using Search Engines on the Internet
A 2012 survey of a random sample of 2253 US adults found that 1,329 of them reported using a search engine (such as Google) every day to find information on the Internet.

a) Find the relevant proportion and give the correct notation with it.

b) Is your answer to part (a) a parameter or a statistic?

c) Give notation for and define the population parameter that we estimate using the result of part (a).
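Part (a) comes down to a single division. Because the 2253 adults are a sample, the proportion is written with sample notation, p-hat:

```latex
\hat{p} = \frac{1329}{2253} \approx 0.590
```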

Variability of Sample Statistics - Using STATKEY – Instructions for you are on the last two pages. These are the types of questions to consider in each example that follows:
a) What are the cases?
b) Is this a population or a sample?
c) What is the variable?
d) Is it quantitative or categorical?
e) Is the problem about means or proportions?
f) How many are there? (Size of population)
g) What is the shape of the distribution of the variable?
h) List all parameters.
i) Are the parameters fixed or variable?
j) Sampling from this population:
   a. Select samples of size 3; give the shape and the statistics for each of the selected samples.
k) Are the sample statistics fixed or variable?
l) Sampling distribution of sample means for samples of size 3, 20, 50:
   a. Give shape, mean and standard deviation (standard error).
   b. Comment on shape, center and variability of this distribution as sample size increases.
m) What does each dot in the distribution of the population represent?
n) What does each dot in the distribution of the sample represent?
o) What does each dot in the distribution of the sample means represent?


Section 3.1 – Sampling Distributions – Continued

Variability of Sample Means - Using WEB-APPLETS


Section 3.1 - Variability of Sample Means - Using STATKEY

11) Example: Enrollment in Graduate Programs in Statistics
DATA 3.1 STATISTICS PHD: There are 82 US statistics doctoral programs. The name of the school, along with the total enrollment of full-time graduate students in each program in 2009, is listed in the data set StatisticsPhD. (In STATKEY – SAMPLING DISTRIBUTIONS – MEAN – select data set – generate samples)
a) Put a circle around the words that describe the cases.
b) Put a rectangle around the words that describe the variable. Is the variable quantitative or categorical?
c) Describe the population (shape, mean, standard deviation). What does each dot represent?
d) Describe the sample (shape, mean, standard deviation). What does each dot represent?
e) Create the sampling distribution of the mean for samples of size (1) n = 3, (2) n = 20, (3) n = 50.
f) Describe each distribution. What does each dot represent?
g) Compare the results. What is the effect of the sample size?
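StatKey does the resampling for you. The sketch below repeats the same steps in Python so the effect of sample size shows up as numbers. The population is a hypothetical, right-skewed set of 82 enrollment values generated at random, not the actual StatisticsPhD data, and the helper `sampling_distribution` is our own name. Each sample is drawn without replacement from the 82 values.

```python
import random
import statistics

def sampling_distribution(population, n, reps=1000, seed=0):
    """Means of `reps` random samples of size n, each drawn without replacement."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(reps)]

# HYPOTHETICAL right-skewed population of 82 program enrollments
# (a stand-in for the StatisticsPhD data, not the real values).
pop_rng = random.Random(42)
population = [5 + round(pop_rng.expovariate(1 / 50)) for _ in range(82)]

mu = statistics.mean(population)
sigma = statistics.pstdev(population)
print(f"Population: mu = {mu:.1f}, sigma = {sigma:.1f}")

results = {}
for n in (3, 20, 50):
    means = sampling_distribution(population, n)
    se = statistics.stdev(means)  # standard error = SD of the sample means
    results[n] = (statistics.mean(means), se)
    print(f"n = {n:2d}: mean of x-bars = {results[n][0]:.1f}, SE = {se:.1f}, "
          f"min = {min(means):.1f}, max = {max(means):.1f}")
```

With any population, the centers of the three sampling distributions should stay close to mu, while the standard error shrinks as n grows. If you have the real enrollment values, substituting them for `population` repeats the StatKey exercise.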


To do at home: Summarizing my results – Sample size effect

12) Example: Enrollment in Graduate Programs in Statistics. Here are my results; complete the right column.

Sampling dot plot of Mean (figure)
Shape = ________ Mean = ________ Standard deviation = ________ Minimum = ________ Maximum = ________

Sampling dot plot of Mean (figure)
Shape = ________ Mean = ________ Standard deviation = ________ Minimum = ________ Maximum = ________

Sampling dot plot of Mean (figure)
Shape = ________ Mean = ________ Standard deviation = ________ Minimum = ________ Maximum = ________

Thinking about it: What is the effect of the sample size?


To do at home: Keep reading section 3.1 of the book to complete the following.

13) What is a sampling distribution? Complete the following: If we select many samples of the same size from the same population and compute the sample statistic x-bar for each of the selected samples, we generate the _______________________________________ of the statistic x-bar. A sampling distribution shows how the sample statistics vary from sample to sample.

14) Shape and Center of a sampling distribution: If samples are __________________________________ and the sample size is ______________________________ the sampling distribution will be _______________________________________ and centered at ________________________. From the 95% rule we can conclude that about ________________ of sample statistics will be within _____________________ from the parameter.
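The 95% rule in item 14 can be checked by simulation. This sketch assumes a hypothetical bell-shaped population (normal, mean 100, standard deviation 15, values chosen only for illustration), takes many random samples of size 30, and counts how many sample means land within two standard errors of the population mean.

```python
import random
import statistics

rng = random.Random(1)
MU, SIGMA, N, REPS = 100, 15, 30, 5000  # hypothetical population and sampling design

# One sample mean per repetition, each from an independent random sample of size N.
means = [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N)) for _ in range(REPS)]

se = statistics.stdev(means)  # standard error, estimated from the simulated means
within = sum(abs(m - MU) <= 2 * se for m in means) / REPS
print(f"center = {statistics.fmean(means):.2f}, SE = {se:.2f}, "
      f"share within 2 SE of mu: {within:.1%}")
```

The share should come out close to 95% (about 95.4% for an exactly normal sampling distribution), and the center close to 100.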

15) What is the standard error? Circle the correct choice below:
a) The standard deviation of the population
b) The standard deviation of the sample
c) The standard deviation of the distribution of sample means (or sample proportions)

16) What happens to the standard error of the sampling distribution as the sample size increases?

17) Sample Size Matters! As the sample size _______________________, the variability of sample statistics tends to _______________________, and sample statistics tend to be closer to the __________________________ of the ______________________________. Show sketches that reinforce this concept.

18) Why is it important to select random samples? Refer to the section 1.2 Gettysburg example and the simulations done in class. Show a rough sketch of the sampling distribution of x-bar for the biased samples and for the random samples. Give the center (mean) of each distribution and label 4.29, the actual population mean.
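The class simulation can be imitated in Python. Everything here is hypothetical: the 270 word lengths are random integers from 1 to 9 rather than the actual Gettysburg Address, and "biased" sampling is modeled by picking words with probability proportional to their length, on the assumption that long words catch the eye. Random sampling uses `random.sample`; biased sampling uses `random.choices` with weights.

```python
import random
import statistics

rng = random.Random(7)
# HYPOTHETICAL population of word lengths (not the real Gettysburg Address).
words = [rng.randint(1, 9) for _ in range(270)]
mu = statistics.mean(words)

def sample_means(draw, reps=1000, n=10):
    """Means of `reps` samples of size n produced by the sampling function `draw`."""
    return [statistics.mean(draw(n)) for _ in range(reps)]

# Random sampling: every word equally likely, drawn without replacement.
random_means = sample_means(lambda n: rng.sample(words, n))
# Biased sampling (assumption): longer words are more likely to be picked.
biased_means = sample_means(lambda n: rng.choices(words, weights=words, k=n))

print(f"population mean mu = {mu:.2f}")
print(f"random samples: center = {statistics.mean(random_means):.2f}")
print(f"biased samples: center = {statistics.mean(biased_means):.2f}")
```

Under these assumptions, the random-sample means center near mu, while the biased-sample means center noticeably above it.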


To do at home - Variability of Sample Means - Using STATKEY

19) Example: 2011 Hollywood Movies Budget – Using StatKey
Budget, in millions of dollars, of the 136 movies that came out of Hollywood in 2011. (In STATKEY – SAMPLING DISTRIBUTIONS – MEAN – select data set – generate samples)
a) Put a circle around the words that describe the cases.
b) Put a rectangle around the words that describe the variable. Is the variable quantitative or categorical?
c) Describe the population (shape, mean, standard deviation).
d) Create a sampling distribution of the mean for samples of size (1) n = 10 and (2) n = 40.
e) For each, give the shape, mean and standard deviation (standard error).
f) Summarize and compare the results.


Variability of Sample Proportions – WEB APPLETS

20) Reese’s Pieces: 40% of Reese’s Pieces are orange. Select samples of size 25. Count the number of orange pieces in each sample, determine the sample proportion, and construct a dot plot. Repeat this 10 times. Describe the distribution of sample proportions for samples of size 25 (give the shape, mean and standard deviation or standard error). Note: this applet uses pi, instead of p, for the population proportion.
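Outside the applet, the same experiment can be sketched in Python. The sketch assumes each candy is independently orange with probability 0.40 (the applet's pi), draws 10 samples of 25 candies, and summarizes the 10 sample proportions. The function name `orange_proportion` is illustrative.

```python
import random
import statistics

rng = random.Random(2016)
PI, N, REPS = 0.40, 25, 10  # population proportion, candies per sample, number of samples

def orange_proportion(n):
    """Sample proportion of orange candies in one sample of n candies."""
    orange = sum(rng.random() < PI for _ in range(n))
    return orange / n

props = [orange_proportion(N) for _ in range(REPS)]
print("sample proportions:", props)
print(f"mean = {statistics.mean(props):.3f}, SE = {statistics.stdev(props):.3f}")
```

With only 10 samples the summary is rough; raising `REPS` to 1000 gives a smoother dot plot centered near 0.40.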


Variability of Sample Proportions - Using the calculator

21) In Fall 2014, 56% of MC students were female (according to the Montgomery College Fall 2014 student profile). We are going to use the calculator to select a random sample of 100 students from this population, count the number of female students in the sample, and determine the sample proportion p-hat.
What is the population?
What is the variable? Is it quantitative or categorical?
For this simulation we need to use RandBin(n, p). In our case the sample size is n = 100, and the population proportion is p = 0.56.
Steps to accomplish this:
(1) Press MATH (left column of calculator).
(2) Arrow right to PRB.
(3) Select 7:RandBin(
(4) Type the values of n and p separated by a comma.
(5) Press ENTER.

a) The number of female students in your sample is ____________

b) Calculate the sample proportion in your sample:

p-hat = ________________

c) If you select another sample of 100 students at random, will the proportion of female students be the same as the one above?

d) Run another simulation and calculate the proportion of female students in this second sample: ________________

e) Construct a dot plot with your results and the class's results:

_____|____|____|____|____|____|____|____|____|____|____|____|____|____|____|____|____|__
0.40 0.42 0.44 0.46 0.48 0.50 0.52 0.54 0.56 0.58 0.60 0.62 0.64 0.66 0.68 0.70 0.72

f) Is the population proportion unique? If so, what is its value? ______________

g) What does each of the dots of the distribution shown in the dot plot represent?

h) Describe the center and shape of the distribution shown in the dot plot.
Center:
Shape:

i) How likely are the following sample proportions: very likely, somewhat likely, or very unlikely?
a. p-hat = 0.54
b. p-hat = 0.72
c. p-hat = 0.5
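Without a calculator, the whole of item 21 can be sketched in Python. The function `rand_bin` is a hypothetical stand-in for the calculator's RandBin(n, p): it counts how many of n independent random draws fall below p, which gives the number of "successes" (female students) in one sample. It is not the calculator's own algorithm. Repeating it many times plays the role of the class dot plot, and the share of simulated p-hat values at least as far from 0.56 as a given value indicates how likely that value is.

```python
import random

rng = random.Random(56)
N, P = 100, 0.56  # sample size and population proportion from item 21

def rand_bin(n, p):
    """Hypothetical stand-in for the calculator's RandBin(n, p): successes in n trials."""
    return sum(rng.random() < p for _ in range(n))

# One sample, as in parts (a) and (b).
count = rand_bin(N, P)
print(f"female students in sample: {count}, p-hat = {count / N:.2f}")

# Many samples, playing the role of the class dot plot.
p_hats = [rand_bin(N, P) / N for _ in range(2000)]

# For part (i): share of simulated p-hats at least as far from P as each value.
shares = {}
for target in (0.54, 0.72, 0.50):
    distance = abs(target - P) - 1e-9  # small tolerance for floating-point error
    shares[target] = sum(abs(ph - P) >= distance for ph in p_hats) / len(p_hats)
    print(f"p-hat = {target:.2f}: {shares[target]:.1%} of simulated samples "
          f"are at least this far from {P}")
```

A value that almost no simulated sample reaches (a very small share) is very unlikely; a value that many samples reach or exceed is likely.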


Variability of sample proportions using Statkey

22) Example: 27.5% of US adults at least 25 years old have a college bachelor’s degree or higher. Use Statkey to investigate the behavior of sample proportions from this population for samples of size 200. (In STATKEY – SAMPLING DISTRIBUTIONS – PROPORTIONS – edit proportion – generate samples)


Start in class / finish at home - Do we really understand sampling distributions?

23) The sampling distribution for the mean for samples of size 30 is shown below.

a) What does one dot in the dotplot represent?
b) Use the sampling distribution to estimate the mean of the population; use correct notation.
c) Estimate the standard error: (1) 0.5 (2) 1.1 (3) 2.1 (4) 3.1
d) Classify these sample means as likely (within two standard errors of the mean) or unlikely: (1) x-bar = 140 (2) x-bar = 144 (3) x-bar = 138
e) If we took samples of size 100 instead of 30, and used the sample mean to estimate the population mean, would the standard error be larger or smaller? Would the estimates be more accurate or less accurate?

24) The sampling distribution for the mean for samples of size 50 is shown below.

a) What does one dot in the dotplot represent?
b) Use the sampling distribution to estimate the mean of the population; use correct notation.
c) Estimate the standard error: (1) 0.5 (2) 1.1 (3) 2.1 (4) 3.1
d) Classify these sample means as likely (within two standard errors of the mean) or unlikely: (1) x-bar = 2.0 (2) x-bar = 4.2 (3) x-bar = 5.5
e) If we took samples of size 100 instead of 50, and used the sample mean to estimate the population mean, would the standard error be larger or smaller? Would the estimates be more accurate or less accurate?


25) 3.28 – Section 3.1 – book - Hollywood Movies
Data 2.7 introduces the dataset HollywoodMovies2011, which contains information on all 136 movies that came out of Hollywood in 2011. One of the variables is the budget (in millions of dollars) to make the movie. Figure 3.8 shows two boxplots. One represents the budget data for one random sample of size n = 30. The other represents the values in a sampling distribution of 1000 means of budget data from samples of size 30.

Figure 3.8 One sample and one sampling distribution
(a) Which is which? Explain.
(b) From the boxplot showing the data from one random sample, what does one value in the sample represent? How many values are included in the data to make the boxplot? Estimate the minimum and maximum values. Give a rough estimate of the mean of the values and use appropriate notation for your answer.
(c) From the boxplot showing the data from a sampling distribution, what does one value in the sampling distribution represent? How many values are included in the data to make the boxplot? Estimate the minimum and maximum values. Give a rough estimate of the value of the population parameter and use appropriate notation for your answer.

21) 3.30 - Number of Screws in a Box
A company that sells boxes of screws claims that a box of its screws contains, on average, 50 screws. Figure 3.10 shows a distribution of sample means collected from many simulated random samples of 10 boxes.

Figure 3.10 Sampling distribution for average number of screws in a box
(a) For a random sample of 10 boxes, is it unlikely that the sample mean will be more than 2 screws different from μ? What about more than 5? 10?
(b) If you bought a random sample of 10 boxes at the hardware store and the mean number of screws per box was 42, would you conclude that the company's claim is likely to be incorrect?
(c) If you bought a random box at the hardware store and it only contained 42 screws, would you conclude that the company's claim is likely to be incorrect?

26) Proportion Never Married
A sampling distribution is shown for the proportion of US citizens over 15 years old who have never been married, using the data from the 2010 US Census and random samples of size n = 500.

a) What does one dot in the dotplot represent?

b) Use the sampling distribution to estimate the proportion of all US citizens over 15 years old who have never been married. Give correct notation for your answer.

c) Estimate the standard error of the sampling distribution.

d) If we take a random sample of 500 US citizens over 15 years old and compute the proportion of the sample who have never been married, indicate how likely it is that we will see that result for each sample proportion below. Which are the least likely? The most likely?
p-hat = 0.30
p-hat = 0.20
p-hat = 0.37
p-hat = 0.45

e) If we took samples of size 1000 instead of 500, and used the sample proportions to estimate the population proportion: Would the estimates be more accurate or less accurate? Would the standard error be larger or smaller?


27) In Fall 2014, it was reported that 13.1% of MC students were Hispanic. Explain the meaning of the simulation shown below. (In STATKEY – SAMPLING DISTRIBUTIONS – PROPORTIONS – edit proportion – generate samples)

28) 3.29 - Defective Screws
Suppose that 5% of the screws a company sells are defective. Figure 3.9 shows sample proportions from two sampling distributions: One shows samples of size 100, and the other shows samples of size 1000.

Figure 3.9 Sampling distributions for n = 100 and n = 1000 screws
(a) What is the center of both distributions?
(b) What is the approximate minimum and maximum of each distribution?
(c) Give a rough estimate of the standard error in each case.
(d) Suppose you take one more sample in each case. Would a sample proportion of 0.08 (that is, 8% defective in the sample) be reasonably likely from a sample of size 100? Would it be reasonably likely from a sample of size 1000?
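The comparison in Exercise 3.29 can be reproduced approximately by simulation. This sketch assumes each screw is independently defective with probability 0.05, generates 1000 sample proportions for n = 100 and for n = 1000, and reports the center, range, standard error, and how often a sample proportion of 0.08 or more occurs. The simulated numbers will not match Figure 3.9 exactly.

```python
import random
import statistics

rng = random.Random(329)
P = 0.05  # assumed proportion of defective screws

def sample_proportions(n, reps=1000):
    """`reps` sample proportions of defective screws in samples of size n."""
    return [sum(rng.random() < P for _ in range(n)) / n for _ in range(reps)]

summary = {}
for n in (100, 1000):
    props = sample_proportions(n)
    at_least_008 = sum(p >= 0.08 - 1e-9 for p in props) / len(props)
    summary[n] = (statistics.mean(props), statistics.stdev(props), at_least_008)
    print(f"n = {n:4d}: center = {summary[n][0]:.3f}, min = {min(props):.3f}, "
          f"max = {max(props):.3f}, SE = {summary[n][1]:.3f}, "
          f"share >= 0.08: {at_least_008:.1%}")
```

Both distributions should center near 0.05, but the one for n = 1000 is much narrower, so a sample proportion of 0.08 shows up far less often there.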
