Lecture 7: Geometric & Binomial distributions

Author: Mercy Wood
Statistics 101
Mine Çetinkaya-Rundel

February 7, 2012

Announcements

Due: HW 2 at the beginning of class on Thursday.
Clarification on Exercise 3.4 parts (c) and (d): Mary's and Leo's percentiles refer to the proportion of people whose finishing times are lower than theirs. Note that this doesn't mean "% of people they performed better than", because in a triathlon "performing better" means finishing faster.

Statistics 101 (Mine Çetinkaya-Rundel)

L7: Geo & Binom distributions

February 7, 2012


Recap

Online quiz 2 - commonly missed questions
Question 3: More than three-quarters of the nation's colleges and universities now offer online classes, and about 23% of college graduates have taken a course online. 39% of those who have taken a course online believe that online courses provide the same educational value as one taken in person, a view shared by only 27% of those who have not taken an online course. At a coffee shop you overhear a recent college graduate discussing that she doesn't believe that online courses provide the same educational value as one taken in person. What's the probability that she has taken an online course before?

                 took online course   didn't take online course   total
valuable
not valuable
total            0.23                                             1



Online quiz 2 - Q3 review question
Which is the correct notation for the following probability? "At a coffee shop you overhear a recent college graduate discussing that she doesn't believe that online courses provide the same educational value as one taken in person. What's the probability that she has taken an online course before?"

(a) P(took online course | not valuable)
(b) P(not valuable | took online course)
(c) P(valuable and took online course)
(d) P(took online course and not valuable)
(e) P(valuable | didn't take online course)
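The Q3 calculation can be checked with a few lines of code. This is a minimal Python sketch that assumes only the figures stated in the question: 23% took a course online, and 39% of them and 27% of everyone else consider online courses equally valuable. The variable names are illustrative. The sketch fills in the "not valuable" joint probabilities of the two-way table and then applies the definition of conditional probability, P(A | B) = P(A and B) / P(B):

```python
# Illustrative names; the numbers come from quiz question 3.
p_took = 0.23                 # P(took online course)
p_valuable_given_took = 0.39  # P(valuable | took online course)
p_valuable_given_not = 0.27   # P(valuable | didn't take online course)

# Joint probabilities for the "not valuable" row of the two-way table
p_took_and_not_valuable = p_took * (1 - p_valuable_given_took)            # 0.23 * 0.61
p_nottook_and_not_valuable = (1 - p_took) * (1 - p_valuable_given_not)    # 0.77 * 0.73
p_not_valuable = p_took_and_not_valuable + p_nottook_and_not_valuable

# P(took online course | not valuable) = P(took and not valuable) / P(not valuable)
p_took_given_not_valuable = p_took_and_not_valuable / p_not_valuable
print(round(p_took_given_not_valuable, 4))
```

The printed value is 0.1997. So only about 20% of graduates who doubt the value of online courses have taken one themselves.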


1 Geometric distribution
    Bernoulli distribution
    Geometric distribution
2 Binomial distribution


Geometric distribution

Bernoulli distribution

Milgram experiment
Stanley Milgram, a Yale University psychologist, conducted a series of experiments on obedience to authority starting in 1963. The experimenter (E) orders the teacher (T), the subject of the experiment, to give severe electric shocks to a learner (L) each time the learner answers a question incorrectly. The learner is actually an actor, and the electric shocks are not real, but a prerecorded sound is played each time the teacher administers an electric shock.


Milgram experiment (cont.)
These experiments measured the willingness of study participants to obey an authority figure who instructed them to perform acts that conflicted with their personal conscience. Milgram found that about 65% of people would obey authority and give such shocks. Over the years, additional research suggested this number is approximately consistent across communities and time.


Bernoulli random variables
Each person in Milgram's experiment can be thought of as a trial. A person is labeled a success if she refuses to administer a severe shock, and a failure if she administers such a shock. Since only 35% of people refused to administer a shock, the probability of success is p = 0.35. A random variable describing a trial with only two possible outcomes is called a Bernoulli random variable.
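A Bernoulli trial can be simulated by comparing a uniform random number with p. The Python sketch below codes a refusal as 1 and a shock as 0, using p = 0.35. The function name, the seed, and the number of simulated people are illustrative choices, not part of the lecture:

```python
import random

def bernoulli_trial(p, rng):
    """Return 1 (success: refuses) with probability p, else 0 (failure: shocks)."""
    return 1 if rng.random() < p else 0

rng = random.Random(2012)  # fixed seed so the run is reproducible
p = 0.35                   # probability that a person refuses to shock
trials = [bernoulli_trial(p, rng) for _ in range(10_000)]
print(sum(trials) / len(trials))  # proportion of successes, close to 0.35
```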


Geometric distribution

Geometric distribution
Dr. Smith wants to repeat Milgram's experiments, but she only wants to sample people until she finds someone who will not inflict a severe shock. What is the probability that she stops after the first person?

P(1st person refuses) = 0.35

... the third person?

$P(\text{1st and 2nd shock, 3rd refuses}) = \underbrace{0.65}_{S} \times \underbrace{0.65}_{S} \times \underbrace{0.35}_{R} = 0.65^2 \times 0.35 \approx 0.15$

... the tenth person?


Geometric distribution (cont.)
The geometric distribution describes the waiting time until a success for independent and identically distributed (iid) Bernoulli random variables.
- independence: outcomes of trials don't affect each other
- identical: the probability of success is the same for each trial

Geometric probabilities
If p represents the probability of success, (1 − p) the probability of failure, and n the number of independent trials, then

$P(\text{success on the } n^{\text{th}} \text{ trial}) = (1-p)^{n-1} p$
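The geometric formula translates directly into a short function. Here is a sketch in Python. The name `geom_pmf` is a hypothetical helper, not something defined in the lecture. The sketch evaluates the first, third, and tenth person for Dr. Smith's experiment:

```python
def geom_pmf(n, p):
    """P(first success on trial n) for iid Bernoulli(p) trials: (1 - p)**(n - 1) * p."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (1 - p) ** (n - 1) * p

p = 0.35  # probability that a person refuses to administer the shock
for n in (1, 3, 10):
    print(n, geom_pmf(n, p))
```

For n = 3 this reproduces 0.65² × 0.35 ≈ 0.15, and for the tenth person it gives roughly 0.007. In R, `dgeom(n - 1, 0.35)` gives the same numbers, because `dgeom` counts the failures before the first success rather than the trial on which it occurs.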


Clicker question
Can we calculate the probability of rolling a 6 for the first time on the 6th roll of a die using the geometric distribution?
(a) no, on the roll of a die there are more than 2 possible outcomes
(b) yes, why not
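Whether the geometric model applies depends on how the outcomes are classified. One way to see this is to collapse the six faces into two outcomes, "6" versus "not 6". That makes each roll a Bernoulli trial with p = 1/6, and the geometric formula applies. Here is a Python sketch using exact fractions, assuming a fair die:

```python
from fractions import Fraction

# Treat each roll as a Bernoulli trial: success = "rolled a 6", failure = anything else.
p = Fraction(1, 6)  # assumes a fair die
n = 6               # first 6 on the 6th roll: five failures, then a success
prob = (1 - p) ** (n - 1) * p
print(prob, float(prob))
```

The exact value is 3125/46656, about 0.067.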


Expected value
How many people is Dr. Smith expected to test to find the first one who refuses to administer the shock? The expected value, or mean, of a geometric distribution is 1/p:

$\mu = \frac{1}{p} = \frac{1}{0.35} \approx 2.86$

She is expected to test 2.86 people to find the first one who refuses to administer the shock (counting that person). But how can she test a non-whole number of people?


Expected value and its variability
Mean and standard deviation of the geometric distribution:

$\mu = \frac{1}{p} \qquad \sigma = \sqrt{\frac{1-p}{p^2}}$

Going back to Dr. Smith's experiment:

$\sigma = \sqrt{\frac{1-p}{p^2}} = \sqrt{\frac{1-0.35}{0.35^2}} \approx 2.3$

Dr. Smith is expected to test 2.86 people to find the first one who refuses to administer the shock, give or take 2.3 people. These values only make sense in the context of repeating the experiment many, many times.
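"Repeating the experiment many, many times" can be done literally in a simulation. The Python sketch below compares the closed-form mean and standard deviation with the results of 100,000 simulated runs of Dr. Smith's procedure. It assumes independent people who each refuse with probability 0.35; the seed and number of runs are arbitrary:

```python
import math
import random

p = 0.35  # probability that a person refuses

# Closed-form mean and standard deviation of the geometric distribution
mu = 1 / p
sigma = math.sqrt((1 - p) / p**2)

def people_tested(p, rng):
    """Simulate one run: count people tested up to and including the first refusal."""
    count = 1
    while rng.random() >= p:  # this person shocks (probability 1 - p); test another
        count += 1
    return count

rng = random.Random(7)  # arbitrary seed for reproducibility
runs = [people_tested(p, rng) for _ in range(100_000)]
sim_mean = sum(runs) / len(runs)
sim_sd = math.sqrt(sum((x - sim_mean) ** 2 for x in runs) / (len(runs) - 1))

print(f"formula:    mean = {mu:.2f}, sd = {sigma:.2f}")
print(f"simulation: mean = {sim_mean:.2f}, sd = {sim_sd:.2f}")
```

Each individual run is a whole number of people. Only the average over many runs settles near 2.86.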


1 Geometric distribution
2 Binomial distribution
    The binomial distribution
    Normal approximation to the binomial


Binomial distribution

Suppose we randomly select four individuals to participate in this experiment. What is the probability that exactly 1 of them will refuse to administer the shock?
Let's call these people Allen (A), Brittany (B), Caroline (C), and Damian (D). Each one of the four scenarios below will satisfy the condition of "exactly 1 of them refuses to administer the shock":

Scenario 1:  0.35 (A) refuse × 0.65 (B) shock  × 0.65 (C) shock  × 0.65 (D) shock  = 0.0961
Scenario 2:  0.65 (A) shock  × 0.35 (B) refuse × 0.65 (C) shock  × 0.65 (D) shock  = 0.0961
Scenario 3:  0.65 (A) shock  × 0.65 (B) shock  × 0.35 (C) refuse × 0.65 (D) shock  = 0.0961
Scenario 4:  0.65 (A) shock  × 0.65 (B) shock  × 0.65 (C) shock  × 0.35 (D) refuse = 0.0961

The probability of exactly 1 of 4 people refusing to administer the shock is the sum of all of these probabilities:

0.0961 + 0.0961 + 0.0961 + 0.0961 = 4 × 0.0961 = 0.3844
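Writing out the scenarios by hand is exactly the kind of job a computer can do by brute force. The Python sketch below is an illustration, not part of the original lecture. It lists all 2⁴ = 16 refuse/shock outcomes for Allen, Brittany, Caroline, and Damian, keeps those with exactly one refusal, and adds up their probabilities:

```python
from itertools import product

p_refuse, p_shock = 0.35, 0.65
people = "ABCD"  # Allen, Brittany, Caroline, Damian

total = 0.0
for outcome in product("RS", repeat=len(people)):  # all 2**4 = 16 scenarios
    if outcome.count("R") == 1:                    # exactly one refusal
        prob = 1.0
        for o in outcome:
            prob *= p_refuse if o == "R" else p_shock
        print("".join(outcome), prob)
        total += prob

print(total)
```

Unrounded, the total is 4 × 0.35 × 0.65³ = 0.384475. The 0.3844 above comes from multiplying the rounded 0.0961 by 4.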


The binomial distribution

Binomial distribution
The question on the prior slide asked for the probability of a given number of successes, k, in a given number of trials, n (k = 1 success in n = 4 trials), and we calculated this probability as

# of scenarios × P(single scenario)

- # of scenarios: there is a less tedious way to figure this out; we'll get to that shortly...
- P(single scenario) = $p^k (1-p)^{(n-k)}$: the probability of success raised to the number of successes, times the probability of failure raised to the number of failures

The binomial distribution describes the probability of having exactly k successes in n independent Bernoulli trials with probability of success p.


Counting the # of scenarios
Earlier we wrote out all possible scenarios that fit the condition of exactly one person refusing to administer the shock. If n were larger and/or k were different from 1, writing out the scenarios would get even more tedious. For example, what if n = 9 and k = 2?

RRSSSSSSS
SRRSSSSSS
SSRRSSSSS
···
SSRSSRSSS
···
SSSSSSSRR

Writing out all possible scenarios is incredibly tedious and prone to errors.
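For n = 9 and k = 2, the listing can be automated by choosing which 2 of the 9 positions hold the refusals. Here is a Python sketch using `itertools.combinations`. The variable names are illustrative:

```python
from itertools import combinations

n, k = 9, 2
scenarios = []
for positions in combinations(range(n), k):  # positions of the k refusals
    s = ["S"] * n
    for i in positions:
        s[i] = "R"
    scenarios.append("".join(s))

print(scenarios[:3], "...", scenarios[-1])
print(len(scenarios))
```

The scenarios come out in a different order than the slide's list (RRSSSSSSS, RSRSSSSSS, RSSRSSSSS, ...), but there are 36 of them.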


Calculating the # of scenarios
Choose function
The choose function is useful for calculating the number of ways to choose k successes in n trials:

$\binom{n}{k} = \frac{n!}{k!(n-k)!}$

k = 1, n = 4: $\binom{4}{1} = \frac{4!}{1!(4-1)!} = \frac{4 \times 3 \times 2 \times 1}{1 \times (3 \times 2 \times 1)} = 4$

k = 2, n = 9: $\binom{9}{2} = \frac{9!}{2!(9-2)!} = \frac{9 \times 8 \times 7!}{2 \times 1 \times 7!} = \frac{72}{2} = 36$

Note: You can also use R for these calculations:

> choose(9,2)
[1] 36
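Python's standard library has the same function, `math.comb` (Python 3.8 or later). The sketch below also spells out the factorial formula for comparison. The helper name `choose` simply mirrors R's and is not a Python built-in:

```python
import math

def choose(n, k):
    """Number of ways to choose k successes among n trials: n! / (k! (n - k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(choose(4, 1), choose(9, 2))  # 4 36
print(math.comb(9, 2))             # 36
```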


Properties of the choose function
If k = 1, only 1 of the n trials results in a success; it could be the first, the second, ..., or the nth trial, so there are n ways this can happen: $\binom{n}{1} = n$

If k = n, all n trials result in a success, and there's only one way this can happen: $\binom{n}{n} = 1$

If k = 0, all n trials result in a failure, and there's only one way this can happen as well: $\binom{n}{0} = 1$
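These three properties are easy to spot-check numerically. Here is a Python sketch using the standard library's `math.comb`. The range n = 1, ..., 20 is an arbitrary choice:

```python
import math

# Check the three properties of the choose function for n = 1, ..., 20
for n in range(1, 21):
    assert math.comb(n, 1) == n  # exactly one success: n possible positions
    assert math.comb(n, n) == 1  # all successes: one way
    assert math.comb(n, 0) == 1  # no successes: one way
print("all three properties hold for n = 1..20")
```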


Binomial distribution (cont.)
Binomial probabilities
If p represents the probability of success, (1 − p) the probability of failure, n the number of independent trials, and k the number of successes, then

$P(k \text{ successes in } n \text{ trials}) = \binom{n}{k} p^k (1-p)^{(n-k)}$

We can use the binomial distribution to calculate the probability of k successes in n trials, as long as
1. the trials are independent
2. the number of trials, n, is fixed
3. each trial outcome can be classified as a success or a failure
4. the probability of success, p, is the same for each trial
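The binomial formula combines the choose function (the number of scenarios) with the single-scenario probability. Here is a minimal Python sketch. The name `binom_pmf` is a hypothetical helper:

```python
import math

def binom_pmf(k, n, p):
    """P(k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Milgram example: exactly 1 of 4 people refuses (p = 0.35)
print(binom_pmf(1, 4, 0.35))

# The probabilities over k = 0, ..., n add up to 1 (up to floating-point rounding)
print(sum(binom_pmf(k, 4, 0.35) for k in range(5)))
```

The first value matches 4 × 0.35 × 0.65³ from the four-scenario calculation. R's `dbinom(1, size = 4, prob = 0.35)` computes the same quantity.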


Clicker question
A January 27, 2012 Gallup survey suggests that 48% of Americans would vote for Obama over Romney if the presidential election were held that day. Among a random sample of 10 Americans, what is the probability that exactly 8 would vote for Obama over Romney?
(a) pretty low
(b) pretty high

http://www.gallup.com/poll/election.aspx, February 6, 2012.


Clicker question (cont.)
Among a random sample of 10 Americans, what is the probability that exactly 8 would vote for Obama over Romney?
(a) $0.48^8 \times 0.52^2$
(b) $\binom{8}{10} \times 0.48^8 \times 0.52^2$
(c) $\binom{10}{8} \times 0.48^8 \times 0.52^2$
(d) $\binom{10}{8} \times 0.48^2 \times 0.52^8$
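Plugging n = 10, k = 8, and p = 0.48 into the binomial formula gives $\binom{10}{8} \times 0.48^8 \times 0.52^2$. Here is a short Python sketch to evaluate it, assuming the 48% figure applies independently to each sampled American:

```python
import math

n, k, p = 10, 8, 0.48  # sample size, number voting for Obama, P(vote for Obama)
prob = math.comb(n, k) * p**k * (1 - p) ** (n - k)
print(f"{prob:.4f}")
```

The result is about 0.034, which is indeed pretty low.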


Expected value
A January 27, 2012 Gallup survey suggests that 48% of Americans would vote for Obama over Romney if the presidential election were held that day. Among a random sample of 100 people, how many would you expect to vote for Obama? Easy enough: 100 × 0.48 = 48. Or, more formally, µ = np = 100 × 0.48 = 48. But this doesn't mean that in every random sample of 100 people exactly 48 will vote for Obama. In some samples this value will be less, and in others more. How much would we expect this value to vary?


Expected value and its variability
Mean and standard deviation of the binomial distribution:

$\mu = np \qquad \sigma = \sqrt{np(1-p)}$

Going back to the voters:

$\sigma = \sqrt{np(1-p)} = \sqrt{100 \times 0.48 \times 0.52} \approx 5$

We would expect 48 out of 100 randomly sampled voters to vote for Obama, give or take 5.
Note: The mean and standard deviation of a binomial distribution might not always be whole numbers, and that is alright; these values represent what we would expect to see on average.
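The "give or take 5" can be seen by drawing many samples of 100 voters. The Python sketch below compares the formulas with 10,000 simulated samples. It assumes each voter independently supports Obama with probability 0.48; the seed and number of samples are arbitrary:

```python
import math
import random

n, p = 100, 0.48
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Simulate many samples of 100 voters; each voter supports Obama with probability p
rng = random.Random(2012)  # arbitrary seed
counts = [sum(rng.random() < p for _ in range(n)) for _ in range(10_000)]
sim_mean = sum(counts) / len(counts)
sim_sd = math.sqrt(sum((c - sim_mean) ** 2 for c in counts) / (len(counts) - 1))

print(f"formula:    mean = {mu:.1f}, sd = {sigma:.2f}")
print(f"simulation: mean = {sim_mean:.1f}, sd = {sim_sd:.2f}")
```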


Unusual observations
If we consider observations more than 2 standard deviations away from the mean to be unusual, we can use the mean and standard deviation we just computed to find the range of counts we should expect in random samples of 100 people planning to vote for Obama in the 2012 presidential election:

48 ± 2 × 5 = (38, 58)
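The same range can be computed without rounding σ first. Here is a Python sketch. The helper name `usual_range` and its default of 2 standard deviations are illustrative choices:

```python
import math

def usual_range(n, p, k=2):
    """Counts within k standard deviations of the binomial mean np."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return mu - k * sigma, mu + k * sigma

low, high = usual_range(100, 0.48)
print(f"({low:.1f}, {high:.1f})")
```

With the unrounded σ ≈ 4.996, this gives (38.0, 58.0) to one decimal, matching (38, 58).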


Clicker question
A January 2012 Gallup report suggests that 7.5% of 18-29-year-old Americans have been diagnosed with high blood pressure. Would a random sample of 1,000 adults in this age range in which only 60 have high blood pressure be considered unusual?
(a) Yes
(b) No

http://www.gallup.com/poll/152108/Key-Chronic-Diseases-Decline.aspx
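The two-standard-deviation rule can be applied the same way here. The Python sketch below assumes the 7.5% rate and a binomial model. It computes the mean and standard deviation for n = 1,000 and p = 0.075, then checks how many standard deviations 60 lies from the mean:

```python
import math

n, p, observed = 1000, 0.075, 60
mu = n * p                          # expected count: 75
sigma = math.sqrt(n * p * (1 - p))  # about 8.33
z = (observed - mu) / sigma         # standardized distance from the mean
print(f"mu = {mu:.1f}, sigma = {sigma:.2f}, z = {z:.2f}")
print("unusual" if abs(z) > 2 else "not unusual")
```

Here µ = 75 and σ ≈ 8.33, so 60 is about 1.8 standard deviations below the mean. That is inside the 2-standard-deviation range, so the sample is not unusual by this rule.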


Normal approximation to the binomial

A recent study found that "Facebook users get more than they give". For example:
- 40% of Facebook users in our sample made a friend request, but 63% received at least one request
- Users in our sample pressed the like button next to friends' content an average of 14 times, but had their content "liked" an average of 20 times
- Users sent 9 personal messages, but received 12
- 12% of users tagged a friend in a photo, but 35% were themselves tagged in a photo

Any guesses for how this pattern can be explained?

http://www.pewinternet.org/Reports/2012/Facebook-users/Summary.aspx


This study also found that approximately 25% of Facebook users are considered power users. The same study found that the average Facebook user has 245 friends. What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users?
We are given that n = 245 and p = 0.25, and we are asked for the probability P(K ≥ 70).

P(K ≥ 70) = P(K = 70 or K = 71 or K = 72 or · · · or K = 245)
          = P(K = 70) + P(K = 71) + P(K = 72) + · · · + P(K = 245)

This seems like an awful lot of work...
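For a computer, the "awful lot of work" is a 176-term sum. Here is a Python sketch of the exact binomial tail, assuming the binomial model with n = 245 and p = 0.25. The helper name `binom_pmf` is illustrative:

```python
import math

n, p = 245, 0.25  # number of friends, probability that a friend is a power user

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# P(K >= 70): add up the 176 terms P(K = 70), ..., P(K = 245)
exact = sum(binom_pmf(k, n, p) for k in range(70, n + 1))
print(f"{exact:.4f}")
```

By hand this sum is impractical, which is what motivates the normal approximation.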


Normal approximation to the binomial
When the sample size is large enough, the binomial distribution with parameters n and p can be approximated by the normal model with parameters $\mu = np$ and $\sigma = \sqrt{np(1-p)}$. In the case of the Facebook power users, n = 245 and p = 0.25:

$\mu = 245 \times 0.25 = 61.25 \qquad \sigma = \sqrt{245 \times 0.25 \times 0.75} \approx 6.78$

$\text{Bin}(n = 245, p = 0.25) \approx N(\mu = 61.25, \sigma = 6.78)$

[Figure: probabilities of Bin(245, 0.25) overlaid with the N(61.25, 6.78) density, for k from 20 to 100]


Correction for the normal approximation
$P(K \ge 70) = P(K = 70) + P(K = 71) + P(K = 72) + \cdots + P(K = 245) \approx P(X > 70 - 0.5) = P(X > 69.5)$, where $X \sim N(61.25, 6.78)$

We apply a 0.5 correction so that the normal area also covers the probability of exactly 70 "successes". The bar for k = 70 spans 69.5 to 70.5, so starting the area at 70 would leave out half of it.

[Figure: Bin(245, 0.25) probabilities with the N(61.25, 6.78) density, for k from 20 to 100]
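The effect of the 0.5 correction can be checked against the exact binomial sum. The Python sketch below uses the standard library's `statistics.NormalDist` (Python 3.8 or later) with the unrounded µ and σ. It compares the approximation with and without the correction:

```python
import math
from statistics import NormalDist

n, p = 245, 0.25
mu = n * p                          # 61.25
sigma = math.sqrt(n * p * (1 - p))  # about 6.78
approx = NormalDist(mu, sigma)

with_correction = 1 - approx.cdf(69.5)   # P(X > 70 - 0.5)
without_correction = 1 - approx.cdf(70)  # leaves out half of the bar at k = 70
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(70, n + 1))

print(f"normal, with 0.5 correction:    {with_correction:.4f}")
print(f"normal, without the correction: {without_correction:.4f}")
print(f"exact binomial:                 {exact:.4f}")
```

The corrected value lands noticeably closer to the exact answer. Small differences from a normal table come from rounding z to two decimals.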


Clicker question
What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users?
(a) 0.0984
(b) 0.1112
(c) 0.8888
(d) 0.9016


Histograms of number of successes
[Figure: hollow histograms of samples from the binomial model where p = 0.10 and n = 10, 30, 100, and 300]

Try it yourself at http://www.socr.ucla.edu/htmls/SOCR_Distributions.html.


How large is large enough?
The sample size is considered large enough if the expected numbers of successes and failures are both at least 10:

np ≥ 10   and   n(1 − p) ≥ 10
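The success-failure condition is a one-line check. Here is a Python sketch. The function name and its `threshold` parameter, which defaults to the 10 used above, are illustrative. It is applied to the Facebook power-user example:

```python
def normal_approx_ok(n, p, threshold=10):
    """Success-failure condition: expect at least `threshold` successes and failures."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Facebook power-user example: np = 61.25, n(1 - p) = 183.75
print(normal_approx_ok(245, 0.25))  # True
```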


Clicker question
Below are four pairs of binomial distribution parameters. Which distributions can be approximated by the normal distribution? There are two correct answers.
(a) n = 100, p = 0.8
(b) n = 25, p = 0.6
(c) n = 150, p = 0.05
(d) n = 500, p = 0.015
