Lecture 7: Geometric & Binomial distributions
Statistics 101
Mine Çetinkaya-Rundel
February 7, 2012
Announcements
Due: HW 2 at the beginning of class on Thursday.
Clarification on Exercise 3.4 parts (c) and (d): Mary's and Leo's percentiles refer to the proportion of people whose finishing times are lower than theirs. Note that this doesn't mean "% of people they performed better than", because in a triathlon "performing better" means finishing faster.
Statistics 101 (Mine Çetinkaya-Rundel)
L7: Geo & Binom distributions
February 7, 2012
Recap
Online quiz 2: commonly missed questions
Question 3: More than three-quarters of the nation's colleges and universities now offer online classes, and about 23% of college graduates have taken a course online. 39% of those who have taken a course online believe that online courses provide the same educational value as one taken in person, a view shared by only 27% of those who have not taken an online course. At a coffee shop you overhear a recent college graduate saying that she doesn't believe that online courses provide the same educational value as one taken in person. What's the probability that she has taken an online course before?

                           valuable   not valuable   total
took online course                                   0.23
didn't take online course
total                                                1
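One way to fill in the table and answer Q3, sketched in Python rather than R (all variable names are our own): multiply each row total by the stated conditional percentage to get the cells, then divide within the "not valuable" column.

```python
# Contingency-table reasoning for online quiz 2, Q3.
# Inputs are the percentages stated in the question.
p_took = 0.23                  # P(took online course)
p_valuable_given_took = 0.39   # P(valuable | took online course)
p_valuable_given_not = 0.27    # P(valuable | didn't take online course)

p_didnt = 1 - p_took

# Joint probabilities: the four interior cells of the table
took_valuable = p_took * p_valuable_given_took
took_not_valuable = p_took * (1 - p_valuable_given_took)
didnt_valuable = p_didnt * p_valuable_given_not
didnt_not_valuable = p_didnt * (1 - p_valuable_given_not)

# Column total for "not valuable"
total_not_valuable = took_not_valuable + didnt_not_valuable

# P(took online course | not valuable) = joint / column total
p_took_given_not_valuable = took_not_valuable / total_not_valuable

for name, value in [
    ("took & valuable", took_valuable),
    ("took & not valuable", took_not_valuable),
    ("didn't & valuable", didnt_valuable),
    ("didn't & not valuable", didnt_not_valuable),
]:
    print(f"{name:22s} {value:.4f}")
print(f"P(took | not valuable) = {p_took_given_not_valuable:.3f}")
```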
Online quiz 2, Q3: review question
Which is the correct notation for the following probability? "At a coffee shop you overhear a recent college graduate saying that she doesn't believe that online courses provide the same educational value as one taken in person. What's the probability that she has taken an online course before?"
(a) P(took online course | not valuable)
(b) P(not valuable | took online course)
(c) P(valuable and took online course)
(d) P(took online course and not valuable)
(e) P(valuable | didn't take online course)
1. Geometric distribution
   Bernoulli distribution
   Geometric distribution
2. Binomial distribution
Geometric distribution
Bernoulli distribution
Milgram experiment
Stanley Milgram, a Yale University psychologist, conducted a series of experiments on obedience to authority, first published in 1963. The experimenter (E) orders the teacher (T), the subject of the experiment, to give severe electric shocks to a learner (L) each time the learner answers a question incorrectly. The learner is actually an actor, and the electric shocks are not real; a prerecorded sound is played each time the teacher administers an electric shock.
Milgram experiment (cont.) These experiments measured the willingness of study participants to obey an authority figure who instructed them to perform acts that conflicted with their personal conscience. Milgram found that about 65% of people would obey authority and give such shocks. Over the years, additional research suggested this number is approximately consistent across communities and time.
Bernoulli random variables
Each person in Milgram's experiment can be thought of as a trial. A person is labeled a success if she refuses to administer a severe shock, and a failure if she administers such a shock. Since only 35% of people refused to administer a shock, the probability of success is p = 0.35. When an individual trial has only two possible outcomes, it is called a Bernoulli random variable.
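A minimal Python sketch of Bernoulli trials, assuming each participant independently refuses with probability p = 0.35 (the function name `bernoulli` and the seed are our own choices): coding a success as 1 and a failure as 0, the proportion of successes over many trials should settle near p.

```python
import random

def bernoulli(p, rng):
    """Return 1 (success: refuses the shock) with probability p, else 0 (failure)."""
    return 1 if rng.random() < p else 0

rng = random.Random(2012)   # fixed seed so the run is reproducible
p = 0.35                    # probability a participant refuses
trials = [bernoulli(p, rng) for _ in range(10_000)]

# The proportion of successes across many Bernoulli trials should sit near p.
print(sum(trials) / len(trials))
```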
Geometric distribution
Geometric distribution Dr. Smith wants to repeat Milgram’s experiments but she only wants to sample people until she finds someone who will not inflict a severe shock. What is the probability that she stops after the first person?
P(1st person refuses) = 0.35
... the third person?
$$P(\text{1st and 2nd shock, 3rd refuses}) = \underbrace{0.65}_{S} \times \underbrace{0.65}_{S} \times \underbrace{0.35}_{R} = 0.65^2 \times 0.35 \approx 0.15$$
... the tenth person?
Geometric distribution (cont.)
The geometric distribution describes the waiting time until a success for independent and identically distributed (iid) Bernoulli random variables.
independence: outcomes of trials don't affect each other
identical: the probability of success is the same for each trial
Geometric probabilities
If p represents the probability of success, (1 − p) the probability of failure, and n the number of independent trials, then
$$P(\text{success on the } n\text{th trial}) = (1-p)^{n-1}\,p$$
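The geometric formula translates directly into a small Python function (the name `geometric_pmf` is our own). Applied to Dr. Smith's study with p = 0.35, it answers the first-, third-, and tenth-person questions from the previous slide.

```python
def geometric_pmf(n, p):
    """P(first success on trial n) for iid Bernoulli(p) trials: (1 - p)^(n - 1) * p."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (1 - p) ** (n - 1) * p

p = 0.35  # probability a person refuses to administer the shock
for n in (1, 3, 10):
    print(n, round(geometric_pmf(n, p), 4))
```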
Clicker question
Can we calculate the probability of rolling a 6 for the first time on the 6th roll of a die using the geometric distribution?
(a) no, on the roll of a die there are more than 2 possible outcomes
(b) yes, why not
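A sketch of the reasoning behind the die question, assuming each roll is recoded as a Bernoulli trial: success means rolling a 6 (p = 1/6) and failure means any other face. Once the outcomes are collapsed to two categories, the geometric formula applies as before.

```python
# Treat each roll as a Bernoulli trial: success = rolling a 6 (p = 1/6),
# failure = anything else. Rolls are independent and p is constant.
p_six = 1 / 6
n = 6  # first 6 appears on the 6th roll
prob = (1 - p_six) ** (n - 1) * p_six
print(round(prob, 4))
```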
Expected value
How many people is Dr. Smith expected to test to find the first one that refuses to administer the shock?
The expected value, or the mean, of a geometric distribution is 1/p.
$$\mu = \frac{1}{p} = \frac{1}{0.35} = 2.86$$
She is expected to test 2.86 people to find the first one that refuses to administer the shock (this count includes the person who refuses). But how can she test a non-whole number of people?
Expected value and its variability
Mean and standard deviation of the geometric distribution:
$$\mu = \frac{1}{p} \qquad \sigma = \sqrt{\frac{1-p}{p^2}}$$
Going back to Dr. Smith's experiment:
$$\sigma = \sqrt{\frac{1-p}{p^2}} = \sqrt{\frac{1-0.35}{0.35^2}} = 2.3$$
Dr. Smith is expected to test 2.86 people to find the first one that refuses to administer the shock, give or take 2.3 people. These values only make sense in the context of repeating the experiment many, many times.
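To see why a non-whole mean is fine, here is a Python simulation sketch (the helper name and seed are our own) that repeats Dr. Smith's search many times, assuming independent participants who each refuse with probability 0.35. Each repetition yields a whole number of people, but the average and spread across repetitions should land near the formula values 1/p and √((1 − p)/p²).

```python
import math
import random

p = 0.35
mu = 1 / p                         # mean of the geometric distribution
sigma = math.sqrt((1 - p) / p**2)  # its standard deviation

def trials_until_first_refusal(p, rng):
    """Count people tested up to and including the first one who refuses."""
    count = 1
    while rng.random() >= p:       # this person administers the shock
        count += 1
    return count

rng = random.Random(7)
samples = [trials_until_first_refusal(p, rng) for _ in range(100_000)]
sim_mean = sum(samples) / len(samples)
sim_sd = math.sqrt(sum((x - sim_mean) ** 2 for x in samples) / (len(samples) - 1))

print(f"formula:    mean = {mu:.2f}, sd = {sigma:.2f}")
print(f"simulation: mean = {sim_mean:.2f}, sd = {sim_sd:.2f}")
```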
1. Geometric distribution
2. Binomial distribution
   The binomial distribution
   Normal approximation to the binomial
Binomial distribution
Suppose we randomly select four individuals to participate in this experiment. What is the probability that exactly 1 of them will refuse to administer the shock?
Let's call these people Allen (A), Brittany (B), Caroline (C), and Damian (D). Each one of the four scenarios below will satisfy the condition of "exactly 1 of them refuses to administer the shock":
Scenario 1: 0.35 (A) refuse × 0.65 (B) shock × 0.65 (C) shock × 0.65 (D) shock = 0.0961
Scenario 2: 0.65 (A) shock × 0.35 (B) refuse × 0.65 (C) shock × 0.65 (D) shock = 0.0961
Scenario 3: 0.65 (A) shock × 0.65 (B) shock × 0.35 (C) refuse × 0.65 (D) shock = 0.0961
Scenario 4: 0.65 (A) shock × 0.65 (B) shock × 0.65 (C) shock × 0.35 (D) refuse = 0.0961
The probability of exactly 1 of 4 people refusing to administer the shock is the sum of all of these probabilities:
0.0961 + 0.0961 + 0.0961 + 0.0961 = 4 × 0.0961 = 0.3844
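The same bookkeeping can be automated. This Python sketch lists all 2⁴ = 16 refuse/shock outcomes for the four people, keeps those with exactly one refusal, and adds their probabilities, assuming independent people with P(refuse) = 0.35. The unrounded total is 4 × 0.35 × 0.65³ = 0.384475; the slide's 0.3844 comes from rounding each scenario to 0.0961 first.

```python
from itertools import product

p_refuse = 0.35
people = ["A", "B", "C", "D"]

total = 0.0
matching = []
# Each outcome assigns "refuse" or "shock" to every person (2^4 = 16 outcomes).
for outcome in product(["refuse", "shock"], repeat=len(people)):
    if outcome.count("refuse") != 1:
        continue  # keep only "exactly 1 refuses"
    prob = 1.0
    for result in outcome:
        prob *= p_refuse if result == "refuse" else 1 - p_refuse
    matching.append(outcome)
    total += prob

print(len(matching), round(total, 4))
```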
The binomial distribution
Binomial distribution
The question from the prior slide asked for the probability of a given number of successes, k, in a given number of trials, n (k = 1 success in n = 4 trials), and we calculated this probability as
# of scenarios × P(single scenario)
# of scenarios: there is a less tedious way to figure this out; we'll get to that shortly...
$$P(\text{single scenario}) = p^k (1-p)^{n-k}$$
that is, the probability of success to the power of the number of successes, times the probability of failure to the power of the number of failures.
The binomial distribution describes the probability of having exactly k successes in n independent Bernoulli trials with probability of success p.
Counting the # of scenarios
Earlier we wrote out all possible scenarios that fit the condition of exactly one person refusing to administer the shock. If n were larger and/or k were different from 1, writing out the scenarios would get even more tedious. For example, what if n = 9 and k = 2:
RRSSSSSSS
SRRSSSSSS
SSRRSSSSS
···
SSRSSRSSS
···
SSSSSSSRR
Writing out all possible scenarios is incredibly tedious and prone to errors.
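A computer can do the tedious listing without errors. This Python sketch picks which k of the n positions hold a refusal (R), fills the rest with shocks (S), and counts the resulting strings; the listing order differs from the slide's, but the set of scenarios is the same.

```python
from itertools import combinations

n, k = 9, 2
scenarios = []
# Choose which k of the n positions are refusals (R); the rest are shocks (S).
for positions in combinations(range(n), k):
    scenarios.append("".join("R" if i in positions else "S" for i in range(n)))

print(scenarios[:3], "...", scenarios[-1])
print(len(scenarios))
```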
Calculating the # of scenarios
Choose function
The choose function is useful for calculating the number of ways to choose k successes in n trials.
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
k = 1, n = 4:
$$\binom{4}{1} = \frac{4!}{1!\,(4-1)!} = \frac{4 \times 3 \times 2 \times 1}{1 \times (3 \times 2 \times 1)} = 4$$
k = 2, n = 9:
$$\binom{9}{2} = \frac{9!}{2!\,(9-2)!} = \frac{9 \times 8 \times 7!}{2 \times 1 \times 7!} = \frac{72}{2} = 36$$
Note: You can also use R for these calculations:
> choose(9,2)
[1] 36
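For readers working in Python instead of R, a sketch of the same choose function: one version written from the factorial formula (the name `choose` is ours, mirroring R), checked against the standard library's `math.comb` (Python 3.8+).

```python
import math

def choose(n, k):
    """Number of ways to pick k successes out of n trials: n! / (k! (n - k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(choose(4, 1), choose(9, 2))        # same quantities as R's choose(4, 1), choose(9, 2)
print(math.comb(4, 1), math.comb(9, 2))  # Python 3.8+ built-in
```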
Properties of the choose function
If k = 1, only 1 of the n trials results in a success; it could be the first, the second, ..., or the nth trial, so there are n ways this can happen:
$$\binom{n}{1} = n$$
If k = n, all n trials result in a success, and there's only one way this can happen:
$$\binom{n}{n} = 1$$
If k = 0, all n trials result in a failure, and there's only one way this can happen as well:
$$\binom{n}{0} = 1$$
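A quick Python check of the three properties using the standard library's `math.comb`, for a range of n chosen arbitrarily.

```python
import math

# Check the three properties of the choose function for several n.
for n in range(1, 11):
    assert math.comb(n, 1) == n   # exactly one success: n possible positions
    assert math.comb(n, n) == 1   # all successes: one way
    assert math.comb(n, 0) == 1   # all failures: one way
print("properties hold for n = 1..10")
```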
Binomial distribution (cont.)
Binomial probabilities
If p represents the probability of success, (1 − p) the probability of failure, n the number of independent trials, and k the number of successes, then
$$P(k \text{ successes in } n \text{ trials}) = \binom{n}{k} p^k (1-p)^{n-k}$$
We can use the binomial distribution to calculate the probability of k successes in n trials, as long as
1. the trials are independent
2. the number of trials, n, is fixed
3. each trial outcome can be classified as a success or a failure
4. the probability of success, p, is the same for each trial
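The binomial formula as a Python function (the name `binom_pmf` is our own), assuming the four conditions above hold. It reproduces the Milgram "exactly 1 of 4 refuses" answer without listing scenarios, and the probabilities over all k from 0 to n sum to 1.

```python
import math

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Milgram example: exactly 1 of 4 people refuses, p = 0.35
print(round(binom_pmf(1, 4, 0.35), 4))

# The probabilities over k = 0..n add up to 1
print(round(sum(binom_pmf(k, 4, 0.35) for k in range(5)), 10))
```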
Clicker question
A January 27, 2012 Gallup survey suggests that 48% of Americans would vote for Obama over Romney if the presidential election were held that day. Among a random sample of 10 Americans, what is the probability that exactly 8 would vote for Obama over Romney?
(a) pretty low
(b) pretty high
http://www.gallup.com/poll/election.aspx, February 6, 2012.
Clicker question
A January 27, 2012 Gallup survey suggests that 48% of Americans would vote for Obama over Romney if the presidential election were held that day. Among a random sample of 10 Americans, what is the probability that exactly 8 would vote for Obama over Romney?
(a) $0.48^8 \times 0.52^2$
(b) $\binom{10}{8} \times 0.48^8 \times 0.52^2$
(c) $\frac{10}{8} \times 0.48^8 \times 0.52^2$
(d) $\binom{10}{8} \times 0.48^2 \times 0.52^8$
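A numeric check in Python, assuming the 10 sampled Americans are independent and each prefers Obama with probability 0.48: plugging n = 10, k = 8, p = 0.48 into the binomial formula gives a probability of roughly 0.034, which helps settle whether the answer is "pretty low" or "pretty high".

```python
import math

n, k, p = 10, 8, 0.48
# Binomial formula: (n choose k) * p^k * (1 - p)^(n - k)
prob = math.comb(n, k) * p**k * (1 - p) ** (n - k)
print(round(prob, 4))
```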
Expected value
A January 27, 2012 Gallup survey suggests that 48% of Americans would vote for Obama over Romney if the presidential election were held that day. Among a random sample of 100 people, how many would you expect to vote for Obama?
Easy enough: 100 × 0.48 = 48. Or, more formally, µ = np = 100 × 0.48 = 48.
But this doesn't mean that in every random sample of 100 people exactly 48 will vote for Obama. In some samples this value will be less, and in others more. How much would we expect this value to vary?
Expected value and its variability
Mean and standard deviation of the binomial distribution:
$$\mu = np \qquad \sigma = \sqrt{np(1-p)}$$
Going back to the voters:
$$\sigma = \sqrt{np(1-p)} = \sqrt{100 \times 0.48 \times 0.52} \approx 5$$
We would expect 48 out of 100 randomly sampled voters to vote for Obama, give or take 5.
Note: The mean and standard deviation of a binomial might not always be whole numbers, and that is alright; these values represent what we would expect to see on average.
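A Python sketch comparing the formulas with a simulation, assuming each sampled voter independently supports Obama with probability 0.48 (the seed and the 5,000 repetitions are arbitrary choices). The simulated counts vary from sample to sample, but their average and standard deviation should be close to np and √(np(1 − p)).

```python
import math
import random

n, p = 100, 0.48
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
print(f"mu = {mu:.1f}, sigma = {sigma:.2f}")

# Simulate many random samples of 100 voters; record the Obama count in each.
rng = random.Random(48)
counts = [sum(rng.random() < p for _ in range(n)) for _ in range(5_000)]
sim_mean = sum(counts) / len(counts)
sim_sd = math.sqrt(sum((c - sim_mean) ** 2 for c in counts) / (len(counts) - 1))
print(f"simulated mean = {sim_mean:.1f}, simulated sd = {sim_sd:.2f}")
```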
Unusual observations
Using the notion that observations more than 2 standard deviations away from the mean are considered unusual, together with the mean and standard deviation we just computed, we can calculate a range for how many people in random samples of 100 we should expect to be planning to vote for Obama in the 2012 presidential election:
48 ± 2 × 5 = (38, 58)
Clicker question
A January 2012 Gallup report suggests that 7.5% of 18-29 year old Americans have been diagnosed with high blood pressure. Would a random sample of 1,000 adults in this age range in which only 60 have high blood pressure be considered unusual?
(a) Yes
(b) No
http://www.gallup.com/poll/152108/Key-Chronic-Diseases-Decline.aspx
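A Python sketch applying the "more than 2 standard deviations from the mean" rule to the blood-pressure question, assuming the 1,000 adults are independent draws with p = 0.075. It computes the binomial mean and standard deviation, then how many standard deviations 60 lies from the mean.

```python
import math

n, p = 1000, 0.075   # sample size and reported proportion with high blood pressure
observed = 60

mu = n * p
sigma = math.sqrt(n * p * (1 - p))
z = (observed - mu) / sigma  # how many SDs the observation is from the mean

print(f"mu = {mu:.1f}, sigma = {sigma:.2f}, z = {z:.2f}")
print("unusual" if abs(z) > 2 else "not unusual")
```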
Normal approximation to the binomial
A recent study found that "Facebook users get more than they give". For example:
40% of Facebook users in our sample made a friend request, but 63% received at least one request
Users in our sample pressed the like button next to friends' content an average of 14 times, but had their content "liked" an average of 20 times
Users sent 9 personal messages, but received 12
12% of users tagged a friend in a photo, but 35% were themselves tagged in a photo
Any guesses for how this pattern can be explained?
http://www.pewinternet.org/Reports/2012/Facebook-users/Summary.aspx
This study also found that approximately 25% of Facebook users are considered power users. The same study found that the average Facebook user has 245 friends. What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users?
We are given that n = 245 and p = 0.25, and we are asked for the probability P(K ≥ 70).
$$P(K \ge 70) = P(K = 70 \text{ or } K = 71 \text{ or } K = 72 \text{ or } \cdots \text{ or } K = 245) = P(K = 70) + P(K = 71) + P(K = 72) + \cdots + P(K = 245)$$
This seems like an awful lot of work...
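By hand this sum is a lot of work, but a computer can add the 176 exact binomial terms directly. A Python sketch, assuming friends are independent and each is a power user with probability 0.25; its output is a useful benchmark for the normal approximation developed next.

```python
import math

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 245, 0.25
# Exact P(K >= 70): add up P(K = 70), P(K = 71), ..., P(K = 245).
exact_tail = sum(binom_pmf(k, n, p) for k in range(70, n + 1))
print(round(exact_tail, 4))
```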
Normal approximation to the binomial
When the sample size is large enough, the binomial distribution with parameters n and p can be approximated by the normal model with parameters $\mu = np$ and $\sigma = \sqrt{np(1-p)}$.
In the case of the Facebook power users, n = 245 and p = 0.25.
$$\mu = 245 \times 0.25 = 61.25 \qquad \sigma = \sqrt{245 \times 0.25 \times 0.75} = 6.78$$
Bin(n = 245, p = 0.25) ≈ N(µ = 61.25, σ = 6.78).
[Figure: Bin(245, 0.25) probabilities overlaid with the N(61.25, 6.78) density; x-axis: k]
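A Python sketch of the comparison the figure makes: it computes µ and σ for Bin(245, 0.25) and prints the binomial probability next to the N(61.25, 6.78) density at a few arbitrarily chosen k values. The two columns should track each other closely.

```python
import math

n, p = 245, 0.25
mu = n * p                          # 61.25
sigma = math.sqrt(n * p * (1 - p))  # about 6.78

def binom_pmf(k):
    """P(K = k) for K ~ Bin(n, p) with the n and p above."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x):
    """Density of N(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# Compare the binomial probabilities with the normal density at a few k values.
for k in (50, 61, 70, 80):
    print(k, round(binom_pmf(k), 4), round(normal_pdf(k), 4))
```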
Correction for the normal approximation
$$P(K \ge 70) = P(K = 70) + P(K = 71) + P(K = 72) + \cdots + P(K = 245) \approx P(X > 70 - 0.5)$$
We apply a 0.5 correction in order to account for the probability of exactly 70 "successes".
[Figure: Bin(245, 0.25) probabilities overlaid with the N(61.25, 6.78) density; x-axis: k]
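A Python sketch of the normal approximation with and without the 0.5 correction, using `math.erfc` for the upper tail of N(61.25, 6.78) (the helper name is our own). Because it does not round z, the corrected value may differ slightly in the third or fourth decimal place from a table lookup that rounds z to two decimals.

```python
import math

n, p = 245, 0.25
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

def normal_upper_tail(x, mu, sigma):
    """P(X > x) for X ~ N(mu, sigma), via the complementary error function."""
    z = (x - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

with_correction = normal_upper_tail(70 - 0.5, mu, sigma)   # P(X > 69.5)
without_correction = normal_upper_tail(70, mu, sigma)      # P(X > 70)
print(round(with_correction, 4), round(without_correction, 4))
```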
Clicker question
What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users?
(a) 0.0984
(b) 0.1112
(c) 0.8888
(d) 0.9016
Histograms of number of successes
Hollow histograms of samples from the binomial model where p = 0.10 and n = 10, 30, 100, and 300.
[Figure: four hollow histograms of the number of successes, one panel each for n = 10, n = 30, n = 100, and n = 300]
Try it yourself at http://www.socr.ucla.edu/htmls/SOCR_Distributions.html.
How large is large enough?
The sample size is considered large enough if the expected numbers of successes and failures are both at least 10:
$$np \ge 10 \quad \text{and} \quad n(1-p) \ge 10$$
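The rule of thumb as a small Python check (the function name and its `threshold` parameter are hypothetical, introduced only for illustration): it computes the expected numbers of successes and failures and requires both to be at least 10.

```python
def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb: expected successes and failures both at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Facebook power-user example from the earlier slides:
# 245 * 0.25 = 61.25 expected successes, 245 * 0.75 = 183.75 expected failures
print(normal_approx_ok(245, 0.25))
```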
Clicker question
Below are four pairs of binomial distribution parameters. Which distributions can be approximated by the normal distribution? There are two correct answers.
(a) n = 100, p = 0.8
(b) n = 25, p = 0.6
(c) n = 150, p = 0.05
(d) n = 500, p = 0.015