Conditional Probability

Author: Ambrose Haynes
STA 281 Fall 2011

1 Definition

Often we are only interested in particular rows or columns of a probability table. Consider the newspaper example, and the question “Of those that receive the morning paper, what proportion receive the evening paper?” This question does not concern the entire population of households; it only concerns those who receive a morning paper. Probabilities that refer only to subsets of the population are called conditional probabilities. Recall the probability table we constructed, where M is the event “receives the morning paper” and E is the event “receives the evening paper”:

        M      Mc     Total
E      0.10   0.20   0.30
Ec     0.50   0.20   0.70
Total  0.60   0.40   1.00

The question asked concerns only those who receive a morning paper, which is 60% of the entire population. We want to know, out of that 60%, what proportion receive the evening paper. To provide an intuitive feel for how this question is answered, suppose we sampled 100 people from the population. On average 60 of those people would receive the morning paper. Looking at the M column of the table, we see that of those 60, on average 50 receive only the morning paper while 10 receive both. So 10 of the 60 who receive the morning paper also receive the evening paper, and the conditional probability is 10/60=1/6.

Usually we don’t go through the argument concerning sampling a set of people and just divide the probabilities directly. Of the 60% of people who receive a morning paper, 50% of the population receive only the morning paper and 10% also receive the evening paper. So 0.10/0.60=1/6 is the conditional probability of receiving an evening paper given one receives a morning paper.

Mathematically, a conditional probability has two parts. First, a conditional probability only asks about a subset of the population, not the entire population. Second, a conditional probability asks about some property of that subset. In our example question, the subset of interest was those who receive the morning paper, while the property we are interested in was receiving an evening paper.
In general, we have a question: “Of those who are in subset A, what is the probability they are in B?” This question is translated into mathematical symbols as $P(B \mid A)$, which is read “the probability of B given A.” Notice how we solved the problem. First, we found which of the individuals were in the subset of interest. This involved finding $P(A)$, the unconditional probability of the subset. Then, within that subset, we found how many individuals had the property we were interested in. The result was

$$P(B \mid A) = \frac{\text{people in subset A with property B}}{\text{people in subset A}}.$$

Instead of writing this fraction in words, we can use symbols. For the denominator, the “people in subset A” refers to $P(A)$. For the numerator, the people must be in subset A, but they must also have property B. Since both criteria must be satisfied, the numerator is $P(A \cap B)$, resulting in

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$

Mathematically, this formula is the definition of conditional probability. It immediately implies what is called the intersection rule. Rearranging the terms of the definition gives

$$P(A \cap B) = P(A)\,P(B \mid A).$$

Similarly, since switching the roles of A and B in the definition of conditional probability yields $P(A \mid B) = P(A \cap B)/P(B)$, we find

$$P(A \cap B) = P(B)\,P(A \mid B).$$

Since $A \cap B$ is the same as $B \cap A$, the two previous equations provide two ways of finding the probability of an intersection. Simply use whichever conditional probability is more convenient.
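As a minimal sketch (not part of the original notes), the newspaper table can be stored as a dictionary of cell probabilities and the definition applied literally: add up the cells in A, add up the cells in both A and B, and divide. The helper names `prob` and `cond_prob` are hypothetical, and exact fractions are used so the results come out as 1/6 rather than a rounded decimal.

```python
from fractions import Fraction

# Newspaper table. Keys are (receives morning paper, receives evening paper).
table = {
    (True, True): Fraction("0.10"),    # both papers
    (True, False): Fraction("0.50"),   # morning only
    (False, True): Fraction("0.20"),   # evening only
    (False, False): Fraction("0.20"),  # neither
}

def prob(event):
    """P(event): add up the cells whose outcome satisfies the event."""
    return sum(p for cell, p in table.items() if event(cell))

def cond_prob(b, a):
    """P(B | A) = P(A and B) / P(A)."""
    return prob(lambda c: a(c) and b(c)) / prob(a)

def M(cell):  # receives the morning paper
    return cell[0]

def E(cell):  # receives the evening paper
    return cell[1]

print(cond_prob(E, M))  # 1/6

# Intersection rule, both ways: P(M and E) = P(M) P(E | M) = P(E) P(M | E)
both = prob(lambda c: M(c) and E(c))
assert both == prob(M) * cond_prob(E, M) == prob(E) * cond_prob(M, E)
```

The final assertion checks that both forms of the intersection rule recover the 0.10 in the M, E cell.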

2 Recognizing Conditional Probabilities

Remember the key point about conditional probability is that we are only interested in a subset of the population. The first step in evaluating a conditional probability is determining which outcomes we are interested in. In our example we were interested in people who received a morning paper. The second step is to identify the property of interest (in our example it was receiving an evening paper). After that you use the definition of conditional probability, $P(B \mid A) = P(A \cap B)/P(A)$.

Whenever you see a probability stated or are asked for a probability, ask yourself two questions. First, “Who is this statement about?” In a conditional probability we are only interested in a subset of the population. It is important to determine which subset as soon as possible so we can proceed. Second, “What are they asking about?” That is, what property of the subset is the question or statement about?

For example, suppose our population is all registered voters. Contrast the two questions: “What proportion of women are Democrats?” and “What proportion of voters are women Democrats?” The first question does not ask about all registered voters; it only asks about women. Therefore it is a conditional probability. After deciding it only asks about women, we must then determine what exactly it wants to know about women. In this example, the property of interest is being a Democrat. If W is the event “voter is female” and D is the event “voter is a Democrat”, the first question asks for $P(D \mid W)$. The second question does not place any restriction on the population since it asks for the proportion of voters. The property it is interested in is whether a voter is a Democratic woman. Thus the second question is asking for $P(W \cap D)$. These are separate questions, so you must recognize which one you are being asked. Of course, in any language there are multiple ways to ask the same question.
The following questions are equivalent; all ask for the probability of receiving an evening paper in the set of outcomes where a morning paper was received.

What proportion of those that receive a morning paper receive an evening paper?



Given a person receives a morning paper, what is the probability they receive an evening paper?



If someone receives a morning paper, what is the probability they receive an evening paper?

There are a number of ways to report the result $P(E \mid M)=1/6$ as well.

1/6 of those that receive a morning paper receive an evening paper.



Given someone receives a morning paper, there is a 1/6 probability of receiving an evening paper.



If someone receives a morning paper, there is a 1/6 probability of receiving an evening paper.

The probability table allows us to compute conditional probabilities fairly easily, since a simple conditional probability involves just one row or column of the table. More complicated conditional probabilities may be computed as well. For example, given that someone receives at least one of the papers, what is the probability they receive at most one of the papers? This is a conditional probability since we are interested only in those who receive at least one paper, not everyone. This subset of individuals includes 3 cells of our table, for a total of 0.10+0.50+0.20=0.80, or 80% of the population. Within that subset, we are interested in the property “receives at most one of the papers”. As with all conditional probabilities, we look at the individuals within the subset and try to determine which have the property. Looking only at the cells that make up the subset, we find that only the cells $M \cap E^c$ and $M^c \cap E$ satisfy the property. Those two cells total 0.50+0.20=0.70, or 70% of the overall probability. The conditional probability is thus 0.70/0.80=7/8.
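The same cell-by-cell reasoning can be sketched in code, assuming the newspaper table is stored as a dictionary keyed by (morning, evening). The conditioning subset and the property are each a set of cells; the conditional probability is the total of their overlap divided by the total of the subset. This is an illustration only; the variable names are hypothetical.

```python
from fractions import Fraction

# Newspaper table cells, keyed by (morning, evening).
table = {(True, True): Fraction("0.10"), (True, False): Fraction("0.50"),
         (False, True): Fraction("0.20"), (False, False): Fraction("0.20")}

# Condition on "at least one paper"; the property is "at most one paper".
at_least_one = {c for c in table if c[0] or c[1]}
at_most_one = {c for c in table if not (c[0] and c[1])}

subset = sum(table[c] for c in at_least_one)                # 0.80
within = sum(table[c] for c in at_least_one & at_most_one)  # 0.70
print(within / subset)  # 7/8
```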

3 Using Conditional Probabilities

The definition of conditional probability is $P(B \mid A) = P(A \cap B)/P(A)$. We derived previously the intersection rule $P(A \cap B) = P(A)\,P(B \mid A)$. This formula allows us to take conditional probabilities as given information and complete a probability table.

Suppose that at a small university, students may work in the dorms, in the library, in both, or in neither. 70% of the students do some work in the library, while 10% work only in the dorms. Of those who work in the library, 30% also work in the dorms. Use this information to construct a probability table.

Let L be the event “works in the library” and D the event “works in the dorms”. Since each student may or may not work in the library, and may or may not work in the dorms, we may construct a probability table.

        L      Lc     Total
D
Dc
Total                 1.00

We are given the information that 70% of the students work in the library. Since this information was given without any reference to whether or not those students work in the dorms, it is placed in the margin of the table. We are also given that 10% work only in the dorms. This is one of the 4 core cells of the table. Before working with the conditional probability, we may use some arithmetic to fill in part of the table: 1-0.70=0.30 do not work in the library, and of those, 0.30-0.10=0.20 work in neither place.

        L      Lc     Total
D             0.10
Dc            0.20
Total  0.70   0.30   1.00

To complete the table, we must use the conditional probability given in the problem. We are given the information that of those who work in the library, 30% work in the dorms. This is a conditional probability, not the proportion that work in both. We are conditioning on people who work in the library (70% of the students). We are given that 30% of those 70% also work in the dorms. 30% of 70% may be found by multiplying the probabilities, so (0.30)(0.70)=0.21 work in the dorms and the library. Mathematically, we have been given $P(D \mid L)=0.30$, and we have found

$$P(D \cap L) = P(L)\,P(D \mid L) = (0.70)(0.30) = 0.21.$$

We may complete the table.

        L      Lc     Total
D      0.21   0.10   0.31
Dc     0.49   0.20   0.69
Total  0.70   0.30   1.00

Since we have computed the probabilities of all outcomes, we may compute any probabilities from the table.



What is the probability a student works only in the library? (0.49)



What is the probability a student works in the dorms? (0.31)



What is the probability a student works in either the dorms or in the library? (0.21+0.49+0.10=0.80)



What is the probability a student works in neither the library nor the dorms? (0.20)



What is the probability a student who does not work in the library works in the dorms? (0.10/0.30=1/3)
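The steps used to build the dorm and library table can be sketched with exact fractions. This is a minimal illustration, not the notes’ own method: the variable names are hypothetical, and the only rule beyond arithmetic on rows and columns is the intersection rule $P(D \cap L) = P(L)\,P(D \mid L)$.

```python
from fractions import Fraction

# Given information (L = works in library, D = works in dorms)
p_L = Fraction("0.70")          # P(L), a margin
p_D_notL = Fraction("0.10")     # P(D and L^c): works only in the dorms
p_D_given_L = Fraction("0.30")  # P(D | L)

# Arithmetic on the margin and the L^c column
p_notL = 1 - p_L                    # 0.30
p_notD_notL = p_notL - p_D_notL     # 0.20
# Intersection rule: P(D and L) = P(L) P(D | L)
p_D_L = p_L * p_D_given_L           # 0.21
p_notD_L = p_L - p_D_L              # 0.49

cells = {("D", "L"): p_D_L, ("Dc", "L"): p_notD_L,
         ("D", "Lc"): p_D_notL, ("Dc", "Lc"): p_notD_notL}
assert sum(cells.values()) == 1     # a completed table must total 1

p_D = p_D_L + p_D_notL              # 0.31
# Only library, dorms, and P(D | L^c)
print(float(p_notD_L), float(p_D), float(p_D_notL / p_notL))
```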

4 More Difficult Problems

In the dorm and library example, the conditional probability provided allowed the direct computation of a cell probability. Problems involving conditional probabilities may be more difficult.

4.1 Applicants Example

Suppose a company is looking at applicants for a position. The position requires (A) experience and (B) a master’s degree. Suppose that 90% of the applicants have at least one of (A) or (B). Suppose further that, of those applicants with at least one of (A) or (B), 50% have both. Of those applicants with exactly one of (A) or (B), 2/3 have (A). We can construct a probability table and fill in one of the cells directly. Since 90% of the applicants have at least one of (A) or (B), we may determine by the complement rule that 10% have neither. This results in the table

        B      Bc     Total
A
Ac            0.10
Total                 1.00

To fill in the remainder of the table, we have to use conditional probabilities. We are given that 50% of the people with at least one have both. This probability involves the subset “with at least one”. This subset contains 90% of the applicants. Of this 90%, we are given 50% have both. Thus 50% of the 90%, or 45%, have both. The remaining cells must be found by using the fact that 2/3 of the applicants with exactly one of (A) or (B) have (A). Although we are not given the proportion of individuals with exactly one directly, we can derive it to be 1-0.45-0.10=0.45. Since 2/3 of those 45% have (A), we conclude

$$P(A \cap B^c) = \tfrac{2}{3}(0.45) = 0.30,$$

and complete the table through arithmetic.

        B      Bc     Total
A      0.45   0.30   0.75
Ac     0.15   0.10   0.25
Total  0.60   0.40   1.00

4.2 Professors and their Stories Example

Here is another example. Suppose that 10% of professors can think of decent stories to go with their exam problems. Of those that can, 60% can think of funny stories. Suppose also that, of those professors whose stories are at least one of funny/decent, 75% write funny stories. Let D be the event “writes decent stories” and F the event “writes funny stories”. The first two probabilities allow us to complete part of the table: $P(D \cap F) = (0.60)(0.10) = 0.06$ and $P(D \cap F^c) = 0.10 - 0.06 = 0.04$.

        F      Fc     Total
D      0.06   0.04   0.10
Dc                   0.90
Total                1.00

To fill in the remaining entries, we must use the information that “of those professors whose stories are at least one of funny/decent, 75% write funny stories”. Unfortunately, with the table so far we cannot compute the probability of being at least one of funny or decent, because we are missing the cell $D^c \cap F$. Let’s just fill in this unknown quantity with x and see if we can make progress in solving for x. In terms of the unknown x, the proportion of professors whose stories are at least one of funny or decent is 0.06+0.04+x=0.10+x. Of these, the ones who write funny stories are in the cells $D \cap F$ and $D^c \cap F$, a total of 0.06+x. We are given

$$\frac{0.06 + x}{0.10 + x} = 0.75.$$

Multiplying both sides by 0.10+x gives 0.06+x=0.075+0.75x, so 0.25x=0.015. Solving for x, we find x=0.06, so we may complete the table. As with all probability tables, once the table is completed we may compute any probability.

        F      Fc     Total
D      0.06   0.04   0.10
Dc     0.06   0.84   0.90
Total  0.12   0.88   1.00

What proportion of professors write both decent and funny stories? (0.06)



Of the professors with funny stories, what proportion write decent stories? (0.06/0.12=0.5)



Of those professors whose stories are exactly one of funny or decent, what proportion write funny stories? (0.06/0.10=0.6)
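Both examples in this section can be checked with exact fractions. The sketch below is illustrative only (variable names are hypothetical). For the applicants, each step is a product or a difference. For the professors, the condition $(0.06+x)/(0.10+x)=0.75$ is linear in x once both sides are multiplied by $0.10+x$, so x can be solved for directly.

```python
from fractions import Fraction

# 4.1 Applicants: A = experience, B = master's degree
at_least_one = Fraction("0.90")
neither = 1 - at_least_one               # complement rule: 0.10
both = Fraction("0.50") * at_least_one   # 50% of the 90%: 0.45
exactly_one = 1 - both - neither         # 0.45
a_only = Fraction(2, 3) * exactly_one    # P(A and B^c) = 0.30
b_only = exactly_one - a_only            # P(A^c and B) = 0.15

# 4.2 Professors: D = decent stories, F = funny stories
d = Fraction("0.10")                     # P(D)
d_and_f = Fraction("0.60") * d           # P(D and F) = 0.06
d_not_f = d - d_and_f                    # P(D and F^c) = 0.04
r = Fraction("0.75")                     # P(F | at least one of D, F)
# Unknown x = P(D^c and F) satisfies (d_and_f + x) / (d + x) = r,
# which rearranges to x (1 - r) = r d - d_and_f.
x = (r * d - d_and_f) / (1 - r)          # 0.06
funny = d_and_f + x                      # P(F) = 0.12

print(both, a_only, b_only)                   # 9/20 3/10 3/20
print(x, d_and_f / funny, x / (d_not_f + x))  # 3/50 1/2 3/5
```

The second line of output reproduces the answers 0.06, 0.5 and 0.6 to the three questions above.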

5 Independence

If $P(A \mid B)$ is different from $P(A \mid B^c)$, then A and B are related. Return to the newspaper example. We find $P(E \mid M)=1/6$ while $P(E \mid M^c)=1/2$. People who receive the morning paper are less likely to receive the evening paper than people who do not receive the morning paper. Since the probability that event E occurs depends on whether or not M occurred, we call these events dependent.

There are many events of this form. Any events with a causal relationship, for example, will be of this form. Individuals with a Ph.D. are much more likely to get a faculty position at a research university than individuals who do not have a Ph.D., the simple reason being that a Ph.D. is a requirement for such a faculty position. Indirect links between variables also create dependence. An elementary school student with large feet is more likely to read well. Foot size and reading ability have no causal link, but children with big feet tend to be older, and older children tend to read better than younger children. All of these types of relationships between variables result in dependent events. Much of science is concerned with dependence. In medicine, for example, we would like to know whether administering a treatment to a patient increases the patient’s chance of recovery.

Occasionally two variables are unrelated. A coin has no memory. If you flip a fair coin twice, the result of the second flip has nothing to do with the result of the first flip. The probability of heads is always 0.5. If $H_1$ is the event “heads on the first flip” and $H_2$ is the event “heads on the second flip”, then $P(H_2 \mid H_1)=0.5$ and $P(H_2 \mid H_1^c)=0.5$. They are the same probability since the result of the first flip does not affect the probability of heads on the second flip. Events A and B where $P(A \mid B)=P(A \mid B^c)$ are called independent events. Essentially, whether or not B occurred does not affect the probability A occurred. The purpose of this section is to derive some special properties of independent events.
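With the information $P(A \mid B)=P(A \mid B^c)$, we can derive some equivalent definitions of independence. Recall the Law of Total Probability

$$P(A) = P(A \cap B) + P(A \cap B^c).$$

Using the rule for intersections,

$$P(A) = P(B)\,P(A \mid B) + P(B^c)\,P(A \mid B^c).$$

Now using the assumed information $P(A \mid B^c) = P(A \mid B)$,

$$P(A) = P(B)\,P(A \mid B) + P(B^c)\,P(A \mid B) = \left[P(B) + P(B^c)\right]P(A \mid B) = P(A \mid B).$$

This equivalent definition of independence states that the overall probability of A, $P(A)$, is the same as the conditional probability of A given B, $P(A \mid B)$. If you know the overall probability of A, knowing whether or not B occurred does not change that probability. Another equivalent definition may be found by using the rule for intersections:

$$P(A \cap B) = P(B)\,P(A \mid B) = P(B)\,P(A) = P(A)\,P(B).$$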
This third definition is the one used for mathematical purposes. The reason is that a conditional probability is a fraction, and thus there is the possibility of dividing by zero. The equation above presents no such problem. If 0 < P(B) < 1, then all three definitions are equivalent. However, if P(B)=0 or P(B)=1, then the conditional probabilities in the first two definitions may require division by zero. To verify independence in this course, you must show the third definition, that $P(A \cap B) = P(A)\,P(B)$.

For example, reconsider the newspaper example. If you were asked “Are events M and E independent?” you should check

$$P(M \cap E) = 0.10 \neq 0.18 = (0.60)(0.30) = P(M)\,P(E).$$

Since the equality required for independence does not hold, the events are dependent, not independent. Alternatively, suppose a probability table has

        B      Bc     Total
A      0.08   0.32   0.40
Ac     0.12   0.48   0.60
Total  0.20   0.80   1.00

In this example,

$$P(A \cap B) = 0.08 = (0.40)(0.20) = P(A)\,P(B),$$

and thus the events are independent.
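A minimal sketch of the product-rule check, assuming each event is summarized by its cell and margin probabilities. The function name `independent` is hypothetical. Exact fractions are used because with ordinary floating-point numbers an exact equality test such as `0.08 == 0.40 * 0.20` can fail from rounding alone.

```python
from fractions import Fraction

def independent(p_ab, p_a, p_b):
    """Product-rule definition: P(A and B) == P(A) P(B).
    No division is involved, so P(A) = 0 or P(B) = 0 causes no trouble."""
    return p_ab == p_a * p_b

# Newspaper example: P(M and E) = 0.10, P(M) = 0.60, P(E) = 0.30
print(independent(Fraction("0.10"), Fraction("0.60"), Fraction("0.30")))  # False

# Table with P(A and B) = 0.08, row total 0.40, column total 0.20
print(independent(Fraction("0.08"), Fraction("0.40"), Fraction("0.20")))  # True
```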

6 The Relationship between Disjoint and Independent

The short answer is that there is no relationship between two disjoint events and two independent events. They are separate definitions, and they are useful in different scenarios. Disjoint events are defined as events A and B such that $A \cap B = \emptyset$. Disjoint events are useful in that they simplify the rule for unions. In general, the rule for unions states

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

If A and B are disjoint, then

$$P(A \cap B) = P(\emptyset) = 0,$$

and thus the union rule simplifies to the third axiom

$$P(A \cup B) = P(A) + P(B).$$

Remember, you need the disjoint assumption to make the simplification (technically you only need $P(A \cap B) = 0$, which you can derive from the disjoint assumption). Independence is defined in terms of intersections. In general,

$$P(A \cap B) = P(A)\,P(B \mid A),$$

and thus independence results in a simplification of the rule for intersections. If A and B are independent, this simplifies to

$$P(A \cap B) = P(A)\,P(B).$$

Remember you have to have the independence assumption to make this simplification. Independent events are sometimes disjoint and sometimes not, while disjoint events are sometimes independent and sometimes not. You have to know your purpose and check the appropriate definition.
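To see that the two properties are checked separately, here is a small sketch applying both tests to three cases. The first and third cases use hypothetical numbers chosen only for illustration; the second reuses the independent table from the previous section. The helper names are hypothetical, and `zero_intersection` tests only $P(A \cap B) = 0$, which is what the union-rule simplification needs and which disjoint events always satisfy.

```python
from fractions import Fraction

def zero_intersection(p_ab):
    """What the union-rule simplification needs: P(A and B) == 0
    (implied by A and B being disjoint)."""
    return p_ab == 0

def independent(p_ab, p_a, p_b):
    """Product-rule definition of independence: P(A and B) == P(A) P(B)."""
    return p_ab == p_a * p_b

# Cases given as (P(A and B), P(A), P(B)); first and third are hypothetical.
cases = {
    "disjoint, dependent": (Fraction(0), Fraction("0.3"), Fraction("0.5")),
    "independent, not disjoint": (Fraction("0.08"), Fraction("0.40"), Fraction("0.20")),
    "disjoint and independent": (Fraction(0), Fraction(0), Fraction("0.5")),
}
results = {name: (zero_intersection(p_ab), independent(p_ab, p_a, p_b))
           for name, (p_ab, p_a, p_b) in cases.items()}
for name, (z, ind) in results.items():
    print(f"{name}: P(A and B) = 0? {z}; independent? {ind}")
```

In the third case A has probability zero, which is the only way two disjoint events can also be independent.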

7 Bayes Rule

A conditional probability concerns a subset of the population. Within that subset, all the axioms of probability apply. For example, all conditional probabilities have to be nonnegative, just like axiom 2. Theorems such as the complement rule also still apply. Thus, we can prove theorems like

$$P(A^c \mid B) = 1 - P(A \mid B).$$

Suppose a university is interested in comparing graduation rates between students who have off-campus jobs and students who do not. Let G be the event “graduates in 6 years or less” and W be the event “works off-campus”. If you are given the information that 60% of the students with off-campus jobs graduate in 6 years, then $P(G \mid W)=0.60$. You can also conclude that 40% of the students with off-campus jobs do NOT graduate in 6 years. In symbols, you can conclude $P(G^c \mid W)=0.40$. In both conditional probabilities, you are conditioning on the same group of people, students with off-campus jobs. If 60% of those students graduate, the other 40% of those students do not.

Unfortunately, many people make the mistake of doing calculations based on different populations, such as concluding, incorrectly, that $P(A \mid B)=1-P(A \mid B^c)$. In our example, this would be the same as concluding that since 60% of the students with off-campus jobs graduate, only 40% of the students without off-campus jobs graduate. But this conclusion is unwarranted. Just because 60% of the students with off-campus jobs graduate doesn’t say anything about the students without off-campus jobs. They are separate groups of people, and can have separate, unrelated probabilities. It’s possible all students without off-campus jobs graduate. It is also possible none of them do, or anywhere in between.

Another common mistake is to assume $P(A \mid B)=1-P(B \mid A)$. In our example, this would be the same as using the information that 60% of students with off-campus jobs graduate to conclude that 40% of students who graduate have off-campus jobs. This conclusion is unwarranted as well. To make this more obvious, let F be the event a person is female and P be the event the person is pregnant. Suppose at any given time that 2% of women are pregnant. This says $P(P \mid F)=0.02$. You cannot then conclude that 98% of pregnant people are female.

There actually is a relationship between $P(A \mid B)$ and $P(B \mid A)$, but it is more complicated. Note by definition

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

By the law of total probability, $P(B) = P(A \cap B) + P(A^c \cap B)$, so

$$P(A \mid B) = \frac{P(A \cap B)}{P(A \cap B) + P(A^c \cap B)}.$$

We can then use the rule for intersections repeatedly to show

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(A^c)\,P(B \mid A^c)}.$$

This last equation is a simplified version of Bayes rule, which is vital in many scientific applications. Typically the event A is that some hypothesis is true, and thus $A^c$ is the event the hypothesis is false. The event B corresponds to some piece of data being observed. By comparing the relative likelihoods A and $A^c$ assign to B (i.e., the relative probability of observing the data when the hypothesis is true, $P(B \mid A)$, versus observing the data when the hypothesis is false, $P(B \mid A^c)$), you can compute how observing the data B affects your belief in the hypothesis A. Note that Bayes rule includes the probabilities $P(A)$ and $P(A^c)$. This is useful in situations such as medical diagnostic testing, where it is worthwhile to incorporate relative rates of disease into the calculations. If an exotic, rare disease produces a particular symptom but a common disease also produces the symptom, then when faced with the symptom the more likely explanation is that the common disease is present. In this course, Bayes rule is typically not computed symbolically, but arises naturally as you fill in a probability table. Conditional probabilities allow you to compute cells of the table, and after the table is completed you can compute whatever probability you wish.
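The simplified Bayes rule can be written as a short function. This is a sketch only; the name `bayes` and its argument names are hypothetical. As a check, it is applied to the newspaper example with A = M and B = E, using $P(M)=0.60$, $P(E \mid M)=1/6$ and $P(E \mid M^c)=1/2$. The answer should agree with reading the table directly, $P(M \mid E) = 0.10/0.30$.

```python
from fractions import Fraction

def bayes(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) = P(A) P(B|A) / [P(A) P(B|A) + P(A^c) P(B|A^c)]."""
    numerator = prior_a * p_b_given_a
    return numerator / (numerator + (1 - prior_a) * p_b_given_not_a)

# Newspaper example: A = M (morning paper), B = E (evening paper)
print(bayes(Fraction("0.60"), Fraction(1, 6), Fraction(1, 2)))  # 1/3
```

The numerator and the two terms of the denominator are exactly the cells $M \cap E$ and $M^c \cap E$ of the table, which is why filling in the table and applying Bayes rule give the same answer.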

8 Sample Questions

The following questions are divided into three levels of difficulty (all relative). The “easiest” problems contain no conditional probabilities in the given information, but do ask you to compute conditional probabilities from the table. The “medium” problems have conditional probabilities in the given information as well, so you must use the conditional probabilities to construct the table. The “hardest” problems often require some type of algebra to construct the probability table.

1) (Easiest) In a study of 3756 court cases, Kalven and Zeisel (1966) recorded the jury panel’s decision. In addition, they separately asked the judge how he or she would have decided the same case if there were no jury panel. They found that: the judge would not have convicted in 17% of the cases; both the judge and the jury would have convicted in 64% of the cases; and the judge and jury disagree on whether to convict in 22% of the cases.
a) Construct a probability table.
b) What is the probability that neither the judge nor the jury would convict?
c) Given the panel convicts, what is the probability the judge would also convict?
d) Are the events “judge convicts” and “jury panel convicts” disjoint? Why or why not?

2) (Easiest) Suppose in a particular company, employees use workstations or PCs (could be both or neither). The probability an employee uses a PC is 0.95 and the probability an employee uses a workstation is 0.20. The probability an employee uses both is 0.19.
a) Construct a probability table.
b) What is the probability an employee who does not use a PC uses a workstation?
c) What is the probability an employee uses exactly one of the machines?
d) Given an employee uses at least one of the machines, what is the probability they use a workstation?

3) (Easiest) A muscle cell has 2 sites where electricity can conduct into the cell. Every time the body intends to stimulate the muscle cell, an attempt is made to channel electricity through each of the 2 sites. The probability both sites conduct electricity is 0.4. The probability that exactly one site conducts electricity is 0.4. Finally, the probability site 1 conducts electricity is 0.7.
a) Construct a probability table.
b) What is the probability site 2 conducts electricity?
c) Given at least one of the sites conducts electricity, what is the probability both sites conduct electricity?

4) (Medium) Suppose a professor only writes two types of exams, “easy” or “hard”. Suppose that 90% of the exams are hard. There is an 80% chance that the first question on a hard exam will be difficult, and a 15% chance that the first question on an easy exam will be difficult.
a) Construct a probability table.
b) On a given exam, what is the probability it is a hard exam that contains a non-difficult first question?
c) On a given exam, what is the probability that the first question will be non-difficult?
d) Suppose the first question on a given exam is non-difficult. Given this information, what is the probability the exam is hard?
e) Are the events “easy exam” and “difficult first question” independent? Why?

5) (Medium) Suppose at a particular park visitors can hike or raft. The probability that someone will raft is 0.40. The probability that someone won’t hike is 0.10. Given someone hikes, the probability they raft is 0.40.
a) Construct a probability table.
b) What is the probability of rafting?
c) What is the probability of not hiking?
d) What is the probability of someone participating in exactly one?
e) What is the probability of someone participating in at least one?
f) Of those who do not participate in both, what proportion participate in neither?

6) (Medium) Suppose that 10% of all cases go to trial. Of those that go to trial, the defendant is found guilty in 95% of the cases. Of those that do not go to trial, the defendant is found guilty (through a plea bargain) in 40% of the cases.
a) Construct a probability table.
b) What proportion of defendants are found guilty?
c) Given a defendant is found guilty, what is the probability their case went to trial?
d) What proportion of cases that go to trial do not result in the defendant being found guilty?

7) (Hardest) Investment advisors might subscribe to the Wall Street Journal or Investor’s Business Daily. Suppose 80% subscribe to the WSJ. Of those that subscribe to the WSJ, 75% also subscribe to IBD. Also, 20% of those that subscribe to exactly one of the papers subscribe to IBD.
a) Construct a probability table.
b) What is the probability an investment advisor subscribes to neither paper?
c) What proportion of investment advisors subscribe to IBD?
d) Given an investment advisor receives IBD, what is the probability they also receive the WSJ?

8) (Hardest) Suppose there are two restaurants in a small town, Abelard’s Attic and Baltazar’s Buffet. Suppose further that 20% of the people in the town dine at neither restaurant. Of those that go to at least one of the restaurants, 75% dine at both. Of those that dine at exactly one of the restaurants, 75% dine at Abelard’s.
a) Construct a probability table.
b) What proportion dine at Baltazar’s?
c) Of those that dine at Baltazar’s, what proportion dine at Abelard’s?

9) (Hardest) Let A and B be events. Suppose that $P(A)=0.3$ and $P(B^c \mid A)=0.2$. Suppose further that given exactly one of the two events occurs, 40% of the time it is B that occurred.
a) Construct a probability table.
b) Given that at least one of the events occurred, what is the probability B occurred?
c) What is $P(B \mid A)$?

10) (Hardest) Let A and B be events. Suppose . Suppose also that, conditional on exactly one of the events occurring, the probability A occurs is 0.8. Finally, suppose that the probability neither event occurs given B did not occur is 1/9.
a) You should know what to do for part (a) by now.
b) What is the probability that at least one of the events occurs?
c) Given that at most one of the events occurs, what is the probability A occurs?

9 Solutions for Sample Problems

1) Let J be the event “judge convicts” and P the event “jury panel convicts”.
a)
        P      Pc     Total
J      0.64   0.19   0.83
Jc     0.03   0.14   0.17
Total  0.67   0.33   1.00

b) 0.14
c) 0.64/0.67=0.9552
d) They are not disjoint; they occur together with probability 0.64.

2) a)
        WS     WSc    Total
PC     0.19   0.76   0.95
PCc    0.01   0.04   0.05
Total  0.20   0.80   1.00

b) 0.01/0.05=0.20
c) 0.76+0.01=0.77
d) 0.20/(0.19+0.01+0.76)=0.2083

3) a)
                        Site 2 conducts   Site 2 doesn’t conduct   Total
Site 1 conducts         0.40              0.30                     0.70
Site 1 doesn’t conduct  0.10              0.20                     0.30
Total                   0.50              0.50                     1.00

b) 0.50
c) 0.40/0.80=0.5

4) Let E be the event an exam is easy. Let D be the event the first question is difficult.
a)
        D       Dc      Total
E      0.015   0.085   0.100
Ec     0.720   0.180   0.900
Total  0.735   0.265   1.00

b) 0.180
c) 0.265
d) 0.180/0.265=0.6792
e) The events are dependent, since $P(E \cap D) = 0.015 \neq 0.0735 = (0.100)(0.735) = P(E)\,P(D)$.

5) Let H be the event “hikes” and R the event “rafts”.
a)
        R      Rc     Total
H      0.36   0.54   0.90
Hc     0.04   0.06   0.10
Total  0.40   0.60   1.00

b) 0.40
c) 0.10
d) 0.54+0.04=0.58
e) 0.36+0.54+0.04=0.94
f) 0.06/0.64=0.0938

6) Let T be the event “go to trial” and G be the event “found guilty”.
a)
        G       Gc      Total
T      0.095   0.005   0.100
Tc     0.360   0.540   0.900
Total  0.455   0.545   1.00

b) 0.455
c) 0.095/0.455=0.2088
d) 0.005/0.100=0.05

7) a)
        IBD    IBDc   Total
WSJ    0.60   0.20   0.80
WSJc   0.05   0.15   0.20
Total  0.65   0.35   1.00

b) 0.15
c) 0.65
d) 0.60/0.65=0.9231

8) a)
        B      Bc     Total
A      0.60   0.15   0.75
Ac     0.05   0.20   0.25
Total  0.65   0.35   1.00

b) 0.65
c) 0.60/0.65=0.9231

9) a)
        B      Bc     Total
A      0.24   0.06   0.30
Ac     0.04   0.66   0.70
Total  0.28   0.72   1.00

b) 0.28/0.34=0.8235
c) 0.24/0.30=0.8

10) a)
        B      Bc     Total
A      0.45   0.40   0.85
Ac     0.10   0.05   0.15
Total  0.55   0.45   1.00

b) 0.95
c) 0.40/0.55=0.7273
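Problems 7 and 9 share the same algebraic step: one of the two “exactly one” cells is known, and we are told what fraction q of the “exactly one” group falls in the other cell. Writing y for the unknown cell and k for the known one, $y/(k+y)=q$ rearranges to $y = qk/(1-q)$. The sketch below uses that rearrangement to check the two solutions; the helper `other_single_cell` is hypothetical and not part of the notes.

```python
from fractions import Fraction

def other_single_cell(known_single, q):
    """Solve y / (known_single + y) = q for y, where y is the unknown
    'exactly one' cell and known_single is the other 'exactly one' cell."""
    return q * known_single / (1 - q)

# Problem 7: P(WSJ) = 0.80, P(IBD | WSJ) = 0.75, 20% of exactly-one read IBD
both7 = Fraction("0.75") * Fraction("0.80")                # 0.60
wsj_only = Fraction("0.80") - both7                        # 0.20
ibd_only = other_single_cell(wsj_only, Fraction("0.20"))   # 0.05
neither7 = 1 - both7 - wsj_only - ibd_only                 # 0.15

# Problem 9: P(A) = 0.3, P(B^c | A) = 0.2, 40% of exactly-one are B
a_only = Fraction("0.2") * Fraction("0.3")                 # 0.06
both9 = Fraction("0.3") - a_only                           # 0.24
b_only = other_single_cell(a_only, Fraction("0.40"))       # 0.04
neither9 = 1 - both9 - a_only - b_only                     # 0.66

print(float(both7 / (both7 + ibd_only)))                   # 7(d)
print(float((both9 + b_only) / (both9 + a_only + b_only))) # 9(b)
```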
