1 Probability

1.1 Basic Probability Theory

1.1.1 A Short History of Probability

From Calculus, Volume II by Tom M. Apostol (2nd edition, John Wiley & Sons, 1969): A gambler’s dispute in 1654 led to the creation of a mathematical theory of probability by two famous French mathematicians, Blaise Pascal and Pierre de Fermat. Antoine Gombaud, Chevalier de Méré, a French nobleman with an interest in gaming and gambling questions, called Pascal’s attention to an apparent contradiction concerning a popular dice game. The game consisted in throwing a pair of dice 24 times; the problem was to decide whether or not to bet even money on the occurrence of at least one “double six” during the 24 throws. A seemingly well-established gambling rule led de Méré to believe that betting on a double six in 24 throws would be profitable, but his own calculations indicated just the opposite.

(Chevalier de Méré gambled frequently to increase his wealth. He bet on a roll of a die that at least one 6 would appear during a total of four rolls. From past experience, he knew that he was more successful than not with this game of chance. Tired of his approach, he decided to change the game. He bet that he would get a total of 12, or a double 6, on twenty-four rolls of two dice. Soon he realized that his old approach to the game resulted in more money. He asked his friend Blaise Pascal why his new approach was not as profitable. Pascal worked through the problem and found that the probability of winning using the new approach was only 49.1 percent compared to 51.8 percent using the old approach.)

This problem and others posed by de Méré led to an exchange of letters between Pascal and Fermat in which the fundamental principles of probability theory were formulated for the first time. Although a few special problems on games of chance had been solved by some Italian mathematicians in the 15th and 16th centuries, no general theory was developed before this famous correspondence. The Dutch scientist Christian Huygens, a teacher of Leibniz, learned of this correspondence and shortly thereafter (in 1657) published the first book on probability; entitled De Ratiociniis in Ludo Aleae, it was a treatise on problems associated with gambling. Because of the inherent appeal of games of chance, probability theory soon became popular, and the subject developed rapidly during the 18th century. The major contributors during this period were Jakob Bernoulli (1654-1705) and Abraham de Moivre (1667-1754). In 1812 Pierre de Laplace (1749-1827) introduced a host of new ideas and mathematical techniques in his book, Théorie Analytique des Probabilités. Before Laplace, probability theory was solely concerned with developing a mathematical analysis of games of chance. Laplace applied probabilistic ideas to many scientific and practical problems. The theory of errors, actuarial mathematics, and statistical mechanics are examples of some of the important applications of probability theory developed in the 19th century. Like so many other branches of mathematics, the development of probability theory has been stimulated by the variety of its applications. Conversely, each advance in the theory has enlarged the scope of its influence. Mathematical statistics is one important branch of applied probability; other applications occur in such widely different fields as genetics, psychology, economics, and engineering. Many workers have contributed to the theory since Laplace’s time; among the most important are Chebyshev, Markov, von Mises, and Kolmogorov. One of the difficulties in developing a mathematical theory of probability has been to arrive at a definition of probability that is precise enough for use in mathematics, yet comprehensive enough to be applicable to a wide range of phenomena. The search for a widely acceptable definition took nearly three centuries and was marked by much controversy. The matter was finally resolved in the 20th century by treating probability theory on an axiomatic basis. In 1933 a monograph by the Russian mathematician A. Kolmogorov outlined an axiomatic approach that forms the basis for the modern theory. (Kolmogorov’s monograph is available in English translation as Foundations of Probability Theory, Chelsea, New York, 1950.) Since then the ideas have been refined somewhat, and probability theory is now part of a more general discipline known as measure theory.

Outcomes can be anything, like slot machine pictures. Probabilities describe everything that can be said before running an experiment. Deriving new probabilities from basic ones is the key issue. The subject is motivated by decisions about uncertain outcomes.

1.1.2 Basic Definitions

We first work through an example, rolling a pair of dice, and then break it down into the parts labelled by the definitions. Relative frequency is a common interpretation of probability.

Example Let us say that one die is red and the other is green so that we can tell them apart. For the game, all that matters is the number of dots on the top face of each die after the pair of dice is thrown and comes to rest. Each die has 6 possible faces, so the pair has 36 possible outcomes.

One of these, a pair of sixes, holds special interest. If we assume that all pairs are equally likely, then the probability of a pair of sixes in one roll is 1/36 ≈ 0.028. After 24 rolls, there are 36^24 ≈ 2.245 × 10^37 possible sequences of 24 pairs. This is a pretty big number, but not as big as the number of elementary particles in the observable universe, which has been variously estimated from 10^79 up to 10^81. If we assume that these sequences are equally likely, then the probability of a pair of sixes on the first roll only is

35^23 / 36^24 ≈ 0.0145.

The probability of no pairs of sixes is

35^24 / 36^24 ≈ 0.5086,

so that the probability of at least one pair of sixes is approximately 1 − 0.5086 = 0.4914.
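These calculations are easy to check numerically. The following is a minimal Python sketch (our own illustration, not part of Apostol’s text) that reproduces the figures above, including the 51.8 and 49.1 percent winning probabilities from de Méré’s two games:

```python
from fractions import Fraction

# De Méré's old game: at least one 6 in four rolls of one die.
p_old = 1 - Fraction(5, 6) ** 4              # 671/1296 ≈ 0.5177

# De Méré's new game: at least one double six in 24 rolls of two dice.
p_new = 1 - Fraction(35, 36) ** 24           # ≈ 0.4914

# Quantities from the example above.
p_first_roll_only = Fraction(35, 36) ** 23 / 36   # 35^23 / 36^24 ≈ 0.0145
p_no_double_six = Fraction(35, 36) ** 24          # 35^24 / 36^24 ≈ 0.5086

print(float(p_old), float(p_new))                        # 0.5177... 0.4914...
print(float(p_first_roll_only), float(p_no_double_six))  # 0.0145... 0.5086...
```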

Definitions

Definition 1 (Random Trial) A random trial is an experiment or action with uncertain consequences. The trial is usually repeatable. The consequences are observable after the experiment. The uncertainty implies that there is more than one possible consequence.

Definition 2 (Outcome) An outcome is an observed characteristic of the experimental consequences. The point is to focus on some particular aspect of the consequences, like the pattern of the dots on the top face of the die at rest, while ignoring others, like the angle of a side relative to a table edge. An outcome does not have to be a number. To be interesting, the outcome of the trial is also uncertain. (The number of dice is certain when a pair is thrown.) A motivation for focusing on a particular trial and characteristic is that the probabilities of outcomes are known. The outcome is always observed. We will require that all possible outcomes can be anticipated and described.

Definition 3 (Sample Space) A sample space is the set of all possible outcomes. The sample space always has more than one element.

Definition 4 (Event) An event is a subset of the sample space.

An outcome is an elementary event. The motivation for focusing on a particular event is usually given by the application. New, more complicated trials can be constructed out of old, simpler ones: rolling one die becomes rolling a pair of dice, and that becomes rolling a pair of dice 24 times.

1.1.3 Set Theory

1.1.4 Probability

Postulate 1 The probability of any event A is nonnegative: P(A) ≥ 0.

Postulate 2 The probability of the sample space, the union of all events, equals one: P(S) = 1.

Postulate 3 If the events A1, A2, A3, ..., AN are mutually exclusive (Ai ∩ Aj = ∅ for all i ≠ j), then

P(A1 ∪ A2 ∪ A3 ∪ ··· ∪ AN) = P(A1) + P(A2) + P(A3) + ··· + P(AN).

The result that the probability of a complement is one minus the probability of the event is a very powerful consequence of these postulates. It helps to solve the birthday problem and Chevalier de Méré’s problem.

Definition 5 (Odds) The odds (or odds ratio) of any event A is the ratio

P(A) / (1 − P(A)).
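As an illustration of the complement rule, here is a small Python sketch for the birthday problem (our own addition, assuming 365 equally likely birthdays):

```python
def birthday(n: int) -> float:
    """P(at least two of n people share a birthday), with 365 equally likely days."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (365 - k) / 365
    # Complement rule: P(some shared birthday) = 1 - P(all birthdays distinct).
    return 1 - p_all_distinct

print(birthday(23))  # ≈ 0.5073: with 23 people, a shared birthday is more likely than not
```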

1.1.5 Conditional Probability

1.1.6 Bayes’ Theorem

1.1.7 Independence

1.2 Random Variables and Probability Distributions

Now the outcomes have to be (real) numbers, like the monetary value of something. Then we can order the outcomes and describe their probabilities as distributions. Random variables and probability distributions correspond one-to-one.
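For instance, a short Python sketch (an illustration we add here) tabulates the distribution of one such random variable, the sum of the top faces of the two dice from the earlier example:

```python
from collections import Counter
from itertools import product

# X assigns a number (the sum of the top faces) to each of the 36 equally likely outcomes.
counts = Counter(red + green for red, green in product(range(1, 7), repeat=2))
distribution = {x: n / 36 for x, n in sorted(counts.items())}
print(distribution)  # {2: 0.0277..., 3: 0.0555..., ..., 7: 0.1666..., ..., 12: 0.0277...}
```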

1.3 Mathematics of Expectations

Distributions can be complicated. Expectations are summaries of distributions and give low-dimensional ways to compare them.
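Continuing the dice example, a small Python sketch (again our own illustration) reduces the whole distribution to two summary numbers, its mean and variance:

```python
from collections import Counter
from itertools import product

# Distribution of X = the sum of two fair dice, as in the earlier sketch.
counts = Counter(red + green for red, green in product(range(1, 7), repeat=2))
distribution = {x: n / 36 for x, n in counts.items()}

# The mean and variance summarize the eleven-entry table in two numbers.
mean = sum(x * p for x, p in distribution.items())                    # E[X] = 7.0
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())  # Var(X) = 35/6 ≈ 5.83
print(mean, variance)
```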


1.4 Multivariate Distributions

Random variables are often multi-dimensional, like a table of personal characteristics (height, weight, vision, speed, strength).

2 Statistics

2.1 Sampling and Sampling Distributions

Leading example: use the sample average (a statistic) to learn about the mean of a distribution. The sample average is a random variable. We will ignore the word “population.” The chapter primarily introduces the normal, chi-squared, and t probability distributions.
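A minimal simulation sketch in Python (with sample and replication sizes chosen by us for illustration) shows the sample average behaving as a random variable:

```python
import random

random.seed(0)

def sample_average(n: int) -> float:
    """Average of n rolls of a fair die: a random variable in its own right."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# Repeating the experiment traces out the sampling distribution of the average.
averages = [sample_average(30) for _ in range(10_000)]
print(min(averages), max(averages))   # the average varies from sample to sample
print(sum(averages) / len(averages))  # but it centers near the distribution mean 3.5
```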

2.2 Estimation

There are various ways to learn about the mean of a distribution. Which ones should we use? Should we combine them?

2.3 Interval Estimation and Hypothesis Testing

Statistics are random variables, so we want to account for their distributions in our methods of learning and inference.

3 Linear Regression

3.1 Simple Linear Regression

There is a simple way to predict the outcome of one random variable conditional on the value of another. Think of it as estimating the conditional expected value by assuming that it is a linear function of the conditioning variable.
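A minimal Python sketch with made-up toy data illustrates the least-squares recipe:

```python
# Toy data: fit E[Y | X = x] ≈ a + b*x by ordinary least squares.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope: covariance of x and y divided by the variance of x.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar  # the fitted line passes through the point of means
print(a, b)  # intercept ≈ 0.14, slope ≈ 1.96 for this toy data
```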

3.2 Inferences in Simple Linear Regression

We can generalize our statistical inferences to simple linear regression.

3.3 Multiple Regression

We can have more than one conditioning variable, or allow our conditional expected value function to be nonlinear.
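A short Python sketch with simulated data (the coefficients and noise level are our own choices for illustration) shows a least-squares fit with two conditioning variables:

```python
import numpy as np

# Simulated data: E[Y | X1, X2] = b0 + b1*X1 + b2*X2 with b = (1.0, 2.0, -0.5).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

# Least-squares fit with an intercept column in the design matrix.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # ≈ [1.0, 2.0, -0.5]
```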

3.4 Extending the Multiple Regression Model

A particularly interesting family of conditioning variables is dummy, or indicator, variables.
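A tiny Python sketch with hypothetical group labels shows the encoding:

```python
# A dummy (indicator) variable turns a category into a 0/1 regressor.
group = ["treated", "control", "treated", "control", "treated"]
d = [1.0 if g == "treated" else 0.0 for g in group]
print(d)  # [1.0, 0.0, 1.0, 0.0, 1.0]

# In a regression y = b0 + b1*d + error, b1 estimates the difference in group means.
```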
