Histograms and Cumulative Frequency

Author: Richard Melton
A Level Statistics

This is really a reminder from GCSE. Histograms are similar to bar charts apart from the consideration of areas. In a bar chart, all of the bars are the same width and the only thing that matters is the height of the bar. In a histogram, the area is the important thing.

Example

Draw a histogram for the following information.

Height (feet)    Frequency    Relative Frequency
0-2              0            0
2-4              1            1
4-5              4            8
5-6              8            16
6-8              2            2

(Ignore relative frequency for now.) It is difficult to draw a bar chart for this information, because the class widths for the height are not all the same. When drawing a histogram, the y-axis is labelled 'frequency density' or 'relative frequency'. You must work out the relative frequency before you can draw a histogram. To do this, first decide upon a standard width for the groups. Some of the heights are grouped into 2s (0-2, 2-4, 6-8) and some into 1s (4-5, 5-6). Most are 2s, so we shall call the standard width 2.

To make the areas match, we must double the frequencies of the classes which have a width of 1 (since 1 is half of 2). Therefore the frequencies for the 4-5 and the 5-6 classes must be doubled. If any of the classes had a width of 4 (for example, if there were an 8-12 group), their frequencies would be halved. This is because a bar of width 4 is twice the standard width of 2, so its area would be too large unless we halve the frequency.

Area of bar = frequency × standard width
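The scaling rule can be checked with a short Python sketch. It uses the class boundaries and frequencies from the example; the names `classes`, `STANDARD_WIDTH` and `bar_height` are made up for illustration.

```python
from fractions import Fraction

# (lower boundary, upper boundary, frequency) for each class in the example.
classes = [(0, 2, 0), (2, 4, 1), (4, 5, 4), (5, 6, 8), (6, 8, 2)]
STANDARD_WIDTH = 2  # the standard width chosen in the text


def bar_height(lower, upper, frequency, standard_width=STANDARD_WIDTH):
    """Scale the frequency so that bar area = frequency x standard width."""
    class_width = upper - lower
    return Fraction(frequency * standard_width, class_width)


heights = [bar_height(lo, hi, f) for lo, hi, f in classes]
# Area of each bar = height x class width.
areas = [h * (hi - lo) for h, (lo, hi, _) in zip(heights, classes)]

for (lo, hi, f), h in zip(classes, heights):
    print(f"{lo}-{hi}: frequency {f}, bar height {h}")
```

The heights agree with the "Relative Frequency" values in the table, and every area equals the frequency multiplied by the standard width.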

Cumulative Frequency

The cumulative frequency is the running total of the frequencies. On a graph, it can be represented by a cumulative frequency polygon, where straight lines join up the points, or a cumulative frequency curve.

Example

Height (cm)    Frequency    Cumulative Frequency
0 - 100        4            4
100 - 120      6            10 (= 4 + 6)
120 - 140      3            13 (= 4 + 6 + 3)
140 - 160      2            15 (= 4 + 6 + 3 + 2)
160 - 180      6            21
180 - 220      4            25

These data are used to draw a cumulative frequency polygon by plotting the cumulative frequencies against the upper class boundaries.
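As a quick check, the running totals and the points to plot can be produced in a few lines of Python. This is only a sketch; the variable names are illustrative.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the height example.
upper_boundaries = [100, 120, 140, 160, 180, 220]
frequencies = [4, 6, 3, 2, 6, 4]

# The cumulative frequency is the running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Points of the polygon: (upper class boundary, cumulative frequency).
points = list(zip(upper_boundaries, cumulative))
print(points)
```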

Averages

The Median Value

The median of a group of numbers is the number in the middle, when the numbers are in order of magnitude. For example, if the set of numbers is 4, 1, 6, 2, 6, 7, 8, the median is 6:

1, 2, 4, 6, 6, 7, 8 (6 is the middle value when the numbers are in order)

If you have n numbers in a group, the median is the (n + 1)/2 th value. For example, there are 7 numbers in the example above, so replace n by 7 and the median is the (7 + 1)/2 th value = 4th value. The 4th value is 6.

On a histogram, the median is the value at which a vertical line divides the total area of the histogram into two equal parts. An estimate of the median can be found using algebraic methods. However, an easier method would be to use the data to draw a cumulative frequency polygon and estimate the median using that.

Mean

There are three types of average: mean, mode and median. (The range is often listed alongside them, but it measures spread rather than an average.) The mean is what most people mean when they say "average". It is found by adding up all of the numbers you have to find the mean of, and dividing by the number of numbers. So the mean of 3, 5, 7, 3 and 5 is 23/5 = 4.6.

When you are given data which has been grouped, the mean is Σfx / Σf, where f is the frequency and x is the midpoint of the group (Σ means "the sum of").


Example

Work out an estimate for the mean height.

Height (cm)    Number of People (f)    Midpoint (x)    fx (f multiplied by x)
101-120        1                       110.5           110.5
121-130        3                       125.5           376.5
131-140        5                       135.5           677.5
141-150        7                       145.5           1018.5
151-160        4                       155.5           622
161-170        2                       165.5           331
171-190        1                       180.5           180.5

Σfx = 3316.5
Σf = 23
mean = 3316.5/23 = 144 cm (3 s.f.)
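The same working can be sketched in Python, taking the midpoint of each group as the average of the two class limits shown in the table. The names `groups`, `sum_fx` and `sum_f` are illustrative.

```python
# ((lower limit, upper limit), frequency) for each height group.
groups = [
    ((101, 120), 1),
    ((121, 130), 3),
    ((131, 140), 5),
    ((141, 150), 7),
    ((151, 160), 4),
    ((161, 170), 2),
    ((171, 190), 1),
]

sum_f = 0
sum_fx = 0.0
for (lower, upper), f in groups:
    x = (lower + upper) / 2  # midpoint of the group
    sum_f += f
    sum_fx += f * x

estimated_mean = sum_fx / sum_f
print(sum_fx, sum_f, round(estimated_mean, 1))
```

Remember that this is only an estimate of the mean, because every value in a group is treated as if it were at the midpoint.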

Mode The mode is the number in a set of numbers which occurs the most. So the modal value of 5, 6, 3, 4, 5, 2, 5 and 3 is 5, because there are more 5s than any other number. On a histogram, the modal class is the class with the largest frequency density.

Range The range is the largest number in a set minus the smallest number. So the range of 5, 7, 9 and 14 is (14 - 5) = 9.


Measures of Dispersion Measures of dispersion measure how spread out a set of data is.

Variance and Standard Deviation

The formulae for the variance and standard deviation are given below. µ means the mean of the data.

$\text{Variance} = \sigma^2 = \frac{\sum (x_r - \mu)^2}{n}$

The standard deviation, σ, is the square root of the variance.

What the formula means:
(1) $x_r - \mu$ means take each value in turn and subtract the mean from each value.
(2) $(x_r - \mu)^2$ means square each of the results obtained from step (1). This is to get rid of any minus signs.
(3) $\sum (x_r - \mu)^2$ means add up all of the results obtained from step (2).
(4) Divide step (3) by n, which is the number of numbers.
(5) For the standard deviation, square root the answer to step (4).

Example

Find the variance and standard deviation of the following numbers: 1, 3, 5, 5, 6, 7, 9, 10.

The mean = 46/8 = 5.75
(Step 1): (1 - 5.75), (3 - 5.75), (5 - 5.75), (5 - 5.75), (6 - 5.75), (7 - 5.75), (9 - 5.75), (10 - 5.75) = -4.75, -2.75, -0.75, -0.75, 0.25, 1.25, 3.25, 4.25
(Step 2): 22.5625, 7.5625, 0.5625, 0.5625, 0.0625, 1.5625, 10.5625, 18.0625
(Step 3): 22.5625 + 7.5625 + 0.5625 + 0.5625 + 0.0625 + 1.5625 + 10.5625 + 18.0625 = 61.5
(Step 4): n = 8, therefore variance = 61.5/8 = 7.69 (3 s.f.)
(Step 5): standard deviation = 2.77 (3 s.f.)
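The five steps translate directly into Python. The sketch below follows them one at a time and then compares the result with `statistics.pvariance` from the standard library, which also divides by n.

```python
import math
import statistics

data = [1, 3, 5, 5, 6, 7, 9, 10]
n = len(data)
mu = sum(data) / n

deviations = [x - mu for x in data]       # step 1: subtract the mean
squared = [d ** 2 for d in deviations]    # step 2: square each result
total = sum(squared)                      # step 3: add them up
variance = total / n                      # step 4: divide by n
standard_deviation = math.sqrt(variance)  # step 5: square root

# Cross-check against the standard library's population variance.
assert math.isclose(variance, statistics.pvariance(data))
print(round(variance, 2), round(standard_deviation, 2))
```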


Adding or Multiplying Data by a Constant

If a constant, k, is added to each number in a set of data, the mean will be increased by k and the standard deviation will be unaltered (since the spread of the data will be unchanged).

If each number is multiplied by the constant k, the mean will be multiplied by k and the standard deviation will be multiplied by |k|. For a positive constant k, therefore, the mean and standard deviation are both multiplied by k.

Grouped Data

There are many ways of writing the formula for the standard deviation. The one above is for a basic list of numbers. The formula for the variance when the data is grouped is as follows, where f is the frequency of the value x:

$\text{variance} = \frac{\sum f x^2}{\sum f} - \left( \frac{\sum f x}{\sum f} \right)^2$

The standard deviation can be found by taking the square root of this value.

Example

The table shows marks (out of 10) obtained by 20 people in a test.

Mark (x)    Frequency (f)
1           0
2           1
3           1
4           3
5           2
6           5
7           5
8           2
9           0
10          1

Work out the variance of this data. In such questions, it is often easiest to set your working out in a table:

Mark (x)    f    fx    fx²
1           0    0     0
2           1    2     4
3           1    3     9
4           3    12    48
5           2    10    50
6           5    30    180
7           5    35    245
8           2    16    128
9           0    0     0
10          1    10    100

Σf = 20
Σfx = 118
Σfx² = 764

$\text{variance} = \frac{\sum f x^2}{\sum f} - \frac{\left( \sum f x \right)^2}{\left( \sum f \right)^2} = \frac{764}{20} - \frac{118^2}{20^2} = 38.2 - 34.81 = 3.39$
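The table of working can be reproduced with a short Python sketch. The dictionary `marks` (mark mapped to frequency) and the other names are illustrative only.

```python
import math

# Mark (x) -> frequency (f), taken from the example.
marks = {1: 0, 2: 1, 3: 1, 4: 3, 5: 2, 6: 5, 7: 5, 8: 2, 9: 0, 10: 1}

sum_f = sum(marks.values())
sum_fx = sum(f * x for x, f in marks.items())
sum_fx2 = sum(f * x ** 2 for x, f in marks.items())

# Grouped-data formula: sum(fx^2)/sum(f) - (sum(fx)/sum(f))^2
variance = sum_fx2 / sum_f - (sum_fx / sum_f) ** 2
standard_deviation = math.sqrt(variance)
print(sum_f, sum_fx, sum_fx2, round(variance, 2))
```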


Quartiles

If we divide a cumulative frequency curve into quarters, the value at the lower quarter is referred to as the lower quartile, the value at the middle gives the median and the value at the upper quarter is the upper quartile.

A set of numbers may be as follows: 8, 14, 15, 16, 17, 18, 19, 50. The mean of these numbers is 19.625. However, the extremes in this set (8 and 50) distort the range. The inter-quartile range is a method of measuring the spread of the numbers by finding the middle 50% of the values. It is useful since it ignores the extreme values.

The lower quartile is the (n+1)/4 th value (n is the total cumulative frequency, i.e. 157 in this case) and the upper quartile is the 3(n+1)/4 th value. The difference between these two is the inter-quartile range (IQR).

In the above example, the upper quartile is the 118.5th value and the lower quartile is the 39.5th value. If we draw a cumulative frequency curve, we see that the lower quartile, therefore, is about 17 and the upper quartile is about 37. Therefore the IQR is 20 (bear in mind that this is a rough sketch; if you plot the values on graph paper you will get a more accurate value).
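The positions quoted above come straight from the (n+1)/4 and 3(n+1)/4 rules. A minimal sketch, with the function name `quartile_positions` made up for illustration:

```python
def quartile_positions(n):
    """Return the positions of the lower quartile, median and upper quartile."""
    lower = (n + 1) / 4
    median = (n + 1) / 2
    upper = 3 * (n + 1) / 4
    return lower, median, upper


lower_pos, median_pos, upper_pos = quartile_positions(157)
print(lower_pos, median_pos, upper_pos)
```

The quartile values themselves (about 17 and 37 in the text) still have to be read from the cumulative frequency curve at these positions.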


Box and Whisker Diagrams

Given some data, we can draw a box and whisker diagram (or box plot) to show the spread of the data. The diagram shows the quartiles of the data, using these as an indication of the spread. The diagram is made up of a "box", which lies between the upper and lower quartiles. The median can also be indicated by dividing the box into two. The "whiskers" are straight lines extending from the ends of the box to the maximum and minimum values.

Outliers When collecting data, often a result is collected which seems "wrong". In other words, it is much higher or much lower than all of the other values. Such points are known as "outliers". On a box and whisker diagram, outliers should be excluded from the whisker portion of the diagram. Instead, plot them individually, labelling them as outliers.

Skewness

If the whisker to the right of the box is longer than the one to the left, there are more extreme values towards the positive end and so the distribution is positively skewed. Similarly, if the whisker to the left is longer, the distribution is negatively skewed.


Probability

See also the probability section in the GCSE section for topics such as probability trees.

The probability of an event occurring is the chance or likelihood of it occurring. The probability of an event A, written P(A), can be between zero and one, with P(A) = 1 indicating that the event will certainly happen and with P(A) = 0 indicating that event A will certainly not happen.

When all of the possible outcomes are equally likely:

Probability = (the number of successful outcomes of an experiment) / (the number of possible outcomes)

So, for example, if a fair coin were tossed, the probability of obtaining a head = ½, since there are 2 possible outcomes (heads or tails) and 1 of these is the 'successful' outcome.

Using Set Notation

Probability can be studied in conjunction with set theory, with Venn Diagrams being particularly useful in analysis. The probability of a certain event occurring, for example, can be represented by P(A). The probability of a different event occurring can be written P(B). For two events A and B,

P(A) + P(B) - P(A∩B) = P(A∪B)

P(A∩B) represents the probability of A AND B occurring. P(A∪B) represents the probability of A OR B occurring. This can be shown on a Venn diagram. The rectangle represents the sample space, which is all of the possible outcomes of the experiment. The circle labelled as A represents event A. In other words, all of the points within A represent possible ways of achieving the outcome of A. Similarly for B.


So in the diagram, P(A) + P(B) is the whole of A (the whole circle) + the whole of B (so we have counted the middle bit twice). Subtracting P(A∩B) removes the part that was counted twice.

A' is the complement of A and means everything not in A. So P(A') is the probability that A does not occur. Note that the probability that A occurs + the probability that A does not occur = 1 (one or the other must happen). So P(A) + P(A') = 1. Thus:

P(A') = 1 - P(A)

Mutually Exclusive Events

Events A and B are mutually exclusive if they have no outcomes in common. In other words, if A occurs B cannot occur and vice-versa. On a Venn Diagram, this would mean that the circles representing events A and B would not overlap.

If, for example, we are asked to pick a card from a pack of 52, the probability that the card is red is ½. The probability that the card is a club is ¼. However, if the card is red it can't be a club. These events are therefore mutually exclusive.

If two events are mutually exclusive, P(A∩B) = 0, so

P(A) + P(B) = P(A∪B)

Independent Events

Two events are independent if the occurrence of one does not influence the probability of the other. For example, if a bag contains 2 blue balls and 2 red balls and two balls are selected randomly, the events are:
a) independent if the first ball is replaced after being selected
b) not independent if the first ball is removed without being replaced. In this instance, there are only three balls remaining in the bag so the probabilities of selecting the various colours have changed.

Two events are independent if (and only if):

P(A∩B) = P(A)P(B)

This is known as the multiplication law.

Conditional Probability

Conditional probability is the probability of an event occurring, given that another event has occurred. For example, the probability of John doing mathematics at A-Level, given that he is doing physics, may be quite high. P(A|B) means the probability of A occurring, given that B has occurred. For two events A and B,

P(A∩B) = P(A|B)P(B)

and similarly P(A∩B) = P(B|A)P(A). If two events are mutually exclusive, then P(A|B) = 0 .

Independence

Using the above condition for independence, we deduce that if two events are independent, then: P(A)P(B) = P(A|B)P(B) = P(B|A)P(A), or: P(A) = P(A|B) and P(B) = P(B|A)

Example

A six-sided die is thrown. What is the probability that the number thrown is prime, given that it is odd?

The probability of obtaining an odd number is 3/6 = ½. Of these odd numbers, 2 of them are prime (3 and 5).

P(prime | odd) = P(prime and odd) / P(odd) = (2/6) / (3/6) = 2/3
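The die example can be checked by listing the six equally likely outcomes and counting. The sketch below uses exact fractions; the helper `prob` and the other names are illustrative.

```python
from fractions import Fraction

outcomes = range(1, 7)  # faces of a fair six-sided die
primes = {2, 3, 5}


def prob(event):
    """Probability of an event, assuming all outcomes are equally likely."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


p_odd = prob(lambda o: o % 2 == 1)
p_prime_and_odd = prob(lambda o: o in primes and o % 2 == 1)

# Rearranging P(A and B) = P(A|B)P(B) gives P(A|B) = P(A and B) / P(B).
p_prime_given_odd = p_prime_and_odd / p_odd
print(p_prime_given_odd)
```

Note that "prime" and "odd" are not independent here, since P(prime) = 1/2 but P(prime | odd) = 2/3.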


Skewness A normal distribution is a bell-shaped distribution of data where the mean, median and mode all coincide. A frequency curve showing a normal distribution would look like this:

In a normal distribution, approximately 68% of the values lie within one standard deviation of the mean and approximately 95% of the data lies within two standard deviations of the mean. If there are extreme values towards the positive end of a distribution, the distribution is said to be positively skewed. In a positively skewed distribution, the mean is greater than the mode. For example:


A negatively skewed distribution, on the other hand, has a mean which is less than the mode because of the presence of extreme values at the negative end of the distribution.

There are a number of ways of measuring skewness:

$\text{Pearson's coefficient of skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} \approx \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$

$\text{Quartile measure of skewness} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$
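These measures are simple to evaluate once the summary statistics are known. The following sketch defines one function per formula; the function names and the numbers passed in are invented purely to show the arithmetic.

```python
def pearson_skew_mode(mean, mode, sd):
    """(mean - mode) / standard deviation."""
    return (mean - mode) / sd


def pearson_skew_median(mean, median, sd):
    """3(mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


def quartile_skew(q1, q2, q3):
    """(Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


# Hypothetical summary statistics, chosen only for illustration.
print(pearson_skew_mode(mean=10, mode=7, sd=2))
print(pearson_skew_median(mean=10, median=9, sd=2))
print(quartile_skew(q1=2, q2=4, q3=10))
```

A positive result indicates positive skew and a negative result indicates negative skew. The quartile measure is zero when the median is exactly halfway between the quartiles.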

Linear Regression

Scatter Diagrams

We often wish to look at the relationship between two things (e.g. between a person's height and weight) by comparing data for each of these things. A good way of doing this is by drawing a scatter diagram.

"Regression" is the process of finding the function satisfied by the points on the scatter diagram. Of course, the points might not fit the function exactly but the aim is to get as close as possible. "Linear" means that the function we are looking for is a straight line (so our function f will be of the form f(x) = mx + c for constants m and c). Here is a scatter diagram with a regression line drawn in:

Correlation Correlation is a term used to describe how strong the relationship between the two variables appears to be. We say that there is a positive linear correlation if y increases as x increases and we say there is a negative linear correlation if y decreases as x increases. There is no correlation if x and y do not appear to be related. Explanatory and Response Variables In many experiments, one of the variables is fixed or controlled and the point of the experiment is to determine how the other variable varies with the first. The fixed/controlled variable is known as the explanatory or independent variable and the other variable is known as the response or dependent variable. I shall use "x" for my explanatory variable and "y" for my response variable, but I could have used any letters.


Regression Lines

By Eye

If there is very little scatter (we say there is a strong correlation between the variables), a regression line can be drawn "by eye". You should make sure that your line passes through the mean point (the point (x, y) where x is the mean of the data collected for the explanatory variable and y is the mean of the data collected for the response variable).

Two Regression Lines

When there is a reasonable amount of scatter, we can draw two different regression lines depending upon which variable we consider to be the most accurate. The first is a line of regression of y on x, which can be used to estimate y given x. The other is a line of regression of x on y, used to estimate x given y.

If there is a perfect correlation between the data (in other words, if all the points lie on a straight line), then the two regression lines will be the same.

Least Squares Regression Lines

This is a method of finding a regression line without estimating where the line should go by eye. If the equation of the regression line is y = a + bx, we need to find what a and b are. We find these by solving the "normal equations".

Normal Equations

The "normal equations" for the line of regression of y on x are:

$\sum y = na + b \sum x$ and $\sum xy = a \sum x + b \sum x^2$

The values of a and b are found by solving these equations simultaneously. For the line of regression of x on y, the "normal equations" are the same but with x and y swapped.
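The two normal equations can be solved simultaneously by elimination. Here is a minimal sketch for the line of regression of y on x, written as y = a + bx so that a is the intercept and b the gradient. The function name and the sample data are assumptions made for illustration.

```python
def regression_y_on_x(xs, ys):
    """Solve  sum(y)  = n*a + b*sum(x)
              sum(xy) = a*sum(x) + b*sum(x^2)   for a and b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal, so no unique line exists")

    b = (n * sum_xy - sum_x * sum_y) / denominator  # gradient
    a = (sum_y - b * sum_x) / n                     # intercept
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
a, b = regression_y_on_x([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Because a = (Σy - bΣx)/n, the fitted line always passes through the mean point, which is why a line drawn by eye should pass through it too.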


The Product Moment Correlation Coefficient

The product moment correlation coefficient is a measurement of the degree of scatter. It is usually denoted by r and r can be any value between -1 and 1. It is defined as follows:

$r = \frac{s_{xy}}{s_x s_y}$

where $s_{xy}$ is the covariance of x and y, and $s_x$ and $s_y$ are the standard deviations of x and y.

Correlation The product moment correlation coefficient (pmcc) can be used to tell us how strong the correlation between two variables is. A positive value indicates a positive correlation and the higher the value, the stronger the correlation. Similarly, a negative value indicates a negative correlation and the lower the value the stronger the correlation. If there is a perfect positive correlation (in other words the points all lie on a straight line that goes up from left to right), then r = 1. If there is a perfect negative correlation, then r = -1. If there is no correlation, then r = 0. r would also be equal to zero if the variables were related in a non-linear way (they might lie on a quadratic curve rather than a straight line, for example).
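To see these cases numerically, r can be computed as the covariance divided by the product of the standard deviations. The sketch below assumes the "divide by n" forms of the covariance and standard deviation (the factor cancels in r either way); the function name `pmcc` and the data sets are made up.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = s_xy / (s_x * s_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return s_xy / (s_x * s_y)


print(pmcc([1, 2, 3, 4], [3, 5, 7, 9]))          # points on a rising line
print(pmcc([1, 2, 3, 4], [9, 7, 5, 3]))          # points on a falling line
print(pmcc([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # points on y = x^2
```

The last data set lies exactly on a quadratic curve, yet r is zero. This illustrates that r only measures linear correlation.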


Discrete Random Variables

A probability distribution is a table of values showing the probabilities of various outcomes of an experiment. For example, if a coin is tossed three times, the number of heads obtained can be 0, 1, 2 or 3. The probabilities of each of these possibilities can be tabulated as shown:

Number of Heads    0      1      2      3
Probability        1/8    3/8    3/8    1/8

A discrete variable is a variable which can only take a countable number of values. In this example, the number of heads can only take 4 values (0, 1, 2, 3) and so the variable is discrete. The variable is said to be random if the sum of the probabilities is one.
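The table can be reproduced by listing all eight equally likely sequences of three tosses and counting the heads in each. A small sketch, with illustrative names:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely sequences of three tosses, e.g. ('H', 'T', 'H').
sequences = list(product("HT", repeat=3))

# Count how many sequences give each number of heads.
counts = Counter(seq.count("H") for seq in sequences)

distribution = {
    heads: Fraction(count, len(sequences))
    for heads, count in sorted(counts.items())
}

for heads, probability in distribution.items():
    print(heads, probability)
```

The probabilities add up to one, as they must for a random variable.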

Probability Density Function

The probability density function (p.d.f.) of X (or probability mass function) is a function which allocates probabilities. Put simply, it is a function which tells you the probability of certain events occurring. The usual notation that is used is P(X = x) = something. The random variable (r.v.) X is the event that we are considering. So in the above example, X represents the number of heads that we throw. So P(X = 0) means "the probability that no heads are thrown". Here, P(X = 0) = 1/8 (the probability that we throw no heads is 1/8). In the above example, we could therefore have written:

x           0      1      2      3
P(X = x)    1/8    3/8    3/8    1/8

Quite often, the probability density function will be given to you in terms of x. In the above example, $P(X = x) = {}^{3}C_{x} / 2^{3}$ (see permutations and combinations for the meaning of $^{3}C_{x}$).

Example

A die is thrown repeatedly until a 6 is obtained. Find the probability density function for the number of times we throw the die.


Let X be the random variable representing the number of times we throw the die.

P(X = 1) = 1/6 (if we only throw the die once, we get a 6 on our first throw. The probability of this is 1/6).

P(X = 2) = (5/6) × (1/6) (if we throw the die twice before getting a 6, we must throw something that isn't a 6 with our first throw, the probability of which is 5/6, and we must throw a 6 on our second throw, the probability of which is 1/6), etc.

In general, $P(X = x) = (5/6)^{x-1} \times (1/6)$

Cumulative Distribution Function

The cumulative distribution function (c.d.f.) of a discrete random variable X is the function F(t) which tells you the probability that X is less than or equal to t. So if X has p.d.f. P(X = x), we have:

$F(t) = P(X \le t) = \sum_{x \le t} P(X = x)$

In other words, for each value that X can be which is less than or equal to t, work out the probability that X is that value and add up all such results.

Example

In the above example where the die is thrown repeatedly, let's work out P(X ≤ t) for some values of t.

P(X ≤ 1) is the probability that the number of throws until we get a 6 is less than or equal to 1. So it is either 0 or 1. P(X = 0) = 0 and P(X = 1) = 1/6. Hence P(X ≤ 1) = 1/6.

Similarly, P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0 + 1/6 + 5/36 = 11/36
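Both the p.d.f. and the c.d.f. of the die example are easy to evaluate exactly. In the sketch below the names `pmf` and `cdf` are illustrative, and the c.d.f. is simply the sum of the p.d.f. over the whole-number values up to t.

```python
from fractions import Fraction

P_SIX = Fraction(1, 6)  # probability of throwing a 6 on any one throw


def pmf(x):
    """P(X = x): the first 6 appears on throw number x."""
    if x < 1:
        return Fraction(0)
    return (1 - P_SIX) ** (x - 1) * P_SIX


def cdf(t):
    """P(X <= t): add up P(X = x) for x = 1, 2, ..., t."""
    return sum((pmf(x) for x in range(1, t + 1)), Fraction(0))


print(pmf(1), pmf(2), cdf(1), cdf(2))
```

A useful check is that P(X ≤ t) is the same as 1 minus the probability that none of the first t throws is a 6, which is 1 - (5/6)^t.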


Expectation

The expected value (or mean) of X, where X is a discrete random variable, is a weighted average of the possible values that X can take, each value being weighted according to the probability of that event occurring. The expected value of X is usually written as E(X) or µ.

$E(X) = \sum x \, P(X = x)$

So the expected value is the sum of: [(each of the possible outcomes) × (the probability of the outcome occurring)]. In more concrete terms, the expectation is what you would expect the outcome of an experiment to be on average. Example What is the expected value when we roll a fair die? There are six possible outcomes: 1, 2, 3, 4, 5, 6. Each of these has a probability of 1/6 of occurring. Let X represent the outcome of the experiment. Therefore P(X = 1) = 1/6 (this means that the probability that the outcome of the experiment is 1 is 1/6) P(X = 2) = 1/6 (the probability that you throw a 2 is 1/6) P(X = 3) = 1/6 (the probability that you throw a 3 is 1/6) P(X = 4) = 1/6 (the probability that you throw a 4 is 1/6) P(X = 5) = 1/6 (the probability that you throw a 5 is 1/6) P(X = 6) = 1/6 (the probability that you throw a 6 is 1/6) E(X) = 1×P(X = 1) + 2×P(X = 2) + 3×P(X = 3) + 4×P(X=4) + 5×P(X=5) + 6×P(X=6) Therefore E(X) = 1/6 + 2/6 + 3/6 + 4/6 + 5/6 + 6/6 = 7/2 So the expectation is 3.5 . If you think about it, 3.5 is halfway between the possible values the die can take and so this is what you should have expected.

Expected Value of a Function of X

To find E[ f(X) ], where f(X) is a function of X, use the following formula:

$E[f(X)] = \sum f(x) \, P(X = x)$

Example

For the above experiment (with the die), calculate $E(X^2)$.

Using our notation above, $f(x) = x^2$

f(1) = 1, f(2) = 4, f(3) = 9, f(4) = 16, f(5) = 25, f(6) = 36

P(X = 1) = 1/6, P(X = 2) = 1/6, etc.

So $E(X^2)$ = 1/6 + 4/6 + 9/6 + 16/6 + 25/6 + 36/6 = 91/6 = 15.167

The expected value of a constant is just the constant, so for example E(1) = 1. Multiplying a random variable by a constant multiplies the expected value by that constant, so E[2X] = 2E[X].

A useful formula, where a and b are constants, is:

E[aX + b] = aE[X] + b

[This says that expectation is a linear operator].
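The die calculations can be verified with exact fractions. In this sketch the distribution is stored as a dictionary mapping each value to its probability, and `expectation` is an illustrative helper that accepts an optional function f.

```python
from fractions import Fraction

# Fair die: each of the values 1..6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(distribution, f=lambda x: x):
    """E[f(X)] = sum of f(x) * P(X = x) over all possible values x."""
    return sum(f(x) * p for x, p in distribution.items())


e_x = expectation(die)                       # E(X)
e_x2 = expectation(die, lambda x: x ** 2)    # E(X^2)
e_lin = expectation(die, lambda x: 2 * x + 3)  # E[2X + 3]

print(e_x, e_x2, e_lin)
```

Here E[2X + 3] comes out as 2E[X] + 3, which is an instance of the linearity formula with a = 2 and b = 3.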

Variance

The variance of a random variable tells us something about the spread of the possible values of the variable. For a discrete random variable X, the variance of X is written as Var(X).

$\text{Var}(X) = E[(X - \mu)^2]$

where µ is the expected value E(X).

This can also be written as:

$\text{Var}(X) = E(X^2) - \mu^2$

The standard deviation of X is the square root of Var(X). Note that the variance does not behave in the same way as expectation when we multiply and add constants to random variables. In fact:

$\text{Var}[aX + b] = a^2 \text{Var}(X)$

This is because:

$\text{Var}[aX + b] = E[(aX + b)^2] - (E[aX + b])^2$
$= E[a^2 X^2 + 2abX + b^2] - (aE(X) + b)^2$
$= a^2 E(X^2) + 2abE(X) + b^2 - a^2 E^2(X) - 2abE(X) - b^2$
$= a^2 E(X^2) - a^2 E^2(X)$
$= a^2 \text{Var}(X)$
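This result can be checked numerically for a fair die. The sketch below computes the variance as E(X²) minus the square of E(X), then transforms each value to aX + b; the constants a = 3 and b = 2 are arbitrary choices for illustration.

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(distribution, f=lambda x: x):
    return sum(f(x) * p for x, p in distribution.items())


def variance(distribution):
    """Var(X) = E(X^2) - (E(X))^2."""
    mean = expectation(distribution)
    return expectation(distribution, lambda x: x ** 2) - mean ** 2


a, b = 3, 2  # arbitrary constants
# Distribution of aX + b: same probabilities, transformed values.
transformed = {a * x + b: p for x, p in die.items()}

var_x = variance(die)
var_transformed = variance(transformed)
print(var_x, var_transformed)
```

Adding b shifts every value by the same amount and so has no effect on the spread, while multiplying by a scales the variance by a².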

The Discrete Uniform Distribution

There are a number of important types of discrete random variables. The simplest is the uniform distribution. A random variable with p.d.f. given by:

P(X = x) = 1/(k+1) for all values of x = 0, ..., k
P(X = x) = 0 for other values of x

where k is a constant, is said to follow a uniform distribution.

Example

Suppose we throw a die. Let X be the random variable denoting what number is thrown. P(X = 1) = 1/6, P(X = 2) = 1/6, etc. In fact, P(X = x) = 1/6 for all x between 1 and 6. Hence we have a uniform distribution (here the six equally likely values are 1 to 6 rather than 0 to 5).

Expectation and Variance

We can find the expectation and variance of the discrete uniform distribution. Suppose P(X = x) = 1/(k+1) for all values of x = 0, ..., k. The x = 0 term contributes nothing, so:

E(X) = 1·P(X = 1) + 2·P(X = 2) + ... + k·P(X = k)
= 1/(k+1) + 2/(k+1) + 3/(k+1) + ... + k/(k+1)
= (1/(k+1))(1 + 2 + ... + k)
= (1/(k+1)) × ½k[2 + (k - 1)] (summing the arithmetic progression)
= ½k


It turns out that the variance is:

$\text{Var}(X) = \frac{k(k+2)}{12}$
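Both results can be confirmed by direct calculation for particular values of k. This sketch builds the distribution on 0, 1, ..., k and compares the computed mean and variance with ½k and k(k+2)/12; the function name is illustrative.

```python
from fractions import Fraction


def uniform_mean_and_variance(k):
    """Mean and variance of X when P(X = x) = 1/(k+1) for x = 0, ..., k."""
    p = Fraction(1, k + 1)
    mean = sum(x * p for x in range(k + 1))
    mean_of_squares = sum(x ** 2 * p for x in range(k + 1))
    return mean, mean_of_squares - mean ** 2


for k in range(1, 8):
    mean, var = uniform_mean_and_variance(k)
    assert mean == Fraction(k, 2)
    assert var == Fraction(k * (k + 2), 12)

print(uniform_mean_and_variance(6))
```

This is a check for a handful of values of k, not a proof of the general formula.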

The Normal Distribution

A continuous random variable X follows a normal distribution if it has the following probability density function (p.d.f.):

$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$

The parameters of the distribution are µ and σ², where µ is the mean (expectation) of the distribution and σ² is the variance. We write X ~ N(µ, σ²) to mean that the random variable X has a normal distribution with parameters µ and σ². The normal distribution is symmetrical about its mean.

The Standard Normal Distribution If Z ~ N(0, 1), then Z is said to follow a standard normal distribution. P(Z < z) is known as the cumulative distribution function of the random variable Z. For the standard normal distribution, this is usually denoted by Φ(z).


Normally, you would work out the c.d.f. by doing some integration. However, the integral for the normal distribution cannot be written in terms of elementary functions, and so results have to be looked up in statistical tables.

Example

Find P(-1.96 < Z < 1.96), where Z ~ N(0, 1).

This is equal to P(Z < 1.96) - P(Z < -1.96)
= Φ(1.96) - Φ(-1.96)
= Φ(1.96) - (1 - Φ(1.96))
= 2Φ(1.96) - 1

Now we need to look in a table to find out what Φ(1.96) is. If you look in statistical tables for the standard normal distribution, you should be able to find a line:

z      0       1       2       3       4       5       6       7       8       9
1.9    .9713   .9719   .9726   .9732   .9738   .9744   .9750   .9756   .9761   .9767

To find Φ(1.96), read across the 1.9 line until you get to 6 (for 1.96). So Φ(1.96) = 0.975. Hence P(-1.96 < Z < 1.96) = 2 × 0.975 - 1 = 0.95. This result says that the central 95% of the distribution lies between -1.96 and 1.96.

Some Identities

Φ(-z) = P(Z < -z) = 1 - Φ(z)
P(Z > z) = 1 - Φ(z)
P(a < Z < b) = Φ(b) - Φ(a)
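If tables are not to hand, Φ(z) can be evaluated numerically. The sketch below uses the error function `math.erf` from Python's standard library, through the standard relationship Φ(z) = ½[1 + erf(z/√2)]. This is an alternative to the table look-up described in the text, and the function name `phi` is illustrative.

```python
import math


def phi(z):
    """Standard normal c.d.f., P(Z < z), computed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


central = phi(1.96) - phi(-1.96)  # P(-1.96 < Z < 1.96)

print(round(phi(1.96), 4))
print(round(central, 2))

# The identity phi(-z) = 1 - phi(z) holds for any z, e.g. z = 1.3.
assert math.isclose(phi(-1.3), 1 - phi(1.3))
```

The printed values agree with the table entry for 1.96 and with the 95% result above, to the number of decimal places shown.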

Standardising

Now, the mean and variance of the normal distribution can be any value and so clearly there can't be a statistical table for each one. Instead, we convert to the standard normal distribution, so that the statistical tables for the standard normal distribution can be used to find the c.d.f. of any normal distribution. We use the following trick:

If X ~ N(µ, σ²), then put:

$Z = \frac{X - \mu}{\sigma}$

It turns out that Z ~ N(0, 1). Note that it is σ and not σ² on the denominator!

Example

If X ~ N(4, 9), find P(X