Cumulative Frequency

Constructing The Cumulative Frequency Table

• Cumulative frequency, as the name suggests, is the frequency accumulated up to a particular point. Let us look at an example for discrete data.

• Example: Students in a music class were asked how many musical instruments each of them could play. The results are:

3 1 3 3 1
4 1 3 1 1
5 2 4 2 6
3 2 4 3 1
3 1 5 5 7

Class    Frequency    Cumulative frequency
0-1      7            7
2-3      10           17
4-5      6            23
6-7      2            25

The cumulative frequency in the last class (row) is the total number of observations. The cumulative frequency for class 2-3, for example, is the total frequency accumulated up to 3 instruments. Thus, the cumulative frequency for class 2-3 is 7 + 10 = 17 (the frequency for 0-1 plus the frequency for 2-3).
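The table for the music class can also be built programmatically. The following is a minimal sketch in Python; the variable names (`data`, `classes`, `frequencies`, `cumulative`) are chosen here for illustration and are not part of the original example.

```python
from collections import Counter
from itertools import accumulate

# Raw data from the music class example (instruments played per student).
data = [3, 1, 3, 3, 1,
        4, 1, 3, 1, 1,
        5, 2, 4, 2, 6,
        3, 2, 4, 3, 1,
        3, 1, 5, 5, 7]

# Classes as inclusive (low, high) pairs, as in the table.
classes = [(0, 1), (2, 3), (4, 5), (6, 7)]

counts = Counter(data)

# Frequency of a class = number of observations falling inside it.
frequencies = [sum(counts[v] for v in range(low, high + 1))
               for low, high in classes]

# Cumulative frequency = running total of the class frequencies.
cumulative = list(accumulate(frequencies))

for (low, high), f, cf in zip(classes, frequencies, cumulative):
    print(f"{low}-{high}\t{f}\t{cf}")
```

The last entry of `cumulative` equals `len(data)`, which is a quick check that no observation was left out of the classes.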

• Let us now look at an example for continuous data. A group of students took part in an experiment to measure their reaction times in seconds. The results are given below:

Class     Frequency    Cumulative frequency
[0,2)     1            1
[2,4)     2            3
[4,6)     4            7
[6,8)     4            11
[8,10)    1            12

• These cumulative frequencies can be used to produce a cumulative frequency curve, or ogive.

i. We start the curve at the lower class boundary of the first class. In our case this is 0, and the cumulative frequency at this point is set to 0.
ii. The curve ends at the upper class boundary of the last class. In our case this is 10.
iii. After the starting point, the next point is (2, 1); then we plot (4, 3), (6, 7), (8, 11) and (10, 12).
iv. Thus, every point after the starting point is (upper class boundary, cumulative frequency of that class).
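The plotting rule above can be sketched in a few lines of Python. The function name `ogive_points` and the list layout (one lower boundary followed by every upper boundary) are assumptions made for this illustration.

```python
from itertools import accumulate

# Reaction-time classes: lower boundary of the first class,
# followed by the upper boundary of every class.
boundaries = [0, 2, 4, 6, 8, 10]
frequencies = [1, 2, 4, 4, 1]


def ogive_points(boundaries, frequencies):
    """Return the (x, cumulative frequency) points of the ogive."""
    cumulative = list(accumulate(frequencies))
    # Start at (lowest boundary, 0), then pair each upper boundary
    # with the cumulative frequency of its class.
    return [(boundaries[0], 0)] + list(zip(boundaries[1:], cumulative))


points = ogive_points(boundaries, frequencies)
print(points)
```

Passing the points to any plotting tool and joining them gives the cumulative frequency curve; the same function can be used to check your points in the Exploration that follows.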

Exploration

• Complete the cumulative frequency table below:

Class      Frequency    Cumulative frequency
[3,5)      2            2
[5,7)      3
[7,9)      5
[9,11)     4
[11,13)    1

• Draw a cumulative frequency curve using the above information. Hint: the first point on your curve is (3, 0) and the end point is (13, 15).

Percentiles

• The 60th percentile is a value such that 60% of the distribution lies below it. For example, if your score on an SAT test is at the 95th percentile, then your score is better than that of 95% of all students who took the same test. However, you did not necessarily receive 95% on the test.

• The 25th percentile is also called the lower or first quartile (Q1). The 50th percentile and 75th percentile are called the second quartile (Q2) and third quartile (Q3). The third quartile is also known as the upper quartile.

• The second quartile is also known as the median.

• The difference between the third and first quartiles is known as the inter-quartile range. The inter-quartile range is a measure of the spread of the data. Using the definition of percentile above, half of the observations lie between the first and third quartiles.

• Example 1: Below are the monthly rents of a particular size of house in Sheffield, England.

5600 6800 4500 5900 7000 4900 5200 6100 9000

i. Let us order these data: 4500 4900 5200 5600 5900 6100 6800 7000 9000
ii. The median is in the "middle" of the ordered data. With nine values, this is the fifth value, 5900.
iii. The first quartile is the median of the four values below the median: (4900 + 5200)/2 = 5050.
iv. The third quartile is the median of the four values above the median: (6800 + 7000)/2 = 6900.
v. Inter-quartile range = 6900 - 5050 = 1850.
vi. Note that an extreme minimum like 200 or an extreme maximum like 9000000 would not affect the median or the inter-quartile range of a data set like the one above.
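The quartile calculation in Example 1 can be checked with a short Python sketch. It assumes the "median of each half" rule used above, where the median itself is left out of both halves when the number of values is odd; other textbooks and software use slightly different conventions. The function name `quartiles` is hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Assumption: for an odd number of values the median is excluded
    from both the lower and the upper half, as in Example 1.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]        # values below the median
    upper = ordered[n - half:]    # values above the median
    return median(lower), median(ordered), median(upper)


rents = [5600, 6800, 4500, 5900, 7000, 4900, 5200, 6100, 9000]
q1, q2, q3 = quartiles(rents)
print(q1, q2, q3, q3 - q1)

# Replace the minimum and maximum with extreme values:
# the quartiles, and hence the inter-quartile range, stay the same.
extreme = sorted(rents)
extreme[0], extreme[-1] = 200, 9000000
print(quartiles(extreme) == quartiles(rents))
```

For the rents, this reproduces the quartiles 5050, 5900 and 6900 and the inter-quartile range 1850 found by hand.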

• Example 2: 30 30 50 75 85 90 110 150 190 205 255

• The information about minimum, maximum and quartiles is commonly represented as a "box and whisker plot." A box and whisker plot looks like this:

A box and whisker plot helps us to visualize where half of the data (between Q1 and Q3) are distributed.

Use of GDC.

i. Press [STAT], select 1: EDIT and press [ENTER]. Enter your data into the column under L1. For illustration I will use the data from Example 2 above.
ii. Once finished, press [2nd][MODE] to quit.
iii. Press [STAT] again and select CALC 1:1-Var Stats [ENTER]. You will then see this on your screen: 1-Var Stats {blinking cursor}
iv. Enter [2nd][1] for L1 and press [ENTER].

The mean here is 115 (3 s.f.). The second screen on the right helps us to produce a box-and-whisker diagram. Alternatively, you can set your GDC to the screen on the left below. Set the window to Xmin=15, Xmax=260, Xscl=10 to match your data set. The Y values in the window will not affect your box and whisker plot. Press [ENTER] to obtain the screen on the right. Press [TRACE] and use the arrow keys to obtain the values of Q1, Q2, etc.
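If no GDC is at hand, the same summary for Example 2 can be sketched in Python. This assumes the median-of-halves rule from Example 1 for the quartiles; a calculator or another program may follow a slightly different convention, so treat the quartiles as a cross-check rather than a guaranteed match. The dictionary name `summary` is chosen here for illustration.

```python
from statistics import mean, median

# Data from Example 2.
data = [30, 30, 50, 75, 85, 90, 110, 150, 190, 205, 255]

ordered = sorted(data)
n = len(ordered)
half = n // 2

# Five-number summary, the information shown by a box and whisker plot.
summary = {
    "min": ordered[0],
    "Q1": median(ordered[:half]),        # median of the lower half
    "median": median(ordered),
    "Q3": median(ordered[n - half:]),    # median of the upper half
    "max": ordered[-1],
}

print(summary)
print(f"mean = {mean(data):.3g}")  # mean rounded to 3 significant figures
```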

From the plot, we see that the median is not in the middle of the distribution (midway between the minimum and maximum values) but nearer to the minimum, so the longer "tail end" of the distribution lies above the median.

• So far, we have been dealing with raw ungrouped data.

We could also have obtained the quartiles of grouped data. This is accomplished by using a cumulative frequency curve. Let us look at the cumulative frequency curve for the reaction times above. The 50th percentile corresponds to 50% of the total cumulative frequency. Since 100% of the cumulative frequency is 12, 50% of the cumulative frequency is 6. Similarly, the first quartile and third quartile correspond to 25% and 75% of the cumulative frequency, that is, 3 and 9. The values of the first quartile, median, and third quartile are read from the horizontal axis. The first quartile is 4 seconds, the median is 5.65 seconds, and the third quartile is 6.9 seconds. Similarly, the 30th percentile corresponds to 30% of 12, which is 3.6. Therefore, we find the value 3.6 on the cumulative frequency axis and read across until we hit the curve. We then read the associated value on the horizontal axis; in this case, the answer is 4.4 seconds. You should confirm that the 30th percentile is 4.4 seconds.

• Exploration: Find the value of the 83rd percentile. [It should be about 7.4 seconds.]
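Reading a percentile off the curve can be approximated numerically. The sketch below joins the ogive points with straight lines, which is an assumption: the values quoted above were read from a smooth hand-drawn curve, so the straight-line estimates (for instance 5.5 seconds for the median) are close to, but not identical with, those readings. The function name `percentile_from_ogive` is hypothetical.

```python
# Ogive points (class boundary, cumulative frequency) for the reaction times.
points = [(0, 0), (2, 1), (4, 3), (6, 7), (8, 11), (10, 12)]


def percentile_from_ogive(points, p):
    """Estimate the p-th percentile by linear interpolation on the ogive."""
    total = points[-1][1]
    target = p / 100 * total  # position on the cumulative frequency axis
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        # Find the segment whose cumulative frequencies bracket the target.
        if c1 > c0 and c0 <= target <= c1:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("p must be between 0 and 100")


for p in (25, 30, 50, 75, 83):
    print(p, round(percentile_from_ogive(points, p), 2))
```

Straight-line interpolation is the simplest choice; a smoother curve through the same points would shift the estimates slightly, which is why answers read from a graph are usually accepted within a small tolerance.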