GCSE Statistics Revision

Author: Silas Stevenson

Types of data

Quantitative: numerical data such as time, age, height
Qualitative: non-numerical data such as opinions, favourite subjects, gender

Numerical data can be either discrete or continuous.

Discrete data jumps from one measurement to the next. The measurements in between have no meaning. Examples are shoe size and the number of goals scored at a football match.

Continuous data does not jump from one measurement to the next, but passes smoothly through all the measurements in between. Examples are time and height.

Primary data: data that is collected by or for the person who is going to use it.
Secondary data: data that is not collected by the person who is going to use it.

Sampling
When organisations collect data this is usually done by SAMPLING. This is collecting data from a representative SAMPLE of the population they are interested in.

A POPULATION need not be human. In statistics we define a population as the collection of ALL the items about which we want to know some characteristics. Examples of populations are hospital patients, road accidents, pet owners, unoccupied property or bridges.

It is usually far too expensive and too time-consuming to collect information from every member of the population (known as taking a census), exceptions being the General Election and The Census, so instead we collect it from a sample. If it is to be of any use, the sample must represent the whole of the population we are interested in, and not be biased in any way. This is where the skill in sampling lies: in choosing a sample that will be as representative as possible.

The basis for selecting any sample is the list of all the subjects from which the sample is to be chosen. This is the SAMPLING FRAME. Examples are the Postcode Address File, the Electoral Register, telephone directories, membership lists, lists created by credit rating agencies and others, and maps. A problem, of course, is that the list may not be up to date. In some cases a list may not even exist.

Simple random sampling
A simple random sample gives each member of the population an equal chance of being chosen. This can be achieved using random number tables.
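
As a rough illustration of the idea, the short Python sketch below draws a simple random sample using a random number generator in place of random number tables. The function name `simple_random_sample`, the `seed` parameter and the numbered list of 120 houses are assumptions made for this example only.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed just makes the example repeatable
    return rng.sample(population, sample_size)


# Hypothetical sampling frame: houses numbered 1 to 120
houses = list(range(1, 121))
sample = simple_random_sample(houses, 8, seed=1)
print(sorted(sample))
```

Because `random.sample` chooses without replacement, no house can appear in the sample twice.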

Systematic sampling
This is random sampling with a system! From the sampling frame, a starting point is chosen at random, and items are then chosen at regular intervals from that point. For example, suppose you want to sample 8 houses from a street of 120 houses. 120/8 = 15, so every 15th house is chosen after a random starting point between 1 and 15. If the random starting point is 11, then the houses selected are 11, 26, 41, 56, 71, 86, 101 and 116.
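
The house example can be checked with a few lines of Python. This is only a sketch: the function name `systematic_sample` and its parameters are made up for illustration, and it assumes the population size divides exactly by the sample size.

```python
import random


def systematic_sample(population_size, sample_size, start=None):
    """Return the item numbers chosen by systematic sampling (items numbered from 1)."""
    interval = population_size // sample_size  # e.g. 120 // 8 = 15
    if start is None:
        start = random.randint(1, interval)  # random starting point between 1 and interval
    return [start + i * interval for i in range(sample_size)]


selected = systematic_sample(120, 8, start=11)
print(selected)  # [11, 26, 41, 56, 71, 86, 101, 116]
```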

Stratified sampling
In a stratified sample the population is divided into groups (strata), and the number sampled from each group is proportional to the size of that group.
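
A minimal sketch of the proportional calculation is given below. The year groups and their sizes are invented for the example, and the function name `stratified_sample_sizes` is hypothetical. Each group's share is (group size ÷ population size) × sample size.

```python
def stratified_sample_sizes(strata_sizes, sample_size):
    """Number to sample from each group, in proportion to the group's size."""
    total = sum(strata_sizes.values())
    # Rounding can make the parts add up to slightly more or less than
    # sample_size; in that case adjust one group by hand.
    return {name: round(size * sample_size / total)
            for name, size in strata_sizes.items()}


# Hypothetical school population of 500 pupils, sample of 50
year_groups = {"Year 9": 200, "Year 10": 150, "Year 11": 150}
sizes = stratified_sample_sizes(year_groups, 50)
print(sizes)  # {'Year 9': 20, 'Year 10': 15, 'Year 11': 15}
```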

Quota sampling
In quota sampling the selection of the sample is made by the interviewer, who has been given quotas to fill from specified subgroups of the population. For example, an interviewer may be told to sample 50 females between the ages of 45 and 60.

Scatter graphs
A typical GCSE Statistics question on scatter graphs will have the following structure:
a) Plot some missing points on a scatter graph
b) Describe the relationship between the variables
c) Draw a line of best fit through the mean point $(\bar{x}, \bar{y})$
d) Use the line of best fit to estimate one variable given the other. Inside the data range this is known as interpolation; outside the data range it is known as extrapolation, and may not be suitable as trends may not continue
e) Find the equation of the line of best fit in the form y = ax + b
f) State what a and b represent in the context of the question

Finding $(\bar{x}, \bar{y})$
$\bar{x}$ is the mean of the x values, so add all the x values together and divide by how many there are. Do the same with the y values to find $\bar{y}$.
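
The calculation of the mean point can be sketched as follows. The data values and the function name `mean_point` are made up for illustration.

```python
def mean_point(xs, ys):
    """Return (mean of x values, mean of y values)."""
    return sum(xs) / len(xs), sum(ys) / len(ys)


# Hypothetical scatter graph data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar = mean_point(xs, ys)
print(x_bar, y_bar)  # 3.0 4.0
```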

Equation of a line of best fit
To find a, the gradient of the line, pick two points $(x_1, y_1)$ and $(x_2, y_2)$ that lie on your line of best fit. Then find the difference between the y values and divide by the difference between the x values, i.e.

$$a = \frac{y_2 - y_1}{x_2 - x_1}$$

To find b, read off the y value where your line of best fit crosses the y axis.
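
The sketch below works through the gradient calculation for two made-up points. Instead of reading b from the graph, it computes b = y1 − a × x1, which gives the same intercept whenever the points really lie on the line. The function name `line_through` is an assumption for this example.

```python
def line_through(p1, p2):
    """Return (a, b) for the line y = ax + b through two points on the line of best fit."""
    (x1, y1), (x2, y2) = p1, p2
    a = (y2 - y1) / (x2 - x1)  # difference in y divided by difference in x
    b = y1 - a * x1            # y value where the line crosses the y axis
    return a, b


# Hypothetical points read off a line of best fit
a, b = line_through((2, 7), (6, 15))
print(a, b)  # 2.0 3.0, i.e. y = 2x + 3
```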

Cumulative frequency curves
Cumulative frequency is a running total. It is calculated by adding up the frequencies up to that point. Note that the first point plotted is at the lower boundary of the first class interval, which has a cumulative frequency of 0. Notice also the characteristic S-shape of the cumulative frequency curve. Draw lines up to the cumulative frequency curve where necessary, for example to read off the median.
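
As an illustration of the running total, the sketch below builds the points for a cumulative frequency curve. The class boundaries and frequencies are invented, and it assumes each cumulative frequency is plotted at the upper boundary of its class.

```python
from itertools import accumulate

# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40
lower_boundary = 0
upper_boundaries = [10, 20, 30, 40]
frequencies = [3, 7, 12, 5]

# Running total of the frequencies
cumulative = list(accumulate(frequencies))
print(cumulative)  # [3, 10, 22, 27]

# First point: lower boundary of the first class with cumulative frequency 0
points = [(lower_boundary, 0)] + list(zip(upper_boundaries, cumulative))
print(points)  # [(0, 0), (10, 3), (20, 10), (30, 22), (40, 27)]
```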

Histograms
With a histogram, it is the area of the bar that represents the frequency. Frequency density is plotted along the y axis, using the formula

$$\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}$$

You may need to rearrange this formula to make Frequency the subject:

$$\text{Frequency} = \text{Frequency Density} \times \text{Class Width}$$

Usually an examination question will have part of the table filled in and part of the histogram drawn.

If you look at a bar that is drawn on the histogram and whose frequency is also given in the table, you can work out its frequency density and hence the scale on the y axis. In the question shown on the following page, the interval 10 ≤ h ≤ 15 had its frequency given in the table as well as its bar drawn, so the frequency density could be worked out. The scale was then easy to figure out and the rest was straightforward to complete.
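
Both forms of the frequency density formula can be tried out with the short sketch below. The frequencies and densities used are invented for illustration and are not taken from the exam question mentioned above.

```python
def frequency_density(frequency, class_width):
    """Frequency density = frequency / class width."""
    return frequency / class_width


def frequency_from_density(density, class_width):
    """Rearranged form: frequency = frequency density x class width."""
    return density * class_width


# Hypothetical class 10 <= h <= 15 (width 5) with frequency 20
fd = frequency_density(20, 15 - 10)
print(fd)  # 4.0

# Hypothetical bar of height 2.5 over a class of width 10
freq = frequency_from_density(2.5, 10)
print(freq)  # 25.0
```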

Interquartile range, outliers and box plots
The interquartile range (IQR) is calculated as follows: IQR = UQ − LQ.

The upper quartile (UQ) is found ¾ of the way through the ordered data, i.e. at position $\frac{3}{4}(n + 1)$.
The lower quartile (LQ) is found ¼ of the way through the ordered data, i.e. at position $\frac{1}{4}(n + 1)$.

To find outliers, work out 1.5 times the IQR, then subtract it from the LQ and add it to the UQ. Any item below LQ − 1.5 × IQR or above UQ + 1.5 × IQR is considered an outlier. This data can also be shown on a box plot.
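
The quartile positions and outlier limits can be sketched in Python as below. The data set is made up, the function names are hypothetical, and the code assumes at least three data values. When a position is not a whole number the sketch interpolates between the two neighbouring values; that rule is an assumption of this example, so follow whatever convention your course uses.

```python
def value_at_position(sorted_data, position):
    """Value at a 1-based position; interpolates if the position is fractional (assumption)."""
    index = int(position) - 1
    fraction = position - int(position)
    if fraction == 0:
        return sorted_data[index]
    return sorted_data[index] + fraction * (sorted_data[index + 1] - sorted_data[index])


def quartiles_and_outliers(data):
    """Return (LQ, UQ, IQR, outliers) using positions (n + 1)/4 and 3(n + 1)/4."""
    ordered = sorted(data)
    n = len(ordered)
    lq = value_at_position(ordered, (n + 1) / 4)
    uq = value_at_position(ordered, 3 * (n + 1) / 4)
    iqr = uq - lq
    lower_limit = lq - 1.5 * iqr
    upper_limit = uq + 1.5 * iqr
    outliers = [x for x in ordered if x < lower_limit or x > upper_limit]
    return lq, uq, iqr, outliers


# Hypothetical data with n = 11, so the quartiles are the 3rd and 9th values
data = [2, 4, 5, 7, 8, 9, 10, 12, 13, 15, 40]
lq, uq, iqr, outliers = quartiles_and_outliers(data)
print(lq, uq, iqr, outliers)  # 5 13 8 [40]
```

Here the limits are 5 − 12 = −7 and 13 + 12 = 25, so 40 is the only outlier.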

Time series
In a GCSE Statistics exam you will be required to:
a) Calculate an n-point moving average
b) Plot the moving averages on a time series graph
c) Draw a trend line (and possibly find its equation)
d) Describe the trend
e) Calculate the mean seasonal variation for a particular quarter
f) Use the mean seasonal variation and your trend line to calculate an estimate for that quarter in the following year

A trend line should go through as many of the moving averages as possible and stay within the data range (you may have to extend it in a later part of the question). The trend should be described as increasing, decreasing, fluctuating or showing no real trend.

Once you have calculated the mean seasonal variation for a given quarter, you can use it to predict the sales for that quarter in the next year. Your trend line gives an estimate of what the sales should be; add the mean seasonal variation to this estimate to get your answer.
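
The calculations in parts a), e) and f) can be sketched as below. All figures are invented: the quarterly sales, the trend values (which in an exam would be read off your own trend line) and the trend estimate for next year are assumptions for illustration. Seasonal variation is taken as actual value minus trend value.

```python
def moving_averages(values, n):
    """n-point moving averages of a list of values."""
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]


def mean_seasonal_variation(actual_values, trend_values):
    """Mean of (actual - trend) for the same quarter in different years."""
    variations = [actual - trend for actual, trend in zip(actual_values, trend_values)]
    return sum(variations) / len(variations)


# Hypothetical quarterly sales for two years (Q1 to Q4, then Q1 to Q4)
sales = [10, 14, 20, 12, 12, 16, 22, 14]
averages = moving_averages(sales, 4)
print(averages)  # [14.0, 14.5, 15.0, 15.5, 16.0]

# Quarter 3 actual sales in each year, with hypothetical trend line readings
msv = mean_seasonal_variation([20, 22], [15.0, 16.0])
print(msv)  # 5.5

# Hypothetical trend line reading for Quarter 3 of the following year
trend_estimate = 17.0
prediction = trend_estimate + msv
print(prediction)  # 22.5
```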

Index numbers
What are index numbers?
An index number is a statistical measure designed to follow or track changes over a period of time in the price, quantity or value of an item or group of items.

Types of index numbers
• Simple index numbers
• Chain base index numbers
• Weighted index numbers

Simple index numbers
A simple index number is the ratio of the price of a commodity at a given time to its price at a different time, usually an earlier one. For example, in January 1980 the price of a bar of soap was 40p, whilst in January 1985 its price was 60p. If we take January 1980 as the base year, the index for January 1985 is calculated as follows:

$$\text{Index number} = \frac{\text{current price}}{\text{base year price}} \times 100 = \frac{60}{40} \times 100 = 150$$

The percentage sign is usually omitted, and we say that the index is 150 based on January 1980, which is 100.

This indicates that the price of the soap has increased by 50% over the five-year period. If you are given a table containing the index numbers, these can be used to work out prices by rearranging the index number formula:

$$\text{current price} = \frac{\text{Index number} \times \text{base year price}}{100}$$