LESSON 8--Graphs
*Quartiles, Deciles and Centiles: What is important to remember here is to go off the prefix of each word...
--Cent=100
--Deci=10
--Quart=4
...Therefore, D5=Q2=C50
*THE BOX AND WHISKER PLOT
--What's important to note here is that the lines run through quartiles 1, 2 and 3, and that the centiles (percentages) of these lines are 25, 50 and 75 respectively.
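A quick sketch (not from the slides) to check the D5=Q2=C50 idea. The scores are made up, and Python's statistics.quantiles is just one of several ways to compute cut points, so treat this as an illustration only.

```python
import statistics

# Hypothetical scores (made up; 11 values so the middle score is unambiguous).
scores = [3, 5, 7, 8, 9, 11, 12, 14, 15, 18, 20]

quartiles = statistics.quantiles(scores, n=4)    # 3 cut points: Q1..Q3
deciles = statistics.quantiles(scores, n=10)     # 9 cut points: D1..D9
centiles = statistics.quantiles(scores, n=100)   # 99 cut points: C1..C99

q2 = quartiles[1]    # 2nd quartile
d5 = deciles[4]      # 5th decile
c50 = centiles[49]   # 50th centile

# All three should land on the same score as the median.
print(q2, d5, c50, statistics.median(scores))
```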
Skewness can also be inferred from a box and whisker plot.
--Here the men's side is negatively skewed: whenever the bottom distance b/t Q1 & Q2 is larger than the top distance b/t Q2 & Q3, we will see negative skewness.
--On the women's side we see positive skewness because the top distance b/t Q2 & Q3 is larger than the bottom distance b/t Q1 & Q2.
--Top larger = POSITIVE
--Bottom larger = NEGATIVE
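The top/bottom rule above can be written as a tiny check. This is only a sketch: the function name and the quartile values are made up, not taken from the men's/women's plot.

```python
def box_plot_skew(q1, q2, q3):
    """Apply the rule: top larger = positive, bottom larger = negative."""
    bottom = q2 - q1   # distance b/t Q1 and Q2
    top = q3 - q2      # distance b/t Q2 and Q3
    if top > bottom:
        return "positive"
    if bottom > top:
        return "negative"
    return "symmetric"

# Hypothetical quartiles, for illustration only.
print(box_plot_skew(10, 12, 18))   # top distance is bigger
print(box_plot_skew(10, 16, 18))   # bottom distance is bigger
```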
When looking for interval size we must keep in mind that the ideal number of intervals (groups) is b/t 10 and 20. So if given the problem below, we find the range and divide it separately by 10 and by 20; these two results are the largest and smallest interval sizes we should use. Subtracting the two numbers gives one possible interval size. However, according to our Prof, the size can be anything b/t these two values, which in this case are 2 and 4.
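Here is a small sketch of the range/20 and range/10 step. The lowest and highest scores are made up (chosen so the range is 40, which reproduces the 2 and 4 from the notes); the actual problem data is on the slide.

```python
def interval_size_bounds(lowest, highest):
    """Return (smallest, largest) interval size for 20 and 10 groups."""
    data_range = highest - lowest
    return data_range / 20, data_range / 10

# Hypothetical scores: lowest 50, highest 90, so the range is 40.
smallest, largest = interval_size_bounds(50, 90)
print(smallest, largest)   # 2.0 4.0
```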
Finding the mean from raw data
--Take note that to find x1 we simply pick an arbitrary reference interval (it can be any interval) and call that interval 0. Then label the intervals above it (in numerical value) +1, +2, and so on, and the intervals below it -1, -2, and so on.
--To find fx1 just multiply the f column by the x1 column, row by row.
For the calculation from grouped data, find the midpoint of the arbitrary reference interval (14.7 in this case), then find the SUM of the fx1 column, N (which is the sum of the f column), and the interval size (0.5 in this case, NOT 0.4!!).
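To see how those pieces fit together, here is a sketch that assumes the usual coded-mean formula, mean = midpoint + (sum of fx1 / N) x interval size. The frequency table is made up; only the 14.7 midpoint, the 0.5 interval size and the total of 90 individuals come from the notes.

```python
# Hypothetical f column (lowest interval first), invented so that N = 90.
freqs = [3, 6, 11, 16, 20, 15, 10, 6, 3]
# x1 column: 0 marks the reference interval (the one with midpoint 14.7).
codes = [-4, -3, -2, -1, 0, 1, 2, 3, 4]

def grouped_mean(ref_midpoint, width, freqs, codes):
    """Coded mean: midpoint + (sum of f * x1 / N) * interval size."""
    n = sum(freqs)
    sum_fx1 = sum(f * x1 for f, x1 in zip(freqs, codes))
    return ref_midpoint + (sum_fx1 / n) * width

mean = grouped_mean(14.7, 0.5, freqs, codes)
print(round(mean, 2))   # 14.68 for this made-up table
```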
*The difference b/t the mean found from grouped data and the mean found from the raw data is called the "error of grouping".
MEDIAN
--Here the median position is 45 (half of the 90 individuals), so we need to find the 45th individual, who falls in the 14.5--14.9 interval. The cumulative frequency up to the interval just before 14.5--14.9 is 36, so we need to go 9 individuals into the interval to reach the 45th, because 36 + 9 = 45. Therefore to get our fraction we take the 9 spots we need to go and put that over the total number of individuals in that interval, which gives 9/20. Then simply multiply 9/20 by the interval size 0.5, and add the product to the lower limit of the interval, which is 14.45 in this case (NOT 14.5!!). That gives 14.45 + 0.225 = 14.675, or about 14.7.
--Symbol for median is Mdn.
MODE: Notice that the mode is ALWAYS at the peak, and the median is ALWAYS in the middle. This can be explained in more complicated terms, but this is really all that needs to be said. Remember that in the baseball example, the graph is POSITIVELY skewed.
--In skewed data, we want to use the MEDIAN; when data is normal we use the MEAN.
--MODE is used in NOMINAL data. (think: O's)
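Going back to the grouped median steps, the arithmetic can be checked with a short sketch. The function and parameter names are made up; the numbers (14.45, 36, 20, 0.5 and 45) are the ones from the notes.

```python
def point_in_interval(lower_limit, cum_below, f_in_interval, width, target):
    """Lower limit + (spots still needed / f in the interval) * interval size."""
    spots_needed = target - cum_below          # 45 - 36 = 9
    fraction = spots_needed / f_in_interval    # 9/20
    return lower_limit + fraction * width

mdn = point_in_interval(lower_limit=14.45, cum_below=36,
                        f_in_interval=20, width=0.5, target=45)
print(round(mdn, 3))   # about 14.675
```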
VARIABILITY: Variability is split up into these sections:
--Q (Quartile Deviation)
--AD (Average Deviation)
--SD (Standard Deviation)
--Variance
--SEM
Quartile Deviation
--CHOICE statistic for SKEWED data.
To find the quartile deviation, first find Q1 (typo in the slide, should say Q1). Take the cumulative f value (90) and multiply it by the corresponding quartile percentage, 25% in this case for Q1, and follow the rest of the slide to the left.
--Repeat this for Q3 (using 75%), then use this formula, which gives: 0.7
For the previous quartile deviation example, our median is 14.7, so going + and - one quartile deviation from 14.7 gives us 50% of our subjects. Since our quartile deviation is 0.7, our range is 14.0--15.4.
--Note this is quartILES, not quartERS.
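A sketch of the whole quartile deviation routine, assuming the usual formula Q = (Q3 - Q1) / 2 (the formula itself is on the slide, so this is my assumption). The frequency table is made up; it was only chosen to have 90 individuals and to land near the 0.7 from the notes.

```python
# Hypothetical table: (lower limit of interval, f), lowest interval first.
table = [(12.45, 3), (12.95, 6), (13.45, 11), (13.95, 16), (14.45, 20),
         (14.95, 15), (15.45, 10), (15.95, 6), (16.45, 3)]
WIDTH = 0.5  # interval size

def percentile_point(table, width, fraction):
    """Find the score below which `fraction` of the individuals fall."""
    n = sum(f for _, f in table)
    target = n * fraction              # e.g. 90 * 0.25 = 22.5 for Q1
    cum = 0                            # cumulative f below the current interval
    for lower, f in table:
        if cum + f >= target:
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("fraction must be between 0 and 1")

q1 = percentile_point(table, WIDTH, 0.25)
q3 = percentile_point(table, WIDTH, 0.75)
q = (q3 - q1) / 2                      # assumed quartile deviation formula
print(round(q1, 2), round(q3, 2), round(q, 1))
```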
AVERAGE DEVIATION: This is fairly easy. Find the mean, in this case 66/6 = 11. Then take X - 11 to find each value for the little x column. The absolute values of the little x column are then added together to give us 14. We then take 14 and divide it by the number of X's, which is 6, and this gives us the average deviation, 2.33.
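The same steps as a short sketch. The six scores are made up (the real ones are on the slide); they were picked so the sum is 66 and the absolute deviations add up to 14, like in the notes.

```python
# Hypothetical X column: six scores that add up to 66.
scores = [7, 9, 10, 12, 13, 15]

mean = sum(scores) / len(scores)              # 66 / 6 = 11
little_x = [x - mean for x in scores]         # X - mean for each score
sum_abs = sum(abs(d) for d in little_x)       # absolute values added together
average_deviation = sum_abs / len(scores)     # 14 / 6

print(round(average_deviation, 2))            # 2.33
```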
STANDARD DEVIATION: This is the same as above with one small difference: you square each value in the little x column, take the summation of the squared values, divide by the number of X values (6), and then take the square root.
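A sketch of the standard deviation steps, using made-up scores (six values with a mean of 11; the real data is on the slide). Dividing by the number of X values matches what Python calls the population standard deviation, so statistics.pstdev is used as a cross-check.

```python
import math
import statistics

# Hypothetical X column: six scores with a mean of 11.
scores = [7, 9, 10, 12, 13, 15]

mean = sum(scores) / len(scores)
little_x = [x - mean for x in scores]              # X - mean
sum_squares = sum(d ** 2 for d in little_x)        # square first, then sum
sd = math.sqrt(sum_squares / len(scores))          # divide by N, take the root

print(round(sd, 2), round(statistics.pstdev(scores), 2))
```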
*The standard deviation is interpreted much like the average deviation; they are similar, but not exactly the same. Standard deviation interpretation:
-- This is the standard deviation bell curve. --Standard deviation is only used when the data is NORMAL.
SEM--Standard error of the mean:
--How accurately a sample mean (X) estimates a population mean (m)
--Sample mean (X): unbiased estimator of the population mean (m)
--Sample standard deviation: biased estimator of the population standard deviation
**Basically this tells us whether or not the sample size is large enough to extrapolate the results from the sample to the population.
--To calculate the SEM, we take the standard deviation over the square root of the number of subjects.
--This is the 95% confidence interval.
--It tells us that if we go 2 SEM's above and below the sample mean, we can be 95% sure that the population mean falls b/t these two numbers.
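A sketch of the SEM and the 2-SEM interval. The sample mean, standard deviation and number of subjects are made up, and the function names are just for illustration.

```python
import math

def standard_error(sd, n):
    """SEM = standard deviation over the square root of the number of subjects."""
    return sd / math.sqrt(n)

def interval_95(sample_mean, sem):
    """Go 2 SEM's below and above the sample mean."""
    return sample_mean - 2 * sem, sample_mean + 2 * sem

# Hypothetical sample: mean 50, standard deviation 10, 25 subjects.
sem = standard_error(sd=10, n=25)
low, high = interval_95(50, sem)
print(sem, low, high)   # 2.0 46.0 54.0
```

Note that with the same standard deviation, more subjects means a smaller SEM and a tighter interval.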
STANDARD SCORES
*A Z-score represents the number of standard deviations a particular measure is above or below the mean.
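In code form this is just (score - mean) / standard deviation. A minimal sketch with made-up numbers:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x sits above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical distribution: mean 50, standard deviation 10.
print(z_score(65, 50, 10))   # 1.5  (above the mean)
print(z_score(40, 50, 10))   # -1.0 (below the mean)
```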
PROBABILITY: *Chance mean = N x P *Chance standard deviation =
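A sketch of both chance statistics. The chance mean is N x P as stated above; for the chance standard deviation the formula is on the slide, so this assumes the usual binomial version, the square root of N x P x Q with Q = 1 - P. The true/false test is a made-up example.

```python
import math

def chance_mean(n, p):
    """Chance mean = N x P."""
    return n * p

def chance_sd(n, p):
    """ASSUMPTION: chance SD = sqrt(N x P x Q), where Q = 1 - P."""
    q = 1 - p
    return math.sqrt(n * p * q)

# Hypothetical: guessing on 100 true/false questions, so P = 0.5.
print(chance_mean(100, 0.5))   # 50.0
print(chance_sd(100, 0.5))     # 5.0
```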
Example:
Part 1
Part 2
Another chance standard deviation example:
Another example, more complicated this time…