Chapter 3 CORRELATION AND REGRESSION

TOPIC

Correlation Defined
Range of the Correlation Coefficient
Scatter Plots
Null and Alternative Hypotheses
Statistical Significance
Example 1
Example 2
Coefficient of Determination

Tutorials

Obtaining the Correlation Coefficient in Excel 2007

CORRELATION

➊ Indicates how well the ranking of scores on one variable matches the ranking of scores on a second variable
➋ The more closely the ranking of scores on the first variable matches the ranking of scores on the second variable, the stronger the correlation
• The fewer matched rankings, the weaker the correlation

CORRELATION

➌ The ranking of scores may match in the same direction (i.e., the score ranked first on variable 1 is also ranked first on variable 2) or in the opposite direction (i.e., the score ranked first on variable 1 is ranked last on variable 2)
➍ There is no correlation when the ranking of scores on one variable fails to match the ranking of scores on the second variable

CORRELATION

➊ EXAMPLE: Five soccer players were ranked according to their soccer ability and their grade point average (GPA)

Perfect Positive r

Soccer Player   Ability   GPA
A               1         1
B               2         2
C               3         3
D               4         4
E               5         5

Perfect Negative r

Soccer Player   Ability   GPA
A               1         5
B               2         4
C               3         3
D               4         2
E               5         1
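The two rankings above can be checked numerically. The following is a minimal sketch, assuming the Pearson formula is applied directly to the ranks; the function name `pearson_r` and the list names are illustrative and do not come from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

ability = [1, 2, 3, 4, 5]        # players A to E ranked on soccer ability
gpa_same = [1, 2, 3, 4, 5]       # GPA ranks match the ability ranks
gpa_reversed = [5, 4, 3, 2, 1]   # GPA ranks run in the opposite direction

print(pearson_r(ability, gpa_same))      # perfect positive r
print(pearson_r(ability, gpa_reversed))  # perfect negative r
```

When every rank matches, the result is +1.00; when the ranks are exactly reversed, it is -1.00.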

CORRELATION

➊ The numeric value of the correlation coefficient has a range of +1.00 to -1.00, where zero indicates no correlation
• The closer the correlation coefficient is to +1.00 or -1.00, the stronger the correlation between two variables
• The closer the correlation coefficient is to 0, the weaker the correlation between two variables
• A correlation coefficient equal to 0 means there is no correlation between two variables
➋ Which value represents a stronger correlation?
• +.65 or -.85

CORRELATION

➊ The correlation coefficient describes two characteristics:
• The sign of the correlation (positive or negative) indicates the direction of the relationship between the two variables
• The value of the correlation indicates how strong the correlation is between two variables
➋ The symbol for the correlation between two variables for a sample is a lower case, italicized r

CORRELATION

➊ Here is a rough guideline for defining the strength of a correlation coefficient:
• r = ±.80 to ±1.00: Strong Correlation
• r = ±.60 to ±.80: Moderate Correlation
• r = ±.40 to ±.60: Weak to Moderate
• r closer to 0 than ±.40: Weak Correlation
➋ The guideline above assumes a sample size of N ≥ 30
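The guideline can be written as a small lookup. This is only a sketch: the function name `describe_strength` is hypothetical, and because the slide's ranges share their endpoints (for example, .80 appears in two ranges), the code assumes a boundary value belongs to the stronger category.

```python
def describe_strength(r):
    """Label a correlation using the rough guideline (assumes N >= 30)."""
    size = abs(r)  # strength depends on the magnitude, not the sign
    if size >= 0.80:
        return "Strong"
    if size >= 0.60:
        return "Moderate"
    if size >= 0.40:
        return "Weak to Moderate"
    return "Weak"

for r in (+0.65, -0.85, +0.45, -0.20):
    print(r, describe_strength(r))
```

Because only the magnitude is used, -.85 is labeled as a stronger correlation than +.65.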

SCATTER PLOTS

➊ A scatter plot is a graph that describes the direction and strength of the correlation between two variables
➋ The closer the points in the graph are to forming a straight line, the stronger the correlation between the two variables
• When the points in the graph form a circular pattern, the correlation will be close or equal to zero
• When the pattern of points leans from lower right to upper left, the scatter plot indicates the correlation is negative
• When the pattern of points leans from lower left to upper right, the scatter plot indicates the correlation is positive

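SCATTER PLOTS

➊ When the pattern is lower right to upper left, the correlation is negative:
[Scatter plot: Y on the vertical axis, X on the horizontal axis]
➋ When the pattern is lower left to upper right, the correlation is positive:
[Scatter plot: Y on the vertical axis, X on the horizontal axis]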


NULL HYPOTHESIS

➊ A non-zero correlation does not necessarily mean two variables are related to each other
➋ There are two competing hypotheses:
• The alternative hypothesis (HA) contends there is a true correlation between the two variables for the population and the sample correlation observed is not solely due to random error
• The null hypothesis (H0) states that there is no correlation between the two variables for the population and that any sample correlation observed is solely due to random error

NULL HYPOTHESIS

➊ When a correlation coefficient is sufficiently large, we can make the inference that it reflects not just random error alone, but also a measure of how much two variables have in common
• Remember random error is present in everything we measure; you can't get rid of it, and all statistics contain some amount of random error
• Smaller samples have more random error and larger samples have less

NULL HYPOTHESIS

➊ A statistical conclusion is a statement that rejects or fails to reject the null hypothesis
• When we reject the null hypothesis, we are saying the sample correlation obtained is NOT solely due to random error but indicates a real correlation between the two variables for the population
• When we fail to reject the null hypothesis, we are acknowledging the observed sample correlation may be only due to random error and that there may not be any true correlation between the two variables for the population

NULL HYPOTHESIS

➊ The stronger the correlation, the more likely there is a real correlation between two variables for the population
➋ Whether a sample correlation between two variables is real or not is a function of how big the sample size is and the strength of the correlation between two variables
• As a general rule, the larger the sample size, the weaker the sample correlation needs to be in order to declare it statistically significant (meaning the null hypothesis is rejected)
• In other words, the correlation coefficient needs to be increasingly stronger for data sets based on small sample sizes

STATISTICAL SIGNIFICANCE

➊ To determine if a sample correlation is significant, we need to first work from the assumption that the null hypothesis is true
• We assume the null hypothesis is true because we haven't analyzed the data yet (there's no evidence of a correlation without analyzing the data)
➋ We only analyze the data from one sample, but to determine if a sample correlation is statistically significant we have to remember there are an infinite number of samples that could have been selected

STATISTICAL SIGNIFICANCE

➊ Assuming the null hypothesis is true, the correlation for the sample obtained should be zero, and if the value is not zero, then we assume the correlation is solely due to random error
➋ If we imagine obtaining the correlations for all possible samples (where each sample is the same size), we would find that the average of all sample correlations is equal to the population correlation
• Again, if the null hypothesis is true, the correlation between two variables for the population will be zero

STATISTICAL SIGNIFICANCE

➊ If we imagine obtaining the correlations for all possible samples (where each sample is the same size), we could build a histogram using the sample correlation coefficients
• Since the histogram consists of all possible sample correlations, it is called a sampling distribution of sample correlations
• This histogram (or sampling distribution) will be flatter and wider when the sample correlations are based on smaller sample sizes and taller and narrower when the sample correlations are based on larger sample sizes
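A simulation can illustrate the sampling distribution described above. The sketch below is an illustration under stated assumptions, not part of the original slides: it draws two unrelated, normally distributed variables (so the null hypothesis is true by construction), and the sample sizes, the number of samples, and all function names are arbitrary choices.

```python
import math
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_null_correlations(n, num_samples, rng):
    """Sample correlations of size n when the two variables are unrelated."""
    results = []
    for _ in range(num_samples):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of x
        results.append(pearson_r(x, y))
    return results

rng = random.Random(0)  # fixed seed so the run is repeatable
small = simulate_null_correlations(10, 2000, rng)
large = simulate_null_correlations(100, 2000, rng)

# The spread of the sample correlations is the width of the histogram
print("N = 10:  mean r =", round(statistics.mean(small), 3),
      " spread =", round(statistics.stdev(small), 3))
print("N = 100: mean r =", round(statistics.mean(large), 3),
      " spread =", round(statistics.stdev(large), 3))
```

Both sets of sample correlations should average close to zero, while the spread for N = 10 should be clearly wider than the spread for N = 100.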

STATISTICAL SIGNIFICANCE

➊ The null hypothesis is rejected and the sample correlation is statistically significant when the obtained correlation value (from Excel) falls in the outer 5% of the histogram (or sampling distribution)

[Figure: sampling distribution of r centered on 0. Values between the two critical values (rcrit .025) are not significant (fail to reject H0); values in either outer tail (2.5% each) are significant (reject H0)]

NULL HYPOTHESIS

➊ The correlation values that identify the outer 5% of the sampling distribution are called the critical values
➋ The critical values are found by using the r table found on the class website
➌ To look up the critical value, you'll need to know the sample size or N
• Locate the sample size under the first column
• Then, for the selected sample size, locate the critical value under the third column (.05 under '2-tailed testing')
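The r table itself is not reproduced here. As background that goes beyond the slides, critical values of r are commonly derived from critical values of t with N - 2 degrees of freedom. The sketch below assumes that relationship and takes the two-tailed .05 values of t (2.069 for 23 degrees of freedom, 2.145 for 14) from a standard t table; the function name `critical_r` is hypothetical.

```python
import math

def critical_r(t_crit, n):
    """Convert a critical t value into a critical r for sample size n."""
    df = n - 2  # degrees of freedom for a correlation
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Two-tailed .05 critical t values taken from a standard t table (assumed)
print(round(critical_r(2.069, 25), 3))  # N = 25, df = 23
print(round(critical_r(2.145, 16), 3))  # N = 16, df = 14
```

Under these assumptions the results agree with the critical values of ±.396 (N = 25) and ±.497 (N = 16) used in the worked examples.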

CORRELATION

➊ A researcher recruited 25 adults ranging in age from 35 to 65 years old to find out if there is a relationship between number of television hours watched and blood pressure. The sample correlation obtained was +.65.
➋ State the null hypothesis for this problem
• The null hypothesis expects there to be no correlation between number of television hours watched and blood pressure for adults ranging in age from 35 to 65 years old. Any non-zero sample correlation observed is assumed to be solely due to random error.

CORRELATION

➌ Conduct a test of the null hypothesis at the 5% level. Be sure to properly state the statistical conclusion
• The sample correlation obtained in Excel is +.65
• The sample size is 25
• The critical values from the r table are ±.396
• The statistical conclusion is: Since r (25) = +.65, p < .05; Reject H0

CORRELATION

➍ Provide an interpretation of the statistical conclusion using the variables from the description of the problem
• Based on the 25 adults surveyed, ranging in age from 35 to 65 years old, it appears that as the amount of television watched per day increases, there is an increase in blood pressure. The obtained sample correlation does not seem to be solely due to random error, but rather indicates a real correlation between amount of television watched per day and blood pressure.

CORRELATION

➊ A marriage counselor believes that couples who spend more time making meals together are more satisfied with their relationship. Sixteen couples are recruited for the study and asked to keep track of how much time (in minutes) they spend preparing meals together each day for one month. At the end of the month, couples are asked to complete a survey on how satisfied they are with their current relationship. The sample correlation obtained was +.45.

CORRELATION

➋ State the null hypothesis for this problem
• The null hypothesis expects there to be no correlation between amount of time couples spend together preparing meals and their satisfaction with their current relationship. Any non-zero sample correlation observed is assumed to be solely due to random error.

CORRELATION

➌ Conduct a test of the null hypothesis at the 5% level. Be sure to properly state the statistical conclusion
• The sample correlation obtained in Excel is +.45
• The sample size is 16
• The critical values from the r table are ±.497
• The statistical conclusion is: Since r (16) = +.45, p > .05; Fail to reject H0

CORRELATION

➍ Provide an interpretation of the statistical conclusion using the variables from the description of the problem
• Based on the 16 couples recruited for the study, there is not enough evidence to conclude that satisfaction with current relationship is related to how much time couples spend making meals together. The obtained sample correlation may be due to random error alone.
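The decision rule used in both examples can be sketched in a few lines. The function name `statistical_conclusion` is hypothetical; the correlations and critical values are the ones given in the two problems.

```python
def statistical_conclusion(r, r_crit):
    """Compare the size of the sample correlation with the critical value."""
    if abs(r) >= r_crit:
        # r falls in the outer 5% of the sampling distribution
        return "Reject H0"
    return "Fail to reject H0"

# Example 1: television hours and blood pressure (N = 25)
print(statistical_conclusion(+0.65, 0.396))
# Example 2: time spent making meals and relationship satisfaction (N = 16)
print(statistical_conclusion(+0.45, 0.497))
```

The absolute value is used because the test is two-tailed: a correlation beyond either critical value leads to rejecting the null hypothesis.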

COEFFICIENT OF DETERMINATION

➊ The coefficient of determination or r² provides an estimate of the percentage of variance that is common to two variables (also known as covariance)
• Variance refers to all the things that cause scores on a given variable to be different
• What causes people to be different heights?
• Genes, nutrition, disease, age, race, and gender to name a few
• Differences on these traits cause variance in heights across the population

COEFFICIENT OF DETERMINATION

➊ If two variables are correlated, they must share some amount of variance
• There is a significant correlation between height and weight for the population
• What is the variance shared between these two variables?
• Both height and weight are influenced by genes, nutrition, disease, age, race, and gender
• These variables likely explain why height and weight are correlated
• The variance shared by two variables is known as covariance

COEFFICIENT OF DETERMINATION

➊ What is the coefficient of determination or r² for the problem examining the relationship between amount of TV watched and blood pressure?
• To get the coefficient of determination, square the sample correlation obtained in Excel
• r² = .65 x .65 = .42 or 42%
• Interpretation: It is estimated that 42% of the variance in amount of TV watched per day is common to blood pressure. This estimate of covariance is based on a sample size of 25.

COEFFICIENT OF DETERMINATION

➊ What is the coefficient of determination or r² for the problem examining the relationship between amount of time couples spend making meals together and level of satisfaction with their current relationship?
• r² = .45 x .45 = .20 or 20%
• Interpretation: It is estimated that 20% of the variance in amount of time couples spend making meals together is common to the level of satisfaction with their current relationship. This estimate of covariance is based on a sample size of 16.
• NOTE: The coefficient of determination was done for the example above for demonstration only. The coefficient of determination is not interpretable for non-significant correlations
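The arithmetic for both examples can be reproduced with a short sketch. The function name `coefficient_of_determination` is illustrative; the correlations are the ones from the two problems.

```python
def coefficient_of_determination(r):
    """Square the sample correlation to estimate the shared variance."""
    return r * r

for label, r in (("TV watched and blood pressure", 0.65),
                 ("Meal time and relationship satisfaction", 0.45)):
    r_squared = coefficient_of_determination(r)
    # Report as a proportion and as a whole-number percentage
    print(label, round(r_squared, 2), f"{r_squared:.0%}")
```

The second value is computed for demonstration only, since that correlation was not statistically significant.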

End of Chapter 3 – Part 1