Exercises related to hypothesis testing and inference

Author: Blaise Mitchell

Question 1. Which of these statements are true?
[ ] The mean of a population is denoted x̄
[ ] The population mean is a statistic
[ ] A sample is a subset of a population
[ ] The population size (number of members of the population) is always countable
[ ] None of them

Question 2. Which of these random variables are discrete?
[ ] The diastolic blood pressure (DBP)
[ ] The number of people who get tested for HIV in one week in a given clinic
[ ] The body mass index (BMI)
[ ] The smoking status
[ ] The blood type
[ ] The survival time of a cancer patient
[ ] None of them

Question 3. Indicate the distribution (Normal, Uniform, Binomial, Poisson, Negative Binomial, other...) of each of the following random variables, denoted X:
1. X is the random variable representing the number of bacterial colonies in a Petri dish.
2. In a flow cytometry analysis of fluorescence proteins, X is a random variable representing the relative fluorescence or light scatter intensity with respect to the number of events (cell count).
3. In a mass spectrometry experiment, X is a random variable representing the expression (or abundance) of a protein of interest in a study comparing esophageal and control subjects.
4. In a clinical experiment, X is a random variable representing the body mass index (BMI) in a gender-matched cohort of diabetic children aged 5-10 years, studied to assess type I diabetes at onset.
5. RNA consists of a sequence of nucleotides A, G, U, and C, where the first two are purines and the last two are pyrimidines. X is a random variable representing the number of purines in a certain microRNA.
6. X is the random variable representing the expression levels of the gene CCND3 (Cyclin D3) in acute lymphoblastic leukaemia patients.

Dr Kim-Anh Lê Cao (The University of Queensland Diamantina Institute, TRI) Statistics short course series for frightened bioresearchers, Year 2015.


7. X is a random variable recording extinctions of marine invertebrate families in equal-duration intervals of time. The question of interest is whether the pattern of extinction events through the fossil record is "random" in time, or whether instead extinctions tend to be clumped and occur in bursts ("mass extinctions").
8. An RNA-seq experiment generates count data for some transcripts t1, t2, ..., tp with biological replicates corresponding to two conditions we want to compare. X is the random variable representing the number of reads falling in a specific transcript.
9. What about your experiment? What is the random variable that you are observing, and what do you think is its hypothetical distribution? (Feel free to discuss with our team.)
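For count data such as item 7, one common exploratory check is the variance-to-mean ratio (dispersion index): a ratio near 1 is consistent with a Poisson model, while a ratio well above 1 suggests clumping. The sketch below is only an illustration; the function name `dispersion_index` and both lists of counts are hypothetical and are not taken from any real data set.

```python
import statistics


def dispersion_index(counts):
    """Return the sample variance divided by the sample mean of the counts."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return statistics.variance(counts) / mean


# Hypothetical event counts in ten equal-length intervals (made up for illustration).
random_like = [2, 0, 3, 1, 4, 2, 1, 5, 2, 2]
clumped = [0, 0, 9, 0, 1, 0, 0, 11, 0, 1]

print(f"random-like counts: {dispersion_index(random_like):.2f}")
print(f"clumped counts:     {dispersion_index(clumped):.2f}")
```

Both lists have the same mean, so only the spread differs. The ratio is a rough diagnostic and not a formal test.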

Question 4. State the null hypothesis for:
1. An experiment testing the claim that James Bond can taste the difference between a Martini that is shaken and one that is stirred.
2. An experiment testing whether echinacea decreases the length of colds.
3. A correlational study on the relationship between brain size and intelligence.
4. An investigation of whether a self-proclaimed psychic can predict the outcome of a coin flip.
5. A study comparing a drug with a placebo on the amount of pain relief. (A one-tailed test was used.)
6. What about the experimental study you are researching? What are your null and alternative hypotheses?

Question 5. These questions are intended to assess your understanding of hypothesis testing.
1. Why do experimenters test hypotheses they think are false?
2. A significance test is performed and the p-value = 0.20. Why can't the experimenter claim that the probability that the null hypothesis is true is 0.20?
3. When is it valid to use a one-tailed test? What is the advantage of a one-tailed test? Give an example of a null hypothesis that would be tested by a one-tailed test.


4. What is the difference between a p-value and a significance level?

5. How do the Type I and Type II error rates of one-tailed and two-tailed tests differ?

6. A two-tailed test p-value is 0.03. What is the one-tailed p-value if the effect were in the specified direction? What would it be if the effect were in the other direction?
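For a test statistic whose null distribution is symmetric and continuous (a z or t statistic, for example), the two-tailed p-value is twice the smaller tail area. The sketch below shows that conversion with a made-up two-tailed p-value of 0.10 rather than the value in item 6; the function name `one_tailed_p` is hypothetical.

```python
def one_tailed_p(two_tailed_p, effect_in_predicted_direction):
    """Convert a two-tailed p-value to a one-tailed one.

    Assumes a symmetric, continuous null distribution (e.g. z or t).
    """
    if not 0.0 <= two_tailed_p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    half = two_tailed_p / 2.0
    # The observed tail holds half of the two-tailed area; the opposite
    # tail holds everything else.
    return half if effect_in_predicted_direction else 1.0 - half


# Hypothetical two-tailed p-value, chosen only for illustration.
print(one_tailed_p(0.10, effect_in_predicted_direction=True))
print(one_tailed_p(0.10, effect_in_predicted_direction=False))
```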

7. Assume the null hypothesis is that µ = 50 and that the graph shown below is the sampling distribution of the mean (M). Would a sample value of M = 60 be significant in a two-tailed test at the 0.05 level? Roughly what value of M would be needed for the result to be significant?

Question 6. True or false?
1. It is easier to reject the null hypothesis if the researcher uses a smaller α level.
2. You are more likely to make a Type I error when the number of samples is small (3) than when the number of samples is large (100).
3. You accept the alternative hypothesis when you reject the null hypothesis.
4. A researcher risks making a Type I error any time the null hypothesis is rejected.

Question 7. A result is called "statistically significant" when:
[ ] The null hypothesis is true.
[ ] The null hypothesis is rejected.
[ ] The p-value is less than or equal to the significance level.
[ ] The p-value is larger than the significance level.


Question 8. A randomized trial comparing the efficacy of two regimens showed that the difference is significant with p = 0.02. In reality, however, the two drugs do not differ in their efficacy. This is an example of:
[ ] Type I error (α-error)
[ ] Type II error (β-error)
[ ] The null hypothesis is true.
[ ] The alternative hypothesis is true.
[ ] None of them

Question 9. These exercises are intended to give you practice in writing down the null and alternative hypotheses and the Type I and Type II errors.
Definition: A Type I error is to reject a true null hypothesis. A Type II error is to fail to reject a false null hypothesis.
1. The National Institute of Allergy and Infectious Diseases conducted a randomized single-blind trial to determine the rate of success of inoculation with different dilutions of smallpox vaccine. A total of 680 adult volunteers aged 18-32 years were randomly assigned to receive undiluted vaccine or a 1:5 dilution of vaccine. The primary end point of the study was the rate of success of vaccination, defined by the presence of a primary vesicle at the inoculation site 9 days after inoculation.
2. Following Question 4, item 6, in the context of your experimental study, what would be the Type I and Type II errors?
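The two definitions above can be made concrete with a small simulation. The sketch below repeatedly draws samples and applies a two-sided z-test with known standard deviation: when the null hypothesis is true, the rejection rate estimates the Type I error rate; when it is false, the non-rejection rate estimates the Type II error rate. The z-test, the effect size of 0.5, the sample size of 20 and all function names are assumptions chosen for illustration and are unrelated to the vaccine trial.

```python
import random
from statistics import NormalDist, mean


def rejects_null(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test with known sigma; True if H0: mu = mu0 is rejected."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_value <= alpha


def rejection_rate(true_mu, mu0, sigma, n, trials, rng):
    """Fraction of simulated samples of size n for which H0 is rejected."""
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if rejects_null(sample, mu0, sigma):
            rejections += 1
    return rejections / trials


rng = random.Random(0)  # fixed seed so the run is reproducible

# H0 is true (true mean equals mu0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mu=0.0, mu0=0.0, sigma=1.0, n=20, trials=2000, rng=rng)

# H0 is false (hypothetical true mean of 0.5): every non-rejection is a Type II error.
type_2_rate = 1.0 - rejection_rate(true_mu=0.5, mu0=0.0, sigma=1.0, n=20, trials=2000, rng=rng)

print(f"estimated Type I error rate:  {type_1_rate:.3f}")
print(f"estimated Type II error rate: {type_2_rate:.3f}")
```

The estimated Type I rate should land close to the chosen α of 0.05, whereas the Type II rate depends on the assumed effect size and sample size.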

Question 10. Suppose the normal level of haemoglobin (Hb) in children is 13.2 g/dl. A study on a random sample of 10 children with chronic diarrhoea revealed a mean of 12.6 g/dl and a standard deviation of 1.77 g/dl. The objective is to find out whether children with chronic diarrhoea have, on average, a lower Hb level. What are the null and the alternative hypotheses for this study?
[ ] H0: µ = 12.6 and Ha: µ > 13.2
[ ] H0: µ = 13.2 and Ha: µ < 13.2
[ ] H0: µ = 13.2 and Ha: µ ≠ 13.2
[ ] H0: µ = 12.6 and Ha: µ ≠ 13.2
[ ] None of them
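Once the hypotheses are written down, the summary statistics in Question 10 are enough to compute a test statistic. The sketch below assumes that a one-sample t-test is the intended analysis, which the exercise does not state; the function name `one_sample_t` is hypothetical.

```python
import math


def one_sample_t(sample_mean, reference_value, sample_sd, n):
    """Return the one-sample t statistic and its degrees of freedom."""
    standard_error = sample_sd / math.sqrt(n)
    t_stat = (sample_mean - reference_value) / standard_error
    return t_stat, n - 1


# Summary statistics quoted in Question 10.
t_stat, df = one_sample_t(sample_mean=12.6, reference_value=13.2, sample_sd=1.77, n=10)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The statistic would then be compared with a t distribution on n - 1 degrees of freedom, using the tail or tails implied by the alternative hypothesis.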

Question 11. In a case-control study, a panel of 50 markers for ovarian cancer is assayed. We report the raw p-values of a statistical test together with the p-values adjusted by two different multiple-testing correction methods: the False Discovery Rate (FDR, Benjamini-Hochberg) and the Bonferroni correction. The raw and adjusted p-values are displayed in the following table. Which column, (A) or (B), corresponds to the FDR correction?
[ ] Column A
[ ] Column B


Rank  Raw p-value  Column (A)  Column (B)
1     2.35E-07     1.18E-05    1.18E-05
2     2.10E-05     4.29E-04    1.05E-03
3     2.58E-05     4.29E-04    1.29E-03
4     9.81E-05     9.47E-04    4.91E-03
5     1.05E-04     9.47E-04    5.26E-03
6     1.24E-04     9.47E-04    6.21E-03
7     1.33E-04     9.47E-04    6.63E-03
8     1.57E-04     9.80E-04    7.84E-03
9     2.25E-04     1.25E-03    1.13E-02
10    3.80E-04     1.90E-03    1.90E-02
11    6.11E-04     2.78E-03    3.06E-02
12    1.61E-03     6.72E-03    8.07E-02
13    3.30E-03     1.26E-02    1.65E-01
14    3.54E-03     1.26E-02    1.77E-01
15    5.24E-03     1.75E-02    2.62E-01
16    6.83E-03     2.08E-02    3.42E-01
17    7.06E-03     2.08E-02    3.53E-01
18    8.81E-03     2.45E-02    4.40E-01
19    9.40E-03     2.47E-02    4.70E-01
20    1.13E-02     2.82E-02    5.65E-01
21    2.12E-02     5.04E-02    1.00E+00
22    4.92E-02     1.12E-01    1.00E+00
23    6.05E-02     1.30E-01    1.00E+00
24    6.26E-02     1.30E-01    1.00E+00
25    7.40E-02     1.48E-01    1.00E+00
26    8.28E-02     1.59E-01    1.00E+00
27    8.63E-02     1.60E-01    1.00E+00
28    1.19E-01     2.13E-01    1.00E+00
29    1.89E-01     3.26E-01    1.00E+00
30    2.06E-01     3.43E-01    1.00E+00
31    2.21E-01     3.56E-01    1.00E+00
32    2.86E-01     4.46E-01    1.00E+00
33    3.05E-01     4.62E-01    1.00E+00
34    4.66E-01     6.84E-01    1.00E+00
35    4.83E-01     6.84E-01    1.00E+00
36    4.92E-01     6.84E-01    1.00E+00
37    5.32E-01     7.19E-01    1.00E+00
38    5.75E-01     7.41E-01    1.00E+00
39    5.78E-01     7.41E-01    1.00E+00
40    6.19E-01     7.63E-01    1.00E+00
41    6.36E-01     7.63E-01    1.00E+00
42    6.45E-01     7.63E-01    1.00E+00
43    6.56E-01     7.63E-01    1.00E+00
44    6.89E-01     7.82E-01    1.00E+00
45    7.19E-01     7.99E-01    1.00E+00
46    8.18E-01     8.80E-01    1.00E+00
47    8.27E-01     8.80E-01    1.00E+00
48    8.97E-01     9.30E-01    1.00E+00
49    9.12E-01     9.30E-01    1.00E+00
50    9.44E-01     9.44E-01    1.00E+00
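To see how the two corrections named in Question 11 behave, it can help to apply them by hand to a short list of p-values. The sketch below implements the standard Bonferroni and Benjamini-Hochberg adjustments in plain Python; the five raw p-values are made up for illustration and do not come from the table, and the function names are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, scaling by m / rank and
    # carrying the running minimum so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values (not taken from the table in Question 11).
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
for p, bonf, bh in zip(raw, bonferroni(raw), benjamini_hochberg(raw)):
    print(f"raw={p:.3f}  Bonferroni={bonf:.3f}  BH={bh:.5f}")
```

Comparing how quickly each adjusted column grows relative to the raw values, and whether tied values appear, is one way to tell the two methods apart.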


Extra exercises

Question 12. A random sample of size n is drawn from a population with µ = 1100 and σ = 200. What sample size would be necessary to reduce the standard error of the mean (SEM) to 20?
[ ] 10
[ ] 100
[ ] 50
[ ] None of them

Question 13. Which of these statements are true?
[ ] The standard deviation shows the variation between individual values.
[ ] The standard error of the mean shows the degree of variation between sample means.
[ ] The standard error of the mean gets smaller as your samples get larger.
[ ] The standard deviation does not change significantly as you acquire more data.
[ ] None of them

Question 14. A study reported a mean heart rate variation of 50 beats/min with a standard deviation of 23 in 580 normal healthy subjects. What proportion of "individuals" can be expected to have a heart rate variation between 27 and 73, assuming a normal distribution? (Hint: use the Empirical Rule.)
[ ] 68%
[ ] 95%
[ ] 99.7%
[ ] None of them

Question 15. In the study described in Question 14, repeated samples of 9 healthy individuals are randomly selected. What proportion of these samples will have a mean between 27 and 73 beats/min? (Hint: use the Central Limit Theorem, then the Empirical Rule.)
[ ] 68%
[ ] 95%
[ ] 99.7%
[ ] None of them

Question 16. In a large population of adults, the mean IQ is 112 with a standard deviation of 20. Suppose that 200 adults are randomly selected for a market research campaign. The distribution of the sample mean IQ is:
[ ] Exactly normal, with a mean = 112 and a standard deviation = 20
[ ] Approximately normal, with a mean = 112 and a standard deviation = 20
[ ] Approximately normal, with a mean = 112 and a standard deviation = 1.414
[ ] Approximately normal, with a mean = 112 and a standard deviation = 0.1
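Several of the extra exercises rest on two facts: the standard error of the mean is the standard deviation divided by the square root of the sample size, and for a normal distribution the proportion of values within k standard deviations of the mean follows the Empirical Rule. The sketch below expresses both as small helpers. The numbers in the usage lines (a standard deviation of 15 and a target SEM of 3) are hypothetical and deliberately differ from those in the questions; the function names are also made up.

```python
import math
from statistics import NormalDist


def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)


def sample_size_for_sem(sd, target_sem):
    """Smallest n whose standard error does not exceed target_sem."""
    # The small tolerance guards against floating-point noise just above an integer.
    return math.ceil((sd / target_sem) ** 2 - 1e-9)


def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of its mean."""
    standard_normal = NormalDist()
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Hypothetical values, chosen only for illustration.
n_needed = sample_size_for_sem(sd=15.0, target_sem=3.0)
print(n_needed, standard_error(15.0, n_needed))
for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

For questions about sample means rather than individuals, the same `proportion_within` helper applies once distances are measured in standard errors instead of standard deviations.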
