16.584: Statistics: Data Analysis, Estimates and Confidence Intervals

Author: Bernadette Burns

• Statistics deals with the measurement and analysis of data, with the goal of quantifying the randomness in the data.
• The data analyzed are often a finite sample of size n drawn from a population. For example:
  – Polling a group of 800 people (sample size n = 800) from a population that may include 10 million people, and publishing a result such as: 30% of the people polled will vote for a particular candidate.
  – Inspecting a sample of n = 50 items from a production lot of 1000 (the population size) and determining the number of items that are within tolerance.

© Prof. K. Chandra, September 28, 2015

16.584: Probability and Random Processes; ECE, UMASS Lowell

Questions for the statistician
• What could be the variation in the statistical metric of interest for the selected sample size?
• What is the interval over which this metric would vary when different samples of size n are drawn from the population?
• Is there a distribution that one can hypothesize to describe the statistical descriptors of interest?

Sample Statistics
• Sample Mean: given a dataset of n samples $x_i$, $i = 1, 2, \ldots, n$, the sample mean is $\bar{X} = \hat{\mu}_X = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Sample Variance: measures the degree of variation or spread in the data about the sample mean: $s^2 = \hat{\sigma}_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \hat{\mu}_X)^2$
• Sample Standard Deviation: $s = \sqrt{s^2} = \hat{\sigma}_X$
• Other statistics:
  – Sample range (Minimum : Maximum), Median
  – Quartiles (First, Second (Median), Third): locations that divide the data into quarters
  – Percentiles: the pth percentile (p = 0:100) of the sample data divides the data such that p% of the data are less than the pth percentile and (100 − p)% are above this location.
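As a rough illustration of the statistics listed above, the following Python sketch computes them for a small made-up dataset. The function name `sample_summary` and the data values are hypothetical. The quartiles use the Python standard library's default ("exclusive") interpolation, which is only one of several conventions and may not be the one intended in these notes.

```python
import statistics


def sample_summary(data):
    """Return basic sample statistics for a list of numbers (needs n >= 2)."""
    n = len(data)
    mean = sum(data) / n                                 # sample mean
    var = sum((x - mean) ** 2 for x in data) / (n - 1)   # sample variance, 1/(n-1) divisor
    q1, q2, q3 = statistics.quantiles(data, n=4)         # quartiles, default 'exclusive' method
    return {
        "n": n,
        "mean": mean,
        "variance": var,
        "std": var ** 0.5,
        "range": (min(data), max(data)),
        "median": statistics.median(data),
        "quartiles": (q1, q2, q3),
    }


# Hypothetical measurements, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
summary = sample_summary(data)
print(summary)
```

The variance is computed by hand here so that the 1/(n − 1) divisor is visible; `statistics.variance` uses the same divisor.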

Measurement Errors
• The difference between a measured value and the true value is the Measurement Error. It consists of two components: Bias and Random Error.
• Measurement = True Value + Bias + Random Error
• Bias = $\hat{\mu}_{\bar{X}} - \text{True Value}$: the difference between the mean value of the statistic of interest and its true value.
• For example, when estimating the mean value from a sample: Bias = $\hat{\mu}_{\bar{X}} - \mu_X$, where $\mu_X$ is the true mean of the population from which the sample was drawn.
• The equation used to estimate the mean value, $\hat{\mu}_X = \frac{1}{n}\sum_{i=1}^{n} x_i$, is an unbiased estimate: the mean or expected value of $\hat{\mu}_X$ equals the true mean $\mu_X$.
• For the variance, the unbiased estimate is $\hat{\sigma}_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \hat{\mu}_X)^2$
• The smaller the bias, the more accurate the measuring process.

• Random Error: varies with each measurement value. It has a standard deviation s, which affects the Precision of the measuring process.
  – Represents the statistical uncertainty in the measurements.
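The notion of bias can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original notes: it draws many small samples from a normal population with a known variance and averages two variance estimates, the 1/(n − 1) form given above and a 1/n form introduced here only for comparison. The function name, the population parameters and the seed are all hypothetical choices.

```python
import random


def average_variance_estimates(n=5, trials=20000, sigma=2.0, seed=0):
    """Average the 1/(n-1) and 1/n variance estimates over many samples of size n."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)   # sum of squared deviations
        total_unbiased += ss / (n - 1)
        total_biased += ss / n
    return total_unbiased / trials, total_biased / trials


unbiased, biased = average_variance_estimates()
true_var = 2.0 ** 2
print("true variance:", true_var)
print("average 1/(n-1) estimate:", unbiased, "bias:", unbiased - true_var)
print("average 1/n estimate:    ", biased, "bias:", biased - true_var)
```

The average of the 1/(n − 1) estimate should land close to the true variance, while the 1/n estimate is smaller by the factor (n − 1)/n. The spread of the individual estimates around their average is the random error.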

Characterizing the Sample Mean by its Mean and Variance
• Assume that the data represent independently sampled random measurements of size n drawn from a population with mean $\mu_X$ and variance $\sigma_X^2$.
• The sample mean $\hat{\mu}_X = \frac{1}{n}\sum_{i=1}^{n} x_i$ is a random variable.
• Expected value: $E[\hat{\mu}_X] = \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \mu_X$
• Variance of $\hat{\mu}_X$: $E[(\hat{\mu}_X - \mu_X)^2] = \frac{1}{n}\sigma_X^2$
• Standard deviation of $\hat{\mu}_X$: $\frac{1}{\sqrt{n}}\sigma_X$
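A quick simulation can make these results concrete. The sketch below assumes a normal population with mean 10 and standard deviation 3 (arbitrary, hypothetical values), draws many samples of size n, and compares the mean and standard deviation of the resulting sample means with $\mu_X$ and $\sigma_X/\sqrt{n}$.

```python
import random
import statistics


def simulate_sample_means(n, trials=5000, mu=10.0, sigma=3.0, seed=42):
    """Return `trials` sample means, each computed from n independent draws."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(trials)]


for n in (4, 16, 64):
    means = simulate_sample_means(n)
    print(
        "n =", n,
        "| mean of sample means:", round(statistics.fmean(means), 3),
        "| std of sample means:", round(statistics.stdev(means), 3),
        "| sigma/sqrt(n):", round(3.0 / n ** 0.5, 3),
    )
```

Quadrupling n should roughly halve the standard deviation of the sample mean, in line with the $1/\sqrt{n}$ factor.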

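Variance of the sum of n independent random variables

\begin{align}
\mathrm{Var}(\hat{\mu}_X) &= E\Big[\Big(\frac{1}{n}\sum_{i=1}^{n} x_i - \mu_X\Big)^2\Big] \\
&= E\Big[\frac{1}{n^2}\Big(\sum_{i=1}^{n} x_i\Big)^2 - \frac{2}{n}\sum_{i=1}^{n} x_i \mu_X + \mu_X^2\Big] \\
&= E\Big[\frac{1}{n^2}\sum_{i=1}^{n} x_i^2 + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1,\, j \neq i}^{n} x_i x_j\Big] - 2\mu_X^2 + \mu_X^2 \\
&= \frac{1}{n^2}\, n\,[\sigma_X^2 + \mu_X^2] + \frac{(n-1)n}{n^2}\mu_X^2 - \mu_X^2 \\
\mathrm{Var}(\hat{\mu}_X) &= \frac{1}{n}\sigma_X^2
\end{align}

Step (4) uses $E[x_i^2] = \sigma_X^2 + \mu_X^2$ and, by independence, $E[x_i x_j] = \mu_X^2$ for $i \neq j$.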

Central Limit Theorem (CLT)
• Consider a set of independent random samples drawn from a population with mean $\mu_X$ and variance $\sigma_X^2$.
• In the limit as the sample size $n \to \infty$, the distribution of the sample mean $\hat{\mu}_X$ approaches $N(\mu_X, \frac{\sigma_X^2}{n})$.
• The distribution of the sum of n random variables $S_n = \sum_{i=1}^{n} x_i$ approaches $N(n\mu_X, n\sigma_X^2)$.
• The shape of the distribution of the original population does not influence this result if n is large enough. How large n must be depends on the skewness of the population distribution.
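The role of skewness can be seen in a small experiment. This sketch assumes an exponential population with rate 1 (a strongly right-skewed choice made here for illustration, with skewness 2) and measures the skewness of the sample mean for increasing n. For a normal distribution the skewness is 0, so values shrinking toward 0 indicate the distribution of the sample mean becoming more symmetric. The function names and the seed are hypothetical.

```python
import random


def skewness(values):
    """Sample skewness: third central moment divided by the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def sample_mean_skewness(n, trials=10000, seed=1):
    """Skewness of the sample mean of n exponential (rate 1) draws, estimated by simulation."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(trials)]
    return skewness(means)


for n in (1, 4, 16, 64):
    print("n =", n, "| skewness of sample mean:", round(sample_mean_skewness(n), 3))
```

For independent, identically distributed draws the skewness of the sample mean falls as $1/\sqrt{n}$, so a more skewed population needs a larger n before the normal approximation is reasonable.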

Normal Approximation to the Binomial Distribution
• Recall that the Binomial RV $X = \sum_{i=1}^{n} y_i$, where the $y_i$ are Bernoulli random variables with outcomes 1 and 0 representing a Success and a Failure respectively, and p is the probability of success.
• The mean and variance of each $y_i$: $E[y_i] = p$ and $\mathrm{Var}[y_i] = p(1 - p)$
• Therefore, by the CLT, the distribution of the sum X approaches
  – $N(np, np(1 - p))$
  – Applicable typically for $np > 10$ and $n(1 - p) > 10$
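To see how close the approximation is, the sketch below compares an exact Binomial cumulative probability with the value obtained from $N(np, np(1-p))$. The numbers n = 800 and p = 0.3 echo the polling example from the introduction, and the threshold k = 250 is an arbitrary choice. The +0.5 continuity correction is an assumption added here and is not part of the notes. Both function names are hypothetical, and the code assumes 0 < p < 1.

```python
import math
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P[X <= k] for X ~ Binomial(n, p), summing pmf terms computed in log space."""
    total = 0.0
    for j in range(k + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(p) + (n - j) * math.log(1 - p)
        )
        total += math.exp(log_pmf)
    return total


def normal_approx_cdf(k, n, p):
    """P[X <= k] from N(np, np(1-p)), with a +0.5 continuity correction (added assumption)."""
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)


n, p, k = 800, 0.3, 250   # n and p echo the polling example; k is arbitrary
print("rule of thumb satisfied:", n * p > 10 and n * (1 - p) > 10)
print("exact binomial P[X <= k]:", binomial_cdf(k, n, p))
print("normal approximation:    ", normal_approx_cdf(k, n, p))
```

Working in log space avoids overflow in the binomial coefficients for large n.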

Confidence Intervals
• The estimates of the sample mean, sample variance, etc. are referred to as point estimates.
• The CLT provides the distribution of these point estimates under certain conditions.
• A Confidence Interval gives the statistician a level of confidence: at a chosen confidence level (e.g. CL = 95%, 98%), there is CL% confidence that the population mean lies in the interval $[\hat{\mu}_X - f(\sigma_X, CL),\ \hat{\mu}_X + f(\sigma_X, CL)]$.
• To find a desired confidence interval, let $0 < \alpha < 1$, for which the confidence level is $CL = 100(1 - \alpha)\%$.
• Using standard Normal tables, let $z_{\alpha/2}$ be the z value for which $P[z > z_{\alpha/2}] = \alpha/2$.
• Then $f(\sigma_X, CL) = z_{\alpha/2}\frac{\sigma_X}{\sqrt{n}}$, where n is the sample size.

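• The corresponding confidence interval is $[\hat{\mu}_X - z_{\alpha/2}\frac{\sigma_X}{\sqrt{n}},\ \hat{\mu}_X + z_{\alpha/2}\frac{\sigma_X}{\sqrt{n}}]$.
• When the population standard deviation is not known, replace $\sigma_X$ with the sample standard deviation $s$, so that $\frac{\sigma_X}{\sqrt{n}}$ is replaced with $\sigma_{\hat{\mu}_X} = \frac{s}{\sqrt{n}}$.
• Note: for 68, 90, 95, 99 and 99.7% confidence levels, $z_{\alpha/2}$ is approximately 1, 1.645, 1.96, 2.58 and 3 respectively.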
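The interval above can be computed directly. The following is a minimal sketch, assuming the sample is large enough for the normal approximation to hold. The function names `z_critical` and `mean_confidence_interval` and the data values are hypothetical. `NormalDist().inv_cdf` from the Python standard library plays the role of the standard Normal table.

```python
from statistics import NormalDist, fmean, stdev


def z_critical(confidence):
    """Return z_{alpha/2} such that P[z > z_{alpha/2}] = alpha/2, with alpha = 1 - confidence."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def mean_confidence_interval(data, confidence=0.95, sigma=None):
    """Confidence interval for the population mean.

    If sigma (the population standard deviation) is not given,
    the sample standard deviation s is used in its place.
    """
    n = len(data)
    center = fmean(data)
    spread = sigma if sigma is not None else stdev(data)
    half_width = z_critical(confidence) * spread / n ** 0.5
    return center - half_width, center + half_width


# z values for the confidence levels quoted in the note above.
for cl in (0.68, 0.90, 0.95, 0.99, 0.997):
    print(cl, round(z_critical(cl), 3))

# Hypothetical measurements, for illustration only.
data = [9.8, 10.2, 10.0, 10.1, 9.9]
print(mean_confidence_interval(data, confidence=0.95, sigma=0.2))
print(mean_confidence_interval(data, confidence=0.95))   # sigma unknown: uses s
```

The computed z values for 68% and 99.7% come out slightly below 1 and 3, which is why those two table entries are best read as rounded values.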

Reference: W. Navidi, Statistics for Engineers and Scientists, McGraw-Hill.
