16.584: Statistics: Data Analysis, Estimates and Confidence Intervals


• Statistics deals with the measurement and analysis of data, with the goal of quantifying the randomness in the data
• The data analyzed are often a finite sample of size n drawn from a population

– For example: polling a group of 800 people (sample size n = 800) from a population that may include 10 million people, and publishing a result such as: 30% of the people polled will vote for a particular candidate
– Inspecting a sample of n = 50 items from a production lot of 1000 items (the population size) and determining the number of items that are within tolerance

© Prof. K. Chandra, September 28, 2015

16.584: Probability and Random Processes; ECE, UMASS Lowell


Questions for the statistician
• What could be the variation in the statistical metric of interest for the selected sample size?
• What is the interval over which this metric would vary when different samples of size n are drawn from the population?
• Is there a distribution that one can hypothesize to describe the statistical descriptors of interest?


Sample Statistics
• Sample Mean: Given a dataset of n samples $x_i$, $i = 1, 2, \ldots, n$, the sample mean is $\bar{X} = \hat{\mu}_X = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Sample Variance: Measures the degree of variation or spread in the data about the sample mean: $s^2 = \hat{\sigma}_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \hat{\mu}_X)^2$
• Sample Standard Deviation: $s = \sqrt{s^2} = \hat{\sigma}_X$

• Other statistics:
– Sample range (Minimum to Maximum), Median
– Quartiles (First, Second (Median), Third): locations that divide the data into quarters
– Percentiles: the pth percentile ($0 \le p \le 100$) of the sample data divides the data such that p% of the data are below the pth percentile and (100 − p)% are above it
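The statistics listed above can be computed with Python's standard `statistics` module. This is a minimal sketch: the dataset is made up for illustration, and `percentile` is a hypothetical helper that assumes one common interpolation convention (rank $(n-1)p/100$ on the sorted data). Percentile and quartile definitions vary between textbooks and libraries.

```python
import math
import statistics


def percentile(data, p):
    """Return the p-th percentile (0 <= p <= 100) by linear interpolation.

    Assumed convention: rank = (n - 1) * p / 100 on the sorted data. Other
    conventions exist and can differ slightly for small samples.
    """
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    xs = sorted(data)
    rank = (len(xs) - 1) * p / 100
    lo = math.floor(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (rank - lo) * (xs[hi] - xs[lo])


def sample_statistics(data):
    """Point estimates listed on the slide."""
    return {
        "mean": statistics.mean(data),          # (1/n) * sum(x_i)
        "variance": statistics.variance(data),  # sum((x_i - mean)^2) / (n - 1)
        "stdev": statistics.stdev(data),        # square root of the sample variance
        "range": (min(data), max(data)),        # (minimum, maximum)
        "median": statistics.median(data),
        "quartiles": statistics.quantiles(data, n=4),  # first, second, third quartile
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # small illustrative dataset
print(sample_statistics(data))
print(percentile(data, 90))
```

For this dataset the two conventions disagree on the third quartile: `statistics.quantiles` (default `'exclusive'` method) gives 6.5, while `percentile(data, 75)` gives 5.5. Both are legitimate choices, so it is worth stating which one is used when reporting results.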


Measurement Errors
• The difference between a measured value and the true value is the Measurement Error. It consists of two components, Bias and Random Error:
• Measurement = True Value + Bias + Random Error

• Bias = E[statistic] − True Value: the difference between the mean (expected) value of the statistic of interest and its true value
• For example, when the population mean is estimated from a sample: Bias $= E[\hat{\mu}_X] - \mu_X$, where $\mu_X$ is the true mean of the population from which the sample was drawn
• The estimator of the mean, $\hat{\mu}_X = \frac{1}{n}\sum_{i=1}^{n} x_i$, is unbiased: its mean (expected value) equals the true mean, $E[\hat{\mu}_X] = \mu_X$
• For the variance, the unbiased estimate is $\hat{\sigma}_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \hat{\mu}_X)^2$
• The smaller the bias, the more accurate the measuring process
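A quick Monte Carlo check of why the variance estimate divides by $n - 1$: averaging each estimator over many samples approximates its expected value. This is a sketch; the Gaussian population, $\sigma_X = 2$, $n = 5$, and the helper name `average_variance_estimates` are illustrative assumptions.

```python
import random


def average_variance_estimates(n, trials, sigma=2.0, seed=0):
    """Average the 1/n and 1/(n-1) variance estimators over many samples.

    Each sample has n independent draws from N(0, sigma^2), so the true
    variance is sigma**2.
    """
    rng = random.Random(seed)
    biased_total = unbiased_total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
        biased_total += ss / n                 # divides by n
        unbiased_total += ss / (n - 1)         # divides by n - 1
    return biased_total / trials, unbiased_total / trials


biased, unbiased = average_variance_estimates(n=5, trials=20000)
print(f"true 4.0 | 1/n: {biased:.3f} | 1/(n-1): {unbiased:.3f}")
```

With these settings the $1/(n-1)$ average should land near the true variance 4.0, while the $1/n$ average should land near $\frac{n-1}{n}\sigma_X^2 = 3.2$, i.e., it is biased low.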


• Random Error: varies from one measurement to the next and has a standard deviation s, which determines the Precision of the measuring process
– It represents the statistical uncertainty in the measurements
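The model Measurement = True Value + Bias + Random Error can be simulated to separate accuracy from precision. A sketch under assumed values (true value 10.0, bias 0.5, Gaussian random error with standard deviation 0.2); `simulate_measurements` is a hypothetical helper.

```python
import random
import statistics


def simulate_measurements(true_value, bias, noise_sd, n, seed=1):
    """Return n simulated measurements: true value + constant bias + random error.

    The random error is assumed to be zero-mean Gaussian with standard deviation noise_sd.
    """
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


TRUE_VALUE = 10.0
xs = simulate_measurements(TRUE_VALUE, bias=0.5, noise_sd=0.2, n=10000)
estimated_bias = statistics.fmean(xs) - TRUE_VALUE  # accuracy: systematic offset
precision = statistics.stdev(xs)                    # precision: spread of the random error
print(f"estimated bias {estimated_bias:.3f}, sample std {precision:.3f}")
```

Averaging many measurements shrinks the effect of the random error but leaves the bias in place: the mean of the measurements sits near 10.5 rather than 10.0, while the sample standard deviation recovers the precision (about 0.2).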


Characterizing the Sample Mean by its Mean and Variance
• Assume that the data are n independent random measurements drawn from a population with mean $\mu_X$ and variance $\sigma_X^2$
• The sample mean $\hat{\mu}_X = \frac{1}{n}\sum_{i=1}^{n} x_i$ is itself a random variable:
– Expected value: $E[\hat{\mu}_X] = \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \mu_X$
– Variance of $\hat{\mu}_X$: $E[(\hat{\mu}_X - \mu_X)^2] = \frac{1}{n}\sigma_X^2$
– Standard deviation of $\hat{\mu}_X$: $\frac{1}{\sqrt{n}}\sigma_X$
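The results $E[\hat{\mu}_X] = \mu_X$ and $\mathrm{Var}(\hat{\mu}_X) = \sigma_X^2/n$ can be checked by simulation: draw many samples of size n, compute each sample mean, and look at the mean and variance of those sample means. A sketch; the exponential population (rate 1, so $\mu_X = \sigma_X^2 = 1$), $n = 25$, and the helper names are assumptions chosen for illustration.

```python
import random
import statistics


def exponential_draw(rng):
    """One draw from an exponential population with rate 1 (mu_X = 1, sigma_X^2 = 1)."""
    return rng.expovariate(1.0)


def sample_means(draw, n, trials, seed=2):
    """Return `trials` sample means, each computed from n independent draws."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]


n = 25
means = sample_means(exponential_draw, n, trials=20000)
print(statistics.fmean(means), statistics.variance(means), 1.0 / n)
```

The first two printed values should be close to 1 and $1/25 = 0.04$.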


Variance of the sample mean of n independent random variables

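$$
\begin{aligned}
\mathrm{Var}(\hat{\mu}_X) &= E\Big[\Big(\frac{1}{n}\sum_{i=1}^{n} x_i - \mu_X\Big)^2\Big] && (1)\\
&= E\Big[\frac{1}{n^2}\Big(\sum_{i=1}^{n} x_i\Big)^2 - \frac{2}{n}\sum_{i=1}^{n} x_i\,\mu_X + \mu_X^2\Big] && (2)\\
&= E\Big[\frac{1}{n^2}\sum_{i=1}^{n} x_i^2 + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1,\,j\neq i}^{n} x_i x_j\Big] - 2\mu_X^2 + \mu_X^2 && (3)\\
&= \frac{1}{n^2}\, n\big[\sigma_X^2 + \mu_X^2\big] + \frac{(n-1)n}{n^2}\,\mu_X^2 - \mu_X^2 && (4)\\
\mathrm{Var}(\hat{\mu}_X) &= \frac{1}{n}\,\sigma_X^2 && (5)
\end{aligned}
$$

Step (4) uses $E[x_i^2] = \sigma_X^2 + \mu_X^2$ and, by independence, $E[x_i x_j] = E[x_i]E[x_j] = \mu_X^2$ for each of the $n(n-1)$ pairs with $i \neq j$.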
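For a small discrete population, the derivation can also be checked exactly rather than by simulation: enumerate every equally likely n-tuple of independent draws and compute the variance of the sample mean with rational arithmetic. A sketch assuming a fair six-sided die as the population ($\mu_X = 7/2$, $\sigma_X^2 = 35/12$); the helper name `exact_sample_mean_moments` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def exact_sample_mean_moments(values, n):
    """Exact mean and variance of the sample mean of n i.i.d. uniform draws from values.

    Every one of the len(values)**n ordered n-tuples is equally likely.
    """
    prob = Fraction(1, len(values) ** n)
    means = [Fraction(sum(t), n) for t in product(values, repeat=n)]
    mean = sum(means) * prob
    var = sum((m - mean) ** 2 for m in means) * prob
    return mean, var


die = [1, 2, 3, 4, 5, 6]
mu = Fraction(sum(die), len(die))                    # 7/2
sigma2 = sum((v - mu) ** 2 for v in die) / len(die)  # 35/12
for n in (1, 2, 3):
    mean, var = exact_sample_mean_moments(die, n)
    print(n, mean, var, sigma2 / n)
```

Each printed row should show the variance of the sample mean equal to $\sigma_X^2/n$ exactly (35/12, 35/24, 35/36).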


Central Limit Theorem (CLT)
• Consider a set of n independent random samples drawn from a population with mean $\mu_X$ and variance $\sigma_X^2$
• As the sample size n grows, the distribution of the sample mean $\hat{\mu}_X$ approaches $N(\mu_X, \sigma_X^2/n)$; more precisely, the standardized mean $(\hat{\mu}_X - \mu_X)/(\sigma_X/\sqrt{n})$ converges in distribution to $N(0, 1)$ as $n \to \infty$
• The distribution of the sum of n random variables, $S_n = \sum_{i=1}^{n} x_i$, approaches $N(n\mu_X, n\sigma_X^2)$
• The shape of the population distribution does not affect this result if n is large enough; how large n must be depends on the skewness of the population distribution
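The role of skewness can be seen by standardizing sample means from a skewed population and comparing their empirical CDF with the standard normal CDF. A sketch; the exponential population (skewness 2), the grid of comparison points, the number of trials, and the helper names are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def standardized_means(n, trials, rng):
    """Standardized sample means from an exponential(1) population.

    For this population mu_X = sigma_X = 1, so (mu_hat - mu_X) / (sigma_X / sqrt(n))
    simplifies to (mu_hat - 1) * sqrt(n).
    """
    zs = []
    for _ in range(trials):
        mu_hat = sum(rng.expovariate(1.0) for _ in range(n)) / n
        zs.append((mu_hat - 1.0) * math.sqrt(n))
    return zs


def max_cdf_gap(zs, grid=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Largest |empirical CDF - standard normal CDF| over a few grid points."""
    phi = NormalDist()  # standard normal N(0, 1)
    return max(abs(sum(z <= g for z in zs) / len(zs) - phi.cdf(g)) for g in grid)


rng = random.Random(3)
gaps = {n: max_cdf_gap(standardized_means(n, 10000, rng)) for n in (2, 10, 50)}
print(gaps)
```

The gap for n = 2 should be clearly larger than for n = 50; a more strongly skewed population would generally need a larger n to reach the same gap.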


Normal Approximation to the Binomial Distribution
• Recall that the Binomial RV $X = \sum_{i=1}^{n} y_i$, where the $y_i$ are independent Bernoulli random variables with outcomes 1 (Success) and 0 (Failure), and p is the probability of success
• The mean and variance of each $y_i$: $E[y_i] = p$ and $\mathrm{Var}[y_i] = p(1 - p)$

• Therefore, by the CLT, the distribution of the sum X approaches
– $N(np, np(1 - p))$
– Applicable typically for $np > 10$ and $n(1 - p) > 10$
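A sketch comparing exact binomial probabilities with the normal approximation $N(np, np(1-p))$, together with the rule of thumb. The helper names are hypothetical, $p = 0.3$ is an arbitrary choice, and the approximation is applied exactly as stated above, with no further corrections.

```python
import math
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P[X <= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_approx_cdf(k, n, p):
    """P[X <= k] from the CLT approximation X ~ N(np, np(1 - p))."""
    return NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p))).cdf(k)


def rule_of_thumb_ok(n, p):
    """Rule of thumb from the slide: np > 10 and n(1 - p) > 10."""
    return n * p > 10 and n * (1 - p) > 10


p = 0.3
for n in (20, 100, 400):
    k = round(n * p)  # evaluate the CDF at the mean
    exact, approx = binomial_cdf(k, n, p), normal_approx_cdf(k, n, p)
    print(n, rule_of_thumb_ok(n, p), round(exact, 4), round(approx, 4))
```

At n = 20 the rule of thumb fails (np = 6), and the gap between the exact and approximate values shrinks as n grows.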


Confidence Intervals
• Estimates such as the sample mean and sample variance are referred to as point estimates
• The CLT provides the distribution of these point estimates under certain conditions
• A Confidence Interval attaches a confidence level (e.g., CL = 95%, 98%) to an interval estimate: the interval from $\hat{\mu}_X - f(\sigma_X, CL)$ to $\hat{\mu}_X + f(\sigma_X, CL)$ is constructed so that, over repeated samples of size n, CL% of such intervals contain the population mean
• To find a desired confidence interval, let $0 < \alpha < 1$, so that the confidence level is $CL = 100(1 - \alpha)\%$
• Using standard Normal tables, let $z_{\alpha/2}$ be the z value for which $P[z > z_{\alpha/2}] = \alpha/2$
• Then $f(\sigma_X, CL) = z_{\alpha/2}\,\frac{\sigma_X}{\sqrt{n}}$, where n is the sample size
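Instead of a printed table, $z_{\alpha/2}$ can be computed from the inverse standard normal CDF, since $P[z > z_{\alpha/2}] = \alpha/2$ means $z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$. A sketch using `statistics.NormalDist` from Python's standard library; `z_half_alpha` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_half_alpha(cl_percent):
    """Return z_{alpha/2} with P[Z > z_{alpha/2}] = alpha/2 for CL = 100(1 - alpha)%."""
    alpha = 1 - cl_percent / 100
    return NormalDist().inv_cdf(1 - alpha / 2)


for cl in (68, 90, 95, 99, 99.7):
    print(cl, round(z_half_alpha(cl), 3))
```

This gives about 1.645, 1.960 and 2.576 for CL = 90, 95 and 99%; the 68% and 99.7% levels give about 0.994 and 2.968, which are commonly rounded to 1 and 3.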


• The corresponding confidence interval is: $\hat{\mu}_X - z_{\alpha/2}\frac{\sigma_X}{\sqrt{n}}$ to $\hat{\mu}_X + z_{\alpha/2}\frac{\sigma_X}{\sqrt{n}}$

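• When the population standard deviation is not known, replace it with the sample standard deviation s, i.e., replace $\frac{\sigma_X}{\sqrt{n}}$ with $\hat{\sigma}_{\hat{\mu}_X} = \frac{s}{\sqrt{n}}$; this is a large-sample approximation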

• Note: For the 68, 90, 95, 99 and 99.7% confidence levels, $z_{\alpha/2} \approx$ 1, 1.645, 1.96, 2.58 and 3, respectively
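A sketch of the large-sample interval with $\sigma_X$ replaced by s, together with a repeated-sampling check of the confidence-level interpretation: the fraction of intervals that contain the true mean should be close to CL. The data values, the Gaussian population ($\mu_X = 10$, $\sigma_X = 2$), $n = 100$, and the helper names are assumptions for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(xs, cl_percent=95.0):
    """Large-sample CI for the population mean, with sigma_X replaced by s."""
    n = len(xs)
    z = NormalDist().inv_cdf(1 - (1 - cl_percent / 100) / 2)  # z_{alpha/2}
    half_width = z * statistics.stdev(xs) / math.sqrt(n)     # z_{alpha/2} * s / sqrt(n)
    center = statistics.fmean(xs)
    return center - half_width, center + half_width


def coverage(mu, sigma, n, trials, cl_percent=95.0, seed=4):
    """Fraction of intervals, over repeated samples, that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lo, hi = mean_confidence_interval([rng.gauss(mu, sigma) for _ in range(n)], cl_percent)
        hits += lo <= mu <= hi
    return hits / trials


sample = [9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4]  # hypothetical measurements
print(mean_confidence_interval(sample))
print(coverage(mu=10.0, sigma=2.0, n=100, trials=2000))
```

For small n, intervals built this way tend to cover the true mean somewhat less often than the nominal CL, because s itself varies from sample to sample.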


Reference: W. Navidi, Statistics for Engineers and Scientists, McGraw-Hill.
