Simple Statistics using MathCAD

Author: Kelley Lewis

In this worksheet, we will first examine the built-in functions that MathCAD provides to calculate the mean and standard deviation. Using these built-in functions, we will define a simple MathCAD function to calculate the standard error in the mean. Once we have examined and used the built-in functions, we will define our own functions to calculate the mean and standard deviation using MathCAD's programming language. This is intended as an exercise in MathCAD programming and should start to familiarise you with writing programs in MathCAD.

Starting MathCAD

To start MathCAD on the computers in the Part 1 and Part 2 laboratories, simply click on the MathCAD icon on the desktop or select MathCAD from the top level of the Start Menu. Note that MathCAD on the Physics Laboratory machines has been upgraded to version 13. This version works better and is more robust against crashes; however, it does mean that worksheets created or saved on the laboratory machines cannot be opened in the previous version of MathCAD installed on the main campus network.

Saving your work

You should create a folder on your M: drive to store all of the worksheets you will create as part of this course. A good name for this folder would be PH240. Remember to save your work frequently.

Storing data in a vector

A vector or matrix is the logical way to store a set of related data (a dataset), and all of MathCAD's built-in statistical functions operate on vectors (or matrices). We will start by creating a vector to hold a set of readings, in this case the measured thickness of a bar of metal. Here we are told in advance that there are 10 measurements in the dataset, so we can use the menu command Insert|Matrix... to insert a matrix of 10 rows and 1 column into the worksheet. We will call the matrix 'BarThicknessA' and also tell MathCAD that all the readings are in mm:

$$\mathrm{BarThicknessA} := \begin{pmatrix} 1.05 \\ 1.11 \\ 1.02 \\ 1.08 \\ 1.17 \\ 1.11 \\ 1.09 \\ 1.14 \\ 1.10 \\ 1.08 \end{pmatrix} \cdot \mathrm{mm}$$

Note that if we had a large number of readings, or didn't know in advance exactly how many we would be dealing with, it might have been better to use a Data Table (Insert|Data|Table). A Data Table will automagically resize itself as you enter data into it.

Once we have our data in a vector, we can use MathCAD's built-in functions to perform some analysis on it:

mean(BarThicknessA) = 1.095 × 10⁻³ m

Since the original data was given in mm, we can scale the output to appear in mm by typing mm into the unit placeholder after the result:

mean(BarThicknessA) = 1.095 mm
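Outside MathCAD, the same calculation can be sketched in Python, which has no built-in unit system. This is a minimal sketch, assuming we follow MathCAD's default display and hold the readings in metres, then rescale the result to millimetres by hand; the names `MM`, `bar_thickness_a`, `mean_m` and `mean_mm` are illustrative only.

```python
# Minimal sketch: mean of BarThicknessA with units tracked by hand.
# Assumption: readings are stored in metres (MathCAD's default SI display)
# and converted to mm explicitly, since Python has no unit system.
from statistics import mean

MM = 1e-3  # one millimetre expressed in metres

# BarThicknessA readings, entered in mm and converted to metres
bar_thickness_a = [x * MM for x in
                   [1.05, 1.11, 1.02, 1.08, 1.17, 1.11, 1.09, 1.14, 1.10, 1.08]]

mean_m = mean(bar_thickness_a)  # result in metres
mean_mm = mean_m / MM           # rescaled to mm, like typing mm in the unit placeholder

print(f"mean = {mean_m:.4g} m")
print(f"mean = {mean_mm:.4g} mm")
```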

Similarly, we can use the built-in function Stdev() to calculate the standard deviation of the dataset:

Stdev(BarThicknessA) = 0.042 mm
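Python's standard `statistics` module draws the same sample/population distinction as MathCAD's Stdev()/stdev() pair: `statistics.stdev` divides by N − 1 and `statistics.pstdev` divides by N. The sketch below, an illustration outside MathCAD, shows how much the two differ for BarThicknessA.

```python
# Sketch: sample vs population standard deviation for BarThicknessA (values in mm).
# statistics.stdev divides by N-1 (the sample SD, like MathCAD's Stdev);
# statistics.pstdev divides by N (the population SD, like MathCAD's stdev).
from statistics import stdev, pstdev

bar_thickness_a_mm = [1.05, 1.11, 1.02, 1.08, 1.17, 1.11, 1.09, 1.14, 1.10, 1.08]

sample_sd = stdev(bar_thickness_a_mm)       # N-1 in the denominator
population_sd = pstdev(bar_thickness_a_mm)  # N in the denominator

print(f"sample SD     = {sample_sd:.3f} mm")      # 0.042
print(f"population SD = {population_sd:.3f} mm")  # 0.040
```

With only 10 readings the N − 1 correction changes the result in the third decimal place; for large datasets the two converge.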

Note: there are two similar functions, stdev() and Stdev(), which differ only in the capitalisation of the first letter. They indicate whether we are dealing with the standard deviation of a sample, Stdev(), which divides by N − 1, or the standard deviation of an entire population, stdev(), which divides by N. In this case we will use the sample standard deviation function, Stdev(), because we have sampled the bar at a number of discrete points.

There is no built-in function to calculate the error in the mean, so we will create our own function to calculate it. Recall that the standard error in the mean is given by

$$\sigma_m = \frac{\sigma}{\sqrt{N}}$$

where σ is the standard deviation and N is the number of elements in our dataset. To calculate the standard error in the mean, we therefore need to know the number of elements in our dataset. Although in this case we have 10 elements, it is good practice to make the functions we create work with any number of elements. MathCAD provides several functions that retrieve information about a vector; these are located in the 'Vector and Matrix' category of the Insert Function dialog box. In this case the length() function does what we want:

length(BarThicknessA) = 10
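As a quick check of the arithmetic, substituting the BarThicknessA values (σ ≈ 0.0425 mm from Stdev() before rounding, and N = 10 from length()) shows the value our standard-error function should return:

```latex
\sigma_m = \frac{\sigma}{\sqrt{N}}
         \approx \frac{0.0425\ \mathrm{mm}}{\sqrt{10}}
         \approx \frac{0.0425\ \mathrm{mm}}{3.162}
         \approx 0.013\ \mathrm{mm}
```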

We can now create a simple function to calculate the standard error in the mean. We can either call our function StandardErrorInMean(), which is a bit unwieldy, or σm(), which means we have to remember how to create text subscripts in our variable and function names. In either case, we will build our function to take the dummy argument va, to remind us that it is designed to work on vectors:

$$\sigma_m(va) := \frac{\mathrm{Stdev}(va)}{\sqrt{\mathrm{length}(va)}}$$
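For comparison, here is a minimal Python sketch of the same function (the name `sigma_m` is hypothetical). Like the MathCAD version, it takes the length from the data rather than hard-coding 10, so it works for a dataset of any size with at least two values.

```python
# Sketch: standard error in the mean, sigma_m = sigma / sqrt(N),
# with N taken from the data (the role length() plays in MathCAD).
import math
from statistics import stdev

def sigma_m(va):
    """Standard error in the mean of the sequence va (needs at least 2 values)."""
    return stdev(va) / math.sqrt(len(va))  # sample SD divided by sqrt(N)

bar_thickness_a_mm = [1.05, 1.11, 1.02, 1.08, 1.17, 1.11, 1.09, 1.14, 1.10, 1.08]
print(f"sigma_m = {sigma_m(bar_thickness_a_mm):.3f} mm")  # 0.013
```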

You can use the Greek toolbar to get the σ symbol, and type a full stop (.) to introduce a text subscript.

Having created our function, we can test it on our dataset of bar thicknesses:

σm(BarThicknessA) = 0.013 mm

Having explored the statistical functions on our first set of readings, we can input a second set of readings before combining the two sets. To see how you are getting on, I've created a vector called BarThicknessB, put the data in it and then hidden it. For your worksheet to work correctly, you should create a vector called 'BarThicknessB' with 10 rows and 1 column and put the dataset from Ex1.3 into it.

We need to combine the two datasets before we can start to analyse them. The stack() function combines two vectors by putting one on top of the other:

AllBarThickness := stack(BarThicknessA, BarThicknessB)

$$\mathrm{AllBarThickness} = \begin{array}{c|c}
 & 0 \\ \hline
0 & 1.05 \\
1 & 1.11 \\
2 & 1.02 \\
3 & 1.08 \\
4 & 1.17 \\
5 & 1.11 \\
6 & 1.09 \\
7 & 1.14 \\
8 & 1.10 \\
9 & 1.08 \\
10 & 1.20 \\
11 & 1.05 \\
12 & 1.09 \\
13 & 1.08 \\
14 & 1.16 \\
15 & 1.05 \\
\vdots & \vdots
\end{array}\ \mathrm{mm}$$
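The effect of stack() can be sketched with Python lists. This is an illustration only: `stack` here is a hypothetical helper, not a Python built-in, and because the full BarThicknessB dataset is not reproduced in this worksheet, the sketch uses just the six BarThicknessB values visible in the table (rows 10 to 15).

```python
# Sketch: list equivalent of MathCAD's stack(), placing one vector on top of another.
# Assumption: only the six visible BarThicknessB values (rows 10-15) are used.
def stack(a, b):
    """Hypothetical helper: a new list with the elements of a followed by those of b."""
    return list(a) + list(b)

bar_thickness_a_mm = [1.05, 1.11, 1.02, 1.08, 1.17, 1.11, 1.09, 1.14, 1.10, 1.08]
partial_b_mm = [1.20, 1.05, 1.09, 1.08, 1.16, 1.05]  # first six BarThicknessB values only

combined = stack(bar_thickness_a_mm, partial_b_mm)
# Indexing starts at 0, so the first element of B lands at index len(A) = 10.
print(len(combined), combined[10])  # 16 1.2
```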

By default only the first 15 or so members of the resultant vector are displayed. If you select the table with the mouse, a scrollbar should appear, allowing you to display all the members of the table.

Now we can apply the statistical functions to the combined dataset:

mean(AllBarThickness) = 1.1 mm
Stdev(AllBarThickness) = 0.046 mm
σm(AllBarThickness) = 0.01 mm

Programming our own statistical functions

Although MathCAD provides built-in statistical functions that give the answers in these cases, we frequently want to analyse datasets in other ways that the built-in functions do not provide for. As an exercise in starting to program in MathCAD, we will create our own versions of the mean() and Stdev() functions. Although it is permissible to redefine MathCAD's built-in functions by giving our new functions the same names as those provided by MathCAD, doing so can cause confusion and would prevent us from comparing the two results. MathCAD 13 will alert you to a redefinition of one of its built-in variables or functions by underlining the name of the new function when you redefine it. We will start with a simple function to calculate the mean of a dataset:

$$\mathrm{myMean}(va) := \frac{\sum va}{\mathrm{length}(va)}$$
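The same definition, the sum of the elements divided by their number, can be sketched in Python (the name `my_mean` is illustrative), together with a check against the library mean in the same way the worksheet compares myMean() with mean():

```python
# Sketch of myMean(): sum of the elements divided by the number of elements,
# mirroring the MathCAD vector-sum operator and length().
import math
from statistics import mean

def my_mean(va):
    """Mean of a non-empty sequence va."""
    return sum(va) / len(va)

bar_thickness_a_mm = [1.05, 1.11, 1.02, 1.08, 1.17, 1.11, 1.09, 1.14, 1.10, 1.08]
print(f"{my_mean(bar_thickness_a_mm):.3f} mm")                              # 1.095 mm
print(math.isclose(my_mean(bar_thickness_a_mm), mean(bar_thickness_a_mm)))  # True
```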

The Σ operator, which returns the sum of all elements in a vector, may be found on the Matrix toolbar.

Having created it, let's test it on our original dataset:

myMean(BarThicknessA) = 1.095 mm

From defining simple functions to first steps in programming

Both of the functions we've defined so far in this worksheet (σm and myMean) have been the standard sort of expression we're used to dealing with in MathCAD. When we come to program our own function to calculate the standard deviation, we will need to use MathCAD's more advanced programming features. In MathCAD, a program is entered in the programming operator, a multi-step container for MathCAD's program-control operators. Specific programming operators can be used to make local assignments to variables or functions, loop over calculations, conditionally evaluate branches, add breakpoints, trap errors, and return values. MathCAD evaluates the statements in a program in the order specified by the programming operators and then returns the result of the last step. The programming operator is also known as the 'Add Line' operator and may be found on the Programming toolbar. Recall that the standard deviation is given by:

$$\sigma^2 = \frac{\sum_i \left(x_i - \mathrm{mean}(x)\right)^2}{N-1}$$
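As a worked check, applying this formula by hand to BarThicknessA (mean 1.095 mm, N = 10) reproduces the Stdev() value. The deviations are squared, so only their magnitudes matter:

```latex
\begin{aligned}
\mathrm{mean}(x) &= 1.095\ \mathrm{mm} \\
\sum_i \left(x_i - \mathrm{mean}(x)\right)^2
  &= (0.045^2 + 0.015^2 + 0.075^2 + 0.015^2 + 0.075^2 \\
  &\qquad + 0.015^2 + 0.005^2 + 0.045^2 + 0.005^2 + 0.015^2)\ \mathrm{mm}^2
   = 0.01625\ \mathrm{mm}^2 \\
\sigma^2 &= \frac{0.01625\ \mathrm{mm}^2}{10 - 1} \approx 0.001806\ \mathrm{mm}^2 \\
\sigma &\approx 0.0425\ \mathrm{mm}
\end{aligned}
```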

When translating this formula into a MathCAD function, there are two issues that make it favourable to use a program:

1. The mean stays constant throughout the dataset, so there is no need to recalculate it for each member of the dataset.
2. We need to find the number of elements in the vector, N, once, and use it to define the range of values our index variable i should take.

Hence a suitable program for calculating the standard deviation of a vector, va, can be defined as follows:

$$\mathrm{myStdev}(va) := \left| \begin{array}{l} N \leftarrow \mathrm{length}(va) \\ \mu \leftarrow \mathrm{mean}(va) \\ \sqrt{\dfrac{\sum_{i=0}^{N-1} \left(va_i - \mu\right)^2}{N-1}} \end{array} \right.$$

The myStdev() function is a program with three lines. In the first line, the number of items in the dataset is obtained and stored in the local variable N. In the second line, the mean of the items in the dataset is obtained and stored in the local variable µ. In the third line, the mean is subtracted from each member of the dataset in turn and the result squared, before the terms are combined with a summation operator. The summation is then divided by N − 1 before the square root is taken to find the standard deviation.

myStdev(BarThicknessA) = 0.042 mm
myStdev(AllBarThickness) = 0.046 mm

In creating the myStdev() function we have used a number of special operators, which have their own keystrokes or places on toolbars. Make sure that the Programming and Calculus toolbars are displayed.

To start the programming operator and add lines to the program, select 'Add Line' from the Programming toolbar or press the shortcut key ]. The left-pointing arrow assigns a value to a local variable, which is only accessible from inside the program in which it is assigned. The local assignment operator is available on the Programming toolbar and has the shortcut key {.
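The three-line structure of myStdev() maps directly onto a short Python function. This is a sketch for comparison only (the name `my_stdev` is illustrative): N and the mean are each computed once, then the squared deviations are summed over indices 0 to N − 1. The last line checks it against the library sample standard deviation, as the worksheet checks myStdev() against Stdev().

```python
# Sketch of the three-line myStdev() program:
#   N <- length(va);  mu <- mean(va);  sqrt( sum_{i=0}^{N-1} (va_i - mu)^2 / (N - 1) )
import math
from statistics import mean, stdev

def my_stdev(va):
    """Sample standard deviation of va (needs at least 2 values)."""
    n = len(va)    # line 1: number of elements, found once
    mu = mean(va)  # line 2: mean, found once rather than for every element
    # line 3: squared deviations summed over indices 0..N-1, divided by N-1, square-rooted
    return math.sqrt(sum((va[i] - mu) ** 2 for i in range(n)) / (n - 1))

bar_thickness_a_mm = [1.05, 1.11, 1.02, 1.08, 1.17, 1.11, 1.09, 1.14, 1.10, 1.08]
print(f"{my_stdev(bar_thickness_a_mm):.3f} mm")                               # 0.042 mm
print(math.isclose(my_stdev(bar_thickness_a_mm), stdev(bar_thickness_a_mm)))  # True
```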

The summation operator is available on the Calculus toolbar. It creates its own local variable, in this case called i, which is typically used to index into an array. The array or vector index operator is available either from the Matrix toolbar, where it appears as Xn, or by using the shortcut key [. Notice that to index correctly through all the elements in an array, the indices go from 0 to N − 1, because indices in MathCAD start at 0.
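Python also indexes from 0, so the same rule applies there. This small sketch shows that looping over `range(N)` visits exactly the indices 0 to N − 1, and that index N is one past the end of the vector, the classic off-by-one mistake.

```python
# Sketch: zero-based indexing. For a vector of N elements the valid
# indices run from 0 to N-1 inclusive; index N is out of range.
va = [1.05, 1.11, 1.02]
N = len(va)

indices = list(range(N))  # [0, 1, 2], i.e. 0 to N-1
total = 0.0
for i in indices:
    total += va[i]

caught_off_by_one = False
try:
    _ = va[N]  # one past the end
except IndexError:
    caught_off_by_one = True

print(indices, round(total, 2), caught_off_by_one)  # [0, 1, 2] 3.18 True
```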

Create the function myStdev() and check that it gives the same result as the built-in Stdev() function.

Using the summation operator, can you create the function myMean(), which takes a vector and returns the mean of its elements?

The Gaussian Distribution

MathCAD provides a built-in function, dnorm(x, µ, σ), which returns the normal or Gaussian distribution, referred to as G(x, µ, σ) in your other notes. Using the QuickPlot facility of MathCAD, create a graph of dnorm. Use variables for µ and σ so that you can easily see the effect of changing the mean and standard deviation of the distribution:

µ := 0    (mean of the distribution)
σ := 1    (standard deviation; if this is 1, the x axis will be scaled in standard deviations)

[Graph: dnorm(x, µ, σ) plotted against x, with the x axis from −4 to 4 and the y axis from 0 to 0.4]
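Why does the y axis only need to reach about 0.4? A Gaussian with σ = 1 peaks at 1/√(2π) ≈ 0.399 when x = µ. The sketch below uses Python's `statistics.NormalDist` as a stand-in for dnorm (an illustration outside MathCAD, not MathCAD's own function) to tabulate the density at the plotted tick positions and confirm the peak height.

```python
# Sketch: the normal probability density (what dnorm(x, mu, sigma) returns),
# evaluated with statistics.NormalDist for mu = 0, sigma = 1.
import math
from statistics import NormalDist

mu, sigma = 0.0, 1.0
dist = NormalDist(mu, sigma)

for x in (-4, -2, 0, 2, 4):
    print(f"x = {x:+d}: density = {dist.pdf(x):.4f}")

peak = dist.pdf(mu)  # the maximum of the curve, at x = mu
print(math.isclose(peak, 1 / (sigma * math.sqrt(2 * math.pi))))  # True
```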

You will need to set the limits on the axes of the graph to get a plot like this. Can you create your own function G(x, µ, σ) using the definition given in the statistics worksheet? Compare your function with dnorm(x, µ, σ) on a graph.

MathCAD also provides a function pnorm(x, µ, σ), which returns the cumulative distribution; here I have plotted the two functions on the same graph.

[Graph: dnorm(x, µ, σ) and pnorm(x, µ, σ) plotted against x, with the x axis from −4 to 4 and the y axis from 0 to 1]
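One possible hand-written G(x, µ, σ), and its cumulative distribution written with the error function, is sketched below in Python and checked against `statistics.NormalDist` at the tick positions used in the graphs. The names `gaussian` and `gaussian_cdf` are illustrative, and `NormalDist` stands in here for MathCAD's dnorm and pnorm.

```python
# Sketch: a hand-written Gaussian G(x, mu, sigma) and its cumulative distribution,
# compared with statistics.NormalDist (standing in for dnorm and pnorm).
import math
from statistics import NormalDist

def gaussian(x, mu, sigma):
    """G(x, mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def gaussian_cdf(x, mu, sigma):
    """Probability that a normally distributed value is <= x, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

dist = NormalDist(0.0, 1.0)
for x in (-4.0, -2.0, 0.0, 2.0, 4.0):
    assert math.isclose(gaussian(x, 0.0, 1.0), dist.pdf(x))
    assert math.isclose(gaussian_cdf(x, 0.0, 1.0), dist.cdf(x))

print(gaussian_cdf(0.0, 0.0, 1.0))  # 0.5: half the distribution lies below the mean
```

The cumulative curve rises from 0 to 1, which is why the combined graph needs a y axis running up to 1 rather than 0.4.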
