Trade breaks on the exchange floor

Author: Charlotte Ross
When a customer calls a stock trading house to place an order to buy or sell stocks listed on the New York Stock Exchange, the office contacts the trader, who goes to the specialist booth and says “I want to buy x shares of XYZ at $10”. The trader writes the order down on a piece of paper (“I bought x shares of XYZ at $10.”), and the person at the booth also records the trade (“I sold x shares of XYZ at $10.”). This is called executing the trade. The pieces of paper are later matched up (the matching process). If the information on the pieces of paper doesn’t match, this is called a trade break. It is labor-intensive to resolve these breaks, as someone has to go back to the people involved and ask questions, so it is important to the trading house to understand and control trade breaks.

The following data refer to all of the daily trades that occurred from June 1995 through May 1996 at a large New York City investment house (sorry, but I’m not allowed to say which one). For each day the total number of trades (Trade total), the total number of trade breaks (Total breaks), and the percent of the trades that resulted in breaks (Break rate) are recorded. What can we say about the pattern of break rates? First, here are the data:

Row  Trade date  Total breaks  Trade total  Break rate
  1  6/1/95               208         2769      7.5117
  2  6/2/95               156         2805      5.5615
  3  6/5/95               586         2358     24.8516
  4  6/6/95               285         3410      8.3578
  5  6/7/95               175         3137      5.5786
  6  6/8/95               176         2816      6.2500
  7  6/9/95               217         2925      7.4188
  8  6/12/95              210         2764      7.5977
  9  6/13/95              166         2572      6.4541
 10  6/14/95              190         3081      6.1668
 11  6/15/95              277         2691     10.2936
 12  6/16/95              198         2826      7.0064
 13  6/19/95              186         2903      6.4072
 14  6/20/95              146         3974      3.6739
 15  6/21/95              180         3204      5.6180
 16  6/22/95              236         3371      7.0009
 17  6/23/95              236         3667      6.4358
 18  6/26/95              204         2804      7.2753
 19  6/27/95              169         2772      6.0967
 20  6/28/95              266         3135      8.4848
 21  6/29/95              160         3229      4.9551
 22  6/30/95              209         2730      7.6557
 23  7/3/95               128         2124      6.0264
 24  7/4/95                 *            *           *
 25  7/5/95               119         1108     10.7401
 26  7/6/95               234         3171      7.3794
 27  7/7/95               221         4004      5.5195
 28  7/10/95              271         4045      6.6996
 29  7/11/95              282         3774      7.4722
 30  7/12/95              234         3479      6.7261
 31  7/13/95              245         3416      7.1721
 32  7/14/95              250         3430      7.2886
 33  7/17/95              218         3689      5.9095
 34  7/18/95              246         3279      7.5023
 35  7/19/95              212         3522      6.0193
 36  7/20/95              262         4037      6.4900
 37  7/21/95              267         3121      8.5550
 38  7/24/95              269         3034      8.8662
 39  7/25/95              169         3751      4.5055
 40  7/26/95              252         2883      8.7409
 41  7/27/95              203         3486      5.8233
 42  7/28/95              697         3154     22.0989
 43  7/31/95              213         2631      8.0958
 44  8/1/95               227         3043      7.4597
 45  8/2/95               234         3143      7.4451
 46  8/3/95               205         3254      6.2999
 47  8/4/95               256         3433      7.4570
 48  8/7/95               232         3054      7.5966
 49  8/8/95               195         2696      7.2329
 50  8/9/95               214         2455      8.7169
 51  8/10/95              190         2224      8.5432
 52  8/11/95              205         2547      8.0487
 53  8/14/95              195         2259      8.6321
 54  8/15/95              164         2049      8.0039
 55  8/16/95              174         2173      8.0074
 56  8/17/95              334         2709     12.3293
 57  8/18/95              232         2671      8.6859
 58  8/21/95              181         2129      8.5016
 59  8/22/95              174         2646      6.5760
 60  8/23/95              165         2026      8.1441
 61  8/24/95              166         2051      8.0936
 62  8/25/95              188         2231      8.4267
 63  8/28/95              168         2089      8.0421
 64  8/29/95              161         2188      7.3583
 65  8/30/95              163         2959      5.5086
 66  8/31/95              183         2810      6.5125
 67  9/1/95               226         2658      8.5026
 68  9/4/95                 *            *           *
 69  9/5/95               140         1958      7.1502
 70  9/6/95               190         2609      7.2825
 71  9/7/95               348         2560     13.5938
 72  9/8/95               170         2294      7.4106
 73  9/11/95              241         1887     12.7716
 74  9/12/95              235         1911     12.2972
 75  9/13/95              192         2238      8.5791
 76  9/14/95              234         2932      7.9809
 77  9/15/95              256         3086      8.2955
 78  9/18/95              245         3017      8.1206
 79  9/19/95              233         4145      5.6212
 80  9/20/95              269         4001      6.7233
 81  9/21/95              206         3244      6.3502
 82  9/22/95              247         2839      8.7002
 83  9/25/95              148         2278      6.4969
 84  9/26/95              123         1819      6.7620
 85  9/27/95              271         2626     10.3199
 86  9/28/95              216         2687      8.0387
 87  9/29/95              244         2887      8.4517
 88  10/2/95              192         2486      7.7233
 89  10/3/95              172         2517      6.8335
 90  10/4/95              201         3112      6.4589
 91  10/5/95              176         2530      6.9565
 92  10/6/95              222         2152     10.3160
 93  10/9/95              243         2717      8.9437
 94  10/10/95             193         2066      9.3417
 95  10/11/95             196         3616      5.4204
 96  10/12/95             164         2446      6.7048
 97  10/13/95             132         2464      5.3571
 98  10/16/95             197         2499      7.8832
 99  10/17/95             189         2546      7.4234
100  10/18/95             122         2379      5.1282
101  10/19/95             241         3223      7.4775
102  10/20/95             186         3161      5.8842
103  10/23/95             238         2379     10.0042
104  10/24/95             180         3220      5.5901
105  10/25/95             249         4594      5.4201
106  10/26/95             227         3513      6.4617
107  10/27/95             258         3667      7.0357
108  10/30/95             235         2368      9.9240
109  10/31/95             161         1823      8.8316
110  11/1/95              157         1590      9.8742
111  11/2/95              257         1659     15.4913
112  11/3/95              219         3221      6.7991
113  11/6/95              209         2560      8.1641
114  11/7/95              163         2453      6.6449
115  11/8/95              191         2950      6.4746
116  11/9/95              176         2466      7.1371
117  11/10/95             218         2645      8.2420
118  11/13/95             158         2543      6.2131
119  11/14/95             139         2168      6.4114
120  11/15/95             193         2288      8.4353
121  11/16/95             205         3781      5.4218
122  11/17/95             196         2871      6.8269
123  11/20/95             297         3475      8.5468
124  11/21/95             176         4316      4.0778
125  11/22/95             224         4263      5.2545
126  11/24/95             234         2940      7.9592
127  11/27/95              79         1115      7.0852
128  11/28/95             207         2974      6.9603
129  11/29/95             198         3225      6.1395
130  11/30/95             186         2754      6.7538
131  12/1/95              275         3470      7.9251
132  12/4/95              292         3138      9.3053
133  12/5/95              186         2897      6.4204
134  12/6/95              187         3624      5.1600
135  12/7/95              271         2680     10.1119
136  12/8/95              289         2902      9.9586
137  12/11/95             245         2188     11.1974
138  12/12/95             157         2651      5.9223
139  12/13/95             177         2844      6.2236
140  12/14/95             172         2340      7.3504
141  12/15/95             405         1897     21.3495
142  12/18/95             350         3272     10.6968
143  12/19/95             317         3507      9.0391
144  12/20/95             244         3399      7.1786
145  12/21/95             314         3535      8.8826
146  12/22/95             188         2733      6.8789
147  12/26/95             180         2187      8.2305
148  12/27/95             143         1450      9.8621
149  12/28/95             166         1429     11.6165
150  12/29/95             165         2828      5.8345
151  1/2/96               196         2860      6.8531
152  1/3/96               225         3179      7.0777
153  1/4/96               408         3763     10.8424
154  1/5/96               252         3910      6.4450
155  1/8/96               236         3057      7.7200
156  1/9/96               154         1465     10.5119
157  1/10/96              306         2869     10.6657
158  1/11/96              280         4672      5.9932
159  1/12/96              253         3635      6.9601
160  1/15/96              216         4154      5.1998
161  1/16/96              150         2155      6.9606
162  1/17/96              200         2844      7.0323
163  1/18/96              208         3127      6.6517
164  1/19/96              212         3166      6.6961
165  1/22/96              240         3161      7.5925
166  1/23/96              237         4193      5.6523
167  1/24/96              184         3563      5.1642
168  1/25/96              182         2877      6.3260
169  1/26/96              190         2948      6.4450
170  1/29/96              182         2683      6.7835
171  1/30/96              201         3536      5.6844
172  1/31/96              185         3913      4.7278
173  2/1/96               278         4042      6.8778
174  2/2/96               254         3496      7.2654
175  2/5/96               259         3474      7.4554
176  2/6/96               183         3564      5.1347
177  2/7/96               210         3811      5.5104
178  2/8/96               270         3636      7.4257
179  2/9/96               202         3061      6.5992
180  2/12/96              229         3884      5.8960
181  2/13/96              239         3620      6.6022
182  2/14/96              236         3324      7.0999
183  2/15/96              309         3288      9.3978
184  2/16/96              237         3871      6.1224
185  2/20/96              211         3476      6.0702
186  2/21/96              218         4515      4.8283
187  2/22/96              201         3687      5.4516
188  2/23/96              168         4009      4.1906
189  2/26/96              270         3080      8.7662
190  2/27/96              194         3159      6.1412
191  2/28/96              153         2667      5.7368
192  2/29/96              293         4116      7.1186
193  3/1/96               199         3364      5.9156
194  3/4/96               262         3023      8.6669
195  3/5/96               266         3097      8.5890
196  3/6/96               160         2741      5.8373
197  3/7/96               174         2831      6.1462
198  3/8/96               221         2170     10.1843
199  3/11/96             1298         4555     28.4962
200  3/12/96              249         3564      6.9865
201  3/13/96              217         3304      6.5678
202  3/14/96              180         2333      7.7154
203  3/15/96              195         3088      6.3148
204  3/18/96              187         2534      7.3796
205  3/19/96              186         3965      4.6910
206  3/20/96              215         3231      6.6543
207  3/21/96              192         2824      6.7989
208  3/22/96              164         3059      5.3612
209  3/25/96              191         2702      7.0688
210  3/26/96              140         2366      5.9172
211  3/27/96              164         3050      5.3770
212  3/28/96              143         2788      5.1291
213  3/29/96              148         2685      5.5121
214  4/1/96               235         3371      6.9712
215  4/2/96               239         3380      7.0710
216  4/3/96               237         2796      8.4764
217  4/4/96               187         2740      6.8248
218  4/8/96               185         2132      8.6773
219  4/9/96               162         1057     15.3264
220  4/10/96              249         2642      9.4247
221  4/11/96              484         4085     11.8482
222  4/12/96              232         4219      5.4989
223  4/15/96              250         5182      4.8244
224  4/16/96              212         2868      7.3919
225  4/17/96              203         2672      7.5973
226  4/18/96              200         3362      5.9488
227  4/19/96              245         3080      7.9545
228  4/22/96              194         2572      7.5428
229  4/23/96              204         3901      5.2294
230  4/24/96              160         3452      4.6350
231  4/25/96              216         2762      7.8204
232  4/26/96              256         3106      8.2421
233  4/29/96              196         2657      7.3767
234  4/30/96              166         2453      6.7672
235  5/1/96               245         2954      8.2938
236  5/2/96               221         1748     12.6430
237  5/3/96               212         3141      6.7494
238  5/6/96               215         2937      7.3204
239  5/7/96               184         2917      6.3079
240  5/8/96               183         2974      6.1533
241  5/9/96               191         4556      4.1923
242  5/10/96              198         3158      6.2698
243  5/13/96              258         3728      6.9206
244  5/14/96              210         3139      6.6900
245  5/15/96              157         3325      4.7218
246  5/16/96              218         3475      6.2734
247  5/17/96              228         3110      7.3312
248  5/20/96              143         2158      6.6265
249  5/21/96              162         3567      4.5416
250  5/22/96              219         3415      6.4129
251  5/23/96              197         4194      4.6972
252  5/24/96              212         3741      5.6669
253  5/28/96              241         5383      4.4771
254  5/29/96              188         2665      7.0544
255  5/30/96              151         2565      5.8869
256  5/31/96              116         2706      4.2868

© 2005, Jeffrey S. Simonoff

We can see that there are more than a hundred trade breaks on all but one day (11/27/95, with 79), and on one bad day (3/11/96) there were over 1000! A histogram of the number of trade breaks shows a clear long right tail:

[Figure: Histogram of Total breaks. Horizontal axis: Total breaks; vertical axis: Frequency.]

If we think about the process underlying trade breaks, it makes more sense to examine the break rate rather than the total number of breaks. One way to think about the data is as a set of binomial random variables (for each day, the number of trade breaks among that day's total trades), and we are interested in the pattern of the daily trade break probabilities. A histogram of the break rate also shows a long right tail:

[Figure: Histogram of Break rate. Horizontal axis: Break rate; vertical axis: Frequency.]
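This minimal Python sketch makes the binomial view concrete. It is not part of the original Minitab analysis. It recomputes a day's break rate from its two counts and attaches the usual binomial standard error, √(p(1 − p)/n), expressed in percentage points. The function names are illustrative. The binomial model assumes each trade is an independent trial with a common break probability; the data do not guarantee that.

```python
import math


def break_rate(breaks: int, total: int) -> float:
    """Percent of a day's trades that resulted in a break."""
    return 100.0 * breaks / total


def binomial_se(breaks: int, total: int) -> float:
    """Binomial standard error of the break rate, in percentage points,
    treating the day's trades as independent trials with a common
    break probability (an assumption of the binomial model)."""
    p = breaks / total
    return 100.0 * math.sqrt(p * (1.0 - p) / total)


# Row 1 of the data table: 6/1/95, 208 breaks out of 2769 trades.
rate = break_rate(208, 2769)
se = binomial_se(208, 2769)
print(f"rate = {rate:.4f}%, SE = {se:.2f} points")  # rate = 7.5117%, SE = 0.50 points
```

Working in percentage points keeps the result on the same scale as the Break rate column.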

These data constitute a time series, so examining a time series plot of the rates is sensible. There doesn’t seem to be any particular pattern to the break rates, with only some days with very high break rates showing up as spikes in the plot:

[Figure: Time series plot of Break rate. Horizontal axis: Date/Time; vertical axis: Break rate.]

What factors might be related to break rates? The trading execution process is obviously one susceptible to error because of stress, carelessness, and a general lack of attention to detail. We don’t have data on the individual trades to look at, but we do know something else that could be related to the general attention level of workers — the day of the week. Here are side–by–side boxplots of the break rate separated by whether the day of the week is Monday or not:

[Figure: Side–by–side boxplots of Break rate for Not Monday and Monday.]
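The Monday indicator used in these comparisons can be derived from the Trade date column alone. Here is a minimal Python sketch. It assumes the dates are stored as strings in the table's m/d/yy form, and the `is_monday` helper is illustrative.

```python
from datetime import datetime


def is_monday(trade_date: str) -> bool:
    """True if a date written like the Trade date column (m/d/yy) is a Monday."""
    # %m and %d also accept unpadded values such as "6/5/95";
    # %y reads the two-digit years 95 and 96 as 1995 and 1996.
    return datetime.strptime(trade_date, "%m/%d/%y").weekday() == 0


for d in ["6/1/95", "6/2/95", "6/5/95", "3/11/96"]:
    print(d, "Monday" if is_monday(d) else "Not Monday")
```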

The long right tail makes it difficult to see things clearly, but there is evidence that the break rate is higher on Mondays. The less charitable among us might call this a “hangover effect.” Is there a significant difference in break rates for Mondays versus other days of the week? A two–sample t–test would help answer this question:

Two Sample T-Test and Confidence Interval

Two sample T for Break rate

Monday        N   Mean  StDev  SE Mean
Not Monday  207   7.30   2.36     0.16
Monday       47   8.61   4.15     0.60

95% CI for mu (Not Monday) - mu (Monday ): ( -2.19, -0.43)
T-Test mu (Not Monday) = mu (Monday ) (vs not =): T= -2.93 P=0.0037 DF= 252
Both use Pooled StDev = 2.77

* NOTE * N missing = 2
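The pooled t statistic can be recovered from the summary statistics alone. The Python sketch below is an illustration, not Minitab's code. It uses the rounded means and standard deviations shown in the output, so its T of about −2.92 differs slightly from Minitab's −2.93, which comes from the unrounded data. It also reproduces the arithmetic behind the practical interpretation that follows.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Pooled-variance two-sample t statistic for mean1 - mean2, computed
    from group summary statistics. Returns (t, df, pooled_sd)."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / se, df, pooled_sd


# Rounded summary statistics from the output: Not Monday first, Monday second.
t, df, sp = pooled_t(7.30, 2.36, 207, 8.61, 4.15, 47)
print(f"T = {t:.2f}, DF = {df}, pooled StDev = {sp:.2f}")  # T = -2.92, DF = 252, pooled StDev = 2.77

# Practical size of the Monday effect.
diff = 8.61 - 7.30                # percentage points
extra_breaks = diff / 100 * 3000  # on a day with about 3000 trades
relative = 8.61 / 7.30 - 1        # proportional increase on Mondays
print(f"about {extra_breaks:.0f} extra breaks per day, {relative:.0%} higher rate")  # about 39 extra breaks per day, 18% higher rate
```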

The t–test is significant at a .0037 level, indicating a highly significant difference in break rate. The break rate on Mondays is 1.31 percentage points higher than on the other days of the week. With an average of 3000 trades per day, that translates to an average of about 40 more trade breaks on Monday than on other days of the week. Another way to look at it is that the break rate is 18% higher on Mondays than it is the rest of the week (8.61/7.30 ≈ 1.18). The confidence interval of (−2.19, −0.43) for the true average difference in break rates does not include zero, which reinforces the significant difference by day.

The t–test given above has three important assumptions:

1. The data have to constitute a random sample from some population. Since these are time series data, that could be a problem, but it turns out that the time series structure is not important here.
2. The populations of break rates for the two groups (Mondays and non–Mondays) must each be (roughly) Gaussian.
3. The variances must be the same in the two populations.

Are these assumptions valid here? MINITAB provides two tests of homogeneity of variance, along with confidence intervals for the standard deviations. The hypotheses being tested are H0: σ1² = σ2² versus Ha: σ1² ≠ σ2², where σ1² (σ2²) is the variance for group 1 (2). Thus, a small tail probability indicates nonconstant variance. If the confidence intervals for the two standard deviations do not overlap, that also indicates nonconstant variance. Here is the output:

Homogeneity of Variance

Response  Break ra
Factors   Monday
ConfLvl   95.0000

Bonferroni confidence intervals for standard deviations

  Lower     Sigma     Upper     N  Factor Levels
3.35797   4.14593   5.38924    47  Monday
2.12611   2.36156   2.65314   207  Not Monday

Bartlett’s Test (normal distribution)
Test Statistic: 29.179
P-Value       : 0.000

Levene’s Test (any continuous distribution)
Test Statistic: 1.159
P-Value       : 0.283
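Bartlett's statistic depends only on the group variances and sample sizes, so it can be recomputed from the Sigma and N columns of the output. The Python sketch below uses the standard textbook formula, not Minitab's code. Levene's test, by contrast, needs the individual observations, because it works with the absolute deviations of each observation from its group center.

```python
import math


def bartlett_statistic(sds, ns):
    """Bartlett's test statistic for equal variances, from group standard
    deviations and sample sizes (textbook formula). Under normality and
    equal variances it is approximately chi-squared with k - 1 df."""
    k = len(ns)
    big_n = sum(ns)
    variances = [s**2 for s in sds]
    pooled = sum((n - 1) * v for n, v in zip(ns, variances)) / (big_n - k)
    numerator = (big_n - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(ns, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (big_n - k)) / (3 * (k - 1))
    return numerator / correction


# Sigma and N columns of the output: Monday (47) and Not Monday (207).
print(round(bartlett_statistic([4.14593, 2.36156], [47, 207]), 2))  # 29.18
```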

The results are mixed: the confidence intervals and Bartlett’s test indicate heteroscedasticity, while Levene’s test does not. We could consider using the t–test that does not assume constant variance:

Two Sample T-Test and Confidence Interval

Two sample T for Break rate

Monday        N   Mean  StDev  SE Mean
Not Monday  207   7.30   2.36     0.16
Monday       47   8.61   4.15     0.60

95% CI for mu (Not Monday) - mu (Monday ): ( -2.57, -0.05)
T-Test mu (Not Monday) = mu (Monday ) (vs not =): T= -2.09 P=0.041 DF= 52

* NOTE * N missing = 2
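The unequal-variance (Welch) version of the test replaces the pooled standard error with √(s1²/n1 + s2²/n2). It uses the Welch–Satterthwaite approximation for the degrees of freedom. Here is a minimal Python sketch from the same rounded summary statistics. The approximate degrees of freedom come out near 52.9, while the output above shows the whole number 52.

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic without the equal-variance assumption, with
    Welch-Satterthwaite approximate degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of the two means
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


t, df = welch_t(7.30, 2.36, 207, 8.61, 4.15, 47)
print(f"T = {t:.2f}, DF = {df:.1f}")  # T = -2.09, DF = 52.9
```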

The test is still significant (at a .05 level), but considerably less so. There is also the matter of normality within groups. The following plot shows normal plots of the two variables, Monday break rate and non–Monday break rate, in the same display. This picture has the advantage of assessing normality while also illustrating the differences between the groups. The nonlinearity of both lines clearly demonstrates the long right tails in both variables. Note, however, that the points for Monday are consistently to the right of those for non–Mondays. This shows that the Monday break rate distribution is to the right of (that is, stochastically larger than) the non–Monday distribution.

[Figure: Normal Probability Plot for the Not Monday and Monday break rates. Horizontal axis: Data; vertical axis: Percent.]

The nonnormality of the break rates (within groups) means that the results of the two t–tests cannot be trusted. What we need are tests that do not require the assumptions of normality and constant variance; these are called nonparametric tests. These tests are valid when the usual t–test assumptions do not hold. However, when the normality assumption does hold, they are less likely than the t–tests to identify a genuine effect.

The first of these tests is (Mood’s) median test. This test calculates the median of the joint sample of both groups. If there were no difference in location between the two groups, we would expect about half of the values in each sample to be above the joint median and about half to be below it. A pattern where one sample has most of its values above the joint median, while the other has most below it, indicates a difference in location. A chi–squared (χ²) test is used to assess the significance of the pattern. Strictly speaking, this test assesses whether the distribution of break rates for Mondays is different from that for non–Mondays, without specifying what that distribution might be. From a practical point of view, the test is best able to detect shifts (differences) in the medians of the two groups.

Mood Median Test

Mood median test for Break ra
Chi-Square = 11.51    DF = 1    P = 0.001

Monday       N   Median   Q3-Q1
Monday      34     7.60    1.76
Not Mond    93     6.83    2.09

[Character plot of the individual 95.0% CIs for the two group medians, on a scale from 7.00 to 8.00.]

Overall median = 7.02
A 95.0% CI for median(Monday) - median(Not Mond): (0.41,1.36)
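Mood's chi-squared statistic can be reproduced from the counts in the output. The sketch takes those counts, as the discussion that follows does, to be the numbers of break rates above the overall median: 34 of 47 Mondays and 93 of 207 non–Mondays. This Python sketch computes a Pearson chi-squared statistic on that 2 × 2 table without a continuity correction, which gives the reported 11.51.

```python
def mood_chi_square(above1, n1, above2, n2):
    """Pearson chi-squared statistic (no continuity correction) for the 2 x 2
    table of counts above / not above the joint median in two groups."""
    observed = [[above1, n1 - above1], [above2, n2 - above2]]
    total = n1 + n2
    col_totals = [above1 + above2, total - above1 - above2]
    stat = 0.0
    for row, row_total in zip(observed, (n1, n2)):
        for count, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            stat += (count - expected) ** 2 / expected
    return stat


# 34 of 47 Mondays and 93 of 207 non-Mondays above the overall median.
print(round(mood_chi_square(34, 47, 93, 207), 2))  # 11.51
```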

The overall (joint) median break rate is 7.02. While 72% of the Mondays (34 of 47) had break rates higher than this, only 45% of the non–Mondays (93 of 207) did. This difference is highly significant, with a tail probability of p = .001. The confidence interval for the difference in median break rates does not include zero, which reinforces this result. It is interesting to note that the confidence interval for the difference in medians is considerably narrower than those for the difference in means that are part of the t–test output. The nice properties of the mean depend on normality, and these long–tailed data are better described using medians. Note also that the group medians are noticeably smaller than the group means, since they are less sensitive to the long right tails.

A second nonparametric test is the Wilcoxon (Mann–Whitney) rank sum test. This test is similar to a t–test, except that it uses the ranks of the values in the joint sample instead of the actual data values. Generally speaking, this test is more powerful than the median test.


Mann-Whitney Confidence Interval and Test

Not Mond   N = 207   Median = 6.8335
Monday b   N =  47   Median = 7.5977
Point estimate for ETA1-ETA2 is -0.8508
95.0 Percent CI for ETA1-ETA2 is (-1.3461,-0.3761)
W = 24838.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0006
The test is significant at 0.0006 (adjusted for ties)
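The W statistic in the output is the sum of the ranks of the first sample (Not Mond) within the joint sample. The ETA1-ETA2 point estimate is the median of all pairwise differences. Here is a minimal Python sketch of both quantities. The small lists in the usage lines are made-up illustrations, not the trade data. Tied values receive the average of the ranks they span.

```python
from statistics import median


def rank_sum(x, y):
    """Wilcoxon rank sum W: the sum of the ranks of the x values within the
    joint sample, with tied values given the average of their ranks."""
    joint = sorted(x + y)
    avg_rank = {}
    i = 0
    while i < len(joint):
        j = i
        while j < len(joint) and joint[j] == joint[i]:
            j += 1
        # Positions i..j-1 hold equal values and carry ranks i+1..j.
        avg_rank[joint[i]] = (i + 1 + j) / 2
        i = j
    return sum(avg_rank[v] for v in x)


def hodges_lehmann(x, y):
    """Median of all pairwise differences x_i - y_j."""
    return median([a - b for a in x for b in y])


# Made-up toy samples, not the trade data.
print(rank_sum([1, 2, 2], [2, 5]))        # 7.0
print(hodges_lehmann([1, 2, 3], [4, 5]))  # -2.5
```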

This test confirms what we saw before, with a tail probability p = .0006. The point estimate for ETA1-ETA2 refers to the median of all of the pairwise differences between observations in the first sample and ones in the second sample, and is thus of limited (no?) practical importance. The long right tails of the break rate variable suggest a different way to attack this problem: by using the logged value of the break rate as the variable of interest. Here are a histogram, a time series plot, side–by–side boxplots, and normal plots separated by day of the week, for the logged break rate data. The distributions are better behaved, and the difference in (logged) break rate by day of the week is easier to see.

[Figure: Histogram of Logged break rate. Horizontal axis: Logged break rate; vertical axis: Frequency.]

[Figure: Time series plot of Logged break rate. Horizontal axis: Date/Time; vertical axis: Logged break rate.]

[Figure: Side–by–side boxplots of Logged break rate for Not Monday and Monday.]

[Figure: Normal Probability Plot for the Not Monday and Monday logged break rates. Horizontal axis: Data; vertical axis: Percent.]

Here is a two–sample t–test for logged break rate:


Two Sample T-Test and Confidence Interval

Two sample T for Logged break rate

Monday        N    Mean   StDev  SE Mean
Not Monday  207   0.846   0.116   0.0081
Monday       47   0.908   0.137    0.020

95% CI for mu (Not Monday) - mu (Monday ): ( -0.0996, -0.023)
T-Test mu (Not Monday) = mu (Monday ) (vs not =): T= -3.17 P=0.0017 DF= 252
Both use Pooled StDev = 0.120

* NOTE * N missing = 2
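The logged values in this analysis are base-10 logarithms. For example, log10 of the largest break rate, 28.4962, is the 1.4548 maximum reported for Mondays. So raising 10 to a mean logged rate gives a geometric mean. Here is a minimal Python sketch of that back-transformation, applied to the group means in the output above.

```python
import math


def geometric_mean(values):
    """Geometric mean, computed as 10 raised to the mean of the base-10 logs."""
    logs = [math.log10(v) for v in values]
    return 10 ** (sum(logs) / len(logs))


print(geometric_mean([1, 100]))  # 10.0

# Group means of the logged break rates from the output.
mean_log_not_monday, mean_log_monday = 0.846, 0.908

print(round(10**mean_log_not_monday, 2))  # 7.01
print(round(10**mean_log_monday, 2))      # 8.09
print(round(10 ** (mean_log_monday - mean_log_not_monday), 3))  # 1.153
```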

The logs here are base 10, so the geometric means for each group are 10^0.846 = 7.01 for non–Mondays and 10^0.908 = 8.09 for Mondays. These are highly statistically significantly different (p = .0017). The difference in means has an interesting interpretation. Since

log(Monday) − log(Not Monday) = log(Monday / Not Monday),

the difference in mean log break rates (.908 − .846 = .062) is an estimate of the average of the logged ratio of Monday to non–Monday break rates. That is, 10^0.062 = 1.153 is an estimate of the typical ratio of Monday to non–Monday break rates. This implies a 15.3% higher break rate on Mondays compared to other days of the week. The corresponding figure from the unlogged data is 18%, which is inflated by the very large values. Using logs has apparently corrected the heteroscedasticity in the data:

Homogeneity of Variance

Response  Logged b
Factors   Monday
ConfLvl   95.0000

Bonferroni confidence intervals for standard deviations

   Lower      Sigma      Upper     N  Factor Levels
0.110794   0.136792   0.177814    47  Monday
0.104275   0.115823   0.130123   207  Not Monday

Bartlett’s Test (normal distribution)
Test Statistic: 2.214
P-Value       : 0.137

Levene’s Test (any continuous distribution)
Test Statistic: 0.011
P-Value       : 0.917

What about the nonparametric tests? Here are the results:

Mood Median Test

Mood median test for Logged b
Chi-Square = 11.51    DF = 1    P = 0.001

Monday       N   Median    Q3-Q1
Monday      34   0.8807   0.0982
Not Mond    93   0.8346   0.1308

[Character plot of the individual 95.0% CIs for the two group medians, on a scale from 0.825 to 0.900.]

Overall median = 0.8463
A 95.0% CI for median(Monday) - median(Not Mond): (0.0251,0.0796)

Mann-Whitney Confidence Interval and Test

Not Mond   N = 207   Median = 0.83465
Monday l   N =  47   Median = 0.88068
Point estimate for ETA1-ETA2 is -0.05262
95.0 Percent CI for ETA1-ETA2 is (-0.08162,-0.02259)
W = 24838.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0006
The test is significant at 0.0006 (adjusted for ties)
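The rank-based tests see only the ordering of the observations, so any increasing transformation leaves them unchanged. This quick Python check confirms that on the first five break rates in the data table. The `ranks` helper is illustrative and, for brevity, assumes there are no ties.

```python
import math


def ranks(values):
    """1-based rank of each value in the list (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


# First five break rates in the data table (6/1/95 through 6/7/95).
rates = [7.5117, 5.5615, 24.8516, 8.3578, 5.5786]
logged = [math.log10(r) for r in rates]

print(ranks(rates))   # [3, 1, 5, 4, 2]
print(ranks(logged))  # [3, 1, 5, 4, 2]
```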

If you compare these results to those using the unlogged data, you’ll notice that the test statistics and tail probabilities are identical, as they must be. Only the medians and confidence intervals change, because they are now reported on the log scale. The nonparametric tests are based on the ranks of the data, and a monotone transformation like logging does not affect those ranks at all.

Finally, we might wonder about the split of Monday versus non–Monday here. It would seem to make more sense to look at all five days of the week separately, and this turns out to be the case. This leads to the generalization of the two–sample t–test known as analysis of variance (ANOVA) models. Here are descriptive statistics for logged break rate separated by day of the week:

Descriptive Statistics

Variable: Logged b

Day of w     N   N*    Mean  Median  Tr Mean
Monday      47    0  0.9079  0.8807   0.8936
Tuesday     51    0  0.8264  0.8301   0.8228
Wednesda    52    0  0.8299  0.8203   0.8276
Thursday    52    0  0.8677  0.8438   0.8632
Friday      52    0  0.8615  0.8464   0.8506
*            0    2       *       *        *

Day of w     StDev  SE Mean     Min     Max      Q1      Q3
Monday      0.1368   0.0200  0.6834  1.4548  0.8401  0.9384
Tuesday     0.1120   0.0157  0.5651  1.1854  0.7721  0.8727
Wednesda    0.0997   0.0138  0.6660  1.0310  0.7425  0.9168
Thursday    0.1196   0.0166  0.6224  1.1901  0.7980  0.9074
Friday      0.1276   0.0177  0.6223  1.3444  0.7895  0.9160
*                *        *       *       *       *       *

Monday is clearly the worst day of the week, with Tuesdays and Wednesdays best. Thursdays and Fridays start to deteriorate also, perhaps because people start to anticipate the weekend. Overall, if you want to be sure that your trade is executed properly, it looks like the middle of the week is the way to go!
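The descriptive statistics above are enough to compute the one-way ANOVA F statistic for comparing the five day-of-week means. The group sizes, means, and standard deviations determine the between-group and within-group sums of squares. Below is a Python sketch of that calculation. It is not output from the original analysis, and because the table's summaries are rounded, the result is approximate. With only two groups, this F equals the square of the pooled two-sample t statistic.

```python
def anova_f(ns, means, sds):
    """One-way ANOVA F statistic from group sizes, means and standard
    deviations. Returns (F, between-groups df, within-groups df)."""
    k, big_n = len(ns), sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / big_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * s**2 for n, s in zip(ns, sds))
    df_between, df_within = k - 1, big_n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Logged break rate by day of the week, Monday through Friday, from the table.
f, df1, df2 = anova_f(
    [47, 51, 52, 52, 52],
    [0.9079, 0.8264, 0.8299, 0.8677, 0.8615],
    [0.1368, 0.1120, 0.0997, 0.1196, 0.1276],
)
print(f"F = {f:.2f} on ({df1}, {df2}) degrees of freedom")
```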


Minitab commands

Minitab has built into it the capability of identifying a variable as a “Date/Time” variable. When the data are stored into the saved file, however, they are stored in a numeric form. To convert back to the Date/Time format, click on Data → Change Data Type → Numeric to Date/Time. Enter the date variable (here Trade Date) under both Change numeric column: and Store date/time column in:.

To construct a time series plot of a series, click on Stat → Time Series → Time Series Plot. Enter the variable name under Y. When there is a date/time variable that defines the time points, click the radio button next to Date/Time Stamp: and enter the date/time variable in the box.

Two–sample t–tests are obtained by clicking on Stat → Basic Statistics → 2-Sample t. There are two possible forms for the data:

- The stacked form: the variable is in one column, and a second column contains codes for the two groups.
- The unstacked form: the variable is separated into two columns, one for each group.

If the data are in stacked form, enter the variable name under Samples:, and the variable that defines the groups under Subscripts:. The subscript variable can be either numerical or text. If the data are in unstacked form, click the radio button next to Samples in different columns, and enter the two variables in the boxes next to First: and Second:, respectively. If you want the t–test that assumes equal variances in the two groups, click the box next to Assume equal variances.

You can convert from stacked to unstacked form, and vice versa. To convert from stacked to unstacked, click on Data → Unstack Columns. Enter the variable(s) to be split up in the box next to Unstack the data in:. Enter the variable that defines the groups in the box next to Using subscripts in:. You can then choose where to put the new variables, and whether Minitab should name them for you. To convert from unstacked to stacked, click on Data → Stack → Columns. Enter the variables to be combined under Stack the following columns:. You can then choose where to put the stacked variable and associated variable of subscripts, and whether you want the subscripts to be the names of the variables. If you uncheck that box, the subscripts are the integers 1, 2, etc.

Tests of homogeneity of variance, and confidence intervals for standard deviations, are obtained by clicking on Stat → ANOVA → Test for Equal Variances. Enter the variable of interest in the box next to Response:, and the variable that defines the groups under Factors:.

Normal plots of more than one variable on the same plot are obtained by clicking on Graph → Probability Plot, and entering the variable names under Variables:. Note that in this two–group situation, you can only get this picture by using the data in unstacked form.

The median test is obtained by clicking on Stat → Nonparametrics → Mood’s Median Test. The data must be in stacked form. Enter the variable of interest in the box next to Response:, and the variable that defines the groups in the box next to Factor:.

To obtain a Mann–Whitney test, click on Stat → Nonparametrics → Mann-Whitney. The data must be in unstacked form. Enter the two variables that have the observations for the two groups in the boxes next to First Sample: and Second Sample:, respectively.
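The stacked and unstacked layouts described above are two arrangements of the same numbers. For readers working outside Minitab, here is a minimal Python sketch of both conversions using plain lists and dictionaries. The function names are illustrative, not Minitab commands.

```python
from collections import defaultdict


def unstack(values, groups):
    """Stacked form (a value column plus a column of group codes) to
    unstacked form (one list of values per group), keeping the order of
    the values within each group."""
    columns = defaultdict(list)
    for value, group in zip(values, groups):
        columns[group].append(value)
    return dict(columns)


def stack(columns):
    """Unstacked form back to stacked form; each group's name becomes the
    subscript (group code) attached to its values."""
    values, groups = [], []
    for name, column in columns.items():
        values.extend(column)
        groups.extend([name] * len(column))
    return values, groups


# Break rates for 6/1/95, 6/2/95 and 6/5/95 from the data table.
rates = [7.5117, 5.5615, 24.8516]
days = ["Not Monday", "Not Monday", "Monday"]

wide = unstack(rates, days)
print(wide)  # {'Not Monday': [7.5117, 5.5615], 'Monday': [24.8516]}
print(stack(wide))
```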
