[section:st_eg Student's t Distribution Examples]

[section:tut_mean_intervals Calculating confidence intervals on the mean with the Students-t distribution]

Let's say you have a sample mean: you may wish to know what confidence intervals
you can place on that mean. Colloquially: "I want an interval that I can be
P% sure contains the true mean". (On a technical point, note that
the interval either contains the true mean or it does not: the
meaning of the confidence level is subtly
different from this colloquialism. More background information can be found on the
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm NIST site]).

The formula for the interval can be expressed as:

[equation dist_tutorial4]

Where, ['Y[sub s]] is the sample mean, /s/ is the sample standard deviation,
/N/ is the sample size, /[alpha]/ is the desired significance level and
['t[sub ([alpha]/2,N-1)]] is the upper critical value of the Students-t
distribution with /N-1/ degrees of freedom.

[note
The quantity [alpha][space] is the maximum acceptable risk of falsely rejecting
the null-hypothesis. The smaller the value of [alpha] the greater the
strength of the test.

The confidence level of the test is defined as 1 - [alpha], and is often expressed
as a percentage. So for example a significance level of 0.05 is equivalent
to a 95% confidence level. Refer to
[@http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm
"What are confidence intervals?"] in __handbook for more information.
] [/ Note]

[note
The usual assumptions of
[@http://en.wikipedia.org/wiki/Independent_and_identically-distributed_random_variables independent and identically distributed (i.i.d.)]
variables and [@http://en.wikipedia.org/wiki/Normal_distribution normal distribution]
of course apply here, as they do in other examples.
]

From the formula, it should be clear that:

* The width of the confidence interval decreases as the sample size increases.
* The width increases as the standard deviation increases.
* The width increases as the ['confidence level increases] (0.5 towards 0.99999 - stronger).
* The width increases as the ['significance level decreases] (0.5 towards 0.00000...01 - stronger).

The following example code is taken from the example program
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp].

We'll begin by defining a procedure to calculate intervals for
various confidence levels; the procedure will print these out
as a table:

   // Needed includes:
   #include <boost/math/distributions/students_t.hpp>
   #include <iostream>
   #include <iomanip>
   // Bring everything into global namespace for ease of use:
   using namespace boost::math;
   using namespace std;

   void confidence_limits_on_mean(
      double Sm,    // Sm = Sample Mean.
      double Sd,    // Sd = Sample Standard Deviation.
      unsigned Sn)  // Sn = Sample Size.
   {
      using namespace std;
      using namespace boost::math;

      // Print out general info:
      cout <<
         "__________________________________\n"
         "2-Sided Confidence Limits For Mean\n"
         "__________________________________\n\n";
      cout << setprecision(7);
      cout << setw(40) << left << "Number of Observations" << "= " << Sn << "\n";
      cout << setw(40) << left << "Mean" << "= " << Sm << "\n";
      cout << setw(40) << left << "Standard Deviation" << "= " << Sd << "\n";

We'll define a table of significance/risk levels for which we'll compute intervals:

      double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };

Note that these are the complements of the confidence/probability levels
(0.5, 0.75, 0.9 ... 0.99999).

Next we'll declare the distribution object we'll need; note that
the /degrees of freedom/ parameter is the sample size less one:

      students_t dist(Sn - 1);

Most of what follows in the program is pretty printing, so let's focus
on the calculation of the interval. First we need the t-statistic,
computed using the /quantile/ function and our significance level. Note
that since the significance levels are the complement of the probability,
we have to wrap the arguments in a call to /complement(...)/:

      double T = quantile(complement(dist, alpha[i] / 2));

Note that alpha was divided by two, since we'll be calculating
both the upper and lower bounds: had we been interested in a single-sided
interval then we would have omitted this step.

Now to complete the picture, we'll get the (one-sided) width of the
interval from the t-statistic
by multiplying by the standard deviation, and dividing by the square
root of the sample size:

      double w = T * Sd / sqrt(double(Sn));

The two-sided interval is then the sample mean plus and minus this width.

And apart from some more pretty-printing that completes the procedure.

Let's take a look at some sample output, first using the
[@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm
Heat flow data] from the NIST site. The data set was collected
by Bob Zarr of NIST in January, 1990 from a heat flow meter
calibration and stability analysis.
The corresponding dataplot
output for this test can be found in
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
section 3.5.2] of the __handbook.

[pre'''
__________________________________
2-Sided Confidence Limits For Mean
__________________________________

Number of Observations        =  195
Mean                          =  9.26146
Standard Deviation            =  0.02278881


_______________________________________________________________________
Confidence        T            Interval         Lower           Upper
 Value (%)      Value           Width           Limit           Limit
_______________________________________________________________________
    50.000      0.676        1.103e-003       9.26036         9.26256
    75.000      1.154        1.883e-003       9.25958         9.26334
    90.000      1.653        2.697e-003       9.25876         9.26416
    95.000      1.972        3.219e-003       9.25824         9.26468
    99.000      2.601        4.245e-003       9.25721         9.26571
    99.900      3.341        5.453e-003       9.25601         9.26691
    99.990      3.973        6.484e-003       9.25498         9.26794
    99.999      4.537        7.404e-003       9.25406         9.26886
''']

As you can see, the large sample size (195) and small standard deviation (0.023)
have combined to give very small intervals; indeed, we can be
very confident that the true mean is 9.2.

For comparison, the next example data output is taken from
['P.K. Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
The values result from the determination of mercury by cold-vapour
atomic absorption.

[pre'''
__________________________________
2-Sided Confidence Limits For Mean
__________________________________

Number of Observations        =  3
Mean                          =  37.8000000
Standard Deviation            =  0.9643650


_______________________________________________________________________
Confidence        T            Interval         Lower           Upper
 Value (%)      Value           Width           Limit           Limit
_______________________________________________________________________
    50.000      0.816           0.455         37.34539        38.25461
    75.000      1.604           0.893         36.90717        38.69283
    90.000      2.920           1.626         36.17422        39.42578
    95.000      4.303           2.396         35.40438        40.19562
    99.000      9.925           5.526         32.27408        43.32592
    99.900     31.599          17.594         20.20639        55.39361
    99.990     99.992          55.673        -17.87346        93.47346
    99.999    316.225         176.067       -138.26683       213.86683
''']

This time the fact that there are only three measurements leads to
much wider intervals; indeed, the intervals are so wide that it's hard
to be very confident in the location of the mean.

[endsect]

[section:tut_mean_test Testing a sample mean for difference from a "true" mean]

When calibrating or comparing a scientific instrument or measurement method of some kind,
we want to be able to answer the question "Does an observed sample mean differ from the
"true" mean in any significant way?". If it does, then we have evidence of
a systematic difference. This question can be answered with a Students-t test:
more information can be found
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
on the NIST site].

Of course, the assignment of "true" to one mean may be quite arbitrary;
often this is simply a "traditional" method of measurement.

The following example code is taken from the example program
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp].

We'll begin by defining a procedure to determine which of the
possible hypotheses are rejected or not-rejected
at a given significance level:

[note
Non-statisticians might say 'not-rejected' means 'accepted' (often of the
null-hypothesis), implying, wrongly, that there really *IS* no difference,
but statisticians eschew this to avoid implying that there is positive evidence of 'no difference'.
'Not-rejected' here means there is *no evidence* of difference, but there still might well be a difference.
For example, see [@http://en.wikipedia.org/wiki/Argument_from_ignorance argument from ignorance] and
[@http://www.bmj.com/cgi/content/full/311/7003/485 Absence of evidence does not constitute evidence of absence.]
] [/ note]


   // Needed includes:
   #include <boost/math/distributions/students_t.hpp>
   #include <iostream>
   #include <iomanip>
   // Bring everything into global namespace for ease of use:
   using namespace boost::math;
   using namespace std;

   void single_sample_t_test(double M, double Sm, double Sd, unsigned Sn, double alpha)
   {
      //
      // M = true mean.
      // Sm = Sample Mean.
      // Sd = Sample Standard Deviation.
      // Sn = Sample Size.
      // alpha = Significance Level.

Most of the procedure is pretty-printing, so let's just focus on the
calculation; we begin by calculating the t-statistic:

      // Difference in means:
      double diff = Sm - M;
      // Degrees of freedom:
      unsigned v = Sn - 1;
      // t-statistic:
      double t_stat = diff * sqrt(double(Sn)) / Sd;

Finally calculate the probability from the t-statistic. If we're interested
in simply whether there is a difference (either less or greater) or not,
we don't care about the sign of the t-statistic,
and we take the complement of the probability for comparison
to the significance level:

      students_t dist(v);
      double q = cdf(complement(dist, fabs(t_stat)));

The procedure then prints out the results of the various tests
that can be done; these
can be summarised in the following table:

[table
[[Hypothesis][Test]]
[[The Null-hypothesis: there is
*no difference* in means]
[Reject if complement of CDF for |t| < significance level / 2:

`cdf(complement(dist, fabs(t))) < alpha / 2`]]

[[The Alternative-hypothesis: there
*is difference* in means]
[Reject if complement of CDF for |t| > significance level / 2:

`cdf(complement(dist, fabs(t))) > alpha / 2`]]

[[The Alternative-hypothesis: the sample mean *is less* than
the true mean.]
[Reject if CDF of t > 1 - significance level:

`cdf(complement(dist, t)) < alpha`]]

[[The Alternative-hypothesis: the sample mean *is greater* than
the true mean.]
[Reject if CDF of t < significance level:

`cdf(dist, t) < alpha`]]
]

[note
Notice that the comparisons are against `alpha / 2` for a two-sided test
and against `alpha` for a one-sided test.]

Now that we have all the parts in place, let's take a look at some
sample output, first using the
[@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm
Heat flow data] from the NIST site. The data set was collected
by Bob Zarr of NIST in January, 1990 from a heat flow meter
calibration and stability analysis. The corresponding dataplot
output for this test can be found in
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
section 3.5.2] of the __handbook.

[pre
__________________________________
Student t test for a single sample
__________________________________

Number of Observations                        =  195
Sample Mean                                   =  9.26146
Sample Standard Deviation                     =  0.02279
Expected True Mean                            =  5.00000

Sample Mean - Expected Test Mean              =  4.26146
Degrees of Freedom                            =  194
T Statistic                                   =  2611.28380
Probability that difference is due to chance  =  0.000e+000

Results for Alternative Hypothesis and alpha  =  0.0500

Alternative Hypothesis        Conclusion
Mean != 5.000                 NOT REJECTED
Mean  < 5.000                 REJECTED
Mean  > 5.000                 NOT REJECTED
]

You will note the line that says the probability that the difference is
due to chance is zero. From a philosophical point of view, of course,
the probability can never reach zero. However, in this case the calculated
probability is smaller than the smallest representable double precision number,
hence the appearance of a zero here. Whatever its "true" value is, we know it
must be extraordinarily small, so the alternative hypothesis - that there is
a difference in means - is not rejected.

For comparison, the next example data output is taken from
['P.K. Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
The values result from the determination of mercury by cold-vapour
atomic absorption.

[pre
__________________________________
Student t test for a single sample
__________________________________

Number of Observations                        =  3
Sample Mean                                   =  37.80000
Sample Standard Deviation                     =  0.96437
Expected True Mean                            =  38.90000

Sample Mean - Expected Test Mean              =  -1.10000
Degrees of Freedom                            =  2
T Statistic                                   =  -1.97566
Probability that difference is due to chance  =  1.869e-001

Results for Alternative Hypothesis and alpha  =  0.0500

Alternative Hypothesis        Conclusion
Mean != 38.900                REJECTED
Mean  < 38.900                NOT REJECTED
Mean  > 38.900                NOT REJECTED
]

As you can see, the small number of measurements (3) has led to a large uncertainty
in the location of the true mean. So even though there appears to be a difference
between the sample mean and the expected true mean, we conclude that there
is no significant difference, and are unable to reject the null hypothesis.
However, if we were to lower the bar for acceptance down to alpha = 0.1
(a 90% confidence level) we see a different output:

[pre
__________________________________
Student t test for a single sample
__________________________________

Number of Observations                        =  3
Sample Mean                                   =  37.80000
Sample Standard Deviation                     =  0.96437
Expected True Mean                            =  38.90000

Sample Mean - Expected Test Mean              =  -1.10000
Degrees of Freedom                            =  2
T Statistic                                   =  -1.97566
Probability that difference is due to chance  =  1.869e-001

Results for Alternative Hypothesis and alpha  =  0.1000

Alternative Hypothesis        Conclusion
Mean != 38.900                REJECTED
Mean  < 38.900                NOT REJECTED
Mean  > 38.900                REJECTED
]

In this case, we really have a borderline result,
and more data (and/or more accurate data)
is needed for a more convincing conclusion.

[endsect]

[section:tut_mean_size Estimating how large a sample size would have to become
in order to give a significant Students-t test result with a single sample test]

Imagine you have conducted a Students-t test on a single sample in order
to check for systematic errors in your measurements. Imagine that the
result is borderline. At this point one might go off and collect more data,
but it might be prudent to first ask the question "How much more?".
The parameter estimators of the students_t_distribution class
can provide this information.

This section is based on the example code in
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp]
and we begin by defining a procedure that will print out a table of
estimated sample sizes for various confidence levels:

   // Needed includes:
   #include <boost/math/distributions/students_t.hpp>
   #include <iostream>
   #include <iomanip>
   // Bring everything into global namespace for ease of use:
   using namespace boost::math;
   using namespace std;

   void single_sample_find_df(
      double M,   // M = true mean.
      double Sm,  // Sm = Sample Mean.
      double Sd)  // Sd = Sample Standard Deviation.
   {

Next we define a table of significance levels:

      double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };

Printing out the table of sample sizes required for various confidence levels
begins with the table header:

      cout << "\n\n"
              "_______________________________________________________________\n"
              "Confidence       Estimated          Estimated\n"
              " Value (%)      Sample Size        Sample Size\n"
              "              (one sided test)   (two sided test)\n"
              "_______________________________________________________________\n";


And now the important part: the sample sizes required. Class
`students_t_distribution` has a static member function
`find_degrees_of_freedom` that will calculate how large
a sample size needs to be in order to give a definitive result.

The first argument is the difference between the means that you
wish to be able to detect; here it's the absolute value of the
difference between the sample mean and the true mean.

Then come two probability values: alpha and beta. Alpha is the
maximum acceptable risk of rejecting the null-hypothesis when it is
in fact true. Beta is the maximum acceptable risk of failing to reject
the null-hypothesis when in fact it is false.
Also note that for a two-sided test, alpha must be divided by 2.

The final parameter of the function is the standard deviation of the sample.

In this example, we assume that alpha and beta are the same, and call
`find_degrees_of_freedom` twice: once with alpha for a one-sided test,
and once with alpha/2 for a two-sided test.

      for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
      {
         // Confidence value:
         cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
         // calculate df for single sided test:
         double df = students_t::find_degrees_of_freedom(
            fabs(M - Sm), alpha[i], alpha[i], Sd);
         // convert to sample size:
         double size = ceil(df) + 1;
         // Print size:
         cout << fixed << setprecision(0) << setw(16) << right << size;
         // calculate df for two sided test:
         df = students_t::find_degrees_of_freedom(
            fabs(M - Sm), alpha[i]/2, alpha[i], Sd);
         // convert to sample size:
         size = ceil(df) + 1;
         // Print size:
         cout << fixed << setprecision(0) << setw(16) << right << size << endl;
      }
      cout << endl;
   }

Let's now look at some sample output using data taken from
['P.K. Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
The values result from the determination of mercury by cold-vapour
atomic absorption.

Only three measurements were made, and the Students-t test above
gave a borderline result, so this example
will show us how many samples would need to be collected:

[pre'''
_____________________________________________________________
Estimated sample sizes required for various confidence levels
_____________________________________________________________

True Mean                     =  38.90000
Sample Mean                   =  37.80000
Sample Standard Deviation     =  0.96437


_______________________________________________________________
Confidence       Estimated          Estimated
 Value (%)      Sample Size        Sample Size
              (one sided test)   (two sided test)
_______________________________________________________________
    75.000               3                  4
    90.000               7                  9
    95.000              11                 13
    99.000              20                 22
    99.900              35                 37
    99.990              50                 53
    99.999              66                 68
''']

So in this case, many more measurements would have had to be made;
for example at the 95% level, 13 measurements in total for a two-sided test.

[endsect]
[section:two_sample_students_t Comparing the means of two samples with the Students-t test]

Imagine that we have two samples, and we wish to determine whether
their means are different or not. This situation often arises when
determining whether a new process or treatment is better than an old one.

In this example, we'll be using the
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda3531.htm
Car Mileage sample data] from the
[@http://www.itl.nist.gov NIST website]. The data compares
miles per gallon of US cars with miles per gallon of Japanese cars.

The sample code is in
[@../../example/students_t_two_samples.cpp students_t_two_samples.cpp].

There are two ways in which this test can be conducted: we can assume
that the true standard deviations of the two samples are equal or not.
If the standard deviations are assumed to be equal, then the calculation
of the t-statistic is greatly simplified, so we'll examine that case first.
In real life we should verify whether this assumption is valid with an
F-test for equality of variances.

We begin by defining a procedure that will conduct our test assuming equal
variances:

   // Needed headers:
   #include <boost/math/distributions/students_t.hpp>
   #include <iostream>
   #include <iomanip>
   // Simplify usage:
   using namespace boost::math;
   using namespace std;

   void two_samples_t_test_equal_sd(
         double Sm1,    // Sm1 = Sample 1 Mean.
         double Sd1,    // Sd1 = Sample 1 Standard Deviation.
         unsigned Sn1,  // Sn1 = Sample 1 Size.
         double Sm2,    // Sm2 = Sample 2 Mean.
         double Sd2,    // Sd2 = Sample 2 Standard Deviation.
         unsigned Sn2,  // Sn2 = Sample 2 Size.
         double alpha)  // alpha = Significance Level.
   {


Our procedure will begin by calculating the t-statistic; assuming
equal variances, the needed formulae are:

[equation dist_tutorial1]

where Sp is the "pooled" standard deviation of the two samples,
and /v/ is the number of degrees of freedom of the two combined
samples. We can now write the code to calculate the t-statistic:

      // Degrees of freedom:
      double v = Sn1 + Sn2 - 2;
      cout << setw(55) << left << "Degrees of Freedom" << "= " << v << "\n";
      // Pooled variance:
      double sp = sqrt(((Sn1-1) * Sd1 * Sd1 + (Sn2-1) * Sd2 * Sd2) / v);
      cout << setw(55) << left << "Pooled Standard Deviation" << "= " << sp << "\n";
      // t-statistic:
      double t_stat = (Sm1 - Sm2) / (sp * sqrt(1.0 / Sn1 + 1.0 / Sn2));
      cout << setw(55) << left << "T Statistic" << "= " << t_stat << "\n";

The next step is to define our distribution object, and calculate the
complement of the probability:

      students_t dist(v);
      double q = cdf(complement(dist, fabs(t_stat)));
      cout << setw(55) << left << "Probability that difference is due to chance" << "= "
         << setprecision(3) << scientific << 2 * q << "\n\n";

Here we've used the absolute value of the t-statistic, because we initially
want to know simply whether there is a difference or not (a two-sided test).
However, we can also test whether the mean of the second sample is greater
or is less (one-sided test) than that of the first:
all the possible tests are summed up in the following table:

[table
[[Hypothesis][Test]]
[[The Null-hypothesis: there is
*no difference* in means]
[Reject if complement of CDF for |t| < significance level / 2:

`cdf(complement(dist, fabs(t))) < alpha / 2`]]

[[The Alternative-hypothesis: there is a
*difference* in means]
[Reject if complement of CDF for |t| > significance level / 2:

`cdf(complement(dist, fabs(t))) > alpha / 2`]]

[[The Alternative-hypothesis: Sample 1 Mean is *less* than
Sample 2 Mean.]
[Reject if CDF of t > significance level:

`cdf(dist, t) > alpha`]]

[[The Alternative-hypothesis: Sample 1 Mean is *greater* than
Sample 2 Mean.]

[Reject if complement of CDF of t > significance level:

`cdf(complement(dist, t)) > alpha`]]
]

[note
For a two-sided test we must compare against alpha / 2 and not alpha.]

Most of the rest of the sample program is pretty-printing, so we'll
skip over that, and take a look at the sample output for alpha=0.05
(a 95% probability level). For comparison the dataplot output
for the same data is in
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm
section 1.3.5.3] of the __handbook.

[pre'''
________________________________________________
Student t test for two samples (equal variances)
________________________________________________

Number of Observations (Sample 1)             =  249
Sample 1 Mean                                 =  20.145
Sample 1 Standard Deviation                   =  6.4147
Number of Observations (Sample 2)             =  79
Sample 2 Mean                                 =  30.481
Sample 2 Standard Deviation                   =  6.1077
Degrees of Freedom                            =  326
Pooled Standard Deviation                     =  6.3426
T Statistic                                   =  -12.621
Probability that difference is due to chance  =  5.273e-030

Results for Alternative Hypothesis and alpha  =  0.0500'''

Alternative Hypothesis              Conclusion
Sample 1 Mean != Sample 2 Mean      NOT REJECTED
Sample 1 Mean <  Sample 2 Mean      NOT REJECTED
Sample 1 Mean >  Sample 2 Mean      REJECTED
]

So with a probability that the difference is due to chance of just
5.273e-030, we can safely conclude that there is indeed a difference.

The tests on the alternative hypothesis show that we must
also reject the hypothesis that Sample 1 Mean is
greater than that for Sample 2: in this case Sample 1 represents the
miles per gallon for US cars, and Sample 2 the miles per gallon for
Japanese cars, so we conclude that Japanese cars are on average more
fuel efficient.

Now that we have the simple case out of the way, let's look for a moment
at the more complex one: that the standard deviations of the two samples
are not equal. In this case the formula for the t-statistic becomes:

[equation dist_tutorial2]

And for the combined degrees of freedom we use the
[@http://en.wikipedia.org/wiki/Welch-Satterthwaite_equation Welch-Satterthwaite]
approximation:

[equation dist_tutorial3]

Note that this is one of the rare situations where the degrees-of-freedom
parameter to the Student's t distribution is a real number, and not an
integer value.

[note
Some statistical packages truncate the effective degrees of freedom to
an integer value: this may be necessary if you are relying on lookup tables,
but since our code fully supports non-integer degrees of freedom there is no
need to truncate in this case. Also note that when the degrees of freedom
is small then the Welch-Satterthwaite approximation may be a significant
source of error.]
| 697 | |
| 698 | Putting these formulae into code we get: |
| 699 | |
   // Degrees of freedom, from the Welch-Satterthwaite approximation:
| 701 | double v = Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2; |
| 702 | v *= v; |
| 703 | double t1 = Sd1 * Sd1 / Sn1; |
| 704 | t1 *= t1; |
| 705 | t1 /= (Sn1 - 1); |
| 706 | double t2 = Sd2 * Sd2 / Sn2; |
| 707 | t2 *= t2; |
| 708 | t2 /= (Sn2 - 1); |
| 709 | v /= (t1 + t2); |
| 710 | cout << setw(55) << left << "Degrees of Freedom" << "= " << v << "\n"; |
| 711 | // t-statistic: |
| 712 | double t_stat = (Sm1 - Sm2) / sqrt(Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2); |
| 713 | cout << setw(55) << left << "T Statistic" << "= " << t_stat << "\n"; |
| 714 | |
Thereafter the code and the tests proceed exactly as before. Using
our car mileage data again, here's what the output looks like:
| 717 | |
| 718 | [pre''' |
| 719 | __________________________________________________ |
| 720 | Student t test for two samples (unequal variances) |
| 721 | __________________________________________________ |
| 722 | |
| 723 | Number of Observations (Sample 1) = 249 |
| 724 | Sample 1 Mean = 20.145 |
| 725 | Sample 1 Standard Deviation = 6.4147 |
| 726 | Number of Observations (Sample 2) = 79 |
| 727 | Sample 2 Mean = 30.481 |
| 728 | Sample 2 Standard Deviation = 6.1077 |
| 729 | Degrees of Freedom = 136.87 |
| 730 | T Statistic = -12.946 |
| 731 | Probability that difference is due to chance = 1.571e-025 |
| 732 | |
| 733 | Results for Alternative Hypothesis and alpha = 0.0500''' |
| 734 | |
| 735 | Alternative Hypothesis Conclusion |
| 736 | Sample 1 Mean != Sample 2 Mean NOT REJECTED |
| 737 | Sample 1 Mean < Sample 2 Mean NOT REJECTED |
| 738 | Sample 1 Mean > Sample 2 Mean REJECTED |
| 739 | ] |
| 740 | |
| 741 | This time allowing the variances in the two samples to differ has yielded |
| 742 | a higher likelihood that the observed difference is down to chance alone |
| 743 | (1.571e-025 compared to 5.273e-030 when equal variances were assumed). |
| 744 | However, the conclusion remains the same: US cars are less fuel efficient |
| 745 | than Japanese models. |
| 746 | |
| 747 | [endsect] |
| 748 | [section:paired_st Comparing two paired samples with the Student's t distribution] |
| 749 | |
| 750 | Imagine that we have a before and after reading for each item in the sample: |
| 751 | for example we might have measured blood pressure before and after administration |
| 752 | of a new drug. We can't pool the results and compare the means before and after |
| 753 | the change, because each patient will have a different baseline reading. |
| 754 | Instead we calculate the difference between before and after measurements |
| 755 | in each patient, and calculate the mean and standard deviation of the differences. |
| 756 | To test whether a significant change has taken place, we can then test |
| 757 | the null-hypothesis that the true mean is zero using the same procedure |
| 758 | we used in the single sample cases previously discussed. |
| 759 | |
| 760 | That means we can: |
| 761 | |
| 762 | * [link math_toolkit.stat_tut.weg.st_eg.tut_mean_intervals Calculate confidence intervals of the mean]. |
| 763 | If the endpoints of the interval differ in sign then we are unable to reject |
| 764 | the null-hypothesis that there is no change. |
| 765 | * [link math_toolkit.stat_tut.weg.st_eg.tut_mean_test Test whether the true mean is zero]. If the |
| 766 | result is consistent with a true mean of zero, then we are unable to reject the |
| 767 | null-hypothesis that there is no change. |
| 768 | * [link math_toolkit.stat_tut.weg.st_eg.tut_mean_size Calculate how many pairs of readings we would need |
| 769 | in order to obtain a significant result]. |
| 770 | |
| 771 | [endsect] |
| 772 | |
| 773 | [endsect][/section:st_eg Student's t] |
| 774 | |
| 775 | [/ |
| 776 | Copyright 2006, 2012 John Maddock and Paul A. Bristow. |
| 777 | Distributed under the Boost Software License, Version 1.0. |
| 778 | (See accompanying file LICENSE_1_0.txt or copy at |
| 779 | http://www.boost.org/LICENSE_1_0.txt). |
| 780 | ] |
| 781 | |