Exploring Robustness of Mean-Difference Confidence Intervals

This Demonstration examines 95% confidence intervals for the difference in means of two random samples, each drawn from a normal, uniform, Laplace, or centered exponential distribution. The distributions all have mean 0. The variance of one distribution can be varied from 1 to 10, and the variance of the other is fixed at 1. Several methods available in the Mathematica function MeanDifferenceCI are examined. The "t (Welch)" method is the default with MeanDifferenceCI; "t (pooled)" corresponds to the option setting EqualVariances -> True; and "Z" and "Z (pooled)" correspond to setting the option KnownVariance to the sample variances of the two samples or else to the pooled variance estimate. The fifth method is a conservative approximation that uses the two-sample t-statistic with degrees of freedom equal to the smaller of the two sample sizes minus 1.
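The three t-based methods differ mainly in the degrees of freedom they assign to the same kind of statistic. The Python sketch below compares them for one set of inputs. It assumes the standard Welch-Satterthwaite formula for "t (Welch)" and is not taken from the Demonstration's Wolfram Language code; the function names and the example sample sizes and variances are hypothetical.

```python
def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (assumed formula)."""
    a = var1 / n1  # estimated variance of the first sample mean
    b = var2 / n2  # estimated variance of the second sample mean
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def pooled_df(n1, n2):
    """Degrees of freedom when a single pooled variance is estimated."""
    return n1 + n2 - 2


def conservative_df(n1, n2):
    """Smaller of the two sample sizes, minus 1."""
    return min(n1 - 1, n2 - 1)


# Hypothetical inputs chosen only for illustration.
n1, n2 = 10, 15
var1, var2 = 4.0, 1.0

print("Welch:       ", welch_df(var1, n1, var2, n2))
print("pooled:      ", pooled_df(n1, n2))
print("conservative:", conservative_df(n1, n2))
```

The Welch value always lies between the conservative and pooled values, which is why the conservative method gives the widest t-interval for a given standard error.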
Contributed by: Ian McLeod (March 2011)
Open content licensed under CC BY-NC-SA
Details
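Snapshot 1: Using "t (Welch)" with the sample sizes and variance set as in the snapshot, one sample drawn from a Laplace distribution and the other from a centered exponential distribution, the estimated coverage after about 10,000 simulations is close to the nominal 95%; in this case the "t (Welch)" method is robust. Using "Z" or "Z (pooled)", the estimated coverage after about 10,000 simulations departs clearly from 95%, so the Z-methods are not suitable when the standard deviations are estimated.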
Snapshot 2: As in Snapshot 1, except that the roles of the two samples are reversed; in this case the estimated coverage after 10,000 simulations departs noticeably from 95%, and we conclude that even "t (Welch)" is not robust. With larger sample sizes, the estimated coverage returns close to 95%, so for large enough samples "t (Welch)" is accurate.
Snapshot 3: With both samples normal, the settings as in the snapshot, and "t (pooled)" selected, we find after 10,000 simulations that the coverage is less than the nominal 95%. The "t (pooled)" intervals err on the side of not being conservative, that is, they are too narrow; for this reason, use of the "t (pooled)" method is discouraged; see p. 487 of [1].
Snapshot 4: With the settings as in the snapshot, both samples normal, and the "t (conservative)" method selected, the estimated coverage after 10,000 simulations exceeds 95%, showing that this method is indeed conservative.
Snapshot 5: Settings as in Snapshot 1 but using the Z-method; after 10,000 simulations the estimated coverage falls short of 95%, showing that this method is not conservative and not acceptable with small samples. After increasing the sample sizes and running another 10,000 simulations, the coverage is close to 95%, showing that for larger samples the Z-method is sufficiently accurate.
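The kind of experiment described in these snapshots can be sketched in a few lines of Python. This is a minimal illustration, not the Demonstration's Wolfram Language code: it estimates coverage of the "Z" interval with estimated standard deviations when one sample is Laplace and the other is centered exponential. The sample sizes (5 and 100), the unit variances, the seed, and the 4,000 replications are assumptions chosen to keep the run short, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def laplace(rng, variance):
    # The difference of two unit exponentials is Laplace with variance 2,
    # so rescale to the requested variance.
    return math.sqrt(variance / 2.0) * (rng.expovariate(1.0) - rng.expovariate(1.0))


def centered_exponential(rng, variance):
    # Exponential with mean sqrt(variance), shifted to have mean 0.
    return math.sqrt(variance) * (rng.expovariate(1.0) - 1.0)


def mean_and_var(xs):
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)


def z_coverage(n1, n2, var1, var2, sims=4000, level=0.95, seed=1):
    """Fraction of simulated Z intervals that contain the true difference, 0."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = 0
    for _ in range(sims):
        xs = [laplace(rng, var1) for _ in range(n1)]
        ys = [centered_exponential(rng, var2) for _ in range(n2)]
        mx, vx = mean_and_var(xs)
        my, vy = mean_and_var(ys)
        half_width = z * math.sqrt(vx / n1 + vy / n2)
        # Both distributions have mean 0, so the true difference is 0.
        if abs(mx - my) <= half_width:
            hits += 1
    return hits / sims


small = z_coverage(5, 5, 1.0, 1.0)
large = z_coverage(100, 100, 1.0, 1.0)
print("n = 5:  ", small)
print("n = 100:", large)
```

Comparing the two printed values shows the pattern described above: the normal critical value is too small when the standard deviations are estimated from small samples, and the shortfall shrinks as the samples grow.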
Some textbooks recommend using the pooled t-test if a simple F-test does not reject the hypothesis of equal variances. As noted on p. 488 of [1], this is not a good idea, because the F-test itself is not robust against skewness or outliers.
The robustness for other confidence levels could be explored by resetting the confidence-level variable in the initialization code.
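Changing the confidence level changes only the critical value that multiplies the standard error. As a rough Python illustration for the Z-methods (the function name z_critical is hypothetical, and the Demonstration's own variable is not shown here):

```python
from statistics import NormalDist


def z_critical(level):
    """Two-sided standard normal critical value for a confidence level in (0, 1)."""
    return NormalDist().inv_cdf(0.5 + level / 2.0)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

The t-based methods work the same way, with a t quantile at the method's degrees of freedom in place of the normal quantile.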
[1] D. S. Moore, The Basic Practice of Statistics, 5th ed., New York: W. H. Freeman and Company, 2010.