Nonparametric Density Estimation: Robust Cross-Validation Bandwidth Selection via Randomized Choices
This Demonstration considers a simple nonparametric curve estimation problem: how to estimate a univariate pdf (short for probability density function) when only a sample of observations drawn from it is known. In this Demonstration, the true pdf can be a normal distribution, a mixture (more or less peaked) of two normals, a skew-normal distribution or the widely studied claw density (see Details).
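As a point of reference, a kernel density estimate averages one smooth bump per observation. The sketch below is a minimal illustration rather than the Demonstration's own code: the Gaussian kernel, the function name `gaussian_kde`, the sample size and the bandwidth value `h=0.4` are all assumptions chosen for the example.

```python
import math
import random


def gaussian_kde(data, h):
    """Return a function x -> kernel density estimate.

    Assumes a Gaussian kernel with bandwidth h (an illustrative choice).
    """
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def estimate(x):
        # Average of Gaussian bumps centered at each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return estimate


# Hypothetical sample: 200 draws from a standard normal distribution.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
f_hat = gaussian_kde(sample, h=0.4)
print(f_hat(0.0))
```

A smaller `h` produces a wigglier curve and a larger `h` a smoother one, which is why the choice of bandwidth is the central question.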
Contributed by: Didier A. Girard (May 2018)
(CNRS-LJK and Univ. Grenoble Alpes)
Open content licensed under CC BY-NC-SA
Details
Snapshot 1 illustrates that the least-squares cross-validation (LSCV) criterion often happens to be a quite flat function of the bandwidth. The robustified choice is clearly more satisfying here, since all the spurious oscillations are eliminated.
Snapshot 2 demonstrates that this undersmoothing issue with the LSCV bandwidth choice is observed even with a large amount of data. The robustified choice is thus still useful.
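To make the flatness of the criterion concrete, the following sketch evaluates the standard LSCV score (integral of the squared estimate minus twice the average leave-one-out estimate at the data) on a grid of bandwidths. It assumes a Gaussian kernel, for which both terms have closed forms; the names `lscv` and `normal_pdf`, the sample and the bandwidth grid are hypothetical. It shows plain LSCV only and does not implement the robustified randomized choice.

```python
import math
import random


def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def lscv(data, h):
    """LSCV score for a Gaussian-kernel estimate with bandwidth h."""
    n = len(data)
    s2 = h * math.sqrt(2.0)  # a Gaussian kernel convolved with itself has scale h*sqrt(2)
    sq = 0.0   # double sum giving the integral of the squared estimate
    loo = 0.0  # double sum over i != j for the leave-one-out term
    for i in range(n):
        for j in range(n):
            d = data[i] - data[j]
            sq += normal_pdf(d, s2)
            if i != j:
                loo += normal_pdf(d, h)
    return sq / n ** 2 - 2.0 * loo / (n * (n - 1))


# Hypothetical sample and bandwidth grid (0.05, 0.10, ..., 2.00).
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
grid = [0.05 * k for k in range(1, 41)]
scores = [lscv(sample, h) for h in grid]
best = min(scores)
h_lscv = grid[scores.index(best)]

# Bandwidths whose score lies within 1% of the minimum: a wide set means a flat criterion.
near = [h for h, s in zip(grid, scores) if s <= best + 0.01 * abs(best)]
print(h_lscv, near)
```

When `near` spans a wide range of bandwidths, small random fluctuations in the data can move the minimizer a long way, which is the instability that a robustified choice aims to reduce.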
Snapshot 3 concerns the so-called claw pdf from [4]. This particular pdf has been widely studied: for example, [5, Section 10.2.2] shows that LSCV is quite effective at recovering it for various data sizes. The good news (illustrated by Snapshot 3) is that LSCV and this robustified LSCV give quite similar results for this pdf. Clicking the second tab and varying the seed control in Snapshot 3 also shows that the automatic histogram, built with the classical FreedmanDiaconis option of the built-in Mathematica function Histogram, often fails to detect the five peaks of this pdf when the sample size is "small".
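For context, the Freedman-Diaconis rule sets the histogram bin width to twice the interquartile range divided by the cube root of the sample size. The sketch below applies the textbook formula; the function name `freedman_diaconis_width` and the sample are hypothetical, and the quartile convention used by Mathematica's Histogram may differ from the one in Python's `statistics.quantiles`.

```python
import math
import random
import statistics


def freedman_diaconis_width(data):
    """Bin width 2 * IQR / n^(1/3) (textbook Freedman-Diaconis rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # three quartile cut points
    return 2.0 * (q3 - q1) / len(data) ** (1.0 / 3.0)


# Hypothetical sample: 400 draws from a standard normal distribution.
rng = random.Random(2)
sample = [rng.gauss(0.0, 1.0) for _ in range(400)]
width = freedman_diaconis_width(sample)
n_bins = math.ceil((max(sample) - min(sample)) / width)
print(width, n_bins)
```

Because the width depends only on the interquartile range and the sample size, the rule cannot adapt to fine structure such as narrow peaks, which is consistent with the behavior described for the claw pdf.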
In these plots, the term "extended data range" stands for the interval over which each curve estimate is computed at equidistant points. Recall that the computational cost (including that of the LSCV criterion) is greatly reduced by exploiting well-known approximations based on periodic convolution formulas (see [7, Section 3.5]). The values used here appear to give quite satisfying results for all the settings considered in this Demonstration.
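The idea behind such approximations can be sketched as follows: bin the data on an equidistant grid covering an extended range, then convolve the bin counts with the kernel as if the grid were periodic. This is an illustration under stated assumptions, not the Demonstration's implementation. The Gaussian kernel, the function name `binned_kde_periodic`, the grid size `m=128` and the padding `pad=3.0` (in units of the bandwidth) are all hypothetical, and the circular convolution is written as a direct sum where a practical implementation would typically use a fast Fourier transform.

```python
import math
import random


def binned_kde_periodic(data, h, m=128, pad=3.0):
    """Binned Gaussian-kernel estimate on m equidistant points via circular convolution."""
    lo = min(data) - pad * h  # extended data range: pad both ends by pad * h
    hi = max(data) + pad * h
    step = (hi - lo) / m
    counts = [0.0] * m
    for x in data:
        k = min(int((x - lo) / step), m - 1)  # index of the bin containing x
        counts[k] += 1.0
    # Kernel evaluated at wrapped (periodic) lags between grid points.
    kern = []
    for lag in range(m):
        d = min(lag, m - lag) * step
        kern.append(math.exp(-0.5 * (d / h) ** 2) / (h * math.sqrt(2.0 * math.pi)))
    n = len(data)
    est = [
        sum(counts[j] * kern[(k - j) % m] for j in range(m)) / n
        for k in range(m)
    ]
    grid = [lo + (k + 0.5) * step for k in range(m)]  # bin centers
    return grid, est


# Hypothetical sample: 300 draws from a standard normal distribution.
rng = random.Random(3)
sample = [rng.gauss(0.0, 1.0) for _ in range(300)]
grid, est = binned_kde_periodic(sample, h=0.3)
print(max(est))
```

Padding the range keeps the wrap-around contribution negligible, since the kernel has essentially vanished by the time it reaches the opposite end of the grid.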
References
[1] D. A. Girard. "Nonparametric Curve Estimation by Smoothing Splines: Unbiased-Risk-Estimate Selector and Its Robust Version via Randomized Choices" from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/NonparametricCurveEstimationBySmoothingSplinesUnbiasedRiskEs.
[2] SmoothKernelDistribution. reference.wolfram.com/language/ref/SmoothKernelDistribution.html.
[3] D. A. Girard, "Estimating the Accuracy of (Local) Cross-Validation via Randomised GCV Choices in Kernel or Smoothing Spline Regression," Journal of Nonparametric Statistics, 22(1), 2010 pp. 41–64. doi:10.1080/10485250903095820.
[4] J. S. Marron and M. P. Wand, "Exact Mean Integrated Squared Error," The Annals of Statistics, 20(2), 1992 pp. 712–736. www.jstor.org/stable/2241980.
[5] C. Loader, Local Regression and Likelihood, New York: Springer, 1999.
[6] M. A. Lukas, F. R. de Hoog and R. S. Anderssen, "Practical Use of Robust GCV and Modified GCV for Spline Smoothing," Computational Statistics, 31(1), 2016 pp. 269–289. doi:10.1007/s00180-015-0577-7.
[7] B. W. Silverman, Density Estimation for Statistics and Data Analysis, London: Chapman and Hall, 1986.