Nonparametric Density Estimation: Robust Cross-Validation Bandwidth Selection via Randomized Choices


This Demonstration considers a simple nonparametric curve estimation problem: how to estimate a univariate pdf $f$ (short for probability density function) when $n$ observations drawn from $f$ are known: $X_i$, $i = 1, \ldots, n$. In this Demonstration, $f$ can be a normal distribution, a mixture (more or less peaked) of two normals, a skew-normal distribution or the widely studied claw density (see Details).
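As an illustration of the least familiar of these targets, here is a minimal Python sketch of the claw density. It assumes the usual parameterization attributed to [4]: a standard normal with weight 1/2 plus five narrow normals (standard deviation 1/10, weight 1/10 each) centered at -1, -0.5, 0, 0.5 and 1. The names `claw_pdf` and `sample_claw` are hypothetical helpers, not part of the Demonstration.

```python
import math
import random


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density at x of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# (weight, mean, sd) of each mixture component; assumed parameterization.
CLAW_COMPONENTS = [(0.5, 0.0, 1.0)] + [
    (0.1, l / 2.0 - 1.0, 0.1) for l in range(5)
]


def claw_pdf(x):
    """Evaluate the claw density as a weighted sum of normal densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in CLAW_COMPONENTS)


def sample_claw(n, rng):
    """Draw n observations: pick a component by weight, then sample from it."""
    weights = [w for w, _, _ in CLAW_COMPONENTS]
    picks = rng.choices(CLAW_COMPONENTS, weights=weights, k=n)
    return [rng.gauss(m, s) for _, m, s in picks]


rng = random.Random(0)          # fixed seed, playing the role of a seed control
data = sample_claw(200, rng)    # 200 is an arbitrary illustrative sample size
```

The five narrow components are what make this density a demanding test case: an estimate that is too smooth merges them into a single bump.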


The study here has some similarities with that of [1], except that we are concerned here with the estimation of a pdf in place of a regression function, and we use the classical kernel estimation method (see [2, Details and Options]) in place of a smoothing spline. It is well known that a good value must be chosen for the bandwidth, denoted by $h$. Recall that a very small $h$ produces a quasi-interpolation of the empirical distribution (i.e. the derivative of the empirical cdf), and a very large $h$ yields a nearly constant density. A very popular method to find a good $h$ is to try several values, compute for each one the least squares cross-validation (LSCV) criterion (also named unbiased CV), and retain the $h$ that yields the minimal criterion value. This minimization is implemented here using a fine-enough grid of $h$-values as in [1], where an unbiased risk estimate of the global prediction error (also denoted as UBR) is used in place of the LSCV criterion.
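The grid search described above can be sketched in Python as follows. This is only an illustration under stated assumptions: a Gaussian kernel (for which the integral of the squared estimate has a closed form), an exact O(n²) evaluation rather than the fast approximations used in the Demonstration, and an arbitrary log-spaced grid. The function names are hypothetical.

```python
import math
import random


def gauss(u, s):
    """Density at u of a centered normal with standard deviation s."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def kde(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    return sum(gauss(x - xi, h) for xi in data) / len(data)


def lscv(data, h):
    """LSCV(h) = integral of kde^2 - (2/n) * sum_i kde_without_i(X_i)."""
    n = len(data)
    s2 = math.sqrt(2.0) * h   # two Gaussian kernels convolve to sd sqrt(2)*h
    sq_term = 0.0             # sum over all i, j of gauss(X_i - X_j, s2)
    loo_term = 0.0            # sum over i != j of gauss(X_i - X_j, h)
    for i in range(n):
        for j in range(i + 1, n):
            d = data[i] - data[j]
            sq_term += 2.0 * gauss(d, s2)
            loo_term += 2.0 * gauss(d, h)
    sq_term += n * gauss(0.0, s2)   # diagonal terms i == j
    return sq_term / n ** 2 - 2.0 * loo_term / (n * (n - 1))


def lscv_bandwidth(data, grid):
    """Return the grid value of h that minimizes the LSCV criterion."""
    return min(grid, key=lambda h: lscv(data, h))


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]   # illustrative sample
grid = [0.05 * 1.1 ** k for k in range(40)]        # assumed log-spaced grid
h_cv = lscv_bandwidth(data, grid)
```

Printing `lscv(data, h)` over the grid is a quick way to see how flat the criterion can be near its minimum, which is the issue the robustified choice is meant to address.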

It is frequently observed, especially for smooth pdfs, that the LSCV criterion as a function of the smoothing parameter $h$ may be rather flat around its minimum. In such cases, even if the global prediction error may itself be similarly flat (and thus the impact on the predictive quality of the fit might be weak), a too-small value of $h$ can easily be produced by LSCV. Here "too small" means that spurious oscillations (which could be wrongly interpreted as real peaks) are present in the final estimate of $f$.

In this Demonstration, we have also implemented a randomization-based "robustifying" method very similar to the method introduced in [3, Section 7.2], which, more precisely, permits us to compute a parsimonious yet near-optimal fit. Such a fit is parameterized by a percentile that determines an upward modification of the original LSCV choice.

As in [1] and [6], this parameterized modification is called the "robust LSCV choice" corresponding to the chosen percentile.

By playing with the controls for various underlying pdfs, you can observe that the results are often very satisfactory, in the sense that the mentioned spurious oscillations are almost always eliminated. Furthermore, it is rather easy to choose the percentile since, very often, quite different percentile values yield a quite similar (at least visually) final estimate of $f$.


Contributed by: Didier A. Girard  (May 2018)
(CNRS-LJK and Univ. Grenoble Alpes)
Open content licensed under CC BY-NC-SA



Snapshot 1 illustrates that the LSCV criterion often happens to be a quite flat function of $h$. The robustified choice (for the percentile selected in this snapshot) is clearly more satisfying, since all the spurious oscillations are eliminated.

Snapshot 2 demonstrates that this undersmoothing issue with the LSCV bandwidth choice is observed even with a large amount of data, as in the sample used for this snapshot. The robustified choice is thus still useful.

Snapshot 3 concerns the so-called claw pdf from [4]. This particular pdf has been widely studied: for example, [5, Section 10.2.2] shows that LSCV is quite efficient at recovering it for various data sizes. Some good news (illustrated by Snapshot 3) is that LSCV and the robustified LSCV give quite similar results for this pdf. By clicking the second tab and playing with the seed control in Snapshot 3, you can also observe that the automatic histogram, produced with the classical option FreedmanDiaconis of the built-in Mathematica function Histogram, often does not detect the five peaks of this pdf when $n$ is "small" (as it is in this snapshot).
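For reference, the Freedman-Diaconis rule sets the histogram bin width to twice the interquartile range divided by the cube root of the sample size. The Python sketch below illustrates that formula only; it assumes the quartile convention of `statistics.quantiles`, so the bins produced by Mathematica's Histogram may differ in detail. The function names are hypothetical.

```python
import math
import statistics


def freedman_diaconis_width(data):
    """Bin width 2 * IQR / n^(1/3), with quartiles from statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2.0 * (q3 - q1) / len(data) ** (1.0 / 3.0)


def freedman_diaconis_bins(data):
    """Number of equal-width bins needed to cover the data range."""
    width = freedman_diaconis_width(data)
    return max(1, math.ceil((max(data) - min(data)) / width))


sample = [1, 2, 3, 4, 5, 6, 7, 8]            # small illustrative data set
width = freedman_diaconis_width(sample)
bins = freedman_diaconis_bins(sample)
```

Because the rule depends only on the interquartile range and the sample size, it cannot adapt to fine structure such as the narrow peaks of the claw density.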

In these plots, the term "extended data range" stands for the interval over which each curve estimate is computed at equidistant points. Recall that the computational cost (including that for the LSCV criterion) is greatly reduced by exploiting well-known approximations by periodic convolution formulas (see [7, Section 3.5]). Here we put , and , which appears to give quite satisfying results for all the considered settings in this Demonstration.


[1] D. A. Girard. "Nonparametric Curve Estimation by Smoothing Splines: Unbiased-Risk-Estimate Selector and Its Robust Version via Randomized Choices" from the Wolfram Demonstrations Project—A Wolfram Web Resource.

[2] SmoothKernelDistribution.

[3] D. A. Girard, "Estimating the Accuracy of (Local) Cross-Validation via Randomised GCV Choices in Kernel or Smoothing Spline Regression," Journal of Nonparametric Statistics, 22(1), 2010 pp. 41–64. doi:10.1080/10485250903095820.

[4] J. S. Marron and M. P. Wand, "Exact Mean Integrated Squared Error," The Annals of Statistics, 20(2), 1992 pp. 712–736.

[5] C. Loader, Local Regression and Likelihood, New York: Springer, 1999.

[6] M. A. Lukas, F. R. de Hoog and R. S. Anderssen, "Practical Use of Robust GCV and Modified GCV for Spline Smoothing," Computational Statistics, 31(1), 2016 pp. 269–289. doi:10.1007/s00180-015-0577-7.

[7] B. W. Silverman, Density Estimation for Statistics and Data Analysis, London: Chapman and Hall, 1986.
