Nonparametric Curve Estimation by Smoothing Splines: Unbiased-Risk-Estimate Selector and its Robust Version via Randomized Choices

This Demonstration considers a simple nonparametric regression problem: how to recover a function $f$ of one variable over a given interval when only the couples $(x_i, y_i)$, $i = 1, \dots, n$, are known, where the data satisfy the model $y_i = f(x_i) + \sigma \varepsilon_i$ with $\sigma > 0$ and the $\varepsilon_i$ independent standard normal random variables. For simplicity, assume that the variance $\sigma^2$ is also known.
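To make the model concrete, here is a minimal Python sketch that simulates such data. The function name `simulate_data`, the interval [0, 1], the sorted uniform draws used as an irregular design, and the sine test function are all illustrative assumptions; they are not taken from the Demonstration.

```python
import math
import random

def simulate_data(f, n, sigma, seed=0):
    """Return an irregular design x and noisy responses y = f(x) + sigma * eps."""
    rng = random.Random(seed)
    # Assumed design: n sorted uniform draws in [0, 1), hence not regularly spaced.
    x = sorted(rng.random() for _ in range(n))
    # Independent standard normal errors, scaled by the noise magnitude sigma.
    y = [f(xi) + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x, y

# Hypothetical underlying function, chosen only for illustration.
def f_true(t):
    return math.sin(2.0 * math.pi * t)

x, y = simulate_data(f_true, n=50, sigma=0.3, seed=1)
```

Fixing the seed makes the simulated sample reproducible, which is convenient when comparing several choices of the smoothing parameter on the same data.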

The setting is the same as in [1] except that the $x_i$ (the design) are not regularly spaced, and this Demonstration uses the well-known smoothing spline method instead of kernel smoothers (which allows fast computations; notably, see the forum thread [2], where useful code is provided). In place of a bandwidth, a good value must be chosen for the smoothing parameter, denoted by $\lambda$. A very small $\lambda$ produces a quasi-interpolation of the data, and a very large $\lambda$ yields the polynomial regression fit, here of degree 1 since classical cubic splines are considered. A very popular method for choosing $\lambda$ is to try several values, compute Mallows's $C_L$ criterion for each one, and retain the $\lambda$ that yields the minimal $C_L$ (as in [1], where $C_L$ is denoted UBR since it is an unbiased risk estimate of the global prediction error).
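The grid search described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Demonstration's implementation: it assumes the penalized criterion $\sum_i (y_i - g(x_i))^2 + \lambda \int g''(t)^2 \, dt$, builds the hat matrix $A(\lambda) = (I + \lambda Q R^{-1} Q^T)^{-1}$ of the natural cubic smoothing spline in its standard band-matrix form, and takes the unbiased risk estimate as $C_L(\lambda) = \|y - A(\lambda) y\|^2 / n + 2 \sigma^2 \operatorname{tr} A(\lambda) / n - \sigma^2$ (the constant $-\sigma^2$ does not affect the minimizer). The helper names, the design, the test function and the $\lambda$ grid are hypothetical choices.

```python
import math
import random

def solve(a, b):
    """Solve a z = b (b may hold several columns) by Gaussian elimination."""
    n, ncol = len(a), len(b[0])
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            t = m[r][c] / m[c][c]
            for k in range(c, n + ncol):
                m[r][k] -= t * m[c][k]
    z = [[0.0] * ncol for _ in range(n)]
    for r in reversed(range(n)):  # back substitution
        for j in range(ncol):
            s = sum(m[r][k] * z[k][j] for k in range(r + 1, n))
            z[r][j] = (m[r][n + j] - s) / m[r][r]
    return z

def penalty_matrix(x):
    """Roughness penalty K = Q R^{-1} Q^T for sorted, distinct design points x."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    q = [[0.0] * (n - 2) for _ in range(n)]
    r = [[0.0] * (n - 2) for _ in range(n - 2)]
    for j in range(n - 2):
        q[j][j] = 1.0 / h[j]
        q[j + 1][j] = -1.0 / h[j] - 1.0 / h[j + 1]
        q[j + 2][j] = 1.0 / h[j + 1]
        r[j][j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            r[j][j + 1] = r[j + 1][j] = h[j + 1] / 6.0
    rinv_qt = solve(r, [list(col) for col in zip(*q)])  # R^{-1} Q^T
    return [[sum(q[i][k] * rinv_qt[k][j] for k in range(n - 2))
             for j in range(n)] for i in range(n)]

def hat_matrix(k_mat, lam):
    """Hat matrix A(lam) = (I + lam * K)^{-1} of the smoothing spline."""
    n = len(k_mat)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    m = [[eye[i][j] + lam * k_mat[i][j] for j in range(n)] for i in range(n)]
    return solve(m, eye)

def mallows_cl(y, a_mat, sigma2):
    """Unbiased risk estimate: RSS/n + 2*sigma2*trace(A)/n - sigma2."""
    n = len(y)
    fitted = [sum(a_mat[i][j] * y[j] for j in range(n)) for i in range(n)]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    trace = sum(a_mat[i][i] for i in range(n))
    return rss / n + 2.0 * sigma2 * trace / n - sigma2

# Illustrative data: irregular design, hypothetical test function, known sigma.
n, sigma = 30, 0.2
rng = random.Random(0)
x = [((i + 0.5) / n) ** 1.5 for i in range(n)]
y = [math.sin(2.0 * math.pi * t) + sigma * rng.gauss(0.0, 1.0) for t in x]

K = penalty_matrix(x)
lambdas = [10.0 ** (-8.0 + 0.5 * k) for k in range(17)]  # assumed grid, 1e-8 to 1
cl = [mallows_cl(y, hat_matrix(K, lam), sigma ** 2) for lam in lambdas]
best_lambda = lambdas[cl.index(min(cl))]  # grid value minimizing C_L
```

Dense Gaussian elimination is used here only to keep the sketch dependency-free; it costs on the order of $n^3$ operations per $\lambda$, whereas practical implementations exploit the band structure of $Q$ and $R$ to stay linear in $n$.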

It is frequently observed that Mallows's $C_L$ criterion, as a function of the smoothing parameter, may be rather flat around its minimum (this is also true for the similar GCV criterion). In such a case, even if the global prediction error is itself similarly flat (so that the impact on the predictive quality of the fit may be weak), minimizing $C_L$ can produce a smoothing parameter $\lambda$ that is too small, where "too small" means that spurious oscillations (which could be wrongly interpreted as real peaks) are present in the final estimate of the function $f$.

See [3] for a recent review of several approaches to remedy such troubles. Recall that Mallows emphasized, in his original paper, that a careful examination of the whole $C_L$ curve should be preferred to a blind minimization of the pure $C_L$ (or GCV) criterion.

This Demonstration implements the randomization-based method introduced in [4, section 7.2], which computes a "more parsimonious yet 'near-optimal' fit". Such a fit is parameterized by a percentile, which determines an upward modification of the original $C_L$ choice of the smoothing parameter.

As in [3], this parameterized modification is called the "robust choice" corresponding to the chosen percentile.

By playing with this Manipulate, with various underlying functions, you can observe that the results are often very satisfactory for a large range of values of $\sigma$ (the noise magnitude), in the sense that the spurious oscillations mentioned above are almost always eliminated. Furthermore, the percentile is rather easy to choose since, very often, several different percentile values yield a quite similar (at least visually) final estimate of the function $f$.

Contributed by: Didier A. Girard (September 2017)
(CNRS-LJK and Univ. Grenoble Alpes)
Open content licensed under CC BY-NC-SA


Details

References

[1] D. A. Girard, "Nonparametric Curve Estimation by Kernel Smoothers: Efficiency of Unbiased Risk Estimate and GCV Selectors," from the Wolfram Demonstrations Project—A Wolfram Web Resource. (Jan 9, 2013) demonstrations.wolfram.com/NonparametricCurveEstimationByKernelSmoothersEfficiencyOfUnb.

[2] jojosthegreat, "Implementation of Smoothing Splines Function," Mathematica Stack Exchange. (Sep 5, 2017) mathematica.stackexchange.com/questions/33206/implementation-of-smoothing-splines-function/33262.

[3] M. A. Lukas, F. R. de Hoog and R. S. Anderssen, "Practical Use of Robust GCV and Modified GCV for Spline Smoothing," Computational Statistics, 31(1), 2016 pp. 269–289. doi:10.1007/s00180-015-0577-7.

[4] D. A. Girard, "Estimating the Accuracy of (Local) Cross-Validation via Randomised GCV Choices in Kernel or Smoothing Spline Regression," Journal of Nonparametric Statistics, 22(1), 2010 pp. 41–64. doi:10.1080/10485250903095820.


