Nonparametric Curve Estimation by Smoothing Splines: Unbiased-Risk-Estimate Selector and its Robust Version via Randomized Choices
This Demonstration considers a simple nonparametric regression problem: how to recover a function f of one variable over an interval, when only n pairs (x_i, y_i) are known, for i = 1, ..., n, that satisfy the model y_i = f(x_i) + σ ε_i, where σ > 0 and the ε_i are independent, standard normal random variables. For simplicity, assume that the variance σ² is also known.
The setting is the same as in [1] except that the x_i (the design) are not regularly spaced, and this Demonstration uses the well-known smoothing spline method instead of kernel smoothers (allowing fast computations; notably, see the forum discussion [2], where useful code is provided). In place of a bandwidth value, a good value has to be chosen for the smoothing parameter, denoted by λ. Recall that a very small λ produces a quasi-interpolation of the data, and a very large λ yields the well-known polynomial regression fit, here of degree 1 since classical cubic splines are considered. A very popular way to choose λ is to try several values, to compute Mallows's C_L criterion for each one, and to retain the λ that yields the minimal C_L (as in [1], where C_L is denoted as UBR since it is an unbiased risk estimate of the global prediction error).
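The selection step described above can be sketched in a few lines of pure Python. This is only an illustrative sketch, not the Demonstration's code. It assumes the penalized criterion sum_i (y_i - g(x_i))² + λ ∫ g''(x)² dx (the Demonstration may scale λ differently), the standard band-matrix construction K = Q R⁻¹ Q^T of the natural cubic spline penalty, and the usual form C_L(λ) = RSS(λ)/n + 2σ² tr A(λ)/n - σ², where A(λ) = (I + λK)⁻¹ is the hat matrix. All function names, the test function, the interval [0, 1] and the grid of λ values are hypothetical choices made for the example.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.
    a is m x m and b is m x k (lists of lists); returns x as m x k."""
    m = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0.0:
                fac = aug[r][col]
                aug[r] = [v - fac * w for v, w in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]

def penalty_matrix(x):
    """Roughness penalty K = Q R^{-1} Q^T for sorted, distinct design points x
    (n >= 3), so that g^T K g is the integral of g''^2 for the natural cubic
    spline interpolating the values g at x."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    q = [[0.0] * (n - 2) for _ in range(n)]
    r = [[0.0] * (n - 2) for _ in range(n - 2)]
    for c in range(n - 2):
        j = c + 1  # index of the interior design point
        q[j - 1][c] = 1.0 / h[j - 1]
        q[j][c] = -1.0 / h[j - 1] - 1.0 / h[j]
        q[j + 1][c] = 1.0 / h[j]
        r[c][c] = (h[j - 1] + h[j]) / 3.0
        if c + 1 < n - 2:
            r[c][c + 1] = r[c + 1][c] = h[j] / 6.0
    qt = [[q[i][c] for i in range(n)] for c in range(n - 2)]
    z = solve(r, qt)  # z = R^{-1} Q^T, shape (n-2) x n
    return [[sum(q[i][c] * z[c][k] for c in range(n - 2)) for k in range(n)]
            for i in range(n)]

def hat_matrix(pen, lam):
    """Hat matrix A(lam) = (I + lam K)^{-1} of the cubic smoothing spline."""
    n = len(pen)
    m = [[lam * pen[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return solve(m, eye)

def mallows_cl(y, a, sigma):
    """C_L = RSS/n + 2 sigma^2 tr(A)/n - sigma^2 for the linear fit A y."""
    n = len(y)
    fit = [sum(a[i][j] * y[j] for j in range(n)) for i in range(n)]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fit))
    trace = sum(a[i][i] for i in range(n))
    return rss / n + 2.0 * sigma ** 2 * trace / n - sigma ** 2

# Hypothetical example: irregular design on [0, 1], known noise level sigma.
random.seed(0)
n, sigma = 30, 0.2
x = [(i + 0.8 * random.random()) / n for i in range(n)]  # jittered, sorted
y = [math.sin(2 * math.pi * t) + sigma * random.gauss(0.0, 1.0) for t in x]

pen = penalty_matrix(x)
grid = [10.0 ** e for e in range(-8, 1)]  # candidate values of lambda
scores = [mallows_cl(y, hat_matrix(pen, lam), sigma) for lam in grid]
lam_best = grid[scores.index(min(scores))]
print("lambda minimizing C_L on the grid:", lam_best)
```

The dense inversion costs O(n³) per candidate λ, which is acceptable for a small illustration; banded solvers would normally be preferred for larger n. Printing the whole list `scores`, rather than only its minimizer, shows how flat the criterion can be near its minimum.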
It is frequently observed that the C_L criterion, as a function of the smoothing parameter, may be rather flat around its minimum (this is also true for the similar GCV criterion). In such a case, even if the global prediction error is itself similarly flat (so that the impact on the predictive quality of the fit may be weak), the C_L selector can produce a λ that is too small, where "too small" means that spurious oscillations (which could be wrongly interpreted as real peaks) are present in the final estimate of f.
See [3] for a recent review of several approaches to remedy such problems. Recall that Mallows emphasized, in his original paper, that a careful examination of the whole C_L curve should be preferred to a blind minimization of the pure C_L (or GCV) criterion.
This Demonstration implements the randomization-based method introduced in [4, section 7.2], which makes it possible to compute a "more parsimonious yet 'near-optimal' fit". Such a fit is parameterized by a percentile p, which determines an upward modification of the original C_L choice.
As in [4], this parameterized modification is called the "robust choice corresponding to the percentile p".
By experimenting with this Demonstration, with various underlying functions, you can observe that the results are often very satisfactory for a large range of values of σ (the noise magnitude), in the sense that the spurious oscillations mentioned above are almost always eliminated. Furthermore, it is rather easy to choose p since, very often, quite different values of p yield a very similar (at least visually) final estimate of f.
[1] D. A. Girard, "Nonparametric Curve Estimation by Kernel Smoothers: Efficiency of Unbiased Risk Estimate and GCV Selectors," from the Wolfram Demonstrations Project—A Wolfram Web Resource. (Jan 9, 2013) demonstrations.wolfram.com/NonparametricCurveEstimationByKernelSmoothersEfficiencyOfUnb.
[2] jojosthegreat, "Implementation of Smoothing Splines Function," Mathematica Stack Exchange. (Sep 5, 2017) mathematica.stackexchange.com/questions/33206/implementation-of-smoothing-splines-function/33262.
[3] M. A. Lukas, F. R. de Hoog and R. S. Anderssen, "Practical Use of Robust GCV and Modified GCV for Spline Smoothing," Computational Statistics, 31(1), 2016 pp. 269–289. doi:10.1007/s00180-015-0577-7.
[4] D. A. Girard, "Estimating the Accuracy of (Local) Cross-Validation via Randomised GCV Choices in Kernel or Smoothing Spline Regression," Journal of Nonparametric Statistics, 22(1), 2010 pp. 41–64. doi:10.1080/10485250903095820.