Kernel Density Estimation

Histograms are a useful but limited way to estimate or visualize the true, underlying density of some observed data with an unknown distribution. Histograms are essentially discontinuous step functions. So, if you believe that observed data is generated by a continuous density—or even a differentiable density—then another histogram-like estimation procedure might be preferable.
Contributed by: Jeff Hamrick (March 2011)
Open content licensed under CC BY-NC-SA
Details
The author's interest in kernel estimation techniques stems from a recent paper in which the author used similar techniques to nonparametrically estimate a function in a stochastic differential equation driven by a standard Brownian motion. However, kernel estimation techniques are also used, for example, to estimate the functions in a nonlinear regression equation involving an independent, identically distributed sequence. There are numerous applications of kernel estimation techniques, including the density estimation technique featured in this Demonstration. For more information about kernel density estimation, see the Wiki entries.
Several lessons about kernel histograms can be learned quickly from this Demonstration. First, notice that when the number of data points is quite small (before you start adding lots of additional data points), you can see the individual kernel functions quite clearly. At that stage, it is not easy to see how the kernel functions are "estimating" the true underlying density.
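To make the "sum of kernel functions" picture concrete, here is a minimal sketch of a kernel density estimate in plain Python. It assumes a Gaussian kernel; the function names, the three data points, and the bandwidth are hypothetical choices for illustration, not values taken from the Demonstration.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, bandwidth):
    """Kernel estimate at x: the average of kernels centered on each data point."""
    n = len(data)
    return sum(gaussian_kernel((x - xi) / bandwidth) for xi in data) / (n * bandwidth)

# Three hypothetical observations. With so few points and a small bandwidth,
# the estimate is visibly a collection of separate bumps.
data = [-1.0, 0.0, 2.5]
h = 0.3

step = 0.01
grid = [-3.0 + step * i for i in range(801)]   # covers -3 to 5
density = [kde(x, data, h) for x in grid]

# The estimate is itself a density, so a Riemann sum should be close to 1.
area = sum(density) * step
```

Each data point contributes one bump of width roughly `h`, so with three points the estimate is large near the observations and nearly zero between well-separated ones.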
Continue to add new data and notice that making the bandwidth small reveals a great deal about the random data that has been generated according to the law of the selected target distribution. However, making the bandwidth small also makes the resulting kernel histogram rather unbelievable. Making the bandwidth very large smooths out the wrinkles in the kernel histogram, but may result in a kernel histogram that does not retain any unusual or interesting features of the data.
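The bandwidth trade-off described above can be sketched numerically by counting the local maxima of the estimate on a grid. This is only an illustration under assumed choices: a Gaussian kernel, five hypothetical data points, and two arbitrary bandwidths; `count_modes` is a helper made up for this example.

```python
import math

def kde(x, data, bandwidth):
    """Gaussian kernel density estimate at x (Gaussian kernel is an assumption)."""
    total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return total / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

def count_modes(data, bandwidth, lo, hi, step=0.01):
    """Count strict local maxima of the estimate on an evenly spaced grid."""
    n_steps = int(round((hi - lo) / step))
    values = [kde(lo + i * step, data, bandwidth) for i in range(n_steps + 1)]
    return sum(
        1
        for i in range(1, n_steps)
        if values[i - 1] < values[i] > values[i + 1]
    )

# Hypothetical data points, roughly one unit apart.
data = [0.0, 1.0, 2.1, 3.0, 4.2]

modes_small = count_modes(data, 0.2, -3.0, 7.0)   # narrow kernels: one bump per point
modes_large = count_modes(data, 2.0, -3.0, 7.0)   # wide kernels: bumps merge
```

With the small bandwidth the estimate has one peak per data point (five here), which reveals the sample but is hard to believe as a density. With the large bandwidth the peaks merge into a single smooth hump, which hides any finer structure.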
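Next, notice that while the kernel histogram is converging to the true, underlying density, the rate of convergence does not seem fast. However, it has been shown that if the true, underlying distribution of the data is sufficiently smooth, the rate of convergence in an $L^2$ sense (mean integrated squared error, with a suitably chosen bandwidth) is of order $n^{-4/5}$, where $n$ is the number of data points. This decays faster than the $n^{-1/2}$ scaling in the central limit theorem, although the two are not measured on the same scale: in squared-error terms the parametric benchmark is $n^{-1}$, so the kernel histogram pays a modest price for making no parametric assumption (see Kolmogorov's addendum to the Glivenko-Cantelli theorem for additional information).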
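For reference, the standard textbook calculation behind an $n^{-4/5}$ rate for a sufficiently smooth density can be sketched as follows. The notation is an assumption introduced here and does not come from the Demonstration: $K$ is a symmetric kernel, $h$ the bandwidth, $n$ the number of data points, $f$ the true density, $\mu_2(K) = \int u^2 K(u)\,du$, and $R(g) = \int g(u)^2\,du$.

```latex
% Asymptotic mean integrated squared error: squared bias plus variance
\mathrm{AMISE}(h) = \frac{h^4}{4}\,\mu_2(K)^2\,R(f'') + \frac{R(K)}{n h}

% Setting the derivative in h to zero gives the minimizing bandwidth
h^{*} = \left( \frac{R(K)}{\mu_2(K)^2\,R(f'')\,n} \right)^{1/5}

% Substituting h^* back in gives the rate
\mathrm{AMISE}(h^{*}) = \frac{5}{4}\,\bigl[\mu_2(K)^2\,R(K)^4\,R(f'')\bigr]^{1/5}\,n^{-4/5}
```

The first term grows with $h$ (oversmoothing bias) and the second shrinks with $h$ (sampling variance), which is the same trade-off seen when moving the bandwidth slider.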
The kernel histograms that we generate in this Demonstration have not been adjusted for any underlying assumptions regarding the support of the target distribution. Consider the use of a kernel histogram to estimate an exponential density. An exponential random variable assumes negative values with zero probability, but virtually all kernel histograms used to estimate an exponential density are strictly positive to the left of zero.
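The boundary effect can be quantified in a small sketch. Assuming a Gaussian kernel, the probability mass that the estimate places to the left of zero is the average of the normal distribution function evaluated at `-x_i / h`. The "sample" below is a deterministic stand-in for exponential data (quantiles of the unit exponential), chosen for reproducibility; the function names and bandwidths are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mass_below_zero(data, bandwidth):
    """Mass a Gaussian-kernel estimate assigns to (-inf, 0)."""
    return sum(normal_cdf(-xi / bandwidth) for xi in data) / len(data)

# Deterministic stand-in for an exponential sample: quantiles of Exp(1)
# at evenly spaced probabilities (an assumption made for reproducibility).
n = 200
data = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]

leak_narrow = mass_below_zero(data, 0.1)   # small bandwidth: less leakage
leak_wide = mass_below_zero(data, 0.5)     # large bandwidth: more leakage
```

Every data point is positive, yet both estimates put strictly positive mass on negative values, and the amount grows with the bandwidth.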
There are techniques, however, to manage this undesirable property. There are also techniques to "optimally" choose the bandwidth, even without knowing the underlying distribution of the data. For more information, see Fan and Yao (2003) or Bradley and Taqqu (2003).
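As one illustration of each idea, the sketch below uses two common textbook devices: Silverman's rule-of-thumb bandwidth and boundary correction by reflection at zero. These are swapped in here as examples and are not necessarily the methods discussed in the cited references. A Gaussian kernel and a deterministic exponential-like sample are assumed, and all function names are hypothetical.

```python
import math
import statistics

def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)

def kde(x, data, bandwidth):
    """Plain Gaussian kernel density estimate at x."""
    total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return total / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

def reflected_kde(x, data, bandwidth):
    """Estimate for data supported on [0, inf): fold the mass at negative x back."""
    if x < 0.0:
        return 0.0
    return kde(x, data, bandwidth) + kde(-x, data, bandwidth)

# Deterministic stand-in for an exponential sample (quantiles of Exp(1)).
n = 200
data = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]

h = silverman_bandwidth(data)

# Midpoint-rule check that the reflected estimate still integrates to about 1.
step = 0.01
area = sum(reflected_kde((i + 0.5) * step, data, h) for i in range(1500)) * step
```

Reflection keeps the total mass at one while forcing the estimate to vanish on negative values. The rule-of-thumb bandwidth is calibrated for roughly normal data, so for a skewed density such as the exponential it should be read as a starting point rather than an optimal choice.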