There is an extensive literature on nonlinear regression and nonparametric estimation techniques. We refer the reader to the notable book by Fan and Yao and the extensive work by Bjerve and Doksum. Roughly speaking, the procedure to estimate the regression function $m$ can be implemented in the following fashion. First, take a set of evenly spaced design points over an interior interval of the empirical support of the covariate $X$. Then, at each design point, solve a kernel-weighted least squares problem to locally fit a polynomial of order $p$. (In this Demonstration, the local fit is parabolic, so $p = 2$.)
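The first step, choosing the design points, can be sketched in a few lines of Python. This is only an illustration: the function name `design_points`, the number of points, and the `trim` fraction used to stay inside the observed range are assumptions, not values taken from the Demonstration.

```python
import numpy as np

def design_points(x, n_points=25, trim=0.05):
    """Evenly spaced points over an interior interval of the observed range of x.

    `n_points` and `trim` are hypothetical tuning choices: `trim` is the
    fraction of the observed range cut from each end, so the grid avoids
    the boundaries where data are sparse.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    margin = trim * (hi - lo)
    return np.linspace(lo + margin, hi - margin, n_points)

# Example with simulated covariate values.
rng = np.random.default_rng(0)
x = rng.standard_normal(600)
grid = design_points(x)
```

Trimming the ends is one simple way to obtain an "interior interval"; a quantile-based rule would serve the same purpose.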

By "kernel-weighted", we mean that the data are weighted according to the Epanechnikov kernel $K(u) = \frac{3}{4}(1 - u^2)$ for $|u| \le 1$ (and $K(u) = 0$ otherwise), where $u = (x - x_0)/h$, $x_0$ is a design point, and $h$ is the "kernel bandwidth". The variable $h$ effectively controls the amount of nearby data that are permitted to influence the estimate of the regression function $m$ (and its derivatives) locally. There are a variety of techniques and heuristics available to choose $h$. You can vary the size of the bandwidth: smaller bandwidths follow the local features of the data closely and may end up fitting noise, while larger bandwidths oversmooth the data.
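A minimal sketch of the kernel-weighted fit at a single design point follows, assuming NumPy and a local parabola. The function names `epanechnikov` and `local_quadratic_fit` are illustrative, and the Demonstration's own implementation may differ in its details.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_quadratic_fit(x, y, x0, h):
    """Kernel-weighted least squares fit of a parabola centered at x0.

    Returns (b0, b1, b2) for the local model
    b0 + b1 * (x - x0) + b2 * (x - x0)**2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = x - x0
    w = epanechnikov(d / h)  # observations farther than h from x0 get weight 0
    # Weighted least squares: scale each row by the square root of its weight.
    sw = np.sqrt(w)
    A = np.column_stack([np.ones_like(d), d, d**2])
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef

# Sanity check on noise-free quadratic data.
x = np.linspace(-2.0, 2.0, 201)
y = 1.0 + 2.0 * x + 3.0 * x**2
coef = local_quadratic_fit(x, y, x0=0.5, h=0.8)
```

On noise-free quadratic data the local fit reproduces the parabola exactly (up to rounding), which makes a convenient check. With noisy data, rerunning the fit with a smaller or larger `h` shows the trade-off between following local features and oversmoothing.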

By solving a kernel-weighted least squares regression at each design point, we obtain an estimate of the value of the regression function $m$ and its first two derivatives at each design point. We then have all the information we need to fit a spline.
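To make the link between the local fit and the derivative estimates explicit, the local parabolic problem at a design point can be written as follows. The notation ($K$ for the kernel, $h$ for the bandwidth, $x_0$ for the design point, $(X_i, Y_i)$ for the data, $\beta_j$ for the local coefficients) is assumed here for illustration.

```latex
\[
\hat{\beta}(x_0) = \arg\min_{\beta_0, \beta_1, \beta_2}
  \sum_{i=1}^{n} K\!\left(\frac{X_i - x_0}{h}\right)
  \left[ Y_i - \beta_0 - \beta_1 (X_i - x_0) - \beta_2 (X_i - x_0)^2 \right]^2
\]
\[
\hat{m}(x_0) = \hat{\beta}_0, \qquad
\hat{m}'(x_0) = \hat{\beta}_1, \qquad
\hat{m}''(x_0) = 2\,\hat{\beta}_2
\]
```

The factor of 2 in the last expression comes from matching the local parabola to a second-order Taylor expansion of the regression function around $x_0$.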

The "cubic" set of data is a simulated set formed by generating 600 realizations of the covariate $X$ with a standard normal distribution and then simulating the response $Y$ from $X$, where the noise terms are independently simulated standard normal random variables.
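A simulation of this kind might look like the sketch below. The rule `y = x**3 + eps` is an assumption suggested only by the name of the data set and stands in for the actual simulation rule; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only
n = 600

x = rng.standard_normal(n)    # covariate: standard normal draws
eps = rng.standard_normal(n)  # noise: independent standard normal draws

# ASSUMPTION: a plain cubic plus noise, a hypothetical stand-in for the rule
# used to build the "cubic" data set.
y = x**3 + eps
```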

The "sine" set of data is another simulated set, formed by first generating 600 realizations of the covariate $X$ with a uniform distribution over a fixed interval. We then obtain simulations of the response $Y$ from $X$ according to a rule whose noise terms are independently simulated standard normal random variables.

The "baseball" data consist of performance data for all regular major league baseball players during the 1999 baseball season. We compute the overall proportion of hits to at-bats, and the proportion of hits to at-bats when there is a teammate in scoring position. The two proportions are obviously positively correlated, but the nonlinear regression model offers a potentially more useful fit than the usual linear regression model. These data were taken from John Rasp's website.

The "body fat" data were also taken from John Rasp's website. The data were taken from a sample of 252 men. The covariate $X$ is the weight (in pounds) of the male subject, and the dependent variable $Y$ is the body fat percentage (obtained through an underwater weighing procedure).

The "stock" data consist of U.S. equity returns ($X$ variable) and French equity returns ($Y$ variable) for 1000 trading days in the late 1990s.