Estimating a Centered Ornstein-Uhlenbeck Process under Measurement Errors
The problem of estimating the two parameters of a stationary process X(t) satisfying the stochastic differential equation dX(t) = -θ X(t) dt + σ dW(t), where W(t) is a standard Wiener process, from n+1 observations at equidistant points of the interval [0, 1], has been well studied. This is also the classical problem of fitting an autoregressive time series of order 1 (AR(1)), the case "n large" yielding the "near unit root" situation. This Demonstration considers the important case where the observations may have additive measurement errors: we assume that these errors are independent, normal random variables with known variance τ².
Recall that θ, assumed positive, is often referred to as the mean reversion speed (here we assume that the constant mean of the process is zero). In geostatistics, θ is called the inverse-range parameter. It is well known that the autoregression coefficient in the equivalent AR(1) formulation is given by ρ = exp(-θδ), where δ = 1/n is the spacing between observations.
Here we use the two parameters σ² (the diffusion coefficient) and θ (recall that σ²/(2θ) is then the marginal variance of the process; see the Details section in the help page for the OrnsteinUhlenbeckProcess function). We restrict ourselves to the case of marginal variance 1 (so that τ² is also the noise-to-signal ratio).
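As a concrete illustration of this setup, the following minimal Python/NumPy sketch simulates noisy equidistant observations of a centered OU path via its exact AR(1) discretization with ρ = exp(-θ/n). The function name and parameter values are illustrative, not those of the Demonstration; the marginal variance is fixed at 1 as assumed above.

```python
import numpy as np

def simulate_noisy_ou(theta, n, tau2, rng):
    """Simulate n+1 equidistant observations on [0, 1] of a centered,
    stationary OU process with marginal variance 1 (so sigma^2 = 2*theta),
    plus independent N(0, tau2) measurement errors."""
    rho = np.exp(-theta / n)           # AR(1) coefficient for spacing 1/n
    innov_sd = np.sqrt(1.0 - rho**2)   # keeps the marginal variance at 1
    x = np.empty(n + 1)
    x[0] = rng.standard_normal()       # stationary start: N(0, 1)
    for i in range(1, n + 1):
        x[i] = rho * x[i - 1] + innov_sd * rng.standard_normal()
    return x + np.sqrt(tau2) * rng.standard_normal(n + 1)

rng = np.random.default_rng(0)
y = simulate_noisy_ou(theta=2.0, n=1000, tau2=0.01, rng=rng)
```

Because ρ is close to 1 here, the simulated record is strongly correlated at lag 1, with the empirical lag-1 correlation only slightly attenuated by the added noise.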
A simple "solution" to this fitting problem is to neglect the noise, that is, to use the most appealing estimator among those available for the non-noisy case and to substitute the noisy observations into it, as has been studied in the literature. Here, as "most appealing", we choose the celebrated maximum likelihood (ML) estimator. Indeed, it is known that this estimator can be exactly and reliably computed by first solving a simple cubic equation in ρ (see the references), the ML estimate of the variance then being an explicit "Gibbs energy" (a quadratic form whose computational cost is of order n).
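To make this neglecting-errors strategy concrete, here is a hedged sketch of the exact Gaussian ML fit for the noise-free, centered, stationary AR(1) representation. Instead of solving the cubic equation, it simply maximizes the concentrated (profile) log-likelihood over a grid of θ values; the simulation parameters and function names are illustrative only.

```python
import numpy as np

def ar1_profile_loglik(y, rho):
    """Exact stationary zero-mean AR(1) log-likelihood with the innovation
    variance profiled out; returned up to an additive constant."""
    n = len(y) - 1
    s = (1.0 - rho**2) * y[0] ** 2 + np.sum((y[1:] - rho * y[:-1]) ** 2)
    return 0.5 * np.log(1.0 - rho**2) - 0.5 * (n + 1) * np.log(s / (n + 1))

def ml_theta_ignoring_noise(y, theta_grid):
    """Grid maximizer of the profile likelihood; rho = exp(-theta/n)."""
    n = len(y) - 1
    logliks = [ar1_profile_loglik(y, np.exp(-t / n)) for t in theta_grid]
    return theta_grid[int(np.argmax(logliks))]

# Simulate a noise-free path (marginal variance 1, theta = 2) and fit it.
rng = np.random.default_rng(1)
n, theta = 1000, 2.0
rho = np.exp(-theta / n)
x = np.empty(n + 1)
x[0] = rng.standard_normal()
for i in range(1, n + 1):
    x[i] = rho * x[i - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()

theta_hat = ml_theta_ignoring_noise(x, np.arange(0.05, 30.0, 0.05))
```

Even without noise, θ̂ has an O(1) standard deviation on a fixed interval, which is consistent with the variability discussed in the snapshots below.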
On the other hand, as soon as τ² > 0, the exact maximization of the correctly specified likelihood criterion (the one that takes the noise into account) is not so easy.
This Demonstration considers the recently proposed "CGEM-EV" approach. In short, the variance is first estimated simply by the bias-corrected empirical variance, say b̂; second, an estimating equation is invoked to estimate θ. Precisely, θ is sought so that the conditional mean, given the data, of the "candidate Gibbs energy" (where we substitute b̂ in place of the true variance, so that this conditional mean is a function of θ only) is equal to its expected value. It is easy to show that these two equations are unbiased, that is, they are true on average when the variance and θ are set to their true values (the averaging is ensemble-averaging, i.e. from infinitely repeated simulations of the process and of the noise under the true model). Stronger properties are studied in the references.
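The first ("EV") step can be written down directly: since the measurement errors are independent of the process and have known variance τ², the empirical second moment of the centered data overestimates the process variance by exactly τ². Below is a minimal Python/NumPy sketch with a Monte Carlo check of this unbiasedness; the parameter values are illustrative, and the Gibbs-energy estimating equation itself is not reproduced here.

```python
import numpy as np

def bias_corrected_variance(y, tau2):
    """Bias-corrected empirical variance of a centered noisy record:
    E[y_i^2] = var(X) + tau2, so subtracting tau2 removes the noise bias."""
    return np.mean(y**2) - tau2

# Monte Carlo check: average over repeated simulations of an OU path
# (theta = 2, marginal variance 1) plus N(0, tau2) measurement errors.
rng = np.random.default_rng(2)
n, theta, tau2 = 200, 2.0, 0.05
rho = np.exp(-theta / n)
estimates = []
for _ in range(400):
    x = np.empty(n + 1)
    x[0] = rng.standard_normal()
    for i in range(1, n + 1):
        x[i] = rho * x[i - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()
    y = x + np.sqrt(tau2) * rng.standard_normal(n + 1)
    estimates.append(bias_corrected_variance(y, tau2))

mean_estimate = np.mean(estimates)   # close to the true value 1 on average
```

Note that the individual estimates have a sizeable seed-to-seed spread even though their ensemble mean is on target; this echoes the variability of the variance estimates discussed in the snapshots.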
Implementation of CGEM-EV is much simpler than exact ML, since it reduces to one-dimensional numerical root finding. A simple fixed-point algorithm is used here. It proves to be reliable (with fast convergence) for all the settings in this Demonstration.
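The kind of one-dimensional fixed-point scheme alluded to here can be sketched generically. In the toy example below, the equation x = cos(x) merely stands in for the scalar estimating equation, and the optional damping parameter is an assumption added to illustrate how convergence can be stabilized.

```python
import math

def fixed_point(g, x0, tol=1e-10, max_iter=200, damping=1.0):
    """Damped fixed-point iteration for x = g(x):
    x_{k+1} = (1 - damping) * x_k + damping * g(x_k)."""
    x = x0
    for _ in range(max_iter):
        x_new = (1.0 - damping) * x + damping * g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# Toy stand-in for a scalar estimating equation: solve x = cos(x).
root = fixed_point(math.cos, x0=1.0)   # approx. 0.7390851
```

For a contraction mapping such as this one, convergence is geometric, which matches the fast, reliable behavior reported above.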
Snapshot 1: Select a large value for the true diffusion coefficient (the value of σ² from which the non-noisy data are simulated) and a small noise variance τ². This setting may be thought of as "close" to a case where the noise could be forgotten; this can be confirmed by moving only τ² to 0 (the underlying path is then unchanged but the measurement errors are eliminated) and observing that all the estimates are almost unchanged (up to three digits). Concerning the diffusion coefficient, you can observe that the two estimation methods produce very close results. Such closeness is less pronounced for the two estimates of the variance. By changing the seed (and thus simulating a new path and new measurement errors used to generate the data) you can convince yourself that this is not an accident. Furthermore, a rather large variability, from seed to seed, of the two estimates of the variance is also observed; it is much larger than the variability of the estimates of the diffusion coefficient. This observation agrees well with the known theory about the estimation of the variance, the inverse-range parameter, and their product (see the references). Moving from seed to seed, neither of the two methods seems a clear winner in this "small noise" setting. Another important point is that the estimates are not much influenced by the noise perturbing the data, provided the noise level stays small. Let us now consider a higher noise level: by increasing it to 0.05, a clear degradation of the neglecting-errors ML is observed; in contrast, CGEM-EV still produces reasonable estimates of the diffusion coefficient. You can also change the sample size n, and the conclusions remain similar.
Snapshot 2: Staying at the same small noise variance and selecting a small value for the true diffusion coefficient (so that we are close to the "near unit root" situation), the noise can no longer be considered negligible. Indeed, we can observe that diminishing the amplitude of the noise significantly restores the quality of the neglecting-errors-ML estimates of the diffusion coefficient. By further decreasing the noise level and trying several seeds, one can become convinced that a rather small noise-to-signal ratio is required if we want to trust the neglecting-errors-ML estimator (if we are to be content with three accurate digits in the estimates). By increasing the noise level to 0.05, the neglecting-errors ML becomes meaningless. In contrast, CGEM-EV still produces reasonable estimates of the diffusion coefficient.
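The degradation of a noise-blind fit, and the robustness of a noise-aware one, can be reproduced with simple moment-based stand-ins (these are not the Demonstration's ML and CGEM-EV estimators; this is an illustrative sketch under the marginal-variance-1 assumption). Ignoring the noise, the empirical lag-1 autocorrelation c1/c0 underestimates ρ, because c0 is inflated by τ², and this small bias in ρ is magnified into a large bias in θ near the unit root; subtracting the known τ² from c0 removes it.

```python
import numpy as np

def theta_hats(y, tau2):
    """Moment-based estimates of theta: naive (noise-blind) vs corrected."""
    n = len(y) - 1
    c0 = np.mean(y**2)                # inflated by tau2 under noise
    c1 = np.mean(y[:-1] * y[1:])      # unaffected by the noise on average
    theta_naive = -n * np.log(c1 / c0)
    theta_corr = -n * np.log(c1 / (c0 - tau2))
    return theta_naive, theta_corr

rng = np.random.default_rng(3)
n, theta, tau2 = 1000, 2.0, 0.05
rho = np.exp(-theta / n)
naive, corr = [], []
for _ in range(20):
    x = np.empty(n + 1)
    x[0] = rng.standard_normal()
    for i in range(1, n + 1):
        x[i] = rho * x[i - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()
    y = x + np.sqrt(tau2) * rng.standard_normal(n + 1)
    tn, tc = theta_hats(y, tau2)
    naive.append(tn)
    corr.append(tc)

bias_naive = np.mean(naive) - theta   # systematically large and positive
bias_corr = np.mean(corr) - theta     # much smaller on average
```

The contrast between the two averaged biases mirrors the snapshot observations: at this noise level the noise-blind estimate of θ is far off, while the noise-aware one remains reasonable.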
Snapshot 3: Selecting an "intermediate" value for the true diffusion coefficient, similar conclusions can be drawn, except that the approximate upper bound on the noise level required to trust the neglecting-errors-ML estimator of the diffusion coefficient also seems intermediate between those observed in the two previous settings.