The problem of estimating the two parameters of a stationary process satisfying the differential equation $dX_t = -\theta X_t\,dt + \sigma\,dW_t$, where $W_t$ follows a standard Wiener process, from $n$ observations at equidistant points of a given interval, has been well studied. This is also the classical problem of fitting an autoregressive time series of order 1 (AR1), the case "$n$ large" yielding the "near unit root" situation. This Demonstration considers the important case where the observations may have additive measurement errors: we assume that these errors are independent, normal random variables with known variance $\sigma_N^2$.
Recall that $\theta$, assumed positive, is often referred to as the mean reversion speed (here we assume that the constant mean of the process is zero). In geostatistics, $\theta$ is called the inverse-range parameter. It is well known that the autoregression coefficient in the equivalent AR1 formulation is given by $e^{-\theta\delta}$, where $\delta$ is the spacing between consecutive observation points. Here we work with two parameters: the diffusion coefficient $\sigma^2$ and the marginal variance of the process, which equals $\sigma^2/(2\theta)$ (see the Details section in the help page for the OrnsteinUhlenbeckProcess function). We restrict ourselves to the case where this marginal variance equals 1 (so that $\sigma_N^2$ is also the noise-to-signal ratio).
A simple "solution" to this fitting problem is to neglect the noise, that is, to take the most appealing estimator among those available for the noise-free case and apply it to the noisy observations, as was studied in [2]. As the "most appealing" estimator we choose the celebrated maximum likelihood (ML) estimator. Indeed, it is known that this estimator can be calculated exactly and reliably by first solving a simple cubic equation (see [3] and the references therein), the ML estimate of the other parameter then being an explicit "Gibbs energy" (a quadratic form whose computation cost is of order $n$). On the other hand, as soon as $\sigma_N^2 > 0$, the exact maximization of the correctly specified likelihood criterion (the one that takes the noise into account) is not so easy. This Demonstration considers the recently proposed "CGEMEV" approach [1].
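The observation model described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code behind the Demonstration: the function names `ar1_coefficient` and `simulate_noisy_ou` are hypothetical, the observation interval is assumed to be [0, 1], and the path is generated through the exact AR1 recursion of the stationary process before independent normal errors are added.

```python
import math
import random


def ar1_coefficient(theta, n):
    """AR1 coefficient exp(-theta * delta) for n equidistant points."""
    delta = 1.0 / (n - 1)  # assumption: the observation interval is [0, 1]
    return math.exp(-theta * delta)


def simulate_noisy_ou(n, diffusion, variance=1.0, noise_var=0.0, seed=0):
    """Return (x, y): the noise-free path and its noisy observations."""
    rng = random.Random(seed)
    # Marginal variance = diffusion / (2 * theta), hence:
    theta = diffusion / (2.0 * variance)
    a = ar1_coefficient(theta, n)
    # Innovation standard deviation that keeps the AR1 recursion stationary.
    innovation_sd = math.sqrt(variance * (1.0 - a * a))
    x = [rng.gauss(0.0, math.sqrt(variance))]  # start in the stationary law
    for _ in range(n - 1):
        x.append(a * x[-1] + rng.gauss(0.0, innovation_sd))
    # The errors are drawn after the path, so changing noise_var with the
    # same seed leaves the underlying path unchanged.
    noise_sd = math.sqrt(noise_var)
    y = [xi + rng.gauss(0.0, noise_sd) for xi in x]
    return x, y
```

For instance, `simulate_noisy_ou(1000, 2.0, noise_var=0.05, seed=1)` and the same call with `noise_var=0.0` share the same underlying path and differ only in the measurement errors.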
In short, CGEMEV proceeds in two steps. First, the marginal variance is simply estimated by the bias-corrected empirical variance of the observations. Second, an estimating equation is invoked to estimate the inverse-range parameter $\theta$: precisely, $\theta$ is searched so that the conditional mean of the "candidate Gibbs energy" (where we substitute the variance estimate in place of the true variance, so that this conditional mean is a function of $\theta$ only) is equal to the variance estimate. It is easy to show that these two equations are unbiased, that is, they hold on average when both parameters are set to their true values (the averaging is ensemble averaging, i.e. over infinitely repeated simulations of the process and of the noise under the true model). Stronger properties are studied in [1]. Implementing CGEMEV is much simpler than exact ML, since it reduces to one-dimensional numerical root finding. A simple fixed-point algorithm is used here; it proves to be reliable (with fast convergence) for all the settings in this Demonstration.
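The first of the two steps, and its unbiasedness, can be illustrated with a small Monte Carlo experiment. The sketch below rests on assumptions that are not taken from [1]: the "bias-corrected empirical variance" is read as the mean of the squared observations (the process mean being zero) minus the known noise variance, the observation interval is taken to be [0, 1], and all function names and default values are hypothetical. The second step (the root finding on the conditional mean of the Gibbs energy) is not implemented here.

```python
import math
import random


def bias_corrected_variance(y, noise_var):
    """Mean of squared zero-mean observations minus the known noise variance."""
    return sum(v * v for v in y) / len(y) - noise_var


def noisy_ou_sample(n, theta, variance, noise_var, rng):
    """Noisy observations of a stationary centered OU process on n points."""
    a = math.exp(-theta / (n - 1))  # AR1 coefficient, spacing 1 / (n - 1)
    innovation_sd = math.sqrt(variance * (1.0 - a * a))
    noise_sd = math.sqrt(noise_var)
    x = rng.gauss(0.0, math.sqrt(variance))  # stationary start
    y = []
    for _ in range(n):
        y.append(x + rng.gauss(0.0, noise_sd))  # observe with error
        x = a * x + rng.gauss(0.0, innovation_sd)  # move to the next point
    return y


def average_estimate(replicates=2000, n=50, theta=10.0,
                     variance=1.0, noise_var=0.05, seed=0):
    """Ensemble average of the variance estimate over simulated data sets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        y = noisy_ou_sample(n, theta, variance, noise_var, rng)
        total += bias_corrected_variance(y, noise_var)
    return total / replicates


print(average_estimate())  # expected to be close to the true variance 1.0
```

Individual estimates vary considerably from one data set to the next; only their average over many replicates is expected to settle near the true variance.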
Snapshot 1: With the true diffusion coefficient selected here (the value from which the noise-free data is simulated, the marginal variance being fixed) and the noise variance chosen here, this setting may be thought of as "close" to a case where the noise could be ignored. This can be confirmed by moving only the noise variance down to zero (the underlying process realization is then unchanged but the measurement errors are eliminated) and observing that all the estimates are almost unchanged (up to three digits). Concerning the diffusion coefficient, you can observe that the two estimation methods produce very close results. Such closeness is less pronounced for the two estimates of the variance. By changing the seed (and thus using a new process realization and new measurement errors to generate the data) you can be convinced that this is not an accident. Furthermore, a rather large variability, from seed to seed, is also observed for the two estimates of the variance; it is much larger than the variability of the estimates of the diffusion coefficient. This observation agrees well with the known theory about the estimation of the variance, the inverse-range parameter, and their product (see [1] and the references therein). Moving from seed to seed, neither of the two methods seems a clear winner in this "small noise" setting.
Let us now consider a higher noise level. Another important point to note is that the estimates are not much influenced by the noise perturbing the data, provided the noise level stays low enough. However, when the noise level is increased to 0.05, a clear degradation of the neglecting-errors-ML is observed. In contrast, CGEMEV still produces reasonable estimates. You can also change the number of observations $n$, and the conclusions remain similar.
Snapshot 2: Keeping the same number of observations $n$ and selecting a true diffusion coefficient that brings us close to the "near unit root" situation, the noise level used here can no longer be considered negligible.
Indeed, we can observe that by reducing the amplitude of the present noise, we significantly restore the quality of the estimates obtained with neglecting-errors-ML. And by reducing the noise level further and trying several seeds, one can be convinced that only a sufficiently small noise-to-signal ratio allows us to trust the neglecting-errors-ML estimator (given that we are content with three accurate digits in the estimates). When the noise level is increased to 0.05, the neglecting-errors-ML becomes meaningless. In contrast, CGEMEV still produces reasonable estimates.
Snapshot 3: Selecting an "intermediate" value as the true diffusion coefficient, similar conclusions can be drawn, except that the approximate upper bound on the noise level required to trust the neglecting-errors-ML estimator also seems intermediate between the bounds found in the two previous snapshots.
[2] A. Gloter and J. Jacod, "Diffusions with Measurement Errors. II. Optimal Estimators," ESAIM: Probability and Statistics, 5, 2001 pp. 243–260.
[3] Y. Zhang, H. Yu, and A. Ian McLeod, "Developments in Maximum Likelihood Unit Root Tests," Communications in Statistics - Simulation and Computation, 42(5), 2013 pp. 1088–1103.
