Estimators of a Noisy Centered Ornstein-Uhlenbeck Process and Its Noise Variance

This Demonstration considers three estimators for a noisy centered Ornstein–Uhlenbeck process. This work is a logical sequel to [1]; both consider a classic "AR(1) plus noise" model for time series, but in [1] the noise variance was assumed to be known. Recall the setting and notation of [1]: an underlying process X(t), assumed to be a stationary solution to the differential equation
dX(t) = −θ X(t) dt + σ dW(t),
where θ > 0 and W(t) follows a standard Wiener process, is observed at n equispaced times over [0, 1], but with additional unknown noise components ε_1, …, ε_n (often measurement errors), assumed to be independent normal random variables with variance σ_ε² (denoted by v for brevity in the following). We also denote by y the column vector of observed data, whose i-th component is given by
y_i = X(t_i) + ε_i,
where t_i = i/n.
Recall that the underlying AR(1) time series can be parameterized by the more classical couple (a, b), where a = e^{−θ/n} is the autoregression coefficient and b = σ²/(2θ) is the signal variance (see the Details section for the OrnsteinUhlenbeckProcess function). (The coefficient a, also called the "serial correlation," is necessarily strictly positive here.) An equivalent parameterization is (θ, b). In this Demonstration, the two estimation methods of θ, ML and CGEMEV, are extended to the case where the noise level v is unknown. CGEMEV stands for "Conditional Gibbs-Energy Mean and Empirical Variance"; see the paragraph after equation (1). ML and CGEMEV were implemented in [1], although ML was simplified there to the "neglecting-errors ML." Playing with [1] shows that CGEMEV works quite well, whether the level you choose for the assumed-known v is non-negligible or very small.
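As a concrete illustration of this setup, the following sketch simulates an "AR(1) plus noise" series via the exact discretization of the sampled OU process. It is written in Python purely for illustration (the Demonstration itself uses Mathematica built-ins), and the function name and the parameterization a = e^{−θ/n}, b = σ²/(2θ) are our notational assumptions.

```python
import math
import random

def simulate_noisy_ou(n, theta, sigma, v, seed=0):
    """Simulate y_i = X(t_i) + eps_i at t_i = i/n, where X is a stationary
    centered OU process dX = -theta*X dt + sigma*dW and eps_i ~ N(0, v).
    Uses the exact AR(1) representation of the sampled OU process."""
    rng = random.Random(seed)
    a = math.exp(-theta / n)           # autoregression coefficient
    b = sigma ** 2 / (2 * theta)       # marginal (signal) variance
    x = rng.gauss(0.0, math.sqrt(b))   # draw from the stationary law
    innov_sd = math.sqrt(b * (1.0 - a * a))  # AR(1) innovation std dev
    y = []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, innov_sd)
        y.append(x + rng.gauss(0.0, math.sqrt(v)))
    return y
```

For example, `simulate_noisy_ou(1024, 32.0, 8.0, 0.25)` produces a series with signal variance b = 1 and noise variance v = 0.25.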
The principle behind the ML method provides a direct extension, in theory: it simply consists of maximizing the likelihood with respect to v as well. The maximization of the correctly specified likelihood criterion (which takes into account a known or unknown nonzero noise level) is not as easy as that of the "neglecting-errors likelihood" (which reduces to a simple cubic equation) [1]. Here, the maximization is with respect to only the two variables (θ, r), where r = b/v is the signal-to-noise ratio (SNR), since a classical concentration of the likelihood can eliminate the variable v; see equation (2). This maximizer is numerically approximated using the built-in Mathematica function FindMaximum. The likelihood function is evaluated via the built-in function LogLikelihood[ARMAProcess[{a1},{b1},.],.], since an AR(1)-plus-noise series can be rewritten as a particular ARMA(1,1) series whose marginal variance is b + v [8]. As is well known, the speed and, more importantly, sometimes the result of such a numerical maximization can be affected by the required selection of a starting point [2, Section 2.3]. In this Demonstration, the starting point is randomized, and you can change the seed among six values to assess the stability of the algorithm.
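The concentration step can be sketched as follows (a hedged illustration, not the Demonstration's implementation, which instead relies on FindMaximum and LogLikelihood[ARMAProcess[...]]): for fixed (θ, r), the Gaussian likelihood of y ~ N(0, v (r R_θ + I)) is maximized in v by v̂ = yᵀ(r R_θ + I)⁻¹ y / n, leaving a two-variable criterion in (θ, r). The dense-matrix evaluation below is only for small n.

```python
import numpy as np

def profile_neg_loglik(theta, r, y):
    """Negative log-likelihood of y ~ N(0, v*(r*R_theta + I)) with the
    noise variance v concentrated out: v_hat = y'(r R + I)^{-1} y / n.
    R_theta is the sampled-OU correlation matrix a^{|i-j|}, a = exp(-theta/n).
    Returns (criterion, v_hat)."""
    y = np.asarray(y, float)
    n = len(y)
    a = np.exp(-theta / n)
    idx = np.arange(n)
    R = a ** np.abs(idx[:, None] - idx[None, :])
    M = r * R + np.eye(n)
    _, logdet = np.linalg.slogdet(M)
    q = y @ np.linalg.solve(M, y)
    v_hat = q / n                      # concentrated-out noise variance
    return 0.5 * (n * np.log(v_hat) + logdet + n), v_hat
```

Minimizing this criterion over (θ, r) and then reading off v̂ mimics the concentration described above.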
On the other hand, an extension of CGEMEV to the case of unknown v is not so direct. First, when v is given, b is simply estimated by the bias-corrected empirical variance, say b̂_EV, and the estimated SNR satisfies r̂ = b̂_EV / v. See [2, equation (1.2)] for a definition of the bias-corrected empirical variance and comments on its "unbiasedness." Second, denoting the "smoothing matrix" (here a Kalman smoother) by A_θ = R_θ (R_θ + r̂^{-1} I)^{-1}, where I is the identity matrix and R_θ is the correlation matrix of the underlying AR(1) time series with the inverse-range parameter θ, that is, with (R_θ)_{i,j} = e^{−θ|i−j|/n}, then θ̂, or equivalently â, is found by solving EG_y(θ) = b̂_EV. (In [1], a fixed-point algorithm is used.) Here, EG_y(θ) is defined by
EG_y(θ) = (1/n) [ (A_θ y)^T R_θ^{-1} (A_θ y) + v tr((R_θ + r̂^{-1} I)^{-1}) ]   (1),
see [2, equation (1.3)].
Recall a possible interpretation of CGEMEV: it can be shown (via classic algebraic manipulation) that EG_y(θ) is the conditional expectation, given y, of the Gibbs energy of the underlying AR(1) time series, for the probabilistic mechanism associated with b̂_EV and a candidate θ. Thus, EG_y(θ) = b̂_EV means a "matching between the conditional Gibbs-energy mean and the above estimate (b̂_EV) of the variance of the underlying signal," so we call this equation in θ the CGEMEV estimating equation. The good statistical properties of this matching as an estimation method of θ for unnoisy data are discussed in [5, 6]. If the fixed v is chosen smaller and smaller, then in definition (1) of EG_y, the matrix A_θ approaches the identity matrix and the second term, with the trace, approaches 0 for any candidate θ.
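The estimating equation can be prototyped as follows. The algebraic form of EG_y used here (smoothed signal plus a trace correction) is our reconstruction of [2, equation (1.3)] and should be checked against that reference; a plain bisection also stands in for the fixed-point algorithm of [1].

```python
import numpy as np

def eg(theta, y, v):
    """Candidate-theta value of the conditional Gibbs-energy mean EG_y.
    ASSUMED form (our reconstruction of [2, eq. (1.3)]):
        EG_y(theta) = (1/n)[ (A y)' R^{-1} (A y) + v tr((R + I/r)^{-1}) ],
    with A = R (R + I/r)^{-1}, r = b_EV / v, b_EV = mean(y^2) - v."""
    y = np.asarray(y, float)
    n = len(y)
    b_ev = np.mean(y ** 2) - v         # bias-corrected empirical variance
    r = b_ev / v                       # estimated signal-to-noise ratio
    a = np.exp(-theta / n)
    idx = np.arange(n)
    R = a ** np.abs(idx[:, None] - idx[None, :])
    M = R + np.eye(n) / r
    z_hat = R @ np.linalg.solve(M, y)  # smoothed signal A y
    term1 = z_hat @ np.linalg.solve(R, z_hat)
    term2 = v * np.trace(np.linalg.inv(M))
    return (term1 + term2) / n

def cgemev_theta(y, v, lo=1.0, hi=150.0, iters=40):
    """Solve the estimating equation EG_y(theta) = b_EV by bisection
    (the Demonstration uses a fixed-point algorithm instead).
    Returns None when no sign change is bracketed on [lo, hi]."""
    y = np.asarray(y, float)
    b_ev = np.mean(y ** 2) - v
    f = lambda t: eg(t, y, v) - b_ev
    flo = f(lo)
    if flo * f(hi) > 0:
        return None
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fm = f(mid)
        if flo * fm <= 0:
            hi = mid
        else:
            lo, flo = mid, fm
    return 0.5 * (lo + hi)
```

Note how the degeneracy discussed above shows up here: as v shrinks, r̂ blows up, A_θ approaches the identity, and the equation becomes less and less informative about θ.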
Two possible extensions of CGEMEV to the case of unknown v are considered here. The first is a natural and simple "plugin" method: compute the classical nonparametric estimator of v that is obtained by extrapolating to zero the variogram, considered as a function of the lag. Then substitute this estimate for the true v in the CGEMEV equation in θ considered in [1]. See [3] and the references therein for details of this "variogram extrapolation at zero" estimator.
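The variogram idea admits a very simple sketch: for this model the semivariogram is γ(h) = v + b(1 − e^{−θh/n}) ≈ v + (bθ/n)h at small lags h, so a straight line fitted to the first few empirical lags has intercept approximately v. The refinements discussed in [3] are not reproduced here.

```python
import numpy as np

def vhat_variogram(y, lags=5):
    """Nonparametric noise-variance estimate: fit a line to the empirical
    semivariogram at the first few lags and take its intercept at lag 0.
    (A simple sketch; [3] discusses refined versions of this estimator.)"""
    y = np.asarray(y, float)
    h = np.arange(1, lags + 1)
    gamma = np.array([0.5 * np.mean((y[k:] - y[:-k]) ** 2) for k in h])
    slope, intercept = np.polyfit(h, gamma, 1)
    return max(intercept, 0.0)         # the intercept estimates v
```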
The second method is to use a well-known equation satisfied by the ML estimate of v when the signal-to-noise ratio r and θ are known; precisely, v̂ is the root of
v = (1/n) y^T (r R_θ + I)^{-1} y   (2),
see [4, Section 3.2.3]. The fixed-point algorithm mentioned above is quite fast and accurate for solving the CGEMEV estimating equation for any candidate v; let us denote its root by θ̂(v). Here, a common bisection method is used to solve the concentrated CGEMEV-ML equation (2) in v.
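The bisection step can be sketched generically as follows, with the per-candidate CGEMEV root-finder abstracted as a user-supplied function `theta_of_v` (a placeholder for the fixed-point algorithm; the estimated SNR r(v) = (mean(y²) − v)/v links the two unknowns):

```python
import numpy as np

def solve_cgemev_ml(y, theta_of_v, v_min=0.005, v_max=None, iters=30):
    """Bisection on the concentrated CGEMEV-ML equation in v:
        g(v) = v - (1/n) y' (r(v) R_{theta(v)} + I)^{-1} y = 0,
    with r(v) = (mean(y^2) - v) / v and theta_of_v(v) a user-supplied
    solver of the CGEMEV equation for candidate v (a placeholder here).
    Falls back to v_min when no root is bracketed, as in the text."""
    y = np.asarray(y, float)
    n = len(y)
    m2 = np.mean(y ** 2)
    if v_max is None:
        v_max = 0.9 * m2
    idx = np.arange(n)

    def g(v):
        r = (m2 - v) / v
        a = np.exp(-theta_of_v(v) / n)
        R = a ** np.abs(idx[:, None] - idx[None, :])
        return v - y @ np.linalg.solve(r * R + np.eye(n), y) / n

    lo, hi = v_min, v_max
    glo = g(lo)
    if glo * g(hi) > 0:
        return v_min                   # no root found: arbitrary fallback
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        gm = g(mid)
        if glo * gm <= 0:
            hi = mid
        else:
            lo, glo = mid, gm
    return 0.5 * (lo + hi)
```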
The right-hand column of the displayed table shows the two estimates θ̂(v̂) and v̂, where v̂ is given by one of these two methods. For the second method, v̂ is either the first root found for this concentrated CGEMEV-ML equation, or 0.005 when no root is found, the search being over an interval whose smallest value is 0.005. A similar constraint is used in FindMaximum for the maximization of the concentrated likelihood with respect to θ and the SNR r; after a limited number of iterations (100 by default), v̂ is deduced from equation (2). The corresponding ML solutions are displayed in the left-hand column (with v̂ in parentheses).
Concerning the computational cost of this alternative to ML: these two extensions of CGEMEV are particularly appropriate for large datasets as soon as a reasonably fast algorithm exists for evaluating matrix-vector products of the form R_θ u. This is the case when iterative methods are appropriate for such evaluations, as in [6, 7].
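For the sampled-OU correlation matrix, such a fast product is in fact available in O(n) operations via two first-order recursions, which is what makes conjugate-gradient-type iterative solvers attractive for large n; a sketch (our own illustration, not taken from [6, 7]):

```python
import numpy as np

def ou_corr_matvec(theta, u):
    """O(n) evaluation of R_theta @ u for the sampled-OU correlation
    matrix (R_theta)_{ij} = a^{|i-j|}, a = exp(-theta/n), using two
    first-order recursions instead of an O(n^2) dense product."""
    u = np.asarray(u, float)
    n = len(u)
    a = np.exp(-theta / n)
    f = np.empty(n)
    g = np.empty(n)
    f[0] = u[0]
    for i in range(1, n):              # f_i = sum_{j <= i} a^{i-j} u_j
        f[i] = a * f[i - 1] + u[i]
    g[n - 1] = u[n - 1]
    for i in range(n - 2, -1, -1):     # g_i = sum_{j >= i} a^{j-i} u_j
        g[i] = a * g[i + 1] + u[i]
    return f + g - u                   # the j = i term was counted twice
```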

SNAPSHOTS

  • [Snapshot]
  • [Snapshot]
  • [Snapshot]

DETAILS

Each realization of the underlying process (a non-noisy dataset) is generated using the built-in function OrnsteinUhlenbeckProcess. Changing v consists of adding, to such a non-noisy dataset, a standard Gaussian white-noise series amplified by the chosen √v.
Snapshot 1: Select "diffusion coefficient parameter" 32 (the value from which the non-noisy data is simulated), the other settings being fixed, and choose a moderate noise level v. This setting is also analyzed in Snapshot 1 of [1]. There it was observed that CGEMEV with an a priori known v was much more efficient than the neglecting-errors ML; this holds as long as the noise level stays above some small threshold. The two methods considered here—ML taking account of an unknown v, and CGEMEV-ML—have very similar statistical efficiencies.
Let us compare them more precisely. For the diffusion coefficient, you can observe that the two estimation methods produce satisfactory and rather close results. For the variance v, the two methods give the same value to two digits. Changing the seed from the initial value 6499 should convince you that this closeness is not accidental; indeed, even when they are not quite close to the true v, the two estimates of v are close to each other relative to their variability. This was already observed in the case of no noise. If we come back to the seed 6499 but with a much smaller v (and thus new measurement errors are used to generate the data), you see that CGEMEV-ML was not able to produce a root; as said above, among the candidate estimates of v, we then rather arbitrarily choose the smallest one, 0.005. In any case, you can see that the resulting combination is a very good estimate of θ. (In fact, the impact of this choice is weak; other small values also worked well in this setting.) You also see that the ML estimate of θ is very good, and yet the associated v̂ is a clear underestimate. The analogous constraint used in ML does not strictly bound v̂ from below, since no specific constraint is imposed on v itself.
The third method, CGEMEV using the variogram-based v̂, is obtained by clicking "variogram extrapolation at 0+" instead of "hybrid EV-ML". It is not so bad, considering that it is sometimes 10 times faster than CGEMEV-ML. First, as in ML or CGEMEV-ML, a lower bound is imposed on v̂; precisely, 0.005 is used as soon as the variogram estimate falls below 0.005. By changing the generation seed, one sees that this lower bound is often active when v is very small; this is no longer the case when we come back to a larger v. More precisely, for a seed of 6499, you can see that this variogram-based CGEMEV gives a slight underestimate of θ and an overestimate of v; for other seeds, these estimation errors are in the opposite direction (see, for instance, a seed of 6517). These "small" inaccuracies of the nonparametric v̂ are very often corrected by CGEMEV-ML.
Snapshot 2: Keeping the other settings and selecting a small value for the "diffusion coefficient parameter" (so that we are close to the near-unit-root situation), even a small noise variance can no longer be considered negligible. This claim can easily be checked by playing with [1], where "non-negligible noise" means that the neglecting-errors ML method cannot be trusted. The good news is that with such a small diffusion coefficient, the noise level can now be reliably estimated, even when it is small, either by ML or by CGEMEV-ML. Concerning the (here randomized) starting point required by the iterative maximization of the likelihood, it may sometimes have a non-negligible impact: changing the randomization seed from 321 to 1492 shows a weak impact in this setting, even though the estimation of θ is then slightly worse than that by CGEMEV-ML. However, if the iteration is stopped too early (try 50 instead of 100), the impact is strong; the seed 2018 then greatly deteriorates the ML estimator of θ.
Snapshot 3: The meaning of "small" in the previous statement depends on the setting, as expected; here the noise is still reliably estimated. This estimation becomes less accurate as the diffusion coefficient increases; even for ML, the upper bound imposed on the SNR is then often active. Yet the estimation of the diffusion coefficient by θ̂(v), where v is arbitrarily set at 0.005, is still almost as efficient as ML in such a case.
References
[1] D. A. Girard. "Estimating a Centered Ornstein–Uhlenbeck Process under Measurement Errors" from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/EstimatingACenteredOrnsteinUhlenbeckProcessUnderMeasurementE.
[2] D. A. Girard, "Efficiently Estimating Some Common Geostatistical Models by ‘Energy–Variance Matching’ or Its Randomized ‘Conditional–Mean’ Versions," Spatial Statistics, 21(Part A), 2017 pp. 1–26. doi:10.1016/j.spasta.2017.01.001.
[3] M. Katzfuss and N. Cressie, "Bayesian Hierarchical Spatio-temporal Smoothing for Very Large Datasets," Environmetrics, 23(1), 2012 pp. 94–107. doi:10.1002/env.1147.
[4] C. Gu, Smoothing Spline ANOVA Models, 2nd ed., New York: Springer, 2013.
[5] D. A. Girard. "Three Alternatives to the Likelihood Maximization for Estimating a Centered Matérn (3/2) Process" from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/ThreeAlternativesToTheLikelihoodMaximizationForEstimatingACe.
[6] D. A. Girard. "Estimating a Centered Matérn (1) Process: Three Alternatives to Maximum Likelihood via Conjugate Gradient Linear Solvers" from the Wolfram Demonstrations Project—A Wolfram Web Resource. demonstrations.wolfram.com/EstimatingACenteredMatern1ProcessThreeAlternativesToMaximumL.
[7] D. A. Girard. "Estimating a Centered Isotropic Matérn Field from a (Possibly Incomplete and Noisy) Lattice Observation." (Oct 1, 2018) github.com/didiergirard/CGEMEV.
[8] J. Staudenmayer and J. P. Buonaccorsi, "Measurement Error in Linear Autoregressive Models," Journal of the American Statistical Association, 100(471), 2005 pp. 841–852. doi:10.1198/016214504000001871.