Use the "mode" buttons to choose a 2D or 3D view of the trajectories. Evaluation is shown on sorted values from a lattice of initial points.

Use the 2D slider "start" to choose the starting point.

The "steps" slider sets the number of steps used for all methods.

Use the "range" slider to set the range shown in the 2D and 3D plots and the range of the lattice of starting points in evaluation mode. To prevent escape to infinity, positions are constrained to a ball of radius

.

Use the "noise" slider to control the standard deviation of the random Gaussian noise added to the gradient.

Use the "seed" slider to choose the random number generator seed for this noise.
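As a rough illustration of how the noise, the seed, and the ball constraint interact, here is a minimal Python sketch. It is not the demonstration's source code: the radius value `RADIUS`, the helper names, and the quadratic stand-in objective are all assumptions made for the example.

```python
import numpy as np

RADIUS = 5.0  # hypothetical value; the demonstration's actual radius is not stated here


def noisy_gradient(grad, pos, sigma, rng):
    """Return grad(pos) plus zero-mean Gaussian noise with standard deviation sigma."""
    return grad(pos) + rng.normal(0.0, sigma, size=pos.shape)


def project_to_ball(pos, radius=RADIUS):
    """Rescale pos back onto the ball of the given radius if it lies outside."""
    norm = np.linalg.norm(pos)
    return pos if norm <= radius else pos * (radius / norm)


def grad_quadratic(p):
    """Gradient of the stand-in objective f(x, y) = x^2 + y^2."""
    return 2.0 * p


rng = np.random.default_rng(42)  # the same seed reproduces the same noise sequence
pos = np.array([3.0, 4.0])
g = noisy_gradient(grad_quadratic, pos, sigma=0.1, rng=rng)
pos = project_to_ball(pos - 2.0 * g)  # deliberately large step, then projection
```

Fixing the seed makes runs with different hyperparameters comparable, because every method then sees the same noise realization.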

Then you can choose one of five popular 2D test functions from [1], each shifted so that its minimum value 0 lies at (0,0), marked with a red "X".
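The shift can be sketched as follows, using the Rosenbrock function as a stand-in. Whether Rosenbrock is among the five functions is an assumption, and the helper `shifted` is a hypothetical name introduced only for this example.

```python
def rosenbrock(x, y):
    """Standard Rosenbrock function; its minimum value 0 is at (1, 1)."""
    return (1.0 - x) ** 2 + 100.0 * (y - x ** 2) ** 2


def shifted(f, x_min, y_min, f_min=0.0):
    """Translate f so that its minimum value 0 sits at (0, 0).

    (x_min, y_min) is the known minimizer of f and f_min its minimum value.
    """
    return lambda x, y: f(x + x_min, y + y_min) - f_min


rosenbrock0 = shifted(rosenbrock, 1.0, 1.0)  # minimum now at the origin
```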

Finally, you can tune the hyperparameters of the methods to investigate how the results depend on them. The source code prevents very large steps in ADAM or OGR.

Vanilla SGD uses only the "mom" learning rate; momentum additionally averages the gradient, with its own adaptation rate. ADAM has one learning rate and two adaptation rates.
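For orientation, the three update rules can be sketched in their textbook forms. This is not the demonstration's code: the parameter names `lr`, `beta`, `beta1`, `beta2` are assumptions, and momentum is written here as an exponential moving average of the gradient, which is one common convention.

```python
import numpy as np


def sgd_step(theta, g, lr):
    """Vanilla SGD: step against the gradient, scaled by the learning rate."""
    return theta - lr * g


def momentum_step(theta, g, m, lr, beta):
    """Momentum: step along an exponential moving average m of past gradients."""
    m = beta * m + (1.0 - beta) * g
    return theta - lr * m, m


def adam_step(theta, g, m, v, t, lr, beta1, beta2, eps=1e-8):
    """Adam: averaged gradient divided by the root of the averaged squared gradient."""
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    m_hat = m / (1.0 - beta1 ** t)  # bias correction, t counts steps from 1
    v_hat = v / (1.0 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


def grad(p):
    """Gradient of the stand-in objective f(x, y) = x^2 + y^2."""
    return 2.0 * p


theta_sgd = theta_mom = theta_adam = np.array([1.0, -2.0])
m_mom, m_adam, v_adam = np.zeros(2), np.zeros(2), np.zeros(2)
for t in range(1, 201):
    theta_sgd = sgd_step(theta_sgd, grad(theta_sgd), lr=0.1)
    theta_mom, m_mom = momentum_step(theta_mom, grad(theta_mom), m_mom, lr=0.1, beta=0.9)
    theta_adam, m_adam, v_adam = adam_step(
        theta_adam, grad(theta_adam), m_adam, v_adam, t, lr=0.05, beta1=0.9, beta2=0.999
    )
```

Note that the first Adam step has a size close to `lr` in every coordinate regardless of the gradient's magnitude, which is why its learning rate is not directly comparable to that of SGD.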

OGR has a parameter similar to the learning rate, one value of which would mean jumping to the modeled minimum of a parabola. It also has two adaptation rates: one for the averaged gradient in the Newton step and one for the Hessian estimator. We could assume the two are equal, simplifying and reducing computational cost; however, for generality, they are separated here, with the suggestion to use

.
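The idea of estimating curvature by online linear regression of gradients can be illustrated in one dimension. The sketch below is a simplified reading of that idea, not the demonstration's implementation or the exact algorithm of [3]: it uses a single adaptation rate `beta` for all averages (the simplification of equal rates), falls back to a plain gradient step when the curvature estimate is unusable, and clips steps with a hypothetical `max_step`. All names and default values are assumptions.

```python
def ogr_1d(grad, theta0, steps, eta=1.0, beta=0.5, lr_init=0.1, max_step=1.0):
    """1D sketch: fit g ~ hess * (theta - p) by exponentially weighted regression,
    then move a fraction eta of the way toward the modeled minimum p."""
    theta = theta0
    g = grad(theta)
    # Exponential moving averages of theta, g, theta^2 and theta*g.
    m_t, m_g, m_tt, m_tg = theta, g, theta * theta, theta * g
    theta = theta - lr_init * g  # one gradient step so that theta has nonzero variance
    for _ in range(steps):
        g = grad(theta)
        m_t = (1.0 - beta) * m_t + beta * theta
        m_g = (1.0 - beta) * m_g + beta * g
        m_tt = (1.0 - beta) * m_tt + beta * theta * theta
        m_tg = (1.0 - beta) * m_tg + beta * theta * g
        var = m_tt - m_t * m_t   # weighted variance of theta
        cov = m_tg - m_t * m_g   # weighted covariance of theta and g
        if var > 1e-12 and cov / var > 1e-12:
            hess = cov / var               # regression slope = curvature estimate
            target = m_t - m_g / hess      # zero of the fitted line = parabola minimum
            step = eta * (target - theta)  # eta = 1 jumps to the modeled minimum
        else:
            step = -lr_init * g            # fallback: plain gradient step
        step = max(-max_step, min(max_step, step))  # prevent very large steps
        theta = theta + step
    return theta


# Parabola f(theta) = 1.5 * (theta - 2)^2 with gradient 3 * (theta - 2).
result = ogr_1d(lambda th: 3.0 * (th - 2.0), theta0=0.0, steps=10)
```

On an exact parabola the fitted line recovers the curvature exactly, so with `eta=1.0` the iterate reaches the minimum as soon as the step is no longer clipped. With gradient noise, smaller `eta` and `beta` values trade speed for stability.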

[3] J. Duda, "Improving SGD Convergence by Online Linear Regression of Gradients in Multiple Statistically Relevant Directions." arxiv.org/abs/1901.11457.