This Demonstration illustrates the operation of the Kraskov–Stögbauer–Grassberger (KSG) estimator [1] of mutual information on a small dataset with a nonlinear dependence structure, which cannot be captured by the Pearson correlation coefficient.

The KSG estimator employs an adaptive kernel size: for each point, the size of the kernel (green square) is determined by the distance to its $k$th nearest neighbor ($k=1$ in this Demonstration; the nearest neighbors are highlighted in red). By clicking the points (or using the slider), you can observe that the kernel size is small in densely populated regions of the space but increases in sparsely populated regions. This strategy enables the KSG estimator to overcome some of the limitations of fixed-size kernel methods and of binning methods, for example, by preventing empty bins.
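This adaptive rule can be sketched in a few lines of Python (the helper name and the example points are illustrative, not part of the Demonstration): the kernel half-width at a point is its Chebyshev (max-norm) distance to its nearest neighbor, so it shrinks in dense regions and grows in sparse ones.

```python
def nn_chebyshev_distance(points, i):
    """Chebyshev (max-norm) distance from points[i] to its nearest neighbor.

    This distance sets the size of the adaptive kernel square at point i.
    """
    xi, yi = points[i]
    return min(max(abs(xj - xi), abs(yj - yi))
               for j, (xj, yj) in enumerate(points) if j != i)

# Three clustered points and one isolated point: the kernel is
# small inside the dense cluster and large at the isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
print(nn_chebyshev_distance(points, 0))  # small: 0.1
print(nn_chebyshev_distance(points, 3))  # large: 5.0
```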

A low estimation bias is achieved by using the same kernel size to compute the entropy of the joint-space distribution (green square) and of the marginal distributions (gray stripes). The local mutual information estimate for each point $i$ is computed from the numbers of neighbors $n_x(i)$ and $n_y(i)$ in the marginal spaces:

$I(i) = \psi(k) + \psi(N) - \psi(n_x(i)+1) - \psi(n_y(i)+1)$,

where $\psi$ is the digamma function and $N$ is the number of samples.

The final estimate of the mutual information is computed as the average over the local values:

$I(X;Y) = \frac{1}{N} \sum_{i=1}^{N} I(i)$.
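Putting both steps together, the estimator can be sketched in self-contained Python (function and variable names are my own; a production implementation would use a k-d tree for the neighbor searches rather than this brute-force loop). Since every digamma argument in the formula is a positive integer, $\psi(n) = -\gamma + \sum_{k=1}^{n-1} 1/k$ suffices:

```python
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def digamma(n):
    # Digamma at a positive integer n: psi(n) = -gamma + H_{n-1},
    # which covers every argument the KSG formula produces.
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))

def ksg_mutual_information(xs, ys, k=1):
    """KSG (algorithm 1) estimate of I(X;Y) in nats for scalar samples.

    Brute-force O(N^2) neighbor search, for illustration only.
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Chebyshev (max-norm) distances to all other points in the joint space
        dists = sorted(max(abs(xs[j] - xs[i]), abs(ys[j] - ys[i]))
                       for j in range(n) if j != i)
        eps = dists[k - 1]  # distance to the k-th nearest neighbor
        # Count marginal-space neighbors strictly inside the kernel
        n_x = sum(1 for j in range(n) if j != i and abs(xs[j] - xs[i]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(ys[j] - ys[i]) < eps)
        # Local estimate I(i), accumulated for the final average
        total += digamma(k) + digamma(n) - digamma(n_x + 1) - digamma(n_y + 1)
    return total / n

# Strong dependence yields a large estimate; independence, roughly zero.
random.seed(0)
xs = [random.random() for _ in range(200)]
ys_dep = [x + 0.05 * random.random() for x in xs]  # nearly deterministic in x
ys_ind = [random.random() for _ in range(200)]     # independent of xs
print(ksg_mutual_information(xs, ys_dep, k=3))  # clearly positive
print(ksg_mutual_information(xs, ys_ind, k=3))  # close to zero
```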

Efficient implementations of the KSG algorithm and other information-theoretic estimators are provided by dedicated toolboxes (e.g. the JIDT toolbox [2]).

[1] A. Kraskov, H. Stögbauer and P. Grassberger, "Estimating Mutual Information," Physical Review E, 69(6), 2004. doi:10.1103/PhysRevE.69.066138.

[2] J. T. Lizier, "JIDT: An Information-Theoretic Toolkit for Studying the Dynamics of Complex Systems," Frontiers in Robotics and AI, 1, 2014. doi:10.3389/frobt.2014.00011.