k-Nearest Neighbor (kNN) Classifier

In §2.3.2 of [1], Hastie et al. pointed out that Voronoi tessellations may be used to visualize the performance of the kNN classifier and produced several examples. The data used here is generated independently of the mixture data used in those examples, but the overall setup is the same: each class is generated from a mixture of ten normal distributions with the same means and variances as suggested in §2.3.3 of [1].

For this model, it can be shown that the optimal Bayes misclassification rate is […]. This assumes perfect knowledge of the model. If k increases as the training sample size, n, also increases, the misclassification rate of kNN on test data will tend to the Bayes rate.

With a given finite set of training data (in the present case, n = […]), we can ask what the best possible choice of k in the kNN algorithm is for predicting future test data. This can be determined by simulation. We simulated a test sample of size […] and calibrated the misclassification rate for k = […]. It was found that […] when […], and that the standard deviation for […] was sufficiently narrow to exclude other possible values of k.

In-depth treatments of the kNN method are provided in Chapter 13 of [1] and §6.2 of [3].

[1] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., New York: Springer, 2009.
[2] C. C. Holmes and N. M. Adams, "Likelihood Inference in Nearest-Neighbour Classification Models," Biometrika, 90, 2003 pp. 99–112.

"k-Nearest Neighbor (kNN) Classifier" from The Wolfram Demonstrations Project
http://demonstrations.wolfram.com/KNearestNeighborKNNClassifier/
Contributed by: Ian McLeod
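The simulation described above (fix a training set, then estimate the test misclassification rate of kNN for a range of k) can be sketched in plain Python. This is a minimal illustration rather than the Demonstration's own code. The mixture parameters follow the description in [1] as recalled here (ten centers per class drawn from a unit-variance normal around (1, 0) or (0, 1), with observations scattered around a randomly chosen center with variance 1/5), and they should be treated as assumptions. The training size, test size, grid of k values, random seed, and all function names are hypothetical choices made for the example.

```python
import math
import random


def make_centers(rng, mean, n_centers=10):
    """Draw mixture centers from N(mean, I) (assumed setup, after [1])."""
    return [(rng.gauss(mean[0], 1.0), rng.gauss(mean[1], 1.0))
            for _ in range(n_centers)]


def sample_class(rng, centers, n, sd=math.sqrt(1.0 / 5.0)):
    """Draw n points: pick a center uniformly, then add N(0, sd^2 I) noise."""
    points = []
    for _ in range(n):
        cx, cy = rng.choice(centers)
        points.append((rng.gauss(cx, sd), rng.gauss(cy, sd)))
    return points


def make_data(rng, centers0, centers1, n_per_class):
    """Return (points, labels) with n_per_class points from each class."""
    points = (sample_class(rng, centers0, n_per_class)
              + sample_class(rng, centers1, n_per_class))
    labels = [0] * n_per_class + [1] * n_per_class
    return points, labels


def knn_error_rates(train_x, train_y, test_x, test_y, ks):
    """Test misclassification rate of brute-force kNN for each odd k in ks."""
    errors = {k: 0 for k in ks}
    for (x, y), label in zip(test_x, test_y):
        # Training labels ordered by squared distance to the test point.
        ordered = sorted(
            ((tx - x) ** 2 + (ty - y) ** 2, tl)
            for (tx, ty), tl in zip(train_x, train_y)
        )
        for k in ks:
            votes = sum(tl for _, tl in ordered[:k])  # votes for class 1
            predicted = 1 if 2 * votes > k else 0     # majority vote
            if predicted != label:
                errors[k] += 1
    return {k: errors[k] / len(test_y) for k in ks}


# Hypothetical sizes and grid; the values used in the source are not given.
rng = random.Random(0)
centers0 = make_centers(rng, (1.0, 0.0))
centers1 = make_centers(rng, (0.0, 1.0))
train_x, train_y = make_data(rng, centers0, centers1, 100)
test_x, test_y = make_data(rng, centers0, centers1, 500)
ks = [1, 3, 5, 7, 9, 15, 25, 51]

rates = knn_error_rates(train_x, train_y, test_x, test_y, ks)
best_k = min(rates, key=rates.get)
for k in ks:
    # Binomial standard error of the estimated rate on the test sample.
    se = math.sqrt(rates[k] * (1.0 - rates[k]) / len(test_y))
    print(f"k={k:2d}  error={rates[k]:.3f}  se={se:.3f}")
print("best k:", best_k)
```

Only odd values of k are used so that the two-class majority vote cannot tie. The printed standard error gives a rough sense of whether the difference between neighboring values of k is larger than the simulation noise; a larger test sample narrows it.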