Section II · 1960s–1970s

Early Exploration & The First AI Winter

k-NN, Naive Bayes, and the chain rule — simple but powerful ideas that still matter today.

1967

k-Nearest Neighbors Paper

No training needed — classify a new point by majority vote of its k closest known samples. Simple yet surprisingly effective.

A non-parametric alternative to the Perceptron's linear decision boundary; its distance-based approach later inspires kernel methods in SVMs.
prediction = mode(labels of k nearest neighbors)
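The majority-vote rule above fits in a few lines. A minimal sketch (function and variable names are illustrative, not from any library):

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote of its k nearest training points."""
    # Rank training points by Euclidean distance to the query.
    by_distance = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    # Take the mode of the labels of the k closest points.
    votes = Counter(train_labels[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D example: two well-separated clusters.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5)))  # → a
```

Note there is no training step at all: the "model" is simply the stored data, which is exactly why k-NN is called a lazy, non-parametric method.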
1960s

Naive Bayes Classifier Paper

Assumes features are independent (they usually aren't!), yet amazingly effective for spam filtering and text classification.

Directly applies Bayes' Theorem to classification with the "naive" independence assumption; probabilistic thinking later extends to GMM+EM.
P(class|features) ∝ P(class) × ∏ P(featureᵢ | class)
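The formula above becomes a working spam filter once you estimate each factor from counts. A minimal sketch (helper names are illustrative; Laplace smoothing is added so unseen words don't zero out the product, and log-probabilities avoid underflow):

```python
from collections import Counter, defaultdict
import math

def train_nb(docs, labels):
    """Count words per class to estimate P(class) and P(word | class)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)   # class -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        for word in doc.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict_nb(doc, class_counts, word_counts, vocab):
    """Pick the class maximizing log P(class) + Σ log P(wordᵢ | class)."""
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n in class_counts.items():
        score = math.log(n / total)      # log prior
        denom = sum(word_counts[c].values()) + len(vocab)  # Laplace smoothing
        for word in doc.split():
            # "Naive" step: multiply per-word likelihoods independently.
            score += math.log((word_counts[c][word] + 1) / denom)
        if score > best_score:
            best, best_score = c, score
    return best

docs = ["win cash now", "cheap cash offer", "meeting at noon", "lunch at noon"]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(predict_nb("cash offer now", *model))  # → spam
```

The independence assumption is clearly false (words co-occur!), yet the argmax is often still right — the class probabilities are miscalibrated, but their ranking survives.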
1970

Automatic Differentiation (Chain Rule) Paper

Linnainmaa's reverse-mode method puts the chain rule in code — compute gradients backward from output to inputs in a single pass. The mathematical foundation of ALL neural network training.

Extends Adaline's gradient idea to arbitrary computation graphs; directly enables Backpropagation in deep networks.
∂L/∂x = (∂L/∂z) · (∂z/∂y) · (∂y/∂x) — multiply local gradients backward!
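That backward multiplication of local gradients can be shown with a tiny scalar autodiff node. A minimal sketch (class and method names are illustrative, in the style of modern autodiff libraries, not from Linnainmaa's thesis): each operation records its parents and local gradients, and `backward()` walks the graph in reverse topological order applying the chain rule.

```python
class Value:
    """Minimal reverse-mode autodiff node: a scalar plus its gradient."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents           # (parent_node, local_gradient) pairs

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self):
        # Topologically order the graph, then push gradients from the
        # output back to the inputs: parent.grad += node.grad × local.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0                  # seed ∂L/∂L = 1 at the output
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local

x = Value(3.0)
y = Value(4.0)
z = x * y + x        # z = xy + x, so ∂z/∂x = y + 1 = 5, ∂z/∂y = x = 3
z.backward()
print(x.grad, y.grad)  # → 5.0 3.0
```

This is exactly the backpropagation pattern later used to train deep networks: one forward pass to build the graph, one backward pass multiplying local gradients.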