Section II · 1960s–1970s

Early Exploration & The First AI Winter

k-NN, Naive Bayes, and the chain rule — simple but powerful ideas that still matter today.

1967

k-Nearest Neighbors Paper

No training needed — classify a new point by majority vote of its k closest known samples. Simple yet surprisingly effective.

A non-parametric alternative to the Perceptron's linear decision boundary; its distance-based approach later inspires kernel methods in SVMs.
prediction = mode(labels of k nearest neighbors)
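The majority-vote rule above fits in a few lines. A minimal sketch (function and variable names are illustrative, not from any library):

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote of its k nearest training points."""
    # Rank training points by Euclidean distance to the query.
    by_distance = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    # Take the mode of the labels of the k closest points.
    votes = Counter(train_labels[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D example: two well-separated clusters.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5)))  # → a
```

Note there is no training step at all: the "model" is simply the stored data, which is exactly why k-NN is called a lazy, non-parametric method.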
1960s

Naive Bayes Classifier Paper

Assumes features are independent (they usually aren't!), yet amazingly effective for spam filtering and text classification.

Directly applies Bayes' Theorem to classification with the "naive" independence assumption; probabilistic thinking later extends to GMM+EM.
P(class|features) ∝ P(class) × ∏ P(featureᵢ | class)
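The formula above becomes a working spam filter once you estimate each factor from counts. A minimal sketch (helper names are illustrative; Laplace smoothing is added so unseen words don't zero out the product, and log-probabilities avoid underflow):

```python
from collections import Counter, defaultdict
import math

def train_nb(docs, labels):
    """Count words per class to estimate P(class) and P(word | class)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)   # class -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        for word in doc.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict_nb(doc, class_counts, word_counts, vocab):
    """Pick the class maximizing log P(class) + Σ log P(wordᵢ | class)."""
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n in class_counts.items():
        score = math.log(n / total)      # log prior
        denom = sum(word_counts[c].values()) + len(vocab)  # Laplace smoothing
        for word in doc.split():
            # "Naive" step: multiply per-word likelihoods independently.
            score += math.log((word_counts[c][word] + 1) / denom)
        if score > best_score:
            best, best_score = c, score
    return best

docs = ["win cash now", "cheap cash offer", "meeting at noon", "lunch at noon"]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(predict_nb("cash offer now", *model))  # → spam
```

The independence assumption is clearly false (words co-occur!), yet the argmax is often still right — the class probabilities are miscalibrated, but their ranking survives.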
1970

Automatic Differentiation (Chain Rule) Paper

Linnainmaa's reverse-mode method puts the chain rule in code — compute gradients backward from output to inputs in a single pass. The mathematical foundation of ALL neural network training.

Extends Adaline's gradient idea to arbitrary computation graphs; directly enables Backpropagation in deep networks.
∂L/∂x = (∂L/∂z) · (∂z/∂y) · (∂y/∂x) — multiply local gradients backward!
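That backward multiplication of local gradients can be shown with a tiny scalar autodiff node. A minimal sketch (class and method names are illustrative, in the style of modern autodiff libraries, not from Linnainmaa's thesis): each operation records its parents and local gradients, and `backward()` walks the graph in reverse topological order applying the chain rule.

```python
class Value:
    """Minimal reverse-mode autodiff node: a scalar plus its gradient."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents           # (parent_node, local_gradient) pairs

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self):
        # Topologically order the graph, then push gradients from the
        # output back to the inputs: parent.grad += node.grad × local.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0                  # seed ∂L/∂L = 1 at the output
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local

x = Value(3.0)
y = Value(4.0)
z = x * y + x        # z = xy + x, so ∂z/∂x = y + 1 = 5, ∂z/∂y = x = 3
z.backward()
print(x.grad, y.grad)  # → 5.0 3.0
```

This is exactly the backpropagation pattern later used to train deep networks: one forward pass to build the graph, one backward pass multiplying local gradients.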