Section III · 1980s

The Revival & Classical ML

Backpropagation revives neural networks. Decision trees, RNNs, and Boltzmann machines emerge.

1980

Neocognitron Paper

Fukushima's hierarchical pattern recognizer, inspired by the visual cortex. Simple cells detect local features; complex cells pool them for translation invariance.

Extends the Perceptron into a hierarchical architecture and is a direct ancestor of CNNs such as LeNet, which keep the layered design and add backprop-based training.

S-cells (feature detect) → C-cells (pool/invariance) → deeper layers → recognition
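
The S-cell to C-cell pipeline can be sketched in a few lines. This is only an illustration under loud assumptions: the template `VERTICAL_EDGE` is hand-picked rather than learned, the C-cell stage uses plain max pooling (not the Neocognitron's original pooling rule), and the names `s_layer`, `c_layer`, and `make_image` are hypothetical.

```python
def s_layer(image, kernel):
    """S-cell stage: slide a small template over the image and keep
    rectified matches (valid cross-correlation followed by max(0, .))."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            match = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(kh) for j in range(kw))
            row.append(max(0, match))
        out.append(row)
    return out


def c_layer(fmap, size=2):
    """C-cell stage: maximum over non-overlapping size x size blocks."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


# Hypothetical hand-picked template: responds to a dark-to-bright vertical edge.
VERTICAL_EDGE = [[-1, 1],
                 [-1, 1]]


def make_image(edge_col, height=5, width=5):
    """Toy image: 0 to the left of edge_col, 1 from edge_col onward."""
    return [[1 if c >= edge_col else 0 for c in range(width)]
            for _ in range(height)]


image_a = make_image(1)
image_b = make_image(2)  # same edge, shifted one pixel to the right

s_a, s_b = s_layer(image_a, VERTICAL_EDGE), s_layer(image_b, VERTICAL_EDGE)
pooled_a, pooled_b = c_layer(s_a), c_layer(s_b)

print(s_a == s_b)            # feature maps differ after the shift
print(pooled_a == pooled_b)  # pooled maps agree for this small shift
```

The invariance is local: a shift that stays inside one pooling window leaves the pooled output unchanged, while a larger shift moves the response to a neighboring pooled cell. Stacking several S/C stages is what widens the tolerated shift.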

1986

RNN (Recurrent Neural Network) Paper

Networks with loops — the hidden state acts as memory, carrying information from previous time steps. Essential for sequences like text, speech, and time series.

Overcomes the Markov Chain's memoryless limitation by adding recurrence; its vanishing gradient problem is later mitigated, though not eliminated, by LSTM.

hₜ = tanh(W_h · hₜ₋₁ + W_x · xₜ + b) — hidden state = f(previous state + current input)
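
A minimal sketch of the recurrence above, using plain Python lists instead of a tensor library. The weights and the function names `rnn_step` and `run_rnn` are illustrative assumptions, not values from any paper.

```python
import math


def rnn_step(h_prev, x, W_h, W_x, b):
    """One step of h_t = tanh(W_h . h_{t-1} + W_x . x_t + b)."""
    h_new = []
    for i in range(len(b)):
        total = b[i]
        total += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        total += sum(W_x[i][k] * x[k] for k in range(len(x)))
        h_new.append(math.tanh(total))
    return h_new


def run_rnn(xs, W_h, W_x, b):
    """Feed a sequence through the cell, starting from a zero hidden state."""
    h = [0.0] * len(b)
    states = []
    for x in xs:
        h = rnn_step(h, x, W_h, W_x, b)
        states.append(h)
    return states


# Illustrative weights: 2 hidden units, 1 input feature.
W_h = [[0.5, 0.0],
       [0.0, 0.5]]
W_x = [[1.0],
       [-1.0]]
b = [0.0, 0.0]

# Only the first time step carries a nonzero input.
states = run_rnn([[1.0], [0.0], [0.0]], W_h, W_x, b)
for t, h in enumerate(states, start=1):
    print(t, h)
```

The hidden state stays nonzero at steps 2 and 3 even though the input is zero there, which is the "memory" the entry describes. It also shrinks at every step, a small-scale view of why long-range influence tends to fade in a plain RNN.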

1985

Boltzmann Machine Paper

Hinton & Sejnowski's stochastic network: each unit switches on or off at random, with a probability set by how the flip changes the network's energy. Lower energy states are more likely.

Introduces energy-based probabilistic learning that relies on Markov Chain sampling; its restricted variant (the RBM) later becomes the building block of DBN pretraining.

P(state) ∝ e^(-Energy/T), so lower energy = more probable. Energy = -Σ wᵢⱼ sᵢ sⱼ, summed over each pair i < j once (bias terms omitted)
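
For a network small enough to enumerate, the distribution can be computed exactly. The sketch below assumes binary units in {0, 1}, a symmetric hypothetical weight matrix, and no bias terms. A real Boltzmann machine samples states stochastically instead of enumerating them, since the number of states grows as 2^n.

```python
import itertools
import math


def energy(state, weights):
    """E(s) = -sum over pairs i < j of w_ij * s_i * s_j (no bias terms)."""
    n = len(state)
    return -sum(weights[i][j] * state[i] * state[j]
                for i in range(n) for j in range(i + 1, n))


def boltzmann_distribution(weights, temperature=1.0):
    """Exact P(state) = exp(-E/T) / Z by brute-force enumeration."""
    n = len(weights)
    states = list(itertools.product([0, 1], repeat=n))
    unnormalized = [math.exp(-energy(s, weights) / temperature) for s in states]
    z = sum(unnormalized)  # partition function
    return {s: u / z for s, u in zip(states, unnormalized)}


# Hypothetical symmetric weights: units 0 and 1 support each other,
# units 0 and 2 inhibit each other.
weights = [[0.0, 2.0, -1.0],
           [2.0, 0.0, 0.0],
           [-1.0, 0.0, 0.0]]

probs = boltzmann_distribution(weights)
most_likely = max(probs, key=probs.get)
hot_probs = boltzmann_distribution(weights, temperature=10.0)

print(most_likely, round(probs[most_likely], 3))
print(round(hot_probs[most_likely], 3))
```

The lowest-energy state, with units 0 and 1 on and unit 2 off, gets the highest probability. Raising the temperature flattens the distribution, so that same state becomes less dominant.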

1986

Backpropagation Paper

Rumelhart, Hinton & Williams popularized the method that made multi-layer neural networks practically trainable. Compute the error at the output, then propagate gradients backward through each layer.

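Applies the Chain Rule to multi-layer networks, which lets hidden layers be trained and so overcomes the single-layer Perceptron's XOR limitation; it underpins the training of nearly all deep learning models, from CNN to Transformer.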

∂Loss/∂wᵢ = ∂Loss/∂output · ∂output/∂hidden · ∂hidden/∂wᵢ — chain rule through layers
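
The chain-rule product above can be checked numerically on the smallest possible network. The sketch assumes one input, one tanh hidden unit, one linear output, and a squared-error loss; the names and starting values are illustrative only.

```python
import math


def forward(x, w1, w2):
    hidden = math.tanh(w1 * x)   # hidden = tanh(w1 * x)
    output = w2 * hidden         # output = w2 * hidden
    return hidden, output


def loss_fn(output, target):
    return 0.5 * (output - target) ** 2


def backprop(x, target, w1, w2):
    """Gradients of the loss with respect to w1 and w2 via the chain rule."""
    hidden, output = forward(x, w1, w2)
    dloss_doutput = output - target          # dLoss/doutput
    doutput_dhidden = w2                     # doutput/dhidden
    dhidden_dw1 = (1 - hidden ** 2) * x      # dhidden/dw1 (tanh derivative)
    grad_w1 = dloss_doutput * doutput_dhidden * dhidden_dw1
    grad_w2 = dloss_doutput * hidden         # doutput/dw2 = hidden
    return grad_w1, grad_w2


def numerical_grad_w1(x, target, w1, w2, eps=1e-6):
    """Central finite difference, used only to sanity-check backprop."""
    _, out_plus = forward(x, w1 + eps, w2)
    _, out_minus = forward(x, w1 - eps, w2)
    return (loss_fn(out_plus, target) - loss_fn(out_minus, target)) / (2 * eps)


x, target, w1, w2 = 0.5, 1.0, 0.8, -0.3
grad_w1, grad_w2 = backprop(x, target, w1, w2)
check_w1 = numerical_grad_w1(x, target, w1, w2)

# One gradient-descent step with an arbitrary learning rate of 0.1.
loss_before = loss_fn(forward(x, w1, w2)[1], target)
loss_after = loss_fn(forward(x, w1 - 0.1 * grad_w1, w2 - 0.1 * grad_w2)[1], target)

print(grad_w1, check_w1)
print(loss_before, loss_after)
```

The analytic and finite-difference gradients should agree to several decimal places, and a single small step along the negative gradient lowers the loss. In deeper networks the same product simply gains one factor per layer.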

1986

Decision Tree Paper

Quinlan's ID3 algorithm — recursively split data on the feature that gives the most information gain. Simple, fast, and explainable.

A non-neural alternative to the Perceptron; decision trees are later combined into ensembles such as Random Forest, GBDT, and XGBoost.

Split on feature with max Information Gain = H(parent) - Σ (|child|/|parent|) H(child)
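
A short sketch of the split criterion, assuming categorical features stored as dictionaries. The four-row dataset and the feature names `outlook` and `windy` are made up for illustration, and the code shows only the choice of a single split, not the full recursive ID3 procedure.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """H(parent) - sum over children of (|child| / |parent|) * H(child)."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


def best_feature(rows, labels):
    """Feature with the highest information gain for the next split."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy data: "outlook" fully determines the label, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

gain_outlook = information_gain(rows, labels, "outlook")
gain_windy = information_gain(rows, labels, "windy")
chosen = best_feature(rows, labels)

print(gain_outlook, gain_windy, chosen)
```

Splitting on `outlook` yields pure children and removes the full bit of parent entropy, while splitting on `windy` leaves both children as mixed as the parent, so the first split lands on `outlook`.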