Deep belief nets, autoencoders, gradient boosting, and neural language models set the stage for the deep learning revolution.
Hinton's breakthrough — train deep networks by stacking Restricted Boltzmann Machines one layer at a time. Each layer learns increasingly abstract features.
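A minimal sketch of that greedy layer-wise idea, assuming binary data and one step of contrastive divergence (CD-1); the layer sizes, learning rate, and epoch count are illustrative, not from the original recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=50):
    """Train one RBM with CD-1; return weights, hidden bias, hidden activations."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.1, (n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        v0 = data
        h0_prob = sigmoid(v0 @ W + b_h)
        h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)  # sample hidden
        v1_prob = sigmoid(h0 @ W.T + b_v)                         # reconstruct
        h1_prob = sigmoid(v1_prob @ W + b_h)
        # CD-1 update: positive statistics minus reconstruction statistics
        W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(data)
        b_v += lr * (v0 - v1_prob).mean(axis=0)
        b_h += lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_h, sigmoid(data @ W + b_h)

# Greedy stacking: layer 2 trains on layer 1's hidden activations,
# so each layer sees (and abstracts) the features of the one below.
X = (rng.random((32, 16)) < 0.5).astype(float)
W1, b1, H1 = train_rbm(X, n_hidden=8)
W2, b2, H2 = train_rbm(H1, n_hidden=4)
```

After pretraining, the stacked weights would typically initialize a deep network that is fine-tuned with backprop.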
Sparse autoencoders — compress data through a bottleneck, then reconstruct it. A sparsity constraint ensures only a few neurons activate at once — forcing efficient, meaningful features.
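A toy version in NumPy, assuming Gaussian data, a sigmoid encoder, a linear decoder, and an L1 penalty on hidden activations as the sparsity constraint (one common choice; a KL penalty on average activation is another). All sizes and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = rng.normal(size=(64, 8))            # toy data: 64 samples, 8 features
n_hidden = 4                            # bottleneck smaller than the input
lam = 1e-3                              # sparsity weight (assumed)
lr = 0.05
W_enc = rng.normal(0, 0.1, (8, n_hidden))
W_dec = rng.normal(0, 0.1, (n_hidden, 8))

mse_initial = np.mean((sigmoid(X @ W_enc) @ W_dec - X) ** 2)

for _ in range(500):
    H = sigmoid(X @ W_enc)              # encode through the bottleneck
    X_hat = H @ W_dec                   # linear reconstruction
    err = X_hat - X
    # gradient of MSE plus the L1 sparsity term on activations
    dH = err @ W_dec.T + lam * np.sign(H)
    dpre = dH * H * (1 - H)             # sigmoid derivative
    W_dec -= lr * H.T @ err / len(X)
    W_enc -= lr * X.T @ dpre / len(X)

mse_final = np.mean((sigmoid(X @ W_enc) @ W_dec - X) ** 2)
```

The L1 term pushes hidden activations toward zero, so reconstruction must be carried by the few units that stay active.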
Denoising autoencoders — corrupt the input with noise, then train the network to reconstruct the CLEAN original. Forces robust features that capture the true structure of the data rather than memorizing pixels.
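The same toy setup with the denoising twist: corrupt the input (here with masking noise, zeroing 30% of entries — one of the standard corruption choices) but compute the loss against the clean input. Everything else is an illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = rng.normal(size=(64, 8))
W_enc = rng.normal(0, 0.1, (8, 6))
W_dec = rng.normal(0, 0.1, (6, 8))
lr = 0.05

mse_initial = np.mean((sigmoid(X @ W_enc) @ W_dec - X) ** 2)

for _ in range(500):
    # masking noise: randomly zero out 30% of the inputs
    mask = (rng.random(X.shape) > 0.3).astype(float)
    X_noisy = X * mask
    H = sigmoid(X_noisy @ W_enc)        # encode the CORRUPTED input
    X_hat = H @ W_dec
    err = X_hat - X                     # ...but reconstruct the CLEAN one
    dpre = (err @ W_dec.T) * H * (1 - H)
    W_dec -= lr * H.T @ err / len(X)
    W_enc -= lr * X_noisy.T @ dpre / len(X)

mse_final = np.mean((sigmoid(X @ W_enc) @ W_dec - X) ** 2)
```

Because any feature can vanish from the input, the encoder must learn redundant, structure-aware features instead of copying individual inputs through.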
Friedman's gradient boosting — each new tree fits the RESIDUAL errors of the current ensemble (the negative gradient of the loss; for squared error, literally the residuals). Sequentially reduces the loss by correcting the ensemble's current mistakes.
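A compact sketch of the residual-fitting loop, assuming squared-error loss and depth-1 regression stumps as the weak learner (a common minimal choice; the data, shrinkage rate, and round count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = np.sin(4 * x)                       # toy 1-D regression target

def fit_stump(x, y):
    """Best single-split regression stump (minimizes squared error)."""
    best = None
    for t in np.unique(x)[1:]:
        left, right = y[x < t].mean(), y[x >= t].mean()
        sse = ((y - np.where(x < t, left, right)) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left, right)
    _, t, left, right = best
    return lambda z, t=t, l=left, r=right: np.where(z < t, l, r)

lr = 0.5                                # shrinkage (learning rate)
pred = np.zeros_like(y)
stumps = []
for _ in range(50):
    residual = y - pred                 # each new stump fits the RESIDUAL
    stump = fit_stump(x, residual)
    stumps.append(stump)
    pred += lr * stump(x)               # ensemble = shrunken sum of stumps

mse = np.mean((y - pred) ** 2)
```

Each round shrinks the residual a little, so the ensemble's error drops monotonically even though every individual stump is very weak.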
Bengio's breakthrough — predict the next word using a neural network over word embeddings. Each word gets a learned vector representation.
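A toy bigram reduction of that idea in NumPy: each word gets a learned embedding row, and a softmax over a projection of the current word's embedding predicts the next word. Bengio's actual model used a multi-word context window and a hidden layer, both dropped here; the corpus, dimensions, and learning rate are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 4                    # vocab size, embedding dim (assumed)

E = rng.normal(0, 0.1, (V, D))          # one learned vector per word
W = rng.normal(0, 0.1, (D, V))          # projection to next-word logits
lr = 0.1

pairs = [(idx[a], idx[b]) for a, b in zip(corpus, corpus[1:])]

for _ in range(500):
    for ctx, tgt in pairs:
        e = E[ctx]
        logits = e @ W
        p = np.exp(logits - logits.max())   # stable softmax
        p /= p.sum()
        grad = p.copy()
        grad[tgt] -= 1.0                    # cross-entropy gradient: p - one_hot
        dE = W @ grad
        W -= lr * np.outer(e, grad)
        E[ctx] -= lr * dE                   # embeddings are learned, not fixed

# in this corpus "on" is always followed by "the"
p = np.exp(E[idx["on"]] @ W)
p /= p.sum()
predicted = vocab[int(p.argmax())]
```

The key point survives even in this reduction: the word vectors in `E` are parameters trained by backprop, so words used in similar contexts drift toward similar representations.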