I learned very early the difference between knowing the name of something and knowing something.
— Richard Feynman

A History of Large Language Models
01 October 2025
I trace an academic history of some of the core ideas behind large language models, such as distributed representations, transducers, attention, the transformer, and generative pre-training.
29 April 2018
A common explanation for the reparameterization trick with variational autoencoders is that we cannot backpropagate through a stochastic node. I provide a more formal justification.
15 April 2018
Backpropagation is an algorithm that computes the gradient of a neural network, but it may not be obvious why the algorithm uses a backward pass. The answer allows us to reconstruct backprop from first principles.
From Convolution to Neural Network
24 February 2017
Most explanations of CNNs assume the reader understands the convolution operation and how it relates to image processing. I explore convolutions in detail and explain how they are implemented as layers in a neural network.