Attempting to scale a normalized probability-counts model to longer contexts grows exponentially in table size. You can use a Multi-Layer Perceptron (MLP) as a solution, trained to maximize the log-likelihood of the training data. MLPs let you make predictions by embedding words close together in a space, so that knowledge transfers with good confidence between interchangeable words. With a vocabulary of 17000 […]
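A minimal sketch of that idea in PyTorch, assuming a character-level setup (the 27-character vocabulary, 3-character context, and layer sizes below are all hypothetical stand-ins, not the post's exact numbers):

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: 27 characters, 3-char context, small embedding/hidden dims.
vocab_size, block_size, emb_dim, hidden = 27, 3, 10, 200

C  = torch.randn(vocab_size, emb_dim, requires_grad=True)          # embedding table
W1 = torch.randn(block_size * emb_dim, hidden, requires_grad=True)
b1 = torch.randn(hidden, requires_grad=True)
W2 = torch.randn(hidden, vocab_size, requires_grad=True)
b2 = torch.randn(vocab_size, requires_grad=True)

# Dummy batch of 32 contexts and next-character targets.
X = torch.randint(0, vocab_size, (32, block_size))
Y = torch.randint(0, vocab_size, (32,))

emb = C[X]                                   # (32, 3, 10): similar tokens land nearby
h = torch.tanh(emb.view(32, -1) @ W1 + b1)   # hidden layer over the flattened context
logits = h @ W2 + b2                         # unnormalized next-character scores
loss = F.cross_entropy(logits, Y)            # negative mean log-likelihood
loss.backward()                              # minimizing this maximizes log-likelihood
```

The embedding table `C` is the key piece: interchangeable tokens end up with nearby embedding vectors, so the model generalizes across them instead of keeping an exponentially large count table.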
Machine Learning Blog Posts
Notes for – The spelled-out intro to language modeling: building makemore
Derived from: https://github.com/karpathy/nn-zero-to-hero/blob/master/lectures/makemore/makemore_part1_bigrams.ipynb Makemore's purpose: to make more of the examples you give it. E.g., training makemore on a dataset of names will generate unique-sounding names. The dataset is used to train a character-level language model: modelling a sequence of characters and predicting the next character in the sequence. makemore implements a series of language […]
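A minimal sketch of the bigram counting approach from that notebook; the three-name toy list here is a hypothetical stand-in for the full names dataset:

```python
import torch

# Hypothetical toy dataset standing in for the full names file.
words = ["emma", "olivia", "ava"]
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0                                   # start/end-of-name token
itos = {i: s for s, i in stoi.items()}

# Count every (previous char, next char) pair, including start/end.
N = torch.zeros(len(stoi), len(stoi), dtype=torch.int32)
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2 in zip(cs, cs[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Normalize each row into next-character probabilities and sample a name.
P = (N + 1).float()                             # +1 smoothing avoids zero rows
P /= P.sum(1, keepdim=True)
g = torch.Generator().manual_seed(2147483647)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], 1, generator=g).item()
    if ix == 0:                                 # sampled the end token
        break
    out.append(itos[ix])
print("".join(out))
```

Each row of `P` answers "given this character, what character is likely next?", and sampling from those rows character by character is what produces new, unique-sounding names.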
Notes for – The spelled-out intro to neural networks and backpropagation: building micrograd
Summary of above: Backpropagation is just a recursive application of the chain rule backward through the graph, storing the computed derivative as a gradient (grad) variable within each node. Gotcha! When the same term appears multiple times in an expression, we must be careful to differentiate with respect to all of its occurrences and accumulate the contributions, instead of deriving just a single term! E.g., b […]
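A minimal sketch of that gotcha, assuming a micrograd-style `Value` class (simplified here to addition only):

```python
# Micrograd-style autograd node, reduced to the minimum needed to show
# why gradients must accumulate with += when a node feeds multiple terms.
class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # += (not =) so a node used in several terms gets every contribution
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

b = Value(3.0)
c = b + b          # b appears in both terms, so dc/db should be 2
c.grad = 1.0
c._backward()
print(b.grad)      # 2.0 with +=; a plain '=' would wrongly report 1.0
```

Overwriting the gradient with `=` would silently keep only the last occurrence's contribution, which is exactly the bug the gotcha warns about.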