Makemore (bigram model) implementation with mojograd

I made a simple implementation of Karpathy's makemore (bigram model) using mojograd. It embeds the inputs with one-hot encoding over a discrete set of classes, treats the layer's outputs as logits, applies softmax to get probabilities, takes the mean negative log-likelihood as the loss, and calls backward to update the weights of the single layer (a 2-D matrix). The result is a learned probability distribution describing the bigram statistics of the input data.
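As a rough sketch of that training step, here is a hypothetical NumPy version (the actual project is written in Mojo with mojograd; all names and the manual backward pass below are illustrative, not the repo's code):

```python
import numpy as np

rng = np.random.default_rng(0)
V = 27                       # vocab size: 26 letters + one boundary token
W = rng.normal(size=(V, V))  # the single 2-D weight matrix ("layer")

# toy bigram pairs: input character index -> next character index
xs = np.array([0, 5, 13])
ys = np.array([5, 13, 0])

def step(W, xs, ys, lr=1.0):
    X = np.eye(V)[xs]                               # one-hot encode inputs
    logits = X @ W                                  # outputs treated as logits
    counts = np.exp(logits)
    probs = counts / counts.sum(1, keepdims=True)   # softmax -> probabilities
    loss = -np.log(probs[np.arange(len(xs)), ys]).mean()  # mean NLL
    # manual backward pass: gradient of mean NLL through softmax
    dlogits = probs.copy()
    dlogits[np.arange(len(xs)), ys] -= 1
    dlogits /= len(xs)
    W -= lr * (X.T @ dlogits)                       # update the weight matrix
    return loss

losses = [step(W, xs, ys) for _ in range(50)]
```

Repeated steps drive the mean NLL down, which is exactly the loop the Mojo version runs via `backward()` instead of a hand-derived gradient.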

Since one-hot encoding suffers from the well-known curse of dimensionality, and, on top of that, the mojograd engine works with scalar tensors, a Value object has to be created for each element; that is why the training is done in batches.
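To see why the per-element cost matters, here is a minimal Python illustration of a micrograd-style scalar engine (the `Value` class here is a stand-in, not mojograd's actual type): with scalar autograd, a single V-by-V weight matrix already requires V*V node objects, before any inputs or intermediate results are counted.

```python
class Value:
    """Stand-in for a scalar autograd node (micrograd/mojograd-style)."""
    def __init__(self, data: float):
        self.data = data
        self.grad = 0.0

V = 27  # vocab size used by the bigram model
# one scalar Value per matrix element: V * V = 729 nodes for one layer
W = [[Value(0.0) for _ in range(V)] for _ in range(V)]
num_nodes = sum(len(row) for row in W)
```

Every one-hot input row and every intermediate of the forward pass adds more scalar nodes on top of this, which is what makes batching the training data necessary.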

Repo:

Source:

Check it out:

mojo train_makemore.mojo