Makemore (bigram model) implementation with mojograd

I made a simple implementation of Karpathy's makemore (bigram model) using mojograd. It embeds the inputs with one-hot encoding over a discrete set of classes, treats the layer's outputs as logits, applies softmax to get probabilities, takes the mean negative log-likelihood as the loss, and calls backward to update the weights of the single layer (a 2-D matrix). The result is a learned probability distribution describing the bigram statistics of the input data.
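As a rough sketch of that training step, here is a hypothetical NumPy version (the actual project is written in Mojo with mojograd; all names and the manual backward pass below are illustrative, not the repo's code):

```python
import numpy as np

rng = np.random.default_rng(0)
V = 27                       # vocab size: 26 letters + one boundary token
W = rng.normal(size=(V, V))  # the single 2-D weight matrix ("layer")

# toy bigram pairs: input character index -> next character index
xs = np.array([0, 5, 13])
ys = np.array([5, 13, 0])

def step(W, xs, ys, lr=1.0):
    X = np.eye(V)[xs]                               # one-hot encode inputs
    logits = X @ W                                  # outputs treated as logits
    counts = np.exp(logits)
    probs = counts / counts.sum(1, keepdims=True)   # softmax -> probabilities
    loss = -np.log(probs[np.arange(len(xs)), ys]).mean()  # mean NLL
    # manual backward pass: gradient of mean NLL through softmax
    dlogits = probs.copy()
    dlogits[np.arange(len(xs)), ys] -= 1
    dlogits /= len(xs)
    W -= lr * (X.T @ dlogits)                       # update the weight matrix
    return loss

losses = [step(W, xs, ys) for _ in range(50)]
```

Repeated steps drive the mean NLL down, which is exactly the loop the Mojo version runs via `backward()` instead of a hand-derived gradient.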

Since one-hot encoding suffers from the well-known curse of dimensionality, and, on top of that, the mojograd engine works with scalar tensors, a Value object has to be created for each element; that is why the training is done in batches.
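To see why the per-element cost matters, here is a minimal Python illustration of a micrograd-style scalar engine (the `Value` class here is a stand-in, not mojograd's actual type): with scalar autograd, a single V-by-V weight matrix already requires V*V node objects, before any inputs or intermediate results are counted.

```python
class Value:
    """Stand-in for a scalar autograd node (micrograd/mojograd-style)."""
    def __init__(self, data: float):
        self.data = data
        self.grad = 0.0

V = 27  # vocab size used by the bigram model
# one scalar Value per matrix element: V * V = 729 nodes for one layer
W = [[Value(0.0) for _ in range(V)] for _ in range(V)]
num_nodes = sum(len(row) for row in W)
```

Every one-hot input row and every intermediate of the forward pass adds more scalar nodes on top of this, which is what makes batching the training data necessary.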

Repo:

Source:

Check it out:

mojo train_makemore.mojo