I modified the MAX version to run inference 1000 times, to amortize the graph compilation time:
import time

from max.driver import CPU, Tensor  # assuming these come from MAX's Python driver API

execution_times = []
for i in range(1000):
    start_time = time.perf_counter()
    # Run the compiled graph and copy the result back to the host.
    result = model.execute()[0]
    assert isinstance(result, Tensor)
    result = result.to(CPU())
    execution_time = time.perf_counter() - start_time
    execution_times.append(execution_time)
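(The reporting code isn't shown above; a minimal sketch of how the statistics below could be computed from execution_times, assuming the list holds per-run times in seconds and using Python's statistics module, might look like this. The report helper name is just illustrative.)

import statistics

def report(execution_times):
    # Illustrative only: convert per-run seconds to milliseconds and summarize.
    times_ms = [t * 1000 for t in execution_times]
    print(f"Execution Time Statistics ({len(times_ms)} runs):")
    print(f"Min time: {min(times_ms):.3f} ms")
    print(f"Max time: {max(times_ms):.3f} ms")
    print(f"Average time: {statistics.mean(times_ms):.3f} ms")
    print(f"Median time: {statistics.median(times_ms):.3f} ms")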
I am running on a MacBook M3.
mojo mandelbrot.mojo
Number of physical cores: 12
Vectorized: 4.213 ms
Parallelized: 0.885 ms
magic run mandelbrot
Execution Time Statistics (1000 runs):
Min time: 0.002 ms
Max time: 0.010 ms
Average time: 0.002 ms
Median time: 0.002 ms
Generally speaking, the custom ops approach involves the graph compiler, which takes care of memory planning, kernel fusion, etc. For the Mandelbrot case it uses a special foreach function, which can optimize element-wise algorithms to their fullest extent.
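(To illustrate why Mandelbrot fits an element-wise abstraction like foreach: every pixel's escape count depends only on its own coordinate, so the work can be split freely across SIMD lanes and cores. The NumPy sketch below is not the MAX foreach API; it just shows that per-pixel structure, with illustrative bounds and iteration limit.)

import numpy as np

def mandelbrot_counts(rows, cols, min_x=-2.0, max_x=0.6,
                      min_y=-1.5, max_y=1.5, max_iters=200):
    # One complex coordinate per output pixel; each pixel is independent.
    xs = np.linspace(min_x, max_x, cols)
    ys = np.linspace(min_y, max_y, rows)
    c = xs[np.newaxis, :] + 1j * ys[:, np.newaxis]

    z = np.zeros_like(c)
    counts = np.full(c.shape, max_iters, dtype=np.int32)
    active = np.ones(c.shape, dtype=bool)  # pixels that have not escaped yet

    for i in range(max_iters):
        z[active] = z[active] * z[active] + c[active]
        newly_escaped = active & (np.abs(z) > 2.0)
        counts[newly_escaped] = i  # record the escape iteration
        active &= ~newly_escaped
    return counts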
The difference is about a factor of 1000, which coincides with your number of repetitions in the MAX case, if I understood you correctly. Just to be on the safe side, could you change the number of repetitions to 10000 and confirm the difference remains a factor of 1000?
Then I changed cols and rows to 64.
The difference is now only about 10-fold.
Mojo times:
Vectorized: 0.0218 ms
Parallelized: 0.0114 ms
MAX times are the same:
Min time: 0.002 ms
Max time: 0.013 ms
Average time: 0.002 ms
Median time: 0.002 ms
@Ehsan why can't the Mojo compiler do the same optimizations as MAX does?
To be clear: both of these paths use the Mojo compiler, because the code for the Mandelbrot set in each is defined in Mojo and compiled either for direct execution in a standalone file or as a node in a MAX computational graph.
The two calculations here are performed in slightly different ways. As Ehsan indicated, the example that runs this as a graph node uses our highly efficient foreach abstraction, which uses a number of tricks to fully parallelize across available hardware on CPU or GPU. I wouldn't be surprised if it balances the load better than a manually vectorized implementation like the first.
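(If you want to dig further, one option is to sweep the grid size and watch how the per-run time of the graph version scales; a well-balanced element-wise kernel should scale roughly with the pixel count once the grid is big enough to keep all cores busy. This is only a sketch: median_run_ms and build_model are hypothetical helpers, and build_model stands in for however the Mandelbrot graph is constructed and loaded at a given size.)

import statistics
import time

def median_run_ms(model, runs=1000):
    # Time only execute(), so graph build/compile cost is excluded.
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        model.execute()
        times.append(time.perf_counter() - start)
    return statistics.median(times) * 1000.0

# Hypothetical usage, with build_model(rows, cols) standing in for
# the graph construction + session loading step:
# for size in (64, 256, 1024, 4096):
#     print(f"{size}x{size}: {median_run_ms(build_model(size, size)):.3f} ms")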