Marellel is a high-performance GPU acceleration library built entirely in Mojo. It serves as both a demonstration and a tool for future projects, showing how Mojo can be used to write the low-level, high-performance GPU kernels that are traditionally written in CUDA. The library implements fundamental parallel computing patterns, from vector addition to a highly optimized parallel reduction, to showcase the power of Mojo.
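For readers new to the pattern, here is a minimal CPU-side sketch of the idea behind a tree-style parallel reduction. It is written in plain Python and is not Marellel's actual Mojo kernel; the function name `tree_reduce_sum` and the zero-padding step are assumptions made for illustration. The inner loop stands in for GPU threads that would all run at once, with a barrier between strides.

```python
def tree_reduce_sum(values):
    """Sum values pairwise, halving the active range on every step."""
    data = list(values)
    n = len(data)
    if n == 0:
        return 0

    # Pad to the next power of two with 0, the identity for addition,
    # so every stride splits the active range evenly.
    size = 1
    while size < n:
        size *= 2
    data.extend([0] * (size - n))

    stride = size // 2
    while stride > 0:
        # On a GPU, each value of tid would be a separate thread, and a
        # barrier would separate one stride from the next.
        for tid in range(stride):
            data[tid] += data[tid + stride]
        stride //= 2

    # After the last step the total sits in the first slot.
    return data[0]


if __name__ == "__main__":
    print(tree_reduce_sum([1, 2, 3, 4, 5]))
```

The point of the pattern is that the number of steps grows with the logarithm of the input size rather than linearly, provided enough threads are available to do each step's additions at the same time.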
I built Marellel as a way to push the limits of GPU computation. As someone fascinated by parallel computing, I wanted to put Mojo, the self-described GPU language, to the ultimate test. My goal for this hackathon was to explore the very foundations of GPU programming within the Mojo ecosystem. The project, named Marellel (a play on “parallel”), is the result of that exploration and an attempt to make high-performance computing more accessible.
I had zero experience building on GPUs, but I had some minor experience with Mojo, so once I had access to an NVIDIA A10 through the Lambda Labs credits, I got to work fairly quickly. After a lot of troubleshooting with GPU block configuration, I managed to get working versions of all the parallel functions across all of the Mojo modules. Once everything worked, I shut off the GPU instance so I wouldn't burn through the credits. My friend, whose credit card was on the account (I didn't have one), was worried about being charged, but thankfully I was able to fully finish the project before the credits ran out.