mojoBLAS: A pure Mojo implementation of BLAS routines (yes, peak naming creativity).
I started this while working on numerical backends for NuMojo and ended up going down the rabbit hole of implementing the full set of BLAS routines in Mojo. It will later be embedded into my existing work on SciJo.
Current coverage
- Level 1: 12 routines
- Level 2: 16 routines
- Level 3: 6 routines
What’s included
- Pure Mojo kernels: no external BLAS dependencies in the core implementation.
- Generic support for real data types via `DType`
  - Traditional `s`/`d` prefixes are removed since Mojo handles this generically
- Test coverage across all levels
- Results validated against OpenBLAS
- Benchmark scripts comparing performance with system OpenBLAS
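
To make the "no `s`/`d` prefix" point concrete, here is a rough Python analogy. It is not mojoBLAS code: the name `axpy` and its argument order simply follow the reference BLAS convention, and the sketch assumes positive increments only. One routine serves both single- and double-precision buffers, where classic BLAS would need `saxpy` and `daxpy`.

```python
from array import array


def axpy(n, alpha, x, incx, y, incy):
    """Compute y <- alpha * x + y over n strided elements, in place.

    Hypothetical helper for illustration; works on any indexable
    numeric buffer, so no per-precision variant is needed.
    """
    if n <= 0 or alpha == 0:
        return y  # nothing to do, matching the reference BLAS quick return
    if incx <= 0 or incy <= 0:
        raise ValueError("this sketch only handles positive increments")
    for i in range(n):
        y[i * incy] += alpha * x[i * incx]
    return y


# Single precision ("s" in classic BLAS naming)
xs = array("f", [1.0, 2.0, 3.0])
ys = array("f", [10.0, 20.0, 30.0])
axpy(3, 2.0, xs, 1, ys, 1)

# Double precision ("d" in classic BLAS naming), same routine
xd = array("d", [1.0, 2.0, 3.0])
yd = array("d", [10.0, 20.0, 30.0])
axpy(3, 2.0, xd, 1, yd, 1)
```

In Mojo the element type would be a compile-time parameter rather than a runtime property of the buffer, but the idea is the same: the routine is written once and the data type is supplied by the caller.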
What’s not included (yet)
- Optimized routines
  - Level 2 and Level 3 currently use naive implementations (this is reflected in the benchmarks xD)
  - Some Level 1 routines already use SIMD vectorisation.
- Complex number support
  - Still exploring the best way to represent and handle complex types generically in Mojo.
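
For readers wondering what "naive" means for Level 3, here is a minimal sketch of an unblocked matrix multiply in Python. It is an illustration, not the mojoBLAS kernel: the name `gemm_naive` is hypothetical, it assumes row-major lists of lists, and it skips the transpose options. The textbook triple loop computes C <- alpha * A * B + beta * C with no blocking, packing, or vectorisation, which is why tuned libraries such as OpenBLAS pull far ahead on large matrices.

```python
def gemm_naive(alpha, a, b, beta, c):
    """Compute C <- alpha * A @ B + beta * C in place (no transposes).

    Hypothetical helper: a is m x k, b is k x n, c is m x n,
    all stored as row-major lists of lists.
    """
    m, k = len(a), len(a[0])
    n = len(b[0])
    if len(b) != k or len(c) != m or any(len(row) != n for row in c):
        raise ValueError("incompatible shapes")
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):  # dot product of row i of A and column j of B
                acc += a[i][p] * b[p][j]
            c[i][j] = alpha * acc + beta * c[i][j]
    return c


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
gemm_naive(1.0, a, b, 1.0, c)
```

The inner loop walks down a column of `b`, which is a strided access in row-major storage. Reordering loops and tiling for cache are the usual first steps when optimising a kernel like this.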
Check out the benchmark plots here. There’s plenty of room for optimization, from low-hanging improvements to more hardcore tuning. Contributions are very welcome!!! If you are interested in going down the optimisation rabbit hole, I’ll be happy to take PRs. Also, I’m not super confident in my benchmarking abilities, so if that’s your thing, feel free to take a crack at it.
Happy computing!
Repo: shivasankarka/mojoBLAS on GitHub (Implementation of BLAS routines in pure Mojo 🔥)
Reference: https://www.netlib.org/blas/