Mojo Q3 Roadmap Update

We’re excited to share a sneak peek of what’s planned for Mojo over the next few months! :partying_face: Along with a look ahead at what’s coming this quarter, we’ve also included a review of what shipped over the past several months. This roadmap update includes the Mojo standard library, compiler, tooling, and packaging. Please post any questions and feedback in this thread!

Mojo Standard Library

Q2 (April-June) Recap

  • Open-sourced the remaining parts of the stdlib and kernel library, with full commit history!
  • Shipped initial Python-Mojo bindings (25.4)
  • Blackwell bring-up: core intrinsics support to enable performant kernels on B200
  • Consumer-grade AMD GPU support
  • Hashing interface improvements; Dict and friends now parameterized with Hasher trait too

Overview of Q3 (July-September) Priorities

  1. Core standard library evolution
  2. Improved ergonomics of Python-Mojo bindings
  3. Kernel developer velocity
  4. Blackwell support

1. Core standard library evolution

Why?

Core abstractions and low-level building blocks are paramount for making Mojo a productive language for kernel authors and general developers.

How?

  • Generic programming improvements
    • Make SIMD conform to EqualityComparable and other traits
    • Core traits (such as applying Iterator in places)
    • Clarify bounds checking and negative indexing
    • Apply new language features to rewrite and modernize parts of the library (requires clauses and extensions, for example)
  • Quality
    • Keep closing the gaps on implicit conversions:
      • Make conversion from Int to UInt explicit
    • Fix the footgun of immutable UnsafePointers being treated as mutable in APIs
    • Improve hardware target handling and how the core library can generalize for other targets
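
To make the motivation for the Int to UInt item above concrete, here is a minimal Python sketch. Python stands in for Mojo here, and both helper names are hypothetical, not stdlib APIs. It assumes a 64-bit two's-complement representation and shows how a negative signed value becomes a huge unsigned one when reinterpreted silently, and what an explicit, checked conversion could look like instead.

```python
UINT64_MASK = (1 << 64) - 1


def reinterpret_as_uint64(value: int) -> int:
    """Model an unchecked signed -> unsigned conversion on a 64-bit target."""
    if not -(1 << 63) <= value < (1 << 63):
        raise OverflowError("value does not fit in a signed 64-bit integer")
    # Two's complement: keep the low 64 bits and read them as unsigned.
    return value & UINT64_MASK


def checked_to_uint64(value: int) -> int:
    """Model an explicit conversion that rejects negative inputs."""
    if value < 0:
        raise ValueError(f"cannot convert negative value {value} to unsigned")
    return reinterpret_as_uint64(value)


if __name__ == "__main__":
    print(reinterpret_as_uint64(7))
    print(reinterpret_as_uint64(-1))  # wraps to 2**64 - 1
    print(checked_to_uint64(7))
```

With a silent conversion, an index of -1 would show up as 18446744073709551615 rather than as an error. An explicit conversion makes that surprise visible at the call site.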

2. Improved ergonomics of Python-Mojo bindings

Why?

Python-Mojo bindings enable the integration of Mojo within a broader developer ecosystem and are an important 1.0 feature.

How?

  • Mojo methods with keyword arguments
  • Support calling Mojo static methods
  • Overloaded Mojo functions
  • Non-default constructors
  • And more!

3. Kernel developer velocity

Why?

Accelerate model bring-up time by improving productivity tools for kernel developers.

How?

  • Remove use of NDBuffer from kernels in favor of LayoutTensor
  • Apply GPU function type checking across kernel library
  • Unify layout and runtime layout
  • Improve construction and initialization of static and dynamic tensors
    • Remove LayoutTensorBuild in favor of existing constructors in LayoutTensor and/or new convenience factory methods on LayoutTensor.
    • Remove ManagedLayoutTensor so users explicitly use LayoutTensor instead of creating type bifurcation again.
  • LayoutTensor API improvements

4. Blackwell support

Why?

Enable developers to run Mojo on NVIDIA’s latest, most advanced GPUs.

How?

  • Implement performant matmul for B200
  • Make Hopper and Blackwell “feature complete”

Mojo Compiler

Q2 (April-June) Recap

  • Trait unions
  • Explicit (rather than implicit) trait conformance
    • Paves the way to conditional conformance
  • Parametric aliases
  • Reference types
  • List and dictionary literals
  • Compiler speedup
    • 10-20% on matmul kernels

# Trait unions
alias CopyableAndMovable = Copyable & Movable

# Parametric aliases
alias Scalar[DT: DType] = SIMD[DT, 1]

# List and dictionary literals
var items = [1, 2, 3]

# Reference types
for ref item in items:
    item = item + 1

Q3 (July-September) Mojo Language Features

We will continue to build up the Mojo type system, to make the language more expressive and allow writing more concise and more readable code.

  • Default methods in traits
# Default methods in traits
trait Comparable:
    fn compare(self, other: Self) -> Int:
        ...

    # Default implementations, expressed in terms of compare()
    fn __eq__(self, other: Self) -> Bool:
        return self.compare(other) == 0

    fn __ne__(self, other: Self) -> Bool:
        return self.compare(other) != 0
  • Unified closures
# Unified closures
fn main(y: Int):
    # Syntax for captures TBD
    fn closureFunc(x: Int) -> Int:
        return x + y

    # The same closure can be used in both the parameter and argument domains
    myFun[SomeParam=closureFunc](someArg=closureFunc)
  • Struct extensions
# Struct extensions
struct Complex:
    ...

# Separately register conformance of this type to a trait
# This can be done in another .mojo file
extension Complex(EqualityComparable):
    fn __eq__(self, other: Self) -> Bool:
        return self.re == other.re and self.im == other.im
  • Require clauses
# Require clauses
struct SIMD[dtype: DType, size: Int]
    requires size.is_power_of_two(), "SIMD width must be power of 2"
    requires dtype is not DType.invalid, "Invalid type"
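
As a loose analogy for the default-methods item above, the sketch below uses Python's abc module (not Mojo, and not the planned Mojo syntax) to show the same idea: a type implements one required method and inherits the comparison operators from the base. The Version class is a hypothetical example type.

```python
from abc import ABC, abstractmethod


class Comparable(ABC):
    @abstractmethod
    def compare(self, other: "Comparable") -> int:
        """Return a negative, zero, or positive number."""

    # Default methods: written once in terms of compare(), inherited by subclasses.
    def __eq__(self, other: object) -> bool:
        return self.compare(other) == 0

    def __ne__(self, other: object) -> bool:
        return self.compare(other) != 0


class Version(Comparable):
    def __init__(self, major: int, minor: int) -> None:
        self.major = major
        self.minor = minor

    # Only the required method is implemented here.
    def compare(self, other: "Version") -> int:
        mine = (self.major, self.minor)
        theirs = (other.major, other.minor)
        return (mine > theirs) - (mine < theirs)


if __name__ == "__main__":
    print(Version(1, 2) == Version(1, 2))
    print(Version(1, 2) != Version(1, 3))
```

Version never defines `__eq__` or `__ne__` itself; both come from the base class, which is the ergonomic win default trait methods are meant to bring to Mojo.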

Other Q3 Projects

Our focus is on making Mojo easier to use and shortening the iteration cycle for Mojo developers.

  • Rewrite parameter inference
    • More powerful algorithm that fixes known bugs and limitations
  • Clean up and improve error messages
    • Start with constrained[] errors
  • Keep working on speeding up the Mojo compiler

Mojo Tooling

Q2 (April-June) Recap

  • Formatter Enhancements: Implemented sorting for struct and trait conformances in mojo format, improving code organization and consistency
  • Documentation Generation:
    • Improved docstring section parsing for empty sections
    • Added associated aliases to trait documentation
    • Enhanced type name readability by trimming builtin namespaces
    • Fixed formatting for __getitem__ methods to display as subscript syntax
  • Compiler Infrastructure:
    • Consolidated all --emit-X functionality into a unified --emit=X command line parameter
  • REPL Improvements: Fixed incorrect value reporting for SIMD[DType.bool, 4] types
  • Test Infrastructure: Enhanced test discovery to skip .pixi directories
  • LSP Core Improvements:
    • Implemented selective parsing for improved performance
    • Added progress reporting for document parsing with random progress tokens
    • Enhanced document update handling with correct range processing
  • Profiling Infrastructure:
    • Connected Python Tracer color parameter to NVTX spans
    • Added environment variable support for default profiling state via MODULAR_ENABLE_PROFILING
    • Fixed tracing issues with correct nanobind type usage
    • Upgraded ROCm to version 6.3.3 for AMD GPU support
    • Fixed clean shutdown handling in serve functionality

VS Code Extension Modernization

  • Decouple the extension from the monorepo and open source the extension by moving it to its own repository.
  • Allow independent versioning and updates from stable and nightly releases of MAX
  • Align the extension release cycle with actual changes, not the entire SDK release cycle.
  • Ship a single extension for both nightly and stable releases of Mojo, updated only in response to breaking changes.
  • Simplify the extension’s architecture by relying on the user’s Python virtual environment for locating binaries, rather than bundling or using our package system to fetch them.

Mojo Packaging

  • For pure Mojo projects, we continue to recommend using Conda packages and Pixi to manage development environments and dependencies.
  • For Mojo projects with a Python interface, we plan to support packaging as .whl files.
  • We’re working to improve the experience of distributing and installing packages via the Modular community channel and Pixi, and we want to involve the community more directly in this area. Feedback on the current experience is welcome!
    • A key area for improvement is nightly compatibility. Many community packages are not compatible with the latest nightly builds, and in some cases, even the latest stable release. We want to strike a balance between easing the maintenance burden for package maintainers and improving reliability. This includes making it easier to distribute nightly-compatible versions and enforcing stricter compatibility requirements, especially with the latest stable release.

Hardware Support Update

Blackwell support:

  • Mojo now has full support for running on NVIDIA Blackwell, with all intrinsics added.
    • Examples of Mojo code that uses Blackwell are now open source.
    • FlashAttention 3-style MHA now has Blackwell support.

Community contributions are welcome to help build out standard library and kernel support on consumer GPUs, particularly AMD RDNA and Apple Silicon GPUs.

Congrats on the amazing progress so far :flexed_biceps:

I’m very curious about the last point: is it „just“ a kernel issue that prevents Apple Silicon GPU support for now? I thought there was more to it (something like the underlying infra of accessing the GPU at all).

I’d be happy to support if I can.

The team is working on enabling basic Apple Silicon GPU support and will share that when the time is right. When that happens, there will be ample kernel performance work that it would be great to collaborate on! Stay tuned.
