How to do multithreading & synchronize state (CPU)

SichangHe · April 9, 2026, 3:44pm

I couldn’t find anything usable at the moment by searching online. How do I use Mojo if I need to run two different things on the CPU simultaneously? I see there’s asyncrt, but there’s no mutex, channel, or even condvar…

I’m trying to figure out whether Mojo would be viable for my next project, but I’m starting to doubt it. I would highly appreciate an explanation on this!

owenhilyard · April 9, 2026, 3:57pm

The current state of async in Mojo is that it would be generous to call it “half baked”. We ran into a bunch of language-level issues, and Mojo does not currently have a formal threading model, primarily because we need to determine one that works almost universally, applying not just to CPUs but also to GPUs and exotic ASICs. There are other things which are higher priority to fix before 1.0, such as traits, closures, and the type system, so async is likely going to take a while.

Nothing actually stops you from using C FFI to lean on the platform threading primitives, but I know that’s not exactly an ideal path. If you need to start soon, you should probably look to another language. If you can wait and are willing, once the compiler opens up it should be easier to make the language work better with threads, and you can help develop threading from a language perspective.

SichangHe · April 9, 2026, 4:12pm

Wow, thanks for the insta reply!

Yeah, kind of disappointed that a general purpose language needs FFI to do threading.

I would even say I would accept an “experimental” MVP threading support bc otherwise many programs are simply impossible. Maybe just the ability to spawn OS threads and get 1 waker primitive?

I understand tho, once you provide an API that kind of works but is very quirky, you would get tons of complaints about whatever edge case people find…

cuchaz · April 9, 2026, 5:34pm

If it helps at all, in our projects, we use Mojo as a Python extension. Then we lean on Python to create all the threads. In each thread, we hand off data to Mojo code in the extension and then drop Python’s GIL. Then, for any inter-thread synchronization, we use a custom (i.e., really dumb) Mutex built out of atomics from Mojo’s stdlib. I can point you at some open source code if you’re interested in that approach.
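
For illustration, the Python side of that pattern might look roughly like the sketch below. It is not taken from our project: `process_chunk` is a hypothetical stand-in for a function exported by a Mojo extension module, and `run_in_threads` is a made-up helper name. A pure-Python stand-in does not release the GIL, so this shows only the thread structure, not real parallelism.

```python
import threading


def process_chunk(chunk):
    # Hypothetical stand-in for a function exported by a Mojo extension.
    # A real extension would drop the GIL while it works; this pure-Python
    # version does not, so it only demonstrates the threading structure.
    return sum(x * x for x in chunk)


def run_in_threads(chunks):
    # One result slot per thread; each worker writes only to its own slot,
    # so no lock is needed on the Python side.
    results = [None] * len(chunks)

    def worker(i):
        results[i] = process_chunk(chunks[i])

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(len(chunks))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the results
    return results


totals = run_in_threads([[1, 2, 3], [4, 5], [6]])
```

Giving each thread its own output slot keeps the Python side lock-free; shared state that several threads mutate is where the Mutex mentioned above comes in.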

SichangHe · April 9, 2026, 6:08pm

Thank you! Would love to see how bad it is

And more importantly how you wake…

Ideally we’d have main stuff in mojo and only use Python for packages tho.

cuchaz · April 9, 2026, 6:14pm

Sure, here you go. It’s reaaaaly dumb.

github.com/bartesaghilab/cryoluge, src/cryoluge/sync/mutex.mojo (branch: main)

from os.atomic import Atomic, Consistency
from memory import alloc, UnsafePointer
from time import sleep


struct Mutex[
    T: AnyType & Movable = NoneType
](
    Movable
):
    var _ptr: Self._Ptr
    var _item: T

    alias _Val = Scalar[DType.int]
    alias unlocked = Self._Val(0)
    alias locked = Self._Val(1)

    alias _Ptr = UnsafePointer[Self._Val,MutOrigin.external]

This file has been truncated.

In a nutshell, we have spin waits and sleep-loop waits. It’s crude, but effective.

Not terribly high-performance though. Thankfully for us, the sync itself isn’t a bottleneck in our app. But for other apps that have greater synchronization needs, you’d probably want to FFI into the OS for some kind of thread park/signal tools.
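
As a rough illustration of the spin-then-sleep idea (this is not the cryoluge code, and it is written in Python rather than Mojo), the sketch below tries a bounded number of non-blocking acquisitions first and then falls back to a sleep loop. `SpinSleepLock`, `spin_tries`, and `sleep_seconds` are hypothetical names, and `threading.Lock.acquire(blocking=False)` stands in for the atomic compare-and-swap a real implementation would use.

```python
import threading
import time


class SpinSleepLock:
    """Illustrative lock: spin briefly, then fall back to a sleep loop."""

    def __init__(self, spin_tries=100, sleep_seconds=0.0005):
        # Assumption: a non-blocking try-lock stands in for an atomic
        # compare-and-swap on a 0/1 flag.
        self._flag = threading.Lock()
        self._spin_tries = spin_tries
        self._sleep_seconds = sleep_seconds

    def acquire(self):
        # Phase 1: spin wait, cheap when the lock is held only briefly.
        for _ in range(self._spin_tries):
            if self._flag.acquire(blocking=False):
                return
        # Phase 2: sleep-loop wait, gives up the CPU between attempts.
        while not self._flag.acquire(blocking=False):
            time.sleep(self._sleep_seconds)

    def release(self):
        self._flag.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()
        return False


counter = 0
lock = SpinSleepLock()


def bump(n):
    global counter
    for _ in range(n):
        with lock:
            counter += 1  # protected read-modify-write


threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The sleep interval is the trade-off: a waiter can sit idle for up to one full interval after the lock is released, which is why proper park/signal primitives from the OS would do better under heavy contention.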

owenhilyard · April 9, 2026, 7:26pm

> Yeah, kind of disappointed that a general purpose language needs FFI to do threading.

This is quite temporary. Nobody is happy with this state of affairs, but sadly Modular does not have unlimited software engineers, and the community can also only provide a finite amount of time. As a result, the kinds of structured parallelism you find in compute code have been added as quick workarounds. At this stage in Rust’s life, it had a garbage collector and was ~8 years away from adding the borrow checker.

> I would even say I would accept an “experimental” MVP threading support bc otherwise many programs are simply impossible.

What does spawning a thread look like on an NPU or a GPU? Do we go with what NVIDIA calls threads, or do we make it warp level? AMD’s NPU is made up of a grid of tiny processors which don’t directly share memory (they have private scratchpads and then use DMA). How do I spawn a thread on that, and what guarantees can that hardware provide? These are questions that lead to long discussions, most of which are still happening.

> Maybe just the ability to spawn OS threads and get 1 waker primitive?

Sadly, there is nothing more permanent than a temporary solution. The current heavy obfuscation of the async runtime is somewhat intentional, since it means that nobody aside from Modular, who can break compatibility whenever they need to, makes use of it. Additionally, I am a strong advocate that the stdlib async runtime should hide itself behind a lot of interfaces, mostly to help ensure portability when someone writes a “better” one, and to ensure that the one in the stdlib can adopt new OS features as necessary.
