Couldn’t find anything usable ATM by searching online. How do I use Mojo if I need to run 2 different things on the CPU simultaneously? I see there’s asyncrt but there’s no mutex/channel or even condvar…
Trying to figure out if mojo would be viable for my next project but starting to doubt it. I would highly appreciate an explanation on this!
The current state of async in Mojo is that it would be generous to call it “half baked”. We ran into a bunch of language level issues and Mojo does not currently have a formal threading model, primarily because we need to determine a threading model that works almost universally, applying not just to CPUs but also to GPUs and exotic ASICs. There are other things which are higher priority to fix before 1.0, such as traits, closures and the type system, so async is likely going to take a while.
Nothing actually stops you from using C FFI to lean on the platform threading primitives, but I know that’s not exactly an ideal path. If you need to start soon, you should probably look to another language. If you can wait and are willing, once the compiler opens up it should be easier to start making the language work better with threads, and you can help develop threading from a language perspective.
Yeah, kind of disappointed that a general purpose language needs FFI to do threading.
I would even say I would accept an “experimental” MVP threading support bc otherwise many programs are simply impossible. Maybe just the ability to spawn OS threads and get 1 waker primitive?
I understand tho, once you provide an API that kind of works but is very quirky, you would get tons of complaints about whatever edge case people find…
If it helps at all, in our projects, we use Mojo as a Python extension. Then we lean on Python to create all the threads. In each thread, we hand off data to Mojo code in the extension and then drop Python’s GIL. Then, for any inter-thread synchronization, we use a custom (ie, really dumb) Mutex built out of atomics from Mojo’s stdlib. I can point you at some open source code if you’re interested in that approach.
In a nutshell, we have spin waits and sleep-loop waits. It’s crude, but effective.
Not terribly high-performance though. Thankfully for us, the sync itself isn’t a bottleneck in our app. But for other apps that have greater synchronization needs, you’d probably want to FFI into the OS for some kind of thread park/signal tools.
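To make the spin wait / sleep-loop wait idea concrete, here is a rough Python sketch of the pattern. It is not the actual code from our projects: `SpinSleepLock`, `run_workers`, and the tuning parameters are hypothetical names, and a non-blocking `threading.Lock.acquire` stands in for the atomic test-and-set that the Mojo version would get from stdlib atomics. The worker body is where each Python-created thread would call into the Mojo extension.

```python
import threading
import time


class SpinSleepLock:
    """Crude mutex: spin for a bounded number of tries, then sleep-loop."""

    def __init__(self, spin_tries=100, sleep_s=0.0005):
        # Assumption: acquire(blocking=False) plays the role of an atomic
        # test-and-set on a flag; it never parks the calling thread.
        self._flag = threading.Lock()
        self._spin_tries = spin_tries
        self._sleep_s = sleep_s

    def acquire(self):
        # Phase 1: spin wait, cheap when the lock is held only briefly.
        for _ in range(self._spin_tries):
            if self._flag.acquire(blocking=False):
                return
        # Phase 2: sleep-loop wait, gives up the CPU between attempts.
        while not self._flag.acquire(blocking=False):
            time.sleep(self._sleep_s)

    def release(self):
        self._flag.release()


def run_workers(num_threads=4, increments=1000):
    """Create the threads in Python and guard shared state with the lock."""
    lock = SpinSleepLock()
    total = [0]

    def worker():
        # Hypothetical: a real worker would hand its data to the extension here.
        for _ in range(increments):
            lock.acquire()
            try:
                total[0] += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Waiters burn CPU or add latency instead of being parked and woken by the OS, which is exactly the trade-off mentioned above.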
Yeah, kind of disappointed that a general purpose language needs FFI to do threading.
This is quite temporary, nobody is happy with this state of affairs, but sadly Modular does not have unlimited software engineers and the community can also only provide a finite amount of time. As a result, the kinds of structured parallelism you find in compute code have been added as quick workarounds. At this stage in Rust’s life, it had a garbage collector and was ~8 years away from adding the borrow checker.
I would even say I would accept an “experimental” MVP threading support bc otherwise many programs are simply impossible.
What does spawning a thread look like on an NPU or a GPU? Do we go with what NVIDIA calls threads, or do we make it warp level? AMD’s NPU is made up of a grid of tiny processors which don’t directly share memory (they have private scratchpads and then use DMA), so how do I spawn a thread on that and what guarantees can the hardware provide? These are questions that lead to long discussions, most of which are still happening.
Maybe just the ability to spawn OS threads and get 1 waker primitive?
Sadly, there is nothing more permanent than a temporary solution. The current heavy obfuscation of the async runtime is somewhat intentional, since it means that nobody aside from Modular, who can break compatibility whenever they need to, makes use of it. Additionally, I am a strong advocate that the stdlib async runtime should hide itself behind a lot of interfaces, mostly to help ensure portability when someone writes a “better” one, and to ensure that the one in the stdlib can adopt new OS features as necessary.
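As a loose illustration of what "hiding the runtime behind interfaces" could mean, here is a small Python sketch. None of this reflects Mojo's actual or planned API: `Spawner`, `ThreadSpawner`, `InlineSpawner`, and `sum_squares` are hypothetical names chosen for the example. The point is only that user code written against the interface keeps working when the backend is swapped.

```python
import threading
from typing import Callable, List, Protocol


class Spawner(Protocol):
    """Hypothetical interface that user code depends on."""

    def spawn(self, fn: Callable[[], None]) -> None: ...

    def join_all(self) -> None: ...


class ThreadSpawner:
    """Backend that runs each task on its own OS thread."""

    def __init__(self) -> None:
        self._threads: List[threading.Thread] = []

    def spawn(self, fn: Callable[[], None]) -> None:
        t = threading.Thread(target=fn)
        t.start()
        self._threads.append(t)

    def join_all(self) -> None:
        for t in self._threads:
            t.join()
        self._threads.clear()


class InlineSpawner:
    """Backend that runs each task immediately on the caller's thread."""

    def spawn(self, fn: Callable[[], None]) -> None:
        fn()

    def join_all(self) -> None:
        pass


def sum_squares(spawner: Spawner, values: List[int]) -> int:
    """User code: sees only the interface, never the backend."""
    results = [0] * len(values)

    def make_task(i: int, v: int) -> Callable[[], None]:
        def task() -> None:
            results[i] = v * v  # each task writes its own slot
        return task

    for i, v in enumerate(values):
        spawner.spawn(make_task(i, v))
    spawner.join_all()
    return sum(results)
```

Calling `sum_squares(ThreadSpawner(), [1, 2, 3])` and `sum_squares(InlineSpawner(), [1, 2, 3])` gives the same result, so a "better" runtime can replace the default without touching callers.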