Reposted from the Discord.
It seems like there isn’t any room in that design to support arena-allocating the frames, nor any place to handle a coroutine frame allocation failing.
This is somewhat concerning to me because while being able to move to stack allocations is nice, being able to grab a right-sized allocation from an arena allocator is nicer, especially in the context of ensuring you have enough memory for the coroutine. For frequently allocated coroutines (consider the handle_request top-level function of an HTTP server), this means that instead of going through all of the machinery in tcmalloc, you may be performing a dequeue operation on a ring buffer of free frames, which is substantially faster.
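To sketch what I mean by a ring buffer of free frames (in Rust rather than Mojo, and all names hypothetical, not any real API):

```rust
// Hypothetical sketch: a fixed-capacity pool of right-sized coroutine
// frame slots, with a ring buffer of free slot indices. Allocation is a
// dequeue and deallocation an enqueue -- no general-purpose malloc on
// the hot path, and exhaustion is observable rather than fatal.
use std::collections::VecDeque;

struct FramePool {
    frames: Vec<Box<[u8]>>, // pre-allocated, fixed-size frame storage
    free: VecDeque<usize>,  // ring buffer of free slot indices
}

impl FramePool {
    fn new(slots: usize, frame_size: usize) -> Self {
        FramePool {
            frames: (0..slots)
                .map(|_| vec![0u8; frame_size].into_boxed_slice())
                .collect(),
            free: (0..slots).collect(),
        }
    }

    /// Grab a free frame slot; `None` means the pool is exhausted,
    /// which the caller can handle instead of crashing.
    fn alloc(&mut self) -> Option<usize> {
        self.free.pop_front()
    }

    /// Return a slot to the pool.
    fn dealloc(&mut self, slot: usize) {
        self.free.push_back(slot);
    }
}
```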
Would it be possible to have the coroutine take an alloc: Allocator[CoroutineFrameType] = DefaultMojoAllocator parameter in some way, or otherwise inject an allocator into the coroutine? I’m still thinking over how I would want custom allocators to behave, but I know that this is a feature I and others will want.
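Roughly the API shape I’m imagining, sketched in Rust (everything here is hypothetical, just to illustrate the parameter-with-a-default idea):

```rust
// Hypothetical sketch: the frame allocator is an ordinary parameter
// with a default implementation, so callers can inject an arena or
// pool allocator without changing the coroutine itself.
trait FrameAllocator {
    /// Allocate storage for one coroutine frame; `None` on failure.
    fn alloc_frame(&mut self, size: usize) -> Option<Vec<u8>>;
}

/// Stand-in for a default allocator that falls back to the global heap,
/// analogous to the DefaultMojoAllocator in my suggested signature.
struct DefaultAllocator;

impl FrameAllocator for DefaultAllocator {
    fn alloc_frame(&mut self, size: usize) -> Option<Vec<u8>> {
        Some(vec![0u8; size])
    }
}

/// A coroutine-launching function that takes the allocator as a
/// parameter, analogous to
/// `alloc: Allocator[CoroutineFrameType] = DefaultMojoAllocator`.
fn launch_with<A: FrameAllocator>(alloc: &mut A, frame_size: usize) -> Option<Vec<u8>> {
    alloc.alloc_frame(frame_size)
}
```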
For my specialty, databases, not being able to handle allocation failures means you can’t use the feature in production code, because it could lead to unnecessary crashes. The database is likely the largest memory consumer on any system it runs on, and it typically has a lot of caching, so it can actually do something about an allocation failure.
One other question: the frame didn’t look like the tagged union I would expect. Is it represented that way at the level we would see in MLIR reflection?
What is the roadmap for the async implementation?
Your last slide touched on function coloring. How does Modular plan to address that problem, if at all?
Have you had time to look at Rust’s implementation of coroutines? Can we get a comparison b/w Mojo’s coroutine implementation and Rust’s?
Are there docs or examples of how one could use the current async implementation in community libraries? Or is it too early for that?
Hi @owenhilyard, thanks for the question!
There is room to support custom allocation techniques. The coroutine lowering does depend on an allocator, but we have the power to specify which allocator we use; that specification is not yet exposed in the language. The plan was to migrate from using a malloc call to a bump-pointer allocator. The allocator would be created when invoking a coroutine from a synchronous context and passed down. I don’t see why we couldn’t expose this in Mojo to allow for user customization. This would only apply to memory coroutines and is separate from memory promotion, which can be skipped using a flag.
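A rough sketch of the bump-pointer idea described above (in Rust, purely illustrative, not the actual lowering): an arena is created at the sync/async boundary, and each child frame is carved out by advancing an offset.

```rust
// Hypothetical sketch: a bump-pointer arena created when entering the
// async region and passed down. Each child coroutine frame is carved
// out by advancing an offset; freeing is a no-op until the whole arena
// is dropped. `align` must be a power of two.
struct BumpArena {
    buf: Vec<u8>,
    offset: usize,
}

impl BumpArena {
    fn new(capacity: usize) -> Self {
        BumpArena { buf: vec![0u8; capacity], offset: 0 }
    }

    /// Carve `size` bytes (aligned to `align`) out of the arena;
    /// `None` on exhaustion, so the caller can handle the failure.
    fn alloc(&mut self, size: usize, align: usize) -> Option<&mut [u8]> {
        let start = (self.offset + align - 1) & !(align - 1); // round up
        let end = start.checked_add(size)?;
        if end > self.buf.len() {
            return None; // arena exhausted
        }
        self.offset = end;
        Some(&mut self.buf[start..end])
    }
}
```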
Re: handling the allocation of a coroutine frame failing, Mojo async is under early development and that case is not currently handled. However, we do have a place to handle it: async functions can be throwing functions, and coroutines already have error slots, so a failure to create a child coroutine can surface in the resume and propagate up the chain via the error slot.
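To make the error-slot idea concrete, here is a toy sketch in Rust (names and shapes are illustrative only, not Mojo’s representation): a resume either completes or records an error that the parent can propagate.

```rust
// Hypothetical sketch: a resume result carrying an "error slot", so a
// failed child-frame allocation surfaces as a value the parent can
// propagate up the chain rather than aborting the process.
enum Resume {
    Done(i32),
    Failed(String), // stand-in for the coroutine's error slot
}

/// Toy resume step: if allocating the child coroutine frame failed,
/// fill the error slot instead of crashing.
fn resume_child(frame_alloc_failed: bool) -> Resume {
    if frame_alloc_failed {
        Resume::Failed("coroutine frame allocation failed".to_string())
    } else {
        Resume::Done(7)
    }
}
```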
Hey @ivellapillil! Mojo Async is under early development and paused while we address some higher priorities, but I expect that sometime early to mid next year we will have a production-quality version to release.
Hi @taylorpool! Thanks for asking. At this time, we don’t have plans to address function coloring.
Hi @Brian-M-J! I have not yet examined Rust’s implementation, but I would like to, and upon release we can include a side-by-side comparison.
Hi @a2svior, great question. We don’t have docs or examples yet, but we’re hoping to do this in the new year.
A few more questions since it’s been a bit and I’ve spent some time talking about async with @Nick.
How is the async scheduler designed right now? While work stealing is great for some workloads, it causes a lot of headaches for others, either through cache misses or by requiring thread-safety bounds (like Send + Sync + 'static in Rust), due to the inability to determine whether a task will end up on another thread. This causes issues for things like io_uring, which is designed to either have one thread do all of the IO, have an “IO lock”, or create an io ring per core. Thread per core isn’t as great at work sharing, but it tends to make these issues go away and align better with shared-nothing designs.
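For anyone unfamiliar with the Rust bounds I mentioned, a minimal sketch (the function names are made up, not any real executor’s API): a work-stealing executor may run a task on any thread, so it must demand the same Send + 'static bound that std::thread::spawn does, while a thread-per-core design can accept non-Send state.

```rust
use std::thread;

// A work-stealing executor may move the task to another thread, so it
// must require Send + 'static -- the same bound thread::spawn imposes.
// (Illustrative name; not a real executor API.)
fn spawn_work_stealing<F>(task: F) -> i32
where
    F: FnOnce() -> i32 + Send + 'static, // the bound work stealing forces
{
    thread::spawn(task).join().unwrap()
}

// A thread-per-core design keeps the task on the calling thread, so
// non-Send state (an Rc, or a per-core io_uring handle) is allowed:
// no Send or 'static bound needed.
fn run_thread_local<F: FnOnce() -> i32>(task: F) -> i32 {
    task()
}
```

Handing `run_thread_local` a closure that captures an `Rc` compiles fine; handing the same closure to `spawn_work_stealing` would be rejected at compile time because `Rc` is not `Send`.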
Has any attention been paid to being generic over “asyncness”? Rust ran into issues with this, which essentially forced everyone who talks to a database or does things with HTTP into using async.
Now that’s interesting. Is this side-by-side comparison only going to be between the stackless coroutines of C++, Rust, and Mojo, or are you considering other concurrency models as well (like Go’s and Hylo’s stackful coroutines, futures/promises, C++26’s senders and receivers, etc.)?