Intro
So this is an idea I’ve been playing around with for some time, and I posted it over on the Swift forums. I know Modular is a startup focused on finding its own product-market fit, but I wonder if the good folks here would be interested in an open-source moonshot project like this.
Opportunity
I think traditional apps are going to be basically irrelevant within five years thanks to advancing agentic AI workflows. As a result, big tech no longer has an ecosystem moat, and there’s much less reason to stay with a traditional OS. Just as Google’s search monopoly is suddenly vulnerable, I think the OS monopolies of macOS, iOS, Linux, Android, and Windows are as well.
Having a million apps won’t matter in a few years when AI agents do it all better. An OS that leans into this will provide a better experience than a traditional OS with agents bolted on top.
This approach would work across traditional and XR devices, since the UI would be minimal and driven by chat and voice, and it would also suit robotics, where fast inference matters.
Experience
Could we not create a new kind of OS focused on fast inference (built on MAX?), with composable AI agents and widgets instead of isolated apps? This has to be the future of computing, and it seems a new OS built around this kind of experience would provide a lot of value to a lot of people.
The OS experience would no longer center on a grid of apps, but on suggested prompts for jobs to be done.
Ask
I want to build this, but I’m way out of my depth, and would love help if anyone here catches this vision. I think it would benefit Modular a lot as well. I’m also happy to hear advice, or reasons why it wouldn’t work.
There are definitely very interesting questions to be asked about what development looks like when you’re operating at the level of composable agents vs. traditional application development.
We’ve been working to enable high-level composition of LLMs using Modular’s technology. One of the first places you see this is in 24.6’s release of MAX Serve, with its support for a subset (growing in future releases) of the standard OpenAI API specification for LLM interaction. That lets LLMs served and accelerated via MAX interoperate with the many tools being built against the OpenAI API spec, and we’ve been using it ourselves to connect standard chat clients (like Open WebUI) to MAX-accelerated LLMs.
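To make that concrete, here is a minimal Swift sketch of what calling an OpenAI-compatible chat endpoint looks like from a client. The host, port (localhost:8000), and model name are placeholders for this example; check your own MAX Serve deployment for the actual values.

```swift
import Foundation

// Minimal chat-completion request against an OpenAI-API-compatible server.
// Assumes a MAX Serve (or similar) instance is listening on localhost:8000;
// the port and model name are placeholders for this sketch.
struct ChatMessage: Codable {
    let role: String
    let content: String
}

struct ChatRequest: Codable {
    let model: String
    let messages: [ChatMessage]
}

struct ChatResponse: Codable {
    struct Choice: Codable {
        let message: ChatMessage
    }
    let choices: [Choice]
}

func askLocalModel(_ prompt: String) async throws -> String {
    var request = URLRequest(url: URL(string: "http://localhost:8000/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        ChatRequest(model: "my-served-model",  // placeholder model ID
                    messages: [ChatMessage(role: "user", content: prompt)])
    )

    let (data, _) = try await URLSession.shared.data(for: request)
    let response = try JSONDecoder().decode(ChatResponse.self, from: data)
    return response.choices.first?.message.content ?? ""
}
```

Because the request/response shapes follow the OpenAI spec, the same client code works whether the model is served by MAX Serve or any other compatible backend.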
We have a clear interest in agent-based workflows, so keep an eye on nightlies and future releases for new capabilities to enable them. We’d be glad to hear more about how we could help to enable this, though.
That sounds great, and agreed, there are lots of questions and rapid development in this space at both the technical and UX levels. Any advice on how to learn more about this? I’m primarily an application developer dipping my toes into OS and systems programming, as well as AI engineering. Is there something like a CS50 for OS architecture? Also, for AI I’ve been looking at Courses - DeepLearning.AI, but there’s nothing for MAX or Modular; would you guys do a collab with Andrew Ng at some point?
I’m not familiar with Open WebUI. What is it: a collection of models and agents, or a web UI generated on the fly by LLMs?
And FWIW, this is the kind of UX I’m imagining (very rough mockups). I think computers could become much simpler and easier for non-technical folks to use.
It’s been a long time since I did much work at the OS level (my dinosaur book has been gathering dust), but you might be interested in some of the concepts being explored in the Fuchsia OS. In particular, the modular design there is conceptually similar to how independent agents might communicate through common interfaces. At one point, Swift was even supported as a language for building components in Fuchsia, although I believe Rust is the current language of choice for constructing modules.
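To illustrate that idea, here is a hypothetical Swift sketch (not Fuchsia’s actual FIDL or modular APIs): independent agents conform to a small shared interface, and a coordinator composes them by routing requests to whichever agent claims the capability.

```swift
import Foundation

// Hypothetical sketch: independent agents all speak a small common
// interface, so a coordinator can compose them without knowing their internals.
struct AgentRequest {
    let intent: String            // e.g. "summarize", "schedule", "search"
    let payload: [String: String]
}

struct AgentResponse {
    let summary: String
    let artifacts: [String: String]
}

protocol Agent {
    var capabilities: Set<String> { get }
    func handle(_ request: AgentRequest) async throws -> AgentResponse
}

// The coordinator routes each request to the first agent that advertises
// the requested capability, roughly like a session framework wiring up modules.
struct Coordinator {
    let agents: [any Agent]

    func dispatch(_ request: AgentRequest) async throws -> AgentResponse? {
        guard let agent = agents.first(where: { $0.capabilities.contains(request.intent) }) else {
            return nil
        }
        return try await agent.handle(request)
    }
}
```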
Open WebUI is an open-source interface for LLMs, largely oriented around chat-based experiences. It’s feature-rich and can work against a variety of protocols, like OpenAI’s API standard. It’s a pure interface layer, with LLM serving occurring elsewhere (with MAX Serve, for example). I will caution that it’s not a SwiftUI analogue; that’s a very different class of interface.
I forgot about Fuchsia… wow. After a many-hour rabbit trail across the documentation and the internet, and comparing ideas with Grok, yes, this seems to be exactly what I’m looking for. The modular design, the composability, and I didn’t know it was inspired by iOS and other Apple history, as well as BeOS, which is kinda neat.
This helps me redefine and narrow my scope. What I really want to do is create a desktop environment for Fuchsia in Swift, for a custom LLM-driven UX (powered by MAX, I imagine), as well as enable Swift application development on the platform. This would be in contrast to their Flutter-based system, and despite Google having cancelled Fuchsia as a “workstation” project, it still seems cutting edge while having more support overall than, say, Redox OS or Haiku.
In the future, if it made sense, efforts could be made to reintroduce Swift as a language for contributing to core userland systems currently written in C++, or even to the Zircon kernel, but I recognize this is not necessarily practical and may not be welcomed, at least at this time.
So my first step will be to build on top of Scenic, the Fuchsia system compositor, and Flatland, its 2D rendering API (built on Vulkan).
OK, that book looks like a classic (pun intended); I will check it out.
I see; I’ll investigate it further. I’m familiar with the OpenAI API standard, but not much else in that area yet.
As an aside, is there any benefit to (or effort toward) building UI with Mojo? Would something like Streamlit see a performance increase using Mojo instead of regular Python? Or is it not relevant to that?
In its current state, I don’t think building a fully Mojo OS is doable. Mojo doesn’t really support a freestanding mode (meaning a mode where it can run without a libc), so a pure-Mojo OS is probably a long way off.
As for UI, no efforts are being made, mostly because we would at least need trait objects first to avoid very painful manual vtable writing. Streamlit would probably be faster, but that’s like speeding up config-file parsing on a training job: it’s only going to help so much.
What you probably want is to build applications that expose their internal state through an API in some format LLMs can understand (over Unix sockets locally or TCP for remote), and then some kind of app launcher with LLM integration. There’s a lot of painful stuff you probably don’t want to rewrite (time zones, C locales, etc.), so building an app ecosystem is likely to be both far less effort and a far better user experience. One major concern I would have is that users tend to want their stuff to stay local, so you’d need either one very smart model or a bunch of smaller models you can quickly swap around. You also probably want some way to prevent the LLM from doing certain kinds of tasks (e.g., deleting files). Ideally, you’d want something closer to a multi-model architecture so you can have smaller, specialized models to save on VRAM, possibly with a big LLM backing all of it.
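As a rough illustration of the “expose state to an LLM, with guardrails” idea, here is a hedged Swift sketch. The state schema, action names, and allowlist are made up for the example, and the actual transport (Unix socket or local TCP) is only noted in a comment.

```swift
import Foundation

// Illustrative sketch: an app publishes a snapshot of its state as JSON for a
// local agent/LLM to read, and gates which actions the agent may invoke.
struct AppStateSnapshot: Codable {
    let appName: String
    let openDocuments: [String]
    let unreadNotifications: Int
}

enum AgentAction: String, Codable {
    case openDocument
    case composeDraft
    case deleteFile       // present in the schema, but blocked by the policy below
}

struct ActionPolicy {
    // Allowlist: anything not listed here is refused outright.
    let allowed: Set<AgentAction> = [.openDocument, .composeDraft]

    func permits(_ action: AgentAction) -> Bool {
        allowed.contains(action)
    }
}

func encodeSnapshot(_ snapshot: AppStateSnapshot) throws -> Data {
    // In a real system this payload would be written to a Unix domain socket
    // (local) or a TCP connection (remote) for the launcher/agent to consume.
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    return try encoder.encode(snapshot)
}

// Example usage:
let snapshot = AppStateSnapshot(appName: "Notes",
                                openDocuments: ["Groceries", "Trip plan"],
                                unreadNotifications: 2)
let policy = ActionPolicy()
do {
    print(String(data: try encodeSnapshot(snapshot), encoding: .utf8) ?? "")
} catch {
    print("Encoding failed: \(error)")
}
print(policy.permits(.deleteFile))   // false: destructive actions are blocked
```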
As for an intro OS course, the MIT one is open access: 6.1810 / Fall 2024. It’s quite a bit beyond CS50, since operating systems are hard.
Thank you for all of this information. I see what you mean about Mojo; that makes sense.
Agreed, what I want to build is some kind of agent app ecosystem and launcher UI/paradigm.
And yeah, I really don’t want to dig into all that stuff if I don’t need to. I’ve built applications for some time but am mostly self-taught (I took CpE classes but skipped the CS ones, much to my regret later in life), and now I want to dig a little deeper into OS development out of curiosity, and also, with this project, to see where it makes sense to add value versus just reinventing the wheel.
Agreed, several small local agents seem best (I was already thinking this way), plus some kind of smarter, larger cloud agent to call when needed. That’s actually a lot like the Apple Intelligence architecture, except here the whole experience and technical stack would be built around it.
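For what it’s worth, here is a hypothetical Swift sketch of that local-first, cloud-fallback routing. The endpoints, model names, and the length-based heuristic are placeholders, not anything from MAX or Apple Intelligence.

```swift
import Foundation

// Hypothetical routing sketch: prefer a small local model, escalate to a
// larger cloud model only when the request looks too hard for it.
enum ModelTier {
    case local   // small on-device/LAN model, e.g. served locally
    case cloud   // larger hosted model for harder requests
}

struct ModelEndpoint {
    let baseURL: URL
    let model: String
}

struct ModelRouter {
    let local = ModelEndpoint(baseURL: URL(string: "http://localhost:8000/v1")!,
                              model: "small-local-model")      // placeholder
    let cloud = ModelEndpoint(baseURL: URL(string: "https://api.example.com/v1")!,
                              model: "large-cloud-model")      // placeholder

    // Placeholder heuristic: long or multi-step prompts go to the cloud tier.
    func tier(for prompt: String) -> ModelTier {
        prompt.count > 500 || prompt.contains("plan") ? .cloud : .local
    }

    func endpoint(for prompt: String) -> ModelEndpoint {
        tier(for: prompt) == .cloud ? cloud : local
    }
}

let router = ModelRouter()
print(router.endpoint(for: "What's on my calendar today?").model)  // small-local-model
```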
I think I need to:
prototype the desired experience further, just as a SwiftUI Mac app for now (a rough sketch follows this list)
learn more about Mojo, MAX, and Llama (I’ve only deployed with Ollama so far) and what hardware I can run the stack on.
learn more about window compositors, the Fuchsia stack, and how to build on top of it. I need to search for Fuchsia dev communities.
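Here is the rough SwiftUI sketch mentioned in the first item above: a prompt-first window with canned “jobs to be done” suggestions and a placeholder response where an agent call would eventually go. Everything in it is illustrative.

```swift
import SwiftUI

// Bare-bones sketch of a prompt-first shell: suggested "jobs to be done"
// instead of an app grid, a single prompt field, and a response area.
// The canned suggestions and echoed reply are placeholders for agent calls.
struct PromptShellView: View {
    @State private var prompt = ""
    @State private var response = ""

    private let suggestions = [
        "Summarize today's messages",
        "Plan dinner from what's in the fridge",
        "Draft a reply to the last email"
    ]

    var body: some View {
        VStack(alignment: .leading, spacing: 12) {
            Text("What would you like to get done?")
                .font(.title2)

            ForEach(suggestions, id: \.self) { suggestion in
                Button(suggestion) { prompt = suggestion }
                    .buttonStyle(.bordered)
            }

            TextField("Ask anything…", text: $prompt)
                .textFieldStyle(.roundedBorder)
                .onSubmit {
                    // Placeholder: this is where a call to a local agent /
                    // LLM endpoint would go.
                    response = "Working on: \(prompt)"
                }

            Text(response)
                .foregroundStyle(.secondary)

            Spacer()
        }
        .padding()
        .frame(minWidth: 480, minHeight: 360)
    }
}

@main
struct PromptShellApp: App {
    var body: some Scene {
        WindowGroup { PromptShellView() }
    }
}
```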