On the new json module

bgreni · May 6, 2025, 5:33pm

I see in the recent nightlies that a json module was introduced to the stdlib, which I think is a good thing, but I’m curious why we didn’t consider porting in the EmberJSON implementation. At the time I started the project I was told that a json module was unlikely to be included in the stdlib, so naturally I built it as a community library. Since that is no longer the case, I think it would make sense to take advantage of the work already done rather than reinvent the wheel.

At the time of writing, EmberJSON has the following benefits:

0 Dependencies
114 existing tests
An existing benchmarking suite
Superior performance to the current stdlib implementation
UTF-8 support
Fast floating-point parsing using the simdjson algorithm, with a truncating fallback implementation for floats with many decimal digits
Distinct parsing of integers and floats to avoid precision loss for large integers
SIMD-accelerated minifier
SIMD-accelerated string parsing
Tree-based object representation to improve large object parsing performance
Pretty-print formatting
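
To make the integer/float point above concrete, here is a minimal sketch in Python rather than Mojo. It says nothing about how EmberJSON is implemented internally; it only shows the failure mode a parser hits when every number is routed through a 64-bit float. The `parse_int` hook is part of Python's `json.loads`; the sample document is made up for illustration.

```python
import json

# 2**53 + 1 is the smallest positive integer a 64-bit float cannot represent.
text = '{"id": 9007199254740993}'

# Python's json keeps integers and floats distinct, so the value survives intact.
exact = json.loads(text)["id"]

# Forcing every integer through float mimics a parser with a single numeric type.
lossy = json.loads(text, parse_int=float)["id"]

print(exact)       # 9007199254740993
print(int(lossy))  # 9007199254740992
```

The last digit changes silently, which is why large IDs and timestamps are the usual victims of float-only number parsing.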

Granted, I do employ a fair amount of unsafe code to achieve this, so if the team is unwilling to adopt it for that reason, then I respect that. I just figured I would prod a bit here out of curiosity.

- Brian

sora · May 6, 2025, 6:20pm

We should at least ~~blindly~~ port all the tests.

gryznar · May 6, 2025, 6:32pm

Also, as I pointed out on Discord, the current naming is far from good. load as a free function is just a straight copy of Python, whose API is confusing. load and loads are not informative and only make sense with the module name as context.

from json import load

var _ = load(...) # not enough context to make `load` unambiguous
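
For readers less familiar with the Python API being copied, here is a small Python sketch of the two names in question. It uses only the standard `json` and `io` modules; the payload is invented for illustration, and nothing here is meant to describe the Mojo stdlib's behaviour.

```python
import io
import json

payload = '{"name": "EmberJSON", "tests": 114}'

# json.loads ("load string") parses text that is already in memory.
from_text = json.loads(payload)

# json.load expects a file-like object with a .read() method instead.
from_file = json.load(io.StringIO(payload))

# Passing a str to json.load fails, an easy slip when the names differ by one letter.
try:
    json.load(payload)
    error = None
except AttributeError as exc:
    error = exc

print(from_text == from_file)
print(type(error).__name__)
```

Nothing in either name says "file" or "string", which is the ambiguity being pointed out once the function is imported without its module prefix.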

As stated here, there are no objections to having a different API on the Mojo side, with extension support for Python compatibility.

Overall I am +1 for using EmberJson as a base for future improvements

@joe gentle ping as you are stdlib team leader

martinvuyk · May 6, 2025, 7:45pm

FWIW if anyone ever builds something that uses json for compute-heavy workloads, then they are probably going to search for libraries like EmberJSON.

I’m guessing, but I imagine the thought process behind the decision was somewhere along the lines of: “less lines, less maintenance”. The stdlib implementation currently stands at around 1270 lines.

I do, however, sincerely hope that EmberJSON is fully incorporated into the stdlib; it is faster and much better organized. I think many in the community would be willing to step in as maintainers. Unstructured JSON parsing is something that should be SOTA in the stdlib; maybe we can leave structured JSON to external libraries.

FWIW I also feel that I would become quite distraught if something similar were to happen with an inferior datetime library being incorporated into the stdlib instead of the one I’ve worked on for months with the express goal of integrating it into the stdlib. IMO Modular should not disregard the community like this…

adakkak · May 6, 2025, 11:27pm

Hi Bgreni,

I wasn’t aware of the EmberJSON project - thanks for bringing it to my attention. I’d definitely encourage you to submit a PR replacing our existing JSON module with what you’ve developed. Your implementation looks to be of higher quality than what I’ve put together.

To be transparent, I wrote the current JSON code during a plane ride as a quick solution, so there’s nothing particularly special about it. I’d be happy to see it replaced with your more robust implementation.

Looking forward to your contribution!

Best regards

bgreni · May 6, 2025, 11:38pm

Thanks Abdul, I appreciate the update! I’ll touch base with the stdlib team later to figure out how best to move forward with that since it would be a fairly hefty contribution

adakkak · May 6, 2025, 11:42pm

Just FYI: No need to integrate anything. I’d be happy to delete the json.mojo file and replace it with your implementation if that’s the easiest approach.

bgreni · May 6, 2025, 11:49pm

I’ve done a few things that Joe and others might take issue with from a maintenance burden point of view (like writing number parsing and stringification from scratch), so there might need to be some discussion around that.

gabrieldemarmiesse · May 7, 2025, 7:18am

Purely for coordination purposes, here is the PR for fast float parsing. If everything goes well, I should get a review this week.

joe · May 7, 2025, 5:36pm

Hey @bgreni I’ll take a look at your library and let’s work together with @gabrieldemarmiesse as there’s likely room for overlap in creating space for a “format” module to contain these shared utilities.

bgreni · May 7, 2025, 5:50pm

Thanks Joe! Admittedly the documentation is a bit sparse, so let me know if you have any questions

martinvuyk · May 7, 2025, 10:18pm

I think there might be room for a json-specific implementation or at least splitting up/parametrizing atof, because there are things like 1e10_0 which aren’t part of the json standard
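
A quick Python analogy for this mismatch, not Mojo's atof itself: Python's general-purpose `float()` accepts underscores between digits (PEP 515), while the JSON number grammar in RFC 8259 does not. The `is_json_number` helper and its regex are a hypothetical illustration of a JSON-only validator.

```python
import json
import re

# Number grammar from RFC 8259: no underscores, no leading "+", no bare ".5".
JSON_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def is_json_number(text: str) -> bool:
    """Return True only if the whole string is a valid JSON number."""
    return JSON_NUMBER.fullmatch(text) is not None


# The general-purpose parser accepts the underscore in the exponent.
general = float("1e10_0")

# The JSON parser rejects the same text.
try:
    json.loads("1e10_0")
    rejected = False
except json.JSONDecodeError:
    rejected = True

print(general, rejected, is_json_number("1e10_0"))
```

A shared float routine would either need a strictness switch or a JSON-side pre-check like the one above, which is roughly the "split or parametrize" trade-off being raised.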

bgreni · May 7, 2025, 10:23pm

If I had it my way I would probably prefer keeping them separate for such reasons, and also because I am inlining things much more aggressively than we probably should for a general-purpose atof function, but in the end it’s up to the team of course.

strangemonad · May 20, 2025, 5:43am

@bgreni @joe @adakkak, I’m loving the discussion about potentially including emberjson into the stdlib. I’d like to suggest a few thoughts related to data serialization / deserialization and data format support. I’m not sure if this is the right place to have that discussion?

The topics can roughly be organized as follows:

Ser/De and data-formats beyond json
What formats belong in std-lib

1. Serde beyond json

One of the things that both go and rust did really well was expose common mechanisms / traits for struct serde to and from various data formats. This is something that’s lacking in pure python but would be great to have in the stdlib for mojo.

Go exposed the conventional way of using tags plus a few common interfaces (the various Marshaler and Unmarshaler interfaces in package encoding) to drive data marshaling. This is probably one of the language features that has allowed Go to be used heavily to define APIs and data types across the cloud-native ecosystem (e.g. the entire Kubernetes API is canonically defined as Go structs).
Rust has obviously had the serde crate for a long time now. I haven’t followed recent developments of alternatives (e.g. sud, merde, etc.), so some of the thoughts around the state of the art might have evolved here.
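
As a loose illustration of the idea (a Python sketch, not how Go's encoding package or Rust's serde actually work), the core of such a design is one small interface that formats implement, with struct traversal written once. `Format`, `JsonFormat`, `FormFormat`, `Pod`, and `serialize` are all hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Protocol
from urllib.parse import urlencode


class Format(Protocol):
    """Hypothetical trait: anything that can encode a plain mapping."""

    def encode(self, fields: dict[str, Any]) -> str: ...


class JsonFormat:
    def encode(self, fields: dict[str, Any]) -> str:
        return json.dumps(fields, sort_keys=True)


class FormFormat:
    def encode(self, fields: dict[str, Any]) -> str:
        return urlencode(sorted(fields.items()))


@dataclass
class Pod:
    name: str
    replicas: int


def serialize(value: Any, fmt: Format) -> str:
    """Walk the struct once, then hand the fields to whichever format is given."""
    return fmt.encode(asdict(value))


pod = Pod(name="web", replicas=3)
as_json = serialize(pod, JsonFormat())
as_form = serialize(pod, FormFormat())
print(as_json)
print(as_form)
```

The data type knows nothing about JSON, and a new format can be added without touching `Pod`, which is the property that lets the shared traits live in a standard library while individual formats live elsewhere.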

2. Implementations

I think the ecosystem would benefit from defining the core ser/de or marshalling traits in the std lib, but not all data formats need to live there. I’m not sure if something like this has been discussed yet, but in a world where it’s easy to fetch packages with a package manager, it might be nice to keep the std lib smaller and set up a set of official extension packages, similar to how Go structures its various “x” packages (Sub-repositories - Go Packages), i.e. those that are part of the Go project but out of tree.

For things that might have an open-ended number of variants, like data formats or hashing algorithms, especially ones that might go stale over the years, it might be nice to set up an out-of-tree package structure that is proactively resilient to future changes while still providing “official” implementations.

martinvuyk · June 17, 2025, 4:43pm

Hi @strangemonad

Serde beyond json

I think we are mostly waiting for reflection and parametric traits to enable good ser/de capabilities and traits.

Implementations

As for having different repos for the implementation, that sounds interesting. But I’m not sure how the stdlib team would be able to handle it given the whole modular stack is a monorepo

This is worth posting as a separate forum post about ser/de. But probably later on once we have some more language features
