I built an ergonomic and highly compliant JSON library in pure Mojo. Find usage examples and benchmark results in the repository!
I literally just finished a project for a class where I could have used this, so I'll check it out. Does it parse deeply nested JSON files? Either way, this is really cool. I'm nerding out to the max rn
Very nice. I ran the benchmark with a pinned core on my Zen 4 laptop (so not very scientific):
--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.004516, 265923, 0.004516, 0.004516, 0.004516, 1200.894644
JsonArrayMedium , 0.010703, 100000, 0.010703, 0.010703, 0.010703, 1070.251731
JsonArrayLarge , 0.023752, 50554, 0.023752, 0.023752, 0.023752, 1200.772349
JsonArrayExtraLarge , 2.957336, 406, 2.957336, 2.957336, 2.957336, 1200.678550
JsonArrayCanada , 9.315350, 100, 9.315350, 9.315350, 9.315350, 931.534987
JsonArrayTwitter , 9.619888, 100, 9.619888, 9.619888, 9.619888, 961.988769
JsonArrayCitmCatalog , 13.546646, 88, 13.546646, 13.546646, 13.546646, 1192.104854
JsonStringify , 0.025460, 47074, 0.025460, 0.025460, 0.025460, 1198.496242
JsonStringifyCanada , 51.822438, 23, 51.822438, 51.822438, 51.822438, 1191.916068
JsonStringifyTwitter , 6.605087, 181, 6.605087, 6.605087, 6.605087, 1195.520797
JsonStringifyCitmCatalog, 11.119863, 100, 11.119863, 11.119863, 11.119863, 1111.986319
ValueParseBool , 0.000009, 100000000, 0.000009, 0.000009, 0.000009, 903.309820
ValueParseNull , 0.000007, 169310595, 0.000007, 0.000007, 0.000007, 1199.342806
ValueParseInt , 0.000015, 82296289, 0.000015, 0.000015, 0.000015, 1201.303085
ValueParseFloat , 0.000032, 37325867, 0.000032, 0.000032, 0.000032, 1202.584212
ValueParseString , 0.000143, 8602053, 0.000143, 0.000143, 0.000143, 1226.139617
ValueStringifyBool , 0.000042, 28896750, 0.000042, 0.000042, 0.000042, 1204.861508
ValueStringifyNull , 0.000042, 28612630, 0.000042, 0.000042, 0.000042, 1197.264981
ValueStringifyInt , 0.000045, 26639548, 0.000045, 0.000045, 0.000045, 1200.785410
ValueStringifyFloat , 0.000261, 4609638, 0.000261, 0.000261, 0.000261, 1201.257967
ValueStringifyString , 0.000122, 10000000, 0.000122, 0.000122, 0.000122, 1218.173832
There’s no explicit depth limit so you’re welcome to try and see if you can break it!
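If anyone does want to try breaking it, a quick way to generate a pathological input is to build the string directly. This is shown in Python purely for illustration (the document it produces can be fed to any parser); `nested_array` is a made-up helper, not part of the library:

```python
import json

def nested_array(depth: int) -> str:
    """Return a JSON array nested `depth` levels deep, e.g. depth 3 -> "[[[]]]"."""
    return "[" * depth + "]" * depth

# Modest depth parses fine with any reasonable parser.
assert isinstance(json.loads(nested_array(50)), list)

# Recursive-descent parsers typically hit a stack limit eventually;
# CPython's own json gives up with RecursionError long before this depth.
try:
    json.loads(nested_array(100_000))
except RecursionError:
    pass
```

A parser with an iterative (explicit-stack) descent can avoid the limit entirely, at the cost of a slightly more involved main loop.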
Thank you! Definitely lots of room for improvement, but I have enough testing in place now that I'm comfortable with starting to pull things apart and rewrite them. In particular float parsing, since at the moment I read the full value and then pass it off to `atof` instead of doing it all in one pass like most libs do.
Do you have any recommendations for how I can improve my testing methods? Just plainly running them on my laptop tends to produce more variance than I would like.
If you want to hop over to Linux, you can make use of hwloc-bind to pin the program to a thread or set of threads, which is how I got the results I gave you. Somewhere in the Mojo Marathons channel I discuss what I consider a proper benchmarking configuration at length in a thread with Jack and Benny.
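For reference, the pinning invocation looks something like this. `./bench` is a placeholder binary name, the core numbers are machine-specific, and `taskset` from util-linux is a rough equivalent if hwloc isn't installed:

```shell
# Pin the benchmark process to one physical core with hwloc-bind (from hwloc).
# "core:2" is an arbitrary choice; pick a quiet core on your machine.
hwloc-bind core:2 -- ./bench

# Roughly equivalent with util-linux taskset, pinning to logical CPU 4:
taskset -c 4 ./bench
```

Pinning mainly removes variance from the scheduler migrating the process between cores; for even tighter numbers you'd also want to control frequency scaling.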
It seems like your code has evolved a lot since I last looked at it.
I found some interesting hacks to parse and validate unstructured JSON while reading; I still haven't had the time to properly test and benchmark them. If you'd like to look it's here (I don't remember but I think I already sent you the link lol).
If you want to take this to the stdlib eventually, you'd need to look at Python's API and try to match it while also offering higher-perf options like what I'm seeing in your repo.
Maybe I'll have more time after Christmas and can try to implement the same benchmarks you're using.
I’m going to check this out. Thanks for sharing!
I don't think porting this to the stdlib is really on my radar right now unless the team asks for it (which I doubt they will). It's good you mentioned it though, as maybe I can start adding support for some of the options present in the Python implementation, and then if the day comes it could be as simple as adding `loads` and `dumps` functions that wrap my existing code?
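For anyone unfamiliar, the Python surface being discussed is tiny; a compatible wrapper would mostly need to match the shape of the standard `json` module's two top-level calls:

```python
import json

# Python's stdlib API that a compatible wrapper would mirror:
# loads: str -> object, dumps: object -> str.
obj = json.loads('{"name": "EmberJson", "fast": true, "deps": []}')
assert obj["fast"] is True   # JSON true maps to Python True
assert obj["deps"] == []

# dumps accepts formatting options; separators gives the compact form.
text = json.dumps(obj, separators=(",", ":"))
assert text == '{"name":"EmberJson","fast":true,"deps":[]}'
```

The stdlib functions also take a number of keyword options (`object_hook`, `indent`, `sort_keys`, etc.), which is where most of the compatibility work would actually be.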
Yep, I think the Python APIs are quite simple to achieve by returning `object`. And we can also provide higher-perf alternatives.
> I don’t think porting this to the stdlib is really on my radar right now unless the team asks for it (which I doubt they will)
I think JSON is something that a lot of people expect to be in the stdlib when they come in. And given that the language features needed to develop it fully are already there, compared to networking and other goodies, this has a high chance of landing IMO.
I think that some level of unstructured parsing will be required to support things like data exploration, since we can’t always do things the JSON ↔ struct way. That means something like this has a place in the standard library, but I think some out of tree evolution makes sense.
I was planning on adding structured JSON support at some point, but we can't really do it properly until we have fancier metaprogramming tools, correct?
Otherwise, maybe once I improve number parsing performance I can look into migrating this into the stdlib. I've been putting it off since the fast float parsing algorithms I've found are very long and opaque, and otherwise I currently have to do some extra work for `atof` to suffice, which slows it down significantly.
Yes, we need reflection before we can do structured parsing, Zig style.
Float parsing is a rabbit hole but worthwhile to do, and something that the JSON parser may want to inline and do itself for extra perf, since JSON parsing is one of those things that is in the hot loop of many programs.
I know they've been doing work on float parsing in the stdlib, but their implementation seems to be more permissive than the JSON standard itself, so I end up needing to validate input myself beforehand.
Yes, the `atof` implementation is not optimal for JSON. I found a way to parse the number after validating it (haven't tested or benchmarked yet): during validation you can differentiate between exponent and dot floats, and then the number parsing can be accelerated using SIMD. You can see the algo here. I still haven't looked at other implementations like simdjson, but I think we can still innovate a lot in this space.
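As a rough illustration of the classify-then-parse idea in scalar Python (no SIMD; the function names here are made up for the sketch, not from either library):

```python
def classify(s: str) -> str:
    """Single validation-style pass: tag the number as int, dot-float, or exp-float."""
    kind = "int"
    for ch in s:
        if ch in "eE":
            return "exp"   # exponent form falls back to the general path
        if ch == ".":
            kind = "dot"
    return kind

def parse_dot_float(s: str) -> float:
    """Fast path for dot floats: accumulate all digits, then scale by 10**-frac_len.

    Note: only exact when mantissa and power both fit in a double; real
    fast-float algorithms (e.g. Eisel-Lemire) handle the general case.
    """
    neg = s.startswith("-")
    body = s[1:] if neg else s
    int_part, _, frac_part = body.partition(".")
    mantissa = int(int_part + frac_part)   # one digit-accumulation pass
    value = mantissa / (10 ** len(frac_part)) if frac_part else float(mantissa)
    return -value if neg else value

assert classify("123") == "int"
assert classify("-3.25") == "dot"
assert classify("1e9") == "exp"
assert parse_dot_float("-3.25") == -3.25
```

The digit-accumulation loop in `parse_dot_float` is the part that vectorizes well, since eight ASCII digits can be converted to an integer with a handful of SIMD multiplies and adds.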
I already mentioned this in the Discord thread: I'm formally opening up to contributions if anyone here is so inclined. I'm mostly looking for help with performance optimizations; I've been staring at this code for too long, so I'm sure there are some bottlenecks I've missed.
EmberJson is now available from the official Modular community prefix channel!