I built an ergonomic and highly compliant JSON library in pure Mojo. Find usage examples and benchmark results in the repository!
I literally just finished a project for a class where I could have used this, so I'll check it out. Does it parse deeply nested JSON files? Either way, this is really cool. I'm nerding out to the max rn
Very nice. I ran the benchmark with a pinned core on my Zen 4 laptop (so not very scientific):
--------------------------------------------------------------------------------
Benchmark results
--------------------------------------------------------------------------------
name , met (ms) , iters , min (ms) , mean (ms) , max (ms) , duration (ms)
JsonParseSmall , 0.004516, 265923, 0.004516, 0.004516, 0.004516, 1200.894644
JsonArrayMedium , 0.010703, 100000, 0.010703, 0.010703, 0.010703, 1070.251731
JsonArrayLarge , 0.023752, 50554, 0.023752, 0.023752, 0.023752, 1200.772349
JsonArrayExtraLarge , 2.957336, 406, 2.957336, 2.957336, 2.957336, 1200.678550
JsonArrayCanada , 9.315350, 100, 9.315350, 9.315350, 9.315350, 931.534987
JsonArrayTwitter , 9.619888, 100, 9.619888, 9.619888, 9.619888, 961.988769
JsonArrayCitmCatalog , 13.546646, 88, 13.546646, 13.546646, 13.546646, 1192.104854
JsonStringify , 0.025460, 47074, 0.025460, 0.025460, 0.025460, 1198.496242
JsonStringifyCanada , 51.822438, 23, 51.822438, 51.822438, 51.822438, 1191.916068
JsonStringifyTwitter , 6.605087, 181, 6.605087, 6.605087, 6.605087, 1195.520797
JsonStringifyCitmCatalog, 11.119863, 100, 11.119863, 11.119863, 11.119863, 1111.986319
ValueParseBool , 0.000009, 100000000, 0.000009, 0.000009, 0.000009, 903.309820
ValueParseNull , 0.000007, 169310595, 0.000007, 0.000007, 0.000007, 1199.342806
ValueParseInt , 0.000015, 82296289, 0.000015, 0.000015, 0.000015, 1201.303085
ValueParseFloat , 0.000032, 37325867, 0.000032, 0.000032, 0.000032, 1202.584212
ValueParseString , 0.000143, 8602053, 0.000143, 0.000143, 0.000143, 1226.139617
ValueStringifyBool , 0.000042, 28896750, 0.000042, 0.000042, 0.000042, 1204.861508
ValueStringifyNull , 0.000042, 28612630, 0.000042, 0.000042, 0.000042, 1197.264981
ValueStringifyInt , 0.000045, 26639548, 0.000045, 0.000045, 0.000045, 1200.785410
ValueStringifyFloat , 0.000261, 4609638, 0.000261, 0.000261, 0.000261, 1201.257967
ValueStringifyString , 0.000122, 10000000, 0.000122, 0.000122, 0.000122, 1218.173832
There’s no explicit depth limit so you’re welcome to try and see if you can break it!
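If anyone does want to try breaking it, a quick way to generate a pathological input is to build the string directly. This is shown in Python purely for illustration (the document it produces can be fed to any parser); `nested_array` is a made-up helper, not part of the library:

```python
import json

def nested_array(depth: int) -> str:
    """Return a JSON array nested `depth` levels deep, e.g. depth 3 -> "[[[]]]"."""
    return "[" * depth + "]" * depth

# Modest depth parses fine with any reasonable parser.
assert isinstance(json.loads(nested_array(50)), list)

# Recursive-descent parsers typically hit a stack limit eventually;
# CPython's own json gives up with RecursionError long before this depth.
try:
    json.loads(nested_array(100_000))
except RecursionError:
    pass
```

A parser with an iterative (explicit-stack) descent can avoid the limit entirely, at the cost of a slightly more involved main loop.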
Thank you! Definitely lots of room for improvement, but I have enough testing in place now that I'm comfortable with starting to pull things apart and rewrite them. In particular float parsing, since at the moment I read the full value and then pass it off to `atof` instead of doing it all in one pass like most libs do.
Do you have any recommendations for how I can improve my testing methods? Just plainly running them on my laptop tends to produce more variance than I would like.
If you want to hop over to Linux, you can make use of hwloc-bind to pin the program to a thread or set of threads, which is how I got the results I gave you. Somewhere in the Mojo Marathons channel I discuss what I consider a proper benchmarking configuration at length in a thread with Jack and Benny.
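For reference, the pinning invocation looks something like this. `./bench` is a placeholder binary name, the core numbers are machine-specific, and `taskset` from util-linux is a rough equivalent if hwloc isn't installed:

```shell
# Pin the benchmark process to one physical core with hwloc-bind (from hwloc).
# "core:2" is an arbitrary choice; pick a quiet core on your machine.
hwloc-bind core:2 -- ./bench

# Roughly equivalent with util-linux taskset, pinning to logical CPU 4:
taskset -c 4 ./bench
```

Pinning mainly removes variance from the scheduler migrating the process between cores; for even tighter numbers you'd also want to control frequency scaling.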
It seems like your code has evolved a lot since I last looked at it.
I found some interesting hacks to parse and validate unstructured JSON while reading; I still haven't had the time to properly test and benchmark them. If you'd like to look it's here (I don't remember but I think I already sent you the link lol).
If you want to take this to the stdlib eventually, you'd need to look at Python's API and try to match it while also offering higher-perf options like what I'm seeing in your repo.
Maybe I'll have more time after Christmas and can try to implement the same benchmarks you're using.
I’m going to check this out. Thanks for sharing!
I don't think porting this to the stdlib is really on my radar right now unless the team asks for it (which I doubt they will). It's good you mentioned it though, as maybe I can start adding support for some of the options present in the Python implementation, and then if the day comes it could be as simple as adding `loads` and `dumps` functions that wrap my existing code?
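For anyone unfamiliar, the Python surface being discussed is tiny; a compatible wrapper would mostly need to match the shape of the standard `json` module's two top-level calls:

```python
import json

# Python's stdlib API that a compatible wrapper would mirror:
# loads: str -> object, dumps: object -> str.
obj = json.loads('{"name": "EmberJson", "fast": true, "deps": []}')
assert obj["fast"] is True   # JSON true maps to Python True
assert obj["deps"] == []

# dumps accepts formatting options; separators gives the compact form.
text = json.dumps(obj, separators=(",", ":"))
assert text == '{"name":"EmberJson","fast":true,"deps":[]}'
```

The stdlib functions also take a number of keyword options (`object_hook`, `indent`, `sort_keys`, etc.), which is where most of the compatibility work would actually be.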
Yep, I think the Python APIs are quite simple to achieve by returning `object`. And we can also provide higher-perf alternatives.
> I don’t think porting this to the stdlib is really on my radar right now unless the team asks for it (which I doubt they will)
I think JSON is something that a lot of people expect to be in the stdlib when they come in. And given that the language features needed to develop it fully are already there, compared to networking and other goodies, this has a high chance of landing IMO.
I think that some level of unstructured parsing will be required to support things like data exploration, since we can’t always do things the JSON ↔ struct way. That means something like this has a place in the standard library, but I think some out of tree evolution makes sense.
I was planning on adding structured JSON support at some point, but we can't really do it properly until we have fancier metaprogramming tools, correct?
Otherwise, maybe once I improve number parsing performance I can look into migrating this into the stdlib. I've been putting it off since the fast float parsing algorithms I've found are very long and opaque, and otherwise I currently have to do some extra work for `atof` to suffice, which slows it down significantly.
Yes, we need reflection before we can do structured parsing, Zig style.
Float parsing is a rabbit hole but worthwhile to do, and something that the JSON parser may want to inline and do itself for extra perf, since JSON parsing is one of those things that is in the hot loop of many programs.
I know they've been doing work on float parsing in the stdlib, but their implementation seems to be more permissive than the JSON standard itself, so I end up needing to validate input myself beforehand.
Yes, the `atof` implementation is not optimal for JSON. I found a way to parse the number after validating it (haven't tested or benchmarked yet): during validation you can differentiate between exponent and dot floats, and then the number parsing can be accelerated using SIMD. You can see the algo here. I still haven't looked at other implementations like simdjson, but I think we can still innovate a lot in this space.
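As a rough illustration of the classify-then-parse idea in scalar Python (no SIMD; the function names here are made up for the sketch, not from either library):

```python
def classify(s: str) -> str:
    """Single validation-style pass: tag the number as int, dot-float, or exp-float."""
    kind = "int"
    for ch in s:
        if ch in "eE":
            return "exp"   # exponent form falls back to the general path
        if ch == ".":
            kind = "dot"
    return kind

def parse_dot_float(s: str) -> float:
    """Fast path for dot floats: accumulate all digits, then scale by 10**-frac_len.

    Note: only exact when mantissa and power both fit in a double; real
    fast-float algorithms (e.g. Eisel-Lemire) handle the general case.
    """
    neg = s.startswith("-")
    body = s[1:] if neg else s
    int_part, _, frac_part = body.partition(".")
    mantissa = int(int_part + frac_part)   # one digit-accumulation pass
    value = mantissa / (10 ** len(frac_part)) if frac_part else float(mantissa)
    return -value if neg else value

assert classify("123") == "int"
assert classify("-3.25") == "dot"
assert classify("1e9") == "exp"
assert parse_dot_float("-3.25") == -3.25
```

The digit-accumulation loop in `parse_dot_float` is the part that vectorizes well, since eight ASCII digits can be converted to an integer with a handful of SIMD multiplies and adds.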
I already mentioned this in the Discord thread: I'm formally opening up to contributions if anyone here is so inclined. I'm mostly looking for help with performance optimizations; I've been staring at this code for too long, so I'm sure there are some bottlenecks I've missed.
EmberJson is now available from the official Modular community prefix channel!