With all the powerful new type reflection utilities that have come down the pipeline, I was able to easily implement structured de/serialization in EmberJSON. Then recently I got the idea to explore adding support for lazy parsing of arbitrary types as well!
Before I dig into that, for those unfamiliar with Mojo's current reflection toolkit, I'll give a quick overview of how I've implemented structured parsing so far.
Reflection-based JSON parsing
My approach uses a recursive strategy to traverse the fields of a struct until it finds a type that conforms to JsonDeserializable. This trait has been implemented for most reasonable stdlib types via the experimental __extension syntax. Plain structs are simply treated as JSON objects, with the only requirement being that they conform to Movable & ImplicitlyDestructible; types that include fields with non-trivial destructors must also conform to Defaultable.
from emberjson import *

struct Foo(Movable, Writable):
    var a: Int
    var b: Float64

struct Bar(JsonDeserializable, Writable):
    var a: Int
    var b: Float64

    @staticmethod
    fn deserialize_as_array() -> Bool:
        return True

fn main() raises:
    var ob = '{"a": 10, "b": 345.234532}'
    var arr = "[1234, 2.435]"

    # Foo(a=10, b=345.234532)
    # Bar(a=1234, b=2.435)
    print(deserialize[Foo](ob))
    print(deserialize[Bar](arr))
Let's dig deeper and see how this is actually implemented. The bread and butter is these four stdlib functions.
from std.reflection import (
    struct_field_count,
    struct_field_types,
    struct_field_names,
    get_type_name,
)

struct Foo:
    var a: Int
    var b: Bool

fn main():
    print(struct_field_count[Foo]()) # 2

    comptime types = struct_field_types[Foo]()
    comptime names = struct_field_names[Foo]()

    comptime for i in range(struct_field_count[Foo]()):
        comptime name = names[i]
        # Int a
        # Bool b
        print(get_type_name[types[i]](), name)
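For readers more familiar with runtime reflection, the same field information can be pulled out of a Python dataclass. The parallel is loose (Python inspects at runtime, Mojo at compile time), but the shape of the information is the same:

```python
from dataclasses import dataclass, fields

@dataclass
class Foo:
    a: int
    b: bool

# Rough runtime analogues of struct_field_count,
# struct_field_names, and struct_field_types.
print(len(fields(Foo)))             # 2
for f in fields(Foo):
    # int a
    # bool b
    print(f.type.__name__, f.name)
```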
Now that we have a way of programmatically inspecting the fields of a struct, without needing explicit ahead-of-time knowledge of which struct we are working with, we can start building the logic for parsing arbitrary Mojo structs.
The deserialize function is a thin wrapper around _deserialize_impl, which either dispatches to a type's own from_json implementation, or recursively calls _default_deserialize to walk the target struct's fields looking for types that do conform to the trait.
fn _deserialize_impl[
    origin: ImmutOrigin, options: ParseOptions, //, T: _Base
](mut p: Parser[origin, options], out s: T) raises:
    comptime assert is_struct_type[T](), non_struct_error

    comptime if conforms_to(T, JsonDeserializable):
        s = downcast[T, JsonDeserializable].from_json(p)
    else:
        s = _default_deserialize[T, False](p)
The JsonDeserializable trait also houses extra configuration to customize parsing behaviour without the need to write a completely custom from_json implementation (for now it only supports deserializing a struct from an array instead of an object).
comptime _Base = ImplicitlyDestructible & Movable

trait JsonDeserializable(_Base):
    @staticmethod
    fn from_json[
        origin: ImmutOrigin, options: ParseOptions, //
    ](mut p: Parser[origin, options], out s: Self) raises:
        s = _default_deserialize[Self, Self.deserialize_as_array()](p)

    @staticmethod
    fn deserialize_as_array() -> Bool:
        return False
@always_inline
fn _default_deserialize[
    origin: ImmutOrigin,
    options: ParseOptions,
    //,
    T: _Base,
](mut p: Parser[origin, options], out s: T) raises:
    ...
    comptime field_count = struct_field_count[T]()
    comptime field_names = struct_field_names[T]()
    comptime field_types = struct_field_types[T]()

    comptime if is_array:
        ...
    else:
        p.expect(`{`)
        var seen = InlineArray[Bool, field_count](fill=False)
        while p.peek() != `}`:
            var ident = p.read_string()
            p.expect(`:`)
            var matched = False
            comptime for i in range(field_count):
                comptime name = field_names[i]
                if ident == name:
                    if unlikely(seen[i]):
                        raise Error("Duplicate key: ", name)
                    seen[i] = True
                    matched = True
                    ref field = __struct_field_ref(i, s)
                    comptime TField = downcast[type_of(field), _Base]
                    field = _deserialize_impl[TField](p)
            ...
        p.expect(`}`)
Despite how intimidating the type-system wizardry may appear, the logic here is actually quite simple. We loop through each field in the object string: read the identifier string, then try to match that identifier against the field names in our target struct. Upon finding a match, we fetch a reference to that particular field using __struct_field_ref. Then we use downcast to confirm that the type of the target field is ImplicitlyDestructible & Movable, so the type checker accepts it as a parameter to _deserialize_impl, where the value of the field is parsed and returned. The recursive process continues until the entire JSON structure has been parsed.
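To make the control flow concrete outside of Mojo's parameter system, here is a rough Python sketch of the same recursive field-matching idea, using runtime dataclass reflection. The helper is invented for illustration and glosses over details (parse options, duplicate-key tracking, error paths) that the real _default_deserialize handles:

```python
from dataclasses import dataclass, fields, is_dataclass

def deserialize(cls, data: dict):
    """Recursively match keys in `data` against the fields of `cls`,
    mirroring the field-matching loop in _default_deserialize.
    (Python dicts already deduplicate keys, so the `seen` check
    from the Mojo version is omitted here.)"""
    kwargs = {}
    for ident, value in data.items():
        matched = False
        for f in fields(cls):
            if ident == f.name:
                matched = True
                # Plain nested structs recurse, just like
                # _deserialize_impl recursing through struct fields.
                if is_dataclass(f.type):
                    kwargs[f.name] = deserialize(f.type, value)
                else:
                    kwargs[f.name] = f.type(value)
        if not matched:
            raise ValueError(f"Unknown key: {ident}")
    return cls(**kwargs)

@dataclass
class Inner:
    x: int

@dataclass
class Outer:
    a: int
    inner: Inner

print(deserialize(Outer, {"a": 10, "inner": {"x": 5}}))
# Outer(a=10, inner=Inner(x=5))
```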
Lazy parsing
With all that in our toolkit, let's turn our attention to how we can use this to lazily parse arbitrary structs as well. We already have everything we need to perform the final parsing of these structures, so all we need is an additional layer for first collecting a view of the bytes that contain the target value.
Introducing the Lazy wrapper struct.
comptime ReadBytesFn[origin: ImmutOrigin] = fn(
    mut Parser[origin]
) raises -> Span[Byte, origin]

comptime ParseFn[T: _Base, origin: ImmutOrigin] = fn(
    Span[Byte, origin]
) raises -> T

fn __pick_byte_expect[T: _Base, origin: ImmutOrigin]() -> ReadBytesFn[origin]:
    comptime if conforms_to(T, JsonDeserializable) and downcast[
        T, JsonDeserializable
    ].deserialize_as_array():
        return _get_array_bytes[origin]
    else:
        return _get_object_bytes[origin]

@fieldwise_init
struct Lazy[
    T: _Base,
    origin: ImmutOrigin,
    parse_value: ReadBytesFn[origin] = __pick_byte_expect[T, origin](),
    extract_value: ParseFn[T, origin] = _deserialize_bytes[T, origin],
](Hashable, JsonDeserializable, JsonSerializable, TrivialRegisterPassable):
    var _data: Span[Byte, Self.origin]

    @staticmethod
    fn from_json[
        o: ImmutOrigin, options: ParseOptions, //
    ](mut p: Parser[o, options], out s: Self) raises:
        s = {Self.parse_value(rebind[Parser[Self.origin]](p))}

    fn write_json(self, mut writer: Some[Serializer]):
        writer.write(StringSlice(unsafe_from_utf8=self._data))

    fn get(self) raises -> Self.T:
        return Self.extract_value(self._data)
Once again this snippet may seem intimidating, but it is actually fairly simple. The Lazy struct takes four parameters: T, the target type for when we need to fully deserialize the value; origin, the origin of the source data being parsed; and parse_value and extract_value, which are each responsible for one of the two steps in the parsing process. parse_value is a function that, given a Parser instance, returns a Span containing the byte representation of the target value. For example, if T is a plain struct then the default _get_object_bytes returns all the bytes of the next JSON object.
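The post doesn't show _get_object_bytes itself, but the job it has to do is well defined: starting at a `{`, scan forward to the matching `}` while ignoring braces that appear inside strings, and return that byte span. Here is a minimal Python sketch of such a scanner, purely as an illustration of the idea, not EmberJSON's actual implementation:

```python
def get_object_bytes(data: bytes, start: int = 0) -> bytes:
    """Return the span of bytes covering the JSON object at `start`,
    by scanning to the matching closing brace. Braces inside strings
    are skipped so they can't unbalance the depth count."""
    assert data[start] == ord("{")
    depth = 0
    in_string = False
    i = start
    while i < len(data):
        c = data[i]
        if in_string:
            if c == ord("\\"):
                i += 1            # skip the escaped character
            elif c == ord('"'):
                in_string = False
        elif c == ord('"'):
            in_string = True
        elif c == ord("{"):
            depth += 1
        elif c == ord("}"):
            depth -= 1
            if depth == 0:
                return data[start : i + 1]
        i += 1
    raise ValueError("Unterminated JSON object")

print(get_object_bytes(b'{"a": {"b": "}"}} trailing'))
# b'{"a": {"b": "}"}}'
```

The nested object and the `}` hidden inside a string show why a naive "find the next `}`" approach isn't enough.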
When the user needs the concrete value, they can invoke the get() method, which will simply invoke extract_value and return the result. Aliases for the baseline JSON types are already implemented like so:
comptime LazyInt[origin: ImmutOrigin] = Lazy[
    Int64, origin, _get_int_bytes[origin]
]
As a result users can easily choose particular fields in a struct to be evaluated lazily.
struct Foo[origin: ImmutOrigin](Movable, Writable):
    var a: Int
    var b: LazyFloat[Self.origin]

fn main() raises:
    var j = '{"a": 12, "b": 3.435}'
    var f = deserialize[Foo[origin_of(j)]](j)
    print(f.a) # 12
    print(f.b.get()) # 3.435
Or just lazily parse an entire arbitrary struct.
struct Foo(Movable, Writable):
    var a: Int
    var b: Float64

fn main() raises:
    var j = '{"a": 12, "b": 3.435}'
    var f = deserialize[Lazy[Foo, origin_of(j)]](j)
    print(f.get().a) # 12
Thanks to @joe for pushing reflection forward in Mojo. I have been having a blast seeing how far I can push these new features!
If anyone would like to try out these new features, you can depend on the EmberJSON git repo directly on Mojo nightly using the pixi-build mojo backend (thank you @duck_tape):
emberjson = {git = "https://github.com/bgreni/EmberJson.git"}