Mojo-yaml v0.1.0 Lite - Native YAML Parser

From the author (@mjboothaus) of mojo-toml, mojo-ini, and mojo-dotenv comes mojo-yaml, a native YAML Lite parser for Mojo with zero dependencies, covering ~80% of common YAML use cases.

What it does

Parses block-style YAML configuration files into native Mojo structures:

  • Nested mappings (dicts) and sequences (lists) of any depth
  • Inline list-mappings: a "- name: Alice" item continued by an indented "age: 30" line
  • All scalar types: int, float, bool, null, string (quoted/unquoted)
  • Comments anywhere with #
  • Type-safe value access with .get() and .get_at()
  • Clear error messages with line/column context
  • 91 tests ensuring reliability (100% passing)

Installation

git clone https://github.com/DataBooth/mojo-yaml.git
cd mojo-yaml
pixi run test-all
pixi run example-all  # See working examples

Coming soon to the modular-community channel.

Usage

Basic Parsing:

from yaml import parse

fn main() raises:
    var config = parse("""
server:
  host: localhost
  port: 8080
  debug: true
users:
  - name: Alice
    role: admin
  - name: Bob
    role: user
""")
    
    # Type-safe access
    var server = config.get("server")
    print(server.get("host").as_string())   # localhost
    print(server.get("port").as_int())      # 8080
    print(server.get("debug").as_bool())    # True
    
    # Navigate sequences
    var users = config.get("users")
    var first_user = users.get_at(0)
    print(first_user.get("name").as_string())  # Alice

File I/O:

from yaml import parse
from pathlib import Path

fn main() raises:
    var content = Path("config.yaml").read_text()
    var config = parse(content)
    # Work with parsed data
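Error handling:

Both examples above call parse from a function marked raises, so failures presumably surface as raised errors. Here is a minimal sketch of catching one. It assumes that unsupported flow-style input makes parse raise rather than return a partial result; the exact message text is not shown because it depends on the library.

```mojo
from yaml import parse

fn main():
    # Flow-style lists are listed as unsupported in v0.1.0 Lite,
    # so this input is assumed (not confirmed) to make parse() raise.
    var text = "ports: [8080, 8081]"
    try:
        _ = parse(text)
        print("parsed without error")
    except e:
        # The error is expected to carry line/column context.
        print("YAML error:", e)
```

Catching the error at the call site lets an application report a bad config file and fall back to defaults instead of aborting.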

What’s in v0.1.0 Lite

  • Core Parser: Lexer (518 LOC) + Parser (300 LOC) + YamlValue (318 LOC)
  • Coverage: ~80% of common YAML patterns (block-style only)
  • Tests: 91/91 passing across 15 test suites (100%)
  • Examples: 4 working code examples with real-world fixtures
  • Documentation: Comprehensive README, CHANGELOG, and COMPATIBILITY.md

Real-World Testing

✅ Works: .pre-commit-config.yaml, custom configs
⚠️ Requires quoting: Multi-word strings, version numbers
❌ Not supported: Flow-style [...], empty values, anchors/aliases

See COMPATIBILITY.md for detailed feature matrix.

Compatibility Tips

✅ Do:

version: "1.0.0"           # Quote version numbers
description: "My app"      # Quote multi-word strings  
host: localhost            # Single words OK unquoted
list:
  - item1                  # Use block style
  - item2

❌ Don’t:

version: 1.0.0             # ❌ Multiple dots fail
description: My app        # ❌ Spaces in unquoted strings
list: [item1, item2]       # ❌ Flow style not supported  
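To show the tips in use, here is a small sketch that parses a config written in the "Do" style and reads it back. It uses only the calls from the Usage section (parse, .get(), .get_at(), .as_string()); the key names are illustrative.

```mojo
from yaml import parse

fn main() raises:
    # Every value follows the "Do" rules: quoted version number,
    # quoted multi-word string, single-word unquoted string, block-style list.
    var config = parse("""
version: "1.0.0"
description: "My app"
host: localhost
list:
  - item1
  - item2
""")

    # A quoted version number is read back as an ordinary string.
    print(config.get("version").as_string())
    print(config.get("description").as_string())

    # Block-style sequence items are reached by index.
    print(config.get("list").get_at(1).as_string())
```

Quoting the version number means it is handled as a string and never as a number.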

Feature Comparison

Feature | Support | Notes
Nested Mappings | ✅ Full | Any depth
Nested Sequences | ✅ Full | Any depth
Inline List-Mapping | ✅ Full | - name: value with continuation
Scalars | ✅ Full | int, float, bool, null, string
Comments | ✅ Full | # anywhere
Quoted Strings | ✅ Full | "text", 'text'
Unquoted Strings | ⚠️ Single word | Must quote multi-word
Version Numbers | ⚠️ Must quote | "1.0.0" not 1.0.0
Flow Style | ❌ Not supported | [1, 2], {a: b}
Empty Values | ❌ Not supported | key: → use key: null
Anchors/Aliases | ❌ Not supported | &anchor, *ref
Multi-Document | ❌ Not supported | ---
Writing YAML | ❌ Not implemented | Reader-only v0.1.0

Why “Lite”?

Full YAML 1.2 is complex (~84-page spec). YAML Lite focuses on the ~80% use case:

  • ✅ Configuration files (pre-commit, custom configs)
  • ✅ Data serialization (block-style only)
  • ✅ Nested structures (any depth)
  • ❌ Advanced features (anchors, flow-style, multi-doc)

This provides immediate value while keeping implementation maintainable.

Roadmap

Possible enhancements for v0.2.0+:

  • Support version number patterns (multiple decimal points)
  • Handle empty values gracefully
  • Flow-style array support [1, 2, 3]
  • YAML writer functionality

Not planned:

  • Anchors/aliases (complex, rarely needed)
  • Multi-document streams (niche use case)
  • Literal/folded blocks (marginal utility)

Links

Related Projects

Together with mojo-toml, mojo-ini, and mojo-dotenv, mojo-yaml provides comprehensive configuration file support for Mojo! 🎯

Acknowledgements

Open source project with initial development sponsored by DataBooth, building high-performance data and AI services with Mojo.

Feedback and contributions welcome!

You’re on fire

Awesome! Maybe an XML parser next? 🙂

Hmmm… apparently significantly more effort than toml… maybe my parsing days are done 🙂

Very cool! Out of curiosity, have you looked at performance at all?

There is now some initial (not comprehensive) benchmarking, and I have included the outputs in the README.md.

At a high level, the benchmarks indicate that mojo-yaml runs 9-21x faster than Python’s pyyaml.

Caveat: this is just a “lite” implementation in Mojo for now. As noted (also in the README), more intricate YAML features are not implemented.


Maybe a Mojo BenchmarkSuite to mirror the testing TestSuite? 🙂

Today’s your lucky day! benchmark | Modular
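For anyone who hasn't tried the stdlib module yet, here is a minimal sketch of timing a mojo-yaml parse call with it. The helper name parse_small_config and the sample YAML are illustrative, and the benchmark API may differ between Mojo releases, so treat it as a starting point rather than a reference.

```mojo
import benchmark
from yaml import parse

fn parse_small_config():
    # The function handed to benchmark.run takes no arguments, so the
    # input is inlined and any parse error is swallowed here.
    try:
        _ = parse("server:\n  host: localhost\n  port: 8080\n")
    except:
        pass

fn main() raises:
    # benchmark.run chooses the iteration count and returns a report.
    var report = benchmark.run[parse_small_config]()
    report.print()
```

The parse result is discarded, so if the timings look implausibly small, check that the compiler has not optimised the call away.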

Hi @owenhilyard

Actually I basically simultaneously discovered benchmark in the stdlib 😆

I wanted to share a complementary approach, inspired by TestSuite, that I have prototyped in the following repo:

Mojo BenchSuite: TestSuite-style patterns for benchmarking 🔥

Repo: GitHub - DataBooth/mojo-benchsuite: A lightweight, TestSuite-inspired benchmarking framework for Mojo

Benchmarking is clearly important to the Modular team (the stdlib module is excellent!). This project explores making it as frictionless as TestSuite.

Key additions over stdlib benchmark:

🎯 Suite-level organisation: Group and run multiple benchmarks
📊 Environment capture: OS/CPU/version for reproducibility
🔄 Adaptive iterations: Auto-adjust for reliable statistics
💾 Multiple outputs: Console, markdown, CSV with timestamps
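To illustrate the adaptive iterations idea in general terms, here is a rough sketch that doubles the iteration count until one batch runs long enough to time reliably. It is not BenchSuite's actual implementation; calibrate, workload, and the thresholds are hypothetical names and values chosen for the example.

```mojo
from time import perf_counter_ns

fn workload():
    # Stand-in for the code under test.
    var total = 0
    for i in range(10000):
        total += i
    _ = total

fn calibrate[func: fn () -> None](min_ns: Int, max_iters: Int) -> Int:
    # Double the iteration count until one batch runs for at least
    # min_ns nanoseconds, or until the max_iters cap is reached.
    var iters = 1
    while iters < max_iters:
        var start = perf_counter_ns()
        for _ in range(iters):
            func()
        var elapsed = Int(perf_counter_ns() - start)
        if elapsed >= min_ns:
            break
        iters *= 2
    return iters

fn main():
    # Aim for batches of at least 10 ms, capped at roughly one million iterations.
    var iters = calibrate[workload](10000000, 1000000)
    print("iterations per batch:", iters)
```

The cap guarantees that calibration stops even when the workload is so cheap, or so heavily optimised, that a batch never reaches the target time.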

Example

from benchsuite import BenchReport

fn my_algorithm():
    pass

def main():
    var report = BenchReport()
    report.benchmark[my_algorithm]("my_algorithm")

Future: Exploring TestSuite.discover_tests[__functions_in_module__()]() pattern for auto-discovery of bench_* functions.

Any feedback and ideas welcome!