WARNING: It’s Friday. This was just for fun. Please remember that all benchmarks are lies and not to be trusted. Mostly this was a useful exercise for improving the performance and APIs in ExtraMojo.
REMINDER: The point of this “benchmark” wasn’t to optimize the problem, but just to follow the template to write code that I often end up writing for parsing files.
I have a common contrived benchmark I like to try in various languages to get a feel for how they might perform in "real" cases, as opposed to most other benchmarks out there. The repo is here (note: just the "records" benchmark for now), but all the relevant code is below.
The input data is a big TSV generated with the following command:

```bash
awk 'BEGIN{for (i=0; i<2000000; i++){print "abcdef\tghijk\tlmnop\tqrstuv\twxyz1234\tABCDEF\tHIJK\tLMNOP\tQRSTUV\tWXYZ123\tabcdef\tghijk\tlmnop\tqrstuv\twxyz1234\tABCDEF\tHIJK\tLMNOP\tQRSTUV\tWXYZ123\tabcdef\tghijk\tlmnop\tqrstuv\twxyz1234\tABCDEF\tHIJK\tLMNOP\tQRSTUV\tWXYZ123\tabcdef\tghijk\tlmnop\tqrstuv\twxyz1234\tABCDEF\tHIJK\tLMNOP\tQRSTUV\tWXYZ123\tabcdef\tghijk\tlmnop\tqrstuv\twxyz1234\tABCDEF\tHIJK\tLMNOP\tQRSTUV\tWXYZ123\tabcdef\tghijk\tlmnop\tqrstuv\twxyz1234\tABCDEF\tHIJK\tLMNOP\tQRSTUV\tWXYZ123"}}' > big.tsv
```
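As a quick sanity check on what's being counted: each generated line is six repeats of the same ten-field group, and a field "matches" when bytes 1..3, lowercased, contain "bc". A small check of one line (hypothetical, not part of the benchmark):

```python
# One copy of the ten-field group the awk script emits six times per line.
group = "abcdef\tghijk\tlmnop\tqrstuv\twxyz1234\tABCDEF\tHIJK\tLMNOP\tQRSTUV\tWXYZ123"
line = "\t".join([group] * 6)

# The same predicate all three implementations apply to every field.
matches = sum(1 for val in line.split("\t") if "bc" in val[1:4].lower())
print(matches)  # "abcdef" and "ABCDEF" match in each group: 2 * 6 = 12
```

So every implementation should print 12 × 2,000,000 = 24,000,000 for the full file, which is a handy way to confirm the ports agree.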
The Python reference implementation is the following:
```python
import sys


class Record(object):
    __slots__ = ["name", "count"]

    def __init__(self, name, count):
        self.name = name
        self.count = count


def create_record(vals):
    count = len([val for val in vals if "bc" in val[1:4].lower()])
    return Record(vals[0], count)


def main():
    records = []
    for line in sys.stdin:
        records.append(create_record(line.split('\t')))
    print(sum([r.count for r in records]))


if __name__ == '__main__':
    main()
```
The Rust impl is:
```rust
use std::io::prelude::*;
// BufReader must be imported explicitly; it isn't part of the io prelude.
use std::io::{stdin, BufReader};

struct Record {
    pub name: String,
    pub count: usize,
}

impl Record {
    pub fn new(name: String, count: usize) -> Record {
        Record { name, count }
    }
}

fn create_record(line: &str) -> Record {
    let mut iter = line.split('\t');
    let name = iter.next().unwrap();
    let count = std::iter::once(name)
        .chain(iter)
        .filter(|s| s[1..4].contains("bc"))
        .count();
    Record::new(name.to_string(), count)
}

fn main() {
    let mut records = vec![];
    let mut buffer = String::new();
    let stdin = stdin();
    let mut input = BufReader::new(stdin.lock());
    while let Ok(bytes_read) = input.read_line(&mut buffer) {
        if bytes_read == 0 {
            break;
        }
        buffer.make_ascii_lowercase();
        records.push(create_record(&buffer));
        buffer.clear();
    }
    let count: usize = records.iter().map(|r| r.count).sum();
    println!("{}", count);
}
```
And finally, the Mojo impl, using my library ExtraMojo to cover all the things that aren’t in the stdlib yet:
```mojo
from ExtraMojo.fs.file import FileReader
from ExtraMojo.bstr.bstr import SplitIterator, find, to_ascii_lowercase
from memory import Span

alias TAB = ord("\t")


@value
struct Record:
    var name: String
    var count: Int


fn create_record(line: Span[UInt8]) raises -> Record:
    var name = String()
    var iter = SplitIterator(line, TAB)
    name.write_bytes(iter.peek().value())

    var count = 0
    for value in SplitIterator(line, TAB):
        if find(value[1:4], "bc".as_bytes()):
            count += 1
    return Record(name, count)


fn main() raises:
    var fh = open("/dev/stdin", "r")
    var reader = FileReader(fh^)
    var buffer = List[UInt8]()
    var records = List[Record]()

    while reader.read_until(buffer) != 0:
        to_ascii_lowercase(buffer)
        records.append(create_record(buffer))

    var count = 0
    for record in records:
        count += record[].count
    print(count)
```
Results from hyperfine:
```
# Run on a MacBook Pro with an M1
❯ hyperfine --warmup 3 '< big.tsv ./mojo/count_lines' '< big.tsv ./rust/target/release/count_lines' 'time < big.tsv python3 ./count_lines.py'
Benchmark 1: < big.tsv ./mojo/count_lines
  Time (mean ± σ):      2.070 s ±  0.012 s    [User: 1.970 s, System: 0.087 s]
  Range (min … max):    2.061 s …  2.100 s    10 runs

Benchmark 2: < big.tsv ./rust/target/release/count_lines
  Time (mean ± σ):      2.453 s ±  0.015 s    [User: 2.315 s, System: 0.121 s]
  Range (min … max):    2.439 s …  2.486 s    10 runs

Benchmark 3: time < big.tsv python3 ./count_lines.py
  Time (mean ± σ):     11.492 s ±  0.098 s    [User: 11.178 s, System: 0.196 s]
  Range (min … max):   11.343 s … 11.615 s    10 runs

Summary
  < big.tsv ./mojo/count_lines ran
    1.18 ± 0.01 times faster than < big.tsv ./rust/target/release/count_lines
    5.55 ± 0.06 times faster than time < big.tsv python3 ./count_lines.py
```
Mojo is FAST. I did take the time to implement a SIMD ASCII-lowercase conversion, as well as a SIMD search for the first occurrence of a character in a buffer. If I turn both of those off, the performance is a tad slower than Rust's. However, I think it's fair to leave them in, since writing SIMD is one of Mojo's major selling points.
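The actual SIMD routines live in ExtraMojo, but the core trick is easy to sketch. Here's a scalar, pure-Python rendition of the branchless ASCII-lowercase (hypothetical helper, shown for illustration; a SIMD version applies the same arithmetic to a whole vector of bytes per iteration):

```python
def to_ascii_lowercase(buf: bytearray) -> None:
    # A byte is an ASCII uppercase letter iff (b - ord('A')), computed with
    # wrapping (unsigned) arithmetic, lands in 0..25. Lowercasing is then
    # just OR-ing in the 0x20 bit -- no branches, so it vectorizes cleanly.
    for i, b in enumerate(buf):
        is_upper = ((b - 0x41) & 0xFF) < 26
        buf[i] = b | (0x20 * is_upper)

data = bytearray(b"ABCdef\t123")
to_ascii_lowercase(data)
print(bytes(data))  # b'abcdef\t123'
```

The same range-check-plus-mask shape is what the Mojo implementation does per SIMD lane, which is why turning it off costs measurable time on a 2M-line input.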