Many functions in the standard library have to handle many different cases, and they likely contain many branches as a result. One possible optimization is to give such functions compile-time parameters through which the programmer promises that some precondition holds, so the function can skip the corresponding checks and run faster.
The most obvious example is giving the list's append function an assume_capacity parameter:
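    fn append[assume_capacity: Bool = False](mut self, item: T):
        # Resolved at compile time, so each instantiation keeps only one branch.
        @parameter
        if assume_capacity:
            # The caller promises spare capacity, so the growth check is skipped.
            self.unsafe_set(self._len, item)
            self._len += 1
        else:
            # Stands in for the existing append logic: check capacity, grow if needed, then store.
            self.append(item)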
But the optimizations don't have to stop there. Mojo strings have to support UTF-8 encoding, but an assume_ascii parameter would make many functions faster: .upper(), .lower(), .split(), and pretty much any other function that has to iterate over the entire string. For upper and lower, removing the branches required by UTF-8 support also lets us reduce the entire function to 3 SIMD operations. Since Mojo's string type also supports copy-on-write (COW) and small-string optimization (SSO), we can add further parameters such as assume_static, assume_heap, and assume_inline.
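To make the assume_ascii idea concrete, here is a minimal sketch in Rust (the language of the measured implementation mentioned in this post), not Mojo's actual string API. The function name to_upper and the ASSUME_ASCII flag are hypothetical and chosen only for illustration. Because the flag is a compile-time constant, each instantiation keeps only one of the two paths.

```rust
/// Uppercases `s`.
///
/// `ASSUME_ASCII` is a hypothetical compile-time flag: when it is `true`,
/// the caller promises that `s` contains only ASCII bytes.
fn to_upper<const ASSUME_ASCII: bool>(s: &str) -> String {
    if ASSUME_ASCII {
        // Checked only in debug builds; release builds trust the caller.
        debug_assert!(s.is_ascii(), "caller promised ASCII input");
        // Byte-wise path: no UTF-8 decoding and no multi-character case
        // mappings. Non-ASCII bytes would be left unchanged, so the result
        // is still valid UTF-8 even if the promise is broken.
        s.to_ascii_uppercase()
    } else {
        // General path: full Unicode case mapping.
        s.to_uppercase()
    }
}
```

A caller that knows its input is ASCII would write to_upper::<true>(text), while to_upper::<false>(text) keeps the general behavior. This version stays safe even when the promise is broken; an implementation that skips more checks would need the assumption-based path to be unsafe.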
These performance improvements are not just hypothetical: I implemented a very similar string type in Rust that lets users pass preconditions to certain functions (sso_string_rs/src/lib.rs at master · akneni/sso_string_rs · GitHub). These preconditions often resulted in 10-20% performance improvements.
While this works in Rust, Mojo is already better prepared to support these design choices. To pass an enum as a generic parameter in Rust, I had to convert it to an integer and back, since stable Rust only accepts integers, bool, and char as const generic parameters. Additionally, Rust does not allow default values for generic parameters on functions, so I had to split each function in two: one for normal operation and another through which the user can pass assumptions. Splitting into two may be ideal anyway, since the assumption-based function should be marked unsafe in idiomatic Rust.
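The two Rust workarounds described above can be sketched roughly as follows. This is an illustrative reconstruction, not code taken from sso_string_rs: the names Assume, SsoString, len, and len_assuming are hypothetical, and the string type is a toy stand-in. The enum travels through the const generic as a u8 and is converted back inside the function, and the safe entry point simply forwards to the unsafe, assumption-based one.

```rust
/// Hypothetical precondition a caller can promise about the string's storage.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u8)]
enum Assume {
    Nothing = 0,
    Inline = 1,
    Heap = 2,
}

impl Assume {
    /// Converts the integer carried by the const generic back into the enum.
    const fn from_u8(n: u8) -> Assume {
        match n {
            1 => Assume::Inline,
            2 => Assume::Heap,
            _ => Assume::Nothing,
        }
    }
}

/// Toy stand-in for a small-string-optimized string.
enum SsoString {
    Inline { buf: [u8; 23], len: u8 },
    Heap(Vec<u8>),
}

impl SsoString {
    /// Normal entry point: makes no assumptions.
    fn len(&self) -> usize {
        // SAFETY: `Assume::Nothing` promises nothing, so no promise can be broken.
        unsafe { self.len_assuming::<{ Assume::Nothing as u8 }>() }
    }

    /// Assumption-based variant; `A` is an `Assume` value cast to `u8`.
    ///
    /// # Safety
    /// If `A` encodes `Inline` or `Heap`, the string must really use that storage.
    unsafe fn len_assuming<const A: u8>(&self) -> usize {
        match Assume::from_u8(A) {
            Assume::Inline => match self {
                SsoString::Inline { len, .. } => *len as usize,
                // SAFETY: the caller promised inline storage.
                SsoString::Heap(_) => unsafe { std::hint::unreachable_unchecked() },
            },
            Assume::Heap => match self {
                SsoString::Heap(v) => v.len(),
                // SAFETY: the caller promised heap storage.
                SsoString::Inline { .. } => unsafe { std::hint::unreachable_unchecked() },
            },
            // No promise made: handle both storage kinds.
            Assume::Nothing => match self {
                SsoString::Inline { len, .. } => *len as usize,
                SsoString::Heap(v) => v.len(),
            },
        }
    }
}
```

In Mojo, the same thing could presumably be a single function with a defaulted parameter, which is the point of the comparison: here the cast to u8, the from_u8 helper, and the second function exist only to work around the language.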