@clattner has expressed a desire to remove the `Indexer` trait and standardize on `Int` as the type used for indexing in Mojo. I disagree. At a minimum, I think we should standardize on `UInt`. Preserving negative indexing may still be worth it for some use cases, so the `Indexer` trait may be worth keeping.
I’ll start by saying that I’m a big fan of “make invalid states unrepresentable”. It gives the compiler more information to work with and helps developers use an API correctly. I also think it is a bad idea to let negative numbers reach indexing code that isn’t explicitly designed to handle them. Such code often checks `index < length` and then does a load or store at that offset from a pointer. A negative index passes that check and then reads or writes before the start of the buffer. It’s also relevant to look at large projects written in languages that have both signed and unsigned types, to see which kind of index type is most common:
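Here is a minimal Rust sketch of that failure mode. I use Rust rather than Mojo because it has both signed and unsigned index types and comes up again below. The helper names are hypothetical.

- With a signed index, the one-comparison guard accepts `-1`, and making it sound takes a second comparison.
- With `usize`, the negative case cannot be expressed at all.

```rust
// Hypothetical helpers illustrating the `index < length` guard discussed above.

/// The "obvious" guard for a signed index. A negative index satisfies
/// `index < len`, so this returns true for -1 (an out-of-bounds access).
fn signed_guard(index: isize, len: isize) -> bool {
    index < len
}

/// A sound guard for a signed index needs a second comparison.
fn signed_guard_checked(index: isize, len: isize) -> bool {
    index >= 0 && index < len
}

/// With an unsigned index a negative value is unrepresentable,
/// so the single comparison is already sound.
fn unsigned_guard(index: usize, len: usize) -> bool {
    index < len
}

fn main() {
    let data = [10u8, 20, 30];
    let len = data.len();

    // -1 slips through the naive signed guard; a subsequent
    // `ptr.offset(-1)` load would read before the start of `data`.
    println!("signed_guard(-1)         = {}", signed_guard(-1, len as isize));
    println!("signed_guard_checked(-1) = {}", signed_guard_checked(-1, len as isize));
    println!("unsigned_guard(2)        = {}", unsigned_guard(2, len));
    println!("unsigned_guard(3)        = {}", unsigned_guard(3, len));
}
```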
| Project | Count of Int | Count of UInt | Int Ratio | UInt Ratio |
|---|---|---|---|---|
| Linux | 22856 | 52770 | 0.302224 | 0.697776 |
| DPDK | 154 | 4641 | 0.032117 | 0.967883 |
| The LLVM Monorepo | 676 | 189792 | 0.003549 | 0.996451 |
| swiftlang/swift | 72 | 4278 | 0.016552 | 0.983448 |
| AMD’s ‘TheRock’ and dependencies | 3853 | 308603 | 0.012331 | 0.987669 |
| VLC Media Player | 798 | 5861 | 0.119838 | 0.880162 |
| Chromium | 1446 | 122794 | 0.011639 | 0.988361 |
| rust-lang/rust | 9373 | 30989 | 0.232223 | 0.767777 |
| nvidia/TensorRT | 1 | 1439 | 0.000694 | 0.999306 |
| Postgres | 267 | 2052 | 0.115136 | 0.884864 |
| MongoDB | 702 | 96109 | 0.007251 | 0.992749 |
Note: For C/C++, Int is `ssize_t` and UInt is `size_t`. For Rust, Int is `isize` and UInt is `usize`. Counts are lines matched by `rg`, not individual occurrences.
For C/C++, run this bash command in the repository root. Because `rg "size_t"` also matches `ssize_t`, the `ssize_t` count is subtracted to get the `size_t` count: `SSIZE=$(rg "ssize_t" | wc -l); BOTH=$(rg "size_t" | wc -l); printf "ssize_t=%d, size_t=%d, ssize_t_ratio=%f, size_t_ratio=%f\n" $SSIZE $(( $BOTH - $SSIZE )) $(awk "BEGIN { print $SSIZE / $BOTH }") $(awk "BEGIN { print ($BOTH - $SSIZE) / $BOTH }" )`
For Rust: `SSIZE=$(rg "isize" | wc -l); BOTH=$(rg "isize|usize" | wc -l); printf "isize=%d, usize=%d, isize_ratio=%f, usize_ratio=%f\n" $SSIZE $(( $BOTH - $SSIZE )) $(awk "BEGIN { print $SSIZE / $BOTH }") $(awk "BEGIN { print ($BOTH - $SSIZE) / $BOTH }" )`
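The ratio columns are each count divided by the total number of matching lines, i.e. Int / (Int + UInt). This is what the `awk` expressions in the commands compute, since `BOTH` equals Int + UInt. Here is a small Rust check of that arithmetic against two rows of the table:

```rust
/// Returns (Int ratio, UInt ratio) = (int / total, uint / total).
fn ratios(int_count: u64, uint_count: u64) -> (f64, f64) {
    let total = (int_count + uint_count) as f64;
    (int_count as f64 / total, uint_count as f64 / total)
}

fn main() {
    // Counts copied from the Linux and DPDK rows of the table.
    for (name, int_count, uint_count) in [("Linux", 22856, 52770), ("DPDK", 154, 4641)] {
        let (int_ratio, uint_ratio) = ratios(int_count, uint_count);
        println!("{name}: Int {int_ratio:.6}, UInt {uint_ratio:.6}");
    }
}
```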
This is not the most scientific survey of programmer habits, but the results still come down overwhelmingly on one side. To me, this says that if we must have a default, it should be UInt. I’m open to adding more projects to the list, but these are already fairly varied. For those not familiar with Rust: you may only index using usize, so indexing with isize or any other integer type requires an explicit cast. Even so, rust-lang/rust shows a strong majority of usize, even though it contains more i32 than u32 and more i64 than u64.
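For readers new to Rust, here is a minimal sketch of what that rule looks like in practice. `get_signed` is a hypothetical helper, not a standard-library function. A plain `as` cast does not reject negatives: `-1isize as usize` wraps to `usize::MAX`. A checked conversion such as `usize::try_from` is needed to surface them.

```rust
/// Hypothetical helper: look up an element using a signed index.
/// `usize::try_from` fails for negative values, so they return None
/// instead of wrapping around to a huge unsigned index.
fn get_signed(v: &[i32], index: isize) -> Option<i32> {
    let i = usize::try_from(index).ok()?;
    v.get(i).copied()
}

fn main() {
    let v = vec![1, 2, 3];
    let i: isize = 2;

    // `v[i]` does not compile: slices implement indexing for `usize`
    // (and ranges of `usize`), not `isize`, so a conversion is required.
    let by_cast = v[i as usize];

    // `as` reinterprets the bits: -1 becomes usize::MAX, which would then
    // fail the bounds check and panic if used to index `v`.
    let wrapped = -1isize as usize;

    println!("{} {} {:?} {:?}", by_cast, wrapped, get_signed(&v, 2), get_signed(&v, -1));
}
```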
There are substantial ergonomics issues with using Int for indexing:
- Every time a library or application calls `len` on a type it does not own, and that type does not guarantee it will never have a negative length, it must check for a negative length. This also means that anything generic over `Sized` must check for negative lengths.
- A user will get an `abort` or other runtime error from passing in a negative value, instead of LSP feedback that the type does not support negative indexing.
- Losing half of your address range is distinctly bad on an NPU. Many NPUs are 32-bit, so a single array may very well take up more than 50% of the address space.
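Here is a minimal Rust sketch of the first and last points. The traits `SignedLen` and `UnsignedLen` and the types `Buf` and `Broken` are hypothetical stand-ins for a `Sized`-style interface. They are not Mojo or Rust standard-library APIs.

- Generic code over a signed length needs an error path that the unsigned version does not have.
- A 32-bit signed index can address only half as many elements as a 32-bit unsigned one.

```rust
/// Hypothetical `Sized`-style trait whose length is signed.
trait SignedLen {
    fn len_signed(&self) -> isize;
}

/// Hypothetical `Sized`-style trait whose length is unsigned.
trait UnsignedLen {
    fn len_unsigned(&self) -> usize;
}

struct Buf(Vec<u8>);

impl SignedLen for Buf {
    fn len_signed(&self) -> isize {
        self.0.len() as isize
    }
}

impl UnsignedLen for Buf {
    fn len_unsigned(&self) -> usize {
        self.0.len()
    }
}

/// Stands in for a type we do not own whose implementation returns a negative length.
struct Broken;

impl SignedLen for Broken {
    fn len_signed(&self) -> isize {
        -5
    }
}

/// Generic code over a signed length must reject negative values.
fn total_len_signed<T: SignedLen>(items: &[T]) -> Result<isize, String> {
    let mut total: isize = 0;
    for item in items {
        let n = item.len_signed();
        if n < 0 {
            return Err(format!("negative length {}", n));
        }
        total += n;
    }
    Ok(total)
}

/// The unsigned version has no negative case to handle.
fn total_len_unsigned<T: UnsignedLen>(items: &[T]) -> usize {
    items.iter().map(|item| item.len_unsigned()).sum()
}

/// Element counts addressable with a 32-bit signed vs. unsigned index.
const MAX_ELEMS_I32: u64 = i32::MAX as u64 + 1; // 2^31
const MAX_ELEMS_U32: u64 = u32::MAX as u64 + 1; // 2^32

fn main() {
    let bufs = [Buf(vec![1, 2]), Buf(vec![3])];
    println!("{:?}", total_len_signed(&bufs));
    println!("{}", total_len_unsigned(&bufs));
    println!("{:?}", total_len_signed(&[Broken]));
    println!("{} vs {}", MAX_ELEMS_I32, MAX_ELEMS_U32);
}
```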
I think checking for negative values will also have a performance cost, so I’d like to propose an experiment:

1. Take every place in `LayoutTensor` and `NDBuffer` that currently accepts an unsigned integer or an `Indexer`, and convert it to `Int`.
2. Add negative-index handling to every public method of those types that takes an `Int` for indexing, including those that take `IndexList`s or similar.
3. Use Modular’s benchmark suite to measure the performance impact.

I looked into doing this myself, but I’m not familiar enough with that part of MAX, especially the internals of tiling, to do it properly.
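To make “negative index handling” concrete, here is a minimal Rust sketch. It assumes Python-style semantics, where `-1` means the last element. That is one plausible choice, not something `LayoutTensor` or `NDBuffer` currently does. `normalize`, `normalize_all`, and `check` are hypothetical helpers.

The signed path adds a sign test and an addition per dimension before the usual bounds check. This only shows the extra work per access and is no substitute for the proposed benchmark.

```rust
/// Hypothetical Python-style normalization for one dimension:
/// negative indices count back from the end, then the result is bounds-checked.
fn normalize(index: isize, len: usize) -> Option<usize> {
    let len = isize::try_from(len).ok()?;
    // Extra branch and add that the unsigned path does not need.
    let i = if index < 0 { index + len } else { index };
    if i >= 0 && i < len {
        Some(i as usize)
    } else {
        None
    }
}

/// Applies `normalize` to every dimension of a multi-dimensional index,
/// as a method taking an IndexList-like argument would have to.
fn normalize_all(indices: &[isize], shape: &[usize]) -> Option<Vec<usize>> {
    if indices.len() != shape.len() {
        return None;
    }
    indices
        .iter()
        .zip(shape)
        .map(|(&i, &d)| normalize(i, d))
        .collect()
}

/// The unsigned path: a single comparison per dimension.
fn check(index: usize, len: usize) -> Option<usize> {
    if index < len {
        Some(index)
    } else {
        None
    }
}

fn main() {
    println!("{:?}", normalize(-1, 4));
    println!("{:?}", normalize_all(&[-1, 0], &[2, 3]));
    println!("{:?}", check(3, 4));
}
```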