Agree on 1, 3 with no comments.
To +1 Nick's point, len(x)-1 returning some huge number or trapping is a pretty big non-starter for usability. Furthermore, len needs to work on the Lenable trait (or some other name, which is out of scope for this thread), so we need a consistent type across conforming collections. It needs to be signed.
On this point, I think that code which has overflow in this situation is likely going to be erroneous even with Int. If that len is being used for indexing, List()[-1] will raise a range error. Many other functions that you would want to pass a list length to, such as realloc (if resizing the list), don't have a reasonable behavior when handed -1.
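To make the failure mode concrete, here is a minimal sketch contrasting the two behaviors (the unsigned variant is hypothetical, assuming len returned UInt instead of Int):

```mojo
fn main():
    var xs = List[Int]()

    # Today, with a signed length: len(xs) - 1 is -1, and indexing an
    # empty list with it raises a clear range error.
    var last_signed: Int = len(xs) - 1

    # If len returned UInt instead (the hypothetical under discussion),
    # the same expression would wrap to a huge value or trap, depending
    # on the chosen overflow semantics:
    # var last_unsigned: UInt = UInt(len(xs)) - 1

    print(last_signed)
```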
What I see as the largest advantage of UInt for cases like this is that it lets the user turn on -fmake_my_code_slow_but_help_me_debug or whatever we decide to call it. The compiler can't automatically insert "did this go negative?" checks after every subtraction of Int, because there are many places where that behavior is exactly what the programmer wanted, but it can insert code to check for unsigned overflow and underflow, made more efficient on x86 and ARM using hardware support (RISC-V gets the asm version of the portable C code). We can do what Rust does and only enable overflow checks by default in debug mode, meaning that the default behavior for optimized Mojo is still wrapping, but debug mode becomes a place where you accept that code runs slowly in exchange for the compiler helping you find these bugs.
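A minimal sketch of the kind of check the compiler could insert for unsigned subtraction, using debug_assert as a stand-in for compiler-generated code (the exact codegen strategy is an assumption):

```mojo
fn sub_with_debug_check(a: UInt, b: UInt) -> UInt:
    # In debug builds this traps with a useful message at the exact
    # subtraction that went wrong; in optimized builds debug_assert
    # compiles away and the subtraction wraps as usual.
    debug_assert(a >= b, "unsigned underflow in subtraction")
    return a - b
```

No equivalent check can be inserted for Int, since a - b going negative is often the intended result.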
On 4 + 6, I think that having a type called Int may be the problem. We have generations of programmers trained to use that as the default data type for everything, even when it's arguably not appropriate, such as for a value between 0 and 100. Rust managed to completely avoid the problems you are concerned about, and I think that not having an int type is how Rust did it. If we take the way Rust handles things, and then use requires to make safe, widening casts implicit (potentially only non-sign-changing ones, to avoid ambiguity), we shouldn't have a lot of extra casts from devs using the datatype that fits the expected range of their data, devs can get help from the compiler in tracking down when those invariants are violated, and we can get rid of a lot of x >= 0 bounds checks in functions. The data showing datatype usage is my attempt to show that systems programmers mostly use unsigned values where sizes are concerned, and that using Int for sizes will add friction for people who want to use the type system to express invariants.
After reading more, I also have a new concern. Some code may simply stop working on 32-bit targets if the default integer type is not fixed-width (or decaying to fixed width, see below) or some form of arbitrary precision. Ex: var world_population = 8142000000 will compile on a 64-bit system but not a 32-bit one.
As pointed out above, Mojo already has a default integer type: Int. Using anything other than that will require casting, and that casting should have to pay for itself.
We could also make the default type something that has both a materialized value and a parameter value, and will implicitly cast to any integer type which can represent it.
```mojo
fn _value_fits_in_dtype[dtype: DType, value: IntLiteral]() -> Bool:
    return Scalar[dtype].MIN <= value and Scalar[dtype].MAX >= value


fn _minimum_dtype_for_value[value: IntLiteral]() -> DType:
    alias dtypes = [
        DType._uint1, DType._uint2, DType._uint4, DType.uint8, DType.uint16,
        DType.uint32, DType.uint64, DType.uint128, DType.uint256,
        DType.int8, DType.int16, DType.int32, DType.int64, DType.int128,
        DType.int256,
    ]

    @parameter
    for dtype in dtypes:
        if _value_fits_in_dtype[dtype, value]():
            return dtype
    constrained[
        UInt256.MAX >= value and Int256.MIN <= value, "Unrepresentable value"
    ]()
    return DType.bool  # make the compiler happy


struct MaterializedIntLiteral[value: IntLiteral]:
    alias BackingType = _minimum_dtype_for_value[value]()
    var backing: Scalar[Self.BackingType]

    fn __init__(out self):
        self.backing = value

    # Used in constructors for SIMD, UInt and Int that take MaterializedIntLiteral.
    fn into[dtype: DType](self) -> Scalar[dtype]:
        constrained[_value_fits_in_dtype[dtype, value](), "Unable to safely cast to dtype"]()
        return self.backing.cast[dtype]()
```
I think that having a type like this, which acts as an IntLiteral that is not arbitrary precision, is worth revisiting once we have restrict and can use implicit constructors to give good error messages if someone tries to put a value somewhere it doesn't fit. Doing this also means no casts when you have var x = 10 and try to pass it to something like String.find_ascii_char. I think that users may find this level of "compiler just does the right thing" desirable.
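For illustration, a hedged sketch of how the struct above might be used, assuming the compiler decays integer literals into MaterializedIntLiteral and that SIMD/Int/UInt grow constructors accepting it (both assumptions, not current behavior):

```mojo
fn demo():
    # 10 fits in every candidate dtype, so BackingType is the smallest one.
    var lit = MaterializedIntLiteral[10]()
    var as_u8 = lit.into[DType.uint8]()    # OK: 10 fits in uint8
    var as_i64 = lit.into[DType.int64]()   # OK: widening is always safe

    # This would fail the constrained[] check at compile time, with a
    # readable error instead of silent truncation, since 300 does not
    # fit in an int8:
    # var bad = MaterializedIntLiteral[300]().into[DType.int8]()
```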
I don't see a reason to /remove/ Indexer, but we shouldn't use Some[Indexer] everywhere as "the input integer type", for the reasons stated above.
I know you aren't a fan of syntactic sugar, but I think that this may be a good use-case for it. "Thing which implements these traits" is something that I think will come up a lot, and even if Some shows that we technically can do it in library code, perhaps we shouldn't, in the same way that we can technically implement structs in library code using StaticTuple but really shouldn't. Rust has found impl to be reasonably acceptable for static dispatch, with dyn for dynamic dispatch. The involvement of a keyword also signals that something different from a normal type bound is happening.
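To make the comparison concrete, a sketch of the two spellings side by side; the some keyword form is purely hypothetical here, chosen by analogy with Rust's impl Trait, and is not proposed syntax:

```mojo
# Today, with the library-level spelling:
fn describe(x: Some[Stringable]):
    print(String(x))

# Hypothetical keyword form, analogous to Rust's `impl Stringable`,
# where the keyword signals static dispatch over a trait bound:
# fn describe(x: some Stringable):
#     print(String(x))
```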
To reiterate, I donât consider this design point to be controversial.
In my opinion, using Int in places where there is no good behavior when the value is negative goes against the idea of making invalid states unrepresentable. It also has performance considerations, as every function that wants a user-provided index must now check that the value is not negative or risk UB/crashes. Right now, despite taking Int in many places, LayoutTensor considers values less than zero to be invalid in many cases, representing this with comments, which puts Mojo exactly back to where C++ is as far as type safety, or even worse, since many C++ libraries use size_t.
This combination of "less type correctness", plus a choice between less performance (double the comparison ops) or less memory safety (unchecked indexing in one direction), plus being harder to debug when things go wrong, makes this a supremely unattractive idea from my point of view. To me, given all of this, if all of the indexing in Mojo uses Int, that is not only a type correctness regression from Rust, it's a regression from C and C++, where libraries generally indicate that negative values are not allowed by using unsigned or size_t. Mojo code then gets to choose between being a memory safety regression or a performance regression vs Rust, depending on whether it checks for negative values. You say that checked operations are expensive on the GPU, but so is doubling the number of compares needed to uphold memory safety, since memory-safe signed indexing is going to stall the whole lane for an extra 2 operations (cmp + and) compared to unsigned indexing.
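As a concrete sketch of the cost difference (the exact stdlib spellings here, such as abort and the UInt/Int conversions, are assumptions for illustration):

```mojo
from os import abort

fn get_checked_signed(xs: List[Int], i: Int) -> Int:
    # Signed index: safety requires checking both directions,
    # so two compares plus a combine per access.
    if 0 <= i and i < len(xs):
        return xs[i]
    return abort[Int]("index out of range")

fn get_checked_unsigned(xs: List[Int], i: UInt) -> Int:
    # Unsigned index: negative values are unrepresentable,
    # so a single compare suffices.
    if i < UInt(len(xs)):
        return xs[Int(i)]
    return abort[Int]("index out of range")
```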
In my opinion, writing .cast[dtype]() or Int(len(arr)) once every ~70 lines of code is a much better option than forcing the whole ecosystem to face more than doubled overhead for memory safety.