`Indexer` and `Int` vs `UInt`

I think we would all find this conversation more productive if it were focused on evaluating alternative proposals for how var x = 0 should be handled.

So let’s do that.

Below is what I am calling proposal A. I have also written a proposal B later in the thread. @clattner I expect you would prefer proposal B.

Proposal A: The compiler infers the correct type for an integer variable based on how it is used.

First, let’s assume that all of Mojo’s integer types (including “Int”, i.e. ssize_t) are modelled as Scalar[DType.<blah>].

  • If you don’t know what this means, check out the stdlib.

Whenever the compiler encounters an integer literal in a position where a specific Scalar type cannot be immediately inferred, e.g. var x = 0 or var nums = [3, 4, 5], it will assume the literal is being converted to a Scalar whose DType is not yet known. The candidates are, e.g.:

  • Scalar[DType.int8]
  • Scalar[DType.int16]
  • Scalar[DType.index]
  • …etc.

Because all of these types are special cases of SIMD, when the compiler sees var x = 0, it already has enough information to resolve statements such as x += 1 to the right implementation of __iadd__, and so on. This keeps the subsequent inference algorithm small and simple.

All the compiler needs to do is figure out which DType values are compatible with the manner in which x is being used. Or said another way: it needs to filter out the DType values that would cause the program to fail to type check.

In many cases the filtering will happen naturally. For example, if x is used in the expression nums[x], and the implementation of __getitem__ requires a DType.index, then x is immediately resolved to a Scalar[DType.index].

As another example, given var c: Int64, if x is used in the expression x += c, then x must be of DType.int64 or larger, because those are the only alternatives where x += c type checks. (You can’t increment an Int8 by an Int64.)
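To make the filtering idea concrete, here is a rough model in Python (not Mojo, and not a claim about how the compiler would implement it). Each use of x is a predicate over candidate DTypes, and the compiler keeps only the candidates that every use accepts. The names IntDType, CANDIDATES, used_as_index, incremented_by_int64 and narrow are all made up for this sketch, index is modelled as 64-bit signed, and the rule for x += c is my reading of "int64 or larger".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntDType:
    name: str
    bits: int
    signed: bool


# Hypothetical candidate set. "index" is modelled as 64-bit signed here.
CANDIDATES = [
    IntDType("int8", 8, True), IntDType("uint8", 8, False),
    IntDType("int16", 16, True), IntDType("uint16", 16, False),
    IntDType("int32", 32, True), IntDType("uint32", 32, False),
    IntDType("int64", 64, True), IntDType("uint64", 64, False),
    IntDType("index", 64, True),
]


def used_as_index(d: IntDType) -> bool:
    # Models nums[x] where __getitem__ requires DType.index.
    return d.name == "index"


def incremented_by_int64(d: IntDType) -> bool:
    # Models x += c with c: Int64 (assumption: signed, fixed width, >= 64 bits).
    return d.signed and d.bits >= 64 and d.name != "index"


def narrow(candidates, uses):
    # Keep only the candidates that every use of the variable accepts.
    return [d for d in candidates if all(use(d) for use in uses)]


index_only = [d.name for d in narrow(CANDIDATES, [used_as_index])]
int64_only = [d.name for d in narrow(CANDIDATES, [incremented_by_int64])]
conflict = narrow(CANDIDATES, [used_as_index, incremented_by_int64])
```

In this model a variable that is both used as an index and incremented by an Int64 ends up with no candidates at all, which is the "programmer has made a mistake" case.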

Also very important: The compiler should filter out DTypes that are incapable of storing the literal! So if the number is 1000, the smallest valid DType is int16, and if the number is 8142000000 (the world’s population), the smallest valid DType is int64.
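The literal-range filter is easy to sketch. The Python below is only an illustration: it covers the fixed-width DTypes, leaves out index because its width depends on the platform, and the helper names (value_range, dtypes_that_fit, smallest_fit) are invented here. Ties between a signed and an unsigned type of the same width are broken in favour of the signed one, which is an assumption on my part.

```python
# name -> (bit width, is signed); fixed-width candidates only.
INT_DTYPES = {
    "int8": (8, True), "uint8": (8, False),
    "int16": (16, True), "uint16": (16, False),
    "int32": (32, True), "uint32": (32, False),
    "int64": (64, True), "uint64": (64, False),
}


def value_range(bits: int, signed: bool):
    # Inclusive (min, max) of a two's-complement or unsigned integer.
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def dtypes_that_fit(literal: int):
    # Every candidate whose range contains the literal.
    fits = []
    for name, (bits, signed) in INT_DTYPES.items():
        lo, hi = value_range(bits, signed)
        if lo <= literal <= hi:
            fits.append(name)
    return fits


def smallest_fit(literal: int):
    # Narrowest candidate; prefer signed on a tie (assumption).
    return min(
        dtypes_that_fit(literal),
        key=lambda n: (INT_DTYPES[n][0], not INT_DTYPES[n][1]),
        default=None,
    )
```

With this table, smallest_fit(1000) gives "int16" and smallest_fit(8142000000) gives "int64", matching the two examples above.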

Once all of the uses of x have been analyzed, we will be in one of the following situations:

  1. x has a unique DType, in which case we are done.
  2. x has several valid DTypes, in which case the compiler should default to DType.int64 if it’s valid, or otherwise the largest valid DType that is a signed integer. (The unsigned alternatives are prone to overflow so they are not good defaults.) We could also consider defaulting to DType.index, if it’s valid.
  3. x has no valid DType. This will only happen if the programmer has made a mistake. It’s worth thinking about what a good error message would be, but the simplest error message would be “Cannot infer an integer type: no integer type satisfies all uses of x.” If we wanted to be more helpful, we could list the operation(s) that caused DType.index and DType.int64 to be excluded as possibilities.
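The three outcomes can be written down as a small resolution function. This is a Python sketch under assumptions, not a spec: resolve and IntegerInferenceError are names I made up, DTypes are plain strings, and the optional "default to DType.index" variant is left out. The post doesn't say what should happen when several candidates remain and none of them is signed, so the sketch reports that as an ambiguity.

```python
class IntegerInferenceError(TypeError):
    """Raised when no integer DType can be chosen for a variable."""


# Signed fixed-width candidates and their bit widths.
SIGNED_BITS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def resolve(var_name: str, valid) -> str:
    valid = set(valid)
    # Case 3: nothing satisfies every use.
    if not valid:
        raise IntegerInferenceError(
            "Cannot infer an integer type: "
            f"no integer type satisfies all uses of {var_name}."
        )
    # Case 1: a unique DType remains.
    if len(valid) == 1:
        return next(iter(valid))
    # Case 2: several remain; prefer int64, else the largest signed one.
    if "int64" in valid:
        return "int64"
    signed = [d for d in valid if d in SIGNED_BITS]
    if signed:
        return max(signed, key=SIGNED_BITS.get)
    # Not covered by the proposal text (assumption: treat as an error).
    raise IntegerInferenceError(
        f"Cannot infer an integer type: the type of {var_name} is ambiguous."
    )
```

For example, resolve("x", {"int8", "int16", "uint16"}) picks "int16", because int64 was filtered out and int16 is the largest signed candidate left.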

I think this approach is very promising. In particular, this means that:

  • As long as __getitem__ is defined to take a DType.index, all integer variables used for indexing will be inferred as being 32 bits on 32-bit platforms, and 64 bits on 64-bit platforms. @clattner, I think this addresses one of your main concerns!
  • var world_population = 8142000000 would be inferred as DType.int64 (because its literal value requires >32 bits), and therefore the statement compiles on all platforms.
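Both consequences can be checked with a toy end-to-end model. Again this is Python standing in for the proposed behaviour, with made-up names (candidates, fits, infer) and one modelling assumption: index is a signed integer whose width equals the platform's pointer width, passed in as pointer_bits.

```python
def candidates(pointer_bits: int):
    # name -> (bit width, is signed); "index" follows the platform.
    table = {f"int{b}": (b, True) for b in (8, 16, 32, 64)}
    table.update({f"uint{b}": (b, False) for b in (8, 16, 32, 64)})
    table["index"] = (pointer_bits, True)
    return table


def fits(literal: int, bits: int, signed: bool) -> bool:
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    return lo <= literal <= hi


def infer(literal: int, pointer_bits: int, used_as_index: bool = False):
    table = candidates(pointer_bits)
    # Filter 1: the DType must be able to store the literal.
    valid = {n for n, (b, s) in table.items() if fits(literal, b, s)}
    # Filter 2: indexing requires DType.index.
    if used_as_index:
        valid &= {"index"}
    if not valid:
        return None  # no DType satisfies all uses
    # With only these two filters, more than one survivor implies int64
    # is among them, so the int64 default always applies.
    name = next(iter(valid)) if len(valid) == 1 else "int64"
    return name, table[name][0]


loop_index_32 = infer(0, pointer_bits=32, used_as_index=True)
loop_index_64 = infer(0, pointer_bits=64, used_as_index=True)
population_32 = infer(8142000000, pointer_bits=32)
population_64 = infer(8142000000, pointer_bits=64)
```

One thing the model makes visible: indexing with a variable initialised to 8142000000 has no valid DType on a 32-bit target, so infer returns None there and that program would be rejected on that platform.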

Finally, I’d like to emphasise that this proposed inference algorithm would be very fast and simple. This is not full-blown “bidirectional type checking” or anything like that. We are just filtering a handful of candidate DTypes per variable.

And the average Mojo user doesn’t need to think about any of this. They can write var x = 0, and everything will “just work”. When they hover over x, the type will typically be Int64 or Int (naming aside). And in the Mojo tutorial, we will just tell people that “Mojo chooses the right size for your integers”. (We will also explain what those sizes are, and how they are chosen.)

Next up: Check out Proposal B.

I expect some readers will dislike proposal A, owing to the fact that var x = 0 has a type that is decided by “spooky action at a distance”. If that’s you, you might prefer proposal B. Please check it out. :slight_smile: