`Indexer` and `Int` vs `UInt`

My experience is that this is not much of an issue in the real world. I say this as someone that moved from Python to Go.

For one thing, as a beginner in the new language your project rarely gets enough use to reach this limit before you’ve understood what’s going on. For another, you’re expected to learn some of the semantics of the new language.

I’m not a fan of designing a language so that it can never be misused. That is an impossible problem, and it makes a language more complicated than it should be. I also believe programmers should have more responsibility, especially when trying to learn system-level languages.

The issue manifests when you’re writing code that needs to be portable across platforms. For example, maybe you’re trying to write code that runs both on a 64-bit CPU and on a funky AI accelerator where each core has a 32-bit address space. I would anticipate that most of the Go community is writing code that runs on 64-bit CPUs. In such a world, the fact that int has a platform-dependent size wouldn’t be the source of many bugs, because int is “de facto” 64 bits.
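To make the portability concern concrete, here is a minimal sketch. It is written in Rust purely as a stand-in (not Go or Mojo), and it models the two platforms with fixed i32 and i64 types rather than a real platform-dependent int. The function names are made up for illustration.

```rust
// Hypothetical helper: buffer size in bytes, computed with a 32-bit signed
// integer, as it would be on a target where the default integer is 32 bits.
fn byte_size_32(elements: i32, element_size: i32) -> Option<i32> {
    elements.checked_mul(element_size) // None if the product overflows i32
}

// The same computation with a 64-bit signed integer.
fn byte_size_64(elements: i64, element_size: i64) -> Option<i64> {
    elements.checked_mul(element_size) // None if the product overflows i64
}

fn main() {
    // 2^30 elements of 8 bytes each need 2^33 bytes.
    let elements: i64 = 1 << 30;
    println!("64-bit: {:?}", byte_size_64(elements, 8));
    println!("32-bit: {:?}", byte_size_32(elements as i32, 8));
}
```

The same inputs fit comfortably in 64 bits and overflow in 32, which is the kind of bug that only shows up once the code is deployed on the narrower target.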

This is not a great argument IMO. If we were to apply that argument to C++, we’d be saying “C++ isn’t at fault; you just need to be more responsible with it”. Most people would disagree with that sentiment.

As a language designer you have a responsibility to help your users build bug-free applications, because software bugs have a significant economic cost. I’m not satisfied with telling programmers to “be more responsible”; I want to help them succeed.

The points highlighted by Nick are valid and important considerations. There is no point in limiting what a language can do because a programmer has to learn something new. I feel very strongly about this.

As an example, look at Rust - although there are a lot of difficult/new concepts, most people are willing to learn these concepts - look at how widely Rust (with all of its issues) is being used across the industry.

Have sensible defaults that feel familiar, but have specific types that allow advanced users to take full advantage of what the language is capable of.

The sensible defaults should be backed by correctness and soundness - any misconceptions a programmer has coming from another language should be caught by a compiler warning/error.

I strongly disagree with this.

The issue I have with Python, having worked with it for 6+ years professionally, is that you are allowed to shoot yourself in the foot, with issues (hopefully) raised at runtime. Then community tools have to work overtime to give programmers warnings at dev-time, because there is limited feedback the Python interpreter gives you about the correctness of your program compared to Rust’s compiler.

I am not saying Mojo should be Rust - what I am saying is that like Rust, Mojo should have a compiler that is able to warn users of issues, and teach users what the safe approach is in a given scenario - of course this takes time to build.

While I disagree on the names, I think that you are right, the name Int has 40 years of legacy attached to it and for far too many people int is 32 bits, even if Mojo is technically correct as far as what int is supposed to be. I would probably take isize and usize from Rust, since they look just weird enough that people coming from high level languages will stop and think, but familiar enough to people coming from systems programming languages that it’s not a real stumbling block to them.
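For anyone who hasn't used them, here is a small sketch of how Rust's pointer-sized types behave. The helper name `index_width_bits` is made up for illustration; everything else is plain standard library.

```rust
use std::mem::size_of;

// Hypothetical helper: width of the platform's index type, in bits.
fn index_width_bits() -> u32 {
    usize::BITS
}

fn main() {
    // usize and isize are pointer-sized, so their width follows the target.
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());
    assert_eq!(size_of::<isize>(), size_of::<usize>());

    // Fixed-width types are the same everywhere.
    assert_eq!(size_of::<i64>(), 8);

    // Slices are indexed with usize; indexing with an i32 is a compile error.
    let xs = [10, 20, 30];
    let i: usize = 1;
    println!("xs[{}] = {}", i, xs[i]);
    println!("usize is {} bits on this target", index_width_bits());
}
```

The names make it hard to mistake usize for a general-purpose integer, which is the property being argued for here.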

We have 40+ years of “int-defaultism” and I think that’s what causes the issues in Swift. If we avoid having something called Int, we avoid that problem entirely. In the manual, we can tell people to use Int64 if they are unsure of what type to use and it’s not related to indexing.

“Just program better” is how we got numerous huge vulnerabilities. I believe the general consensus is that a skilled developer can solo develop a project and have it be fine, but as soon as you add another person things start to go sideways. On a corporate team of 50 people, with junior developers younger than the multi-million-line codebase and its 20 years of mixed coding styles, “nearly impossible to misuse” is one of the most valuable things a library or API can be, right after “functioning correctly”.

There is nothing more frustrating than an insufficiently documented API where every time you use it wrong your PC crashes or you generate “magic smoke” out of a $1000 PCIe card.

Strong agree on compile-time correctness.

I’d rather take twice as long to write it than get a call at 2am saying that everything has broken because of some silent bug.

I was responding to your website example, but if you’re writing code where such portability across platform bitwidth is important, then you would take care to factor platform portability into your code rather than depending on whatever default we go with here. In my experience, no solution ever covers 100% of every use case.

C++ is a mess, and most of its problems come from bad design and its C roots. Plenty of newer languages have tried to fix things, but none have caught on as a real replacement—not even Java, which has a different purpose entirely. The industry just keeps pouring money into C++ to make it better without giving up any of its control. In fact, my impression of C++ is that the responsibility and flexibility it gives to programmers is the only reason many continue to tolerate the language.

As a language designer, you have no such responsibility; any pursuit in that direction is likely to result in a convoluted mess that nobody is going to use for any real-world project. It is impossible to write bug-free code in any reasonable scale, and most bugs result from logic rather than any inherent flaw in a language.

Strong disagree.

The point of language design is not to stop programmers from making logic-mistakes. Logic mistakes/bugs will happen in every language.

The point of language design is to help programmers create correct and sound software. The point is to prevent bugs that are caused by soundness/correctness issues.

Note that CISA (America’s Cyber Defence Agency) has published a report stating that the development of new products in service of critical infrastructure should not use C++.

I agree. I must admit that I don’t understand where the talk of limiting the language is coming from.

That is fine.

First, there is no way for Python to achieve Guido’s initial goal of being a glue scripting language without the dynamism; language semantics should fall out of the goals it is trying to achieve more than anything else.

Secondly, I have not seen a useful language that does not allow programmers to shoot themselves in the foot, and my list includes Rust. You could say that Rust shifted the paradigm by making it obvious and explicit where exactly you can potentially shoot yourself, but control and flexibility are non-negotiables in certain kinds of applications.

I have to say that I don’t understand some of the responses to my comment. Nobody is arguing against safety, soundness, and correctness. However, my experience using Go, where int is pervasive, is that it is fine in the vast majority of places.

Secondly, I don’t understand the argument against giving programmers responsibility - how is this an argument? Compilers can help with some things, but users have to take responsibility for understanding how the systems they’re programming work. This is not controversial.

The issue manifests when you’re writing code that needs to be portable across platforms. For example, maybe you’re trying to write code that runs both on a 64-bit CPU and on a funky AI accelerator where each core has a 32-bit address space. I would anticipate that most of the Go community is writing code that runs on 64-bit CPUs. In such a world, the fact that int has a platform-dependent size wouldn’t be the source of many bugs, because int is “de facto” 64 bits.

I was responding to your website example, but if you’re writing code where such portability across platform bitwidth is important, then you would take care to factor platform portability into your code rather than depending on whatever default we go with here. In my experience, no solution ever covers 100% of every use case.

Or, we can make the default indexing type one that properly handles platform differences, and you only use that type for indexing things or storing the size of things instead of as a general purpose type.

This is not a great argument IMO. If we were to apply that argument to C++, we’d be saying “C++ isn’t at fault; you just need to be more responsible with it”. Most people would disagree with that sentiment.

C++ is a mess, and most of its problems come from bad design and its C roots. Plenty of newer languages have tried to fix things, but none have caught on as a real replacement—not even Java, which has a different purpose entirely. The industry just keeps pouring money into C++ to make it better without giving up any of its control. In fact, my impression of C++ is that the responsibility and flexibility it gives to programmers is the only reason many continue to tolerate the language.

The most common reason I’ve heard for still using C++ is “because we have no other option”. Mojo should try to be that option for as many cases as possible. Rust missed a few important things that some developers really, really need, so they’re still stuck on C++. The piles of heavily optimized SIMD libraries are another reason, but in Mojo every library can be an optimized SIMD library.

As a language designer you have a responsibility to help your users build bug-free applications, because software bugs have a significant economic cost. I’m not satisfied with telling programmers to “be more responsible”; I want to help them succeed.

As a language designer, you have no such responsibility; any pursuit in that direction is likely to result in a convoluted mess that nobody is going to use for any real-world project. It is impossible to write bug-free code in any reasonable scale, and most bugs result from logic rather than any inherent flaw in a language.

Rust seems to see a decent amount of use in such niche and academic projects as the Windows Kernel, Linux, Android, Chromium, and Firefox, and Epic Games is in the middle of building tons of stuff on top of Verse, which makes it impossible to have a lot of kinds of distributed system errors, for making multiplayer user-created content run well. Systems languages are, in my opinion, more bound to this than normal languages because systems languages don’t get to ignore that code can kill people, destroy their lives, or destroy companies.

Yes, logic bugs make up many of the bugs, and they are nearly impossible to prevent. However, there’s a reason that Tony Hoare calls null his “billion-dollar mistake”: it added a hidden extra state to every value in so many languages, which developers would frequently forget to check, and he estimates that the damage caused by his decision to add a nice little feature because it was convenient is around a billion USD. How much do you think forgetting to lock mutexes has cost humanity? Race conditions between threads? Implicit integer casts? Array to pointer decay?

Designing a programming language means that any decision you make can have a very wide reaching impact if the language is successful, and since we have the knowledge of how to prevent bugs, I think it’s irresponsible to not use it.

First, there is no way for Python to achieve Guido’s initial goal of being a glue scripting language without the dynamism; language semantics should fall out of the goals it is trying to achieve more than anything else.

Sure there is, use HM or another type inference algorithm. Luma doesn’t even need you to specify struct fields for type safety.

Secondly, I have not seen a useful language that does not allow programmers to shoot themselves in the foot, and my list includes Rust. You could say that Rust shifted the paradigm by making it obvious and explicit where exactly you can potentially shoot yourself, but control and flexibility are non-negotiables in certain kinds of applications.

Yes, they are non-negotiable. The goal is to make the control and flexibility as safe as you can make it, so that you the human only have to consider a small slice of code when writing a safe abstraction. We’ve shown time and time again that a single person working on a small piece of code can produce safe low-level code if they are very careful; the problems tend to start when the second person joins the project. By allowing for safe abstractions, we can make APIs which are impossible to use in an unsafe manner and reduce the amount of code in the “danger zone”. Mojo, with its more advanced type system, can shrink or eliminate some of those regions.

I have to say that I don’t understand some of the responses to my comment. Nobody is arguing against safety, soundness, and correctness. However, my experience using Go, where int is pervasive, is that it is fine in the vast majority of places.

Go also has lots of “this value must be >= 0” doc comments you have to read. Those can and should be encoded in the type system.
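As a rough illustration of what "encoded in the type system" could look like, here is a sketch in Rust (a stand-in, not Go or Mojo). `NonNegative` and `repeat_char` are hypothetical names invented for this example.

```rust
/// Hypothetical wrapper: an i64 that is guaranteed to be >= 0.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NonNegative(i64);

impl NonNegative {
    /// The only way to build a value, so the invariant is checked exactly once.
    fn new(value: i64) -> Option<NonNegative> {
        if value >= 0 {
            Some(NonNegative(value))
        } else {
            None
        }
    }

    fn get(self) -> i64 {
        self.0
    }
}

// The precondition lives in the signature instead of in a doc comment.
fn repeat_char(c: char, count: NonNegative) -> String {
    let mut out = String::new();
    for _ in 0..count.get() {
        out.push(c);
    }
    out
}

fn main() {
    let three = NonNegative::new(3).expect("3 is non-negative");
    println!("{}", repeat_char('a', three));
    println!("{:?}", NonNegative::new(-1));
}
```

A caller can no longer pass a negative count by accident; the only place a negative value can show up is at construction, where it has to be handled.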

Secondly, I don’t understand the argument against giving programmers responsibility - how is this an argument? Compilers can help with some things, but users have to take responsibility for understanding how the systems they’re programming work. This is not controversial.

Software developers have, as a group, proven that they can’t handle the responsibility, especially at the scale of modern programs. Systems-level developers or otherwise.

:100:. This is a prerequisite for writing correct platform-independent code. And Modular’s mission is to enable code to run across CPUs, GPUs, and other upcoming accelerators, so I expect they will take this seriously.

I’m trying not to bikeshed too much, but if we wanted a name along those lines, Mojo’s terminology (inherited from Python) seems to be converging on “length” rather than “size”, so we should consider whether Len makes sense. (With a ULen option, for those that need it.)

var total: Len = len(x) + len(y)

var i: Len = 0

There are a few other alternatives to Len/Size we could consider, e.g. Count or Idx.

I’m not sure whether I would prefer one of those alternatives to a suffixed Int. But they’re all better than the status quo IMO. We need a name that people won’t mistakenly interpret as the “default” or “standard” integer type. Each type serves a separate purpose, and if you choose the wrong type you’re going to end up with code that breaks when it’s deployed on certain platforms/accelerators. (As I showed earlier.)

Mojo does have a default integer type - it is the type IntLiteral materializes to.

var x = 8  # x has the default integer type

True—integer literals default to something. What I meant was, there is no integer type that the author of a struct should choose “by default” when they want to store an integer quantity. The correct type depends on the use case. I’ve edited my post to clarify.

They would prefer the “IntLiteral materialization type” because:

  1. That is the default for Mojo’s IntLiteral. It seems to be preferred by Mojo for some reason, so it is probably a good default.
  2. To avoid explicit casting back and forth between their type and the default. (Both directions are important: accepting the type and returning it.)

They would be making a mistake. I already presented an example of where choosing Int for everything will cause your app or library to malfunction on 32-bit platforms/accelerators.

You know, maybe we’re not even looking at this problem the right way. Maybe we don’t need Int at all. Maybe the size and capacity of List (and therefore its len()) should default to Int64, and so should literals, e.g. x = 4. Then, if you’re working in a domain where you know that your container is (significantly) smaller than 4GB, you can write List[cap=Int32], which causes the list’s internal size and capacity to be stored using Int32, and likewise, len() would return Int32. This is valuable even on 64-bit platforms, because it reduces the memory consumption of both the list and indices into it. All of the container types in the stdlib would be configurable like this. We could even define aliases like List32, if we think that would be helpful.
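Here is a rough sketch of what a container parametrized over its length type might look like. It is written in Rust as a stand-in, not Mojo, and `SizedList` is a hypothetical type. It wraps a Vec internally, so it only models the API surface (len() returning the chosen type, push failing when the length no longer fits) and not the memory savings.

```rust
use std::convert::TryFrom;

// Hypothetical container whose length is reported in a caller-chosen type `I`.
struct SizedList<T, I> {
    items: Vec<T>,
    len: I,
}

impl<T, I> SizedList<T, I>
where
    I: Copy + TryFrom<usize>,
{
    fn new() -> Option<Self> {
        // Zero fits in every integer type, so this only fails for exotic `I`.
        let zero = I::try_from(0usize).ok()?;
        Some(SizedList { items: Vec::new(), len: zero })
    }

    // Hands the item back instead of wrapping when the new length does not fit in `I`.
    fn push(&mut self, item: T) -> Result<(), T> {
        match I::try_from(self.items.len() + 1) {
            Ok(new_len) => {
                self.items.push(item);
                self.len = new_len;
                Ok(())
            }
            Err(_) => Err(item),
        }
    }

    fn len(&self) -> I {
        self.len
    }
}

fn main() {
    // A u8 length keeps the example small: at most 255 elements.
    let mut small: SizedList<u32, u8> = SizedList::new().unwrap();
    for n in 0..255u32 {
        assert!(small.push(n).is_ok());
    }
    assert!(small.push(255).is_err());
    let n: u8 = small.len();
    println!("len = {}", n);
}
```

One cost is already visible: every function that takes such a list has to be generic over the length type, or pick one and convert.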

We could probably tie this design into the design for custom allocators, so that when you define a function that is generic over an allocator, it’s also generic over the list’s maximum capacity etc.

IMO, we should give this serious thought! This could simplify Mojo a lot, by saving users from needing to understand the subtleties of Int (whatever its name is). We probably still need an equivalent of size_t to define memory allocators etc, but the average Mojo user wouldn’t need to know about this type.

Of course, there are probably reasons why eliminating Int is a bad idea, but we won’t know until we explore this further. I doubt anyone has given this idea much thought before. Languages like C and Go were designed without generics in mind, so this wasn’t an option they had available.

It will probably only take 10 minutes to find an example of why this is a bad idea. Any takers? :slight_smile:

I feel like this discussion is looping, but we’re not making new points. To recap a few things:

  1. To some of the points above, we need logic that runs on CPUs, GPUs, and other accelerators. We can’t use BigInt as the “default type” because of efficiency, but also because it requires malloc, which isn’t practical on accelerators.

  2. To +1 Nick’s point, len(x)-1 returning some huge number or trapping is a pretty big non-starter for usability. Furthermore, len needs to work on the Lenable trait (or some other name, which is out of scope for this thread), so we need a consistent type across conforming collections. It needs to be signed.

  3. To -1 a point above, we can’t just use Int64. This isn’t acceptable on 32-bit targets, e.g. some GPUs, because it doubles register use and wastes compute - not to mention embedded systems.

  4. There are comments upthread about Mojo already requiring tons of casting all over the place, but this is the problem we’re seeking to improve, not the ultimate destination.

  5. As pointed out above, Mojo already has a default integer type: it is Int. Using things other than that will require casting, and that casting should have to pay for itself.

  6. I’ll reiterate that Swift faced pretty much exactly the same set of problems and ended up with a good solution that led to consistent large scale system design. I haven’t heard anyone point out problems with its design in this area, and I don’t see any reason that UInt is better than Int in any of these discussions other than “C++ programmer tradition”.

  7. I agree BigInt/Nat types are useful, Mojo should totally have one, but that doesn’t really affect this sort of thing. Similarly, I don’t see a reason to /remove/ Indexer, but we shouldn’t use Some[Indexer] everywhere as “the input integer type” for reasons stated above.

-Chris
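For reference, the len(x)-1 problem from point 2 can be reproduced in a few lines. This is a Rust sketch standing in for Mojo, and the two helper names are made up for illustration.

```rust
// Hypothetical helper: "last index" computed from an unsigned length.
// wrapping_sub makes the wrap-around explicit; a plain `len - 1` would
// panic in a debug build when len is 0.
fn last_index_unsigned(len: usize) -> usize {
    len.wrapping_sub(1)
}

// Hypothetical helper: the same computation with a signed length.
fn last_index_signed(len: isize) -> isize {
    len - 1
}

fn main() {
    let empty: Vec<i32> = Vec::new();

    // Unsigned: an empty list gives a huge number...
    assert_eq!(last_index_unsigned(empty.len()), usize::MAX);
    // ...unless every call site remembers to check.
    assert_eq!(empty.len().checked_sub(1), None);

    // Signed: an empty list gives -1, so `i <= last` style loops just don't run.
    assert_eq!(last_index_signed(empty.len() as isize), -1);
}
```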

I didn’t say this :confused:. I only said we should encourage people to think about the integer type that they’re using. (And if they decide the number can be arbitrarily big, they should choose BigInt. But that would be very rare—I don’t think I was clear about that.)

I also understand this. I am moreso wondering whether there is a different way we can abstract over the size of certain integers, other than the 1950s approach of making the size of an Int implementation-defined. It’s possible that this is still the best approach in 2025, but maybe it’s not. I don’t believe systems languages have ever reconsidered this assumption. Maybe if we step back, we can find another alternative?

Agree. But how do we decide when to use Some[Indexer] and when not to?

I like the idea mentioned in a previous post of providing an alias to simplify the parameter even more if we stick to the current design:

alias Index = Some[Indexer]

@Nick’s idea of parametrizing collections over the Indexer type is rather close to what we have now, with the Indexer type as a parameter in the function signature.

The advantage of the current design is that the collection itself is not parametrized over the Indexer, so there is no need to convert a collection to another type just to use a different index type.
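As a rough analogue of the function-level approach, here is a Rust sketch (a stand-in, not Mojo's actual Indexer machinery). `get_at` is a hypothetical accessor that accepts any integer type convertible to usize, while the collection itself stays a plain slice.

```rust
use std::convert::TryInto;

// Hypothetical accessor: generic over the index type, not the collection.
// Negative or oversized indices fail the conversion and yield None.
fn get_at<T, I: TryInto<usize>>(items: &[T], index: I) -> Option<&T> {
    let i: usize = index.try_into().ok()?;
    items.get(i)
}

fn main() {
    let xs = [10, 20, 30];
    println!("{:?}", get_at(&xs, 1u8));
    println!("{:?}", get_at(&xs, 2i32));
    println!("{:?}", get_at(&xs, -1i64));
}
```

The same slice works with u8, i32 and i64 indices without any conversion of the collection, but anything that returns an index still has to commit to one concrete type.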

The advantage of @Nick’s idea is that functions that return an index (like len()) or store an index (like capacity) are parametrized too, so there is no lock-in to a specific default indexer.

struct List[T: …, I: Indexer = Int]