The case for explicit `read` variable bindings

With the introduction of `ref` variable bindings (see the variable bindings proposal), we gained two powerful new features concerning immutable references:

  • The ability to implicitly declare (deeply) immutable values
  • Binding variables to immutable references

I think it would be a useful enhancement to allow explicit `read` value bindings to improve code readability and ease adoption.

Take this example:

import time

fn as_read[T: AnyType](read x: T) -> ref [ImmutableAnyOrigin] T:
    """Helper function to return a value as an immutable reference."""
    return x


fn print_elapsed_time(read value: Int):
    print("elapsed time =", time.perf_counter_ns() - value, "ns")


fn main() raises:
    # immutable value
    ref start_time = as_read(time.perf_counter_ns())
    # start_time *= 2  # error: start_time is immutable

    # deeply immutable value
    ref list = as_read[List[List[Int]]]([[0, 1], [2, 3]])

    # immutable reference binding
    ref list_val = list[0][0]
    # list_val += 1  # error: list_val is immutable
    print(list_val)  # ok

    # sub_list is an immutable reference binding
    for sub_list in list:
        # sub_list[0] += 1  # error: sub_list is immutable
        print(sub_list[0])  # ok

    print_elapsed_time(start_time)  # ok

By introducing explicit `read` value bindings, the code becomes self-explanatory and the `as_read()` helper function is no longer needed:

import time

fn print_elapsed_time(read value: Int):
    print("elapsed time =", time.perf_counter_ns() - value, "ns")


fn main() raises:
    read start_time = time.perf_counter_ns()
    # start_time *= 2  # error: start_time is immutable

    read list = [[0, 1], [2, 3]]

    read list_val = list[0][0]
    # list_val += 1  # error: list_val is immutable
    print(list_val)  # ok

    for read sub_list in list:
        # sub_list[0] += 1  # error: sub_list is immutable
        print(sub_list[0])  # ok

    print_elapsed_time(start_time)  # ok

In a sense this is just “syntactic sugar” for already-available functionality, and it could be introduced at any time without breaking changes. Nevertheless, I think it would be a nice enhancement and would fit well with the variable bindings proposal and the current state of the ownership model and argument conventions.

4 Likes

Strongly agree with this :slight_smile:
It improves readability immensely!

Being able to be explicit when needed is also part of expressivity.

1 Like

Strong +1 from me as well. I would even like `mut` to be explicitly settable too. I find `ref` should be reserved for cases where the mutability needs to be inferred, as in generic functions. I think being explicit about the intent is important.

4 Likes

I think it’s also worthwhile to consider people jumping between languages. It’s much easier to interpret `read`/`mut` than to remember what the default is or what the inference rules are. However, I think that we should keep `read` and `mut` for references. I’d prefer a `const` or similar for owned-but-immutable values so that we can keep that consistency.
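To illustrate the distinction being proposed (a hypothetical sketch — neither explicit `read` bindings nor a `const` binding exist in Mojo today):

```mojo
fn main():
    var nested = [[0, 1], [2, 3]]

    # hypothetical: immutable reference into existing storage
    read first = nested[0]
    # first.append(4)  # error: first is an immutable reference

    # hypothetical: owned, but immutable, value
    const limits = [10, 20]
    # limits.append(30)  # error: limits is immutable
    # unlike `read`, limits owns its storage and is destroyed here
```

The point of the separate spellings is that `read` would always mean “borrowed, immutable,” while `const` would always mean “owned, immutable.”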

5 Likes

Allowing explicit `mut` could resolve this inconsistency:

def my_append(my_list: List[Int]):
    my_list.append(0)
    # error: arguments are read by default; need mut (or var) instead

def my_append(mut my_list: List[Int]):
    my_list.append(0)  # now it works

def my_append(ref my_list: List[Int]):
    my_list.append(0)  # error: my_list could be immutable, since its mutability is parametric


# whereas in for loops:
def f():
    for my_list in nested:
        my_list.append(0)  # error: use ref (or var) instead of mut (or owned)?

    for ref my_list in nested:
        my_list.append(0)  # now it works, and the ref is mutable

I think if we want to unify argument conventions and variable conventions, we could use:

| Convention | Read | Mutate | Move | Delete |
| --- | --- | --- | --- | --- |
| `own` | :white_check_mark: | :white_check_mark: | :white_check_mark: | :red_question_mark: linear |
| `ref` | :white_check_mark: | :red_question_mark: inferred | :cross_mark: | :cross_mark: |
| `mut` | :white_check_mark: | :white_check_mark: | :cross_mark: | :cross_mark: |
| `read` | :white_check_mark: | :cross_mark: | :cross_mark: | :cross_mark: |

I think `var` comes from the word vary/variable, so it can easily be confused with `mut`, which means mutate: both convey the concept of change. Using `own` instead of `var` would avoid this confusion.

Both a mutable reference and an owned variable are mutable/changeable; the difference is ownership, which the `own` keyword makes obvious, unlike `var`.

The case for adding both `read` (immutable reference) and `const` (immutable owned) is to ensure you received a reference and not an owned value, so you would get an error otherwise. But I’m not convinced of how common that is, or whether it justifies the complexity of adding yet another convention where Python has none.
Using `read` for both, as in for loops, addresses the immutability concerns.
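Under that unified scheme, the rows of the table above might translate into function signatures like this (a hypothetical sketch — `own` is not a Mojo keyword today, and `var` remains the current spelling for the owning convention):

```mojo
fn consume(own s: String):  # read, mutate, and move: the callee owns s
    print(s)

fn grow(mut s: String):     # read and mutate through the reference
    s += "!"

fn show(read s: String):    # read only
    print(s)

fn forward(ref s: String):  # mutability inferred from the argument's origin
    print(s)
```

Each convention would then mean exactly the same thing whether it appears in a signature or in a local binding.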

Related: changing the default to read everywhere

2 Likes

A tiny module that provides reference bindings with explicit mutability in Mojo.

2 Likes

Hi Christoph,

Thank you for raising this, I’m sorry for the delay responding to this - tons of other things going on.

Your proposal is well thought out and presented. I see two concerns about it:

  1. `read` and `mut` are currently “soft” keywords, not hard keywords. Adding support for explicit `read` bindings would prevent using them for other purposes; e.g., it would break your explicit ref package.

  2. More significantly, it adds more conceptual complexity to the language. You’re right that it adds more power and control (which many people would appreciate), but it also increases the size and scope of what needs to be taught.

Channeling my history with Swift, I’m concerned about #2. In contrast to Swift, C++, and Rust, the Go language community has done a pretty good (and intentional!) job of holding back language complexity, even when adding more would be useful. For example, see this recent great blog post about it.

I don’t know how Mojo will make these sorts of power-vs-complexity tradeoff decisions in the fullness of time, but I would really like to “late bind” them. Let’s get more experience at scale with a smaller language before we go and fill in all the theoretical bells and whistles that could be syntactically nice. I’d like the Mojo team to focus on building out the power of the generics system (e.g. requirements, extensions, etc.), which are much more fundamental “big rocks” that need to be in place.

-Chris

5 Likes

Hi Chris,

Thank you for your detailed answer and reasoning, which I appreciate very much.

I totally agree that there are more important and fundamental features to be introduced than two more variable binding keywords.

Concerning the complexity and teachability of the language, I agree that fewer concepts are a plus.

Nevertheless, I still think that having a uniform set of keywords for variable bindings and argument conventions (`read`, `mut`, `ref`, `var`) would make Mojo less conceptually complex. But I agree that this has no priority right now.

-Christoph

PS: I enjoyed reading the post on “Go as an 80/20 language”. Thank you for the link.

3 Likes

I agree. My thinking on this is: if 80% of the language can be kept simple at the cost of experts working at the extremes writing more code, then that is a good price.

Go has one problem with oversimplicity: the fact that it prevents experts from achieving certain things even if they’re willing to write more code, an issue Mojo doesn’t have.

I think there are a lot of places where giving that flexibility to experts will require making the common path more complex. For example, allocations must have the ability to fail; otherwise databases and other kinds of applications where a crash has a large blast radius aren’t viable in Mojo.

Go’s approach to function coloring, the paint bucket method, has performance costs if you don’t want async in a particular area.

Movable exists because of self-referential structures, something that many languages simply don’t have.

There is an inherent complexity in having advanced features available, and I think it’s better to put energy into making sure the language is internally consistent so that the rules make sense, and then into making educational materials so people can learn. There are some features, like async cancellation, where I don’t see a way around needing a bunch of extra code if you want to use them. There is also the issue of libraries which don’t suit expert needs. You can very quickly fragment the ecosystem if popular libraries take shortcuts that render them unusable for some set of use cases. For instance, Tokio needing a heap allocator has forced an entire alternative async ecosystem to exist for embedded Rust.

I’d personally rather have a bit of a learning curve on the language if it means that once you’re over that curve it’s nice to use. Rust is an example of this where once you learn the borrow checker and the type system it’s quite nice to use even for things that don’t have strict performance requirements.

2 Likes

For some things here, I agree; for others, I don’t necessarily agree.

For example, what do you mean by “allocations must have the ability to fail”? Or rather, what do you expect should happen when an allocation fails? Should the allocator raise an exception? Return an error code? Return a null pointer?

Secondly, I don’t think it is an issue to have two async runtimes targeting different use cases. I think a problem arises when code written for one runtime doesn’t work on another; in that case, it is an API design failure. Whether Tokio uses a heap allocator or not should be an implementation detail that isn’t reflected in its API; async code shouldn’t care about runtimes. For example, in Python, I can switch async runtimes from Asyncio to Uvloop without changing any existing code.

Finally, while I agree that some problems are inherently complex and require complex libraries, I do not think the core language itself has to be complicated. I’m in the camp of “push the complexity into libraries”; everyone has to learn the core language, but not everyone needs to learn some libraries.

Hi,

I hope everyone is doing well.

I believe the core idea is to make the language more consistent, by aligning value-declaration semantics with function-signature semantics.

I think the language needs to be consistent even between application code and library code, meaning that a library-user should be able to look at library-code and still be able to reasonably understand how things are implemented.

I believe (reasonably) advanced features should be available as part of the core language, so that:

  1. library authors don’t need to jump through strange hoops to get complex work done.
  2. library users can explore a library’s implementation details without encountering library-specific dialects of the language that are needed to accomplish complex work. This would also allow more people to contribute to libraries, growing the community.

Strong agree.

It all comes down to consistency, in my opinion.

That being said, I agree with, and respect that, Modular’s priorities are rightly elsewhere at the moment, with regards to variable bindings.

Thanks,
Monté.

1 Like

I think that, given how Mojo is currently doing error handling, raising is the most sensible option.

Secondly, I don’t think it is an issue to have two async runtimes targeting different use cases. I think a problem arises when code written for one runtime doesn’t work on another; in that case, it is an API design failure. Whether Tokio uses a heap allocator or not should be an implementation detail that isn’t reflected in its API; async code shouldn’t care about runtimes. For example, in Python, I can switch async runtimes from Asyncio to Uvloop without changing any existing code.

The inability to use libraries portably is part of the problem I’m trying to solve. That’s part of why I had to throw out familiar APIs older than I am when I wrote my IO Engine proposal. However, the mechanism by which we create those abstractions does increase the complexity even if you’re using them in a way the old abstractions could handle.

Finally, while I agree that some problems are inherently complex and require complex libraries, I do not think the core language itself has to be complicated. I’m in the camp of “push the complexity into libraries”; everyone has to learn the core language, but not everyone needs to learn some libraries.

I think that the language itself has to have the capabilities to support that. The less stuff you can make “compiler magic”, the more complex the language itself has to be. Golang makes more or less all of async IO into “compiler magic”, at the cost of making it impossible to use the language for many use cases or even to “roll your own IO”. I want the ability to push stuff out into libraries as well, but that means we need to aggressively cut down how much “compiler magic” exists in Mojo. For instance, right now any interaction with the async runtime is magic. If we undo that and expose how complex async runtimes really are, the complexity of the language goes up.

As for making the stdlib complex, I think it has to be, because the stdlib is the place to host universal abstractions. It defines the protocols by which other libraries communicate. When you ask for a Stringable, that’s the stdlib providing a portable abstraction. The stdlib also has to make great efforts to not be wrong, because it is the least flexible part of the ecosystem. If the stdlib ships a bad abstraction and people use it, that bad abstraction is forever part of the language. This is why my IO Engines proposal looks like such a gigantic mess of an API: it is designed to limit the blast radius of both mistakes on the part of stdlib contributors and the inevitable march of technology. C++ tried to have one simple abstraction in the form of streams, and it has been a massive mess.

I firmly believe that languages have not only incidental complexity, such as some messy APIs or Golang’s `if err != nil {`, but also inherent complexity, like “memory is a limited resource” and “not all IO is POSIX API shaped”. Incidental complexity can be freely disposed of, but inherent complexity is dangerous to try to hide. When you hide that inherent complexity, you need to either have a way to “pull back the curtain”, so to speak, or you will make your language unusable for some subset of users. My opinion is that software development could actually be much easier if our tools would stop lying to us about how complex things are and actually told us what could go wrong. I would rather learn about a failure mode while reading the documentation for a library than via a late-night phone call. Making users think about whether they care about certain failure modes will mean that, when things do go wrong, they know where to look.

2 Likes

Agree on all points. And yes, for some things parts of the language may have to be more complicated. I’m all about thinking very hard about:

  1. What complexity is introduced?
  2. How is this complexity factored?

To me it’s less about hiding complex things and more about showing it only to people who should care about it.

The problem is that if you make it invisible until you use it, people won’t accommodate it in libraries; that’s a recipe for massive ecosystem fragmentation. I don’t want “systems programmer Mojo libraries” and “web developer Mojo libraries” to be two separate groups. I don’t think that learning is a bad thing, and as long as we have sensible defaults, most people should be fine.

1 Like

The consistency between variable bindings and argument conventions just improved with the introduction of the var argument convention. Thanks, Chris!

More consistency means less (mental) complexity.

I think this is great too;
it is a massive move forward.
There was some mental baggage with owned values :smiley:

Now this pattern reads nicer:

fn f(var n: Int):
  n += 1
  ...

I’ve always found `owned` a bit unprincipled to the eye. Now it feels more like syntactic sugar for `ref n; var n = n` (not exactly, of course).
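For comparison, a minimal sketch of the old and new spellings side by side (assuming the `var` argument convention behaves like the former `owned` convention):

```mojo
# old spelling: `owned` argument convention
fn f_old(owned n: Int):
    n += 1  # mutating the owned copy is fine

# new spelling: `var` argument convention, consistent with `var` local bindings
fn f_new(var n: Int):
    n += 1
```

Same semantics, but the signature now uses the same keyword a reader already knows from local variable declarations.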