Variable Bindings proposal discussion

Maybe defaulting to ref (inferred mutability) would be a good option:

  • no copies by default
  • allows for mutation if the list is mutable
  • no “surprise”, because it behaves the same as accessing the list entries directly outside the for loop

fn test(list: List[Int]):
  list[0] = 1  # error
  for i in list:
    i = 1  # error

fn test(read list: List[Int]):
  list[0] = 1  # error
  for i in list:
    i = 1  # error

fn test(mut list: List[Int]):
  list[0] = 1  # ok, mutation inside list
  for i in list:
    i = 1  # ok, mutation inside list

fn test(owned list: List[Int]):
  list[0] = 1  # ok, mutation inside list
  for i in list:
    i = 1  # ok, mutation inside list

fn test():
  var list = [0]
  list[0] = 1  # ok, mutation inside list
  for i in list:
    i = 1  # ok, mutation inside list

fn test(list: List[Int]):
  var x = list[0]
  x = 1  # ok, mutating copy
  for var i in list:
    i = 1  # ok, mutating copy
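For comparison, here is what plain Python does with the same kind of loop. This is a Python sketch, not Mojo: rebinding the loop variable never touches the list, while assigning through an index does. That silent no-op is the “surprise” the bullets above refer to.

```python
def rebind_loop_variable(values: list[int]) -> list[int]:
    for i in values:
        i = 1  # rebinds the local name only; the list is untouched
    return values


def assign_through_index(values: list[int]) -> list[int]:
    for idx in range(len(values)):
        values[idx] = 1  # writes into the list itself
    return values


print(rebind_loop_variable([0, 0]))  # [0, 0]
print(assign_through_index([0, 0]))  # [1, 1]
```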

Does leaning into PEPs make sense here?

As an aside, it’s my understanding that language decisions here have large implications for systems-programming features (e.g., “fearless concurrency” à la Rust). I’ll gladly defer to smarter people than me on this one, though.


1 Scoping should be explicit

(PEP 622 §5 “Scoping rules for pattern variables”; PEP 572 §5 “Comprehension scopes”)

# PEP 622 precedent: every binding introduced by a pattern is local
for var item in collection:      # Owns a copy (local to loop body)
    item += 1                    # Mutates the copy only

# PEP 572 precedent: comprehension targets are also scoped, avoiding leakage
for ref item in collection:      # Alias into the collection
    item += 1                    # Mutates the original element
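As a reference point for the comprehension-scoping note, this is how plain Python behaves (a Python illustration only): a for-loop target stays bound in the enclosing scope after the loop, while a comprehension's iteration variable does not leak out of the comprehension.

```python
def last_loop_target() -> int:
    for item in [1, 2, 3]:
        pass
    # The for-loop target is still bound after the loop ends.
    return item


def comprehension_target_is_scoped() -> bool:
    doubled = [elem * 2 for elem in [1, 2, 3]]
    assert doubled == [2, 4, 6]
    try:
        elem  # noqa: F821  (not visible after the comprehension)
    except NameError:
        return True  # the comprehension variable did not leak
    return False


print(last_loop_target())                # 3
print(comprehension_target_is_scoped())  # True
```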

2 Mutation should be explicit

(PEP 3107 “Function annotations” + typing conventions;
PEP 622 §5.2 “Pattern variables are read-only”)

# Mojo borrows Python’s “annotate intent, enforce by tool / runtime” idea.
# `read` parallels an immutable Sequence; `mut` parallels a MutableSequence.
fn process(read data: List[Int]):   # Caller passes a read-only view
    pass
fn modify(mut data: List[Int]):     # Caller opts in to mutation
    pass

PEP 3107 establishes the general “put the semantic contract next to the parameter” pattern that Mojo is extending with first-class read/mut keywords.
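A rough Python analogue of that contract, using only the standard collections.abc ABCs: the parameter annotations document intent and are stored in the function's __annotations__ (per PEP 3107), but nothing stops process from mutating its argument at runtime. Only a static type checker would flag that.

```python
from collections.abc import MutableSequence, Sequence


def process(data: Sequence[int]) -> int:
    # Read-only intent: Sequence declares no mutating methods,
    # so a type checker would reject data.append(...) here.
    return sum(data)


def modify(data: MutableSequence[int]) -> None:
    # Mutation is part of the declared contract.
    data.append(0)


values = [1, 2, 3]
print(process(values))  # 6
modify(values)
print(values)           # [1, 2, 3, 0]
```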


3 Variable binding should be explicit

(PEP 526 “Variable annotations”; PEP 622 §5 again)

fn test(cond: Bool):
    if cond:
        var a: Int    = 42     # Fresh, explicitly typed binding
        use(a)
    if cond:
        var a: String = "foo"  # Independent binding with a new type
        use(a)

PEP 526 justifies the explicit var + type syntax (“declarations make intent clear”), while PEP 622 shows the same “new name = new scope” rule for pattern variables.
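For contrast, Python itself has no block scope. In a Python version of the function above, both a bindings are the same function-local variable, and the name outlives its if block. In this minimal sketch the function is renamed rebind_in_blocks and use(a) is replaced by a return, both purely for illustration.

```python
def rebind_in_blocks(cond: bool) -> object:
    if cond:
        a: int = 42
    if cond:
        # Same local variable; static checkers such as mypy typically flag the re-annotation.
        a: str = "foo"
    # `a` is still visible after both if blocks (and unbound if cond is False).
    return a


print(rebind_in_blocks(True))  # foo
```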


PEP Notes

  • PEP 622 → names introduced by patterns are local → motivates var vs ref loop targets and block-scoped re-bindings.
  • PEP 572 (scope section only) → comprehension targets don’t leak → reinforces explicit loop scoping.
  • PEP 3107 + typing → annotate the mutation contract at the parameter site → Mojo’s read/mut.
  • PEP 526 → explicit variable declarations reduce surprises → Mojo’s var re-declarations with different types.

Thank you all for the thoughts and the feedback. After reading this in depth and discussing with the rest of the Mojo team, we’re going to explore something like the “read” binding by default (rather than copying by default), which is what most folks are suggesting. This should eliminate the opportunity to accidentally mutate the local copy, and eliminate the need to use ‘ref’ to avoid the performance impact of copies.

I’ll move to an approach where:

  1. var is allowed and is always a copy into a mutable owned value.
  2. ref is allowed when the iterator returns a reference, and allows you to bind the reference (including if it is mutable)
  3. No marker will provide the same behavior as “read”: you’ll get something that behaves like a read-only reference, regardless of whether the iterator returns a ref or a value.

You won’t be able to explicitly write “read”, but we can add that in the future if there is a reason to allow that.

To follow up on one other topic:

Thank you for raising this; I’ll add it to the document in the “alternatives to consider” section. The benefit of the current approach is that it fits naturally with the grammar of both Python and Mojo. In the current compiler, var is not a statement but an expression pattern: var x : T = foo() is the same as (var x) : T = (foo()). Adding the above syntactic convenience would break the composability of the grammar.

-Chris


Thanks, but I’m not sure I understand your point. I get that it is an expression, but I’m not sure what is wrong with using it in an expression.
Do you mean that you want to prevent using it for casting in expressions? e.g. in for loops:

for (var x: String) in ["a", "b", "c"]:
    use_string(x)

for (x: String, y) in [("a", 1), ("b", 2), ("c", 3)]:
    use_string_and_int(x, y)

or in deeply nested unpacking:

(ref x), (var y: String, z), *rest = ("a", ("b", 1), "c", "d") 

# the alternative is much more complex:
(ref x), (var y, z), *rest  : StaticString, (String, Int), List[StaticString] = ("a", ("b", 1), "c", "d") 

Python does not allow annotating tuple targets, so I don’t understand how it fits the grammar. Do you mean it is a natural extension of the Python grammar?

x, y: int, int
# SyntaxError: only single target (not tuple) can be annotated
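The restriction is easy to confirm from Python itself. This sketch compiles both forms with the built-in compile: annotating a tuple target is rejected by the parser, while annotating each name separately and then unpacking is accepted.

```python
def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(compiles("x, y: int, int"))               # False
print(compiles("x: int\ny: int\nx, y = 1, 2"))  # True
```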

Sorry, I think I misunderstood you. Yes, you’re right - I think in principle we could make things like this work:

This doesn’t work today because Mojo follows the Python grammar, where type annotations are not part of the recursive expression grammar, they are associated with the = grammar.

That said, this seems arbitrary [1], we could incorporate that into the expression grammar at the right precedence and it would work, along with things like:

var a2: Int, b2: String = fn_returning_int_and_string()

What gets more complicated are the adjacent cases:

var a2: Int = fn_returning_int(), b2: String = fn_returning_string()
var a2 = fn_returning_int(), b2 = fn_returning_string()

These require a fundamentally different design, because now we’re taking = and using it as an expression instead of a statement. Given this, it isn’t clear we want to support things like:

var a2: Int, b2: String = fn_returning_int_and_string()

because reasonable people could be confused about whether b2 is being initialized or a2/b2 together (it needs to be both). Given that, it isn’t clear to me that we want the var a2: Int, b2: String example to work: it seems inconsistent if we can’t scale this all the way.

Put all together, I don’t think we should allow this. We should keep things simple and narrow, rather than providing a partially paved path until and unless we decide that we want to complete it.

-Chris

[1] BTW, IMO, the Python grammar tries to force way too much semantic checking into the formal grammar, making it overly complicated, rather than doing simple semantic checks. YMMV, but I would have gone for a simpler formal grammar :)

I agree with you that = must remain a statement; we already have the walrus operator := for assignment expressions. So that example should not be allowed, IMO.
Also at this point you can just write it with 2 different statements (potentially using ; to write them on the same line):

var a2: Int = fn_returning_int(); var b2: String = fn_returning_string()
var a2 = fn_returning_int(); var b2 = fn_returning_string()
a2 = fn_returning_int(); b2 = fn_returning_string() # or without the var if it will be made equivalent.
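In Python terms the split looks like this: = stays a statement (several can share a line via ;), and := is the expression form. fn_returning_int and fn_returning_string are the post's placeholder names, defined here with trivial bodies only so the snippet runs.

```python
def fn_returning_int() -> int:
    return 1


def fn_returning_string() -> str:
    return "one"


# Two assignment statements sharing a line, separated by ';'.
a2 = fn_returning_int(); b2 = fn_returning_string()

# ':=' is an assignment expression, so it can appear inside a condition.
if (n := len(b2)) > 2:
    print(f"{b2!r} has {n} characters")  # 'one' has 3 characters

# '=' is a statement, so using it as an expression is a syntax error.
try:
    compile("if x = 1: pass", "<example>", "exec")
except SyntaxError:
    print("'=' is not an expression")
```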

So the comma should only be associated with packing/unpacking; maybe even function calls and definitions can be seen as packing/unpacking tuples. I guess the with statement doesn’t exactly fit the pattern, because it has its own special syntax with the as keyword, but you can’t use = there either.

Yeah, I guess this will be confusing for people coming from C, where the comma is a separator (or the comma operator) and there is no packing/unpacking.
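The contrast with C shows up in a few lines of Python, where a bare comma builds a tuple and the same syntax on the left of = unpacks one. In C, by comparison, the comma is either a separator or the comma operator, which evaluates its left operand, discards the result, and yields the right one.

```python
pair = 1, 2             # the comma, not the parentheses, builds the tuple
print(type(pair))       # <class 'tuple'>

a, b = pair             # unpacking
a, b = b, a             # packs (b, a), then unpacks into a and b
print(a, b)             # 2 1

first, *rest = 1, 2, 3  # a starred target collects the remainder
print(first, rest)      # 1 [2, 3]
```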

Thanks, this is a very helpful writeup and (again) an insightful discussion.

One thing I have not seen being covered in the proposal is how to return multiple refs – independent of the syntax that is eventually chosen.

Say I want to do something along the following lines:

l = [[1, 2], [3, 4]]

for ref a, ref b in l:
    # mutate the list content
    a += 1
    b += 1

or

@fieldwise_init
struct Struct:
    var a: Int
    var b: Int
    
    # Here we might also use the same syntax
    # as when supporting the `*` packing operator
    fn get(ref self) -> ref [self.a] Int, ref [self.b] Int:
        return self.a, self.b
        
s = Struct(1, 2)
a, b = s.get()

# or even better
# a, b = s

# mutate s
a += 1
b += 1

I see several use cases in collections (my motivation would be a nice query syntax in an ECS). Could this be implemented along with the other pattern matching features?
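For reference, this is what the equivalent loop does in Python today: unpacking binds new names to the int elements, so += rebinds those names and the nested lists stay unchanged. Mutating them means going back through the container, which is the indirection a multi-ref binding would remove.

```python
l = [[1, 2], [3, 4]]

for a, b in l:
    a += 1  # ints are immutable: this only rebinds the local name `a`
    b += 1
print(l)    # [[1, 2], [3, 4]]  (unchanged)

for row in l:
    row[0] += 1  # `row` aliases the inner list, so indexing mutates it
    row[1] += 1
print(l)    # [[2, 3], [4, 5]]
```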

Is that like a tuple of refs?

In a sense, though I would prefer the refs not to be stored in some intermediate place but assigned directly to the target variables. Maybe a tuple of refs could be constructed implicitly if the values cannot be unpacked, but IMO that would already be a second step.

There is one thing I find surprising / not intuitive / inconsistent:

a1, ref b1 = result # var, ref
ref a2, b2 = result # ref, ref!
(ref a3), b3 = result # ref, var
ref a4, ref b4 = result # ref, ref
ref a5, b5, var c5 = result2 # ref, ref!, var

b2 is ref, not var (the default convention).
But if you change the order or use parentheses, it works as expected.

I understand that the grammar is defined so that var and ref can introduce multiple comma-separated bindings, but we also have a default convention that has no keyword, and it is easily confused with this “grammar feature” of multiple bindings.

IMO it is better to remove this multiple bindings option and require the use of a keyword for each binding, or use no keyword choosing the default convention.
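To make the two readings concrete, here is a small, hypothetical Python model of a flat, unparenthesized binding list. It is not the Mojo parser. resolve_sticky mimics the behavior reported above, where a keyword keeps applying to the following names until another keyword appears. resolve_per_name implements the proposal, where each keyword covers only the next name and bare names get the default convention.

```python
KEYWORDS = {"ref", "var"}


def resolve_sticky(tokens: list[str], default: str = "var") -> list[str]:
    """Reported behavior: a keyword applies to every following name."""
    current = default
    conventions = []
    for tok in tokens:
        if tok in KEYWORDS:
            current = tok
        else:
            conventions.append(current)
    return conventions


def resolve_per_name(tokens: list[str], default: str = "var") -> list[str]:
    """Proposed behavior: a keyword applies only to the name right after it."""
    pending = None
    conventions = []
    for tok in tokens:
        if tok in KEYWORDS:
            pending = tok
        else:
            conventions.append(pending or default)
            pending = None
    return conventions


# `ref a5, b5, var c5 = result2`, tokenized without the commas
tokens = ["ref", "a5", "b5", "var", "c5"]
print(resolve_sticky(tokens))    # ['ref', 'ref', 'var']
print(resolve_per_name(tokens))  # ['ref', 'var', 'var']
```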

The default arg convention for function args, for loops, and with statements is read.
The default arg convention for variable declarations is var.

I think this is a good opportunity to review and decide whether the default convention for variable declarations should also be read.

The advantages of using read for variable declarations include:

  • Better performance by default: read is a reference and will not make copies, and you can easily find all copies by searching for var.
  • A single argument convention used everywhere, improving language consistency and simplicity.
  • When users want to mutate a variable, they have to state explicitly whether they want to mutate an owned copy (var) or the referenced value (ref). This prevents mistakes such as accidentally mutating the copy instead of the original, or unintentionally making copies.
  • I think immutability by default is considered good practice.
  • Since Mojo can report unused variables and unused mutations, it is less of a concern to require an explicit keyword for variable declarations.

Maybe it would be interesting to somehow gather statistics on which convention would, in theory, be used more often.


It would be clearer if we add back the optional parentheses for tuples:

a1, ref b1 = (a1, ref b1) = result # var, ref
ref a2, b2 = ref (a2, b2) = result # ref, ref
(ref a3), b3 = ((ref a3), b3) = result # ref, var
ref a4, ref b4 = (ref a4, ref b4) = result # ref, ref
ref a5, b5, var c5 = ref (a5, b5, var c5) = result2 # ref, ref, var

Specifying multiple bindings at once can be useful at times; perhaps the parentheses should be required in such cases:

ref (a, b) = ...

Namely, to make ref bind more tightly than ,.
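For comparison, Python's own prefix operators already bind more tightly than the comma, which is the grouping being suggested for ref. The standard ast module shows how not a, b is parsed:

```python
import ast

tree = ast.parse("not a, b", mode="eval")
tuple_node = tree.body

# The comma builds a two-element tuple; `not` applies only to `a`.
print(type(tuple_node).__name__)                    # Tuple
print([type(e).__name__ for e in tuple_node.elts])  # ['UnaryOp', 'Name']
```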

The only advantage I see is saving characters by not having to type var/ref multiple times.

The question is how useful and how commonly used will it be?
How many variable you will be unpacking?

  • 2 vars saving 3 characters but if you force parenthesis you save only 1 character.
  • 3 vars saving 6 characters or 4 with parenthesis.
  • 4 vars - this seems very rare and I would encourage returning a struct with named results.

Is it really worth adding the complexity and the confusion with the default arg convention?

You’re still thinking in Python. In Mojo this is simply a special case of irrefutable patterns, e.g.

ref SomeStruct(a, b, c) = expr

Not supporting it would only add unnecessary complexity. The ref/var pat = ... form always introduces fresh names. It would be highly inconsistent to forbid destructuring only when pat is a tuple.

We don’t have this yet, but IIUC it’s the same question:

SomeStruct(a1, ref b1) = ... # var, ref
SomeStruct(ref a2, b2) = ... # ref, ref!
SomeStruct((ref a3), b3) = ... # ref, var
SomeStruct(ref a4, ref b4) = ... # ref, ref
SomeStruct(ref a5, b5, var c5) = ... # ref, ref!, var

but now it can be even more complex:

ref SomeStruct(a, var b, c, AnotherStruct(d, ref e, f)) = ...
# I'm guessing it is the same as the explicit version:
SomeStruct(ref a, var b, var c, AnotherStruct(var d, ref e, ref f)) = ...

Not sure if I got it right, but it is confusing, and might not be worth the characters saved.
Now if we put this expression in a place where the default convention is read, and we want to change the variable d to use the default convention read, how do we do it? With the explicit version, it’s obvious and easy.

Hi,

@yinon I strongly agree with you in regards to making read the default.

[Edit: I misunderstood yinon’s original point]

The idea of making immutability the default was discussed as part of the let-proposal.

(I highly recommend reading the proposal; it was very insightful.) I think the following point from the proposal makes sense:

> [making immutable values the default] This cuts very hard against a lot of the design center of Python, which doesn’t even have this concept at all: it would be weird to make it the default

Note that Mojo does “warn about unneeded mutability”, according to the discussion.
I think that if Mojo can be configured to make this warning more strict, on a per-project basis, that could be a great solution.

I do personally think there is a lot of value in being able to indicate that a value (not known at compile time) should not be changed at runtime.
The reason being that it communicates intent.
In a large team working on a large project, clarity of intent is important.

I asked for feedback on the Discord in this regard, and got the following detailed answer:

User @duck_tape

You can create immutable refs with helper functions:

fn as_read[
    T: AnyType, origin: ImmutableOrigin
](ref [origin]arg: T) -> Pointer[T, origin]:
    return Pointer(to=arg)

A tuple pattern is just a pattern, like a single variable name, and banning something like ref a would be absurd. I think the grammar quirk has the following causes: ref binds more loosely than , (it applies to the whole comma-separated list to its right), and parentheses serve both grouping and tuple construction, so:

  1. Prefix vs. comma
ref a2, b2  # parsed as ref (a2, b2)
  2. Ambiguous parentheses (not a tuple literal)
(ref a2, b2)  # parsed as (ref (a2, b2))
  3. Constructor patterns (unambiguous)
SomeStruct(ref a2, b2)  # parsed as SomeStruct((ref a2), b2)

Hence your examples become:

# default to var
SomeStruct(a1, ref b1)          = ...  # var, ref
SomeStruct(ref a2, b2)          = ...  # ref, var
SomeStruct((ref a3), b3)        = ...  # ref, var
SomeStruct(ref a4, ref b4)      = ...  # ref, ref
SomeStruct(ref a5, b5, var c5)  = ...  # ref, var, var

# ref everything unless otherwise stated
ref SomeStruct(a, var b, c, AnotherStruct(d, ref e, f)) = ...
SomeStruct(ref a, var b, ref c, AnotherStruct(ref d, ref e, ref f)) = ...

There may still be a precedence bug lurking. The following example currently doesn’t work:

fn main():
  (a, var b, c) = (1, 2, 3)

I think it’s now parsed as (a, var (b, c)) which is wrong. GH issue.

  1. Big fan of removing the [] from all my loops. Cursor and other AI tools are also going to like this a lot. I think 95% of the cases where I iterate through elements have been read-only operations; for the rest, I put ref in the loop.

  2. Some semantic weirdness that maybe someone can explain to me:

  • For loops return immutable/read references.
  • We make them mutable by using for ref element in someelements.
    • I still get thrown off by this syntax, since element is already a reference. Adding ref seems redundant? Doesn’t it make more semantic sense to write for mut element in someelements?

I’m not suggesting bringing back let, which is an immutable owned variable and will make copies. I’m suggesting using read, which is an immutable ref and will prevent accidental copies, e.g. when reading from a list:

a = my_list[x] # this will be `read` instead of `var` preventing an accidental copy.
a += 1 # error: do you want to modify a local copy (var) or the list item (ref)?
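In plain Python, the outcome of that second line depends on the element type, which is part of why an explicit choice helps: augmented assignment rebinds the name for an immutable int but mutates in place for a list. my_list follows the post; the element values are illustrative.

```python
my_list = [10, [1, 2]]

a = my_list[0]
a += 1             # int is immutable: `a` is rebound, the list is untouched
print(my_list[0])  # 10

b = my_list[1]
b += [3]           # list += extends in place: `b` still aliases the element
print(my_list[1])  # [1, 2, 3]
```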

The let discussion is old, and Mojo has only now decided to take the route of transforming Python code into Mojo code, where the transformer will add owned to function args. So the question is whether it could also add var to local variables.
I’m just saying it is worth a discussion.


Just to be clear, I’m not suggesting removing ref a, I’m suggesting that ref and var will only be associated with a single name binding, so If you want 2 refs just type ref twice, once for each name like this:

a1, ref b1 = ... # default, ref
ref a2, b2 = ... # ref, default
ref a3, ref b3 = ... # ref, ref
ref a4, b4, var c4 = ... # ref, default, var

Now if you convert the tuple to a struct it remains the same:

SomeStruct(a1, ref b1) = ... # default, ref
SomeStruct(ref a2, b2) = ... # ref, default
SomeStruct(ref a3, ref b3) = ... # ref, ref
SomeStruct(ref a4, b4, var c4) = ... # ref, default, var

(BTW, this also follows the Zen of Python principle “There should be one-- and preferably only one --obvious way to do it.” Right now there are two ways:

ref a, b # ref, ref
ref a, ref b # ref, ref

But I’ve made better arguments above: IMHO, the complexity and confusion are not worth the characters saved.)
