I like this proposal. Some minor points:
- I think the same default argument convention that functions use could apply to the “for” bindings, namely:
  - read in fn
  - read in def, unless the binding is written to, in which case var.
I think it is better to optimize for the common case where you only want to read and don’t want to copy; this is more performant by default and shorter to write.
This doesn’t necessarily mean we need to allow explicit read/mut as all of the variations can be achieved:
- default is read
- use ref when you need to mutate instead of mut
- use var when you need a copy.
- About allowing explicit mut/read in fn var declarations: the only use case I see is preventing accidental mutation, for example when passing a value to another function. But the same case could be made for var vs. let declarations, which Mojo has already decided are not necessary. So for consistency and simplicity I think it is better to allow only ref and, if really needed, to use some kind of cast to prevent accidental mutation, as with var.
var x = my_list[0]
external_function(x) # could mutate
external_function(read x) # immutable cast - not sure about syntax or if needed
ref y = my_list[0]
external_function(y) # could mutate
external_function(read y) # immutable cast - not sure about syntax or if needed
read z = my_list[0]
external_function(z) # prevent accidental mutation
- I’m not convinced about this syntax:
# Declare two uninitialized variables
var a2, b2 : Int, String
The other alternative is to put the annotation next to the variable:
# Declare two uninitialized variables
var a2: Int, b2: String
The advantages are:
- it is easier to see which type belongs to which variable when there are many of them, and potentially with unpacking in the future.
- it can allow for partial type definitions:
var x: String, y, z = ["a", "b", "c"] # y and z are auto inferred
var x, y, z : String, StaticString, StaticString = ["a", "b", "c"] # forced to also annotate y and z
As a minor note, Python disallows this:
x, y: int, int
^
#SyntaxError: only single target (not tuple) can be annotated
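The Python behaviour mentioned above can be checked with a small sketch. This is only an illustration in Python, not Mojo, and the helper name annotation_error is made up for the example; it just asks the built-in compile() whether a snippet parses.

```python
from typing import Optional


def annotation_error(source: str) -> Optional[str]:
    """Return the SyntaxError message for `source`, or None if it parses.

    Hypothetical helper for illustration only; it only parses the source
    and never executes it.
    """
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


# Types grouped after a tuple of names: rejected by the Python parser.
grouped = "x, y: int, int"

# One annotation per name, each as its own statement: accepted.
per_name = "x: int\ny: str"

print("grouped: ", annotation_error(grouped))
print("per-name:", annotation_error(per_name))
```

So Python only accepts the form where each annotation sits next to a single name, which is the same direction as the per-variable annotation suggested above.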
I like the “value binding” proposal.
+1 for read as default in for loops.
In addition, I think it would be helpful to allow declaring immutable runtime value bindings in function bodies, like read x = 12 or read start_time = perf_counter_ns(), so you can be sure that the value will not be mutated later on.
BTW: This can already be accomplished with a helper function:
fn read_only[T: AnyType](x: T) -> ref [ImmutableAnyOrigin] T:
    return x

fn main() raises:
    ref x = read_only(12)
    x += 1 # error
Tbh I prefer the current default implementation as it is consistent with ownership and value semantics.
However, it is worth exploring whether ref captures can be more specialized, e.g. into read-only captures or mut captures, rather than only allowing the current generalised form. I imagine it might improve readability, but it would not add much functionality.
The only issue I have with the current default implementation is that it can lead to unnecessary copies and therefore performance degradation.
Example:
for s in listOfLongStrings:
    print(s)
Every string in the listOfLongStrings gets copied without warning.
If the default were changed to read, no copy would be made in this example. One would have to explicitly opt in to copying with a var declaration if needed. read is the default in fn argument conventions too.
If read is the default, what would be the syntax if I wanted to mutate s?
I have the same question.
Being able to update a list in-place is obviously important
Copying by default (the current approach) does not solve this problem either. Access to the value itself (by reference) is much more often what is wanted. If the goal is to mutate a copy of the value without touching the original, the copy can be made on the caller's side, or through value semantics.
The current approach is value semantic. To mutate the list in-place you have to capture the reference, which is what Chris just implemented. This is better for progressive disclosure of complexity.
If you want a copy of the list items:
for var item in list:
If you want to modify list items inside the list:
for ref item in list:
This already works in the current implementation.
And this would stay the same if the default were changed to read access.
With an immutable reference by default, it would be consistent with function args, where this is also the default. And IMO the right one. In most cases I only need to read the value, without mutating it. So value semantics (copying the original) is not the right default for the most common case.
This is IMO the right approach: read (default) - immutable reference, ref - mutable reference, var - copy.
I’m thinking about how much friction this introduces in a casual program, because now it is introducing ceremony into what should be a natural and intuitive model. More important is when this friction is introduced; 90% of the time, the owned value option is sufficient, in the 10% of the time when it’s not, you’re most likely already deep into the language, and so you know why it doesn’t work for your use case.
This fixes a problem I have with Rust, where you’re forced to deal with complexity without understanding how it helps you. At the start, it just seems like complexity for complexity’s sake, something that can add up if we’re not careful.
I wouldn’t want Mojo to go the way of Rust, where we constantly have to tell beginners to “just clone”, when the language could have been value semantic from the start, which is a lot more natural in most cases. It is a lot better for experienced programmers to take on more complexity, especially when it just involves three extra characters.
In conclusion, when proposing changes, we also have to reason about when any complexity would become visible to a new learner. Mojo aims to be both performant and approachable. Your proposed changes would improve performance in loops, no doubt, but I’m wondering whether they hurt approachability. I’m also wondering whether these proposed changes would carry over to def functions or apply only in fn. And if it is only in fn functions, we’d have to be careful about creating two languages while trying to solve a two-language problem.
default is read
use ref when you need to mutate instead of mut
use var when you need a copy.
I think this is the least surprising behavior we could go with. If we try to do mutable by default, there could be “fun aliasing issues”.
I also agree that we need some kind of disambiguation syntax for passing a mutable reference vs an immutable one.
Also agree on putting annotations next to the variable. Large tuples and unpacks get messy fast.
In def functions the same approach could be used as is already implemented for default argument conventions.
The default argument convention in def functions is read unless you modify the value; then the compiler automatically provides a copy and you modify the copy.
So nothing special happens if the same applies to for loops inside def functions.
def test():
    for item in list:
        print(item)
would be the same as in fn functions (no copies, just read access):
fn test():
    for item in list:
        print(item)
But in def functions the compiler would automatically provide a copy if a mutation is done. So this would be fine in def functions:
def test():
    for item in list:
        item += 1
In fn functions one would need to explicitly ask for a copy to do the same:
fn test():
    for var item in list:
        item += 1
I don’t think the default for loops should be based on how functions take arguments, because by that logic loops in fn functions would default to immutable references, while loops in def functions would also default to immutable references, unless an attempt is made to mutate the reference, in which case it would turn into an owned value.
The way the default works now is consistent across both def and fn functions, and across value and (in the future) reference types. And it is also a lot easier to teach, use, and comprehend. No special casing when certain features are required and when they’re not, like we currently have with functions.
I think it’s important that we make the default not have a huge performance footgun. Copy by default will mean that if you do a naive matmul with a 2d array, you will copy the inputs. This is most likely not what people want. Doing copy by default also leads to confusing behavior like not being able to iterate over a list of lists and append an item to each inner list. If we make the default read, the compiler can tell people “Use ref foo instead of foo if you want to mutate foo” when it detects a borrow checker violation on a loop variable. This is a tiny bit of complexity in exchange for stopping a fairly large performance footgun. “You’re writing your loops wrong” is not something we want developers new to systems languages to hear from forums when they ask why their code is slow.
However, after thinking about it a bit more, I think I would prefer mut and read instead of ref, leaving var for value semantics. People may have difficulty remembering which kind of reference ref is, and “default to read” is consistent with the rest of the language. The default could also be “the compiler tries to figure it out between mut and read”, which might reduce porting friction and make it nicer to write in some cases.
I think that asking people to think about what they want is helpful here. Cloning isn’t always the right option for what someone wants to do, so we want people to think about what they actually want from iterating over a collection.
Hmm, that could work, assuming it doesn’t make the compiler more complicated.
I am against leveling down to stay in sync with def. Paying a performance cost in the default case of a systems-level language, when most of the time I just need to read the value, is IMO the wrong approach.
There is one inconsistency with only allowing ref and not mut:
def my_append(my_list: List[Int]):
    my_list.append(0)
# user is surprised this made a copy silently and didn't modify the list, we tell them to use mut:
def my_append(mut my_list: List[Int]):
    my_list.append(0) # now it works
def my_append(ref my_list: List[Int]):
    my_list.append(0) # error: my_list could be immutable
# whereas in for loops
def f():
    for my_list in nested:
        my_list.append(0) # user is surprised this made a copy silently and didn't modify the list, we tell them to use ref instead of mut?
    for ref my_list in nested:
        my_list.append(0) # now it works and the ref is mutable
So the inconsistency is that in the arg convention ref means parametric mutability, whereas in a for loop it means inferred mutability (it could be parametric if you infer another parametric mutable ref).
Not sure how much it matters, though.