so I’m trying to figure out a principle to decide between these two, to decide what to put into signatures.
@clattner, what did you mean by “The thing we’re talking about in this proposal is not an ‘operation’, it is a declarative specification of how the self argument of a method is processed.” ?
That’s a good distillation of the issue at hand @Verdagon. The issue is only syntax, not semantics. My belief is that the arg-based syntax is going to cause significant confusion.
I should also explain why I like the idea of putting the finalization step last:
fn dismantle(var self):
    foo(self.x^)
    deinit self
Rather than first:
fn dismantle(var self):
    deinit self
    foo(self.x^)
The idea is to mirror how destruction works. Ordinarily, var x is cleaned up by the implicit invocation of x^.__del__ after the last use of x. The idea with deinit x is to be able to explain the behaviour as “instead of finishing with a del, we are finishing with a deinit”. Put simply: we are choosing an alternative fate for the variable.
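To make that mirroring concrete, here is a minimal sketch (hypothetical `Foo` and `use` are assumed) of ordinary ASAP destruction versus the proposed `deinit` statement:

```mojo
fn ordinary():
    var x = Foo()
    use(x)
    # Last use of `x`: the compiler implicitly inserts x^.__del__() here.

fn alternative():
    var x = Foo()
    use(x)
    deinit x  # instead of finishing with a del, we finish with a deinit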
This detail isn’t important though; I’d be satisfied if the deinit statement were invoked prior to the move operations. My main concern is whether or not the keyword is put in the function signature.
Thank you again for the extra feedback, I’m collecting responses to good points to respond in a batch again:
On statement vs declarative spec
You did a great job explaining how we can limit a statement to prevent memory safety violations, and I agree with you completely.
That said, I don’t understand why the proposal of a statement is better than a declarative spec, some concerns:
Most obviously, this adds boilerplate to every __del__, and in practice, this is almost always going to be deconstruct self at the bottom of the function.
I don’t see why it is important or useful to deconstruct values other than self, we want to limit this intentionally right? Why are you trying to expand access to something else we might have private access to? We haven’t designed access control yet, so I assume you’re predicting a friend mechanism, but this seems redundant with adding a private named destructor on that type.
This is generally a more complicated (for users) way to expose this functionality, so I’d love to understand the concrete use cases they enable, as well as how commonly they occur, in order to understand whether the added complexity pays for itself.
I’m not really sure of a way to do this other than to get experience with the current approach. Let’s play forward the current approach and if we run into limitations of the feature (e.g. when access control comes around) then we can better know how to solve the problem, rather than trying to predict problems and proactively solve for them.
Power user features
+1, I’m very happy to provide power-users like you with advanced features for exotic use-cases. However, we also have to balance the experience of more traditional Mojo users, and we want that use-case to be safe, easy to teach, and convenient.
You’re already quite deep into unsafe code for your use-case, so why isn’t an unsafe construct enough to satisfy your requirements?
On U/X and teachability
There are various discussions about teachability, which is fantastic!
I agree with Nick. Whatever the design is, we can have the compiler generate good error messages for the normal __del__ case here. Moreover, we can expect code completion and AI coding tools to write the deinit most of the time anyway.
Putting the convention into the argument signature
I’m not sure if “good” is the qualifier in question here, but I’ll toss out a few reasons:
This behavior is a declarative aspect of the implementation of the function, so putting it on the function makes sense. Though, this could be done as a decorator or in some other way, as I mentioned before.
I don’t see any benefit or utility in allowing “destruct” any place other than the end of the function, because of how ASAP destruction works. See the example at the bottom of this post. If “where” the operation happens doesn’t matter, then we don’t need users to think about this lexically/positionally, and a statement doesn’t make sense.
We don’t have to have a statement. All of the objections I point out above apply here - we don’t want to give a choice of “what to disable”, I don’t think we want people to have the flexibility to “disable at the start of the function” (just because you “could” do it doesn’t mean additional choice is good for anyone), and a statement is boilerplate for the majority use-case. I don’t see motivation for a statement, as described above.
We already have the syntactic space for this.
This is a “soft” keyword instead (like “mut”) and not a “hard” keyword like a statement.
“declarative specification of how the self argument of a method is processed”
I think I covered this above, but the desirable thing is to say “this self argument is not destroyed with del at the end of the function”. This is literally what the implementation does.
Furthermore, (and again, commenting against the “make this a statement” argument) - I don’t see any value in allowing people to write code like this that “destructs” the argument at the start of the function or anywhere else:
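The snippet in question would look something like this (a hedged reconstruction; the struct and field names other than self.engines are hypothetical):

```mojo
struct Rocket:
    var engines: Engines
    var hull: Hull

    fn scrap(deinit self):
        deinit self             # hypothetical "destruct" at the start
        recycle(self.engines^)  # self.engines explicitly destroyed later
        # `self.hull` is never used, so ASAP destruction would already
        # have destroyed it regardless of where the statement appears.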
There is literally no behavior change because fields are harvested ASAP already. All the “non-engine” fields get destroyed before “self.engines” is explicitly destroyed because they aren’t used.
This already works in tree and does exactly what you want - give it a try.
I don’t see why it is important or useful to deconstruct values other than self, we want to limit this intentionally right? Why are you trying to expand access to something else we might have private access to? We haven’t designed access control yet, so I assume you’re predicting a friend mechanism, but this seems redundant with adding a private named destructor on that type.
One of the easier to demo examples is compile-time refcounting with linear types:
Maybe @Verdagon has other ideas on how to make this work, but I haven’t figured out a way to make comptime refcounting work that doesn’t look like one of those two forms, since we have to use linear types to make the user eventually deinit things.
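A hedged sketch of what such a linear, comptime-refcounted handle might look like (all names here are hypothetical, not an existing API):

```mojo
struct Rc[T]:
    # Linear handle: no usable implicit __del__, so the user is forced
    # to eventually consume each handle via one of the methods below.
    var refcount: Int

    fn clone(mut self) -> Rc[T]:
        self.refcount += 1  # count tracked in the type itself
        return ...

    fn merge(mut self, deinit other: Rc[T]):
        # Consumes a handle other than `self`: exactly the case that a
        # self-only deinit restriction rules out.
        self.refcount -= 1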
You’re already quite deep into unsafe code for your use-case, so why isn’t an unsafe construct enough to satisfy your requirements?
If you mean an unsafe construct in terms of __disable_del letting me break the guarantees of linear types, I can be perfectly happy with __horribly_unsafe_here_be_dragons_ub_ahead_disable_del; however, you removed the escape hatch in this commit.
If you mean “Why can’t Owen use UnsafePointer for all of this?”, my goal for library code I write is to provide a “gradient” of APIs: on one end, an API where the compiler can manage everything for the user and nothing the user does can invoke UB, leak memory, etc.; on the other end, a fully unsafe, trust-the-programmer API that lets the user bend or break the rules however they need to. Ideally, everyone could use the first API, but that’s not going to happen, so I want to provide “less unsafe” options. This is a habit I’ve taken from Rust of trying to minimize the unsafe API boundary of code I write, even if internally I make liberal use of unsafe constructs, since I want the library to be easy to use, meaning the compiler should verify as much as possible.
I don’t see why it is important or useful to deconstruct values other than self
I don’t really understand the full example, so I simplified it
I agree that something like this can be useful in niche cases, but the behavior is already supported, and used in (eg) Coroutine._take_handle. You can implement stuff like this using this pattern:
struct YourThing:
    var whatever: State

    fn _destruct(deinit self) -> State:
        # don't destroy the local state, caller might want to use it.
        return self.whatever^

    fn merge_refs(deinit self, var other: Self) -> Stuff:
        use(other.state)
        stuff = other^._destruct()
        return ...something with stuff...
Such an approach gives you basically the same expressive capability of the statement, and I think it covers your comptime refcounting thing. Maybe I’m misunderstanding it though.
You’re right that allowing deinit on other arguments could reduce a bit of boilerplate (defining a _destruct method like the above), but keeping it simple and limited cuts off foot-guns, and doesn’t reduce expressive capability.
If we found this to be a significant burden in practice, then we could open the aperture, e.g. relax the restriction to arguments of Self type, which would directly support your use-case.
I think I can prove out the concept with that boilerplate, provided that there’s a good way to do “scoped privacy” (ex: file scope) at some point in the future, so that _destruct can’t be called by outside code without some acknowledgement that encapsulation is being broken. I was trying to avoid introducing a way for the user to misuse APIs like that, but I can work with a warning for now.
I’m in your camp: I want to control access to APIs and provide type authors the right tools to do that. Coming from its dynamic language roots, Python doesn’t have that (any checks would have to be done at runtime which would be expensive) and thus Mojo doesn’t currently either.
That said, we’ll definitely need to tackle and close this by introducing “access control” modifiers of some sort, e.g. “private”. I just hope we can do something simpler than what Swift ended up with!
Thanks @clattner for your detailed response. You make some good points. Please allow me to offer my final thoughts.
Here is my summary of your beliefs:
An explicit statement (deconstruct self) would be a bit verbose/boilerplate-y.
A statement forces users to decide where to invoke it, but in practice, it will almost always be invoked at the very end (or very beginning) of a function definition. Users shouldn’t be forced to think about this.
We don’t have many compelling use cases for “deconstructing” arguments other than self.
We can always start by limiting this feature to self, and if we want more flexibility, in the future, we can revisit the design.
I agree with all of these points. Allow me to augment those points with a belief of my own:
We should avoid altering a function’s signature (everything between the fn keyword and the final :) as part of specifying that a function “immediately destroys self”.
I explained in my previous posts why altering a function’s signature would be a pedagogical nightmare.
With all of our points in mind, I would propose that we use the decorator-based syntax that you previously proposed:
@deinit
fn __del__(var self): ...
This syntax addresses all of our points. I would be very happy with this syntax, at least as a starting point!
UPDATE: Oh wait… has anyone else noticed that __moveinit__ deinitializes the OTHER argument, not self?
This is evidence that even in today’s Mojo, we have a use case for deinit applying to more than just the self argument.
Of course, we could reverse the signature of __moveinit__, to make it a “deconstructor” rather than a “constructor”. Funnily enough, I suggested that we reverse the signatures a few months ago, in this open issue.
Oh wait… has anyone else noticed that __moveinit__ deinitializes the OTHER argument, not self?
This is just an artifact of how out arguments work. They are syntax for a return value, so the “actual” signature of __moveinit__ is (deinit self) -> Self. It is consistent.
It is worth exploring replacing the names __copyinit__ and __moveinit__ with named constructors anyway, e.g.:
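A hedged sketch of what such named constructors might look like (the method names copy_from/move_from are hypothetical, not an existing proposal):

```mojo
struct Foo:
    var data: String

    fn copy_from(out self, existing: Self):
        self.data = existing.data  # plays the role of today's __copyinit__

    fn move_from(out self, deinit existing: Self):
        self.data = existing.data^  # plays the role of today's __moveinit__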
Oh, I can see how that interpretation of __moveinit__ makes sense. Still… the fact that the out argument is named self makes its signature quite confusing!
I have one more alternative proposal for the syntax of this “named destructors” feature, which I think people will like. I just need to find ~15 minutes to write it up. Be back soon.
Okay Chris, I’m back! Here’s one more alternative that IMO is promising.
Brief recap
Our goals are to:
Support named de-constructors. (Which will make __del__ and __moveinit__ less special.)
Ensure that only functions with privileged access to a variable are able to behave as a de-constructor. (i.e. preserve encapsulation.)
Minimize boilerplate.
(Nick’s goal) Ensure that Mojo programmers understand that whether a function is a de-constructor is just an implementation detail. It’s not something that the caller needs to think about. (A function should be free to factor out de-construction to a helper function, without needing to update its signature.)
Chris’s current proposal is to:
Use a deinit keyword to mark arguments as being de-constructed inside the function body.
Restrict deinit to the self argument.
Here are my concerns with Chris’s proposal:
Putting deinit in the function signature violates goal #4.
With functions such as __moveinit__, deinit is used on the second argument, rather than the first argument. If we are restricting deinit to the self argument, then this is super confusing. Chris’s rationale is that “the second argument is technically self, because the first argument is the return value”. Unfortunately… this seems like it will be a headache to explain to Mojo learners!
Here’s a new proposal
To resolve the aforementioned issues, I need to diverge from Chris’s design in the following ways:
Rather than restricting this feature to self arguments (that’s confusing for __moveinit__), restrict this feature to functions that have access to the private members of the struct.
This makes my proposal dependent on Mojo’s design for access control, but that’s ok. We can stick with Chris’s current implementation of de-constructors until access control is added to Mojo.
Rather than introducing a deinit keyword, stick with the var keyword.
Once we’ve made these changes, we’ll need a new way for a function to signal that it is de-constructing a var argument, rather than invoking __del__ on it.
I propose that we use fieldwise deinitialization of the argument to signal to the reader (and the compiler) that the argument is being de-constructed. The precise logic would be as follows:
If the programmer has deinitialized EVERY field of a var argument, then its destructor is disabled.
If the programmer has deinitialized SOME fields of a var argument, then the compiler would complain that the variable was left in a partially-initialized state.
If the programmer has deinitialized NO fields of a var argument, then __del__ is implicitly invoked on it.
Here is a concrete example:
struct Composite:
    var x: Foo
    var y: Foo

    fn __del__(var self):
        del self.x  # assume this desugars to self.x^.__del__()
        del self.y
        # All of the fields of `self` have been deinitialized,
        # so its destructor has been disabled.

    fn __moveinit__(out self, var existing: Self):
        self.x = existing.x^
        self.y = existing.y^
        # All of the fields of `existing` have been deinitialized,
        # so its destructor has been disabled.
Some important notes:
Only functions with access rights to the fields of the struct can deinitialize the fields one-by-one. Therefore, this design preserves encapsulation.
I am proposing that to de-construct a var, each field of the var must be explicitly deinitialized (either using ^ or by invoking del), since this makes de-construction very explicit. If you miss any of the fields, you will get a compile-time error.
In today’s Mojo, if a variable’s fields have been deinitialized, its destructor won’t run. The same is true in my proposal. The difference is that in today’s Mojo, you get a compile-time error (“field was destroyed, preventing the overall value from being destroyed”), whereas in my proposal, this is how you de-construct a variable.
Edge cases
If we want to allow for the de-construction of a var argument that has zero fields, we would need to figure out how to prevent the invocation of its destructor. An obvious “hack solution” would be to add a marker field of type None to the struct. The deletion or transfer (^) of that field constitutes de-construction.
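A sketch of that workaround under this proposal (the struct and field names are hypothetical):

```mojo
struct Empty:
    var _marker: NoneType  # marker field whose transfer signals de-construction

    fn dissolve(var self):
        _ = self._marker^
        # Every field of `self` has now been deinitialized, so under this
        # proposal its destructor is disabled.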
Summary
This design has the following advantages:
It reduces the number of keywords in Mojo. (No need for deinit.)
It is low boilerplate.
It ensures that de-constructors are just ordinary functions that act upon var arguments. By keeping de-construction out of function signatures, we avoid distracting (and potentially confusing) the users of de-constructors. All they need to know is that the argument is var.
Another major advantage of this design is that we can extend this design to support named constructors.
In today’s Mojo, __init__ is “magic” in that its out self argument is considered initialized as soon as its fields are initialized. If we want to eliminate this magic, and support named constructors, we can make out-initialization the REVERSE of var-deinitialization. By that I mean: we can treat fieldwise initialization of an out argument as a valid way to construct the argument. To preserve encapsulation, we can use the same trick we used for var de-construction: we can use access control to limit who can fieldwise-initialize an out argument.
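A sketch of how a named constructor could work under that extension (the Point struct and origin method are hypothetical):

```mojo
struct Point:
    var x: Int
    var y: Int

    @staticmethod
    fn origin(out result: Self):
        # Fieldwise initialization of the `out` argument constitutes
        # construction, mirroring fieldwise deinitialization of a `var`.
        result.x = 0
        result.y = 0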
This model of out arguments would just be a “bonus”. It should be considered separately from the de-constructors proposal.
I see two problems: the first is that this is an implicit behavior, the second is that it relies on compiler magic.
I also don’t think making it implicit solves the problem of teachability. Yes, there is less syntax, but there is still behavior that needs to be taught, and now the learner has to internalize other rules as well - like needing private access, and making sure to properly deinit all fields to get the same behavior.
Finally, I’m not sure I understand your argument about requiring deinit to be an implementation detail. Personally, I would prefer to be able to tell at a glance at the API that a deinit happens. Additionally, from a user’s POV it does not behave differently from var.
It’s true that my proposal requires the reader to look inside the function body to determine whether a var argument is being de-constructed or not. In that sense, it’s “more implicit”.
However:
De-construction is still “obvious” when you look around inside the function body. If you see the fields of the var argument being del’ed or transferred systematically, it should be clear that the variable is being de-constructed.
Most functions should have docstrings, and the docstring of a named de-constructor would probably say something like “transfers the data of self into other”, which will make it clear what’s happening.
In a few ways, my proposal is more explicit than the deinit proposal, not less!
To expand upon the last point: my proposal requires that to deconstruct a var, you need to explicitly delete or transfer each field. Consider the following struct:
struct Foo:
    var x: String
    var y: String

    fn to_list(var self) -> List[String]:
        return List(self.x^, self.y^)
This is a valid struct definition under my proposal. What happens if we add a third field?
struct Foo:
    var x: String
    var y: String
    var z: String

    fn to_list(var self) -> List[String]:
        return List(self.x^, self.y^)
In my proposal, you would get a compile-time error, saying that self is partially initialized, and that you either should “finish the job”, or restore the values. We forgot to handle self.z, and the compiler has saved us.
Unfortunately, with the deinit proposal (implemented in Nightly), the compiler would implicitly destroy self.z, which is not what we wanted:
struct Foo:
    var x: String
    var y: String
    var z: String

    fn to_list(deinit self) -> List[String]:
        # Compiler inserts a call to self.z^.__del__()
        return List(self.x^, self.y^)
In summary, my proposal is not necessarily more implicit than the deinit proposal. It’s true that we’re no longer signposting de-construction in the function signature, but we’ve made the process of de-constructing more explicit, which might actually help us catch a few bugs!
I don’t like implicitly disabling the dtor when all fields are moved away. That’s the kind of “magic” behavior we’re trying to avoid. It also treats structs as nothing more than the sum of their parts, which is very untrue in many cases.
public/private, or some other system for enforcing encapsulation, is going to be necessary even if there is an override buried somewhere in the language. I’ve already seen things in normal Python code that make me think it’s a good language feature to have, and in a systems language you can more easily cause problems when breaking encapsulation.
I agree that I would like to know what functions are destructors, but I don’t think that arguing against encapsulation is a good idea.
To be clear, I’m not arguing against public/private; I think they’re a necessary language feature. However, I’m not really a fan of one language feature depending on another to work. Language features should be able to stand on their own, as well as compose with other language features.
Besides, encapsulation is already being enforced here for this specific deinit proposal.
I think one of the “issues” discussed concerning the current implementation is that this would be the same as
fn __init__(init self, var other):
which has to be used if the actual deinitialization is done indirectly by __init__ calling another function. Therefore the deinitialization is more of an implementation detail than an argument convention.