[proposal] Make UnsafePointer Non-nullable

Proposal from @nate

2 Likes

I like the stronger invariant.

Appreciate you making the forum post - I was just about to do so after the nightly just released.

TLDR;

UnsafePointer - can no longer be null.

  • Removes UnsfaePointer.__init__(out self) (default constructor) & Defaultable conformance
  • Removes UnsafePointer.__bool__() & Boolable conformance

To represent “null-ness” you must use Optional[UnsafePointer].

note: This proposal is not meant to discuss certain hardwares where 0x0 is a valid address. This is likely a separate discussion particularly for how to represent Optional[UnsafePointer] since that would no longer allow the None discriminant to hold 0x0. (I just realized this wasn’t addressed in the proposal so it’s something we can tackle post-1.0 if needed).

While I agree with the goal of this proposal (null safety), I don’t think this is the right way to go about it.

Mojo’s current UnsafePointer inherits C-style nullability. Any
UnsafePointer can be null, but the type does not communicate that fact.
Whether a null value is valid must be inferred from local conventions,
documentation, or defensive checks.

I would argue that, given that we have NonNullPointer, any usage of UnsafePointer which does not handle it being null is unsound, since the API contract is signaling that Null is a valid state. I think that instead moving that construct out of MAX and into the stdlib makes a lot more sense.

The problem is not that these patterns exist. The problem is that the type
system does not distinguish between:

  • a pointer that is guaranteed to be present

NonNullPointer

  • a pointer that is optional and must be checked before use

UnsafePointer or Optional[NonNullPointer]

This proposal does not attempt to make raw pointers generally safe.
UnsafePointer remains unsafe in the usual ways: it may still dangle, alias
incorrectly, point to invalid memory, or violate lifetime assumptions. This
proposal addresses only one dimension of pointer unsafety: nullability.

My opinion is that UnsafePointer should be left fully unsafe, as a primitive to build other types with. For instance, if I have a pointer which is null in certain states, but I track that with a typestate variable and disable any methods that dereference that pointer in those states, it’s still a safe API. If UnsafePointer makes null an invalid state, applying that pattern would become UB since a member would be in an invalid state. If we see friction because too many things want UnsafePointer directly, perhaps due to the limited API of NonNullPointer, that may be a sign that Mojo needs to have a PointerLike trait for all of the various kinds of pointers (NonNullPointer, OverAlignedNonNullPointer, CompressedPointer, etc).

4. Allocation

UnsafePointer.alloc() continues to return UnsafePointer (non-null). If the
underlying allocator fails, the program traps rather than returning a null
pointer. This matches the behavior of most modern allocators (including the
default system allocator on Linux, which overcommits) and avoids forcing every
allocation site to handle a nullable return.

  • Note: We can provide a fallible allocator interface via try_alloc, however
    most mojo programs today assuming infallible allocation, so this is a fine
    default.
  • Code that needs to handle allocation failure explicitly should call the
    underlying allocator (e.g. malloc) directly and use
    Optional[UnsafePointer] for the result.

In this case, what does the external_call to the underlying allocator return if null becomes an invalid bit pattern for UnsafePointer (required to implement size_of[Optional[UnsafePointer[…]]]() == size_of[UnsafePointer[…]])? UnsafeNullablePointer is transitional in this proposal, but we need a permanent solution to this issue.

Additionally, overcommit is optional on Linux, and is often disabled on safety critical systems, including robots where having a safety-critical process get OOM killed is unacceptable and where the performance loss from suddenly entering the kernel may cause realtime safety violations. However, you don’t even need to turn it off to get the system allocator to return null.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    void* c = malloc(PTRDIFF_MAX);
    printf("%p\n", c);
}

The following program should reliably print (nil) on Linux, and likely on other platforms as well.

Using the following program, we can look for how many bits were available and what the maximum allocation size is before glibc yells at you:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    void* c;

    unsigned long alloc_size = PTRDIFF_MAX;
    size_t num_bits_used = 63; // PTRDIFF_MAX is (1 << 63) - 1 on a 64-bit system
    
    while ((c= malloc(alloc_size)) == NULL) {
        alloc_size /= 2;
        num_bits_used--;
    }
    printf("%p %lu %lu\n", c, alloc_size, num_bits_used);
}

On my laptop (64 GB of memory): 0x7ef8ce4fb010 68719476735 36
On a spare server (128 GB of memory): 0x7423ab5ff010 137438953471 37

Notice a pattern?

First, any allocation much than the physical memory of the system made using malloc is rejected. Second is that while this is a lot of memory for a consumer system, it is less memory than many leading models require. I think someone could very easily do this:

def main():
    var model_size: Int = 200000000000; // bit of extra padding
    var kvcache_size: Int = 74877906942; // Lots of spare memory for kv cache
    var buffer_size = model_size + kvcache_size. 
    var model_buffer: UnsafePointer[UInt8, ...] = malloc(); // UnsafePointer is never null, so don't check it
    load_model_and_run_inference(model_buffer, model_size); // crash

Why a crash? Because the user ran this program on a spare desktop with 128 GB of memory in it and llama.cpp can do it (unknown to the user, it uses mmap by default and thus swaps).

I think I’m fairly on the record in saying that we the default should be that the allocator raises when getting any kind of error from the allocator, allowing users to pass a define that makes allocators abort on error. Given the utility of bump allocators for web serving and other common workloads, I think it’s not unlikely for someone to get an actual “out of memory” from a custom allocator, especially if they were on a GPU where memory is likely to be somewhat constrained even on proper GPU servers. I’m very strongly against further codifying “it’s okay to just crash” for this reason.

Layout Requirement

This proposal requires that Optional[UnsafePointer[T, origin]] have the same
size and layout as UnsafePointer[T, origin]. The expected representation uses
the null pointer value as the None discriminant, so nullable pointers remain
zero-overhead. This matters for performance and FFI.

I strongly agree here, any solution to this problem which has performance overheads for FFI is a non-starter.

Prior Art

All of the mentioned languages have 2 different types for Nullable and Non-Nullable pointers. I think that if Mojo defines 00 00 00 00 00 00 00 00 as an invalid bit pattern for a pointer on a 64-bit target, then we need a way to express that. While this proposal includes UnsafeNullablePointer, once again that is listed as temporary.

Source Compatibility and Migration

I agree that fixing this needs to be a breaking change, but I think that going in the other direction, and migrating everything that can be migrated over to a UnsafeNonNullPointer. It’s the same big find and replace operation, but I think that we are at the point where we can create abstractions on top of UnsafePointer to remove most of its uses, especially inside of MAX where compile-time information is much more plentiful. Migrating in this direction also means that the the migration can be gradual, instead of requiring a batch find and replace to make everything an UnsafeNullablePointer, then carefully auditing the code to determine where null the pointers are actually null safe. I personally see much of this issue as stemming from the lack of other options, since UnsafePointer acts as a type which holds far too many APIs together, and to me the desire for alternative types that share 90% of their API with the current UnsafePointer means that a trait should be created.

Alternatives Considered

Keep UnsafePointer nullable and add a separate non-null pointer type

This keeps the current behavior but makes the safer choice less obvious. The
default type tends to be used most often. If UnsafePointer remains nullable,
the unsafe behavior is the path of least resistance. This is the approach Rust
took with NonNull<T>, and in practice most Rust code uses raw nullable
pointers rather than opting in to the safer wrapper.

I think part of the problem is that Rust has something that looks like what C and C++ programmers are familiar with. I think not having that alongside making it more characters to name the nullable one (UnsafeNullablePointer vs UnsafeNonNullPointer) should help push people towards the latter. If we want to be really annoying and push people to build safe wrappers, make constructing an UnsafeNullablePointer emit a compiler warning you have to suppress.

What if we had something like this?

trait PointerLike:
    comptime Nullable: Bool
    comptime PointeeType: AnyType
    comptime origin: Origin # Is this strictly necessary here?
    comptime address_space: AddressSpace

    # AddressSpace should be an extension trait since most uses of pointers are asking for "Pointer I can dereference right now"

    def __getitem__(self) -> ref[Self.origin, Self.address_space] Self.PointeeType: ...
    def __setitem__(mut self, var value: Self.PointeeType) where Self.origin.mut: ...

    # Whatever else we consider "essential primitive pointer operations" and then extension traits for everything else. 

# Strictly like a C or C++ pointer, subject to generics of course. Also calls out it is nullable to remind people to check it. Deeply unsafe and should be converted to something else ASAP. Every time we find a property which it is technically possible to violate in C, this type gets more unsafe. If someone must use this for something other than building other pointer types, it means there's a language issue.
struct UnsafeNullablePointer[mut: Bool, //, type: AnyType, origin: Origin[mut=mut], *, address_space: AddressSpace = AddressSpace.GENERIC](PointerLike) : ... 

# Same behavior as the proposed new behavior of `UnsafePointer`, use still discouraged in favor of using references or even higher level constructs. 
struct UnsafeNonNullPointer[mut: Bool, //, type: AnyType, origin: Origin[mut=mut], *, address_space: AddressSpace = AddressSpace.GENERIC](PointerLike):
    comptime _InnerPtrType: UnsafeNullablePointer[Self.type, Self.origin, address_space=Self.address_space]

    var _inner: _InnerPtrType

# This one gives the compiler extra information to do help get codegen for `(V)MOVAPS` or similar. 
struct UnsafeNonNullAlignedPointer[mut: Bool, //, type: AnyType, origin: Origin[mut=mut], alignment: UInt, *, address_space: AddressSpace = AddressSpace.GENERIC](PointerLike):
    comptime _InnerPtrType: UnsafeNonNullPointer[Self.type, Self.origin, address_space=Self.address_space]

    var _inner: _InnerPtrType

# Custom allocator handling without polluting the default pointer type. 
struct UnsafeNonNullCustomAllocatorPointer[mut: Bool, //, type: AnyType, origin: Origin[mut=mut], alignment: UInt, FreeHandleType: AllocatorFreeHandle, *, address_space: AddressSpace = AddressSpace.GENERIC](PointerLike):
    comptime _InnerPtrType: UnsafeNonNullPointer[Self.type, Self.origin, address_space=Self.address_space]

    var _inner: _InnerPtrType

    ...

    fn free(deinit self, mut free_handle: FreeHandleType): ...
        
# and also rename the current `Pointer` back to `Reference` since the attempt to get people to use that instead of `UnsafePointer` has clearly failed.

Introduce a dedicated NullablePointer type

This would work, but it adds another pointer type to the language for a problem
that Optional already solves. Optional[UnsafePointer[T]] is simpler and
more general.

If we rely on Optional[UnsafePointer[T]], does that mean that kgen.pointer will also inherit non-nullability? I don’t think I like pushing such an important part of representing the system into the compiler.

1 Like