Unions vs sum types

Some people, myself included, are concerned with efficiently representing data, while others are more concerned with type theory. To ensure that no important facts are overlooked, I think GitHub is probably a better platform for discussing this topic.
Why do you want to write less flexible and less efficient code?
From a technical standpoint, this is not required.

var optional: Optional[InlineArray[UInt8, 10_000]]
if optional is .some(var optional_some):
    # the compiler inserts a memcpy here;
    # optional_some is a reference to a new stack-allocated array
    if is_common_case_read(optional_some):
        return
    # for mutable references, you don't have access to the origin allocation
    handle_some_edge_case_mut(optional_some)

    do_something_with_var(optional_some^)  # let's not memcpy again

# the other two binding modes:
if optional is .some(ref optional_some):
if optional is .some(optional_some):  # read

The problem is, if you bind the optional to a new identifier (or whatever this cool pattern matching is called), you have to decide the binding mode. You only really have 3 choices: read, mut, and var.
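
For comparison, Rust's pattern matching exposes a similar choice of binding modes. The sketch below is an analogy from Rust, not Mojo: Buf, process, and the bodies of the three helper functions are made up for illustration, and it says nothing about what the Mojo compiler would actually emit.

```rust
const N: usize = 10_000;

// Hypothetical non-Copy wrapper, so a by-value binding is a real move.
struct Buf([u8; N]);

// Hypothetical stand-ins for the helpers named in the post.
fn is_common_case_read(buf: &Buf) -> bool {
    buf.0[0] == 0
}

fn handle_some_edge_case_mut(buf: &mut Buf) {
    buf.0[0] = 0;
}

fn do_something_with_var(buf: Buf) -> usize {
    buf.0.len()
}

fn process(mut optional: Option<Buf>) -> Option<usize> {
    // "read": borrow the payload where it already lives.
    if let Some(ref b) = optional {
        if is_common_case_read(b) {
            return None;
        }
    }
    // "mut": mutate the payload in place, through the same storage.
    if let Some(ref mut b) = optional {
        handle_some_edge_case_mut(b);
    }
    // "var": take ownership of the payload.
    if let Some(b) = optional {
        return Some(do_something_with_var(b));
    }
    None
}

fn main() {
    assert_eq!(process(Some(Buf([1u8; N]))), Some(N));
    assert_eq!(process(Some(Buf([0u8; N]))), None);
    assert_eq!(process(None), None);
}
```

Here the binding mode is chosen per match rather than once for the whole value, which is one possible answer to the limitation described above.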

How do you plan to address this serious limitation?

I don’t see how this relates to sum types at all. Suppose your val is a struct with a single field var arr: InlineArray[UInt8, 10_000], we should be able to find a ?? (ref I suppose?) where the following works:

?? v = val.arr
if is_common_case_read(v):
  return 
handle_some_edge_case_mut(v)
do_something_with_var(v^)

If not, I don’t see how your encoding can solve anything. For instance, your some returns an owned value, which will copy?

In the Mojo enum model, there is no notion of a sum type.

Enums are structs with 2 fields that fulfill the enum interface.

A NaN-tagged JavaScript value (8 bytes), which has a lot of niche optimizations, could fulfill the enum interface.

_union: UnionType
_tag: IntType
which have compile-time checked properties:
for field in field_reflection(self_union):
    comptime_eval(f"""
        @overloaded_property
        fn {field.name}(self) -> {field.type}:
            constrained[
                union_access_is_safe(self, {field.tag}),
                "field .{field.name} is not compile-time safe to access"
            ]()
            return self._union.{field.name}
    """)

Some people, myself included, are concerned with efficiently representing data

I don’t think we’ve talked a lot, but I have a bit of a reputation as the person off doing weird high performance things in the corner. I very much do care about efficiently representing data, and not forcing the existence of a tag is part of that. If I can find some way to use states that shouldn’t be present in the type to represent the None case of optional or similar other things, then I should be able to do that without needing to stick an extra byte on the front of the type. Sum types also give the compiler more freedom as far as rearranging fields to make the layout more compact.
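
As a concrete illustration of using "states that shouldn't be present" to encode None, here is how Rust's Option behaves today. This is a sketch of the general technique in Rust, not a statement about how Mojo lays out Optional.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

fn main() {
    // NonZeroU8 can never hold 0, so Option reuses 0 to mean None:
    // no extra tag byte is needed.
    assert_eq!(size_of::<Option<NonZeroU8>>(), size_of::<u8>());

    // References are never null, so Option<&T> stays pointer-sized.
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());

    // u8 has no spare bit pattern, so here a separate tag is unavoidable.
    assert!(size_of::<Option<u8>>() > size_of::<u8>());
}
```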

The problem is, if you bind the optional to a new identifier (or whatever this cool pattern matching is called), you have to decide the binding mode. You only really have 3 choices: read, mut, and var.

You don’t necessarily need to memcpy there. The compiler can see that you are just inspecting the value in place, so it can avoid actually transferring the memory of the type until you reach do_something_with_var, at which point the compiler will still try fairly hard to just hand over a reference. The compiler has no reason to memcpy until you force it to, so why would it bother?

In the Mojo enum model, there is no notion of a sum type.

That’s because Mojo doesn’t have a model of enums yet. It currently has structs with a big pile of associated aliases.

Enums are structs with 2 fields that fulfill the enum interface.

There is no enum interface. As I’ve said before, niche optimizations let you play games with layouts, so there don’t have to be 2 fields. Most of the rest of that post doesn’t make any sense because you haven’t defined what all of that stuff does. Please avoid reaching for reflection or features that don’t exist when discussing other features.

Unfortunately, I have too many discussions to answer them all in two weeks.
Do you consider the union approach to be more Pythonic and idiomatic than full-blown sum types?

def next(mut self) raises StopIteration | OutOfMemory -> Self.Element

I think we need to actually define things. I propose the following:

  • Union: C-style unsafe union
  • Sum type: A tagged union in the style of Rust’s enum.
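
To make the two proposed definitions concrete, here is how Rust spells each of them. IntOrFloat, Number, and describe are hypothetical names used only for this sketch.

```rust
// Union: C-style and untagged. Nothing records which field was written
// last, so every read is unsafe and the caller must track it by hand.
union IntOrFloat {
    i: u32,
    f: f32,
}

// Sum type: tagged. The compiler tracks the active variant and forces
// a check before the payload can be used.
enum Number {
    Int(u32),
    Float(f32),
}

fn describe(n: &Number) -> String {
    match n {
        Number::Int(i) => format!("int {i}"),
        Number::Float(f) => format!("float {f}"),
    }
}

fn main() {
    let u = IntOrFloat { i: 42 };
    // SAFETY: `i` is the field we just wrote.
    let value = unsafe { u.i };
    assert_eq!(value, 42);

    let v = IntOrFloat { f: 0.5 };
    // SAFETY: `f` is the field we just wrote.
    assert_eq!(unsafe { v.f }, 0.5);

    assert_eq!(describe(&Number::Int(3)), "int 3");
    assert_eq!(describe(&Number::Float(1.5)), "float 1.5");
}
```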

What you’re proposing here is something like an ad-hoc sum type, similar to Zig’s “error unions”, which are actually sum types. I’m not a fan of them unless they can be inferred, because they make it hard to name the type that is raised by next, which in turn makes generic programming more difficult. You can repeat the union’s text at every use site, but that causes a lot of code churn when you add a new error type, compared to adding a new variant to a sum type.
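
A rough Rust sketch of the "nameable error type" point: NextError, Counter, and drain are hypothetical names, and the error set simply mirrors the StopIteration | OutOfMemory example above.

```rust
// One named sum type for everything `next` can raise. Adding a third
// error later means adding a variant here, not editing every signature.
#[derive(Debug, PartialEq)]
enum NextError {
    StopIteration,
    OutOfMemory,
}

struct Counter {
    current: u32,
    limit: u32,
}

impl Counter {
    fn next(&mut self) -> Result<u32, NextError> {
        if self.current >= self.limit {
            return Err(NextError::StopIteration);
        }
        self.current += 1;
        Ok(self.current)
    }
}

// Generic or helper code can refer to the error type by name.
fn drain(counter: &mut Counter) -> (Vec<u32>, NextError) {
    let mut items = Vec::new();
    loop {
        match counter.next() {
            Ok(value) => items.push(value),
            Err(err) => return (items, err),
        }
    }
}

fn main() {
    let mut counter = Counter { current: 0, limit: 3 };
    let (items, err) = drain(&mut counter);
    assert_eq!(items, vec![1, 2, 3]);
    assert_eq!(err, NextError::StopIteration);
}
```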

That’s a very unpopular definition nowadays. In Python and TypeScript, which are more popular than C, a “union” of two types is written T1|T2 and is tagged by the nature of dynamic typing, and type checkers will verify that you check the type before attempting to access a field.

Unions Reimagined for Mojo

In Python, Union refers to typing.Union, which is parameterized by subscription (e.g., Union[int, str]). Since Python 3.10 (PEP 604), types themselves also implement __or__ and __ror__ to support the bitwise OR syntax (e.g., int | str).

In Mojo, Union would likely be implemented differently. It would essentially be a generic variadic type that implements specific union/tag built-ins or traits.

Syntax and Implementation

# Typeless Unions | Union or Type Expressions
comptime Optional[T] = Union[T, None]

# --- Named Union ---
union NamedUnion(StopIteration):
    var stop_iteration: StopIteration

# --- Anonymous Union ---
comptime AnonUnion = Union[NamedUnion, StopIteration] # Equivalent to NamedUnion | StopIteration

# --- The Union Type ---
struct Union[*unions: __mlir_type.`!kgen.uniontrait`](*unions): ...

union Optional[T]:
    var bitunion: BitOptional[T]
    var tag: __mlir_type.`!kgen.tag`

bitunion Optional[T]:
    var none 
    var some: Self.T

Why not Rust-style Enums?

I considered sum types (tagged unions like Rust’s enum), but I believe the proposed Union approach is superior for Mojo, for the reasons discussed earlier, which I reiterate here for clarity:

  • Ergonomics: Rust enums require explicit unwrapping and pattern matching. This is rigid compared to Python’s flow-sensitive typing where isinstance checks feel natural.
  • Borrow Checker: Tagged enums often introduce complex edge cases with the borrow checker that can be frustrating to navigate.
  • Familiarity: Rust-style enums are alien to the Python object model. Our goal is to preserve the straightforward nature of Python typing while offering state-of-the-art performance.

How will this work with the type system?

Type systems are a lie. It is entirely possible to implement this behavior within the compiler infrastructure; the constraints are merely those we choose to enforce.

Why not a ‘bitunion’ (or C-style union) keyword?

I consider “bitunions” (or any equivalent C-style union) to be essentially syntactic sugar for bitcasts. Consequently, I am strongly convinced that they should be implemented via a library decorator.

They are inherently unsafe and represent a niche performance optimization that could usually be achieved through safe wrappers.
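
The "sugar for bitcasts" claim can be sketched in Rust, where reading a different union field than the one written gives the same result as the safe library bitcast. Bits, via_union, and via_bitcast are hypothetical names for this illustration.

```rust
// A C-style union over two 32-bit representations.
union Bits {
    f: f32,
    u: u32,
}

// Reinterpret the float through the union.
fn via_union(x: f32) -> u32 {
    let b = Bits { f: x };
    // SAFETY: every 32-bit pattern is a valid u32.
    unsafe { b.u }
}

// The same operation through the safe standard-library wrapper.
fn via_bitcast(x: f32) -> u32 {
    x.to_bits()
}

fn main() {
    for x in [0.0f32, 1.0, -2.5, f32::INFINITY] {
        assert_eq!(via_union(x), via_bitcast(x));
    }
    assert_eq!(via_union(1.0), 0x3f80_0000);
}
```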

For instance, the compiler cannot synthesize the destructor because the tag is not part of the type.

@tagless_union
union _Optional[T]:
    var none 
    var some: Self.T
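
Rust runs into the same destructor problem with its untagged unions, which may be a useful reference point. In this sketch, TaglessOptional and use_and_release are hypothetical names. Because no tag says which field is live, the non-trivial payload has to be wrapped in ManuallyDrop and released by hand.

```rust
use std::mem::ManuallyDrop;

// Hypothetical tagless optional: nothing records whether `some` is live.
#[allow(dead_code)]
union TaglessOptional {
    none: (),
    some: ManuallyDrop<String>,
}

fn use_and_release(text: &str) -> usize {
    let mut opt = TaglessOptional {
        some: ManuallyDrop::new(String::from(text)),
    };
    // SAFETY: `some` is the field we just wrote.
    let len = unsafe { opt.some.len() };
    // The compiler will not drop the String for us, because it cannot
    // know that `some` is the active field. We have to do it manually.
    // SAFETY: `some` is live and is not used again after this point.
    unsafe { ManuallyDrop::drop(&mut opt.some) };
    len
}

fn main() {
    assert_eq!(use_and_release("payload"), 7);
}
```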

Further Discussion

I am more than happy to address any other concerns, objections, or questions that I may have missed during this discussion.

I would distinguish “type union” from union. I assign enum to the checked version because, if we use union for sum types, what remains for C-style unions?