New `UntypedPointer` for heterogeneous data (idea, proposal)

Hi all,
This is a small idea/proposal for an “UntypedPointer” type for working with heterogeneous data.

(Disclaimer: I am new to low-level memory management, but sometimes that brings fresh ideas, I hope :D)

Intro

What makes Mojo great?

  • great abstractions over low-level concepts (GPU)
  • great performance
  • great ergonomics
  • great type system
  • familiar pythonic syntax

When working with heterogeneous data, the current UnsafePointer is a “good” but not “great” abstraction.

I think there is room for improvement to make low-level memory management much more approachable and ergonomic, especially for newcomers.
My impression is that low-level memory management is often avoided, despite interesting use cases, because the API is “hard to use” and not “fun” (yet)!

Mojo has soooo much potential and making high level and low level coding fun and ergonomic is a great way to attract more people to the language and an absolute game changer! :nerd_face:

Pointers

Pointer (homogeneous data)

When working with homogeneous data (array, list), we need:

  • memory location: Where are we?
  • type: What is the type? The type is the SAME for all elements

Pointer (heterogeneous data)

When working with heterogeneous data (PNG, Apache Arrow, etc.), we need:

  • memory location: Where are we?

IMO, attaching a single element type to the pointer is not ideal here, because the type is not the same for all elements.
Instead, we could specify the type at the moment we read from the pointer, which would let us use one pointer for values of different types.

Example 1 (from UnsafePointer documentation)

From the docs:

def read_chunks(var ptr: UnsafePointer[UInt8]) -> List[List[UInt32]]:
    chunks = List[List[UInt32]]()
    # A chunk size of 0 indicates the end of the data
    chunk_size = Int(ptr[])
    while (chunk_size > 0):
        # Skip the 1 byte chunk_size and get a pointer to the first
        # UInt32 in the chunk
        ui32_ptr = (ptr + 1).bitcast[UInt32]()
        chunk = List[UInt32](capacity=chunk_size)
        for i in range(chunk_size):
            chunk.append(ui32_ptr[i])
        chunks.append(chunk)
        # Move our pointer to the next byte after the current chunk
        ptr += (1 + 4 * chunk_size)
        # Read the size of the next chunk
        chunk_size = Int(ptr[])
    return chunks

Problems with this approach:

  • :cross_mark: multiple pointers (each type requires its own pointer, here ptr and ui32_ptr)
  • :cross_mark: multiple memory locations to keep track of
  • :cross_mark: bitcasting required
  • :cross_mark: pointer arithmetic is not very ergonomic (the byte offset 1 + 4 * chunk_size is computed by hand)

The proposed solution would look like this:

def read_chunks2(var ptr: UntypedPointer) -> List[List[UInt32]]:
    chunks = List[List[UInt32]]()
    # The chunk size is stored as a single byte, so read a UInt8 and widen it.
    # "move=True" advances the pointer past the value that was just read.
    chunk_size = Int(ptr.read[UInt8, move=True]())
    while (chunk_size > 0):
        chunk = List[UInt32](capacity=chunk_size)

        for _ in range(chunk_size):
            chunk.append(ptr.read[UInt32, move=True]())

        chunks.append(chunk)
        # Read the size of the next chunk (0 indicates the end of the data)
        chunk_size = Int(ptr.read[UInt8, move=True]())
    return chunks

Benefits of this approach:

  • :white_check_mark: single pointer for all types
  • :white_check_mark: single memory location
  • :white_check_mark: easier to use, read, and understand
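To make the intended semantics concrete, here is a rough model of read_chunks2 in plain Python rather than Mojo, since UntypedPointer does not exist yet. Everything in it is an assumption made for illustration: the UntypedCursor class is a hypothetical stand-in for the proposed pointer, struct format codes stand in for the type parameter, and the data is assumed to be little-endian and tightly packed.

```python
import struct


class UntypedCursor:
    """Hypothetical Python stand-in for the proposed UntypedPointer."""

    def __init__(self, data: bytes, offset: int = 0):
        self.data = data
        self.offset = offset

    def read(self, fmt: str, move: bool = False):
        # fmt is a struct format code, e.g. "<B" for UInt8 or "<I" for UInt32.
        (value,) = struct.unpack_from(fmt, self.data, self.offset)
        if move:
            # Advance past the value that was just read.
            self.offset += struct.calcsize(fmt)
        return value


def read_chunks(cursor: UntypedCursor) -> list:
    chunks = []
    # A chunk size of 0 marks the end of the data.
    chunk_size = cursor.read("<B", move=True)
    while chunk_size > 0:
        chunk = [cursor.read("<I", move=True) for _ in range(chunk_size)]
        chunks.append(chunk)
        chunk_size = cursor.read("<B", move=True)
    return chunks


# Two chunks (sizes 2 and 1) followed by the 0 terminator.
data = struct.pack("<BIIBIB", 2, 10, 20, 1, 30, 0)
result = read_chunks(UntypedCursor(data))
print(result)
```

The model ignores alignment: the UInt32 values start at odd byte offsets here, and a real Mojo implementation would have to decide how to handle such unaligned loads.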

Example 2 (many types)

This is just a contrived example to show the difference between the current UnsafePointer and the proposed UntypedPointer.

p = UnsafePointer(...)

p_int8 = p.bitcast[Int8]()
value_int8 = p_int8[]
p_int8 += 1

p_int16 = p_int8.bitcast[Int16]()
value_int16 = p_int16[]
p_int16 += 1

p_int32 = p_int16.bitcast[Int32]()
value_int32 = p_int32[]
p_int32 += 1

Problem:

  • :cross_mark: multiple pointers (each type requires its own pointer)
  • :cross_mark: multiple memory locations

Proposed solution:

p = UntypedPointer(...)
value_int8 = p.read[Int8, move=True]()
value_int16 = p.read[Int16, move=True]()
value_int32 = p.read[Int32, move=True]()

Benefits:

  • :white_check_mark: single pointer for all types
  • :white_check_mark: single memory location
  • :white_check_mark: easier to use, read, and understand

API Design and Comparison

Reading from a pointer

UnsafePointer:

p_int8 = p.bitcast[Int8]()
_ = p_int8[0]

Problem:

  • :cross_mark: no autocompletion: indexing a data structure usually does not provide autocompletion
  • :cross_mark: no “documentation” (hover text): indexing a data structure usually does not provide hover text either
  • :cross_mark: potentially multiple pointers: Each type requires its own pointer

Proposed:

p.read[Int8]()

Benefits:

  • :white_check_mark: easier to use
  • :white_check_mark: autocompletion: p.<TAB> shows all available functions including read
  • :white_check_mark: documentation: IDE will provide documentation (hover text) for read
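One detail worth spelling out is the difference between p.read[Int8]() and p.read[Int8, move=True](). The sketch below models it in plain Python, not Mojo. It assumes that move defaults to False, so a plain read behaves like a peek; the UntypedCursor class and the struct format codes are hypothetical stand-ins for the proposed API.

```python
import struct


class UntypedCursor:
    """Hypothetical Python stand-in for the proposed UntypedPointer."""

    def __init__(self, data: bytes):
        self.data = data
        self.offset = 0

    def read(self, fmt: str, move: bool = False):
        (value,) = struct.unpack_from(fmt, self.data, self.offset)
        if move:
            self.offset += struct.calcsize(fmt)
        return value


# An Int8, an Int16 and an Int32 packed back to back (little-endian, no padding).
p = UntypedCursor(struct.pack("<bhi", -5, 300, 70000))

peeked = p.read("<b")                  # like p.read[Int8](): offset stays at 0
value_int8 = p.read("<b", move=True)   # offset becomes 1
value_int16 = p.read("<h", move=True)  # offset becomes 3
value_int32 = p.read("<i", move=True)  # offset becomes 7
print(peeked, value_int8, value_int16, value_int32, p.offset)
```

Whether move should default to False or True is an open design question. The default of False is used here only because the examples in this post always pass move=True explicitly.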

“Moving” a pointer (pointer arithmetic)

UnsafePointer:

p_int8 = p.bitcast[Int8]()
p_int8 += 1
p_int16 = p_int8.bitcast[Int16]()
p_int16 += 4

Problem:

  • :cross_mark: no nice abstraction
  • :cross_mark: bitcasting required
  • :cross_mark: no autocompletion or documentation

Proposed:

p.move[Int8](amount=1)
p.move[Int16](amount=4)

Benefits:

  • :white_check_mark: easier to use
  • :white_check_mark: autocompletion: p.<TAB> shows all available functions including move
  • :white_check_mark: documentation: IDE will provide documentation (hover text) for move
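As a rough model of what move[T](amount=n) would do, the Python sketch below (Python, not Mojo) advances a byte offset by n times the size of T. This matches what p_int16 += 4 does on a typed pointer. The UntypedCursor class, its move method, and the struct format codes are hypothetical stand-ins for the proposed API.

```python
import struct


class UntypedCursor:
    """Hypothetical Python stand-in for the proposed UntypedPointer."""

    def __init__(self, data: bytes):
        self.data = data
        self.offset = 0

    def move(self, fmt: str, amount: int = 1):
        # Advance by `amount` elements of the given type,
        # i.e. amount * sizeof(T) bytes.
        self.offset += amount * struct.calcsize(fmt)

    def read(self, fmt: str):
        (value,) = struct.unpack_from(fmt, self.data, self.offset)
        return value


# One Int8, four Int16 values, then the Int32 we actually want.
p = UntypedCursor(struct.pack("<b4hi", 7, 1, 2, 3, 4, 99))

p.move("<b", amount=1)  # like p.move[Int8](amount=1): offset becomes 1
p.move("<h", amount=4)  # like p.move[Int16](amount=4): offset becomes 9
value = p.read("<i")
print(p.offset, value)
```

Counting in elements of T rather than in raw bytes keeps move consistent with existing pointer arithmetic, which is the assumption made here.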

Conclusion

Making low level memory management more “fun” would be so great!
The proposed ideas would have the main benefit of:

:star_struck: Making low-level memory management “fun” and accessible (just like GPU programming in Mojo)

Would love to hear your thoughts on this idea!

Thanks for reading! :folded_hands: