Accelerator-side code and stability

As we approach 1.0, I think we need to decide which stdlib code is supported on accelerators and which is not, or at least create a framework for discussing this for any particular stdlib feature.

For instance, List’s UnsafePointer uses the generic address space, which means that GPU-side code that wants to use List can’t place it in shared memory. The same is true of most other stdlib collections, and Span has the same address space issue.

GDS (GPUDirect Storage) and other Magnum IO libraries from NVIDIA mean that users can reasonably expect std.io.file.open to work on a properly configured GPU. print works on many targets, but I expect that potential future targets which are less CPU-like (e.g. AMD XDNA) may make implementing it tricky at best. What does stack_allocation mean on a dataflow chip like Groq? What do getenv and setenv do on an accelerator? What about fstat? Almost the entirety of os? pwd? subprocess? Some parts of pathlib.Path are perfectly fine, while others run into the same problems File does. Can I Python.import("max") on a GPU? How can I interact with asyncrt on a GPU?

Especially as parts of the stdlib start to get marked stable, I think there needs to be consideration and documentation of what stable means for different targets. This also applies to MAX kernels, where different kernels are at very different levels of readiness on different targets. Have there been any internal discussions about how to handle this since the last public discussions of stabilizing the stdlib?
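To make the documentation side concrete, here is a minimal sketch of a per-target support matrix. It is written in Python only so that it is easy to run; every name in it (Support, SUPPORT_MATRIX, support_for, and the target keys) is hypothetical, and the statuses are placeholders paraphrasing the examples above, not an official position on any target.

```python
from enum import Enum


class Support(Enum):
    SUPPORTED = "supported"
    PARTIAL = "partial"
    UNDECIDED = "undecided"


# Hypothetical feature and target names; statuses are illustrative placeholders.
SUPPORT_MATRIX = {
    # Usable on a GPU, but cannot be placed in shared memory.
    "List": {"cpu": Support.SUPPORTED, "nvidia_gpu": Support.PARTIAL},
    # No decision recorded for accelerators yet.
    "getenv": {"cpu": Support.SUPPORTED},
}


def support_for(feature: str, target: str) -> Support:
    """Return the recorded status, or UNDECIDED when nothing is recorded."""
    return SUPPORT_MATRIX.get(feature, {}).get(target, Support.UNDECIDED)


def undecided_features(target: str) -> list[str]:
    """List the features that have no recorded decision for a target."""
    return [
        feature
        for feature in SUPPORT_MATRIX
        if support_for(feature, target) is Support.UNDECIDED
    ]
```

Defaulting to an explicit "undecided" status rather than "unsupported" keeps open questions visible: asking for undecided_features("amd_xdna") would return every feature in the table, which is roughly where the discussion stands today.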

Besides documenting, is there a need to formalize this somehow?

I could imagine a framework with trait definitions for common platform archetypes:

trait GenericCPU(Platform):
    comptime SUPPORTS_SHARED_LIST_ADDRESS_SPACE: Bool = True

trait NVidiaxxx(Platform):
    comptime SUPPORTS_SHARED_LIST_ADDRESS_SPACE: Bool = False

trait AMD_XDNA(Platform):