Hi @bertaveira,
Sorry if I’ve added to the confusion here. There is more than one issue mentioned in this thread, so it’s a little hard to keep track.
I guess I will keep coming back every few weeks to see whether any changes have been made to Mojo that make this possible at all. Let me know if there is a way forward at some point. Alternative solutions like parameterizing layouts or using fixed sizes are just workarounds that don’t really tackle the issue exposed here.
The way Mojo handles the example add_const function is not great at the moment. We’re working on improving ctx.enqueue_function so that it produces a type error rather than failing silently at runtime. It is unlikely we’ll make any changes in the near term that allow add_const to be used with ctx.enqueue_function in that way.
That was my reasoning for closing the associated GitHub ticket. I’m happy to re-open if you feel this is a bug on our side.
Also, I just saw that the related issue ticket on GitHub was closed, but this is not actually fixed, right? I don’t think this workaround, which has drastically different implications, merits closing the issue.
I did not realize the use of UNKNOWN_VALUE was a must-have for your example. It is definitely possible to keep that structure and have the kernel use an unknown layout. The best way to do that is to encode that constraint on the kernel registration itself, rather than just on the function passed to ctx.enqueue_function:
```mojo
from compiler_internal import StaticTensorSpec


@compiler.register("add_const")
struct AddConst:
    @staticmethod
    fn execute[
        target: StaticString,
    ](
        # Outputs
        result: OutputTensor[
            static_spec = StaticTensorSpec[DType.float32, 1].create_unknown(),
        ],
        # Inputs
        x: InputTensor[
            static_spec = StaticTensorSpec[DType.float32, 1].create_unknown(),
        ],
        # Context
        ctx: DeviceContextPtr,
    ) raises:
        # Rest of the code stays the same
```
Fair warning: you may need to update your modular dependency. The code above was tripping a different bug in the compiler that has since been fixed.
This would be easier if we had convenience functions to ‘erase’ the static layout of a LayoutTensor or ManagedTensorSlice, but we don’t have those at the moment. Erasing the layout on the InputTensor and OutputTensor arguments ensures that no code gets specialized on layouts, rather than just the GPU kernel.