Hey, I've been looking at the GPU puzzles, LeetGPU, and Tensara.
These are great.
I think these should be simple,
but I'm spinning a bit on how to handle the shared memory.
I'm doing:
var shared_kernel = tb[dtype]().row_major[kernel_size]().shared().alloc()
var shared_input = tb[dtype]().row_major[input_size]().shared().alloc()
But I'm missing something about how to handle the parameterized functions.
solve gives us input_size: Int32, kernel_size: Int32 as runtime arguments.
It seems simple, but I'm stuck. Do I need to convert them to Int? Or how do we handle the tb shared alloc?
I did a brute-force version that passed the initial tests but fails on submit, so I'm hoping to fix it up using the shared memory approach from the GPU puzzles.
On LeetGPU, brute force passed for me today, and I got a shared memory version working.
Keeping it basic with just stack-allocation shared memory worked fine. I did create an alias for the kernel size, and I'm curious how we can improve it further.
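For anyone following along, here is a rough sketch of what an alias-sized, stack-allocated shared buffer could look like for a 1D convolution. It is not the actual submission. MAX_KERNEL_SIZE, TPB, and the name conv1d_shared are made up for illustration, the output length is assumed to be input_size - kernel_size + 1, and the imports follow the GPU puzzles, so they may differ between Mojo versions. The point is that the shared buffer size has to be a compile-time parameter, so it is sized by an alias upper bound, while the runtime Int32 values are only used for indexing.

```mojo
from gpu import thread_idx, block_idx, block_dim, barrier
from gpu.memory import AddressSpace
from memory import UnsafePointer, stack_allocation

alias TPB = 256  # hypothetical threads per block
alias MAX_KERNEL_SIZE = 2048  # hypothetical compile-time upper bound


fn conv1d_shared(
    input: UnsafePointer[Float32],
    kernel: UnsafePointer[Float32],
    output: UnsafePointer[Float32],
    input_size: Int32,
    kernel_size: Int32,
):
    # The buffer size is a compile-time parameter, so use the alias here,
    # not the runtime kernel_size. Assumes kernel_size <= MAX_KERNEL_SIZE.
    var shared_kernel = stack_allocation[
        MAX_KERNEL_SIZE,
        Scalar[DType.float32],
        address_space = AddressSpace.SHARED,
    ]()

    # Convert the runtime Int32 arguments to Int for indexing.
    var n = Int(input_size)
    var k = Int(kernel_size)
    var local_i = Int(thread_idx.x)
    var global_i = Int(block_dim.x * block_idx.x + thread_idx.x)

    # Each block cooperatively copies the kernel weights into shared memory.
    var j = local_i
    while j < k:
        shared_kernel[j] = kernel[j]
        j += Int(block_dim.x)
    barrier()

    # One output element per thread, reading weights from shared memory.
    if global_i < n - k + 1:
        var acc: Float32 = 0
        for m in range(k):
            acc += input[global_i + m] * shared_kernel[m]
        output[global_i] = acc
```

Only the kernel weights are staged here. Tiling the input with a halo of kernel_size - 1 elements would be the next thing to try, but that also needs a compile-time bound on the tile size.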