I'm looking for help on how to stream data from Device to Host asynchronously. Specifically, every n iterations in my GPU kernel, I want to copy a vector from Device to Host. I found DeviceStream, but that only seems to be intended to be called from the Host.