Async Streaming from Device to Host

I'm looking for help on how to stream data from Device to Host asynchronously. Specifically, every n iterations inside my GPU kernel, I want to copy a vector from Device to Host. I found DeviceStream, but it seems to be intended only for calls made from the Host.