Streaming API with async/.await
https://github.com/rust-accel/accel/issues/65
Wrap the [CUDA streaming API](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html) using async/await.
TODO
-----
- [x] kernel launch !88
  - introduced once in !52, then removed in !69
- [x] memcpy !85
- [ ] memset
Problems
-------
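How should memory used by pending stream operations be handled? A host buffer passed to an asynchronous copy must stay alive until the copy has actually run: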
```rust
let a = vec![1,2,3];
let mem = DeviceMemory::zeros(3);
mem.copy_from_stream(&a, &mut stream); // enqueued: runs only after the jobs already queued in the stream
drop(a); // is `a` dropped before the memcpy starts?
```
from https://gitlab.com/termoshtt/accel/-/issues/41#note_331562896
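
One possible direction, sketched below under heavy assumptions, is to have the copy return a future that borrows the source slice, so the borrow checker rejects dropping the buffer while the copy is pending. Everything here is hypothetical and is not accel's API: `MockDeviceMemory`, `MemcpyFuture`, `copy_from_async`, and the tiny `block_on` executor are illustrative names, and a plain `Vec` stands in for device memory so the sketch runs with `std` only.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for device memory (a host Vec plays the device buffer).
struct MockDeviceMemory {
    data: Vec<i32>,
}

/// Hypothetical future for an asynchronous memcpy. It borrows the source
/// slice for `'a`, so the source cannot be dropped while the future is alive.
struct MemcpyFuture<'a> {
    dst: &'a mut MockDeviceMemory,
    src: &'a [i32],
    done: bool,
}

impl<'a> Future for MemcpyFuture<'a> {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        // A real implementation would query the stream for completion here;
        // this mock performs the copy on first poll and finishes immediately.
        let this = &mut *self;
        if !this.done {
            this.dst.data.copy_from_slice(this.src);
            this.done = true;
        }
        Poll::Ready(())
    }
}

impl MockDeviceMemory {
    fn zeros(n: usize) -> Self {
        Self { data: vec![0; n] }
    }

    /// Hypothetical async copy: the returned future holds both borrows.
    fn copy_from_async<'a>(&'a mut self, src: &'a [i32]) -> MemcpyFuture<'a> {
        MemcpyFuture { dst: self, src, done: false }
    }
}

/// Waker that does nothing; enough for a busy-polling toy executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor: polls the future in a loop until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn copy_example() -> Vec<i32> {
    let a = vec![1, 2, 3];
    let mut mem = MockDeviceMemory::zeros(3);
    let fut = mem.copy_from_async(&a);
    // drop(a); // would not compile here: `a` is still borrowed by `fut`
    block_on(fut);
    drop(a); // fine: the copy has completed and the borrow has ended
    mem.data
}
```

Calling `copy_example()` returns the copied contents. Note the limit of this approach: the borrow only protects the buffer while the future exists, so a real implementation that enqueues work eagerly would still need an answer for `std::mem::forget` being called on the future.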