@uecker @tauon @mntmn So when you communicate with the GPU synchronously, the reason that's bad is partly the bandwidth, but mostly the huge delays you incur: you wait for execution on the GPU to complete, and then wait again for it to start back up once you produce more GPU commands. Readback at a fixed latency that exceeds that queuing delay is totally fine, and bandwidth-wise you can yeet something like 20 uncompressed 4K images across the PCIe link per 60 Hz frame, if you really really wanted to.
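Here's a back-of-the-envelope check of the "20 images per frame" figure. It's only a sketch, and every parameter is an assumption rather than something from the post: 4K means 3840×2160, pixels are RGBA8 (4 bytes each), and an x16 link gives roughly 31.5 GB/s (PCIe 4.0) or 63 GB/s (PCIe 5.0) per direction in theory.

```python
# Back-of-the-envelope PCIe readback budget.
# All parameters below are assumptions for illustration, not measured values.

WIDTH, HEIGHT = 3840, 2160   # assumed "4K" UHD resolution
BYTES_PER_PIXEL = 4          # assumed uncompressed RGBA8
FRAME_RATE_HZ = 60

# Assumed theoretical per-direction bandwidth of an x16 link, in GB/s (1 GB = 1e9 bytes).
PCIE_X16_GBPS = {"PCIe 4.0": 31.5, "PCIe 5.0": 63.0}


def image_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed image in bytes."""
    return width * height * bytes_per_pixel


def images_per_frame(link_gbps: float, frame_rate_hz: float, bytes_per_image: int) -> float:
    """How many images fit through the link in one frame interval."""
    bytes_per_frame = link_gbps * 1e9 / frame_rate_hz
    return bytes_per_frame / bytes_per_image


if __name__ == "__main__":
    size = image_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
    print(f"one 4K RGBA8 image: {size / 1e6:.1f} MB")
    for gen, gbps in PCIE_X16_GBPS.items():
        n = images_per_frame(gbps, FRAME_RATE_HZ, size)
        print(f"{gen} x16: ~{n:.1f} images per {FRAME_RATE_HZ} Hz frame")
```

Under these assumptions that works out to roughly 16 images per frame on PCIe 4.0 and roughly 32 on PCIe 5.0. So "something like 20" is the right order of magnitude. The real number depends on the link generation, the pixel format, and the throughput you can actually achieve versus the theoretical figure.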