> There are exotic CPU architectures explicitly designed for latency hiding, mostly used in supercomputing.
I don’t know much about supercomputers, but what you described is precisely how all modern GPUs deal with VRAM latency. Each core runs multiple threads, and the count is bounded by the resources used: the more registers and group shared memory a shader uses, the fewer threads of that shader can be scheduled on the same core. Instead of stalling on memory latency, the GPU switches to another resident thread.
That’s how GPUs can saturate their RAM bandwidth, which exceeds 500 GB/s in modern high-end GPUs.
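The resource-bound thread count can be sketched with some back-of-the-envelope arithmetic. Here's a minimal sketch of that occupancy trade-off; the per-core limits below are hypothetical illustrative numbers, not any specific GPU's real spec:

```python
# Sketch of how a GPU core limits resident threads by resources.
# All per-core limits here are hypothetical, for illustration only.
CORE_REGISTERS = 65536       # total registers per core
CORE_SHARED_MEM = 49152      # bytes of group shared memory per core
CORE_MAX_THREADS = 2048      # hardware cap on resident threads per core

def resident_threads(regs_per_thread, shared_mem_per_group, threads_per_group):
    """How many threads of one shader can be resident on a core at once."""
    groups_by_regs = CORE_REGISTERS // (regs_per_thread * threads_per_group)
    groups_by_smem = (CORE_SHARED_MEM // shared_mem_per_group
                      if shared_mem_per_group else float("inf"))
    groups_by_cap = CORE_MAX_THREADS // threads_per_group
    groups = min(groups_by_regs, groups_by_smem, groups_by_cap)
    return int(groups * threads_per_group)

# Lightweight shader: the full 2048 threads fit, lots of latency hiding.
print(resident_threads(regs_per_thread=32, shared_mem_per_group=0,
                       threads_per_group=256))   # → 2048
# Register- and shared-memory-heavy shader: only 512 threads fit,
# so there are fewer threads to switch to while waiting on VRAM.
print(resident_threads(regs_per_thread=128, shared_mem_per_group=16384,
                       threads_per_group=256))   # → 512
```

The point of the sketch: the scarcest resource, not the programmer, decides how many threads are available to hide latency.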