Extended Description
All current AMD CPUs that support AVX 'double pump' 256-bit vector operations through 128-bit vector ALUs, i.e. each ymm operation is executed as two 128-bit halves.
On Jaguar this means that, while the latency is typically (but not always) the same for the xmm and ymm variants of an instruction, the ymm throughput is half of what the scheduler/cost models currently predict. This needs to be accounted for.
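
As a rough illustration of the adjustment being asked for, the sketch below scales the cost of a 128-bit operation to a wider vector that is split into 128-bit passes. It is not LLVM code: OpCost, double_pumped_cost and the cycle counts are hypothetical names and numbers chosen for the example, not measured Jaguar figures. It also assumes latency is unchanged, which is only the typical case.

```python
from dataclasses import dataclass

# Width of the vector ALUs described above.
NATIVE_ALU_BITS = 128


@dataclass(frozen=True)
class OpCost:
    """Hypothetical cost record for one instruction."""
    latency: int             # cycles until the result is available
    recip_throughput: float  # cycles per instruction when issued back to back


def double_pumped_cost(xmm_cost: OpCost, vector_bits: int) -> OpCost:
    """Derive the cost of a wide vector op from its 128-bit (xmm) cost.

    Assumption: latency stays the same (the typical, but not universal,
    case), while each extra 128-bit pass costs one more issue slot.
    """
    if vector_bits <= 0:
        raise ValueError("vector_bits must be positive")
    # Ceiling division: number of 128-bit passes needed for this width.
    passes = -(-vector_bits // NATIVE_ALU_BITS)
    return OpCost(
        latency=xmm_cost.latency,
        recip_throughput=xmm_cost.recip_throughput * passes,
    )


# Illustrative numbers only: a 128-bit op with latency 3 and one issue per cycle.
xmm_add = OpCost(latency=3, recip_throughput=1.0)
ymm_add = double_pumped_cost(xmm_add, 256)
```

Here ymm_add keeps the latency of 3 but has a reciprocal throughput of 2.0, i.e. half the throughput of the xmm form, which is the effect the cost models would need to reflect.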