Exploiting Register Structure
The Intel 32-bit architecture has 8 additional 64-bit registers,
the MMX registers (mm0 to mm7), and 8 128-bit registers, the
XMM registers (xmm0 to xmm7).
Each XMM register can hold 4 single precision floating point
numbers or 2 double precision floating point numbers.
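The 4-versus-2 capacity can be checked directly with compiler intrinsics. This is a minimal sketch, assuming a GCC/Clang/MSVC-style compiler that provides emmintrin.h and SSE2 support (on a 32-bit build this may need a flag such as -msse2). The names FLOATS_PER_XMM, DOUBLES_PER_XMM, roundtrip_floats and roundtrip_doubles are made up for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones */

/* Number of values that fit in one 128-bit XMM register (hypothetical names). */
enum {
    FLOATS_PER_XMM  = sizeof(__m128)  / sizeof(float),   /* 16 bytes / 4 bytes */
    DOUBLES_PER_XMM = sizeof(__m128d) / sizeof(double)   /* 16 bytes / 8 bytes */
};

/* Load 4 single precision values into one XMM register, then store them back. */
void roundtrip_floats(const float *in, float *out)
{
    __m128 v = _mm_loadu_ps(in);   /* unaligned load of 4 floats */
    _mm_storeu_ps(out, v);         /* unaligned store of 4 floats */
}

/* Load 2 double precision values into one XMM register, then store them back. */
void roundtrip_doubles(const double *in, double *out)
{
    __m128d v = _mm_loadu_pd(in);  /* unaligned load of 2 doubles */
    _mm_storeu_pd(out, v);         /* unaligned store of 2 doubles */
}
```

The unaligned load/store variants are used so the sketch works for any array; aligned variants exist but require 16-byte aligned addresses.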
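A single operation like
   add xmm1 xmm2 xmm2
will simultaneously add each number in xmm1 to the corresponding
number in xmm2 and store the results in xmm2.
(In actual SSE assembly the instruction is addps for single
precision and addpd for double precision.)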
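The same packed addition can be written in C with SSE intrinsics. This is a minimal sketch, assuming a compiler that provides xmmintrin.h; the function name add4_floats is hypothetical, and which XMM registers actually get used is decided by the compiler.

```c
#include <xmmintrin.h>  /* SSE intrinsics for packed single precision */

/* Add 4 pairs of floats with one packed add (hypothetical helper). */
void add4_floats(const float *a, const float *b, float *out)
{
    __m128 va  = _mm_loadu_ps(a);     /* 4 floats from a into an XMM register */
    __m128 vb  = _mm_loadu_ps(b);     /* 4 floats from b into an XMM register */
    __m128 sum = _mm_add_ps(va, vb);  /* element-wise add of all 4 lanes */
    _mm_storeu_ps(out, sum);          /* write the 4 sums back to memory */
}
```

For double precision the corresponding calls would be _mm_loadu_pd, _mm_add_pd and _mm_storeu_pd from emmintrin.h, operating on 2 values at a time.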
Can in principle give a speedup of 4 for single precision and
2 for double precision.
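Where the factor of 4 comes from can be seen by comparing a scalar loop with a packed one. This is a sketch under stated assumptions: the function names add_arrays_scalar and add_arrays_sse are hypothetical, and the code is not a benchmark. The packed loop issues roughly n/4 add operations instead of n; measured gains are usually smaller, since memory bandwidth and loads/stores also cost time, and an optimizing compiler may already vectorize the scalar loop.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics for packed single precision */

/* Scalar version: one addition per loop iteration. */
void add_arrays_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Packed version: four additions per loop iteration. */
void add_arrays_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

    /* Main loop: process 4 floats at a time while at least 4 remain. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }

    /* Tail loop: handle the 0 to 3 leftover elements one by one. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The tail loop is needed because n is not assumed to be a multiple of 4.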