Copying data using the FPU | ASM/Pentium+FPU
;
; copying data using the fpu
;
; input:
;   esi = source
;   edi = destination
;   ecx = number of 16-byte chunks to move
;
; output:
;   none (data from esi is copied to edi)
;
; destroys:
;   esi, edi, ecx
;   flags, fp flags
;

topofloop:
        fild    qword ptr [esi]         ; st0 = first qword
        fild    qword ptr [esi+8]       ; st0 = second qword, st1 = first
        fxch                            ; st0 = first, st1 = second
        fistp   qword ptr [edi]         ; store first qword
        fistp   qword ptr [edi+8]       ; store second qword
        add     esi,16
        add     edi,16
        dec     ecx
        jnz     topofloop

The loop is optimal on (a fast) Pentium when both the source and destination are aligned on 64-bit boundaries and the destination is not in the cache. (Additionally, the loop can be optimal on PPro if the destination does not permit write-combining.)
Otherwise, REP MOVSD will be faster.

When those conditions hold, the loop beats REP MOVSD because it does half as many writes to external memory (with the noted exceptions). External memory is usually very slow compared to the execution time of the loop. Consequently, after a few iterations the write buffers of the CPU fill up and subsequent iterations execute at the speed of external memory. For small memory blocks you should use a simple DWORD copy loop, because the overhead of the FPU copy loop is much higher than that of most other memory copy loops.

You might be tempted to use FLD/FSTP instead of FILD/FISTP. Unfortunately FLD/FSTP would not work very well, because not all 64-bit values are normal floating point values. The handling of denormal floating point numbers is very slow. If the data represents a denormal (see notes), that only makes FLD/FSTP copying slow; it is still functionally correct. But if the data represents an SNAN (see notes), it will be quietly converted to a QNAN (see notes) if IE is masked (CW.IM = 1), or you will get an exception if IE is unmasked (CW.IM = 0). So do not use FLD/FSTP for memory copy loops.

For related information see Agner Fog's Pentium optimization manual (you can find it at http://www.agner.org/assem) and Intel's Pentium Pro developer's manual, volume 3, for information on write buffers, caches, write-combining, etc. (it can be found at Intel's developer WWW site).
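The "half as many writes" argument is plain arithmetic, and a small C sketch can make it concrete. It is written in C rather than assembly only so that it compiles anywhere. The names store_count, dword_stores and qword_stores are invented for this illustration, and the model counts store operations only; it assumes nothing about write buffers, caches or bus timing.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes moved per iteration of the FPU loop (two 64-bit stores). */
#define CHUNK_BYTES 16u

/* Stores needed to write `bytes` bytes using stores `width` bytes wide.
   Assumes `bytes` is a multiple of `width`, which holds for whole chunks. */
static size_t store_count(size_t bytes, size_t width)
{
    assert(width != 0 && bytes % width == 0);
    return bytes / width;
}

/* Stores issued by a DWORD copy (such as REP MOVSD) for `chunks` chunks. */
static size_t dword_stores(size_t chunks)
{
    return store_count(chunks * CHUNK_BYTES, 4);
}

/* Stores issued by the FILD/FISTP loop for the same amount of data. */
static size_t qword_stores(size_t chunks)
{
    return store_count(chunks * CHUNK_BYTES, 8);
}
```

For a 4 KB block (ECX = 256) this gives 1024 DWORD stores against 512 QWORD stores. Once the write buffers are full, each store waits on external memory, which is where the factor of two comes from.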
notes:
SNANs are all the numbers where bits <62:52> = 7FFh, bit <51> = 0 and bits <50:0> != 0. An SNAN is converted to a QNAN by setting bit <51>.
Denormals are numbers where the exponent field has all bits set to 0 and the mantissa is non-zero. In terms of the copy loop, that is any aligned 64-bit entity whose bits <62:52> (the exponent field) are all zero and whose bits <51:0> are not.
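The bit tests in these notes can be written down directly. The following C sketch classifies a raw 64-bit value the way the notes describe and models what the text says FLD/FSTP does to it when IE is masked. It is a model of the described behaviour, not code that drives the FPU, and the names classify_bits, fld_fstp_masked and the KIND_ constants are hypothetical.

```c
#include <stdint.h>

/* Fields of a 64-bit value viewed as a double-precision number. */
#define EXP_MASK  UINT64_C(0x7FF0000000000000)  /* bits 62..52 */
#define QUIET_BIT UINT64_C(0x0008000000000000)  /* bit 51 */
#define FRAC_MASK UINT64_C(0x000FFFFFFFFFFFFF)  /* bits 51..0 */
#define LOW_FRAC  UINT64_C(0x0007FFFFFFFFFFFF)  /* bits 50..0 */

enum bit_kind { KIND_OTHER, KIND_DENORMAL, KIND_SNAN, KIND_QNAN };

/* Classify a raw 64-bit pattern according to the notes above. */
static enum bit_kind classify_bits(uint64_t v)
{
    uint64_t exp = v & EXP_MASK;

    if (exp == 0)  /* exponent all zero: denormal unless the value is zero */
        return (v & FRAC_MASK) ? KIND_DENORMAL : KIND_OTHER;

    if (exp == EXP_MASK) {  /* exponent all ones: NaN or infinity */
        if (v & QUIET_BIT)
            return KIND_QNAN;
        if (v & LOW_FRAC)
            return KIND_SNAN;
        /* mantissa zero: infinity, falls through to KIND_OTHER */
    }
    return KIND_OTHER;
}

/* Model of an FLD/FSTP copy with IE masked: an SNAN comes out as a QNAN
   (bit 51 set); every other pattern is assumed to come out unchanged. */
static uint64_t fld_fstp_masked(uint64_t v)
{
    return classify_bits(v) == KIND_SNAN ? (v | QUIET_BIT) : v;
}
```

For example, 7FF0000000000001h classifies as an SNAN and the model turns it into 7FF8000000000001h, so a block containing that value would not survive an FLD/FSTP copy bit for bit. 0000000000000001h classifies as a denormal: it is copied correctly, only slowly.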