About the XMM SSE floating point instructions
These are the SSE (streaming SIMD extensions) floating
point instructions which use the 128-bit XMM registers. The SSE instructions
handle single-precision (32-bit) floating point values. Support for SSE
is found in the Intel Pentium III processors and above. See also the
XMM SSE2 floating point instructions, which handle double-precision (64-bit)
floating point values.
Before using these instructions in your code you need to check that they
are available on the processor which is running your program. To do this,
set EAX to 1, execute CPUID and then test bit 25 of EDX. The bit will be
set if the SSE instructions can be used.
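The same check can be sketched in C. This assumes GCC or Clang on x86, whose <cpuid.h> header provides __get_cpuid. MSVC uses __cpuid from <intrin.h> instead. The function name has_sse is only for illustration.

```c
#include <cpuid.h>

/* Returns 1 if CPUID leaf 1 reports SSE support (EDX bit 25), else 0.
   Relies on the GCC/Clang <cpuid.h> helper, so it is compiler-specific. */
int has_sse(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if leaf 1 is not supported at all. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    /* Same test as TEST EDX,2000000h in the assembler listings. */
    return (edx & (1u << 25)) != 0;
}
```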
In the tests the following data declarations are used:-
SINGLEFP1 DD 1.1
DD 2.2
DD 3.3
DD 4.4
SINGLEFP2 DD 10.66
DD 20.66
DD 30.66
DD 40.66
SINGLEFPN DD -1.1
DD +2.2
DD +3.3
DD -4.4
DINTEGER DD 23,24
DRESULT DD 0
Since the labels which point to the floating point values may not be on a
16-byte boundary, the MOVUPS instruction must be used to transfer the data
from memory into an XMM register. MOVUPS (move four unaligned packed
single-precision) does not care about alignment. If you specify ALIGN 16
immediately before the relevant data declaration, however, then the assembler
will make sure the data is on a 16-byte boundary and the faster MOVAPS (move
four aligned packed single-precision) can be used instead. If you get this
wrong, that is, if MOVAPS is used on data which is not 16-byte aligned, your
program will cause a general-protection exception. See more about this (in the
case of MOVDQA and MOVDQU, which work in the same way). When transferring
between registers either MOVUPS or MOVAPS may be used.
The instructions we are looking at here fall largely into two types.
The first type deals with packed floating point numbers. These
instructions have "PS" in their mnemonic, meaning "packed single-precision",
and they work on more than one single-precision (32-bit) floating point value
at once. The second type deals with just one floating point value. These
instructions have "SS" in their mnemonic, meaning "scalar single-precision".
They work only on the lowest part of the XMM register, that is to say the
lowest 32 bits (bits 0 to 31). Register-to-register forms such as ADDSS
leave the other three values in the destination unchanged, whereas MOVSS
from memory clears them to zero.
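As a sketch of the PS/SS difference, here are ADDPS and ADDSS through their C intrinsics _mm_add_ps and _mm_add_ss (assuming a compiler that provides <xmmintrin.h>; the function names are made up). With a = 1, 2, 3, 4 and b = 10, 20, 30, 40 the packed add gives 11, 22, 33, 44, while the scalar add gives 11, 2, 3, 4.

```c
#include <xmmintrin.h>

/* ADDPS: all four lanes are added. */
void add_packed(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

/* ADDSS: only lane 0 (bits 0-31) is added; lanes 1-3 are copied from a. */
void add_scalar(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ss(va, vb));
}
```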
To watch these tests properly you need to set the appropriate breakpoint,
start the test and then single step through the instructions. You can then
watch how they change the XMM registers. Using GoBug you can make the XMM
registers appear in their floating point SSE format using the
appropriate button on the toolbar.
Data movement instructions
Arithmetic instructions
Logical instructions
Comparison instructions
Shuffle and unpack instructions
Conversion instructions
SSE Data movement instructions
This demonstrates moving data into the registers and
between the registers. MOVUPS, MOVAPS (the aligned version), MOVSS, MOVLPS
and MOVHPS can also be used to move values into and out of memory. MOVMSKPS
can be used after a comparison instruction to get the result of the compare
into EAX for analysis: it copies the sign bit (bit 31) of each of the four
values into bits 0 to 3 of EAX and clears the other bits.
The breakpoint is XMMSSE_FPDATA:-
XMMSSE_FPDATA:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,2000000h ;test bit 25 (SSE)
JNZ >L20 ;SSE available
CALL NOSSEFPMESS ;displays message if SSE not available
RET
L20:
;***** display XMM registers in SSE mode ..
MOVUPS XMM0,[SINGLEFP1] ;move four fp values into XMM0
MOVAPS XMM1,XMM0 ;copy to XMM1
MOVHPS XMM3,[SINGLEFP1] ;move two fp values into XMM3 (high)
MOVLPS XMM3,[SINGLEFP1] ;move two fp values into XMM3 (low)
MOVLHPS XMM4,XMM0 ;move two fp values low to high
MOVHLPS XMM4,XMM0 ;move two fp values high to low
MOVSS XMM5,[SINGLEFP1] ;move one fp value into XMM5 (lowest)
MOVSS XMM6,XMM0 ;move one fp value into XMM6 (lowest)
MOVUPS XMM0,[SINGLEFPN] ;move two -ve, two +ve values into XMM0
MOVMSKPS EAX,XMM0 ;get all sign bits in XMM0 into eax
RET
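A C sketch of two of the moves above, using intrinsics which normally compile to MOVMSKPS and MOVHLPS (the function names are hypothetical). For the SINGLEFPN values (-1.1, +2.2, +3.3, -4.4) the sign mask is 1001b, that is 9, which is what MOVMSKPS EAX,XMM0 leaves in EAX.

```c
#include <xmmintrin.h>

/* MOVMSKPS: the sign bit of each lane goes into bits 0-3 of the result
   (lane 0 -> bit 0, ..., lane 3 -> bit 3). */
int sign_mask(const float v[4])
{
    return _mm_movemask_ps(_mm_loadu_ps(v));
}

/* MOVHLPS: the result's low two lanes are the high two lanes of src;
   its high two lanes keep the high two lanes of dst. */
void move_high_to_low(const float dst[4], const float src[4], float out[4])
{
    _mm_storeu_ps(out, _mm_movehl_ps(_mm_loadu_ps(dst), _mm_loadu_ps(src)));
}
```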
SSE Arithmetic instructions
This demonstrates the arithmetic instructions which can
work in the XMM registers using single-precision (32-bit) numbers.
The breakpoint is XMMSSE_FPARITH:-
XMMSSE_FPARITH:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,2000000h ;test bit 25 (SSE)
JNZ >L22 ;SSE available
CALL NOSSEFPMESS ;displays message if SSE not available
RET
L22:
;***** display XMM registers in SSE mode ..
MOVUPS XMM0,[SINGLEFP1] ;move 1st tester fp values into XMM0
MOVAPS XMM2,XMM0 ;copying to XMM2
MOVUPS XMM1,[SINGLEFP2] ;move 2nd tester fp values into XMM1
MOVAPS XMM3,XMM1 ;copying to XMM3
ADDPS XMM0,XMM1 ;add all fp values result in XMM0
MOVAPS XMM0,XMM2 ;restore value in XMM0
SUBPS XMM0,XMM1 ;subtract all fp values result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
ADDSS XMM0,XMM1 ;add lowest fp value result in XMM0
MOVAPS XMM0,XMM2 ;restore value in XMM0
SUBSS XMM0,XMM1 ;subtract lowest fp value result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
MULPS XMM0,XMM1 ;multiply all fp values result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
MULSS XMM0,XMM1 ;multiply lowest fp value result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
DIVPS XMM0,XMM1 ;divide all fp values result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
DIVSS XMM0,XMM1 ;divide lowest fp value result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
RCPPS XMM0,XMM1 ;approximate reciprocals of all fp values in XMM1, result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
RCPSS XMM0,XMM1 ;approximate reciprocal of lowest fp value in XMM1, result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
SQRTPS XMM0,XMM1 ;get square roots of all fp values result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
SQRTSS XMM0,XMM1 ;get square root of lowest fp value result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
RSQRTPS XMM0,XMM1 ;approximate reciprocals of square roots of all fp values in XMM1, result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
RSQRTSS XMM0,XMM1 ;approximate reciprocal of square root of lowest fp value in XMM1, result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
MAXPS XMM0,XMM1 ;get numerically greater fp values result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
MAXSS XMM0,XMM1 ;get numerically greater of low fp values result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
MINPS XMM0,XMM1 ;get numerically smaller fp values result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
MINSS XMM0,XMM1 ;get numerically smaller of low fp values result in XMM0
RET
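The reciprocal instructions trade accuracy for speed. This C sketch compares RCPPS (_mm_rcp_ps) with an exact division (_mm_div_ps). Intel documents the relative error of RCPPS as at most 1.5 * 2^-12, about 3.7e-4. The function names are illustrative only.

```c
#include <xmmintrin.h>

/* RCPPS: fast approximate reciprocals of all four lanes. */
void recip_approx(const float in[4], float out[4])
{
    _mm_storeu_ps(out, _mm_rcp_ps(_mm_loadu_ps(in)));
}

/* DIVPS 1.0/x: correctly rounded reciprocals, but slower. */
void recip_exact(const float in[4], float out[4])
{
    _mm_storeu_ps(out, _mm_div_ps(_mm_set1_ps(1.0f), _mm_loadu_ps(in)));
}

/* Absolute relative difference, computed without <math.h>. */
float rel_diff(float approx, float exact)
{
    float d = (approx - exact) / exact;
    return d < 0.0f ? -d : d;
}
```

For code that needs near full precision, a common approach is to refine the RCPPS result with one Newton-Raphson step rather than fall back to DIVPS.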
SSE Logical instructions
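This demonstrates the logical instructions which can
work in the XMM registers using single-precision (32-bit) numbers. They
operate bit by bit on all 128 bits, treating the floating point values
simply as bit patterns.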
The breakpoint is XMMSSE_FPLOGIC:-
XMMSSE_FPLOGIC:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,2000000h ;test bit 25 (SSE)
JNZ >L24 ;SSE available
CALL NOSSEFPMESS ;displays message if SSE not available
RET
L24:
;***** display XMM registers in SSE mode ..
MOVUPS XMM0,[SINGLEFP1] ;move 1st tester fp values into XMM0
MOVAPS XMM2,XMM0 ;copying to XMM2
MOVUPS XMM1,[SINGLEFP2] ;move 2nd tester fp values into XMM1
MOVAPS XMM3,XMM1 ;copying to XMM3
ANDPS XMM0,XMM1 ;perform AND on all fp values result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
ANDNPS XMM0,XMM1 ;NOT all bits of XMM0, then AND with XMM1, result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
ORPS XMM0,XMM1 ;perform OR on all fp values result in XMM0
;*******
MOVAPS XMM0,XMM2 ;restore value in XMM0
XORPS XMM0,XMM1 ;perform XOR on all fp values result in XMM0
RET
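A common use of the logical instructions is sign manipulation, since -0.0 has only the sign bit set. This sketch (hypothetical function names) uses ANDNPS to clear the sign bit, giving absolute values, and XORPS to flip it. Note the operand order of _mm_andnot_ps, which like ANDNPS inverts its first operand.

```c
#include <xmmintrin.h>

/* ANDNPS computes (NOT first) AND second, so NOT(sign mask) AND x
   clears bit 31 of every lane: absolute value. */
void abs4(const float in[4], float out[4])
{
    __m128 sign = _mm_set1_ps(-0.0f); /* 0x80000000 in every lane */
    _mm_storeu_ps(out, _mm_andnot_ps(sign, _mm_loadu_ps(in)));
}

/* XORPS with the sign mask flips bit 31 of every lane: negation. */
void neg4(const float in[4], float out[4])
{
    __m128 sign = _mm_set1_ps(-0.0f);
    _mm_storeu_ps(out, _mm_xor_ps(_mm_loadu_ps(in), sign));
}
```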
SSE Comparison instructions
This demonstrates the comparison instructions which can
work in the XMM registers using single-precision (32-bit) numbers.
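You tell CMPPS and CMPSS which comparison to make by specifying an immediate
value in the third operand. Each value compared is replaced by all ones (true)
or all zeros (false). It is not easy to remember which value does what, so
some assemblers (including GoAsm) also provide pseudo-mnemonics in the form
recommended by Intel (given here in the comment). COMISS and UCOMISS are
somewhat easier to use because they report the result in the ordinary flags
(ZF, PF and CF), although they only work on one floating point value in the
XMM register (contained in bits 0 to 31). If either value is a NaN the result
is unordered and all three flags are set. The two instructions differ only in
that COMISS signals an invalid-operation exception for any NaN, whereas
UCOMISS does so only for a signalling NaN.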
The breakpoint is XMMSSE_FPCOMP:-
XMMSSE_FPCOMP:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,2000000h ;test bit 25 (SSE)
JNZ >L26 ;SSE available
CALL NOSSEFPMESS ;displays message if SSE not available
RET
L26:
;***** display XMM registers in SSE mode ..
MOVUPS XMM0,[SINGLEFP1] ;move 1st tester fp values into XMM0
MOVUPS XMM1,[SINGLEFP2] ;move 2nd tester fp values into XMM1
MOVSS XMM0,XMM1 ;make lowest of XMM0 and XMM1 the same
MOVAPS XMM2,XMM0 ;copying to XMM2
MOVAPS XMM3,XMM1 ;copying to XMM3
;********************* compare instructions working on all four fp values
CMPPS XMM0,XMM1,0 ;=CMPEQPS see whether equal, result in XMM0
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPPS XMM0,XMM1,1 ;=CMPLTPS see whether less than, result in XMM0
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPPS XMM0,XMM1,2 ;=CMPLEPS see whether less than or equal, result in XMM0
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPPS XMM0,XMM1,3 ;=CMPUNORDPS see whether unordered, result in XMM0
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPPS XMM0,XMM1,4 ;=CMPNEQPS see whether not equal, result in XMM0
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPPS XMM0,XMM1,5 ;=CMPNLTPS see whether not less than, result in XMM0
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPPS XMM0,XMM1,6 ;=CMPNLEPS see whether not less than or equal, result in XMM0
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPPS XMM0,XMM1,7 ;=CMPORDPS see whether ordered, result in XMM0
;********************* compare instructions working on lowest only
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPSS XMM0,XMM1,0 ;=CMPEQSS see whether equal, result in XMM0
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPSS XMM0,XMM1,1 ;=CMPLTSS see whether less than, result in XMM0
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPSS XMM0,XMM1,2 ;=CMPLESS see whether less than or equal, result in XMM0
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPSS XMM0,XMM1,3 ;=CMPUNORDSS see whether unordered, result in XMM0
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPSS XMM0,XMM1,4 ;=CMPNEQSS see whether not equal, result in XMM0
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPSS XMM0,XMM1,5 ;=CMPNLTSS see whether not less than, result in XMM0
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPSS XMM0,XMM1,6 ;=CMPNLESS see whether not less than or equal, result in XMM0
MOVAPS XMM0,XMM2 ;restore original value to XMM0
CMPSS XMM0,XMM1,7 ;=CMPORDSS see whether ordered, result in XMM0
;********************* compare and give result in eflags
MOVAPS XMM0,XMM2 ;restore original value to XMM0
COMISS XMM0,XMM1 ;look at lowest only result in eflags
UCOMISS XMM0,XMM1 ;(unordered compare)
MOVUPS XMM0,[SINGLEFPN] ;move two -ve, two +ve values into XMM0
COMISS XMM0,XMM1 ;look at lowest only - result in eflags
UCOMISS XMM0,XMM1 ;(unordered compare)
RET
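A C sketch of the comparisons, using the intrinsics for CMPLTPS, CMPNLTPS and COMISS, with MOVMSKPS turning the all-ones/all-zeros lanes into bits (the function names are hypothetical). It also shows why "not less than" is not the same as "greater than or equal": a NaN makes CMPLTPS false but CMPNLTPS true.

```c
#include <xmmintrin.h>

/* CMPLTPS (CMPPS immediate 1): a lane becomes all ones if a < b,
   otherwise all zeros; MOVMSKPS packs the four lanes into bits 0-3. */
int lt_mask(const float a[4], const float b[4])
{
    return _mm_movemask_ps(_mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* CMPNLTPS (CMPPS immediate 5): "not less than", also true when a
   lane holds a NaN. */
int nlt_mask(const float a[4], const float b[4])
{
    return _mm_movemask_ps(_mm_cmpnlt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* COMISS: compares lane 0 only and sets EFLAGS; the intrinsic turns
   the "less than" condition into 0 or 1. */
int lowest_lt(const float a[4], const float b[4])
{
    return _mm_comilt_ss(_mm_loadu_ps(a), _mm_loadu_ps(b));
}
```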
SSE Shuffle and unpack instructions
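With these instructions you can move the single-precision
(32-bit) floating point values around the XMM registers. SHUFPS fills the
low two values of the destination with values picked from the destination
and the high two with values picked from the source, each choice being made
by a 2-bit field of the immediate operand. UNPCKLPS and UNPCKHPS interleave
the low or the high two values of the destination and source.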
The breakpoint is XMMSSE_SHUFF:-
XMMSSE_SHUFF:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,2000000h ;test bit 25 (SSE)
JNZ >L28 ;SSE available
CALL NOSSEFPMESS ;displays message if SSE not available
RET
L28:
;***** display XMM registers in SSE mode ..
MOVUPS XMM0,[SINGLEFP1] ;move 1st tester fp values into XMM0
MOVAPS XMM2,XMM0 ;copying to XMM2
MOVUPS XMM1,[SINGLEFP2] ;move 2nd tester fp values into XMM1
MOVAPS XMM3,XMM1 ;copying to XMM3
SHUFPS XMM0,XMM1,33h ;low half: XMM0 values 3,0; high half: XMM1 values 3,0
SHUFPS XMM0,XMM0,33h ;same selector, shuffling XMM0 within itself
MOVAPS XMM0,XMM2 ;restore original value to XMM0
UNPCKHPS XMM0,XMM1 ;interleave high two values of XMM0 and XMM1
MOVAPS XMM0,XMM2 ;restore original value to XMM0
UNPCKLPS XMM0,XMM0 ;interleave low two values of XMM0 with themselves
RET
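The shuffle with 33h can be reproduced in C with _mm_shuffle_ps. The _MM_SHUFFLE macro from <xmmintrin.h> builds the immediate from four 2-bit selectors, highest first, so _MM_SHUFFLE(0, 3, 0, 3) is 33h. With XMM0 holding 1, 2, 3, 4 and XMM1 holding 10, 20, 30, 40 the result is 4, 1, 40, 10. A sketch with hypothetical function names.

```c
#include <xmmintrin.h>

/* SHUFPS with 33h = 00 11 00 11b: bits 1-0 (3) and 3-2 (0) pick the low
   two result lanes from a; bits 5-4 (3) and 7-6 (0) pick the high two
   from b. */
void shuffle33(const float a[4], const float b[4], float out[4])
{
    _mm_storeu_ps(out, _mm_shuffle_ps(_mm_loadu_ps(a), _mm_loadu_ps(b),
                                      _MM_SHUFFLE(0, 3, 0, 3)));
}

/* UNPCKHPS: interleave the high halves, giving a2, b2, a3, b3. */
void unpack_high(const float a[4], const float b[4], float out[4])
{
    _mm_storeu_ps(out, _mm_unpackhi_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* UNPCKLPS: interleave the low halves, giving a0, b0, a1, b1. */
void unpack_low(const float a[4], const float b[4], float out[4])
{
    _mm_storeu_ps(out, _mm_unpacklo_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```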
SSE Conversion instructions
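These instructions convert dword integers into single-precision
(32-bit) floating point values and vice versa. They should be read together
with the SSE2 conversion instructions. The versions with an extra T
(CVTTPS2PI, CVTTSS2SI) always truncate towards zero; the others round
according to the rounding mode in MXCSR (round to nearest by default). With
23 and 24 both versions give the same result, because these values convert
exactly.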
The breakpoint is XMMSSE_CONV:-
XMMSSE_CONV:
MOV EAX,1 ;request CPU feature flags
CPUID ;0Fh, 0A2h CPUID instruction
TEST EDX,2000000h ;test bit 25 (SSE)
JNZ >L30 ;SSE available
CALL NOSSEFPMESS ;displays message if SSE not available
RET
L30:
;***** display XMM registers in SSE mode ..
CVTPI2PS XMM0,[DINTEGER] ;convert 23 and 24 to single-precision fp values
CVTSI2SS XMM1,[DINTEGER] ;convert 23 only to single-precision fp value
;***** display also MMX registers in dword integer mode ..
CVTPS2PI MM0,XMM0 ;convert 23 and 24 back again from XMM0 into MM0
CVTTPS2PI MM1,XMM0 ;same as above but with truncation
CVTSS2SI EAX,XMM1 ;convert 23 back again from XMM1 into EAX
CVTTSS2SI EDX,XMM1 ;same as above but with truncation
EMMS ;empty MMX state before any later x87 FPU code
RET
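A C sketch of the scalar conversions (the function names are hypothetical), assuming MXCSR is left at its default round-to-nearest mode. With fractional inputs the rounding and truncating versions differ: 2.7 rounds to 3 but truncates to 2, and 2.5 rounds to 2 because ties go to the even value.

```c
#include <xmmintrin.h>

/* CVTSI2SS: dword integer -> single-precision value in lane 0. */
float int_to_float(int i)
{
    return _mm_cvtss_f32(_mm_cvtsi32_ss(_mm_setzero_ps(), i));
}

/* CVTSS2SI: lane 0 -> dword integer, rounded using the MXCSR mode. */
int float_to_int_round(float f)
{
    return _mm_cvtss_si32(_mm_set_ss(f));
}

/* CVTTSS2SI: lane 0 -> dword integer, always truncated towards zero. */
int float_to_int_trunc(float f)
{
    return _mm_cvttss_si32(_mm_set_ss(f));
}
```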