Codebase list fasm / 091f94a
New upstream version 1.71.60 Tomasz Buchert 7 years ago
7 changed file(s) with 548 addition(s) and 428 deletion(s).
Binary diff not shown
+531
-420
fasm.txt
00
1 ,'''
2 ,,;,, ,,,, ,,,,, ,,, ,,
3 ; ; ; ; ; ;
4 ; ,''''; '''', ; ; ;
5 ; ',,,,;, ,,,,,' ; ; ;
6
7 flat assembler 1.71
8 Programmer's Manual
99
1010
1111 Table of contents
1313
1414 Chapter 1 Introduction
1515
16 1.1 Compiler overview
17 1.1.1 System requirements
18 1.1.2 Executing compiler from command line
19 1.1.3 Compiler messages
20 1.1.4 Output formats
21
22 1.2 Assembly syntax
23 1.2.1 Instruction syntax
24 1.2.2 Data definitions
25 1.2.3 Constants and labels
26 1.2.4 Numerical expressions
27 1.2.5 Jumps and calls
28 1.2.6 Size settings
2929
3030 Chapter 2 Instruction set
3131
32 2.1 The x86 architecture instructions
33 2.1.1 Data movement instructions
34 2.1.2 Type conversion instructions
35 2.1.3 Binary arithmetic instructions
36 2.1.4 Decimal arithmetic instructions
37 2.1.5 Logical instructions
38 2.1.6 Control transfer instructions
39 2.1.7 I/O instructions
40 2.1.8 Strings operations
41 2.1.9 Flag control instructions
42 2.1.10 Conditional operations
43 2.1.11 Miscellaneous instructions
44 2.1.12 System instructions
45 2.1.13 FPU instructions
46 2.1.14 MMX instructions
47 2.1.15 SSE instructions
48 2.1.16 SSE2 instructions
49 2.1.17 SSE3 instructions
50 2.1.18 AMD 3DNow! instructions
51 2.1.19 The x86-64 long mode instructions
52 2.1.20 SSE4 instructions
53 2.1.21 AVX instructions
54 2.1.22 AVX2 instructions
55 2.1.23 Auxiliary sets of computational instructions
56 2.1.24 AVX-512 instructions
57 2.1.25 Other extensions of instruction set
58
59 2.2 Control directives
60 2.2.1 Numerical constants
61 2.2.2 Conditional assembly
62 2.2.3 Repeating blocks of instructions
63 2.2.4 Addressing spaces
64 2.2.5 Other directives
65 2.2.6 Multiple passes
66
67 2.3 Preprocessor directives
68 2.3.1 Including source files
69 2.3.2 Symbolic constants
70 2.3.3 Macroinstructions
71 2.3.4 Structures
72 2.3.5 Repeating macroinstructions
73 2.3.6 Conditional preprocessing
74 2.3.7 Order of processing
75
76 2.4 Formatter directives
77 2.4.1 MZ executable
78 2.4.2 Portable Executable
79 2.4.3 Common Object File Format
80 2.4.4 Executable and Linkable Format
8181
8282
8383
145145 destination file.
146146 The following is an example of the compilation summary:
147147
148 flat assembler version 1.70 (16384 kilobytes memory)
148 flat assembler version 1.70 (16384 kilobytes memory)
149149 38 passes, 5.3 seconds, 77824 bytes.
150150
151151 In case of error during the compilation process, the program will display an
152152 error message. For example, when the compiler can't find the input file, it will
153153 display the following message:
154154
155 flat assembler version 1.70 (16384 kilobytes memory)
155 flat assembler version 1.70 (16384 kilobytes memory)
156156 error: source file not found.
157157
158158 If the error is connected with a specific part of source code, the source line
159159 that caused the error will also be displayed. The placement of this line in
160160 the source is also given to help you find the error, for example:
161161
162 flat assembler version 1.70 (16384 kilobytes memory)
162 flat assembler version 1.70 (16384 kilobytes memory)
163163 example.asm [3]:
164 mob ax,1
164 mob ax,1
165165 error: illegal instruction.
166166
167167 It means that in the third line of the "example.asm" file compiler has
169169 contains a macroinstruction, also the line in macroinstruction definition
170170 that generated the erroneous instruction is displayed:
171171
172 flat assembler version 1.70 (16384 kilobytes memory)
172 flat assembler version 1.70 (16384 kilobytes memory)
173173 example.asm [6]:
174 stoschar 7
174 stoschar 7
175175 example.asm [3] stoschar [1]:
176 mob al,char
176 mob al,char
177177 error: illegal instruction.
178178
179179 It means that the macroinstruction in the sixth line of the "example.asm" file
258258 | xword | 128 | 16 |
259259 | qqword | 256 | 32 |
260260 | yword | 256 | 32 |
261 | dqqword | 512 | 64 |
262 | zword | 512 | 64 |
261263 \-------------------------/
262264
263265 Table 1.2 Registers
264266 /-----------------------------------------------------------------\
265 | Type | Bits | |
267 | Type | Bits | |
266268 |=========|======|================================================|
267 | | 8 | al cl dl bl ah ch dh bh |
268 | General | 16 | ax cx dx bx sp bp si di |
269 | | 32 | eax ecx edx ebx esp ebp esi edi |
269 | | 8 | al cl dl bl ah ch dh bh |
270 | General | 16 | ax cx dx bx sp bp si di |
271 | | 32 | eax ecx edx ebx esp ebp esi edi |
270272 |---------|------|------------------------------------------------|
271 | Segment | 16 | es cs ss ds fs gs |
273 | Segment | 16 | es cs ss ds fs gs |
272274 |---------|------|------------------------------------------------|
273 | Control | 32 | cr0 cr2 cr3 cr4 |
275 | Control | 32 | cr0 cr2 cr3 cr4 |
274276 |---------|------|------------------------------------------------|
275 | Debug | 32 | dr0 dr1 dr2 dr3 dr6 dr7 |
277 | Debug | 32 | dr0 dr1 dr2 dr3 dr6 dr7 |
276278 |---------|------|------------------------------------------------|
277 | FPU | 80 | st0 st1 st2 st3 st4 st5 st6 st7 |
279 | FPU | 80 | st0 st1 st2 st3 st4 st5 st6 st7 |
278280 |---------|------|------------------------------------------------|
279 | MMX | 64 | mm0 mm1 mm2 mm3 mm4 mm5 mm6 mm7 |
281 | MMX | 64 | mm0 mm1 mm2 mm3 mm4 mm5 mm6 mm7 |
280282 |---------|------|------------------------------------------------|
281283 | SSE | 128 | xmm0 xmm1 xmm2 xmm3 xmm4 xmm5 xmm6 xmm7 |
282284 |---------|------|------------------------------------------------|
283285 | AVX | 256 | ymm0 ymm1 ymm2 ymm3 ymm4 ymm5 ymm6 ymm7 |
286 |---------|------|------------------------------------------------|
287 | AVX-512 | 512 | zmm0 zmm1 zmm2 zmm3 zmm4 zmm5 zmm6 zmm7 |
288 |---------|------|------------------------------------------------|
289 | Opmask | 64 | k0 k1 k2 k3 k4 k5 k6 k7 |
290 |---------|------|------------------------------------------------|
291 | Bounds | 128 | bnd0 bnd1 bnd2 bnd3 |
284292 \-----------------------------------------------------------------/
285293
286294
331339 | Size | Define | Reserve |
332340 | (bytes) | data | data |
333341 |=========|========|=========|
334 | 1 | db | rb |
335 | | file | |
342 | 1 | db | rb |
343 | | file | |
336344 |---------|--------|---------|
337 | 2 | dw | rw |
338 | | du | |
345 | 2 | dw | rw |
346 | | du | |
339347 |---------|--------|---------|
340 | 4 | dd | rd |
348 | 4 | dd | rd |
341349 |---------|--------|---------|
342 | 6 | dp | rp |
343 | | df | rf |
350 | 6 | dp | rp |
351 | | df | rf |
344352 |---------|--------|---------|
345 | 8 | dq | rq |
353 | 8 | dq | rq |
346354 |---------|--------|---------|
347 | 10 | dt | rt |
355 | 10 | dt | rt |
348356 \----------------------------/
349357
350358
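The directives from the table above can be combined as in the following
minimal sketch. All label names ("msg", "count", "msg_addr", "big", "buffer",
"table") are hypothetical and chosen only for illustration:

```asm
; Sketch only: every label below is an illustrative name, not part of fasm.
msg       db 'Hello',13,10,0  ; bytes: a string followed by CR, LF and zero
count     dw 1000             ; one word (2 bytes)
msg_addr  dd msg              ; one double word (4 bytes) holding an address
big       dq 1 shl 40         ; one quad word (8 bytes)
buffer    rb 64               ; reserve 64 bytes without initializing them
table     rw 16               ; reserve 16 words (32 bytes)
```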
454462 /-------------------------\
455463 | Priority | Operators |
456464 |==========|==============|
457 | 0 | + - |
465 | 0 | + - |
458466 |----------|--------------|
459 | 1 | * / |
467 | 1 | * / |
460468 |----------|--------------|
461 | 2 | mod |
469 | 2 | mod |
462470 |----------|--------------|
463 | 3 | and or xor |
471 | 3 | and or xor |
464472 |----------|--------------|
465 | 4 | shl shr |
473 | 4 | shl shr |
466474 |----------|--------------|
467 | 5 | not |
475 | 5 | not |
468476 |----------|--------------|
469 | 6 | bsf bsr |
477 | 6 | bsf bsr |
470478 |----------|--------------|
471 | 7 | rva plt |
479 | 7 | rva plt |
472480 \-------------------------/
473481
474482
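As a short illustration of the table above, the sketch below defines three
numerical constants. It assumes that operators with a higher priority value
are calculated first; the constant names "n1", "n2" and "n3" are hypothetical:

```asm
; Sketch only: n1, n2 and n3 are illustrative names.
n1 = 2 + 3 * 4       ; 14, because "*" is calculated before "+"
n2 = 1 shl 4 + 1     ; 17, because "shl" is calculated before "+"
n3 = (2 + 3) * 4     ; 20, parentheses force the addition to go first
```

When the intended order is not obvious, adding parentheses as in the last
line avoids relying on the table at all.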
553561 operand are the same. Below are the examples for each of the allowed
554562 combinations:
555563
556 mov bx,ax ; general register to general register
564 mov bx,ax ; general register to general register
557565 mov [char],al ; general register to memory
558566 mov bl,[char] ; memory to general register
559 mov dl,32 ; immediate value to general register
567 mov dl,32 ; immediate value to general register
560568 mov [char],32 ; immediate value to memory
561 mov ax,ds ; segment register to general register
569 mov ax,ds ; segment register to general register
562570 mov [bx],ds ; segment register to memory
563 mov ds,ax ; general register to segment register
571 mov ds,ax ; general register to segment register
564572 mov ds,[bx] ; memory to segment register
565573 mov eax,cr0 ; control register to general register
566574 mov cr3,ebx ; general register to control register
570578 important. The operands may be two general registers, or general register
571579 with memory. For example:
572580
573 xchg ax,bx ; swap two general registers
581 xchg ax,bx ; swap two general registers
574582 xchg al,[char] ; swap register with memory
575583
576584 "push" decrements the stack frame pointer (ESP register), then transfers
584592 spaces, not commas), compiler will assemble chain of the "push" instructions
585593 with these operands. The examples are with single operands:
586594
587 push ax ; store general register
588 push es ; store segment register
589 pushw [bx] ; store memory
590 push 1000h ; store immediate value
595 push ax ; store general register
596 push es ; store segment register
597 pushw [bx] ; store memory
598 push 1000h ; store immediate value
591599
592600 "pusha" saves the contents of the eight general registers on the stack.
593601 This instruction has no operands. There are two versions of this instruction,
606614 follow in the same line, compiler will assemble chain of the "pop"
607615 instructions with these operands.
608616
609 pop bx ; restore general register
610 pop ds ; restore segment register
611 popw [si] ; restore memory
617 pop bx ; restore general register
618 pop ds ; restore segment register
619 popw [si] ; restore memory
612620
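The chained form of "push" and "pop" described above (operands separated only
with spaces) might be used as in this sketch to preserve a few registers
around a block of code. The choice of registers is arbitrary:

```asm
; Sketch only: the registers were picked arbitrarily for illustration.
        use16
        push ax bx cx     ; same as: push ax / push bx / push cx
        ; ... code that changes ax, bx and cx would go here ...
        pop  cx bx ax     ; values come back in reverse order
```

Because the stack is last-in first-out, the operands of the "pop" chain are
listed in the opposite order to the "push" chain.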
613621 "popa" restores the registers saved on the stack by "pusha" instruction,
614622 except for the saved value of SP (or ESP), which is ignored. This instruction
634642 extension. The source operand can be general register or memory, while the
635643 destination operand must be a general register. For example:
636644
637 movsx ax,al ; byte register to word register
638 movsx edx,dl ; byte register to double word register
639 movsx eax,ax ; word register to double word register
640 movsx ax,byte [bx] ; byte memory to word register
645 movsx ax,al ; byte register to word register
646 movsx edx,dl ; byte register to double word register
647 movsx eax,ax ; word register to double word register
648 movsx ax,byte [bx] ; byte memory to word register
641649 movsx edx,byte [bx] ; byte memory to double word register
642650 movsx eax,word [bx] ; word memory to double word register
643651
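The effect of the sign extension is easiest to see next to the zero
extension. The following sketch uses arbitrary values, and "movzx" is shown
only for comparison:

```asm
; Sketch only: the constants are arbitrary example values.
        use32
        mov   al,-2       ; al = 0FEh
        movsx eax,al      ; eax = 0FFFFFFFEh, the sign bit is copied upward
        mov   bl,0FEh
        movzx ebx,bl      ; ebx = 000000FEh, the upper bits are cleared
```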
650658 register or memory, the source operand can be general register or immediate
651659 value, it can also be memory if the destination operand is register.
652660
653 add ax,bx ; add register to register
661 add ax,bx ; add register to register
654662 add ax,[si] ; add memory to register
655663 add [di],al ; add register to memory
656 add al,48 ; add immediate value to register
664 add al,48 ; add immediate value to register
657665 add [char],48 ; add immediate value to memory
658666
659667 "adc" sums the operands, adds one if CF is set, and replaces the destination
664672 general register or memory, and the size of the operand can be byte, word or
665673 double word.
666674
667 inc ax ; increment register by one
675 inc ax ; increment register by one
668676 inc byte [bx] ; increment memory by one
669677
670678 "sub" subtracts the source operand from the destination operand and replaces
720728 because, whether the operands are signed or unsigned, the lower half of the
721729 product is the same. Below are the examples for all three forms:
722730
723 imul bl ; accumulator by register
731 imul bl ; accumulator by register
724732 imul word [si] ; accumulator by memory
725 imul bx,cx ; register by register
733 imul bx,cx ; register by register
726734 imul bx,[si] ; register by memory
727 imul bx,10 ; register by immediate value
735 imul bx,10 ; register by immediate value
728736 imul ax,bx,10 ; register by immediate value to register
729737 imul ax,[si],10 ; memory by immediate value to register
730738
805813 1, "btr" resets the selected bit to 0, "btc" changes the bit to its
806814 complement. The first operand can be word or double word.
807815
808 bt ax,15 ; test bit in register
816 bt ax,15 ; test bit in register
809817 bts word [bx],15 ; test and set bit in memory
810 btr ax,cx ; test and reset bit in register
818 btr ax,cx ; test and reset bit in register
811819 btc word [bx],cx ; test and complement bit in memory
812820
813821 "bsf" and "bsr" instructions scan a word or double word for first set bit
820828 order to low order (starting from bit index 15 of a word or index 31 of a
821829 double word).
822830
823 bsf ax,bx ; scan register forward
831 bsf ax,bx ; scan register forward
824832 bsr ax,[si] ; scan memory reverse
825833
826834 "shl" shifts the destination operand left by the number of bits specified
830838 side of the operand as bits exit from the left side. The last bit that exited
831839 is stored in CF. "sal" is a synonym for "shl".
832840
833 shl al,1 ; shift register left by one bit
841 shl al,1 ; shift register left by one bit
834842 shl byte [bx],1 ; shift memory left by one bit
835 shl ax,cl ; shift register left by count from cl
843 shl ax,cl ; shift register left by count from cl
836844 shl word [bx],cl ; shift memory left by count from cl
837845
838846 "shr" and "sar" shift the destination operand right by the number of bits
879887 bits 16 through 23. This instruction is provided for converting little-endian
880888 values to big-endian format and vice versa.
881889
882 bswap edx ; swap bytes in register
890 bswap edx ; swap bytes in register
883891
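A concrete value makes the byte reordering visible. The constant in this
sketch is an arbitrary example:

```asm
; Sketch only: 12345678h is an arbitrary example value.
        use32
        mov   eax,12345678h
        bswap eax            ; eax = 78563412h
```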
884892
885893 2.1.6 Control transfer instructions
903911 variable, the operand should be general register or memory. See also 1.2.5 for
904912 some more details.
905913
906 jmp 100h ; direct near jump
914 jmp 100h ; direct near jump
907915 jmp 0FFFFh:0 ; direct far jump
908 jmp ax ; indirect near jump
916 jmp ax ; indirect near jump
909917 jmp pword [ebx] ; indirect far jump
910918
911919 "call" transfers control to the procedure, saving on the stack the address
942950
943951 Table 2.1 Conditions
944952 /-----------------------------------------------------------\
945 | Mnemonic | Condition tested | Description |
953 | Mnemonic | Condition tested | Description |
946954 |==========|=======================|========================|
947 | o | OF = 1 | overflow |
955 | o | OF = 1 | overflow |
948956 |----------|-----------------------|------------------------|
949 | no | OF = 0 | not overflow |
957 | no | OF = 0 | not overflow |
950958 |----------|-----------------------|------------------------|
951 | c | | carry |
952 | b | CF = 1 | below |
953 | nae | | not above nor equal |
959 | c | | carry |
960 | b | CF = 1 | below |
961 | nae | | not above nor equal |
954962 |----------|-----------------------|------------------------|
955 | nc | | not carry |
956 | ae | CF = 0 | above or equal |
957 | nb | | not below |
963 | nc | | not carry |
964 | ae | CF = 0 | above or equal |
965 | nb | | not below |
958966 |----------|-----------------------|------------------------|
959 | e | ZF = 1 | equal |
960 | z | | zero |
967 | e | ZF = 1 | equal |
968 | z | | zero |
961969 |----------|-----------------------|------------------------|
962 | ne | ZF = 0 | not equal |
963 | nz | | not zero |
970 | ne | ZF = 0 | not equal |
971 | nz | | not zero |
964972 |----------|-----------------------|------------------------|
965 | be | CF or ZF = 1 | below or equal |
966 | na | | not above |
973 | be | CF or ZF = 1 | below or equal |
974 | na | | not above |
967975 |----------|-----------------------|------------------------|
968 | a | CF or ZF = 0 | above |
969 | nbe | | not below nor equal |
976 | a | CF or ZF = 0 | above |
977 | nbe | | not below nor equal |
970978 |----------|-----------------------|------------------------|
971 | s | SF = 1 | sign |
979 | s | SF = 1 | sign |
972980 |----------|-----------------------|------------------------|
973 | ns | SF = 0 | not sign |
981 | ns | SF = 0 | not sign |
974982 |----------|-----------------------|------------------------|
975 | p | PF = 1 | parity |
976 | pe | | parity even |
983 | p | PF = 1 | parity |
984 | pe | | parity even |
977985 |----------|-----------------------|------------------------|
978 | np | PF = 0 | not parity |
979 | po | | parity odd |
986 | np | PF = 0 | not parity |
987 | po | | parity odd |
980988 |----------|-----------------------|------------------------|
981 | l | SF xor OF = 1 | less |
982 | nge | | not greater nor equal |
989 | l | SF xor OF = 1 | less |
990 | nge | | not greater nor equal |
983991 |----------|-----------------------|------------------------|
984 | ge | SF xor OF = 0 | greater or equal |
985 | nl | | not less |
992 | ge | SF xor OF = 0 | greater or equal |
993 | nl | | not less |
986994 |----------|-----------------------|------------------------|
987 | le | (SF xor OF) or ZF = 1 | less or equal |
988 | ng | | not greater |
995 | le | (SF xor OF) or ZF = 1 | less or equal |
996 | ng | | not greater |
989997 |----------|-----------------------|------------------------|
990 | g | (SF xor OF) or ZF = 0 | greater |
991 | nle | | not less nor equal |
998 | g | (SF xor OF) or ZF = 0 | greater |
999 | nle | | not less nor equal |
9921000 \-----------------------------------------------------------/
9931001
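The difference between the unsigned conditions ("a", "b" and related) and the
signed ones ("g", "l" and related) can be sketched with a single comparison.
The label names "unsigned_greater" and "signed_less" are hypothetical:

```asm
; Sketch only: both labels are illustrative names.
        use16
        mov  al,0FFh
        cmp  al,1
        ja   unsigned_greater  ; taken: 255 > 1 when treated as unsigned
        nop                    ; skipped
unsigned_greater:
        cmp  al,1
        jl   signed_less       ; taken: -1 < 1 when treated as signed
        nop                    ; skipped
signed_less:
```

Both jumps are taken for the same operands, because "ja" tests CF and ZF
while "jl" tests SF and OF.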
9941002 The "loop" instructions are conditional jumps that use a value placed in
10371045 operand should be AL, AX, or EAX register. The source operand should be an
10381046 immediate value in range from 0 to 255, or DX register.
10391047
1040 in al,20h ; input byte from port 20h
1041 in ax,dx ; input word from port addressed by dx
1048 in al,20h ; input byte from port 20h
1049 in ax,dx ; input word from port addressed by dx
10421050
10431051 "out" transfers a byte, word, or double word to an output port from AL, AX,
10441052 or EAX. The program can specify the number of the port using the same methods
10461054 in range from 0 to 255, or DX register. The source operand should be AL, AX,
10471055 or EAX register.
10481056
1049 out 20h,ax ; output word to port 20h
1050 out dx,al ; output byte to port addressed by dx
1057 out 20h,ax ; output word to port 20h
1058 out dx,al ; output byte to port addressed by dx
10511059
10521060
10531061 2.1.8 Strings operations
10771085
10781086 movs byte [di],[si] ; transfer byte
10791087 movs word [es:di],[ss:si] ; transfer word
1080 movsd ; transfer double word
1088 movsd ; transfer double word
10811089
10821090 "cmps" subtracts the destination string element from the source string
10831091 element and updates the flags AF, SF, PF, CF and OF, but it does not change
10871095 second operand should be the destination string element addressed by DI or
10881096 EDI.
10891097
1090 cmpsb ; compare bytes
1098 cmpsb ; compare bytes
10911099 cmps word [ds:si],[es:di] ; compare words
10921100 cmps dword [fs:esi],[edi] ; compare double words
10931101
10961104 PF, CF and OF. If the values are equal, ZF is set, otherwise it is cleared.
10971105 The operand should be the destination string element addressed by DI or EDI.
10981106
1099 scas byte [es:di] ; scan byte
1100 scasw ; scan word
1107 scas byte [es:di] ; scan byte
1108 scasw ; scan word
11011109 scas dword [es:edi] ; scan double word
11021110
11031111 "stos" places the value of AL, AX, or EAX into the destination string
11061114 should be the source string element addressed by SI or ESI with any segment
11071115 prefix.
11081116
1109 lods byte [ds:si] ; load byte
1110 lods word [cs:si] ; load word
1111 lodsd ; load double word
1117 lods byte [ds:si] ; load byte
1118 lods word [cs:si] ; load word
1119 lodsd ; load double word
11121120
11131121 "ins" transfers a byte, word, or double word from an input port addressed
11141122 by DX register to the destination string element. The destination operand
11151123 should be memory addressed by DI or EDI, the source operand should be the DX
11161124 register.
11171125
1118 insb ; input byte
1126 insb ; input byte
11191127 ins word [es:di],dx ; input word
1120 ins dword [edi],dx ; input double word
1128 ins dword [edi],dx ; input double word
11211129
11221130 "outs" transfers the source string element to an output port addressed by
11231131 DX register. The destination operand should be the DX register and the source
11241132 operand should be memory addressed by SI or ESI with any segment prefix.
11251133
1126 outs dx,byte [si] ; output byte
1127 outsw ; output word
1134 outs dx,byte [si] ; output byte
1135 outsw ; output word
11281136 outs dx,dword [gs:esi] ; output double word
11291137
11301138 The repeat prefixes "rep", "repe"/"repz", and "repne"/"repnz" specify
11411149 the execution when the ZF is zero, "repne" and "repnz" terminate the execution
11421150 when the ZF is set.
11431151
1144 rep movsd ; transfer multiple double words
1145 repe cmpsb ; compare bytes until not equal
1152 rep movsd ; transfer multiple double words
1153 repe cmpsb ; compare bytes until not equal
11461154
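A complete block copy with the "rep" prefix could look like the sketch below.
It assumes that DS and ES already address the same segment (as in a simple
.COM program); the labels "source" and "target" and the length 16 are
hypothetical:

```asm
; Sketch only: "source", "target" and the length 16 are hypothetical.
; Assumes DS and ES already address the same segment.
        use16
        org  100h
        cld                   ; clear DF so that SI and DI are incremented
        mov  si,source        ; DS:SI points to the source string
        mov  di,target        ; ES:DI points to the destination string
        mov  cx,16            ; CX holds the number of elements to process
        rep  movsb            ; copy CX bytes, counting CX down to zero
        ret
source  db '0123456789ABCDEF'
target  rb 16
```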
11471155
11481156 2.1.9 Flag control instructions
11691177 and "popfd" forces restoring from the double word.
11701178
11711179
1172 2.1.10 Conditional operations
1180 2.1.10 Conditional operations
11731181
11741182 The instructions obtained by attaching the condition mnemonic (see table
11751183 2.1) to the "set" mnemonic set a byte to one if the condition is true and set
11761184 the byte to zero otherwise. The operand should be an 8-bit general register
11771185 or the byte in memory.
11781186
1179 setne al ; set al if zero flag cleared
1187 setne al ; set al if zero flag cleared
11801188 seto byte [bx] ; set byte if overflow
11811189
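A typical use of the "set" instructions is to turn the result of a comparison
into a 0 or 1 value without a jump. In this sketch the choice of registers is
arbitrary:

```asm
; Sketch only: the registers were picked arbitrarily for illustration.
        use32
        cmp   eax,ebx
        setl  cl          ; cl = 1 if eax < ebx (signed), otherwise 0
        movzx ecx,cl      ; widen the result byte to a full register
```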
11821190 "salc" instruction sets all the bits of the AL register when the carry flag is
12081216 cmpxchg8b [bx] ; compare and exchange 8 bytes
12091217
12101218
1211 2.1.11 Miscellaneous instructions
1219 2.1.11 Miscellaneous instructions
12121220
12131221 "nop" instruction occupies one byte but affects nothing but the instruction
12141222 pointer. This instruction has no operands and doesn't perform any operation.
12691277 enter 2048,0 ; enter and allocate 2048 bytes on stack
12701278
12711279
1272 2.1.12 System instructions
1280 2.1.12 System instructions
12731281
12741282 "lmsw" loads the operand into the machine status word (bits 0 through 15 of
12751283 CR0 register), while "smsw" stores the machine status word into the
12771285 general register or memory, for "smsw" it can also be 32-bit general
12781286 register.
12791287
1280 lmsw ax ; load machine status from register
1281 smsw [bx] ; store machine status to memory
1288 lmsw ax ; load machine status from register
1289 smsw [bx] ; store machine status to memory
12821290
12831291 "lgdt" and "lidt" instructions load the values in operand into the global
12841292 descriptor table register or the interrupt descriptor table register
12861294 table register or the interrupt descriptor table register in the destination
12871295 operand. The operand should be 6 bytes in memory.
12881296
1289 lgdt [ebx] ; load global descriptor table
1297 lgdt [ebx] ; load global descriptor table
12901298
12911299 "lldt" loads the operand into the segment selector field of the local
12921300 descriptor table register and "sldt" stores the segment selector from the
13001308 The source operand should be a 16-bit general register or memory.
13011309
13021310 lar ax,[bx] ; load access rights into word
1303 lar eax,dx ; load access rights into double word
1311 lar eax,dx ; load access rights into double word
13041312
13051313 "lsl" loads the segment limit from the segment descriptor specified by the
13061314 selector in source operand into the destination operand and sets the ZF flag.
13201328 destination operand. The destination operand can be a word general register
13211329 or memory, the source operand must be a general register.
13221330
1323 arpl bx,ax ; adjust RPL of selector in register
1331 arpl bx,ax ; adjust RPL of selector in register
13241332 arpl [bx],ax ; adjust RPL of selector in memory
13251333
13261334 "clts" clears the TS (task switched) flag in the CR0 register. This
13661374 instructions are stored in MSRs. These instructions have no operands.
13671375
13681376
1369 2.1.13 FPU instructions
1377 2.1.13 FPU instructions
13701378
13711379 The FPU (Floating-Point Unit) instructions operate on the floating-point
13721380 values in three formats: single precision (32-bit), double precision (64-bit)
13821390 format.
13831391
13841392 fld dword [bx] ; load single precision value from memory
1385 fld st2 ; push value of st2 onto register stack
1393 fld st2 ; push value of st2 onto register stack
13861394
13871395 "fld1", "fldz", "fldl2t", "fldl2e", "fldpi", "fldlg2" and "fldln2" load the
13881396 commonly used constants onto the FPU register stack. The loaded constants are
14001408 getting rid of ST0. "fstp" accepts the same operands as the "fst" instruction
14011409 and can also store value in the 80-bit memory.
14021410
1403 fst st3 ; copy value of st0 into st3 register
1411 fst st3 ; copy value of st0 into st3 register
14041412 fstp tword [bx] ; store value in memory and pop stack
14051413
14061414 "fist" converts the value in ST0 to a signed integer and stores the result
14291437 must be an FPU register and the source operand must be the ST0. When no
14301438 operands are specified, ST1 is used as a destination operand.
14311439
1432 faddp ; add st0 to st1 and pop the stack
1440 faddp ; add st0 to st1 and pop the stack
14331441 faddp st2,st0 ; add st0 to st2 and pop the stack
14341442
14351443 "fiadd" instruction converts an integer source operand into double extended
14401448
14411449 "fsub", "fsubr", "fmul", "fdiv", "fdivr" instructions are similar to "fadd",
14421450 have the same rules for operands and differ only in the performed computation.
1443 "fsub" substracts the source operand from the destination operand, "fsubr"
1444 substract the destination operand from the source operand, "fmul" multiplies
1451 "fsub" subtracts the source operand from the destination operand, "fsubr"
1452 subtract the destination operand from the source operand, "fmul" multiplies
14451453 the destination and source operands, "fdiv" divides the destination operand by
14461454 the source operand and "fdivr" divides the source operand by the destination
14471455 operand. "fsubp", "fsubrp", "fmulp", "fdivp", "fdivrp" perform the same
14551463 "fchs" complements its sign bit, "fabs" clears its sign to create the absolute
14561464 value, "frndint" rounds it to the nearest integral value, depending on the
14571465 current rounding mode. "f2xm1" computes the exponential value of 2 to the
1458 power of ST0 and substracts the 1.0 from it, the value of ST0 must lie in the
1466 power of ST0 and subtracts the 1.0 from it, the value of ST0 must lie in the
14591467 range -1.0 to +1.0. All these instructions store the result in ST0 and have no
14601468 operands.
14611469 "fsincos" computes both the sine and the cosine of the value in ST0
14831491 operand can be a single or double precision value in memory or the FPU
14841492 register. When no operand is specified, ST1 is used as a source operand.
14851493
1486 fcom ; compare st0 with st1
1487 fcomp st2 ; compare st0 with st2 and pop stack
1494 fcom ; compare st0 with st1
1495 fcomp st2 ; compare st0 with st2 and pop stack
14881496
14891497 "fcompp" compares the contents of ST0 and ST1, sets flags in the FPU status
14901498 word according to the results and pops the register stack twice. This
15121520 should be ST0 register and the second operand specifies the source FPU
15131521 register.
15141522
1515 fcomi st2 ; compare st0 with st2 and set flags
1523 fcomi st2 ; compare st0 with st2 and set flags
15161524 fcmovb st0,st2 ; transfer st2 to st0 if below
15171525
15181526 Table 2.2 FPU conditions
15191527 /------------------------------------------------------\
1520 | Mnemonic | Condition tested | Description |
1528 | Mnemonic | Condition tested | Description |
15211529 |==========|==================|========================|
1522 | b | CF = 1 | below |
1523 | e | ZF = 1 | equal |
1524 | be | CF or ZF = 1 | below or equal |
1525 | u | PF = 1 | unordered |
1526 | nb | CF = 0 | not below |
1527 | ne | ZF = 0 | not equal |
1528 | nbe | CF and ZF = 0 | not below nor equal |
1529 | nu | PF = 0 | not unordered |
1530 | b | CF = 1 | below |
1531 | e | ZF = 1 | equal |
1532 | be | CF or ZF = 1 | below or equal |
1533 | u | PF = 1 | unordered |
1534 | nb | CF = 0 | not below |
1535 | ne | ZF = 0 | not equal |
1536 | nbe | CF and ZF = 0 | not below nor equal |
1537 | nu | PF = 0 | not unordered |
15301538 \------------------------------------------------------/
15311539
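The conditions from table 2.2 allow a branch-free selection between two FPU
registers. The sketch below keeps the smaller of ST0 and ST1 in ST0; it
assumes both registers already hold valid values and that the processor
supports "fcomi" and "fcmov":

```asm
; Sketch only: assumes st0 and st1 are already loaded with valid values.
        use32
        fcomi   st1           ; compare st0 with st1, set CF, ZF and PF
        fcmovnb st0,st1       ; if st0 is not below st1, copy st1 into st0
```

When the comparison is unordered, CF is set, so the move is not performed and
ST0 keeps its original value.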
15321540 "ftst" compares the value in ST0 with 0.0 and sets the flags in the FPU
15681576 "ffree" sets the tag associated with specified FPU register to empty. The
15691577 operand should be an FPU register.
15701578 "fincstp" and "fdecstp" rotate the FPU stack by one by adding or
1571 substracting one to the pointer of the top of stack. These instructions have no
1579 subtracting one to the pointer of the top of stack. These instructions have no
15721580 operands.
15731581
15741582
1575 2.1.14 MMX instructions
1583 2.1.14 MMX instructions
15761584
15771585 The MMX instructions operate on the packed integer types and use the MMX
15781586 registers, which are the low 64-bit parts of the 80-bit FPU registers. Because
15971605 source and destination operand and stored in the data elements of the
15981606 destination operand. "paddb", "paddw" and "paddd" perform the addition of
15991607 packed bytes, packed words, or packed double words. "psubb", "psubw" and
1600 "psubd" perform the substraction of appropriate types. "paddsb", "paddsw",
1601 "psubsb" and "psubsw" perform the addition or substraction of packed bytes
1608 "psubd" perform the subtraction of appropriate types. "paddsb", "paddsw",
1609 "psubsb" and "psubsw" perform the addition or subtraction of packed bytes
16021610 or packed words with the signed saturation. "paddusb", "paddusw", "psubusb",
16031611 "psubusw" are analogous, but with unsigned saturation. "pmulhw" and "pmullw"
16041612 perform a signed multiplication of the packed words and store the high or low
16441652 used before using the FPU instructions if any MMX instructions were used.
16451653
16461654
1647 2.1.15 SSE instructions
1655 2.1.15 SSE instructions
16481656
16491657 The SSE extension adds more MMX instructions and also introduces the
16501658 operations on packed single precision floating point values. The 128-bit
16951703 must be a SSE register and the operation is performed on single precision
16961704 values, only low double words of SSE registers are used in this case, the
16971705 result is stored in the low double word of destination register. "addps" and
1698 "addss" add the values, "subps" and "subss" substract the source value from
1706 "addss" add the values, "subps" and "subss" subtract the source value from
16991707 destination value, "mulps" and "mulss" multiply the values, "divps" and
17001708 "divss" divide the destination value by the source value, "rcpps" and "rcpss"
17011709 compute the approximate reciprocal of the source value, "sqrtps" and "sqrtss"
17281736
17291737 Table 2.3 SSE conditions
17301738 /-------------------------------------------\
1731 | Code | Mnemonic | Description |
1739 | Code | Mnemonic | Description |
17321740 |======|==========|=========================|
1733 | 0 | eq | equal |
1734 | 1 | lt | less than |
1735 | 2 | le | less than or equal |
1736 | 3 | unord | unordered |
1737 | 4 | neq | not equal |
1738 | 5 | nlt | not less than |
1739 | 6 | nle | not less than nor equal |
1740 | 7 | ord | ordered |
1741 | 0 | eq | equal |
1742 | 1 | lt | less than |
1743 | 2 | le | less than or equal |
1744 | 3 | unord | unordered |
1745 | 4 | neq | not equal |
1746 | 5 | nlt | not less than |
1747 | 6 | nle | not less than nor equal |
1748 | 7 | ord | ordered |
17411749 \-------------------------------------------/
17421750
17431751 "comiss" and "ucomiss" compare the single precision values and set the ZF,
18591867 of no specified size.
18601868
18611869
1862 2.1.16 SSE2 instructions
1870 2.1.16 SSE2 instructions
18631871
18641872 The SSE2 extension introduces the operations on packed double precision
18651873 floating point values, extends the syntax of MMX instructions, and adds also
19801988 is introduced, which performs the same operation as "pshufw", but on the
19811989 double words instead of words, it allows only the extended syntax.
19821990
1983 psubb xmm0,[esi] ; substract 16 packed bytes
1991 psubb xmm0,[esi] ; subtract 16 packed bytes
19841992 pextrw eax,xmm0,7 ; extract highest word into eax
19851993
19861994 "paddq" performs the addition of packed quad words, "psubq" performs the
1987 substraction of packed quad words, "pmuludq" performs an unsigned
1995 subtraction of packed quad words, "pmuludq" performs an unsigned
19881996 multiplication of low double words from each corresponding quad words and
19891997 returns the results in packed quad words. These instructions follow the same
19901998 rules for operands as the general MMX operations described in 2.1.14.
20202028 "lfence" instructions. These instructions have no operands.
20212029
20222030
2023 2.1.17 SSE3 instructions
2031 2.1.17 SSE3 instructions
20242032
20252033 Prescott technology introduced some new instructions to improve the performance
20262034 of SSE and SSE2 - this extension is called SSE3.
20412049 cacheline boundary. The destination operand has to be SSE register, the source
20422050 operand must be 128-bit memory location.
20432051 "addsubps" performs single precision addition of second and fourth pairs and
2044 single precision substracion of the first and third pairs of floating point
2052 single precision subtraction of the first and third pairs of floating point
20452053 values in the operands. "addsubpd" performs double precision addition of the
2046 second pair and double precision substraction of the first pair of floating
2054 second pair and double precision subtraction of the first pair of floating
20472055 point values in the operand. "haddps" performs the addition of two single
20482056 precision values within each quad word of source and destination operands,
20492057 and stores the results of such horizontal addition of values from destination
20682076 destination register). They operate on 16-bit or 32-bit chunks, respectively.
20692077 "phaddsw" performs the same operation on signed 16-bit packed values, but the
20702078 result of each addition is saturated. "phsubw" and "phsubd" analogously
2071 perform the horizontal substraction of 16-bit or 32-bit packed value, and
2072 "phsubsw" performs the horizontal substraction of signed 16-bit packed values
2079 perform the horizontal subtraction of 16-bit or 32-bit packed value, and
2080 "phsubsw" performs the horizontal subtraction of signed 16-bit packed values
20732081 with saturation.
20742082 "pabsb", "pabsw" and "pabsd" calculate the absolute value of each signed
20752083 packed value in source operand and store them into the destination
20992107 is the only SSSE3 instruction that takes three arguments.
21002108
21012109
2102 2.1.18 AMD 3DNow! instructions
2110 2.1.18 AMD 3DNow! instructions
21032111
21042112 The 3DNow! extension adds new MMX instructions to those described in 2.1.14,
21052113 and introduces operation on the 64-bit packed floating point values, each
21162124 double word in source operand are used. "pf2iw" converts packed floating
21172125 point values to packed word integers, results are extended to double words
21182126 using the sign extension. "pfadd" adds packed floating point values. "pfsub"
2119 and "pfsubr" substracts packed floating point values, the first one substracts
2120 source values from destination values, the second one substracts destination
2127 and "pfsubr" subtracts packed floating point values, the first one subtracts
2128 source values from destination values, the second one subtracts destination
21212129 values from the source values. "pfmul" multiplies packed floating point
21222130 values. "pfacc" adds the low and high floating point values of the destination
21232131 operand, storing the result in the low double word of destination, and adds
21242132 the low and high floating point values of the source operand, storing the
2125 result in the high double word of destination. "pfnacc" substracts the high
2133 result in the high double word of destination. "pfnacc" subtracts the high
21262134 floating point value of the destination operand from the low, storing the
2127 result in the low double word of destination, and substracts the high floating
2135 result in the low double word of destination, and subtracts the high floating
21282136 point value of the source operand from the low, storing the result in the high
2129 double word of destination. "pfpnacc" substracts the high floating point value
2137 double word of destination. "pfpnacc" subtracts the high floating point value
21302138 of the destination operand from the low, storing the result in the low double
21312139 word of destination, and adds the low and high floating point values of the
21322140 source operand, storing the result in the high double word of destination.
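
A short illustration of the accumulating instructions described above (the
registers and the memory operand are chosen arbitrarily for this sketch):

    pfacc mm0,mm1            ; low = sum of mm0 pair, high = sum of mm1 pair
    pfnacc mm2,[esi]         ; low = mm2 low - mm2 high, high = memory low - memory high
    pfpnacc mm3,mm4          ; low = mm3 low - mm3 high, high = mm4 low + mm4 high
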
21562164 operands.
21572165
21582166
2159 2.1.19 The x86-64 long mode instructions
2167 2.1.19 The x86-64 long mode instructions
21602168
21612169 The AMD64 and EM64T architectures (we will use the common name x86-64 for them
21622170 both) extend the x86 instruction set for the 64-bit processing. While legacy
21742182
21752183 Table 2.4 New registers in long mode
21762184 /--------------------------------------------------\
2177 | Type | General | SSE | AVX |
2185 | Type | General | SSE | AVX |
21782186 |------|---------------------------|-------|-------|
2179 | Bits | 8 | 16 | 32 | 64 | 128 | 256 |
2187 | Bits | 8 | 16 | 32 | 64 | 128 | 256 |
21802188 |======|======|======|======|======|=======|=======|
2181 | | | | | rax | | |
2182 | | | | | rcx | | |
2183 | | | | | rdx | | |
2184 | | | | | rbx | | |
2185 | | spl | | | rsp | | |
2186 | | bpl | | | rbp | | |
2187 | | sil | | | rsi | | |
2188 | | dil | | | rdi | | |
2189 | | r8b | r8w | r8d | r8 | xmm8 | ymm8 |
2190 | | r9b | r9w | r9d | r9 | xmm9 | ymm9 |
2191 | | r10b | r10w | r10d | r10 | xmm10 | ymm10 |
2192 | | r11b | r11w | r11d | r11 | xmm11 | ymm11 |
2193 | | r12b | r12w | r12d | r12 | xmm12 | ymm12 |
2194 | | r13b | r13w | r13d | r13 | xmm13 | ymm13 |
2195 | | r14b | r14w | r14d | r14 | xmm14 | ymm14 |
2196 | | r15b | r15w | r15d | r15 | xmm15 | ymm15 |
2189 | | | | | rax | | |
2190 | | | | | rcx | | |
2191 | | | | | rdx | | |
2192 | | | | | rbx | | |
2193 | | spl | | | rsp | | |
2194 | | bpl | | | rbp | | |
2195 | | sil | | | rsi | | |
2196 | | dil | | | rdi | | |
2197 | | r8b | r8w | r8d | r8 | xmm8 | ymm8 |
2198 | | r9b | r9w | r9d | r9 | xmm9 | ymm9 |
2199 | | r10b | r10w | r10d | r10 | xmm10 | ymm10 |
2200 | | r11b | r11w | r11d | r11 | xmm11 | ymm11 |
2201 | | r12b | r12w | r12d | r12 | xmm12 | ymm12 |
2202 | | r13b | r13w | r13d | r13 | xmm13 | ymm13 |
2203 | | r14b | r14w | r14d | r14 | xmm14 | ymm14 |
2204 | | r15b | r15w | r15d | r15 | xmm15 | ymm15 |
21972205 \--------------------------------------------------/
21982206
21992207 In general any instruction from x86 architecture, which allowed 16-bit or
22032211 registers. Below are the samples of new operations possible in long mode on the
22042212 example of "mov" instruction:
22052213
2206 mov rax,r8 ; transfer 64-bit general register
2214 mov rax,r8 ; transfer 64-bit general register
22072215 mov al,[rbx] ; transfer memory addressed by 64-bit register
22082216
22092217 The long mode uses also the instruction pointer based addresses, you can
22832291 and "wrmsr" instructions.
22842292
22852293
2286 2.1.20 SSE4 instructions
2294 2.1.20 SSE4 instructions
22872295
22882296 There are actually three different sets of instructions under the name SSE4.
22892297 Intel designed two of them, SSE4.1 and SSE4.2, with latter extending the
24202428 destination operand, the source can be 64-bit memory or SSE register.
24212429
24222430 pmovzxbq xmm0,word [si] ; zero-extend bytes to quad words
2423 pmovsxwq xmm0,xmm1 ; sign-extend words to quad words
2431 pmovsxwq xmm0,xmm1 ; sign-extend words to quad words
24242432
24252433 "movntdqa" loads double quad word from the source operand to the destination
24262434 using a non-temporal hint. The destination operand should be SSE register,
24502458 also be a 64-bit general purpose register, and the source operand in such case
24512459 can be a byte or quad word register or memory location.
24522460
2453 crc32 eax,dl ; accumulate CRC32 on byte value
2461 crc32 eax,dl ; accumulate CRC32 on byte value
24542462 crc32 eax,word [ebx] ; accumulate CRC32 on word value
24552463 crc32 rax,qword [rbx] ; accumulate CRC32 on quad word value
24562464
24602468 the same size as source operand. The 64-bit variant is available only in long
24612469 mode.
24622470
2463 popcnt ecx,eax ; count bits set to 1
2471 popcnt ecx,eax ; count bits set to 1
24642472
24652473 The SSE4a extension, which also includes the "popcnt" instruction introduced
24662474 by SSE4.2, at the same time adds the "lzcnt" instruction, which follows the
24752483 is no third operand in such case), which should contain position value in bits
24762484 8-13 and length of bit string in bits 0-5.
24772485
2478 extrq xmm0,8,7 ; extract 8 bits from position 7
2479 extrq xmm0,xmm5 ; extract bits defined by register
2486 extrq xmm0,8,7 ; extract 8 bits from position 7
2487 extrq xmm0,xmm5 ; extract bits defined by register
24802488
24812489 "insertq" writes the sequence of bits from the low quad word of the source
24822490 operand into specified position in low quad word of the destination operand,
24882496 string in bits 64-69.
24892497
24902498 insertq xmm1,xmm0,4,2 ; insert 4 bits at position 2
2491 insertq xmm1,xmm0 ; insert bits defined by register
2499 insertq xmm1,xmm0 ; insert bits defined by register
24922500
24932501 "movntss" and "movntsd" store single or double precision floating point
24942502 value from the source SSE register into 32-bit or 64-bit destination memory
24952503 location respectively, using non-temporal hint.
24962504
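A minimal sample of such stores, with registers and addresses chosen only for
illustration:

    movntss [edi],xmm3       ; store low single precision value, non-temporal
    movntsd [edi+8],xmm3     ; store low double precision value, non-temporal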
24972505
2498 2.1.21 AVX instructions
2506 2.1.21 AVX instructions
24992507
25002508 The Advanced Vector Extensions introduce instructions that are new variants
25012509 of SSE instructions, with new scheme of encoding that allows extended syntax
25122520 the remaining bits of first source SSE register are copied into the
25132521 destination register.
25142522
2515 vsubss xmm0,xmm2,xmm3 ; substract two 32-bit floats
2523 vsubss xmm0,xmm2,xmm3 ; subtract two 32-bit floats
25162524 vmulsd xmm0,xmm7,qword [esi] ; multiply two 64-bit floats
25172525
25182526 In case of packed operations, each instruction can also operate on the 256-bit
25262534 with three operands, however they are only allowed to operate on 128-bit
25272535 packed types and thus cannot use the whole AVX registers.
25282536
2529 vpavgw xmm3,xmm0,xmm2 ; average of 16-bit integers
2530 vpslld xmm1,xmm0,1 ; shift double words left
2537 vpavgw xmm3,xmm0,xmm2 ; average of 16-bit integers
2538 vpslld xmm1,xmm0,1 ; shift double words left
25312539
25322540 If the SSE version of instruction had a syntax with three operands, the third
25332541 one being an immediate value, the AVX version of such instruction takes four
25342542 operands, with immediate remaining the last one.
25352543
25362544 vshufpd ymm0,ymm1,ymm2,10010011b ; shuffle 64-bit floats
2537 vpalignr xmm0,xmm4,xmm2,3 ; extract byte aligned value
2545 vpalignr xmm0,xmm4,xmm2,3 ; extract byte aligned value
25382546
25392547 The promotion to new syntax according to the rules described above has been
25402548 applied to all the instructions from SSE extensions up to SSE4, with the
25452553 "vrsqrtps", which can operate on 256-bit data size, but retained the syntax
25462554 with only two operands, because they use data from only one source:
25472555
2548 vsqrtpd ymm1,ymm0 ; put square roots into other register
2556 vsqrtpd ymm1,ymm0 ; put square roots into other register
25492557
25502558 In a similar way "vroundpd" and "vroundps" retained the syntax with three
2551 operands, the last one being immediate value.
2559 operands, the last one being immediate value.
25522560
25532561 vroundps ymm0,ymm1,0011b ; round toward zero
2554
2562
25552563 Also some of the operations on packed integers kept their two-operand or
25562564 three-operand syntax while being promoted to AVX version. In such case these
25572565 instructions follow exactly the same rules for operands as their SSE
25742582 syntax from SSE without any changes, and also allows a new form with 256-bit
25752583 operands in place of 128-bit ones.
25762584
2577 vmovups [edi],ymm6 ; store unaligned 256-bit data
2585 vmovups [edi],ymm6 ; store unaligned 256-bit data
25782586
25792587 "vmovddup" has the identical 128-bit syntax as its SSE version, and it also
25802588 has a 256-bit version, which stores the duplicates of the lowest quad word
26002608 either low or high quad word replaced with value from second source (the
26012609 memory operand).
26022610
2603 vmovhps [esi],xmm7 ; store upper half to memory
2611 vmovhps [esi],xmm7 ; store upper half to memory
26042612 vmovlps xmm0,xmm7,[ebx] ; low from memory, rest from register
26052613
26062614 "vmovss" and "vmovsd" have syntax identical to their SSE equivalents as long
26092617 in destination is then the value copied from first source with lowest data
26102618 element replaced with the lowest value from second source.
26112619
2612 vmovss xmm3,[edi] ; low from memory, rest zeroed
2620 vmovss xmm3,[edi] ; low from memory, rest zeroed
26132621 vmovss xmm0,xmm1,xmm2 ; one value from xmm2, three from xmm1
26142622
26152623 "vcvtss2sd", "vcvtsd2ss", "vcvtsi2ss" and "vcvtsi2sd" use the three-operand
26262634 128-bit memory as source. Analogously "vcvtpd2dq", "vcvttpd2dq" and
26272635 "vcvtpd2ps", in addition to variant with syntax identical to SSE version,
26282636 allow a variant with SSE register as destination and AVX register or 256-bit
2629 memory as source.
2637 memory as source.
26302638 "vinsertps", "vpinsrb", "vpinsrw", "vpinsrd", "vpinsrq" and "vpblendw" use
26312639 a syntax with four operands, where destination and first source have to be SSE
26322640 registers, and the third and fourth operand follow the same rules as second
26462654 first source with some data elements replaced, according to mask, by values
26472655 from the second source.
26482656
2649 vblendvps ymm3,ymm1,ymm2,ymm7 ; blend according to mask
2657 vblendvps ymm3,ymm1,ymm2,ymm7 ; blend according to mask
26502658
26512659 "vptest" allows the same syntax as its SSE version and also has a 256-bit
26522660 version, with both operands doubled in size. There are also two new
26562664 "vptest".
26572665
26582666 vptest ymm0,yword [ebx] ; test 256-bit values
2659 vtestpd xmm0,xmm1 ; test sign bits of 64-bit floats
2667 vtestpd xmm0,xmm1 ; test sign bits of 64-bit floats
26602668
26612669 "vbroadcastss", "vbroadcastsd" and "vbroadcastf128" are new instructions,
26622670 which broadcast the data element defined by source operand into all elements
26662674 destination. "vbroadcastf128" requires 128-bit memory as source, and AVX
26672675 register as destination.
26682676
2669 vbroadcastss ymm0,dword [eax] ; get eight copies of value
2677 vbroadcastss ymm0,dword [eax] ; get eight copies of value
26702678
26712679 "vinsertf128" is the new instruction, which takes four operands. The
26722680 destination and first source have to be AVX registers, second source can be
26872695 data (AVX registers). Either destination or second source has to be a memory
26882696 location of appropriate size, the two other operands should be registers.
26892697
2690 vmaskmovps [edi],xmm0,xmm5 ; conditionally store
2691 vmaskmovpd ymm5,ymm0,[esi] ; conditionally load
2698 vmaskmovps [edi],xmm0,xmm5 ; conditionally store
2699 vmaskmovpd ymm5,ymm0,[esi] ; conditionally load
26922700
26932701 "vpermilpd" and "vpermilps" are the new instructions with three operands
26942702 that permute the values from first source according to the control fields from
27132721 instructions. The rules for their operands remain unchanged.
27142722
27152723
2716 2.1.22 AVX2 instructions
2724 2.1.22 AVX2 instructions
27172725
27182726 The AVX2 extension allows all the AVX instructions operating on packed integers
27192727 to use 256-bit data types, and introduces some new instructions as well.
27222730 rules became analogous to AVX instructions operating on packed floating point
27232731 types.
27242732
2725 vpsubb ymm0,ymm0,[esi] ; substract 32 packed bytes
2733 vpsubb ymm0,ymm0,[esi] ; subtract 32 packed bytes
27262734 vpavgw ymm3,ymm0,ymm2 ; average of 16-bit integers
27272735
27282736 However there are some instructions that have not been equipped with the
27342742 amount to be SSE register or 128-bit memory location, use the same rules
27352743 for the third operand in their 256-bit variant.
27362744
2737 vpsllw ymm2,ymm2,xmm4 ; shift words left
2745 vpsllw ymm2,ymm2,xmm4 ; shift words left
27382746 vpsrad ymm0,ymm3,xword [ebx] ; shift double words right
27392747
27402748 There are also new packed shift instructions with standard three-operand AVX
27492757 256-bit variant need memory of that size doubled or SSE register as source and
27502758 AVX register as destination.
27512759
2752 vpmovzxbq ymm0,dword [esi] ; bytes to quad words
2760 vpmovzxbq ymm0,dword [esi] ; bytes to quad words
27532761
27542762 Also "vmovntdqa" has been upgraded with 256-bit variant, so it allows to
27552763 transfer 256-bit value from memory to AVX register, it needs memory address
27712779 element.
27722780
27732781 vpbroadcastb ymm0,byte [ebx] ; get 32 identical bytes
2774
2782
27752783 "vpermd" and "vpermps" are new three-operand instructions, which use each
27762784 32-bit element from first source as an index of element in second source which
27772785 is copied into destination at position corresponding to element containing
27812789 indexes from the immediate value specified as third operand to determine which
27822790 element from source store at given position in destination. The destination
27832791 has to be AVX register, source can be AVX register or 256-bit memory, and the
2784 third operand must be 8-bit immediate value.
2792 third operand must be 8-bit immediate value.
27852793 The family of new instructions performing "gather" operation have special
27862794 syntax, as in their memory operand they use addressing mode that is unique to
27872795 them. The base of address can be a 32-bit or 64-bit general purpose register
28372845 respectively.
28382846
28392847
2840 2.1.23 Auxiliary sets of computational instructions
2848 2.1.23 Auxiliary sets of computational instructions
28412849
28422850 There is a number of additional instruction set extensions related to
28432851 AVX. They introduce new vector instructions (and sometimes also their SSE
28842892 The mnemonic of FMA instruction is obtained by appending to "vf" prefix: first
28852893 either "m" or "nm" to select whether result of multiplication should be taken
28862894 as-is or negated, then either "add" or "sub" to select whether third value
2887 will be added to the product or substracted from the product, then either
2895 will be added to the product or subtracted from the product, then either
28882896 "132", "213" or "231" to select which source operands are multiplied and which
2889 one is added or substracted, and finally the type of data on which the
2897 one is added or subtracted, and finally the type of data on which the
28902898 instruction operates, either "ps", "pd", "ss" or "sd". As it was with SSE
28912899 instructions promoted to AVX, instructions operating on packed floating point
28922900 values allow 128-bit or 256-bit syntax, in former all the operands are SSE
28962904 SSE registers, and the third operand can also be a memory, either 32-bit for
28972905 single precision or 64-bit for double precision.
28982906
2899 vfmsub231ps ymm1,ymm2,ymm3 ; multiply and substract
2900 vfnmadd132sd xmm0,xmm5,[ebx] ; multiply, negate and add
2907 vfmsub231ps ymm1,ymm2,ymm3 ; multiply and subtract
2908 vfnmadd132sd xmm0,xmm5,[ebx] ; multiply, negate and add
29012909
29022910 In addition to the instructions created by the rule described above, there are
29032911 families of instructions with mnemonics starting with either "vfmaddsub" or
29042912 "vfmsubadd", followed by either "132", "213" or "231" and then either "ps" or
29052913 "pd" (the operation must always be on packed values in this case). They add
2906 to the result of multiplication or substract from it depending on the position
2914 to the result of multiplication or subtract from it depending on the position
29072915 of value in packed data - instructions from the "vfmaddsub" group add when the
2908 position is odd and substract when the position is even, instructions from the
2916 position is odd and subtract when the position is even, instructions from the
29092917 "vfmsubadd" group add when the position is even and subtract when the
29102918 position is odd. The rules for operands are the same as for other FMA
29112919 instructions.
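
A brief sketch of these alternating forms (the operands are arbitrary and the
digits in the mnemonic select the multiplied operands in the same way as for
the other FMA instructions):

    vfmaddsub213pd ymm0,ymm1,[ebx] ; ymm1*ymm0, then add or subtract memory values
    vfmsubadd231ps xmm2,xmm3,xmm4  ; xmm3*xmm4, then subtract or add values of xmm2
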
29152923 out, as having separate destination operand makes such selection of operands
29162924 superfluous. The multiplication is always performed on values from the first
29172925 and second source, and then the value from third source is added or
2918 substracted. Either second or third source can be a memory operand, and the
2926 subtracted. Either second or third source can be a memory operand, and the
29192927 rules for the sizes of operands are the same as for FMA instructions.
29202928
2921 vfmaddpd ymm0,ymm1,[esi],ymm2 ; multiply and add
2922 vfmsubss xmm0,xmm1,xmm2,[ebx] ; multiply and substract
2929 vfmaddpd ymm0,ymm1,[esi],ymm2 ; multiply and add
2930 vfmsubss xmm0,xmm1,xmm2,[ebx] ; multiply and subtract
29232931
29242932 The F16C extension consists of two instructions, "vcvtps2ph" and
29252933 "vcvtph2ps", which convert floating point values between single precision and
29422950 on a solitary double precision value and 32-bit for operation on a solitary
29432951 single precision value).
29442952
2945 vfrczps ymm0,[esi] ; load fractional parts
2953 vfrczps ymm0,[esi] ; load fractional parts
29462954
29472955 "vpcmov" copies bits from either first or second source into destination
29482956 depending on the values of corresponding bits in the fourth operand (the
29702978 of comparison encoded within the instruction name by inserting the comparison
29712979 mnemonic after "vpcom".
29722980
2973 vpcomb xmm0,xmm1,xmm2,4 ; test for equal bytes
2974 vpcomgew xmm0,xmm1,[ebx] ; compare signed words
2981 vpcomb xmm0,xmm1,xmm2,4 ; test for equal bytes
2982 vpcomgew xmm0,xmm1,[ebx] ; compare signed words
29752983
29762984 Table 2.5 XOP comparisons
29772985 /-------------------------------------------\
2978 | Code | Mnemonic | Description |
2986 | Code | Mnemonic | Description |
29792987 |======|==========|=========================|
2980 | 0 | lt | less than |
2981 | 1 | le | less than or equal |
2982 | 2 | gt | greater than |
2983 | 3 | ge | greater than or equal |
2984 | 4 | eq | equal |
2985 | 5 | neq | not equal |
2986 | 6 | false | false |
2987 | 7 | true | true |
2988 | 0 | lt | less than |
2989 | 1 | le | less than or equal |
2990 | 2 | gt | greater than |
2991 | 3 | ge | greater than or equal |
2992 | 4 | eq | equal |
2993 | 5 | neq | not equal |
2994 | 6 | false | false |
2995 | 7 | true | true |
29882996 \-------------------------------------------/
29892997
29902998 "vpermil2ps" and "vpermil2pd" set the elements in destination register to
30063014 64-bit results, "vphaddwd" and "vphadduwd" add pairs of words to 32-bit
30073015 results, "vphaddwq" and "vphadduwq" sum all words in each four-word block to
30083016 64-bit results, "vphadddq" and "vphaddudq" add pairs of double words to 64-bit
3009 results. "vphsubbw" substracts in each two-byte block the byte at higher
3017 results. "vphsubbw" subtracts in each two-byte block the byte at higher
30103018 position from the one at lower position, and stores the result as a signed
30113019 16-bit value at the corresponding position in destination, "vphsubwd"
3012 substracts in each two-word block the word at higher position from the one at
3013 lower position and makes signed 32-bit results, "vphsubdq" substract in each
3020 subtracts in each two-word block the word at higher position from the one at
3021 lower position and makes signed 32-bit results, "vphsubdq" subtracts in each
30143022 block of two double words the one at higher position from the one at lower
30153023 position and makes signed 64-bit results. Each of these instructions takes
30163024 two operands, the destination being SSE register, and the source being SSE
30173025 register or 128-bit memory.
30183026
3019 vphadduwq xmm0,xmm1 ; sum quadruplets of words
3027 vphadduwq xmm0,xmm1 ; sum quadruplets of words
30203028
30213029 "vpmacsww" and "vpmacssww" multiply the corresponding signed 16-bit values
30223030 from the first and second source and then add the products to the parallel
30543062 memory (or they can be SSE registers both) and the other operands have to be
30553063 SSE registers.
30563064
3057 vpshld xmm3,xmm1,[ebx] ; shift bytes from xmm1
3065 vpshld xmm3,xmm1,[ebx] ; shift double words from xmm1
30583066
30593067 "vpshab", "vpshaw", "vpshad" and "vpshaq" arithmetically shift bytes, words,
30603068 double words or quad words. These instructions follow the same rules as the
30633071 shifts, but additionally allow third operand to be immediate value, in which
30643072 case the same amount of rotation is specified for all the elements in source.
30653073
3066 vprotb xmm0,[esi],3 ; rotate bytes to the left
3074 vprotb xmm0,[esi],3 ; rotate bytes to the left
30673075
30683076 The MOVBE extension introduces just one new instruction, "movbe", which
30693077 swaps bytes in value from source before storing it in destination, so can
30813089 the first source have to be general registers, the second source can be
30823090 general register or memory.
30833091
3084 andn edx,eax,[ebx] ; bit-multiply inverted eax with memory
3092 andn edx,eax,[ebx] ; bit-multiply inverted eax with memory
30853093
30863094 "bextr" extracts from the first source the sequence of bits using an index
30873095 and length specified by bit fields in the second source operand and stores
30963104 bits in destination to zero. The destination must be a general register,
30973105 the source can be general register or memory.
30983106
3099 blsi rax,r11 ; isolate the lowest set bit
3107 blsi rax,r11 ; isolate the lowest set bit
31003108
31013109 "blsmsk" sets all the bits in the destination up to the lowest set bit in
31023110 the source, including this bit. "blsr" copies all the bits from the source to
31143122 "pdep" performs the reverse operation - it takes sequence of bits from the
31153123 first source and puts them consecutively at the positions where the bits in
31163124 second source are set, setting all the other bits in destination to zero.
3117 These BMI2 instructions follow the same rules for operands as "andn".
3125 These BMI2 instructions follow the same rules for operands as "andn".
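
A small sketch of "pdep" with sample values (the values and registers are
chosen only for illustration):

    mov ebx,101b             ; bits to deposit, taken from the lowest upwards
    mov ecx,11010b           ; mask selecting positions 1, 3 and 4
    pdep eax,ebx,ecx         ; eax = 10010b
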
31183126 "mulx" is a BMI2 instruction which performs an unsigned multiplication of
31193127 value from EDX or RDX register (depending on the size of specified operands)
31203128 by the value from third operand, and stores the low half of result in the
31223130 it without affecting the flags. The third operand can be general register or
31233131 memory, and both the destination operands have to be general registers.
31243132
3125 mulx edx,eax,ecx ; multiply edx by ecx into edx:eax
3133 mulx edx,eax,ecx ; multiply edx by ecx into edx:eax
31263134
31273135 "shlx", "shrx" and "sarx" are BMI2 instructions, which perform logical or
31283136 arithmetical shifts of value from first source by the amount specified by
31343142 has to be general register, the source operand can be general register or
31353143 memory, and the third operand has to be an immediate value.
31363144
3137 rorx eax,edx,7 ; rotate without affecting flags
3138
3145 rorx eax,edx,7 ; rotate without affecting flags
3146
31393147 The TBM is an extension designed by AMD to supplement the BMI set. The
31403148 "bextr" instruction is extended with a new form, in which second source is
31413149 a 32-bit immediate value. "blsic" is a new instruction which performs the
31503158 "tzmsk" finds the lowest set bit in value from source operand, sets all bits
31513159 below it to 1 and all the rest of bits to zero, then writes the result to
31523160 destination. "t1mskc" finds the least significant zero bit in the value from
3153 source operand, sets the bits below it to zero and all the other bits to 1,
3161 source operand, sets the bits below it to zero and all the other bits to 1,
31543162 and writes the result to destination. These instructions have the same rules
31553163 for operands as "blsi".
31563164
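For example (registers chosen arbitrarily for this sketch):

    tzmsk eax,ebx            ; ones below the lowest set bit of ebx, zeros elsewhere
    t1mskc ecx,[esi]         ; zeros below the lowest zero bit, ones elsewhere
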
31573165
3158 2.1.24 AVX-512 instructions
3159
3160 [This section has not been written yet.]
3161
3162
3163 2.1.25 Other extensions of instruction set
3166 2.1.24 AVX-512 instructions
3167
3168 The AVX-512 introduces 512-bit vector registers, which extend the 256-bit
3169 registers used by AVX and AVX2. It also extends the set of vector registers
3170 from 16 to 32, with the additional registers "zmm16" to "zmm31", their low
3171 256-bit portions "ymm16" to "ymm31" and their low 128-bit portions "xmm16"
3172 to "xmm31". These additional registers can only be accessed in the long mode.
3173
3174 Table 2.6 New registers available in long mode with AVX-512
3175 /------------------------------------------------------------------\
3176 | Size | Registers |
3177 |---------|--------------------------------------------------------|
3178 | 128-bit | xmm16 xmm17 xmm18 xmm19 xmm20 xmm21 xmm22 xmm23 |
3179 | | xmm24 xmm25 xmm26 xmm27 xmm28 xmm29 xmm30 xmm31 |
3180 |---------|--------------------------------------------------------|
3181 | 256-bit | ymm16 ymm17 ymm18 ymm19 ymm20 ymm21 ymm22 ymm23 |
3182 | | ymm24 ymm25 ymm26 ymm27 ymm28 ymm29 ymm30 ymm31 |
3183 |---------|--------------------------------------------------------|
3184 | 512-bit | zmm16 zmm17 zmm18 zmm19 zmm20 zmm21 zmm22 zmm23 |
3185 | | zmm24 zmm25 zmm26 zmm27 zmm28 zmm29 zmm30 zmm31 |
3186 \------------------------------------------------------------------/
3187
3188 In addition to new operand sizes and registers, the AVX-512 introduces
3189 a number of supplementary settings that can be included in the operands
3190 of AVX instructions.
3191 The destination operand of most AVX instructions can be followed
3192 by the name of an opmask register enclosed in braces; this modifier
3193 specifies a mask that decides which units of data in the destination
3194 operand are going to be updated. The "k0" register cannot be used as a
3195 destination mask. This setting can be further followed by the "{z}" modifier
3196 to specify that the data units not selected by the mask should be zeroed
3197 instead of being left unchanged.
3198
3199 vaddpd zmm1{k1},zmm5,zword [rsi] ; update selected floats
3200 vaddps ymm6{k1}{z},ymm12,ymm24 ; update selected, zero other ones
3201
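The difference between the two masking forms can be sketched as follows. This is an illustrative fragment, not part of the manual text: the mask value 00FFh, the register choices and the use of "kmovw" to load the opmask register are assumptions made for the example.

```
        use64
        mov     eax,00FFh               ; assumed mask: select the low 8 of 16 floats
        kmovw   k1,eax                  ; load the mask into opmask register k1
        vaddps  zmm0{k1},zmm1,zmm2      ; floats 0-7 updated, floats 8-15 of zmm0 kept
        vaddps  zmm3{k1}{z},zmm1,zmm2   ; floats 0-7 updated, floats 8-15 of zmm3 zeroed
```

Only the "{z}" form changes the unselected units; without it they keep whatever the destination held before.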
3202 When an instruction that operates on packed data has a source operand
3203 loaded from memory, the memory location may be just a single unit of data
3204 and the source used for the operation is created by broadcasting this
3205 value into all the units within the required size. To specify that such
3206 broadcasting method is used the memory operand should be followed by one
3207 of the "{1to2}", "{1to4}", "{1to8}", "{1to16}", "{1to32}" and "{1to64}"
3208 modifiers, selecting the appropriate multiple of a unit.
3209
3210 vsubps zmm1,zmm2,dword [rsi] {1to16} ; subtract from all floats
3211
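As a rough guide to picking the modifier, the multiple is the operation size divided by the unit size. The fragment below is a sketch with arbitrarily chosen registers; the 256-bit and 128-bit forms additionally assume a processor with the AVX-512VL extension.

```
        use64
        vaddps  zmm1,zmm2,dword [rsi] {1to16}   ; 512 bits / 32-bit float  = 16
        vaddpd  zmm1,zmm2,qword [rsi] {1to8}    ; 512 bits / 64-bit double = 8
        vaddps  ymm1,ymm2,dword [rsi] {1to8}    ; 256 bits / 32-bit float  = 8
        vaddpd  xmm1,xmm2,qword [rsi] {1to2}    ; 128 bits / 64-bit double = 2
```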
3212 When an instruction does not use a memory operand, an additional
3213 operand may often follow the source operands, containing the rounding mode
3214 specifier. When an instruction has variants that operate on different
3215 sizes of data, the rounding mode can be specified only when the
3216 register operands are 512-bit.
3217
3218 vdivps zmm2,zmm3,zmm5,{ru-sae} ; round results up
3219
3220 Table 2.7 AVX-512 rounding modes
3221 /----------------------------------------------------------\
3222 | Operand | Description |
3223 |==========|===============================================|
3224 | {rn-sae} | round to nearest and suppress all exceptions |
3225 | {rd-sae} | round down and suppress all exceptions |
3226 | {ru-sae} | round up and suppress all exceptions |
3227 | {rz-sae} | round toward zero and suppress all exceptions |
3228 \----------------------------------------------------------/
3229
3230 Some of the instructions do not use a rounding mode but still allow
3231 the exception suppression option to be specified with the "{sae}" modifier in the
3232 additional operand.
3233
3234 vmaxpd zmm0,zmm1,zmm2,{sae} ; suppress all exceptions
3235
3236 The family of "gather" instructions in their AVX-512 variants use a new
3237 syntax with only two operands. The opmask register takes the role which
3238 was played by the third operand in the AVX2 syntax and it is mandatory
3239 in this case.
3240
3241 vgatherdps xmm0{k1},[eax+xmm1] ; gather four floats
3242 vgatherdpd zmm0{k3},[ymm3*8] ; gather eight doubles
3243
3244 The new family of "scatter" instructions performs the reverse operation of
3245 "gather". They also take two operands: the destination is a
3246 memory with vector indexing and opmask modifier, and the source is a vector
3247 register.
3248
3249 vscatterdps [eax+xmm1]{k1},xmm0 ; scatter four floats
3250 vscatterdpd [ymm3*8]{k3},zmm0 ; scatter eight doubles
3251
3252
3253 2.1.25 Other extensions of instruction set
31643254
31653255 There is a number of additional instruction set extensions recognized by flat
31663256 assembler, and the general syntax of the instructions introduced by those
31673257 extensions is provided here. For detailed information on the operations
31683258 performed by them, check out the manuals from Intel (for the VMX, SMX, XSAVE,
3169 RDRAND, FSGSBASE, INVPCID, HLE and RTM extensions) or AMD (for the SVM
3259 RDRAND, FSGSBASE, INVPCID, HLE, RTM, and MPX extensions) or AMD (for the SVM
31703260 extension).
31713261 The Virtual-Machine Extensions (VMX) provide a set of instructions for the
31723262 management of virtual machines. The "vmxon" instruction, which enters the VMX
32573347 an 8-bit immediate value as its only operand, this value is passed in the
32583348 highest bits of EAX to the fallback routine. "xtest" checks whether there is
32593349 transactional execution in progress, this instruction takes no operands.
3350 The MPX extension adds instructions that operate on new bounds registers
3351 and aid in checking the memory references. For some of these instructions
3352 flat assembler allows a special syntax that gives fine control over their
3353 operation, where an address of a memory operand is separated into two parts
3354 with a comma. With "bndmk" instruction the first part of such address specifies
3355 the lower bound and the second one the upper bound. The lower bound can be
3356 either zero or a register, the upper bound can be any address that uses no more
3357 than one register (multiplied by 1, 2, 4, or 8). The addressing registers need
3358 to be 64-bit when in long mode, and 32-bit otherwise.
3359
3360 bndmk bnd0,[rbx,100000h] ; lower bound in register, upper directly
3361 bndmk bnd1,[0,rbx] ; lower bound zero, upper in register
3362
3363 In case of "bndldx" and "bndstx", the first part of memory operand specifies an
3364 address used to access a bound table entry, while the second part is either zero
3365 or a register that plays the role of an additional operand for such instruction.
3366 The address in the first part may use no more than one register and the register
3367 cannot be multiplied by a number other than 1.
3368
3369 bndstx [rcx,rsi],bnd3 ; store bnd3 and rsi at rcx in the bound table
3370 bndldx bnd2,[rcx,rsi] ; load from bound table if entry matches rsi
32603371
32613372
32623373 2.2 Control directives
33573468 defined somewhere in source:
33583469
33593470 if count>0
3360 mov cx,count
3361 rep movsb
3471 mov cx,count
3472 rep movsb
33623473 end if
33633474
33643475 These two assembly instructions will be assembled only if the "count" constant
33653476 is greater than 0. The next sample shows more complex conditional structure:
33663477
33673478 if count & ~ count mod 4
3368 mov cx,count/4
3369 rep movsd
3479 mov cx,count/4
3480 rep movsd
33703481 else if count>4
3371 mov cx,count/4
3372 rep movsd
3373 mov cx,count mod 4
3374 rep movsb
3482 mov cx,count/4
3483 rep movsd
3484 mov cx,count mod 4
3485 rep movsb
33753486 else
3376 mov cx,count
3377 rep movsb
3487 mov cx,count
3488 rep movsb
33783489 end if
33793490
33803491 The first block of instructions gets assembled when the "count" is non zero and
34223533 for example:
34233534
34243535 repeat 8
3425 mov byte [bx],%
3426 inc bx
3536 mov byte [bx],%
3537 inc bx
34273538 end repeat
34283539
34293540 The generated code will store byte values from one to eight in the memory
34363547
34373548 s = x/2
34383549 repeat 100
3439 if x/s = s
3440 break
3441 end if
3442 s = (s+x/s)/2
3550 if x/s = s
3551 break
3552 end if
3553 s = (s+x/s)/2
34433554 end repeat
34443555
34453556 The "while" directive repeats the block of instructions as long as the
34543565
34553566 s = x/2
34563567 while x/s <> s
3457 s = (s+x/s)/2
3458 if % = 100
3459 break
3460 end if
3568 s = (s+x/s)/2
3569 if % = 100
3570 break
3571 end if
34613572 end while
34623573
34633574 The blocks defined with "if", "repeat" and "while" can be nested in any
35033614 generated in current addressing space you can use such block of directives:
35043615
35053616 repeat $-$$
3506 load a byte from $$+%-1
3507 store byte a xor c at $$+%-1
3617 load a byte from $$+%-1
3618 store byte a xor c at $$+%-1
35083619 end repeat
35093620
35103621 and each byte of code will be xored with the value defined by "c" constant.
35213632
35223633 GDTR dp ?
35233634 virtual at GDTR
3524 GDT_limit dw ?
3525 GDT_address dd ?
3635 GDT_limit dw ?
3636 GDT_address dd ?
35263637 end virtual
35273638
35283639 It defines two labels for parts of the 48-bit variable at "GDTR" address.
35303641 register, for example:
35313642
35323643 virtual at bx
3533 LDT_limit dw ?
3534 LDT_address dd ?
3644 LDT_limit dw ?
3645 LDT_address dd ?
35353646 end virtual
35363647
35373648 With such definition instruction "mov ax,[LDT_limit]" will be assembled
35443655 example:
35453656
35463657 virtual at 0
3547 xor eax,eax
3548 and edx,eax
3549 load zeroq dword from 0
3658 xor eax,eax
3659 and edx,eax
3660 load zeroq dword from 0
35503661 end virtual
35513662
35523663 The above piece of code will define the "zeroq" constant containing four bytes
35553666 For example this code:
35563667
35573668 virtual at 0
3558 file 'a.txt':10h,1
3559 load char from 0
3669 file 'a.txt':10h,1
3670 load char from 0
35603671 end virtual
35613672
35623673 loads the single byte from offset 10h in file "a.txt" into the "char"
35763687 has been closed:
35773688
35783689 virtual at 0
3579 hex_digits::
3580 db '0123456789ABCDEF'
3690 hex_digits::
3691 db '0123456789ABCDEF'
35813692 end virtual
35823693 load a byte from hex_digits:10
35833694
36043715 create the alignment yourself, like:
36053716
36063717 virtual
3607 align 16
3608 a = $ - $$
3718 align 16
3719 a = $ - $$
36093720 end virtual
36103721 db a dup 0
36113722
36193730 bits = 16
36203731 display 'Current offset is 0x'
36213732 repeat bits/4
3622 d = '0' + $ shr (bits-%*4) and 0Fh
3623 if d > '9'
3624 d = d + 'A'-'9'-1
3625 end if
3626 display d
3733 d = '0' + $ shr (bits-%*4) and 0Fh
3734 if d > '9'
3735 d = d + 'A'-'9'-1
3736 end if
3737 display d
36273738 end repeat
36283739 display 13,10
36293740
36733784 Consider the following example:
36743785
36753786 if ~ defined alpha
3676 alpha:
3787 alpha:
36773788 end if
36783789
36793790 The "defined" operator gives the true value when the expression following it
36953806 condition may make it possible to get it resolved:
36963807
36973808 if ~ defined alpha | defined @f
3698 alpha:
3699 @@:
3809 alpha:
3810 @@:
37003811 end if
37013812
37023813 The "@f" is always the same label as the nearest "@@" symbol in the source
37083819 look at the blocks that has nothing more than this self-establishing:
37093820
37103821 if defined @f
3711 @@:
3822 @@:
37123823 end if
37133824
37143825 This is an example of source that may have more than one solution, as both
38813992
38823993 macro stos0
38833994 {
3884 xor al,al
3885 stosb
3995 xor al,al
3996 stosb
38863997 }
38873998
38883999 The macroinstruction "stos0" will be replaced with these two assembly
39084019 macro mov op1,op2
39094020 {
39104021 if op1 in <ds,es,fs,gs,ss> & op2 in <cs,ds,es,fs,gs,ss>
3911 push op2
3912 pop op1
4022 push op2
4023 pop op1
39134024 else
3914 mov op1,op2
4025 mov op1,op2
39154026 end if
39164027 }
39174028
39244035 macro mov op1,op2,op3
39254036 {
39264037 if op3 eq
3927 mov op1,op2
4038 mov op1,op2
39284039 else
3929 mov op1,op2
3930 mov op2,op3
4040 mov op1,op2
4041 mov op2,op3
39314042 end if
39324043 }
39334044
39714082
39724083 macro stoschar [char]
39734084 {
3974 mov al,char
3975 stosb
4085 mov al,char
4086 stosb
39764087 }
39774088
39784089 This macroinstruction accepts unlimited number of arguments, and each one
39974108
39984109 macro movstr
39994110 {
4000 local move
4111 local move
40014112 move:
4002 lodsb
4003 stosb
4004 test al,al
4005 jnz move
4113 lodsb
4114 stosb
4115 test al,al
4116 jnz move
40064117 }
40074118
40084119 Each time this macroinstruction is used, "move" will become other unique name
40274138 macro strtbl name,[string]
40284139 {
40294140 common
4030 label name dword
4141 label name dword
40314142 forward
4032 local label
4033 dd label
4143 local label
4144 dd label
40344145 forward
4035 label db string,0
4146 label db string,0
40364147 }
40374148
40384149 First argument given to this macroinstruction will become the label for table
40804191
40814192 macro jif op1,cond,op2,label
40824193 {
4083 cmp op1,op2
4084 j#cond label
4194 cmp op1,op2
4195 j#cond label
40854196 }
40864197
40874198 For example "jif ax,ae,10h,exit" will be assembled as "cmp ax,10h" and
40974208
40984209 macro label name
40994210 {
4100 label name
4101 if ~ used name
4102 display `name # " is defined but not used.",13,10
4103 end if
4211 label name
4212 if ~ used name
4213 display `name # " is defined but not used.",13,10
4214 end if
41044215 }
41054216
41064217 When label defined with such macro is not used in the source, macro will warn
41134224 macro message arg
41144225 {
41154226 if arg eqtype ""
4116 local str
4117 jmp @f
4118 str db arg,0Dh,0Ah,24h
4119 @@:
4120 mov dx,str
4227 local str
4228 jmp @f
4229 str db arg,0Dh,0Ah,24h
4230 @@:
4231 mov dx,str
41214232 else
4122 mov dx,arg
4233 mov dx,arg
41234234 end if
4124 mov ah,9
4125 int 21h
4235 mov ah,9
4236 int 21h
41264237 }
41274238
41284239 The above macro is designed for displaying messages in DOS programs. When the
41474258 {
41484259 macro instr op1,op2,op3
41494260 \{
4150 if op3 eq
4151 instr op1,op2
4152 else
4153 instr op1,op2
4154 instr op2,op3
4155 end if
4261 if op3 eq
4262 instr op1,op2
4263 else
4264 instr op1,op2
4265 instr op2,op3
4266 end if
41564267 \}
41574268 }
41584269
41844295 defines an alternative syntax for defining macroinstructions, which looks like:
41854296
41864297 MACRO stoschar char
4187 mov al,char
4188 stosb
4298 mov al,char
4299 stosb
41894300 ENDM
41904301
41914302 Note that symbol that has such customized definition must be defined with "fix"
42264337
42274338 struc point x,y
42284339 {
4229 .x dw x
4230 .y dw y
4340 .x dw x
4341 .y dw y
42314342 }
42324343
42334344 For example "my point 7,11" will define structure labeled "my", consisting of
42424353 struc db [data]
42434354 {
42444355 common
4245 . db data
4246 .size = $ - .
4356 . db data
4357 .size = $ - .
42474358 }
42484359
42494360 With such definition "msg db 'Hello!',13,10" will define also "msg.size"
42774388
42784389 rept 3 counter
42794390 {
4280 byte#counter db counter
4391 byte#counter db counter
42814392 }
42824393
42834394 will generate lines:
Binary diff not shown
13771377 ret
13781378 data_bytes:
13791379 call define_data
1380 define_data_byte:
13811380 jc instruction_assembled
13821381 lods byte [esi]
13831382 cmp al,'('
14231422 mov [base_code],0
14241423 define_words:
14251424 call define_data
1426 define_data_word:
14271425 jc instruction_assembled
14281426 lods byte [esi]
14291427 cmp al,'('
14631461 ret
14641462 data_dwords:
14651463 call define_data
1466 define_data_dword:
14671464 jc instruction_assembled
14681465 lods byte [esi]
14691466 cmp al,'('
15081505 ret
15091506 data_pwords:
15101507 call define_data
1511 define_data_pword:
15121508 jc instruction_assembled
15131509 lods byte [esi]
15141510 cmp al,'('
15571553 ret
15581554 data_qwords:
15591555 call define_data
1560 define_data_qword:
15611556 jc instruction_assembled
15621557 lods byte [esi]
15631558 cmp al,'('
15791574 ret
15801575 data_twords:
15811576 call define_data
1582 define_data_tword:
15831577 jc instruction_assembled
15841578 lods byte [esi]
15851579 cmp al,'('
640640 cmp ah,16
641641 jne invalid_operand_size
642642 mov [postbyte_register],al
643 avx_movd_reg_ready:
644 test [rex_prefix],8
645 jz nomem_instruction_ready
646 cmp [code_type],64
647 jne illegal_instruction
643648 jmp nomem_instruction_ready
644649 avx_movd_xmmreg:
645650 sub [extended_code],10h
675680 cmp ah,[mmx_size]
676681 jne invalid_operand_size
677682 mov bl,al
678 jmp nomem_instruction_ready
683 jmp avx_movd_reg_ready
679684 avx_movq_xmmreg_xmmreg:
680685 cmp [mmx_size],8
681686 jne invalid_operand
21102115 mov cl,4
21112116 jmp avx_pinsr_instruction_3a
21122117 avx_pinsrq_instruction:
2118 cmp [code_type],64
2119 jne illegal_instruction
21132120 mov cl,8
21142121 or [rex_prefix],8
21152122 avx_pinsr_instruction_3a:
24482455 cmp al,','
24492456 jne invalid_operand
24502457 lods byte [esi]
2458 call get_size_operator
24512459 cmp al,'['
24522460 jne invalid_operand
24532461 call get_address
3232 ; cannot simply be copied and put under another distribution licence
3333 ; (including the GNU Public Licence).
3434
35 VERSION_STRING equ "1.71.59"
35 VERSION_STRING equ "1.71.60"
3636
3737 VERSION_MAJOR = 1
3838 VERSION_MINOR = 71
00
11 Visit http://flatassembler.net/ for more information.
2
3
4 version 1.71.60 (Feb 05, 2017)
5
6 [+] Updated documentation.
7
8 [-] Minor corrections in error detection of some AVX instruction handlers.
29
310
411 version 1.71.59 (Jan 20,2017)