flat assembler
Symbolic information file format


Table 1 Header
/-------------------------------------------------------------------------\
| Offset | Size    | Description                                          |
|========|=========|======================================================|
|   +0   |  dword  | Signature 1A736166h (little-endian).                 |
|--------|---------|------------------------------------------------------|
|   +4   |  byte   | Major version of flat assembler.                     |
|--------|---------|------------------------------------------------------|
|   +5   |  byte   | Minor version of flat assembler.                     |
|--------|---------|------------------------------------------------------|
|   +6   |  word   | Length of header.                                    |
|--------|---------|------------------------------------------------------|
|   +8   |  dword  | Offset of input file name in the strings table.      |
|--------|---------|------------------------------------------------------|
|  +12   |  dword  | Offset of output file name in the strings table.     |
|--------|---------|------------------------------------------------------|
|  +16   |  dword  | Offset of strings table.                             |
|--------|---------|------------------------------------------------------|
|  +20   |  dword  | Length of strings table.                             |
|--------|---------|------------------------------------------------------|
|  +24   |  dword  | Offset of symbols table.                             |
|--------|---------|------------------------------------------------------|
|  +28   |  dword  | Length of symbols table.                             |
|--------|---------|------------------------------------------------------|
|  +32   |  dword  | Offset of preprocessed source.                       |
|--------|---------|------------------------------------------------------|
|  +36   |  dword  | Length of preprocessed source.                       |
|--------|---------|------------------------------------------------------|
|  +40   |  dword  | Offset of assembly dump.                             |
|--------|---------|------------------------------------------------------|
|  +44   |  dword  | Length of assembly dump.                             |
|--------|---------|------------------------------------------------------|
|  +48   |  dword  | Offset of section names table.                       |
|--------|---------|------------------------------------------------------|
|  +52   |  dword  | Length of section names table.                       |
|--------|---------|------------------------------------------------------|
|  +56   |  dword  | Offset of symbol references dump.                    |
|--------|---------|------------------------------------------------------|
|  +60   |  dword  | Length of symbol references dump.                    |
\-------------------------------------------------------------------------/

Notes:

If the header is shorter than 64 bytes, the file was written by a version
that does not support dumping some of the structures. The data for the
missing structures should then be treated as not provided, not as present
with a size of zero.
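
As an illustration only, the header could be decoded in Python as sketched
below. The sketch assumes the whole file has been read into a bytes object.
The names parse_header and DWORD_FIELDS are hypothetical. Following the note
above, any field that lies beyond the stored header length is reported as
None (not provided) rather than as zero.

```python
import struct

# 1A736166h stored little-endian: the bytes "fas" followed by 1Ah.
SIGNATURE = 0x1A736166

# Hypothetical names for the dword fields at +8 .. +60 of table 1, in order.
DWORD_FIELDS = [
    "input_name", "output_name",
    "strings_offset", "strings_length",
    "symbols_offset", "symbols_length",
    "source_offset", "source_length",
    "dump_offset", "dump_length",
    "sections_offset", "sections_length",
    "refs_offset", "refs_length",
]


def parse_header(data: bytes) -> dict:
    """Decode the header of a symbolic information file held in memory."""
    signature, major, minor, header_length = struct.unpack_from("<IBBH", data, 0)
    if signature != SIGNATURE:
        raise ValueError("not a flat assembler symbolic information file")
    header = {"major": major, "minor": minor, "header_length": header_length}
    for i, name in enumerate(DWORD_FIELDS):
        offset = 8 + 4 * i
        if offset + 4 <= header_length:
            (header[name],) = struct.unpack_from("<I", data, offset)
        else:
            # Written by an older version: the data could not be provided,
            # which is different from the data having zero length.
            header[name] = None
    return header
```

The returned dictionary is keyed by the names in DWORD_FIELDS. A None value
means the structure is unavailable, so it is kept distinct from a zero length.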

Offsets given in the header are generally positions in the file. The input
and output file names, however, are specified by offsets within the strings
table, so you have to add the offset of the strings table to them to obtain
the positions of those strings in the file.
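
The address arithmetic described above can be sketched as follows. The
sketch assumes the file is held in a bytes object, and read_asciiz and
string_from_table are hypothetical helper names. This document does not
specify the character encoding of the stored names, so the sketch decodes
them as Latin-1, which accepts any byte value.

```python
def read_asciiz(data: bytes, position: int) -> str:
    """Read a zero-terminated string starting at an absolute file position."""
    end = data.index(b"\0", position)
    # Assumption: Latin-1 is used only because it maps every byte to a character.
    return data[position:end].decode("latin-1")


def string_from_table(data: bytes, strings_offset: int, offset: int) -> str:
    """Resolve an offset that is relative to the start of the strings table."""
    return read_asciiz(data, strings_offset + offset)
```

With the header dwords at +8 and +16 already read, the input file name would
be string_from_table(data, strings_offset, input_name_offset).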

The strings table is just a sequence of ASCIIZ strings, which may be
referred to by other parts of the file. It contains the names of the main
input file and the output file, and also the names of the sections and
external symbols, if there were any.

The symbols table is an array of 32-byte structures, each one in the format
specified by table 2.

The preprocessed source is a sequence of preprocessed lines, each one in
the format defined in table 3.

The assembly dump contains an array of 28-byte structures, each one in the
format specified by table 4. The array is followed by an additional double
word containing the offset in the output file at which the assembly ended.

It is possible that the file does not contain an assembly dump at all. This
happens when some error occurred and only the preprocessed source was
dumped. If the error occurred during preprocessing, only the source up to
the point of error is provided. In such a case (and only then), the field
at offset 44 (the length of the assembly dump) contains zero.
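
The layout of the assembly dump described above can be sketched with a
hypothetical helper, assumed to receive the dump offset and length from the
header fields at +40 and +44. A length of zero is treated as a missing dump.
Any other length should equal 28 bytes per row plus the final double word,
and the sketch rejects lengths that do not.

```python
import struct

ROW_SIZE = 28  # size of one row of the assembly dump (table 4)


def assembly_dump_layout(data: bytes, dump_offset: int, dump_length: int):
    """Return (row_positions, end_of_output), or None when there is no dump."""
    if dump_length == 0:
        # Only the preprocessed source was dumped because of an error.
        return None
    rows, remainder = divmod(dump_length - 4, ROW_SIZE)
    if remainder:
        raise ValueError("assembly dump length is not 28*n + 4 bytes")
    row_positions = [dump_offset + i * ROW_SIZE for i in range(rows)]
    # The trailing double word: offset in the output file where assembly ended.
    (end_of_output,) = struct.unpack_from("<I", data, dump_offset + dump_length - 4)
    return row_positions, end_of_output
```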

The section names table exists only when the output format was an object
file (ELF or COFF). It is an array of 4-byte entries, each being the offset
of a section name in the strings table. The index of a section in this
table is the same as the index of that section in the generated object
file.
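
The following hedged sketch reads the section names table. It assumes the
strings table offset (+16) and the section names table offset and length
(+48, +52) are already known, and section_names is a hypothetical name. It
returns the names in table order. It does not renumber them, because this
paragraph does not spell out how the position in this list lines up with
the 1-based section index used in table 2. That mapping should be checked
against real files.

```python
import struct


def section_names(data: bytes, sections_offset: int, sections_length: int,
                  strings_offset: int) -> list:
    """List section names in the order of the section names table."""
    names = []
    for i in range(sections_length // 4):
        (name_offset,) = struct.unpack_from("<I", data, sections_offset + 4 * i)
        start = strings_offset + name_offset  # relative to the strings table
        end = data.index(b"\0", start)
        names.append(data[start:end].decode("latin-1"))
    return names
```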

The symbol references dump contains an array of 8-byte structures, each
describing an event of some symbol being used. The first double word of
such a structure contains the offset of the symbol in the symbols table,
and the second double word is the offset of a structure in the assembly
dump, which specifies at what moment the symbol was referenced.
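
The references dump could be walked as sketched below. The sketch assumes
both offsets are relative to the start of their own tables, as the wording
above suggests. Under that assumption, dividing by the structure sizes
(32 and 28 bytes) gives entry indices. symbol_references is a hypothetical
name.

```python
import struct

SYMBOL_SIZE = 32  # table 2
ROW_SIZE = 28     # table 4


def symbol_references(data: bytes, refs_offset: int, refs_length: int) -> list:
    """Return (symbol_index, dump_row_index) pairs, one per reference event."""
    refs = []
    for pos in range(refs_offset, refs_offset + refs_length, 8):
        symbol_offset, row_offset = struct.unpack_from("<II", data, pos)
        # Assumed table-relative offsets, converted to entry indices.
        refs.append((symbol_offset // SYMBOL_SIZE, row_offset // ROW_SIZE))
    return refs
```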


Table 2 Symbol structure
/-------------------------------------------------------------------------\
| Offset | Size  | Description                                            |
|========|=======|========================================================|
|   +0   | qword | Value of symbol.                                       |
|--------|-------|--------------------------------------------------------|
|   +8   | word  | Flags (table 2.1).                                     |
|--------|-------|--------------------------------------------------------|
|  +10   | byte  | Size of data labelled by this symbol (zero means plain |
|        |       | label without size attached).                          |
|--------|-------|--------------------------------------------------------|
|  +11   | byte  | Type of value (table 2.2). Any value other than zero   |
|        |       | means some kind of relocatable symbol.                 |
|--------|-------|--------------------------------------------------------|
|  +12   | dword | Extended SIB: the first two bytes are register codes   |
|        |       | (table 2.3), the last two bytes are the corresponding  |
|        |       | scales.                                                |
|--------|-------|--------------------------------------------------------|
|  +16   | word  | Number of the pass in which symbol was last defined.   |
|--------|-------|--------------------------------------------------------|
|  +18   | word  | Number of the pass in which symbol was last used.      |
|--------|-------|--------------------------------------------------------|
|  +20   | dword | If the symbol is relocatable, this field contains      |
|        |       | information about section or external symbol, to which |
|        |       | it is relative; otherwise this field has no meaning.   |
|        |       | When the highest bit is cleared, the symbol is         |
|        |       | relative to a section, and the bits 0-30 contain       |
|        |       | the index (starting from 1) in the table of sections.  |
|        |       | When the highest bit is set, the symbol is relative to |
|        |       | an external symbol, and the bits 0-30 contain the      |
|        |       | offset of the name of this symbol in the strings       |
|        |       | table.                                                 |
|--------|-------|--------------------------------------------------------|
|  +24   | dword | If the highest bit is cleared, the bits 0-30 contain   |
|        |       | the offset of symbol name in the preprocessed source.  |
|        |       | This name is a pascal-style string (byte length        |
|        |       | followed by string data).                              |
|        |       | Zero in this field means an anonymous symbol.          |
|        |       | If the highest bit is set, the bits 0-30 contain the   |
|        |       | offset of the symbol name in the strings table, and    |
|        |       | this name is a zero-ended string in this case (as are  |
|        |       | all the strings there).                                |
|--------|-------|--------------------------------------------------------|
|  +28   | dword | Offset in the preprocessed source of the line that     |
|        |       | defined this symbol (see table 3).                     |
\-------------------------------------------------------------------------/
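
This is a sketch of decoding one symbol structure with Python's struct
module. The format string "<QHBbIHHIII" mirrors table 2: little-endian, no
padding, 32 bytes. The type byte is read as signed because negative types
are possible (table 2.2). parse_symbol and symbol_name are hypothetical
names, and decoding names as Latin-1 is an assumption.

```python
import struct

SYMBOL_FORMAT = "<QHBbIHHIII"  # little-endian, no padding: 32 bytes (table 2)
HIGH_BIT = 0x80000000


def parse_symbol(data: bytes, position: int) -> dict:
    """Decode one 32-byte symbol structure at an absolute file position."""
    (value, flags, size, value_type, sib, pass_defined, pass_used,
     relative_to, name_field, line) = struct.unpack_from(SYMBOL_FORMAT, data, position)
    symbol = {"value": value, "flags": flags, "size": size, "type": value_type,
              "sib": sib, "pass_defined": pass_defined, "pass_used": pass_used,
              "name_field": name_field, "line": line}
    if value_type != 0:  # the +20 field only has meaning for relocatable symbols
        if relative_to & HIGH_BIT:
            # Offset of the external symbol's name in the strings table.
            symbol["external_name"] = relative_to & 0x7FFFFFFF
        else:
            symbol["section_index"] = relative_to  # starts from 1
    return symbol


def symbol_name(data: bytes, name_field: int, source_offset: int,
                strings_offset: int):
    """Resolve the +24 field into a name, or None for an anonymous symbol."""
    if name_field == 0:
        return None
    if name_field & HIGH_BIT:
        # Zero-terminated string in the strings table.
        start = strings_offset + (name_field & 0x7FFFFFFF)
        return data[start:data.index(b"\0", start)].decode("latin-1")
    # Pascal-style string (length byte, then data) in the preprocessed source.
    start = source_offset + name_field
    return data[start + 1:start + 1 + data[start]].decode("latin-1")
```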


Table 2.1 Symbol flags
/-----------------------------------------------------------------\
| Bit | Value | Description                                       |
|=====|=======|===================================================|
|  0  |   1   | Symbol was defined.                               |
|-----|-------|---------------------------------------------------|
|  1  |   2   | Symbol is an assembly-time variable.              |
|-----|-------|---------------------------------------------------|
|  2  |   4   | Symbol cannot be forward-referenced.              |
|-----|-------|---------------------------------------------------|
|  3  |   8   | Symbol was used.                                  |
|-----|-------|---------------------------------------------------|
|  4  |  10h  | The prediction was needed when checking           |
|     |       | whether the symbol was used.                      |
|-----|-------|---------------------------------------------------|
|  5  |  20h  | Result of last predicted check for being used.    |
|-----|-------|---------------------------------------------------|
|  6  |  40h  | The prediction was needed when checking           |
|     |       | whether the symbol was defined.                   |
|-----|-------|---------------------------------------------------|
|  7  |  80h  | Result of last predicted check for being defined. |
|-----|-------|---------------------------------------------------|
|  8  | 100h  | The optimization adjustment is applied to         |
|     |       | the value of this symbol.                         |
|-----|-------|---------------------------------------------------|
|  9  | 200h  | The value of the symbol is a negative number      |
|     |       | encoded as two's complement.                      |
|-----|-------|---------------------------------------------------|
| 10  | 400h  | Symbol is a special marker and has no value.      |
\-----------------------------------------------------------------/

Notes:

Some of those flags are listed here just for completeness, as they have
little use outside of flat assembler. However, bit 0 is important, because
the symbols table contains all the labels that occurred in the source, even
those that were in conditional blocks that did not get assembled.
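
The flag word can be decoded with a simple bit table. The short labels below
paraphrase table 2.1 and are not names used by flat assembler. signed_value
applies flag 200h, assuming the stored qword is a plain 64-bit two's
complement pattern. All names are hypothetical.

```python
SYMBOL_FLAGS = {
    0x001: "defined",
    0x002: "assembly-time variable",
    0x004: "cannot be forward-referenced",
    0x008: "used",
    0x010: "prediction needed for 'used' check",
    0x020: "predicted as used",
    0x040: "prediction needed for 'defined' check",
    0x080: "predicted as defined",
    0x100: "optimization adjustment applied",
    0x200: "negative value",
    0x400: "special marker without value",
}


def describe_flags(flags: int) -> list:
    """Labels of the flags set in the +8 word of a symbol structure."""
    return [name for bit, name in SYMBOL_FLAGS.items() if flags & bit]


def signed_value(value: int, flags: int) -> int:
    """Interpret the qword value, honouring the two's complement flag (200h)."""
    return value - (1 << 64) if flags & 0x200 else value
```

Keeping only the labels that were actually defined, as the note recommends,
then amounts to testing flags & 1.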


Table 2.2 Symbol value types
/-------------------------------------------------------------------\
| Value | Description                                               |
|=======|===========================================================|
|   0   | Absolute value.                                           |
|-------|-----------------------------------------------------------|
|   1   | Relocatable segment address (only with MZ output).        |
|-------|-----------------------------------------------------------|
|   2   | Relocatable 32-bit address.                               |
|-------|-----------------------------------------------------------|
|   3   | Relocatable relative 32-bit address (value valid only for |
|       | symbol used in the same place where it was calculated;    |
|       | it should not occur in the symbol structure).             |
|-------|-----------------------------------------------------------|
|   4   | Relocatable 64-bit address.                               |
|-------|-----------------------------------------------------------|
|   5   | [ELF only] GOT-relative 32-bit address.                   |
|-------|-----------------------------------------------------------|
|   6   | [ELF only] 32-bit address of PLT entry.                   |
|-------|-----------------------------------------------------------|
|   7   | [ELF only] Relative 32-bit address of PLT entry (value    |
|       | valid only for symbol used in the same place where it     |
|       | was calculated; it should not occur in the symbol         |
|       | structure).                                               |
\-------------------------------------------------------------------/

Notes:

Types 3 and 7 should never be encountered in the symbols dump; they are
only used internally by flat assembler.

If the type value (read as a signed byte) is a negative number, it is the
negation of a value from this table, and it means that a symbol of the
given type has been negated.
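
A small decoder for the type byte could look as follows. It treats the
stored byte as signed so that negated types are recognized. describe_type
and VALUE_TYPES are hypothetical names, and the descriptions paraphrase
table 2.2.

```python
VALUE_TYPES = {
    0: "absolute value",
    1: "relocatable segment address",
    2: "relocatable 32-bit address",
    3: "relocatable relative 32-bit address",
    4: "relocatable 64-bit address",
    5: "GOT-relative 32-bit address",
    6: "32-bit address of PLT entry",
    7: "relative 32-bit address of PLT entry",
}


def describe_type(raw: int):
    """Decode a type byte as stored in the file (0..255).

    Returns (description, negated)."""
    value = raw - 0x100 if raw >= 0x80 else raw  # reinterpret as a signed byte
    return VALUE_TYPES.get(abs(value), "unknown type"), value < 0
```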


Table 2.3 Register codes for extended SIB
/------------------\
| Value | Register |
|=======|==========|
|  23h  | BX       |
|-------|----------|
|  25h  | BP       |
|-------|----------|
|  26h  | SI       |
|-------|----------|
|  27h  | DI       |
|-------|----------|
|  40h  | EAX      |
|-------|----------|
|  41h  | ECX      |
|-------|----------|
|  42h  | EDX      |
|-------|----------|
|  43h  | EBX      |
|-------|----------|
|  44h  | ESP      |
|-------|----------|
|  45h  | EBP      |
|-------|----------|
|  46h  | ESI      |
|-------|----------|
|  47h  | EDI      |
|-------|----------|
|  48h  | R8D      |
|-------|----------|
|  49h  | R9D      |
|-------|----------|
|  4Ah  | R10D     |
|-------|----------|
|  4Bh  | R11D     |
|-------|----------|
|  4Ch  | R12D     |
|-------|----------|
|  4Dh  | R13D     |
|-------|----------|
|  4Eh  | R14D     |
|-------|----------|
|  4Fh  | R15D     |
|-------|----------|
|  80h  | RAX      |
|-------|----------|
|  81h  | RCX      |
|-------|----------|
|  82h  | RDX      |
|-------|----------|
|  83h  | RBX      |
|-------|----------|
|  84h  | RSP      |
|-------|----------|
|  85h  | RBP      |
|-------|----------|
|  86h  | RSI      |
|-------|----------|
|  87h  | RDI      |
|-------|----------|
|  88h  | R8       |
|-------|----------|
|  89h  | R9       |
|-------|----------|
|  8Ah  | R10      |
|-------|----------|
|  8Bh  | R11      |
|-------|----------|
|  8Ch  | R12      |
|-------|----------|
|  8Dh  | R13      |
|-------|----------|
|  8Eh  | R14      |
|-------|----------|
|  8Fh  | R15      |
|-------|----------|
|  94h  | EIP      |
|-------|----------|
|  98h  | RIP      |
\------------------/
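
The 32-bit and 64-bit register codes in the table follow a regular pattern:
40h or 80h plus the usual register number. That lets the mapping be built in
a loop. The sketch below decodes an extended SIB dword into register and
scale pairs. It assumes that a register code of zero marks an unused slot,
which this document does not state explicitly. All names are hypothetical.

```python
REGISTERS = {0x23: "BX", 0x25: "BP", 0x26: "SI", 0x27: "DI",
             0x94: "EIP", 0x98: "RIP"}
_LOW8 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
for n in range(16):
    # 40h+n: 32-bit registers, 80h+n: 64-bit registers (table 2.3).
    REGISTERS[0x40 + n] = "E" + _LOW8[n] if n < 8 else f"R{n}D"
    REGISTERS[0x80 + n] = "R" + _LOW8[n] if n < 8 else f"R{n}"


def decode_sib(sib: int) -> list:
    """Split an extended SIB dword into (register, scale) pairs."""
    codes = (sib & 0xFF, (sib >> 8) & 0xFF)             # first two bytes
    scales = ((sib >> 16) & 0xFF, (sib >> 24) & 0xFF)   # second two bytes
    # Assumption: a register code of zero means the slot is unused.
    return [(REGISTERS.get(code, f"{code:02X}h?"), scale)
            for code, scale in zip(codes, scales) if code]


def format_address(value: int, sib: int) -> str:
    """Render value plus registers, e.g. EBX+ESI*4+10h."""
    terms = [name if scale == 1 else f"{name}*{scale}"
             for name, scale in decode_sib(sib)]
    return "+".join(terms + [f"{value:X}h"])
```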


Table 3 Preprocessed line
/--------------------------------------------------------------------------\
| Offset | Size  | Description                                             |
|========|=======|=========================================================|
|   +0   | dword | When the line was loaded from source, this field        |
|        |       | contains either zero (if it is the line from the main   |
|        |       | input file), or an offset inside the preprocessed       |
|        |       | source to the name of the file from which this line was |
|        |       | loaded (the file name is a zero-ended string).          |
|        |       | When the line was generated by a macroinstruction, this |
|        |       | field contains the offset inside the preprocessed       |
|        |       | source of the pascal-style string specifying the name   |
|        |       | of the macroinstruction which generated this line.      |
|--------|-------|---------------------------------------------------------|
|   +4   | dword | Bits 0-30 contain the number of this line.              |
|        |       | If the highest bit is cleared, this line was loaded     |
|        |       | from source.                                            |
|        |       | If the highest bit is set, this line was generated by   |
|        |       | a macroinstruction.                                     |
|--------|-------|---------------------------------------------------------|
|   +8   | dword | If the line was loaded from source, this field contains |
|        |       | the position of the line inside the source file from    |
|        |       | which it was loaded.                                    |
|        |       | If the line was generated by a macroinstruction, this   |
|        |       | field contains the offset of the preprocessed line      |
|        |       | which invoked the macroinstruction.                     |
|        |       | If the line was generated by an instantaneous macro,    |
|        |       | this field is equal to the next one.                    |
|--------|-------|---------------------------------------------------------|
|  +12   | dword | If the line was generated by a macroinstruction, this   |
|        |       | field contains the offset of the preprocessed line      |
|        |       | inside the definition of the macro from which this      |
|        |       | one was generated.                                      |
|--------|-------|---------------------------------------------------------|
|  +16   |   ?   | The tokenized contents of the line.                     |
\--------------------------------------------------------------------------/

Notes:

To determine whether the line was loaded from source or generated by a
macroinstruction, check the highest bit of the second double word.
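
The following sketch decodes the fixed 16-byte part of a preprocessed line,
following table 3. parse_line_header is a hypothetical name. The offsets are
returned as stored, without resolving the strings or lines they point to.

```python
import struct

HIGH_BIT = 0x80000000


def parse_line_header(data: bytes, position: int) -> dict:
    """Decode the 16-byte header of a preprocessed line (table 3)."""
    first, number, third, fourth = struct.unpack_from("<IIII", data, position)
    line = {"number": number & 0x7FFFFFFF, "tokens_at": position + 16}
    if number & HIGH_BIT:
        line["kind"] = "macro"
        line["macro_name"] = first          # pascal-style string in the source
        line["invoked_by"] = third          # line that invoked the macro
        line["macro_line"] = fourth         # line inside the macro definition
    else:
        line["kind"] = "source"
        line["file_name"] = first or None   # None: the main input file
        line["file_position"] = third       # position inside that file
    return line
```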

The contents of the line are no longer the text from the source file, but a
sequence of tokens ended with a zero byte.
Any chain of non-special characters, separated from other similar chains by
spaces or by special characters, is converted into a symbol token. The
first byte of this element has the value 1Ah, the second byte is the count
of characters, and that many bytes follow, forming the symbol.
Some characters have a special meaning and cannot occur inside a symbol;
they split the symbols and are converted into separate tokens.
For example, if the source contains this line of text:

  mov ax,4

the preprocessor converts it into the chain of bytes shown here with their
hexadecimal values (characters corresponding to some of those values are
placed below the hexadecimal codes):

  1A 03 6D 6F 76 1A 02 61 78 2C 1A 01 34 00
        m  o  v        a  x  ,        4

The third type of token that can be found in a preprocessed line is quoted
text. This element is created from a chain of any bytes other than line
breaks that are placed between single or double quotes in the original
text. The first byte of such an element is always 22h. It is followed by a
double word specifying the number of bytes that follow, and then by the
value of the quoted text. For example, this line from source:

  mov eax,'ABCD'

is converted into (the notation used is the same as in the previous sample):

  1A 03 6D 6F 76 1A 03 65 61 78 2C 22 04 00 00 00 41 42 43 44 00
        m  o  v        e  a  x  ,                 A  B  C  D

This data consists of two symbol tokens, followed by a special character
token (the comma), the quoted text, and the zero byte that marks the end of
the line.
There is also a special case of a symbol token with the first byte having
the value 3Bh instead of 1Ah. Such a symbol means that all the line
elements that follow, including this one, have already been interpreted by
the preprocessor and are ignored by the assembler.
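
The token rules above translate into a small decoder. This is a sketch under
the assumptions stated in the text: a 3Bh token has the same layout as a 1Ah
symbol token, and any other nonzero byte is a single special character.
tokenize_line is a hypothetical name. The two byte sequences shown above
serve as checks.

```python
import struct


def tokenize_line(data: bytes, position: int):
    """Decode tokens starting at position (the +16 field of a preprocessed line).

    Returns (tokens, position just past the terminating zero byte)."""
    tokens = []
    while True:
        kind = data[position]
        if kind == 0x00:                      # end of line
            return tokens, position + 1
        if kind in (0x1A, 0x3B):              # symbol: length byte, then characters
            length = data[position + 2 - 1]
            text = data[position + 2:position + 2 + length].decode("latin-1")
            # 3Bh: this and all following elements were already handled by
            # the preprocessor and are ignored by the assembler.
            tokens.append(("symbol" if kind == 0x1A else "processed", text))
            position += 2 + length
        elif kind == 0x22:                    # quoted text: dword length, then bytes
            (length,) = struct.unpack_from("<I", data, position + 1)
            tokens.append(("quoted", data[position + 5:position + 5 + length]))
            position += 5 + length
        else:                                 # single special character
            tokens.append(("char", chr(kind)))
            position += 1
```

Because every line ends with a zero byte, the returned position is
presumably where the next preprocessed line, with its 16-byte header, begins.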


Table 4 Row of the assembly dump
/-------------------------------------------------------------------------\
| Offset | Size  | Description                                            |
|========|=======|========================================================|
|   +0   | dword | Offset in the output file.                             |
|--------|-------|--------------------------------------------------------|
|   +4   | dword | Offset of the line in the preprocessed source.         |
|--------|-------|--------------------------------------------------------|
|   +8   | qword | Value of the $ address.                                |
|--------|-------|--------------------------------------------------------|
|  +16   | dword | Extended SIB for the $ address: the first two bytes    |
|        |       | are register codes (table 2.3), the last two bytes     |
|        |       | are the corresponding scales.                          |
|--------|-------|--------------------------------------------------------|
|  +20   | dword | If the $ address is relocatable, this field contains   |
|        |       | information about section or external symbol, to which |
|        |       | it is relative; otherwise this field is zero.          |
|        |       | When the highest bit is cleared, the address is        |
|        |       | relative to a section, and the bits 0-30 contain       |
|        |       | the index (starting from 1) in the table of sections.  |
|        |       | When the highest bit is set, the address is relative   |
|        |       | to an external symbol, and the bits 0-30 contain the   |
|        |       | offset of the name of this symbol in the strings       |
|        |       | table.                                                 |
|--------|-------|--------------------------------------------------------|
|  +24   | byte  | Type of $ address value (as in table 2.2).             |
|--------|-------|--------------------------------------------------------|
|  +25   | byte  | Type of code: possible values are 16, 32, and 64.      |
|--------|-------|--------------------------------------------------------|
|  +26   | byte  | If bit 0 is set, then at this point the assembly was   |
|        |       | taking place inside a virtual block, and the offset    |
|        |       | in output file has no meaning here.                    |
|        |       | If bit 1 is set, the line was assembled at a point     |
|        |       | which was not included in the output file for some     |
|        |       | other reason (for example, inside the reserved data    |
|        |       | at the end of a section).                              |
|--------|-------|--------------------------------------------------------|
|  +27   | byte  | The higher bits of the value of the $ address.         |
\-------------------------------------------------------------------------/


Notes:

Each row of the assembly dump states that the given line of preprocessed
source was assembled at the specified address (defined by its type, value
and extended SIB) and at the specified position in the output file.
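
One row of the assembly dump can be decoded as sketched below. The format
string "<IIQIIbBBB" mirrors table 4: 28 bytes, little-endian, no padding.
parse_dump_row is a hypothetical name. The sketch reports the output offset
as None when bit 0 or bit 1 of the +26 byte is set. That is an interpretation
of the table for convenience, not something flat assembler itself does.

```python
import struct

ROW_FORMAT = "<IIQIIbBBB"  # little-endian, no padding: 28 bytes (table 4)


def parse_dump_row(data: bytes, position: int) -> dict:
    """Decode one row of the assembly dump at an absolute file position."""
    (output_offset, line_offset, address, sib, relative_to,
     address_type, code_bits, flags, high_bits) = struct.unpack_from(ROW_FORMAT, data, position)
    in_virtual = bool(flags & 1)
    not_in_output = bool(flags & 2)
    return {
        # The output offset is meaningless inside a virtual block or in
        # data that was not included in the output file.
        "output_offset": None if (in_virtual or not_in_output) else output_offset,
        "line_offset": line_offset,   # preprocessed line (table 3)
        "address": address,           # value of $
        "address_high_bits": high_bits,
        "sib": sib,
        "relative_to": relative_to,
        "type": address_type,         # signed, as in table 2.2
        "code_bits": code_bits,       # 16, 32 or 64
        "virtual": in_virtual,
        "not_in_output": not_in_output,
    }
```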