A Tiny Guide to GCC Inline Assembly 12 Jun 2008
So I’m working on a real-time operating system for school, and in the process I’ve needed to write a ton of IA32 inline assembly. GCC’s inline assembly syntax isn’t immediately straightforward, so it’s been an interesting process of trial, error, and documentation to piece together the specifics. This guide presents my accumulated knowledge on the subject.
Assembly Syntax
GCC uses AT&T assembly syntax. The highlights:
- instruction source, destination
  The first operand is the source, the second is the destination.
- %register
  Register names are prefixed with a percent sign. (Or a %% in certain circumstances; see the section on operands below.)
- $literal
  Literal values are prefixed with a dollar sign. The literal $10 specifies decimal 10 while $0x10 specifies hexadecimal 10 (decimal 16).
- instruction{b,w,l}
  The instruction suffix denotes the operand size. The suffixes b, w, and l specify byte (8-bit), word (16-bit), and long word (32-bit) memory references. (Always include the size! If you omit it, the GNU assembler will attempt to guess for you, which is usually a Bad Idea.)
- segment:offset(base, index, scale)
  Memory access syntax. Note that the offset and scale constants are not prefixed with $, but the register references still need a %.
- ljmp/lcall $segment, $offset
  Control transfer instructions may be prefixed with an l to indicate a far jump or call to another code segment. (Similarly, there is lret $stackadjust.)
- *branch-address
  Indirect branch targets, whether a register or a memory address, are prefixed with an asterisk.
Here are a few examples of valid code that illustrate these points.
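A minimal sketch of these rules in action, assuming an x86 or x86-64 target. The name syntax_demo and its parameters are hypothetical. Because the code is wrapped in inline assembly with operands, the hard-coded register is written %%eax; in a standalone assembly file it would be plain %eax.

```c
#include <stdint.h>

/* Hypothetical helper: returns 10 + 0x10 + table[index]. */
static uint32_t syntax_demo(const uint32_t *table, uintptr_t index)
{
    uint32_t result;

    __asm__ (
        "movl $10, %%eax\n\t"         /* source first: decimal literal into eax */
        "addl $0x10, %%eax\n\t"       /* hex literal 0x10 is decimal 16          */
        "addl (%1,%2,4), %%eax\n\t"   /* (base, index, scale): table[index]      */
        "movl %%eax, %0"              /* l suffix: 32-bit operands throughout    */
        : "=r"(result)
        : "r"(table), "r"(index)
        : "eax", "cc", "memory"
    );
    return result;
}
```

The scale of 4 matches the size of each 32-bit table entry, so the memory operand addresses table + index * 4.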
Inline Syntax
The basic format for GCC inline assembly is as follows.
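A sketch of that shape, with each part labelled by a comment. The function copy_value is a made-up example, and an x86 or x86-64 target is assumed; the four parts correspond to the sections that follow.

```c
/* Hypothetical example: copies src into dst through a register. */
static int copy_value(int src)
{
    int dst;

    __asm__ __volatile__ (   /* preamble                        */
        "movl %1, %0"        /* code                            */
        : "=r"(dst)          /* output operands                 */
        : "r"(src)           /* input operands                  */
        :                    /* clobber list (nothing to list)  */
    );
    return dst;
}
```

The colons separate the parts; a part with nothing in it is simply left empty.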
For example, below is code to turn on bit 1 in flag, then store the value in new_flag.
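One way this could look, as a sketch rather than a definitive listing. It assumes bits are numbered from 0 (so bit 1 is the mask 0x2) and an x86 or x86-64 target; the wrapper function set_bit1 is hypothetical.

```c
/* Hypothetical wrapper: returns flag with bit 1 (mask 0x2) turned on. */
static unsigned int set_bit1(unsigned int flag)
{
    unsigned int new_flag;

    __asm__ (
        "movl %1, %0\n\t"    /* new_flag = flag                  */
        "orl $0x2, %0"       /* new_flag |= 0x2 (changes flags)  */
        : "=r"(new_flag)     /* %0: output                       */
        : "r"(flag)          /* %1: input                        */
        : "cc"               /* orl modifies the condition codes */
    );
    return new_flag;
}
```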
Preamble
The __asm__ keyword marks the start of the inline assembly statement. While using asm without the underscores is also valid in some contexts, it will not compile with the -std=c99 option. Moreover, the underscores prevent conflicts with asm defined elsewhere in your code.
The optional __volatile__ keyword indicates the assembly code has important
side-effects and guarantees GCC will not delete it if it is reachable. It does
not, however, guarantee that the assembly code will not be moved relative to
other code.
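A common case where __volatile__ matters is reading a value that changes on its own, such as the time-stamp counter. The sketch below assumes an x86 or x86-64 target where rdtsc is permitted; read_tsc is a hypothetical name, and the "a" and "d" constraints (which pin operands to eax and edx) are x86-specific constraints beyond the ones covered in this guide.

```c
#include <stdint.h>

/* Hypothetical helper: returns the 64-bit time-stamp counter. */
static uint64_t read_tsc(void)
{
    uint32_t lo, hi;

    /* Without __volatile__, GCC may assume two identical asm statements
     * yield the same result and drop or merge one of them. */
    __asm__ __volatile__ (
        "rdtsc"
        : "=a"(lo), "=d"(hi)   /* rdtsc returns its result in edx:eax */
    );
    return ((uint64_t)hi << 32) | lo;
}
```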
Code
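The assembly code specifies the instructions to execute. Each instruction (or label) is enclosed within double quotes and terminated by a newline escape (\n), usually followed by a tab (\t) so the generated assembly stays readable. The compiler concatenates the adjacent string literals into a single block of assembler text.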
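Here is a small sketch with several instructions and two labels, assuming an x86 or x86-64 target. The function sum_to is hypothetical. The numeric local labels (1: and 2:, referenced as 1b for "backward" and 2f for "forward") are a GNU assembler feature I chose here so the labels cannot collide if GCC emits the block more than once.

```c
/* Hypothetical example: returns n + (n-1) + ... + 1. */
static unsigned int sum_to(unsigned int n)
{
    unsigned int total = 0;

    __asm__ (
        "testl %1, %1\n\t"   /* n == 0?                   */
        "jz 2f\n"            /* yes: skip the loop        */
        "1:\n\t"             /* loop head                 */
        "addl %1, %0\n\t"    /* total += n                */
        "decl %1\n\t"        /* n--                       */
        "jnz 1b\n"           /* repeat while n != 0       */
        "2:"                 /* loop exit                 */
        : "+r"(total), "+r"(n)
        :
        : "cc"
    );
    return total;
}
```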
Operands
The general pattern for an operand is "constraint"(expression), and multiple operands are separated by commas.
In the assembly code each operand is referenced by number, where %0 is the first output operand, %1 is the second, and so on. Numbering continues through the input operands, so with N operands in total, %(N-1) is the last input operand. Because the operands are indicated by a percent sign, register names must now be prefixed with two percent signs, like %%eax.
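To make the numbering concrete, here is a sketch with one output and two inputs that also uses a hard-coded register. It assumes an x86 or x86-64 target, and add_via_eax is a hypothetical name.

```c
/* Hypothetical example: computes a + b using eax as scratch space. */
static int add_via_eax(int a, int b)
{
    int sum;

    __asm__ (
        "movl %1, %%eax\n\t"   /* %1 is the first input (a)   */
        "addl %2, %%eax\n\t"   /* %2 is the second input (b)  */
        "movl %%eax, %0"       /* %0 is the only output (sum) */
        : "=r"(sum)
        : "r"(a), "r"(b)
        : "eax", "cc"          /* eax is named without any % here */
    );
    return sum;
}
```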
C expressions provide the input and output operands for the assembly code. An output expression (an lvalue) specifies where a result should be stored. An input expression specifies either a location (lvalue) or value (rvalue) as input to the code.
Constraints help decide the addressing mode and registers used for the input and output operands. Of the many constraints available, only a few are used frequently; we discuss those below.
- m: The operand is stored in memory, at any memory address. (Instructions will operate on the data directly in memory.)
- r: The operand is stored in a general-purpose register. (GCC generates code to transfer the operand to or from memory and the register it chooses.)
- i: The operand is an immediate integer.
- 0,…,9: The operand matches the operand with the specified number. (GCC will use the same variable for both operands. The two operands that match must be one input-only operand and one output-only operand.)
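The four constraints above can be sketched as three tiny functions, assuming an x86 or x86-64 target. All function names are hypothetical. The first one also uses the + modifier, which marks an operand as both read and written.

```c
/* "m": the increment is performed directly on the int in memory. */
static void bump(int *counter)
{
    __asm__ ("incl %0" : "+m"(*counter) : : "cc");
}

/* "i": the constant is encoded into the instruction as $40.
 * "r": GCC picks a general-purpose register for the result. */
static int forty(void)
{
    int value;
    __asm__ ("movl %1, %0" : "=r"(value) : "i"(40));
    return value;
}

/* "0": input x is placed in the same register as output operand 0,
 * so the add happens in place and the result lands in sum. */
static int add_matching(int x, int y)
{
    int sum;
    __asm__ ("addl %2, %0" : "=r"(sum) : "0"(x), "r"(y) : "cc");
    return sum;
}
```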
Constraints may also have modifiers, which provide additional control over the behavior of the operands. Three common modifiers are:
- =: Operand is write-only
- +: Operand is both read and written
- &: Operand is clobbered early (i.e., it is modified before the instruction is finished using the input operands, meaning it may not lie in a register that is used as an input operand or as part of any memory address)
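A short sketch of the modifiers, assuming an x86 or x86-64 target; both function names are hypothetical. The early-clobber case is the subtle one: the output is written by the first instruction while an input is still needed by the second.

```c
/* "=&r": out is written before b is read, so it must not share
 * a register with any input. Without the &, GCC would be free to
 * put out and b in the same register, and the movl would destroy b. */
static int add_early(int a, int b)
{
    int out;

    __asm__ (
        "movl %1, %0\n\t"   /* out = a (b has not been read yet) */
        "addl %2, %0"       /* out += b                          */
        : "=&r"(out)
        : "r"(a), "r"(b)
        : "cc"
    );
    return out;
}

/* "+r": x is read and then overwritten by the same instruction. */
static int twice(int x)
{
    __asm__ ("addl %0, %0" : "+r"(x) : : "cc");
    return x;
}
```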
Clobber List
The clobber list should contain:
- The registers modified, either explicitly or implicitly, by your code.
- If your code modifies the condition code register, “cc”.
- If your code modifies memory, “memory”.
The clobber list informs GCC of the state potentially changed by your code so it won’t make incorrect assumptions about the state and break things (always a Bad Thing).
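As a sketch that needs all three kinds of clobber, the hypothetical function below uses ecx as a scratch register, changes the flags with addl, and writes through a pointer that GCC only sees as a register input. An x86 or x86-64 target is assumed.

```c
/* Hypothetical example: *dst = *a + *b, done entirely inside the asm. */
static void add_into(int *dst, const int *a, const int *b)
{
    __asm__ __volatile__ (
        "movl (%1), %%ecx\n\t"   /* ecx = *a                  */
        "addl (%2), %%ecx\n\t"   /* ecx += *b (changes flags) */
        "movl %%ecx, (%0)"       /* *dst = ecx                */
        :
        : "r"(dst), "r"(a), "r"(b)
        : "ecx",      /* register modified explicitly     */
          "cc",       /* addl modifies the condition codes */
          "memory"    /* the store to *dst modifies memory */
    );
}
```

Listing ecx also keeps GCC from choosing it for any of the three pointer operands.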
Examples
To further illustrate all the stuff stuffed into this guide, I’ve pulled a few examples from my operating system.
To load the interrupt descriptor table register:
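A minimal sketch of how this can be done, not a definitive listing. The struct idt_pointer and the function load_idt are hypothetical names. lidt is a privileged instruction, so the function only works in ring 0 and will fault if called from an ordinary user program.

```c
#include <stdint.h>

/* Hypothetical layout of the operand lidt expects: a 16-bit limit
 * followed immediately by the linear base address of the table. */
struct idt_pointer {
    uint16_t  limit;   /* size of the table in bytes, minus one */
    uintptr_t base;    /* address of the first descriptor       */
} __attribute__((packed));

/* Loads the IDT register from the descriptor at idtp (ring 0 only). */
static inline void load_idt(const struct idt_pointer *idtp)
{
    __asm__ __volatile__ (
        "lidt %0"
        :
        : "m"(*idtp)   /* lidt reads its operand straight from memory */
    );
}
```

The packed attribute matters: without it the compiler would pad the struct after limit, and the processor would read the base address from the wrong offset.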
To set the kernel code segment:
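One common approach is a far jump to the very next instruction, which reloads the code segment register as a side effect. The sketch below assumes 32-bit protected mode, and the selector value KERNEL_CS (0x08, the first GDT entry after the null descriptor) is purely an assumption for illustration; use whatever selector your GDT defines. The function is only compiled for i386 because this form of ljmp is not valid in 64-bit mode.

```c
#define KERNEL_CS 0x08   /* hypothetical selector: GDT entry 1, ring 0 */

#if defined(__i386__)
/* Reloads cs with KERNEL_CS by far-jumping to the next instruction. */
static inline void set_kernel_cs(void)
{
    __asm__ __volatile__ (
        "ljmp %0, $1f\n"    /* far jump: $segment, $offset   */
        "1:"                /* execution continues here      */
        :
        : "i"(KERNEL_CS)    /* emitted as the immediate $8   */
        : "memory"
    );
}
#endif
```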
To move bytes:
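A sketch using the string-move instruction, assuming an x86 or x86-64 target. The function move_bytes is a hypothetical name, and the "D", "S", and "c" constraints (which pin operands to edi, esi, and ecx) are x86-specific constraints beyond the ones covered in this guide.

```c
#include <stddef.h>

/* Hypothetical example: copies n bytes from src to dst, front to back. */
static void move_bytes(void *dst, const void *src, size_t n)
{
    __asm__ __volatile__ (
        "cld\n\t"           /* copy upward through memory         */
        "rep movsb"         /* move ecx bytes from (esi) to (edi) */
        : "+D"(dst), "+S"(src), "+c"(n)   /* all three are updated */
        :
        : "cc", "memory"
    );
}
```

All three operands use + because rep movsb advances both pointers and counts ecx down to zero, so GCC must not assume they keep their original values.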
References
I pulled this information from a variety of sources, chief among them: