A Tiny Guide to GCC Inline Assembly
12 Jun 2008
So I’m working on a real-time operating system for school, and in the process I’ve needed to write a ton of IA32 inline assembly. GCC’s inline assembly syntax isn’t immediately straightforward so it’s been an interesting process of trial, error, and documentation to piece together the specifics. This guide presents my accumulated knowledge on the subject.
GCC uses AT&T assembly syntax. The highlights:
- instruction source, destination
The first operand is the source, the second is the destination.
Register names are prefixed with a percent sign. (Or a %% in certain circumstances; see the section on operands below.)
Literal values are prefixed with a dollar sign. The literal $10 specifies decimal 10 while $0x10 specifies hexadecimal 10 (decimal 16).
The instruction suffix denotes the operand size. The suffixes b, w, and l specify byte (8-bit), word (16-bit), and long word (32-bit) memory references. (Always include the size! If you omit it, the GNU assembler will attempt to guess for you, which is usually a Bad Idea.)
- segment:offset(base, index, scale)
Memory access syntax. Note that the offset and scale constants are not prefixed with $ but the register references still need a %.
- ljmp/lcall $segment, $offset
Control transfer instructions may be prefixed with an l to indicate a far jump or call to another code segment. (Similarly, there is lret for far returns.)
Branch targets given indirectly, through a register or a memory address, are prefixed with an asterisk (for example, jmp *%eax).
Here are a few examples of valid code that illustrate these points.
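A minimal sketch of these rules in action, assuming an x86 or x86-64 target. The function name att_syntax_demo is my own placeholder, and because the instructions are wrapped in GCC's extended inline assembly so they can be compiled from C, the register names appear with a doubled percent sign.

```c
#include <stdint.h>

/* Hypothetical demo function, not from the original guide. */
static inline uint32_t att_syntax_demo(void)
{
    uint32_t result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "movl $10, %%ecx\n\t"             /* literal: decimal 10 into ecx          */
        "movl $0x10, %%edx\n\t"           /* literal: hex 10 (decimal 16) into edx */
        "movl %%ecx, %%eax\n\t"           /* source first, destination second      */
        "addl %%edx, %%eax\n\t"           /* l suffix: 32-bit add, eax = 26        */
        "leal 4(%%eax,%%ecx,2), %0\n\t"   /* offset(base,index,scale): 4+26+10*2   */
        : "=r"(result)
        :
        : "eax", "ecx", "edx", "cc");
#else
    result = 4 + 26 + 10 * 2;             /* plain C fallback for other targets */
#endif
    return result;
}
```

The leal line uses the memory-operand syntax purely to compute an address, so nothing is actually read from memory.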
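The basic format for GCC inline assembly is __asm__ __volatile__(assembly code : output operands : input operands : clobber list); and each of these parts is described in turn below.
For example, below is code to turn on bit 1 in flag and then store the value in a second variable.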
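A minimal sketch of such a statement, assuming an x86 or x86-64 target. The function name set_bit1 and the variable result are my own placeholders, and "bit 1" is taken to mean the mask 0x2 (bits counted from zero).

```c
#include <stdint.h>

/* Hypothetical helper: returns flag with bit 1 (mask 0x2) turned on. */
static inline uint32_t set_bit1(uint32_t flag)
{
    uint32_t result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "movl %1, %%eax\n\t"     /* assembly code: load flag into eax */
        "orl  $0x2, %%eax\n\t"   /*                turn on bit 1      */
        "movl %%eax, %0\n\t"     /*                store the value    */
        : "=r"(result)           /* output operands */
        : "r"(flag)              /* input operands  */
        : "eax", "cc");          /* clobber list    */
#else
    result = flag | 0x2u;        /* plain C fallback for other targets */
#endif
    return result;
}
```

Calling set_bit1(5) should give 7, since 5 is binary 101 and bit 1 is the missing middle bit.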
The __asm__ keyword marks the start of the inline assembly statement. While asm without the underscores is also valid in some contexts, it will not compile with the -std=c99 option. Moreover, the underscores prevent conflicts with anything named asm defined elsewhere in your code.
The __volatile__ keyword indicates the assembly code has important side-effects and guarantees GCC will not delete it if it is reachable. It does not, however, guarantee that the assembly code will not be moved relative to other code.
The assembly code specifies the instructions to execute. Each instruction (or label) is enclosed within double quotes and terminated by a newline.
The general pattern for an operand is
"constraint"(expression) and multiple
operands are separated by commas.
In the assembly code each operand is referenced by number, where %0 is the first output operand, %1 is the second, and so on, and %N-1 is the last input operand (N being the total number of operands). Because the operands are indicated by a percent sign, the register names must now be prefixed with two percent signs, like %%eax.
C expressions provide the input and output operands for the assembly code. An output expression (an lvalue) specifies where a result should be stored. An input expression specifies either a location (lvalue) or value (rvalue) as input to the code.
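To make the numbering concrete, here is a small sketch with two outputs and two inputs, assuming an x86 or x86-64 target. The function sum_and_diff and its parameter names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: stores a + b in *sum and a - b in *diff. */
static inline void sum_and_diff(uint32_t a, uint32_t b,
                                uint32_t *sum, uint32_t *diff)
{
    uint32_t s, d;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "movl %2, %%eax\n\t"   /* %2 is the first input (a)         */
        "movl %2, %%edx\n\t"
        "addl %3, %%eax\n\t"   /* %3 is the second input (b)        */
        "subl %3, %%edx\n\t"
        "movl %%eax, %0\n\t"   /* %0 is the first output (s)        */
        "movl %%edx, %1\n\t"   /* %1 is the second output (d)       */
        : "=r"(s), "=r"(d)     /* outputs are numbered first: %0, %1 */
        : "r"(a), "r"(b)       /* inputs continue the count: %2, %3  */
        : "eax", "edx", "cc");
#else
    s = a + b;                 /* plain C fallback for other targets */
    d = a - b;
#endif
    *sum = s;
    *diff = d;
}
```

Both outputs are written only after every input has been read, which is why plain "=r" is enough here.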
Constraints help decide the addressing mode and registers used for the input and output operands. Of the many constraints available, only a few are used frequently. We discuss these below.
- m: The operand is stored in memory, at any memory address. (Instructions will operate on the data directly in memory.)
- r: The operand is stored in a general-purpose register. (GCC generates code to transfer the operand to or from memory and the register it chooses.)
- i: The operand is an immediate integer.
- 0,…,9: The operand matches the operand with the specified number. (GCC will use the same register or memory location for both operands. The two operands that match must be one input-only operand and one output-only operand.)
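The four constraints above can be sketched in two short statements, assuming an x86 or x86-64 target. The function name load_and_add5 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: reads *p and returns that value plus 5. */
static inline uint32_t load_and_add5(const uint32_t *p)
{
    uint32_t value;
#if defined(__i386__) || defined(__x86_64__)
    /* "m": the load reads *p directly from memory.
       "r": GCC picks a general-purpose register for value. */
    __asm__ __volatile__("movl %1, %0"
                         : "=r"(value)
                         : "m"(*p));

    /* "0": the input shares a location with output %0.
       "i": the constant 5 is emitted as the immediate $5. */
    __asm__ __volatile__("addl %2, %0"
                         : "=r"(value)
                         : "0"(value), "i"(5)
                         : "cc");
#else
    value = *p + 5;            /* plain C fallback for other targets */
#endif
    return value;
}
```

In the second statement %1 is the matching operand and %2 is the immediate, so the instruction GCC emits is an addl of $5 to whichever register it chose for value.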
Constraints may also have modifiers which provide additional control over the behavior of the operands. Three common modifiers are:
- =: Operand is write-only
- +: Operand is both read and written
- &: Operand is clobbered early (i.e., is modified before the instruction is finished using the input operands, meaning it may not lie in a register used as an input operand or as part of any memory address)
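A short sketch of the modifiers, assuming an x86 or x86-64 target. The function names add_one and add_pair are hypothetical.

```c
#include <stdint.h>

/* "+r": x is read and then written by the same instruction. */
static inline uint32_t add_one(uint32_t x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("incl %0" : "+r"(x) : : "cc");
#else
    x += 1;                    /* plain C fallback for other targets */
#endif
    return x;
}

/* "=&r": out is written by the first instruction while input b (%2)
   is still needed, so it must not share a register with any input. */
static inline uint32_t add_pair(uint32_t a, uint32_t b)
{
    uint32_t out;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "movl %1, %0\n\t"      /* out = a (early write to the output) */
        "addl %2, %0\n\t"      /* out += b                            */
        : "=&r"(out)
        : "r"(a), "r"(b)
        : "cc");
#else
    out = a + b;               /* plain C fallback for other targets */
#endif
    return out;
}
```

Without the & in add_pair, GCC would be free to put out and b in the same register, and the first movl would then destroy b before it was added.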
The clobber list should contain:
- The registers modified, either explicitly or implicitly, by your code.
- If your code modifies the condition code register, “cc”.
- If your code modifies memory, “memory”.
The clobber list informs GCC of the state potentially changed by your code so it won’t make incorrect assumptions about the state and break things (always a Bad Thing).
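Here is a sketch that needs all three kinds of clobber, assuming an x86 or x86-64 target. The function name zero_word is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: stores zero through the pointer p. */
static inline void zero_word(uint32_t *p)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"   /* writes eax and the condition codes */
        "movl %%eax, (%0)\n\t"    /* writes memory GCC cannot see       */
        :
        : "r"(p)
        : "eax", "cc", "memory");
#else
    *p = 0;                       /* plain C fallback for other targets */
#endif
}
```

The store goes through a pointer passed as an input rather than through an output operand, so "memory" is the only thing telling GCC that *p has changed.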
To further illustrate all the stuff stuffed into this guide, I’ve pulled a few examples from my operating system.
To load the interrupt descriptor table register:
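A minimal sketch of how this might look on IA32, not necessarily the code from my operating system. The struct idt_pointer and the function load_idtr are hypothetical names, and the instruction is privileged, so it only works in ring 0.

```c
#include <stdint.h>

/* Hypothetical layout of the 6-byte operand read by the 32-bit lidt:
   a 16-bit limit followed by a 32-bit base address. */
struct idt_pointer {
    uint16_t limit;   /* size of the table in bytes, minus one  */
    uint32_t base;    /* linear address of the first descriptor */
} __attribute__((packed));

static inline void load_idtr(const struct idt_pointer *idtp)
{
#if defined(__i386__)
    /* "m" hands lidt the descriptor directly in memory; "memory" asks
       GCC to finish earlier stores to the table before this point. */
    __asm__ __volatile__("lidt %0" : : "m"(*idtp) : "memory");
#else
    (void)idtp;       /* the 6-byte layout above is IA32-specific */
#endif
}
```

The packed attribute matters here: without it the compiler would pad the struct to 8 bytes and the base would sit at the wrong offset.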
To set the kernel code segment:
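A minimal sketch, assuming IA32 protected mode with a flat kernel code segment already installed in the GDT. The selector value KERNEL_CS (GDT entry 1, ring 0) and the function name set_kernel_cs are assumptions of mine, and the code only makes sense inside a kernel.

```c
/* Hypothetical selector: GDT index 1, table indicator 0, privilege level 0. */
#define KERNEL_CS 0x08

static inline void set_kernel_cs(void)
{
#if defined(__i386__)
    __asm__ __volatile__(
        "ljmp %0, $1f\n"   /* far jump: reloads CS with the selector */
        "1:\n"             /* and lands on the very next instruction */
        :
        : "i"(KERNEL_CS)   /* "i" prints the selector as $8          */
        : "memory");
#endif
}
```

CS cannot be written with a plain mov, which is why the far jump to a local label is used to reload it.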
To move bytes:
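A minimal sketch using rep movsb, assuming an x86 or x86-64 target and non-overlapping buffers. The function name move_bytes is hypothetical, and the D, S, and c constraints (the edi, esi, and ecx registers) are x86-specific constraints that this guide does not otherwise cover.

```c
#include <stddef.h>

/* Hypothetical helper: copies n bytes from src to dest, lowest address first. */
static inline void move_bytes(void *dest, const void *src, size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "cld\n\t"          /* clear the direction flag: copy upwards   */
        "rep movsb\n\t"    /* copy one byte per iteration, count times */
        : "+D"(dest), "+S"(src), "+c"(n)   /* all three are updated    */
        :
        : "memory", "cc");
#else
    unsigned char *d = dest;               /* plain C fallback */
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
}
```

All three operands use + because rep movsb advances both pointers and counts the length down to zero as it runs.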
I pulled this information from a variety of sources, chief among them: