Introduction
There are a large number of processors out there that are based on the various processor architectures designed by ARM Ltd. The capabilities of these processors vary greatly. Nevertheless, ARM designed the instruction sets for its processor architectures so that the instruction sets of the more capable processors build on the instruction sets of the simpler processors. A programmer can thus transfer most of the knowledge of programming for one type of ARM processor to programming another type of ARM processor.
I will be using the GNU variant of assembler directives that are used alongside ARM instructions to create a complete program. Please refer to the GNU Assembler user manual for detailed information on the assembler directives shown here.
Registers
In ARM, sixteen 32-bit registers are available:
# | name | use | note |
---|---|---|---|
1 | R0 | General Purpose | The return code of a function is read from this register. |
2 | R1 | General Purpose | |
3 | R2 | General Purpose | |
4 | R3 | General Purpose | |
5 | R4 | General Purpose | |
6 | R5 | General Purpose | |
7 | R6 | General Purpose | |
8 | R7 | General Purpose | |
9 | R8 | General Purpose | |
10 | R9 | General Purpose | |
11 | R10 | General Purpose | |
12 | R11 | General Purpose | |
13 | R12 | General Purpose | |
14 | SP | Stack Pointer | |
15 | LR | Link Register | stores the return address from subroutines and interrupt service routines |
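16 | PC | Program Counter | tracks the instruction being executed; when read by an instruction, it yields the address of that instruction plus 8 in ARM state or plus 4 in Thumb state |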
Thumb
The classic ARM instruction set is a RISC instruction set with a fixed instruction length of 32 bits. To achieve better code density, which is especially important in the Cortex-M series processors that are designed for use as microcontrollers, a 16-bit instruction set called Thumb, containing the most frequently used ARM instructions, was created. Thumb was later extended to the hybrid 16-bit/32-bit Thumb-2 instruction set in order to achieve good code density while also offering feature-rich 32-bit instructions.
Keep in mind that in Thumb (not Thumb-2) registers R8 through R12 can be accessed by only a few instructions (such as MOV, ADD, CMP and BX), so most instructions are limited to R0 through R7. While the ARM instruction set uses 32-bit numbers with a clever 12-bit encoding scheme for its immediate values, Thumb uses plain 8-bit binary encoding for immediate values. Therefore, you can't use numbers larger than 255 as an immediate value. Instead, you keep your 32-bit number in a memory pool close to the code and load the number into a register via PC-relative addressing as shown below.
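        ldr r1, large_num      // PC-relative load; the assembler computes the offset, which must be a multiple of 4
        ...                    // 8 instructions in between
        .balign 4              // the 32-bit literal must be word-aligned
    large_num:
        .word 0x8410033a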
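To make the offset arithmetic behind such a PC-relative load less of a guess, here is a minimal C sketch. It assumes the usual rules for the 16-bit Thumb `ldr rX, [pc, #imm]` encoding: the PC reads as the instruction address plus 4, rounded down to a multiple of 4, and the offset must be a multiple of 4 between 0 and 1020. The function name and the addresses used in the note below are made up for illustration.

```c
#include <stdint.h>

/* Offset to encode in a 16-bit Thumb "ldr rX, [pc, #imm]", or -1 if the
 * literal cannot be reached from the instruction (hypothetical helper). */
int thumb_ldr_pc_offset(uint32_t instr_addr, uint32_t literal_addr)
{
    /* In Thumb state the PC reads as the instruction address + 4;
     * the load uses that value rounded down to a word boundary. */
    uint32_t base = (instr_addr + 4u) & ~3u;

    if (literal_addr < base)
        return -1;                  /* this encoding only looks forward */

    uint32_t offset = literal_addr - base;
    if ((offset & 3u) != 0u)
        return -1;                  /* the literal must be word-aligned */
    if (offset > 1020u)
        return -1;                  /* 8-bit field, scaled by 4 */

    return (int)offset;
}
```

With eight 16-bit instructions between the load and the literal, a load at the hypothetical address 0x8000 and a literal at 0x8014 give an offset of 16. A literal that is not word-aligned, or more than 1020 bytes past the base, is rejected.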
You can also use the "=" prefix that instructs the GNU assembler to automatically insert a memory pool from which LDR loads the specified value:
        ldr r1, =0x8410033a    // automatically create a memory pool for storing '0x8410033a'
        ...                    // a few instructions
        bx lr                  // jump somewhere (here: return to the caller)
        .pool                  // insert the memory pool here, close to the ldr instruction
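Whether a constant needs a memory pool at all can be checked by hand. The following C sketch models the two immediate encodings mentioned above, assuming that the 12-bit ARM scheme is an 8-bit value rotated right by an even number of bits and that the Thumb scheme is a plain 8-bit value. The function names are illustrative and not part of any toolchain.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n is taken modulo 32). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* Returns 1 if v fits the classic ARM immediate encoding:
 * an 8-bit value rotated right by an even amount (0, 2, ..., 30). */
int arm_imm_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by 'rot' and test for an 8-bit result. */
        if (rotl32(v, rot) <= 0xFFu)
            return 1;
    }
    return 0;
}

/* Returns 1 if v fits the plain 8-bit Thumb immediate. */
int thumb_imm8_encodable(uint32_t v)
{
    return v <= 0xFFu;
}
```

Under these assumptions 0xFF000000 and 0x104 are encodable in ARM state but not in Thumb, 0x102 fails in both because it would need an odd rotation, and 0x8410033a has to come from a memory pool either way.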
GNU Assembler
In-code Directives
The following table lists frequently used GNU Assembler directives. The complete list can be found at the official documentation pages for device-independent directives plus ARM directives.
Syntax | Description |
---|---|
.syntax unified | Use modern assembler syntax + auto-generate IT instructions. Place this at the top of your source file. |
.weak <label>{,<label>} | Allow 'label' to be undefined. If it's undefined, it will have the value NULL (0x00000000). |
.weakref <label>,<default label> | Allow label to be undefined. If it's undefined, it will have the value of another label. |
.section <section name> | All output from now on goes into a new section with the given name. Sections are labelled chunks of code or data and are processed by the linker. |
.align <num zero bits> [,<pattern> [,<max skipped bytes>]] | Pad the output up to the first address that has the given number of low-order zero bits. (external documentation) |
.balign <num bytes> [,<pattern> [,<max skipped bytes>]] | Pad the output up to the first address that is a multiple of the given number of bytes. (external documentation) |
.long <value> | output a 32-bit value |
.text | Specifies that subsequent lines contain program code (same as '.section .text') |
.data | Specifies that subsequent lines contain data, no code. |
.global <label> | Makes the symbol with the given label visible to the linker, that is, "exports" the symbol. Also used to mark external symbols (defined in another compilation unit). |
.func <label> [, <actual label>] | Emits debugging information that marks the beginning of function 'label'. Ignored unless the file is assembled with debugging enabled. |
.endfunc | Marks the end of a function started with .func. |
.size <label>,<size> | Tells the linker how long (in bytes) the block is that this symbol points to. |
.thumb | Specifies that the Thumb instruction set is being used. Use this with .syntax unified. |
.thumb_func | Marks the label that follows as the entry point of a function with Thumb instructions. This is required if the function is called by using 'bx' or 'blx'. |
.type <label>,%<type> | Specifies the type of the symbol. Required if there is a pointer to the function somewhere. |
.cpu <CPU type> | The CPU type can be, for example, cortex-m0, cortex-m3 or cortex-m4. |
.zero <num bytes> | Denotes a block of memory of given size and initialized to zeros everywhere. |
.fill <repeat>{, <size>{, <value>}} | Denotes a block of memory made of <repeat> copies of <value>, each <size> bytes wide (for example 1, 2, or 4 for byte, half word, or word, respectively). If omitted, the size is 1 and the value is 0. |
.pool, .ltorg | Instructs the assembler to dump the current contents of the literal pool (the pending constants) into the current .text section at the location where this directive is inserted (aligned to a word boundary). |
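The difference between the .align and .balign rows in the table above is easy to get wrong, so here is a small C model of the padding each one requests. It is a sketch of the arithmetic only and ignores the optional pattern and max-skip arguments; the function names are hypothetical.

```c
#include <stdint.h>

/* Model of ".align n" as described in the table: advance to the next
 * address whose n low-order bits are zero. Assumes n < 32. */
uint32_t pad_align(uint32_t addr, unsigned zero_bits)
{
    uint32_t mask = (1u << zero_bits) - 1u;
    return (addr + mask) & ~mask;
}

/* Model of ".balign n": advance to the next multiple of n bytes.
 * Assumes n is non-zero. */
uint32_t pad_balign(uint32_t addr, uint32_t bytes)
{
    uint32_t rem = addr % bytes;
    return rem ? addr + (bytes - rem) : addr;
}
```

In this model `.align 2` and `.balign 4` both move the address 0x10002 to 0x10004, while an address that is already aligned stays where it is.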
Sections
The table below lists the most important or frequently encountered sections that the assembler (or GCC) writes into the binary image.
Section Name | Description |
---|---|
.data | Initialized global and static variables are stored here. The C variable declaration "static int i = 8;", for example, will cause an entry in this section. |
.rodata | Read-only, global static variables are stored here. The C code line "static const int i = 8;" will cause an entry in this section. |
.text | This section contains executable machine code. |
.comment | GCC writes information about the compiler into this section. For example, "GCC: (Raspbian4.9.2-10) 4.9.2" |
.ARM.attributes | Contains information about the generated ARM code, for example, whether it is EABI compliant. |
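As a rough illustration of the table above, the following C fragment annotates where a typical GCC build is expected to place each object. This is a sketch under assumptions: the actual placement depends on compiler, target, and optimization options, the names are invented, and the .bss section used for zero-initialized data is not listed in the table.

```c
/* Expected section for each object in a typical, unoptimized GCC build. */
static int counter = 8;        /* initialized and writable:  .data   */
static const int limit = 8;    /* initialized and read-only: .rodata */
static int scratch;            /* zero-initialized: .bss (not in the table above) */

/* The machine code of the function itself goes into .text. */
int bump(void)
{
    scratch = counter + limit;
    counter++;
    return scratch;
}
```

Compiling this with `gcc -c` and running `objdump -h` (section headers) or `objdump -t` (symbol table) on the object file is one way to check the placement on your own toolchain.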
Debug Sections
These sections are generated when calling as or gcc with the "-g" option.
Section Name | Description |
---|---|
.debug_info | |
.debug_abbrev | |
.debug_aranges | |
.debug_line | |
.debug_str | |
.debug_frame | |
Inline Assembly
Occasionally, there is a need for embedding assembly code inside of C/C++ code. This is referred to as inline assembly. Assuming that you are using the GCC toolchain, refer to the GCC manual for embedding assembly in C. An additional online resource is the ARM GCC Inline Assembler Cookbook by Harald Kipp.
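A minimal sketch of what GCC extended inline assembly looks like is given below. It assumes a GCC-compatible compiler that defines `__arm__` for 32-bit ARM targets. The function name is made up, and the plain C fallback exists only so that the fragment also compiles on other machines.

```c
#include <stdint.h>

/* Adds two 32-bit values, using an ARM 'add' instruction when compiled
 * for a 32-bit ARM target and plain C everywhere else. */
static inline uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__arm__)
    uint32_t result;
    /* %0 is the output operand, %1 and %2 are the inputs; "r" asks the
     * compiler to pick general purpose registers. "cc" is listed as
     * clobbered because Thumb-1 encodings of add may update the flags. */
    __asm__ ("add %0, %1, %2"
             : "=r" (result)
             : "r" (a), "r" (b)
             : "cc");
    return result;
#else
    return a + b;   /* portable fallback for non-ARM builds */
#endif
}
```

Letting the compiler choose the registers through operand constraints, instead of naming r0 or r1 directly, keeps the fragment from clashing with the registers the surrounding C code is using.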
Disassembling Binaries
Use the objdump tool (part of GNU Binutils) to view the machine code of an object file in assembly:
objdump -d <executable file name>
Online Tutorials
There is ARM's official documentation site, and there are many tutorials about ARM assembly programming by third parties, for example, the tutorial series on the Think In Geek website about ARM assembly programming on Raspberry Pi, written by Roger Ferrer Ibáñez. I also liked the Introduction to ARM written by David Thomas, although it's more reference than tutorial.
There are also many nice video tutorials out there. For example, this video tutorial series by YouTube user Arat192 for the Raspberry Pi 1 with ARM11 processor, or the video tutorial series by Derek Banas for assembly programming on the same model Raspberry Pi.
Example Code
The following two assembly programs are based on code from the video tutorials mentioned above and are written for the GNU toolchain and an ARMv7-A processor.
Example 1
This simple program performs a bitwise logical operation.
        .global _start     // _start is the expected entry symbol for a standalone assembly program
    _start:
        mov r1, #5         // loads binary 0101
        mov r2, #9         // loads binary 1001
        //and r0, r1, r2   // bitwise AND
        //orr r0, r1, r2   // bitwise OR
        //eor r0, r1, r2   // bitwise XOR
        bic r0, r1, r2     // bitwise clear. Returns 0 except when bit in r1 is 1 and bit in r2 is 0
    end:
        mov r7, #1         // selects system call for exiting to terminal
        swi 0              // executes system call via interrupt
Assuming that the above code is saved to the file example1.s, you create the executable via
    as -o example1.o example1.s
    ld -o example1 example1.o
then run the executable and check the return value, which is the value in r0, by entering
./example1; echo $?
where echo $? prints the return value of the last executed terminal command.
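To predict the value that echo $? prints for each of the four instructions in Example 1, the operations can be modelled in C. This is only a model of the arithmetic with invented function names, not code that runs on the ARM target.

```c
#include <stdint.h>

/* C models of the ARM data-processing instructions used in Example 1.
 * rn is the first source register, rm the second. */
uint32_t arm_and(uint32_t rn, uint32_t rm) { return rn & rm;  }  /* and */
uint32_t arm_orr(uint32_t rn, uint32_t rm) { return rn | rm;  }  /* orr */
uint32_t arm_eor(uint32_t rn, uint32_t rm) { return rn ^ rm;  }  /* eor */
uint32_t arm_bic(uint32_t rn, uint32_t rm) { return rn & ~rm; }  /* bic */
```

For r1 = 5 (0101) and r2 = 9 (1001) the model gives 1 for and, 13 for orr, 12 for eor, and 4 for bic, so the program as listed should report 4.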
Example 2
This program prints the uppercase character corresponding to the input lowercase character.
        .global main
        .func main
    main:                       // 'main' is the expected entry point for an executable that links to a library
        mov r7, #3              // sets system call "read from input stream"
        mov r0, #0              // selects the keyboard as input stream
        mov r2, #1              // sets number of characters to read from stream
        ldr r1, =character      // set the address where the character will be stored
        swi 0                   // executes the system call
    _uppercase:
        ldr r1, =character
        ldr r0, [r1]
        // for example,
        // a : 0110 0001
        // A : 0100 0001
        // Bit 5 (= 0b0010 0000 = 32) determines whether an ASCII letter is lowercase or uppercase.
        bic r0, r0, #32         // converts to uppercase
        str r0, [r1]            // stores the result in the memory location pointed at by r1
        b _write_via_printf
    _write_to_device:
        mov r7, #4              // system call for printing output
        mov r0, #1              // selects the monitor as output device
        mov r2, #1              // outputs only one character
        ldr r1, =character      // loads the address of the character to be printed
        swi 0
        .extern printf
    _write_via_printf:
        ldr r0, =out_format
        ldr r1, [r1]
        bl printf               // Don't forget to link against libc. Use 'gcc' or add '-lc' when using 'ld'.
        //.endfunc              // use GNU Assembler directive to mark the end of the function
    end:
        //mov pc, lr            // alternatively, return from function call by setting the program counter
        mov r7, #1              // instead, we exit the program altogether via system call
        swi 0                   // executing the system call
        .data
    character:
        .ascii " "              // a single ASCII character is stored in this memory location
    out_format:
        .asciz "%c\n"           // a null-terminated ASCII string, compatible with printf, is stored here
Note that main is used instead of _start as the entry point due to use of printf and, therefore, linkage with libc via gcc. Assuming that the above code is saved to the file example2.s, you create the executable via
    as -o example2.o example2.s
    gcc -Wl,-s -o example2 example2.o
where gcc will automatically link to libc and generate the code for the _start entry point, which will be calling the function main.
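The bit-clearing trick used in Example 2 can be tried out in C before committing it to assembly. The sketch below models `bic r0, r0, #32` on a single character and adds a range-checked variant. Both function names are hypothetical, and the checked variant is an addition that the assembly program above does not perform.

```c
/* Model of "bic r0, r0, #32": clears bit 5 (value 32) of the character. */
unsigned char upper_bic(unsigned char c)
{
    return (unsigned char)(c & ~32u);
}

/* Variant that only touches the ASCII letters 'a' to 'z'. */
unsigned char upper_checked(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? upper_bic(c) : c;
}
```

The unchecked version maps 'a' to 'A' and leaves 'A' alone, but it also changes non-letters that happen to have bit 5 set: the digit '1' (0x31) becomes the control character 0x11. The program in Example 2 therefore behaves as described only for letter input.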
Example 3: Working With Floating Points
In this example, the floating point unit (FPU) is used for calculation. The FPU on ARM Cortex-A processors is called VFP (Vector Floating Point). The most useful instructions in the VFP instruction set are listed on this quick reference card.
Since the FPU is an optional component in the ARM architecture, the presence of an FPU is passed to the assembler as a command line argument.
as -mfpu=vfpv4 -o example.o example.s
Raspberry Pi 2 (BCM2836 processor) supports VFP version 4, as does Raspberry Pi 3 (BCM2837 processor). If you don't enable support for VFP, the assembler will emit the following error message:
selected processor does not support ARM mode `vmrs r4,fpsid'
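The instruction in the error message, `vmrs r4,fpsid`, copies the FPU identification register FPSID into r4. As a hedged sketch, the C code below splits such a value into its fields, assuming the field layout given in the ARM architecture manual (implementer in bits 31:24, SW in bit 23, subarchitecture in bits 22:16, part number in bits 15:8, variant in bits 7:4, revision in bits 3:0). The struct and function names are invented, and the value in the note is a sample input, not a reading taken from a particular board.

```c
#include <stdint.h>

/* Fields of the FPSID register (layout assumed as described above). */
struct fpsid_fields {
    unsigned implementer;      /* bits 31:24, 0x41 ('A') stands for ARM */
    unsigned sw;               /* bit 23 */
    unsigned subarchitecture;  /* bits 22:16 */
    unsigned part_number;      /* bits 15:8 */
    unsigned variant;          /* bits 7:4 */
    unsigned revision;         /* bits 3:0 */
};

/* Splits a raw FPSID value, as read by "vmrs rX, fpsid", into fields. */
struct fpsid_fields decode_fpsid(uint32_t fpsid)
{
    struct fpsid_fields f;
    f.implementer     = (fpsid >> 24) & 0xFFu;
    f.sw              = (fpsid >> 23) & 0x1u;
    f.subarchitecture = (fpsid >> 16) & 0x7Fu;
    f.part_number     = (fpsid >> 8)  & 0xFFu;
    f.variant         = (fpsid >> 4)  & 0xFu;
    f.revision        = fpsid & 0xFu;
    return f;
}
```

For the sample value 0x41023075 this yields implementer 0x41, SW 0, subarchitecture 0x02, part number 0x30, variant 7, and revision 5.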
Debugging
In order to debug an executable, the executable needs to contain debug symbols. Use the -g option of the GNU Assembler to generate object files with debug symbols. The debugger, gdb, will load the symbols automatically. Use the GDB command disassemble to view the assembly instructions for a given function as reconstructed from machine code. The addresses of the memory locations in which the instructions are stored are also shown. An example debug session on a Raspberry Pi is shown below.
    pi@raspberrypi:~/asm_test $ as -g -o test8.o test8.s
    pi@raspberrypi:~/asm_test $ ld -o test8 test8.o
    pi@raspberrypi:~/asm_test $ gdb test8
    GNU gdb (Raspbian 7.7.1+dfsg-5+rpi1) 7.7.1
    Copyright (C) 2014 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "arm-linux-gnueabihf".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from test8...done.
    (gdb) list
    1       // looping
    2       // r0 = 0
    3       // r1 = 1
    4       // while (r0 <= 10)
    5       // r0 = r0 + 1
    6
    7       .global _start
    8       _start:
    9       mov r0, #0
    10      mov r1, #1
    (gdb) disassemble _start
    Dump of assembler code for function _start:
       0x00010054 <+0>:     mov r0, #0
       0x00010058 <+4>:     mov r1, #1
       0x0001005c <+8>:     b 0x10064 <_continue_loop>
    End of assembler dump.
You can set a breakpoint before running the executable in the debugger. Execution will stop at the breakpoint, where you can inspect the register values with the command info r.
    (gdb) b 13
    Breakpoint 1 at 0x10060: file test8.s, line 13.
    (gdb) run
    Starting program: /home/pi/asm_test/test8
    Breakpoint 1, _loop () at test8.s:14
    14      add r0, r0, r1
    (gdb) info r
    r0             0x0          0
    r1             0x1          1
    r2             0x0          0
    r3             0x0          0
    r4             0x0          0
    r5             0x0          0
    r6             0x0          0
    r7             0x0          0
    r8             0x0          0
    r9             0x0          0
    r10            0x0          0
    r11            0x0          0
    r12            0x0          0
    sp             0x7efff6b0   0x7efff6b0
    lr             0x0          0
    pc             0x10060      0x10060 <_loop>
    cpsr           0x80000010   -2147483632
To inspect a region in memory, type x followed by a forward slash, the number of memory locations, whitespace, and the label or address of the start location. If the number is omitted, only the value at the given memory location is displayed. Typing only x continues with the memory locations following the last shown location.
    (gdb) x/4 info
    0x10078 <info>: 10 65656 4929 1634033920
    (gdb) x 0x10078
    0x10078: 0x0000000a
    (gdb) x info
    0x10078 <info>: 0x0000000a
    (gdb) x
    0x1007c <info+4>: 0x00010078
To inspect the contents of a memory location referenced by a register, type "x $<register name>" like this:
    (gdb) x $sp
    0x7efff6a8: 0x00000002