Latest revision as of 2022-09-16T10:32:39


Introduction

There are a large number of processors out there that are based on the various processor architectures designed by ARM Ltd. The capabilities of these processors vary greatly. Nevertheless, ARM designed the instruction sets for its processor architectures so that the instruction sets of the more capable processors build on the instruction set of the simpler processors. A programmer can thus transfer most of the knowledge of programming for one type of ARM processor to programming another type of ARM processor.

I will be using the GNU variant of assembler directives that are used alongside ARM instructions to create a complete program. Please refer to the GNU Assembler user manual for detailed information on the assembler directives shown here.

Registers

In ARM, there are 16 32-bit registers available. These are:

#   Name       Use               Note
1   R0         General Purpose   The return code of a function is read from this register.
2   R1         General Purpose
3   R2         General Purpose
4   R3         General Purpose
5   R4         General Purpose
6   R5         General Purpose
7   R6         General Purpose
8   R7         General Purpose
9   R8         General Purpose
10  R9         General Purpose
11  R10        General Purpose
12  R11        General Purpose
13  R12        General Purpose
14  SP (R13)   Stack Pointer
15  LR (R14)   Link Register     Stores the return address for subroutines and interrupt service routines.
16  PC (R15)   Program Counter   Tracks the instruction being executed. When read in ARM state, it yields the address of the current instruction plus 8 (plus 4 in Thumb state).


Thumb

The classic ARM instruction set is a RISC instruction set with a fixed instruction length of 32 bits. To achieve better code density, which is especially important in the Cortex-M series processors that are designed for use as microcontrollers, a 16-bit instruction set called Thumb, containing the most frequently used ARM instructions, was created. Thumb was later extended to the hybrid 16-bit/32-bit Thumb-2 instruction set in order to achieve good code density while also offering feature-rich 32-bit instructions.


Keep in mind that in Thumb (not Thumb-2), most instructions can only access registers R0 through R7; the high registers R8 through R12 are reachable by only a few instructions such as MOV, ADD, and CMP. While the ARM instruction set uses 32-bit numbers with a clever 12-bit encoding scheme for its immediate values, Thumb uses plain 8-bit binary encoding for immediate values. Therefore, you can't use numbers larger than 255 as an immediate value. Instead, you keep your 32-bit number in a memory pool (literal pool) close to the code and load the number into a register via PC-relative addressing as shown below.

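ldr  r1, large_num // PC-relative load; the assembler computes the offset (which must be a multiple of 4)
... // a few instructions in between
.balign 4 // the literal must be word-aligned
large_num: .word 0x8410033a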

You can also use the "=" prefix that instructs the GNU assembler to automatically insert a memory pool from which LDR loads the specified value:

ldr  r1,=0x8410033a // automatically create a memory pool for storing '0x8410033a'
... // a few instructions
bx lr // jump somewhere (bx needs a register operand; lr is used here as an example)
.pool // insert the memory pool here, close to the ldr instruction 
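
The two immediate rules described above can be checked mechanically. The following C sketch is only an illustration and not part of any toolchain; the function names are made up for this page. It assumes the classic ARM data-processing rule (an 8-bit constant rotated right by an even number of bits) and the plain 8-bit rule for Thumb.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    return n == 0 ? v : (v << n) | (v >> (32 - n));
}

/* True if 'value' can be written as an 8-bit constant rotated right
 * by an even amount, i.e. the classic ARM immediate encoding. */
bool fits_arm_immediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by 'rot'; only the low 8 bits may remain. */
        if (rotl32(value, rot) <= 0xFFu)
            return true;
    }
    return false;
}

/* True if 'value' fits the plain 8-bit Thumb immediate field. */
bool fits_thumb_imm8(uint32_t value)
{
    return value <= 0xFFu;
}
```

For the constant used above, both fits_arm_immediate(0x8410033a) and fits_thumb_imm8(0x8410033a) return false, which is why the value is loaded from a memory pool instead.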


Instruction Set

ARMv6-M architecture with modified Thumb instruction set
Mnemonic  Operands              Brief description                               Flags
ADCS      {Rd,} Rn, Rm          Add with Carry                                  N,Z,C,V
ADD{S}    {Rd,} Rn, <Rm|#imm>   Add                                             N,Z,C,V
ADR       Rd, label             PC-relative Address to Register                 -
ANDS      {Rd,} Rn, Rm          Bitwise AND                                     N,Z
ASRS      {Rd,} Rm, <Rs|#imm>   Arithmetic Shift Right                          N,Z,C
B{cc}     label                 Branch {conditionally}                          -
BICS      {Rd,} Rn, Rm          Bit Clear                                       N,Z
BKPT      #imm                  Breakpoint                                      -
BL        label                 Branch with Link                                -
BLX       Rm                    Branch indirect with Link                       -
BX        Rm                    Branch indirect                                 -
CMN       Rn, Rm                Compare Negative                                N,Z,C,V
CMP       Rn, <Rm|#imm>         Compare                                         N,Z,C,V
CPSID     i                     Change Processor State, Disable Interrupts      -
CPSIE     i                     Change Processor State, Enable Interrupts       -
DMB       -                     Data Memory Barrier                             -
DSB       -                     Data Synchronization Barrier                    -
EORS      {Rd,} Rn, Rm          Exclusive OR                                    N,Z
ISB       -                     Instruction Synchronization Barrier             -
LDM       Rn{!}, reglist        Load Multiple registers, increment after        -
LDR       Rt, label             Load Register from PC-relative address          -
LDR       Rt, [Rn, <Rm|#imm>]   Load Register with word                         -
LDRB      Rt, [Rn, <Rm|#imm>]   Load Register with byte                         -
LDRH      Rt, [Rn, <Rm|#imm>]   Load Register with halfword                     -
LDRSB     Rt, [Rn, <Rm|#imm>]   Load Register with signed byte                  -
LDRSH     Rt, [Rn, <Rm|#imm>]   Load Register with signed halfword              -
LSLS      {Rd,} Rn, <Rs|#imm>   Logical Shift Left                              N,Z,C
LSRS      {Rd,} Rn, <Rs|#imm>   Logical Shift Right                             N,Z,C
MOV{S}    Rd, Rm                Move                                            N,Z
MRS       Rd, spec_reg          Move to general register from special register  -
MSR       spec_reg, Rm          Move to special register from general register  N,Z,C,V
MULS      Rd, Rn, Rm            Multiply, 32-bit result                         N,Z
MVNS      Rd, Rm                Bitwise NOT                                     N,Z
NOP       -                     No Operation                                    -
ORRS      {Rd,} Rn, Rm          Logical OR                                      N,Z
POP       reglist               Pop registers from stack                        -
PUSH      reglist               Push registers onto stack                       -
REV       Rd, Rm                Byte-Reverse word                               -
REV16     Rd, Rm                Byte-Reverse packed halfwords                   -
REVSH     Rd, Rm                Byte-Reverse signed halfword                    -
RORS      {Rd,} Rn, Rs          Rotate Right                                    N,Z,C
RSBS      {Rd,} Rn, #0          Reverse Subtract                                N,Z,C,V
SBCS      {Rd,} Rn, Rm          Subtract with Carry                             N,Z,C,V
SEV       -                     Send Event                                      -
STM       Rn!, reglist          Store Multiple registers, increment after       -
STR       Rt, [Rn, <Rm|#imm>]   Store Register as word                          -
STRB      Rt, [Rn, <Rm|#imm>]   Store Register as byte                          -
STRH      Rt, [Rn, <Rm|#imm>]   Store Register as halfword                      -
SUB{S}    {Rd,} Rn, <Rm|#imm>   Subtract                                        N,Z,C,V
SVC       #imm                  Supervisor Call                                 -
SXTB      Rd, Rm                Sign extend byte                                -
SXTH      Rd, Rm                Sign extend halfword                            -
TST       Rn, Rm                Logical AND based test                          N,Z
UXTB      Rd, Rm                Zero extend a byte                              -
UXTH      Rd, Rm                Zero extend a halfword                          -
WFE       -                     Wait For Event                                  -
WFI       -                     Wait For Interrupt                              -
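
The brief descriptions of the byte-reverse instructions are terse, so here is a rough C model of what REV, REV16, and REVSH compute. This is a sketch for illustration only; the function names are invented for this page and simply mirror the mnemonics.

```c
#include <stdint.h>

/* REV: reverse the order of all four bytes in a word. */
uint32_t rev(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* REV16: reverse the two bytes inside each halfword independently. */
uint32_t rev16(uint32_t x)
{
    return ((x >> 8) & 0x00FF00FFu) | ((x << 8) & 0xFF00FF00u);
}

/* REVSH: reverse the bytes of the low halfword, then sign-extend to 32 bits. */
uint32_t revsh(uint32_t x)
{
    uint32_t h = ((x & 0xFFu) << 8) | ((x >> 8) & 0xFFu);
    return (h & 0x8000u) ? (h | 0xFFFF0000u) : h;
}
```

For example, rev(0x12345678) gives 0x78563412, rev16(0x12345678) gives 0x34127856, and revsh(0x00001280) gives 0xFFFF8012.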


GNU Assembler

In-code Directives

The following table lists frequently used GNU Assembler directives. The complete list can be found at the official documentation pages for device-independent directives plus ARM directives.

Syntax                                                      Description
.syntax unified                                             Use modern assembler syntax + auto-generate IT instructions. Place this at the top of your source file.
.weak <label>{,<label>}                                     Allow 'label' to be undefined. If it's undefined, it will have the value NULL (0x00000000).
.weakref <label>,<default label>                            Allow 'label' to be undefined. If it's undefined, it will have the value of another label.
.section <section name>                                     All output from now on goes into a new section with the given name. Sections are labelled chunks of code or data and are processed by the linker.
.align <num zero bits> [,<pattern> [,<max skipped bytes>]]  Pad the output up to the first address that has the given number of low-order zero bits.
.balign <num bytes> [,<pattern> [,<max skipped bytes>]]     Pad the output up to the first address that is a multiple of the given number of bytes.
.long <value>                                               Output a 32-bit value.
.text                                                       Specifies that subsequent lines contain program code (same as '.section .text').
.data                                                       Specifies that subsequent lines contain data, not code.
.global <label>                                             Makes the symbol with the given label visible to the linker. That is, it "exports" the symbol. Also used to mark external symbols (defined in another compilation unit).
.func <label> [, <actual label>]                            Emits debugging information that marks the beginning of function 'label'. Ignored unless the file is assembled with debugging enabled.
.endfunc                                                    Marks the end of a function started with .func.
.size <label>,<size>                                        Tells the linker how long (in bytes) the block is that this symbol points to.
.thumb                                                      Specifies that the Thumb instruction set is being used. Use this with .syntax unified.
.thumb_func                                                 Marks the symbol that follows as the entry point of a function with Thumb instructions. This is required if the function is called by using 'bx' or 'blx'.
.type <label>,%<type>                                       Specifies the type of the symbol. Required if there is a pointer to the function somewhere.
.cpu <CPU type>                                             The CPU type can be, for example, cortex-m0, cortex-m3 or cortex-m4.
.zero <num bytes>                                           Reserves a block of memory of the given size, initialized to zeros everywhere.
.fill <count>{, <size>{, <value>}}                          Emits <count> copies of <value>, each <size> bytes wide (for example 1, 2, or 4 for byte, halfword, or word, respectively). The size defaults to 1 and the value to 0.
.pool, .ltorg                                               Instructs the assembler to dump the current contents of the literal pool (= constants) into the current section at the location where this directive is inserted (aligned to a word boundary).


Sections

The table below lists the most important or frequently encountered sections that the assembler (or GCC) writes into the binary image.

Section Name     Description
.data            Initialized global and static variables are stored here. The C variable definition "static int i = 1;", for example, will cause an entry in this section. Uninitialized variables such as "static int i;" typically end up in the .bss section instead.
.rodata          Read-only global and static variables are stored here. The C code line "static const int i = 8;" will cause an entry in this section.
.text            This section contains executable machine code.
.comment         GCC writes information about the compiler into this section. For example, "GCC: (Raspbian4.9.2-10) 4.9.2"
.ARM.attributes  Contains information about the generated ARM code. For example, whether it is EABI compliant.
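
As a rough illustration of the table above, the following C sketch annotates each definition with the section GCC typically chooses for it. The names are hypothetical, and the exact placement depends on the compiler, its version, and the optimization options, so treat the comments as assumptions rather than guarantees.

```c
/* Initialized and writable: usually placed in .data. */
static int call_count = 1;

/* Read-only constant: usually placed in .rodata (or folded away by the optimizer). */
static const int limit = 8;

/* Zero-initialized: usually placed in .bss, a section not listed in the table above. */
static int last_value;

/* The machine code generated for this function goes into .text. */
int bump(int step)
{
    if (call_count < limit)
        call_count += step;
    last_value = call_count;
    return last_value;
}
```

After compiling with "gcc -c", running "objdump -h" on the resulting object file lists the sections that were actually produced.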


Debug Sections

These sections are generated when calling as or gcc with the "-g" option.

Section Name Description
.debug_info
.debug_abbrev
.debug_aranges
.debug_line
.debug_str
.debug_frame


Inline Assembly

Occasionally, there is need for embedding assembly code inside of C/C++ code. This is referred to as inline assembly. Assuming that you are using the GCC toolchain, refer to the GCC manual for embedding assembly in C. An additional online resource is the ARM GCC Inline Assembler Cookbook by Harald Kipp.
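
A minimal sketch of GCC's extended inline assembly is shown below. The function name add_u32 is made up for this example. The assembly path is only compiled when targeting ARM state (not Thumb); on any other target the function falls back to plain C so that the sketch still builds.

```c
#include <stdint.h>

/* Adds two 32-bit values, using an ARM 'add' instruction where available. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__arm__) && !defined(__thumb__)
    uint32_t sum;
    /* %0 is the output operand, %1 and %2 are the inputs.
     * The "r" constraint lets GCC pick the registers. */
    __asm__ ("add %0, %1, %2"
             : "=r" (sum)
             : "r" (a), "r" (b));
    return sum;
#else
    return a + b;   /* fallback for non-ARM hosts */
#endif
}
```

Letting the compiler choose registers through operand constraints, instead of hard-coding r0, r1 and so on, avoids clashing with the registers GCC has already allocated around the assembly statement.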

Disassembling Binaries

Use the objdump tool (part of GNU Binutils) to view the machine code of an object file in assembly:

objdump -d <executable file name>


Online Tutorials

There is ARM's official documentation site, and there are many third-party tutorials about ARM assembly programming, for example the tutorial series on the Think In Geek website about ARM assembly programming on Raspberry Pi, written by Roger Ferrer Ibáñez. I also liked the Introduction to ARM written by David Thomas, although it's more reference than tutorial.

There are also many nice video tutorials out there. For example, this video tutorial series by YouTube user Arat192 for the Raspberry Pi 1 with ARM11 processor, or the video tutorial series by Derek Banas for assembly programming on the same model Raspberry Pi.

Example Code

The following two assembly programs are based on code from the video tutorials mentioned above and are written for the GNU toolchain and ARMv7-A processor.

Example 1

This simple program performs a bitwise logical operation.

.global _start  // _start is the expected entry symbol for a standalone assembly program
_start:
  mov r1, #5          // loads binary 0101
  mov r2, #9          // loads binary 1001
  //and r0, r1, r2    // bitwise AND
  //orr r0, r1, r2    // bitwise OR
  //eor r0, r1, r2    // bitwise XOR
  bic r0, r1, r2      // bitwise clear. Returns 0 except when bit in r1 is 1 and bit in r2 is 0

end:
  mov r7, #1          // selects system call for exiting to terminal
  swi 0               // executes system call via interrupt 

Assuming that the above code is saved to the file example1.s, you create the executable via

as -o example1.o example1.s
ld -o example1 example1.o

then run the executable and check the return value, which is the value in r0, by entering

./example1; echo $?

where echo $? prints the return value of the last executed terminal command.
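
To know which exit status to expect, the four bitwise operations from Example 1 can be modelled in C. This is a sketch for cross-checking only; the function names are invented here and simply mirror the ARM mnemonics.

```c
#include <stdint.h>

/* C equivalents of the four bitwise instructions in Example 1. */
uint32_t op_and(uint32_t a, uint32_t b) { return a & b;  }  /* and r0, r1, r2 */
uint32_t op_orr(uint32_t a, uint32_t b) { return a | b;  }  /* orr r0, r1, r2 */
uint32_t op_eor(uint32_t a, uint32_t b) { return a ^ b;  }  /* eor r0, r1, r2 */
uint32_t op_bic(uint32_t a, uint32_t b) { return a & ~b; }  /* bic r0, r1, r2 */
```

With r1 = 5 (0101) and r2 = 9 (1001), the program as listed should exit with status 4 (0100). Enabling the and, orr, or eor line instead would give 1, 13, or 12, respectively.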

Example 2

This program prints the uppercase character corresponding to the input lowercase character.

.global main
.func main
main:                 // 'main' is the expected entry point for an executable that links to a library
  mov r7, #3          // sets system call "read from input stream"
  mov r0, #0          // selects the keyboard as input stream
  mov r2, #1          // sets number of characters to read from stream
  ldr r1, =character  // set the address where the character will be stored
  swi 0               // executes the system call

_uppercase:
  ldr r1, =character
  ldr r0, [r1]
// for example,
// a : 0110 0001
// A : 0100 0001
// The sixth bit from the right (bit 5, binary 0010 0000 = 0x20 = 32) determines whether an ASCII letter is lowercase or uppercase.
  bic r0, r0, #32     // converts to uppercase
  str r0, [r1]        // stores the result in the memory location pointed at by r1
  b _write_via_printf

_write_to_device:
  mov r7, #4          //  system call for printing output
  mov r0, #1          //  selects the monitor as output device
  mov r2, #1          //  outputs only one character
  ldr r1, =character  // loads the character to be printed
  swi 0

.extern printf
_write_via_printf:
  ldr r0, =out_format
  ldr r1, [r1]
  bl printf          // Don't forget to link against libc. Use 'gcc' or add '-lc' when using 'ld'.

//.endfunc           // use GNU Assembler directive to mark the end of the function

end:
  //mov pc, lr       // alternatively, return from function call by setting the program counter
  mov r7, #1         // instead, we exit the program altogether via system call 
  swi 0              // executing the system call

.data
character:
  .ascii " "         // a single ASCII character is stored in this memory location
out_format:
  .asciz "%c\n"      // a null-terminated ASCII string, compatible with printf, is stored here

Note that main is used instead of _start as the entry point due to use of printf and, therefore, linkage with libc via gcc. Assuming that the above code is saved to the file example2.s, you create the executable via

as -o example2.o example2.s
gcc -Wl,-s -o example2 example2.o

where gcc will automatically link to libc and generate the code for the _start entry point, which will be calling the function main.
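
The bit-clear trick used in Example 2 can be sketched in C as follows. The function names are hypothetical. Note that the assembly program clears the bit unconditionally, which also alters characters that are not lowercase letters, so the sketch adds a guarded variant.

```c
/* Clears bit 5 (value 0x20), mirroring 'bic r0, r0, #32' from Example 2. */
unsigned char clear_case_bit(unsigned char c)
{
    return (unsigned char)(c & ~0x20u);
}

/* Guarded variant: only the ASCII letters 'a'..'z' are converted. */
unsigned char to_upper_ascii(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? clear_case_bit(c) : c;
}
```

For instance, clear_case_bit('a') yields 'A', but clear_case_bit('{') yields '[', whereas to_upper_ascii('{') leaves the character unchanged.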

Example 3: Working With Floating Points

In this example, the floating point unit (FPU) is used for calculation. The FPU on ARM Cortex-A processors is called VFP (Vector Floating Point). The most useful instructions in the VFP instruction set are listed on this quick reference card.


Since the FPU is an optional component in the ARM architecture, the presence of an FPU is passed to the assembler as a command line argument.

as -mfpu=vfpv4 -o example.o example.s

Raspberry Pi 2 (BCM2836 processor) supports VFP version 3 while Raspberry Pi 3 (BCM2837 processor) supports VFP version 4. If you don't enable support for VFP, the assembler will emit the following error message:

selected processor does not support ARM mode `vmrs r4,fpsid'



Debugging

In order to debug an executable, the executable needs to contain debug symbols. Use the -g option of the GNU Assembler to generate object files with debug symbols. The debugger, gdb, will load the symbols automatically. Use the GDB command disassemble to view the assembly instructions for a given function as reconstructed from machine code. The addresses of the memory locations in which the instructions are stored are also shown. An example debug session on a Raspberry Pi is shown below.

pi@raspberrypi:~/asm_test $ as -g -o test8.o test8.s
pi@raspberrypi:~/asm_test $ ld -o test8 test8.o
pi@raspberrypi:~/asm_test $ gdb test8
GNU gdb (Raspbian 7.7.1+dfsg-5+rpi1) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from test8...done.
(gdb) list
1	// looping
2	// r0 = 0
3	// r1 = 1
4	// while (r0 <= 10)
5	//   r0 = r0 + 1
6	
7	.global _start
8	_start:
9	  mov r0, #0
10	  mov r1, #1
(gdb) disassemble _start
Dump of assembler code for function _start:
   0x00010054 <+0>:	mov	r0, #0
   0x00010058 <+4>:	mov	r1, #1
   0x0001005c <+8>:	b	0x10064 <_continue_loop>
End of assembler dump.


You can set a breakpoint before running the executable in the debugger. Execution will stop at the breakpoint, where you can inspect the register values with the command info r (short for info registers).

(gdb) b 13
Breakpoint 1 at 0x10060: file test8.s, line 13.
(gdb) run
Starting program: /home/pi/asm_test/test8 

Breakpoint 1, _loop () at test8.s:14
14	  add r0, r0, r1
(gdb) info r
r0             0x0	0
r1             0x1	1
r2             0x0	0
r3             0x0	0
r4             0x0	0
r5             0x0	0
r6             0x0	0
r7             0x0	0
r8             0x0	0
r9             0x0	0
r10            0x0	0
r11            0x0	0
r12            0x0	0
sp             0x7efff6b0	0x7efff6b0
lr             0x0	0
pc             0x10060	0x10060 <_loop>
cpsr           0x80000010	-2147483632


To inspect a region of memory, type x followed by a forward slash and the number of memory locations, then whitespace and the label or address of the start location. If the slash and the number are omitted, only the value at the given memory location is displayed. Typing only x shows the memory location that follows the last one shown.

(gdb) x/4 info
0x10078 <info>:	10	65656	4929	1634033920
(gdb) x 0x10078
0x10078:                0x0000000a
(gdb) x info
0x10078 <info>:         0x0000000a
(gdb) x
0x1007c <info+4>:	0x00010078


To inspect the contents of a memory location referenced by a register, type "x $<register name>" like this:

(gdb) x $sp
0x7efff6a8:	0x00000002


