Get to Know ARM Assembly Language

By Jeff Tranter Wednesday, May 11, 2022

For many decades writing software in high-level languages has been the norm, and more recently even low-end microcontrollers are almost exclusively programmed in languages like C and C++.

Of course, the native language of processors is machine code, and ultimately all programs need to run as native machine code (with some rare exceptions like processors that can directly run Java byte codes).

While you can write code in machine language (or more often, the slightly higher-level assembly language), there are many compelling reasons not to do this, most notably that it is not portable and is typically five to ten times more difficult and time-consuming than using a high-level language. It is generally agreed that compilers write better machine code than most humans.

An aside: The C and C++ "register" keyword has traditionally been used as a hint to compilers to optimize variables by putting them in registers. The keyword was deprecated in the C++11 standard and removed as a storage class specifier in C++17, reflecting the fact that even giving the compiler a hint is usually not useful anymore.

However, there are some situations where you might want to write, or at least be familiar with, some assembler code:

You need to debug or trace compiler-generated code,
You need to write highly optimized or time-critical code,
You need to use special instructions not supported by a high-level language,
You are writing or modifying a compiler, or
Just for general interest.

If you have done any assembly language programming in the past, it was likely on Intel x86 or on 8- or 16-bit microprocessors, perhaps in school. However, if you are doing any type of embedded or mobile development today, your code most likely runs on the ARM architecture. The vast majority of embedded systems today use ARM. Most off-the-shelf boards and SOMs from vendors like Toradex and Variscite are ARM-based, as are all models of Raspberry Pi. Apple recently switched its laptops and desktops to a series of ARM processors it designed, and Microsoft Windows can now run on ARM platforms.

So let's take a look at the ARM architecture, at least in enough detail that you can get a flavor for the capabilities and instructions available. A few interesting facts about ARM: The architecture is owned by ARM Holdings and originated at Acorn Computers in the 1980s, with the first ARM-based Acorn Archimedes computers going on sale in 1987.

Since then, the architecture has evolved into a range of processors from low-cost microcontrollers to high-end multi-core application processors. ARM Holdings does no manufacturing of its own but instead licenses the architecture to manufacturers like NXP, Broadcom and Samsung. Over 200 billion processors have been produced to date.

ARM is an example of a Reduced Instruction Set Computer or RISC architecture. Originally intended to be a processor with a small, highly optimized set of instructions, it is today characterized more by:

a large number of registers;
a highly regular instruction pipeline with a low number of clock cycles per instruction; and
a load/store architecture in which memory is accessed through specific instructions rather than as a part of most instructions in the set.

The main benefits of RISC are higher speed and performance (ideally averaging one clock cycle per instruction due to pipelining), and reduced power consumption, particularly important for battery-operated devices like laptops, tablets, and smartphones.

A Program Example

Let's jump right into an example of an assembly language program that we can build and run: a version of the ubiquitous "Hello, world" program that runs on the Linux operating system.

The program below can be assembled using the GNU assembler program either natively on an ARM-based system like a Raspberry Pi or cross-assembled on a non-ARM system using a suitable cross-assembler. When run, it will write "Hello, World!" to the console.

        .global  _start
_start:
        MOV R7, #4      @ Syscall number
        MOV R0, #1      @ Stdout is monitor
        MOV R2, #14     @ String is 14 chars long
        LDR R1,=string  @ String located at string
        SWI 0
        MOV R7, #1      @ exit syscall
        SWI 0
        .data
string:
        .ascii "Hello, World!\n"

Much of this likely won't make sense to you until we cover more details, but we can make some observations now:

Assembler directives starting with a dot (e.g. .global) provide information to the assembler.
Most lines consist of fields separated by white space.
The first field can be a label followed by a colon that allows the program to refer to addresses using a meaningful name. In this case, we use "_start:" which has special meaning to the linker, telling it to start program execution here. Another example is "string:" at the address where we stored the string to be printed.
The second field contains mnemonics, like MOV, which map to machine language instructions.
A mnemonic can optionally be followed by operands.
The last field can be a comment, preceded by an at sign.

The code above can be assembled on an ARM-based system using a command like the following (assuming it is in the file hello.s):

as -al=hello.lst -o hello.o hello.s

And then linked to an executable called "hello" by calling the linker:

ld -o hello hello.o

The -al=hello.lst option above tells the assembler to save a listing file as hello.lst. Taking a look at that file, we can make some more observations:

ARM GAS  hello.s             page 1

   1                          .global  _start
   2                  _start:
   3 0000 0470A0E3             MOV R7, #4      @ Syscall number
   4 0004 0100A0E3             MOV R0, #1      @ Stdout is monitor
   5 0008 0E20A0E3             MOV R2, #14     @ String is 14 chars long
   6 000c 08109FE5             LDR R1,=string  @ String located at string
   7 0010 000000EF             SWI 0
   8 0014 0170A0E3             MOV R7, #1      @ exit syscall
   9 0018 000000EF             SWI 0
  10                          .data
  11                  string:
  12 0000 48656C6C             .ascii "Hello, World!\n"
  12      6F2C2057 
  12      6F726C64 
  12      210A

The listing has the original source code, but now with line numbers, a list of addresses (in hexadecimal), and the machine code instructions. We can see that the instructions are all 32 bits in length and the total size of the program, including data, is under 50 bytes. In contrast, a C or C++ hello world program generates an executable over 5000 bytes in size and requires a handful of additional shared libraries to be linked to it.

ARM Syscalls

We'll look at the instruction set shortly, and some of the instructions above, like MOV, may already make sense if you've done assembler programming before. But what is that SWI instruction and where is all the code to do the actual printing?

On Linux and other platforms with an operating system, the kernel provides facilities known as system calls. A special mechanism is needed to call into the kernel from user space, where application programs run. That mechanism differs from one processor architecture to the next (Linux, for example, supports about 35 different architectures). You can get more details by reading the syscall man page on Linux.

On ARM (specifically the EABI version), system calls use the software interrupt instruction SWI 0 (named SVC in current ARM documentation) and pass the system call number in register R7, with other parameters passed in other registers, starting at R0. For example, write() is system call 4 and exit() is 1. These numbers differ across platforms.

In high-level programs, the system calls are typically wrapped in the C run-time library so they appear as standard function calls.

When writing assembler programs on Linux you can directly invoke the kernel system calls, allowing you to write very small programs with no library dependencies.

ARM Register Model (32-Bit)

In line with the RISC philosophy, the ARM architecture provides sixteen 32-bit registers, named R0 through R15. R0 to R12 are for general-purpose use; unlike some processors, ARM does not dedicate specific registers to special uses like accumulators or index registers. R13 is the Stack Pointer (SP), R14 is the Link Register (LR) used to hold the return address for a function call, and R15 is the Program Counter (PC). R11 is a general-purpose register but is often used as a frame pointer, and the assembler allows you to also refer to it as fp.

Also present is a Current Program Status Register or CPSR. Sometimes referred to as just the Status Register, it includes the standard status flag bits commonly seen on processors:

V - overflow bit
C - carry/borrow/extend bit
Z - zero bit
N - negative/less than bit

A 32-bit register, it also has bits for some less commonly used functions like interrupt mask and processor mode.

Summary

That's enough for this time. In the next blog installment, we will look at some of the ARM instruction formats and commonly used instructions.