BMC Part 1 - Getting Started¶

Although the Raspberry-Pi comes with a good Linux distribution, the Pi is about software development, and sometimes we want a real-time system without an operating system. I decided it'd be great to write a tutorial that works outside of Linux to get at the resources of this great piece of hardware, in a similar vein to the excellently written Cambridge University tutorials.

However, they don't create an OS as purported and they stick with assembler rather than migrating to C. I will simply start with nothing but assembler to get us going, but switch to C as soon as possible.

The C compiler simply converts C syntax to assembler and then assembles this into executable code for us anyway.

I highly recommend going through the Cambridge University Raspberry Pi tutorials as they are excellent. If you want to learn a bit of assembler too, then definitely head over there! These pages provide a similar experience, but with the addition of writing code in C and understanding the process behind that.

TODO¶

Why is a card of 16MiB necessary when we're not using anywhere near that?

Compatibility¶

There are quite a few versions of the RPi these days. This part of the tutorial supports the following models:

RPi Model A
RPi Model B
RPi Zero
RPi Zero W
RPi Model B+
RPi 2 Model B
RPi 3 Model B+
RPi 4 Model B

Note

It is not an error that the RPi 3 Model B is not included in this list. Its ACT LED is only available through the mailbox interface (available from part-4 of the tutorial). It therefore cannot be controlled directly through the GPIO peripheral, which is what we'll be using in this part of the tutorial.

Cross Compiling for the Raspberry Pi (BCM2835/6/7/BCM2711)¶

ARM have now taken over the arm-gcc-embedded project and are providing the releases, so pop over to the ARM gcc downloads section and pick up a toolchain.

I've just grabbed 7.3.1 from the download page and I've locally installed it on my Linux machine to use with this tutorial.

This is what I get when I run this on my command line having decompressed the archive:

~/arm-tutorial-rpi/compiler/gcc-arm-none-eabi-7-2018-q2-update/bin $ ./arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 7-2018-q2-update) 7.3.1 20180622 \
    (release) [ARM/embedded-7-branch revision 261907]
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Cool.

You can use the compiler/get_compiler.sh script as a short-cut to get the compiler and the tutorial scripts will make use of it if you do.

Compiler Version¶

In a previous attempt at this tutorial I always linked to and recommended a fixed, known-working version of the compiler, because everything then just worked out of the box. Now I'm saying: just get the latest, and we can fix the tutorial as things break.

NOTE: If you're using a different compiler version or set of options, the disassembly listings you get may vary slightly from those shown in this tutorial.

RPI1 Compiler Flags¶

The eLinux page gives us the optimal GCC settings for compiling code for the original Raspberry-Pi (V1):

-Ofast -mfpu=vfp -mfloat-abi=hard -march=armv6zk -mtune=arm1176jzf-s

It is noted that -Ofast may cause problems with some compilations, so it is probably better that we stick with the more traditional -O2 optimisation setting. The other flags do three things:

- -mfpu tells GCC what type of floating point unit we have.
- -mfloat-abi=hard tells it to produce hard-float code that uses that unit. GCC can create software floating point support instead.
- -march and -mtune tell it which ARM architecture and processor we have, so that it can produce optimal and compatible machine code.

RPI2 Compiler Flags¶

For the Raspberry-Pi 2 we know that the architecture is different. The ARM1176 from the original pi has been replaced by a quad core Cortex A7 processor. Therefore, in order to compile effectively for the Raspberry-Pi 2 we use a different set of compiler options:

-O2 -mfpu=neon-vfpv4 -mfloat-abi=hard -march=armv7-a -mtune=cortex-a7

You can see from the ARM specification of the Cortex A7 that it contains a VFPV4 (See section 1.2.1) floating point processor and a NEON engine. The settings are gleaned from the GCC ARM options page.

RPI3 Compiler Flags¶

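The Raspberry-Pi 3 uses a quad-core Cortex-A53, which implements the ARMv8-A architecture. As arm-none-eabi-gcc generates 32-bit ARM code, we use: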

-O2 -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -march=armv8-a+crc -mcpu=cortex-a53

RPI4 Compiler Flags¶

From the Raspberry Pi Foundation page for the RPi4 we can glean some information from the technical specifications regarding what we need to do in order to compile code for the RPi4.

All four cores are Cortex-A72s. From the ARM documentation we can see that these implement the ARMv8-A architecture. This is the same architecture as the Cortex-A53s in the RPi3, so we can use the same crypto-neon-fp-armv8 floating point unit option for the RPi4. This is detailed in the ARMv8 architecture programmer's guide.

-O2 -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -march=armv8-a+crc -mcpu=cortex-a72

You would think the schematics would be the place to get the ACT LED GPIO number, but alas, they're so sparse they may as well not have bothered releasing them. Seriously - what they've released is a joke.

Instead we get it from some of the device tree source code for the RPi4.

Also make sure you've got the latest firmware; fixes are always being introduced!

Getting to know the Compiler and Linker¶

In order to use a C compiler, we need to understand what the compiler does and what the linker does in order to generate executable code. The compiler converts C statements into assembler and performs optimisation of the assembly instructions. This is in fact all the C compiler does!

The C compiler then implicitly calls the assembler to assemble that file (usually a temporary) into an object file. This object file contains relocatable machine code along with symbol information for the linker to use. By default GCC hands the assembly over in a temporary file. With the -pipe option it pipes the assembly straight to the assembler instead, which avoids creating the intermediate file and can speed up compilation.

The linker's job is to link everything into an executable file. The linker requires a linker script. The linker script tells the linker how to organise the various object files. The linker will resolve symbols to addresses when it has arranged all the objects according to the rules in the linker script.

What we're getting close to here is that a C program isn't just the code we type. There are some fundamental things that must happen for C code to run. For example, some variables need to be initialised to certain values, and some variables need to be initialised to 0. This is all taken care of by an object file which is usually linked in implicitly (the compiler driver adds it to the link, or the linker script references it). The object file is called crt0.o (C Run-Time zero).

This code uses symbols that the linker resolves to the start and end of the area holding uninitialised variables (the .bss section), so that it can zero this memory. It also sets up the initial values of the initialised variables. It generally sets up a stack pointer, and it always includes a call to main. Note that on ELF targets such as arm-none-eabi, GCC does not prepend an underscore to C symbol names, unlike some other platforms. So the C function main is also simply main in assembler.
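To make the crt0 description more concrete, here is a minimal sketch in C of its two memory-initialisation steps. It is not the actual crt0 shipped with the toolchain. The function and parameter names are hypothetical. A real startup file takes these addresses from symbols defined in the linker script, and it sets up the stack pointer (in assembler) before doing any of this.

```c
#include <stdint.h>

/* Hypothetical sketch of the memory set-up a crt0 performs before calling
   main(). The parameters stand in for addresses a real startup file would
   take from linker-script symbols. */
void crt0_init_memory(uint32_t *bss_start, uint32_t *bss_end,
                      const uint32_t *data_load, uint32_t *data_start,
                      uint32_t *data_end)
{
    /* Copy the initial values of initialised variables (.data) from where
       they are stored in the image to where the code expects them. */
    while (data_start < data_end)
        *data_start++ = *data_load++;

    /* Zero the uninitialised variables (.bss), as the C standard requires. */
    while (bss_start < bss_end)
        *bss_start++ = 0;
}
```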

Github¶

All of the source in the tutorials is available from the Github repo. So go clone or fork now so you have all the code to compile and modify as you work through the tutorials.

git clone https://github.com/BrianSidebotham/arm-tutorial-rpi

If you're on Windows, these days I'll say: just get with the program and get yourself a Linux install. If you're entering the world of Raspberry Pi and/or embedded devices, Linux is going to be your friend and will give you everything you need. This tutorial used to support both Linux and Windows, but I have no Windows installs left and so can't cover Windows. If someone else takes that on, I'd fully support them in updating the tutorial - it's all on Github.

Let's have a look at compiling one of the simplest programs that we can. Let's compile and link the following program (part-1/armc-00):

part-1/armc-00¶

    int main(void)
    {
        while(1)
        {

        }

        return 0;
    }

Compiling¶

In order to compile the code (I realise there's not much to that code!) we can use the build.sh script in the tutorial directory. Navigate to part-1/armc-00 and run ./build.sh. With no arguments this script will just show you what it expects in order to run.

arm-tutorial-rpi/part-1/armc-00 $ ./build.sh
usage: build.sh <pi-model>
       pi-model options: rpi0, rpi1, rpi1bp, rpi2, rpi3, rpi3bp, rpi4

As there are different compiler flags for the various RPI models, it's necessary to tell the script what RPI you have in order to use the correct flags to compile with.

The V1 boards are fitted with the Broadcom BCM2835 (ARM1176) and the V2 board uses the BCM2836 (ARM Cortex A7). The RPI3 uses a Cortex-A53 and the RPI4 a Cortex-A72. Because of the processor differences, we use different build commands for the various RPI models.

Let's have a look at what the compiler command lines look like for the various RPI models. We'll concentrate on the RPI-specific options rather than including all the options here.

RPI0 (PiZero) and RPI1¶

arm-none-eabi-gcc \
    -mfloat-abi=hard \
    -mfpu=vfp \
    -march=armv6zk \
    -mtune=arm1176jzf-s \
    main.c -o main.elf

RPI2¶

arm-none-eabi-gcc \
    -mfloat-abi=hard \
    -mfpu=neon-vfpv4 \
    -march=armv7-a \
    -mtune=cortex-a7 \
    main.c -o main.elf

RPI3¶

arm-none-eabi-gcc \
    -mfloat-abi=hard \
    -mfpu=crypto-neon-fp-armv8 \
    -march=armv8-a+crc \
    -mcpu=cortex-a53 \
    main.c -o main.elf

RPI4¶

arm-none-eabi-gcc \
    -mfloat-abi=hard \
    -mfpu=crypto-neon-fp-armv8 \
    -march=armv8-a+crc \
    -mcpu=cortex-a72 \
    main.c -o main.elf
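A quick way to confirm that the model-specific options actually reached the compiler is to look at the predefined macros from ARM's C Language Extensions (ACLE). __ARM_ARCH gives the architecture version, and __ARM_PCS_VFP is defined when the hard-float calling convention is in use. This is a small sketch rather than part of the tutorial code. The function names are hypothetical, and on a non-ARM host both functions just return 0.

```c
/* Hypothetical helpers reporting what the compiler was told to target.
   They rely only on ACLE predefined macros, so no headers are needed. */

/* Architecture version (6, 7, 8...) for 32-bit ARM, or 0 elsewhere. */
int target_arm_arch(void)
{
#if defined(__arm__) && defined(__ARM_ARCH)
    return __ARM_ARCH;
#else
    return 0;
#endif
}

/* 1 if the hard-float ABI (-mfloat-abi=hard) is in use, otherwise 0. */
int target_uses_hard_float_abi(void)
{
#if defined(__ARM_PCS_VFP)
    return 1;
#else
    return 0;
#endif
}

/* Fail the build early if we target 32-bit ARM without the hard-float ABI. */
#if defined(__arm__) && !defined(__ARM_PCS_VFP)
#error "Expected -mfloat-abi=hard: __ARM_PCS_VFP is not defined"
#endif
```

Built with the RPI1, RPI2 and RPI3/RPI4 options above, target_arm_arch() should return 6, 7 and 8 respectively.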

THIS IS EXPECTED TO FAIL: Using the build script, let's compile the basic source code using the RPI specific options. Here, we compile for the RPI3. (I've shortened the output so it's easier to read on-screen).

valvers-new/arm-tutorial-rpi/part-1/armc-00 $ ./build.sh rpi3

arm-none-eabi-gcc -mfloat-abi=hard \
    -mfpu=crypto-neon-fp-armv8 \
    -march=armv8-a+crc \
    -mcpu=cortex-a53 \
    armc-00.c \
    -o kernel.armc-00.rpi3.elf

GCC does successfully compile the source code (there are no C errors in it), but the linker fails with the following message:

.../hard/libc.a(lib_a-exit.o): In function `exit':
exit.c:(.text.exit+0x1c): undefined reference to '_exit'
collect2: error: ld returned 1 exit status

So with our one-line command above we're invoking the C compiler, the assembler and the linker. The C compiler does most of the menial tasks for us to make life easier. However, because we're embedded engineers (aren't we?), we MUST be aware of how the compiler, assembler and linker work at a very low level. We generally work with custom systems which we must describe intimately to the tool-chain.

So there's a missing _exit symbol. This symbol is referenced by the C library we're using. It is in fact a system call: it's designed to be implemented by the OS, and it would be called when a program terminates. In our case we are our own OS, as we're the only thing running. In fact we will never exit, so we don't really need to worry about it. System calls can be blank; they merely need to be provided so that the linker can resolve the symbol.

So the C library requires system calls. Sometimes these are already implemented as blank functions, or implemented for fixed functionality. For a list of system calls see the newlib documentation on system calls. Newlib is an open-source, lightweight C library which can be compiled in many different flavours.

The C library is what provides all of the C functionality found in standard C header files such as stdio.h, stdlib.h, string.h, etc.

At this point I want to note that the standard Hello World example won't work here without an OS. It is exactly these unimplemented system calls that prevent it from being our first example. At the bottom of printf(...) is the write system call, which is used by all of the functions in the C library that need to write to a file. In the case of printf, it writes to the file stdout. Generally, when an OS is running, stdout produces output visible on a screen, which the OS can also redirect to a file or pipe to another program. Without an OS, stdout generally prints to a UART, so that you can see program output on a remote screen such as a PC running a terminal program. We will discuss write implementations later on in the tutorial series; let's move on...

The easiest way to fix the link problem is to provide a minimal exit function to satisfy the linker. As it is never going to be used, all we need to do is keep the linker happy. So now, again using the build.sh script, this time in armc-01, we can compile the next version of the code, part-1/armc-01.c:
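To make the idea concrete, here is a minimal sketch of a write system call that sends every byte to a character-output routine. In the arm-none-eabi build of newlib the corresponding stub is _write, which takes (int file, char *ptr, int len). The sketch deliberately uses a different, hypothetical name, bare_write, so it can be built and tried on a PC. Its hypothetical uart_putc just records bytes in a buffer, whereas a real version would feed the UART's transmit register.

```c
#include <stddef.h>

/* Hypothetical stand-in for a UART transmit routine. On real hardware this
   would wait for space in the UART FIFO and write the byte to its data
   register; here it records the bytes so the sketch can run on a PC. */
static char uart_log[256];
static size_t uart_log_len = 0;

static void uart_putc(char c)
{
    if (uart_log_len < sizeof uart_log - 1)
        uart_log[uart_log_len++] = c;
    uart_log[uart_log_len] = '\0';
}

/* Hypothetical sketch of the system call underneath printf(). */
int bare_write(int file, const char *ptr, int len)
{
    /* Only stdout (1) and stderr (2) are routed to the UART. */
    if (file != 1 && file != 2)
        return -1;

    for (int i = 0; i < len; i++)
        uart_putc(ptr[i]);

    return len; /* report that every byte was written */
}
```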

int main(void)
{
    while(1)
    {

    }

    return 0;
}

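/* A minimal exit() to satisfy the linker. There is no OS to return to,
   so it must never return. */
void exit(int code)
{
    while(1)
        ;
}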

NOTE: In case you're wondering why defining exit fixes a missing _exit: it's the C library's exit() (lib_a-exit.o in the error above) that references _exit. When we define exit() ourselves, the linker no longer pulls in the library's version. On ELF targets such as arm-none-eabi, GCC does not prefix C symbols with an underscore. So _exit is simply a C function whose name starts with an underscore; newlib uses that naming for the low-level system calls which the OS (in our case, us) must provide.

As we can see, compilation is successful and we get a kernel*.elf file generated by the compiler. Currently, that ELF file is 36K.

part-1/armc-01 $ ./build.sh rpi3

arm-none-eabi-gcc -mfloat-abi=hard \
    -mfpu=crypto-neon-fp-armv8 \
    -march=armv8-a+crc \
    -mcpu=cortex-a53 \
    armc-01.c \
    -o kernel.armc-01.rpi3.elf

part-1/armc-01 $ ls -lh
total 16K
-rw-r--r-- 1 brian brian  366 Sep 21 00:19 armc-01.c
-rwxr-xr-x 1 brian brian 2.3K Sep 21 00:45 build.sh
-rwxr-xr-x 1 brian brian  36K Sep 21 00:45 kernel.armc-01.rpi3.elf

It's important to have an infinite loop in the exit function. The C library here is newlib, built for systems with no operating system (hence arm-NONE-eabi-*). It marks exit and _exit as noreturn, so we must make sure our version doesn't return, otherwise we can get a warning about it. The prototypes also always include an int exit code. Yes, that's a bit oxymoronic!

Now using the same build command above we get a clean build! Yay! But there's a bigger problem: to provide a system underneath the C library, we will have to provide linker scripts and our own C startup code. To skip that initially and simply get up and running, we'll use GCC's option not to include any of the C startup files, which also removes the need for exit.

The GCC option for that is -nostartfiles.

Getting to Know the Processor¶

As in the Cambridge tutorials we will copy their initial example of illuminating an LED in order to know that our code is running correctly. This is nearly always the embedded developer's "Hello World!". Usually we'll blink the LED to make sure code is running and to know we're getting clocks at the speed we think we should.

Raspberry-Pi Boot Process¶

First, let's have a look at how a Raspberry-Pi processor boots. The BCM2835 from Broadcom includes two processors that we should know about. One is a VideoCore™ GPU, which is why the Raspberry-Pi makes such a good media centre. The other is the ARM core, which runs the operating system. Both of these processors share the peripheral bus and also have to share some interrupt resources. In this case, though, "share" means that some interrupt sources are not available to the ARM processor because they are already taken by the GPU.

The GPU starts running at reset or power on and includes code to read the first FAT partition of the SD Card on the MMC bus. It searches for and loads a file called bootcode.bin into memory and starts execution of that code. The bootcode.bin bootloader in turn searches the SD card for a file called start.elf and a config.txt file to set various kernel settings before searching the SD card again for a kernel.img file which it then loads into memory at a specific address (0x8000) and starts the ARM processor executing at that memory location. The GPU is now up and running and the ARM will start to come up using the code contained in kernel.img. The start.elf file contains the code that runs on the GPU to provide most of the requirements of OpenGL, etc.

Therefore, in order to boot your own code, you first need to compile it to an executable and name it kernel.img. Then put it onto a FAT formatted SD Card which also has the GPU bootloader (bootcode.bin and start.elf) on it. The latest Raspberry-Pi firmware is available on GitHub; the bootloader is located under the boot sub-directory. The rest of the firmware provided is closed-source binary video drivers, compiled for use under Linux so that accelerated graphics drivers are available. As we're not using Linux these files are of no use to us; only the bootloader firmware is.

All this means that the processor is already up and running when it starts to run our code. Clock sources and PLL settings have already been decided and programmed by the bootloader, which takes that problem off our hands. We get to just start messing with the device registers from an already running core. This is something I'm not that used to. Normally the first thing in my code would be setting up the correct clock and PLL settings to initialise the processor, but the GPU has set up the basic clocking scheme for us.

The first thing we will need to set up is the GPIO controller. There are no drivers we can rely on as there is no OS running, all the bootloader has done is boot the processor into a working state, ready to start loading the OS.

You'll need to get the Raspberry-Pi BCM2835 peripherals datasheet, and make sure you pay attention to the errata for it too, as it's not perfect. This gives us the information we require to control the IO peripherals of the BCM2835. I'll guide us through using the GPIO peripheral - there are, as always, some gotchas:

The Raspberry-Pi 2B v1.2 uses the BCM2837, so you'll want to get the Raspberry-Pi BCM2837 peripherals datasheet.

NOTE: The 2837 peripherals document is just a modified version of the original 2835 document with the addresses updated to suit the 2837's base peripheral address. See rpi issue 325 for further details.

We'll be using the GPIO peripheral, and it would therefore be natural to jump straight to that documentation and start writing code. However, we first need to read some of the 'basic' information about the processor. The important bit to note is the virtual address information. On page 5 of the BCM2835 peripherals datasheet we see an IO map for the processor. Again, as embedded engineers we must have the IO map to know how to address peripherals on the processor and, in some cases, how to arrange our linker scripts when there are multiple address spaces.

ARM Virtual addresses

The VC CPU Bus addresses relate to the Broadcom Video Core CPU. Although the Video Core CPU is what bootloads from the SD Card, execution is handed over to the ARM core by the time our kernel.img code is called. So we're not interested in the VC CPU Bus addresses.

The ARM physical address space is the processor's raw IO map when the ARM Memory Management Unit (MMU) is not being used. If the MMU is being used, the virtual address space is what we'd be interested in.

Before an OS kernel is running, the MMU is also not running as it has not been initialised and the core is running in kernel mode. Addresses on the bus are therefore accessed via their ARM Physical Address. We can see from the IO map that the VC CPU Address 0x7E000000 is mapped to ARM Physical Address 0x20000000 for the original Raspberry Pi. This is important!

If you read the two peripheral datasheets carefully you'll see a subtle difference between them. Notably, the Raspberry-Pi 2 has the ARM IO base set to 0x3F000000 instead of the 0x20000000 of the original Raspberry-Pi. Unfortunately for us software engineers, the Raspberry-Pi foundation don't appear to be good at securing the documentation we need. In fact, their attitude suggests they think we're magicians and don't actually need any. What a shame! Please, if you're a member of the forum, campaign for more documentation. As engineers, especially in industry, we wouldn't accept this from a manufacturer; we'd go elsewhere! In fact, we did at my work and use the TI Cortex A8 from the Beaglebone Black, a very good and well documented SoC!

Anyway, the base address can be gleaned from searching for u-boot patches. The Raspberry Pi 2 uses a BCM2836, so we can search for that and u-boot, and we come across a patch for supporting the Raspberry-Pi 2.

Further on, we come to the GPIO peripheral section of the manual (Chapter 6, page 89).

The RPi4 has the peripheral base mapped to 0xFE000000. The peripheral address space looks to be laid out the same as on the previous Pis.
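The translation from the bus addresses printed in the datasheet to ARM physical addresses is a simple rebasing of the 0x7E000000 peripheral window. Here is a minimal sketch using the base addresses quoted above. The function name is hypothetical, and it assumes the address passed in lies inside the peripheral window.

```c
#include <stdint.h>

/* Start of the peripherals in VideoCore bus address space. */
#define VC_PERIPHERAL_BASE   0x7E000000UL

/* ARM physical peripheral base addresses quoted in the text. */
#define RPI1_PERIPHERAL_BASE 0x20000000UL /* RPI0 / RPI1 */
#define RPI2_PERIPHERAL_BASE 0x3F000000UL /* RPI2 / RPI3 */
#define RPI4_PERIPHERAL_BASE 0xFE000000UL /* RPI4 */

/* Hypothetical helper: translate a datasheet bus address (0x7Exxxxxx) into
   the ARM physical address used while the MMU is off. Assumes bus_addr is
   inside the peripheral window. */
uint32_t vc_bus_to_arm_phys(uint32_t bus_addr, uint32_t arm_periph_base)
{
    return (uint32_t)(bus_addr - VC_PERIPHERAL_BASE + arm_periph_base);
}
```

For example, the GPIO block at bus address 0x7E200000 maps to 0x20200000, 0x3F200000 or 0xFE200000 depending on the model.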

Visual Output and Running Code¶

Finally, let's get on and see some of our code running on the Raspberry-Pi. We'll continue with using the first example of the Cambridge tutorials by lighting the OK LED on the Raspberry-Pi board.

The GPIO peripheral has a base address in the BCM2835 manual at 0x7E200000. We know from getting to know our processor that this translates to an ARM Physical Address of 0x20200000 (0x3F200000 for RPI2 and RPI3, and 0xFE200000 for RPI4). This is the first register in the GPIO peripheral register set, the GPIO Function Select 0 register.

In order to use an IO pin, we need to configure the GPIO peripheral. From the Raspberry-Pi schematic diagrams, the OK LED is wired to the GPIO16 line (Sheet 2, B5). The LED is wired active LOW - this is fairly standard practice. It means that to turn the LED on we need to output a 0 (the pin is connected to 0V by the processor), and to turn it off we output a 1 (the pin is connected to VDD by the processor).

Unfortunately, again, lack of documentation is rife and we don't have schematics for the Raspberry-Pi 2 or plus models! This is important because the GPIO lines were re-jigged. As Florin has noted in the comments section, the Raspberry Pi Plus configuration has the LED on GPIO47, so I've added the changes in brackets below for the RPI B+ models (which includes the RPI 2).

Back to the processor manual and we see that the first thing we need to do is set the GPIO pin to an output. This is done by setting the function of GPIO16 (GPIO47 RPI+) to an output.

Bits 18 to 20 in the GPIO Function Select 1 register control the GPIO16 pin.

Bits 21 to 23 in the GPIO Function Select 4 register control the GPIO47 pin. (RPI B+)

Bits 27 to 29 in the GPIO Function Select 2 register control the GPIO29 pin. (RPI3 B+)

Bits 6 to 8 in the GPIO Function Select 4 register control the GPIO42 pin. (RPI4)
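These bit positions follow a regular pattern. Each Function Select register holds ten 3-bit fields, so pin N is controlled by register N / 10, starting at bit (N % 10) * 3. Here is a small sketch of that arithmetic. The helper names are hypothetical and are not from the tutorial's code.

```c
/* Hypothetical helpers: locate a pin's 3-bit function field.
   GPFSEL0 covers pins 0-9, GPFSEL1 pins 10-19, and so on. */
unsigned int gpfsel_register(unsigned int pin)
{
    return pin / 10;         /* index of the GPFSELn register */
}

unsigned int gpfsel_shift(unsigned int pin)
{
    return (pin % 10) * 3;   /* lowest bit of the pin's field */
}
```

For example, GPIO16 gives register 1, bit 18; GPIO47 gives register 4, bit 21; GPIO29 gives register 2, bit 27; and GPIO42 gives register 4, bit 6.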

In C, we will generate a pointer to the register and use the pointer to write a value into the register. We will mark the register as volatile so that the compiler does exactly what I tell it to. Without volatile, the compiler is free to notice that we never access this location again. To all intents and purposes the data we write is unused, so the optimiser is free to throw the write away because, as far as it can tell, it has no effect.

The effect however is definitely required, but is only externally visible (the mode of the GPIO pin changes). We inform the compiler through the volatile keyword to not take anything for granted on this variable and to simply do as I say with it:

We will use pre-processor definitions to change the base address of the GPIO peripheral depending on what RPI model is being targeted.

#if defined( RPI0 ) || defined( RPI1 )
    #define GPIO_BASE       0x20200000UL
#elif defined( RPI2 ) || defined( RPI3 )
    #define GPIO_BASE       0x3F200000UL
#elif defined( RPI4 )
    #define GPIO_BASE       0xFE200000UL
#else
    #error Unknown RPI Model!
#endif
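The volatile access described above is often wrapped in a pair of tiny helper functions, so that every register access goes through one place. Here is a minimal sketch; the names mmio_write and mmio_read are hypothetical. On the Pi, reg would be an address such as GPIO_BASE plus a register's byte offset.

```c
#include <stdint.h>

/* Hypothetical helper: write a 32-bit memory-mapped register. Going through
   a volatile pointer tells the compiler the store has an external effect,
   so it must be performed exactly as written and never optimised away. */
static inline void mmio_write(uintptr_t reg, uint32_t value)
{
    *(volatile uint32_t *)reg = value;
}

/* Hypothetical helper: read a 32-bit memory-mapped register. The volatile
   read is always performed, even if the value was read just before. */
static inline uint32_t mmio_read(uintptr_t reg)
{
    return *(volatile uint32_t *)reg;
}
```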

In order to set GPIO16 as an output then we need to write a value of 1 in the relevant bits of the function select register. Here we can rely on the fact that this register is set to 0 after a reset and so all we need to do is set:

/* Assign the address of the GPIO peripheral (Using ARM Physical Address) */
gpio = (unsigned int*)GPIO_BASE;
gpio[LED_GPFSEL] |= (1 << LED_GPFBIT);

This code looks a bit messy, but we will tidy up and optimise later on. For now we just want to get to the point where we can light an LED and understand why it is lit!
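The |= in the snippet relies on the register being 0 after reset. A slightly more defensive sketch clears the pin's 3-bit field before writing 001 (the output function). That way it also works if something has already changed the pin's function. This is not the tutorial's code, and gpio_set_output is a hypothetical name. Here gpio points at the start of the GPIO register block, just as in the snippet.

```c
#include <stdint.h>

#define GPIO_FSEL_MASK   0x7u /* each pin has a 3-bit function field */
#define GPIO_FSEL_OUTPUT 0x1u /* 001 = output */

/* Hypothetical helper: configure a GPIO pin as an output using a
   read-modify-write of its GPFSELn register. */
void gpio_set_output(volatile uint32_t *gpio, unsigned int pin)
{
    unsigned int reg = pin / 10;         /* GPFSEL0..GPFSEL5 */
    unsigned int shift = (pin % 10) * 3; /* bit offset of the pin's field */
    uint32_t value = gpio[reg];

    value &= ~(GPIO_FSEL_MASK << shift); /* clear the old function */
    value |= GPIO_FSEL_OUTPUT << shift;  /* select output */
    gpio[reg] = value;
}
```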

The BCM2835 GPIO peripheral has an interesting way of doing IO. It's actually a bit different from most other processor IO implementations. There is a SET register and a CLEAR register. Writing 1 to any bits in the SET register will SET the corresponding GPIO pins to 1 (logic high). Writing 1 to any bits in the CLEAR register will CLEAR the corresponding GPIO pins to 0 (logic low). There are reasons for choosing this implementation over a register where each bit is a pin and the bit value directly sets the pin's output level, but they're beyond the scope of this tutorial.
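One practical consequence of this design is that a single write can change one pin without disturbing any other. Writing 0 to a bit in a SET or CLEAR register has no effect, so no read-modify-write is needed. The sketch below shows how a pin number maps onto those registers. The word offsets match GPIO_GPSET0 and GPIO_GPCLR0 in the complete listing later in this part, while gpio_write_pin is a hypothetical name.

```c
#include <stdint.h>

/* Word offsets of GPSET0 and GPCLR0 in the GPIO register block;
   GPSET1/GPCLR1 follow immediately and cover pins 32-53. */
#define GPIO_GPSET0 7
#define GPIO_GPCLR0 10

/* Hypothetical helper: drive a pin high or low. Only this pin's bit is
   written as 1, so every other pin is left untouched. */
void gpio_write_pin(volatile uint32_t *gpio, unsigned int pin, int high)
{
    unsigned int bank = pin / 32;    /* 0 for pins 0-31, 1 for 32-53 */
    uint32_t bit = 1u << (pin % 32);

    if (high)
        gpio[GPIO_GPSET0 + bank] = bit;
    else
        gpio[GPIO_GPCLR0 + bank] = bit;
}
```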

So in order to light the LED we need to output a 0. We need to write a 1 to bit 16 in the CLEAR register:

gpio[GPIO_GPCLR0] = (1 << 16);

Putting what we've learnt into the minimal example above gives us a program that compiles and links into an executable. It should give us a Raspberry-Pi that lights the OK LED when it is powered. Here's the complete code we'll compile, part-1/armc-02:

/* The base address of the GPIO peripheral (ARM Physical Address) */
#if defined( RPI0 ) || defined( RPI1 )
    #define GPIO_BASE       0x20200000UL
#elif defined( RPI2 ) || defined( RPI3 )
    #define GPIO_BASE       0x3F200000UL
#elif defined( RPI4 )
    /* This comes from the linux source code:
       https://github.com/raspberrypi/linux/blob/rpi-4.19.y/arch/arm/boot/dts/bcm2838.dtsi */
    #define GPIO_BASE       0xFE200000UL
#else
    #error Unknown RPI Model!
#endif

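/* Select the GPIO used for the ACT/OK LED on each model */
#if defined( RPIBPLUS ) || defined( RPI2 )
    /* B+ models and the RPi2 have the ACT LED attached to GPIO 47 */
    #define LED_GPFSEL      GPIO_GPFSEL4
    #define LED_GPFBIT      21
    #define LED_GPSET       GPIO_GPSET1
    #define LED_GPCLR       GPIO_GPCLR1
    #define LED_GPIO_BIT    15
#elif defined( RPI4 )
    /* The RPi4 model has the ACT LED attached to GPIO 42
       https://github.com/raspberrypi/linux/blob/rpi-4.19.y/arch/arm/boot/dts/bcm2838-rpi-4-b.dts */
    #define LED_GPFSEL      GPIO_GPFSEL4
    #define LED_GPFBIT      6
    #define LED_GPSET       GPIO_GPSET1
    #define LED_GPCLR       GPIO_GPCLR1
    #define LED_GPIO_BIT    10
#elif defined( RPI3 ) && defined( IOBPLUS )
    /* The RPi3 B+ has the ACT LED attached to GPIO 29 */
    #define LED_GPFSEL      GPIO_GPFSEL2
    #define LED_GPFBIT      27
    #define LED_GPSET       GPIO_GPSET0
    #define LED_GPCLR       GPIO_GPCLR0
    #define LED_GPIO_BIT    29
#else
    /* Original models have the OK LED attached to GPIO 16 */
    #define LED_GPFSEL      GPIO_GPFSEL1
    #define LED_GPFBIT      18
    #define LED_GPSET       GPIO_GPSET0
    #define LED_GPCLR       GPIO_GPCLR0
    #define LED_GPIO_BIT    16
#endif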

#define GPIO_GPFSEL0    0
#define GPIO_GPFSEL1    1
#define GPIO_GPFSEL2    2
#define GPIO_GPFSEL3    3
#define GPIO_GPFSEL4    4
#define GPIO_GPFSEL5    5

#define GPIO_GPSET0     7
#define GPIO_GPSET1     8

#define GPIO_GPCLR0     10
#define GPIO_GPCLR1     11

#define GPIO_GPLEV0     13
#define GPIO_GPLEV1     14

#define GPIO_GPEDS0     16
#define GPIO_GPEDS1     17

#define GPIO_GPREN0     19
#define GPIO_GPREN1     20

#define GPIO_GPFEN0     22
#define GPIO_GPFEN1     23

#define GPIO_GPHEN0     25
#define GPIO_GPHEN1     26

#define GPIO_GPLEN0     28
#define GPIO_GPLEN1     29

#define GPIO_GPAREN0    31
#define GPIO_GPAREN1    32

#define GPIO_GPAFEN0    34
#define GPIO_GPAFEN1    35

#define GPIO_GPPUD      37
#define GPIO_GPPUDCLK0  38
#define GPIO_GPPUDCLK1  39

/** GPIO Register set */
volatile unsigned int* gpio;

/** Simple loop variable */
volatile unsigned int tim;

/** Main function - we'll never return from here.
    naked: GCC generates no prologue/epilogue, so main makes no use of the
    stack, which we haven't set up yet */
int main(void) __attribute__((naked));
int main(void)
{
    /* Assign the address of the GPIO peripheral (Using ARM Physical Address) */
    gpio = (unsigned int*)GPIO_BASE;

    /* Write 1 (001 = output) into the LED pin's 3-bit field in its Function
       Select register to enable the LED GPIO as an output */
    gpio[LED_GPFSEL] |= (1 << LED_GPFBIT);

    /* Never exit as there is no OS to exit to! */
    while(1)
    {
        for(tim = 0; tim < 500000; tim++)
            ;

        /* Set the LED GPIO pin low ( Turn OK LED on for original Pi, and off
           for plus models )*/
        gpio[LED_GPCLR] = (1 << LED_GPIO_BIT);

        for(tim = 0; tim < 500000; tim++)
            ;

        /* Set the LED GPIO pin high ( Turn OK LED off for original Pi, and on
           for plus models )*/
        gpio[LED_GPSET] = (1 << LED_GPIO_BIT);
    }
}

We now compile with the -nostartfiles option too:

part-1/armc-02 $ ./build.sh rpi3

arm-none-eabi-gcc -nostartfiles \
    -mfloat-abi=hard \
    -mfpu=crypto-neon-fp-armv8 \
    -march=armv8-a+crc \
    -mcpu=cortex-a53 \
    armc-02.c \
    -o kernel.armc-02.rpi3.elf

The linker gives us a warning, which we'll sort out later, but importantly the linker has resolved the problem for us. This is the warning we'll see and ignore:

.../arm-none-eabi/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000

As we can see from the compilation, the default output format is ELF. An ELF file is essentially an executable wrapped with information that an OS binary loader may need to know. We need a binary ARM executable that only includes machine code. We can extract this using the objcopy utility:

arm-none-eabi-objcopy kernel.elf -O binary kernel.img

A quick note about the ELF format¶

ELF is a file format, used by many operating systems including Linux, which wraps the machine code with meta-data. The meta-data can be useful. In Linux, and in fact most operating systems these days, running an executable doesn't simply mean loading the file into memory and starting the processor at the address where it was loaded. Instead, an executable loader uses formats like ELF to learn more about the executable.

For example, the function call interface might differ between executables: code can use different calling conventions, which give registers different meanings when calling functions within a program. This can determine whether the executable loader will allow the program to be loaded into memory at all. The ELF meta-data can also list all of the shared objects (SO, the equivalent of a DLL under Windows) that the executable needs to have loaded. If any of the required libraries are not available, again the executable loader will not allow the file to be loaded and run.

This is all intended to (and does) increase system stability and compatibility.

We, however, do not have an OS, and the bootloader has no loader other than a disk read. It copies the kernel.img file directly into memory at 0x8000, which is where the ARM processor then starts executing machine code. Therefore we need to strip off the ELF meta-data and leave just the compiled machine code in the kernel.img file, ready for execution.
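To see the difference between the two files, here is a minimal sketch that recognises a 32-bit little-endian ARM ELF header and pulls out its entry point (e_entry). Field offsets follow the standard ELF32 header layout, and EM_ARM is machine number 40. The function names are hypothetical. Run against kernel.elf, it should report the 0x8000 entry point from the linker warning above. A raw kernel.img starts directly with machine code, so it is rejected.

```c
#include <stddef.h>
#include <stdint.h>

#define EM_ARM_MACHINE 40 /* e_machine value for 32-bit ARM */

/* Read a little-endian field of 'bytes' bytes (2 or 4) from a buffer. */
static uint32_t read_le(const uint8_t *p, int bytes)
{
    uint32_t v = 0;
    for (int i = bytes - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: returns 1 and stores the entry point if buf starts
   with a 32-bit little-endian ARM ELF header, otherwise returns 0. */
int arm_elf32_entry(const uint8_t *buf, size_t len, uint32_t *entry)
{
    if (len < 52)                            /* size of an ELF32 header */
        return 0;
    if (buf[0] != 0x7F || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return 0;                            /* no ELF magic number */
    if (buf[4] != 1 || buf[5] != 1)          /* ELFCLASS32, little-endian */
        return 0;
    if (read_le(buf + 18, 2) != EM_ARM_MACHINE)
        return 0;                            /* e_machine is not ARM */

    *entry = read_le(buf + 24, 4);           /* e_entry */
    return 1;
}
```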

Back to our example¶

This gives us the kernel.img binary file, which should only contain ARM machine code. It should be just a few hundred bytes long. You'll notice that kernel.elf, on the other hand, is ~34KiB. Rename the kernel.img on your SD Card to something like old.kernel.img and save your new kernel.img to the SD Card.

Booting from this SD Card should now leave the OK LED on permanently. The normal startup is for the OK LED to be on, then extinguish. If it remains extinguished something went wrong with building or linking your program. Otherwise if the LED remains lit, your program has executed successfully.

A blinking LED is probably more appropriate to make sure that our code is definitely running. Let's quickly change the code to crudely blink an LED and then we'll look at sorting out the C library issues we had earlier as the C library is far too useful to not have access to it.

Compile the code in part-1/armc-03. The code listing is identical to part-1/armc-02 but the build scripts use objcopy to convert the ELF formatted binary to a raw binary ready to deploy on the SD card.

You can see that a binary image is now in the folder and is a much more sane size for some code that does so little:

part-1/armc-03 $ ./build.sh rpi3bp
    arm-none-eabi-gcc -g \
        -nostartfiles \
        -mfloat-abi=hard \
        -O0 \
        -DIOBPLUS \
        -DRPI3 \
        -mfpu=crypto-neon-fp-armv8 \
        -march=armv8-a+crc \
        -mcpu=cortex-a53 \
        armc-03.c \
        -o kernel.armc-03.rpi3bp.elf

.../arm-none-eabi/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
arm-none-eabi-objcopy kernel.armc-03.rpi3bp.elf -O binary kernel.img

The kernel.img file contains just the ARM machine code and so is just a few hundred bytes.

part-1/armc-03 $ ll
total 28
drwxr-xr-x 2 brian brian     4096 Sep 21 01:05 ./
drwxr-xr-x 6 brian brian     4096 Sep 21 00:19 ../
-rw-r--r-- 1 brian brian     3894 Sep 21 00:19 armc-03.c
-rwxr-xr-x 1 brian brian     2805 Sep 21 01:05 build.sh*
-rwxr-xr-x 1 brian brian 16777216 Sep 21 01:05 card.armc-03.rpi3bp.img
-rwxr-xr-x 1 brian brian    35208 Sep 21 01:05 kernel.armc-03.rpi3bp.elf*
-rwxr-xr-x 1 brian brian      268 Sep 21 01:05 kernel.armc-03.rpi3bp.img*

Generating SD Cards¶

The next step is how to get this kernel.img file onto an SD Card so we can boot the card and run our newly compiled code.

A script called make_card.sh was run by the build.sh script. Have a look at the call near the end of build.sh. The script does the work of generating an SD Card image that can be written directly to an SD card.

The card/make_card.sh script is worth a look. It can generate a card image without needing superuser privileges, and it always uses the latest firmware available from the RPi Foundation GitHub repository.

Writing the image to the card can be done using cat so long as you know what the SD Card device is.

If you'd rather, you can use the write_card.sh script in the card directory which you can use interactively to select the SD Card.

If you prefer to do things manually you can insert the SD Card and then run dmesg | tail to view messages which will show you which device reference was used for the SD Card or else use lsblk to list all of the block devices available.

DON'T GET THE SD CARD DEVICE WRONG OR YOU'LL COMPLETELY WIPE OUT ANOTHER DISK!

When you know which disk to use, you can simply cat the card image to the disk, for example: cat card.armc-03.rpi3bp.img > /dev/sdg

Provided Binaries¶

As this is the first example where code should run and give you a visible output on your RPi, I've included the kernel binaries for each Raspberry-Pi board so that you can load the pre-built binary and see the LED flash before compiling your own kernel to make sure the build process is working for you. After this tutorial, you'll have to build your own binaries!

Although the code may appear to be written a little oddly, please stick with it! There are reasons why it's written how it is. Now you can experiment a bit from a basic starting point, but beware: automatic variables won't work, and nor will initialised variables, because we have no C Run Time support yet.

That will be where we start with Part 2 of Bare metal programming the Raspberry-Pi!