Copyright (c) Hyperion Entertainment and contributors.
The Hacking Way: Part 1 - First Steps
Author
Roman Kargin
Copyright (c) 2012 Roman Kargin
Proofread and grammar corrections by Daniel Jedlicka.
Used by permission.
Introduction
In the past, I wanted to make the smallest possible executables on UNIX-ish operating systems (SunOS, Tru64, OS9, OpenVMS and others). As a result of my research, I wrote a couple of small tutorials for various hacking-related magazines (like Phrack or x25zine). Doing the same on AmigaOS naturally became a topic of interest for me, even more so when I started seeing questions like "Why are AmigaOS4 binaries bigger than they should be?" in Amiga forums. Therefore I believe that producing small OS4 executables could make an interesting topic for an article. Later in the text I'll explain how ldscripts can help the linker make non-aligned binaries, and cover various other aspects associated with the topic. I hope that, at least for programmers, the article will be an interesting and thought-provoking read.
Before you go on, please note that it is assumed here that you have basic programming skills and an understanding of C and assembler, that you are familiar with BSD syntax, know how UNIX and AmigaOS3/4 work, and that you have the PPC V.4-ABI and ELF specifications at hand. But if you don't, there's no need to stop reading, as I'll try to cover the basics where necessary.
The Basics
To begin with, let's present and discuss some basic terms and concepts. We'll also dispel some popular myths.
The C standard library (libc)
Thirty years ago, when the C language had developed so much that its different implementations started to pose a practical problem, the American National Standards Institute (ANSI) formed a committee to standardize the language. The standard, generally referred to as ANSI C, was finally adopted in 1989 (which is why it is sometimes called C89). Part of this standard was a library of common functions, called the "C standard library", "C library", or "libc". The library has been an inherent part of every C standard adopted since.
Libc is platform-independent in the sense that it provides the same functionality regardless of the operating system, be it UNIX, Linux, AmigaOS, OpenVMS, AROS or anything else. The actual implementation may vary from OS to OS. For example, on Linux the most popular implementation of the C standard library is glibc (the GNU C Library). But there are others: uClibc (for embedded Linux systems, including those without an MMU), dietlibc (as the name suggests, it is meant for compiling/linking programs to the smallest possible size) or Newlib. Originally developed for a wide range of embedded systems, Newlib is the preferred C standard library in AmigaOS4 and is now part of the kernel.
On AmigaOS4, three implementations of libc are used: clib2, newlib and vclib. The GCC compiler supports clib2 and newlib, the VBCC compiler supports newlib and vclib.
clib2
This is an Amiga-specific implementation originally written from scratch by Olaf Barthel, with some ideas borrowed from the BSD libc implementation, libnix, etc. Under AmigaOS4, clib2 is being phased out: the GCC compiler distributed as part of the OS4 SDK uses Newlib by default (as if you used the -mcrt=newlib switch). An important note: clib2 is only available for static linking, while Newlib is opened at runtime (which makes your executables smaller). Clib2 is open source; the latest version can be found here: http://sourceforge.net/projects/clib2/
Newlib
A better and more modern libc implementation. While the AmigaOS4 version is closed source (all adaptations and additional work are done by the OS development team), it is based on the open source version of Newlib. The original version is maintained by Red Hat developer Jeff Johnston and is used in most commercial and non-commercial GCC ports for non-Linux embedded systems: http://www.sourceware.org/newlib/
Newlib does not cover only the ANSI C99 standard: it is an expanded library that also includes common POSIX functions (clib2 implements them as well). But certain POSIX functions, such as glob(), globfree() or fork(), are missing; and while some of them are easy to implement, others are not, fork() being an example of the latter.
Newlib is also available as a shared object.
vclib
This library was made for the vbcc compiler. Like clib2 it is linked statically, but only provides ANSI C/C99 functions (i.e. no POSIX).
Myth #1: AmigaOS4 behaves like UNIX
From time to time you can hear voices saying that AmigaOS4 is becoming UNIX. This popular myth stems from three main sources. First, many games, utilities and libraries are ported over from the UNIX world. Second, AmigaOS4 uses genuine ELF, the standard binary file format used in UNIX and UNIX-like systems. Third, as of version 4.1 the OS supports shared objects. All of this enables AmigaOS4 to provide more stuff for both programmers and users, and to complement native applications made for OS4. Today, it is quite normal for an operating system to provide all the popular third-party libraries like SDL, OpenGL, Cairo, Boost, OpenAL, FreeType etc. Not only do they make software development faster, but they also allow platform-independent programming.
Yet getting close to UNIX or Linux in terms of software or programming tools does not mean that AmigaOS4 behaves the same way as regards, for example, library initialization, argument passing or system calls. On AmigaOS4 there are no "system calls" as there are on UNIXes, where you can simply put arguments into registers and then execute an instruction (like "int 0x80" on x86 Linux, "trap #0" on m68k Linux, or "sc" on some PPC/POWER CPU based OSes) that causes a software interrupt and enters the kernel in supervisor mode. The concept of AmigaOS is completely different. There is no kernel as such; Amiga's Kickstart is actually a collection of libraries (of which "kernel.kmod" is just one module, a new incarnation of the old exec.library). Also, an AmigaOS program calling a library function does not enter supervisor mode; it stays in user mode while the function is executed.
Since the very first version of the OS, which came with the Amigas in 1985, you have had to open a library and use its vector table to execute a library function, so there is no "system call" involved. The pointer to the first library (exec.library) is always at address 4, and that hasn't changed in AmigaOS4. By the way, the Quark kernel on MorphOS does support system calls through the "sc" instruction, but programmers never use them because they work with the libraries (just like you do on AmigaOS4).
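To make the contrast with system calls concrete, here is a portable C model of the idea. It is not AmigaOS code and uses no real AmigaOS API: DemoLibrary, DemoOpenLibrary and demo_client are invented names. It only illustrates the principle described above: opening a library yields a base structure carrying a table of function vectors, and calling a library function is an ordinary indirect call that never leaves user mode.

```c
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL model, not AmigaOS code: a library is a "base" structure whose
   vector table (here: function pointers) is the only entry point for callers. */
typedef struct DemoLibrary {
    const char *name;
    int version;
    int (*Add)(int a, int b);   /* vector 1 */
    int (*Mul)(int a, int b);   /* vector 2 */
} DemoLibrary;

static int lib_add(int a, int b) { return a + b; }
static int lib_mul(int a, int b) { return a * b; }

/* A resident "library", standing in for one that lives in Kickstart. */
static DemoLibrary demo_lib = { "demo.library", 1, lib_add, lib_mul };

/* Hypothetical stand-in for opening a library: look it up by name and
   minimum version, hand back its base, or NULL if it is not available. */
DemoLibrary *DemoOpenLibrary(const char *name, int min_version)
{
    if (strcmp(name, demo_lib.name) == 0 && demo_lib.version >= min_version)
        return &demo_lib;
    return NULL;
}

/* The caller never traps into a kernel: it fetches a vector from the base
   and makes a plain indirect function call. */
int demo_client(void)
{
    DemoLibrary *base = DemoOpenLibrary("demo.library", 1);
    if (base == NULL)
        return -1;
    return base->Mul(base->Add(2, 3), 4);   /* (2 + 3) * 4 */
}
```

The design point is that the only thing a program needs to know in advance is how to find the first library base; everything else is reached through vector tables.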
When you program in assembler under AmigaOS4, you cannot do much until you initialize and open all the needed libraries (unlike, for example, on UNIX where the kernel does all the necessary initialisation for you).
Myth #2: AmigaOS4 binaries are fat
This misunderstanding stems from the fact that the latest AmigaOS4 SDK uses a newer version of binutils, which now aligns ELF segments to 64K so that they can be easily loaded with mmap(). Binutils are, naturally, developed with regard to UNIX-like OSes where the mmap() function actually exists, so the modifications make sense; but since mmap() isn't a genuine AmigaOS function (it's just a wrapper using AllocVec() etc.), this kind of alignment is not needed on AmigaOS.
Luckily, the size difference is only noticeable in small programs like Hello World, where the resulting executable grows to 65KB. That of course looks unbelievable, as if something were wrong. But once you start programming for real and produce bigger programs, the code fills up the ELF segments as required, there's little need for padding, and so there's little size difference in the end. The worst-case scenario is ~64KB of extra padding, which only happens (as we said) in very small programs, or when you're out of luck and your code only just exceeds a boundary between two segments.
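The padding arithmetic can be sketched as follows, assuming a simplified model in which the start of each new segment is rounded up to the next 64KB (0x10000) boundary. The real linker aligns segment addresses and file offsets together, so treat this as an illustration of the worst case rather than an exact model of binutils; align_up and segment_padding are hypothetical helpers.

```c
#include <stdint.h>

#define SEGMENT_ALIGN 0x10000u   /* 64KB segment alignment, as described above */

/* Round 'offset' up to the next multiple of 'align' (which must be a power of two). */
static uint32_t align_up(uint32_t offset, uint32_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Bytes of padding needed before a segment that would otherwise start at 'offset'. */
uint32_t segment_padding(uint32_t offset)
{
    return align_up(offset, SEGMENT_ALIGN) - offset;
}
```

For example, a segment that would start at 0x10001 (one byte past a boundary) needs 0xFFFF bytes of padding, which is the ~64KB worst case; one that already starts on a boundary needs none.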
It is likely that a newer SDK will adapt binutils for AmigaOS4 and the padding will no longer be needed. Currently, to avoid alignment you can use the "-N" switch, which tells the linker to use an ldscript that builds non-aligned binaries. Check the SDK:gcc/ppc-AmigaOS/lib/ldscripts directory; all the files ending with an "n" (like “AmigaOS.xn” or “ELF32ppc.xbn”) are linker scripts that ensure non-aligned builds. Such a script will be used when the GCC compiler receives the “-N” switch. See the following:
7/0.RAM Disk:> type hello.c
#include <stdio.h>
main() { printf("aaaa"); }
6/1.Work:> gcc hello.c -o hello
6/1.Work:> strip hello
6/1.Work:> filesize format=%s hello
65k
6/1.Work:> hello
aaaa

6/1.Work:> gcc -N hello.c -o hello
6/1.Work:> strip hello
6/1.Work:> filesize format=%s hello
5480
6/1.Work:> hello
aaaa
Genuine ELF executables
Just like libc, the Executable and Linkable Format (ELF) is a common standard. It is a file format used for executables, objects and shared libraries. It gets the most attention in connection with UNIX but it is really used on numerous other operating systems: all UNIX derivatives (Solaris, Irix, Linux, BSD, etc.), OpenVMS, several OSes used in mobile phones/devices, game consoles such as the PlayStation, the Wii and others. PowerUP, the PPC Amiga kernel made by Phase5 back in the 1990s used the ELF format as well.
A more detailed description of the ELF internals will be given later; for now, all you need to know is that an executable ELF file contains headers (the main ELF header, plus the program and section header tables) and the sections/segments themselves. In file order, the layout is: ELF header, program header table, sections/segments, section header table.
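To make the main header a little less abstract, here is a minimal sketch that reads a few fields from a 32-bit big-endian ELF header, the variant used by PowerPC executables. The field offsets follow the ELF specification (Elf32_Ehdr is 52 bytes); ElfSummary and parse_elf32_be are hypothetical names chosen for this example, not part of any SDK.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A few selected fields of a 32-bit ELF header (Elf32_Ehdr). */
typedef struct {
    uint16_t e_type;     /* 1 = ET_REL, 2 = ET_EXEC, 3 = ET_DYN */
    uint16_t e_machine;  /* 20 = EM_PPC */
    uint32_t e_entry;    /* entry point address */
    uint16_t e_phnum;    /* number of program headers (segments) */
    uint16_t e_shnum;    /* number of section headers */
} ElfSummary;

static uint16_t be16(const unsigned char *p) { return (uint16_t)((p[0] << 8) | p[1]); }

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if 'buf' is not a 32-bit big-endian ELF header. */
int parse_elf32_be(const unsigned char *buf, size_t len, ElfSummary *out)
{
    if (len < 52)                               /* sizeof(Elf32_Ehdr) */
        return -1;
    if (memcmp(buf, "\x7f" "ELF", 4) != 0)      /* magic; split so \x7f does not swallow 'E' */
        return -1;
    if (buf[4] != 1 /* ELFCLASS32 */ || buf[5] != 2 /* ELFDATA2MSB */)
        return -1;
    out->e_type    = be16(buf + 16);
    out->e_machine = be16(buf + 18);
    out->e_entry   = be32(buf + 24);
    out->e_phnum   = be16(buf + 44);
    out->e_shnum   = be16(buf + 48);
    return 0;
}
```

All multi-byte fields are assembled byte by byte, so the sketch gives the same result whether it runs on a big-endian PowerPC or a little-endian host.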
Compared to other Amiga and Amiga-like operating systems, AmigaOS4 uses genuine ELF executables, while for example MorphOS uses relocatable objects (their own BFD backend), which contain the __abox__ symbol.
The advantage of objects is that they are smaller and that relocations are always included. But there is a drawback as well: the linker will not tell you automatically whether all symbols have been resolved, because an object is allowed to have unresolved references. (On the other hand, vlink could always detect unresolved references when linking PowerUP and MorphOS objects, because it sees them as a new format.) This is also why ELF shared objects cannot easily be used with that approach (though it's still kind of possible using some hacks), and it explains why the OS4 team decided to go for real executables.
By specification, ELF executables are meant to be executed from a fixed absolute address, and so AmigaOS4 programs need to be relocated (because all processes share the same address space). To make this possible, the linker is given the -q switch ("keep relocations", called --emit-relocs in GNU ld), so that the relocation entries are kept in the final executable. The relocations are then applied by the ELF loader when the program is loaded; there is no separate virtual address space per process in which every program could simply sit at the same address.
If you look at the linker scripts provided to build OS4 executables (in the SDK:gcc/ppc-AmigaOS/lib/ldscripts directory), you’ll find the following piece of code:
ENTRY(_start)
....
SECTIONS
{
  PROVIDE (__executable_start = 0x01000000); . = 0x01000000 + SIZEOF_HEADERS;
  [...]
As you can see, AmigaOS4 executables look like they are linked to be executed at an absolute address of 0x01000000. But this is only faked; the ELF loader and relocations will recalculate all absolute addresses in the program before it executes. Without relocations, each new process would be loaded at 0x01000000, where it would happily crash by overwriting certain important areas, among other reasons. You may ask why 0x01000000 is used at all, considering that it's just a placeholder and any number (be it 0x00000000, 0x99999999, 0xDEADBEEF or 0xFEEDFACE) could be used instead. We can speculate that 0x01000000 was chosen because it is the beginning of the memory map accessible for instruction execution. But anyway, the value is currently not important.
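The "recalculation" can be sketched in a few lines. This is a deliberately simplified model, not the AmigaOS4 ELF loader: it assumes a flat image plus a plain list of offsets at which 32-bit big-endian absolute addresses are stored (similar in spirit to R_PPC_ADDR32 entries), and shifts each one by the difference between the real load address and the 0x01000000 link address. Real PowerPC code, such as the lis/stw pairs shown later, also needs relocations that patch 16-bit halves of an address, which this sketch does not cover. apply_relocations is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define LINK_BASE 0x01000000u   /* address the ldscript pretends the program runs at */

/* Simplified model: 'image' was linked as if loaded at LINK_BASE but really sits
   at 'load_base'. Each entry in 'reloc_offsets' is the byte offset of a 32-bit
   big-endian absolute address inside the image; each one is shifted by the
   difference between the real and the link-time base. */
void apply_relocations(unsigned char *image, uint32_t load_base,
                       const uint32_t *reloc_offsets, size_t count)
{
    uint32_t delta = load_base - LINK_BASE;   /* unsigned arithmetic wraps mod 2^32 */

    for (size_t i = 0; i < count; i++) {
        unsigned char *p = image + reloc_offsets[i];
        uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                     ((uint32_t)p[2] << 8) | (uint32_t)p[3];
        v += delta;
        p[0] = (unsigned char)(v >> 24);
        p[1] = (unsigned char)(v >> 16);
        p[2] = (unsigned char)(v >> 8);
        p[3] = (unsigned char)v;
    }
}
```

For instance, an address 0x01010014 stored in an image that is actually loaded at 0x07f00000 becomes 0x07f10014. Without the relocation list (no -q), a loader has nothing to patch and the program keeps pointing at 0x01xxxxxx.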
To perform a test, let’s see what happens if we build our binary without the "-q" switch (that is, without making the binary relocatable):
7/0.RAM Disk:> type test.c
#include <stdio.h>
main() { printf("aaaa"); }
shell:> gcc test.c -S -o test.s
shell:> as test.s -o test.o
shell:> ld test.o -o test /SDK/newlib/lib/crtbegin.o /SDK/newlib/lib/LibC.a /SDK/newlib/lib/crtend.o
When you run the executable, you get a DSI (error 80000003) at offset 0x1c in _start (i.e. in the code from crtbegin.o). Ignoring the error will produce a yellow recoverable alert. The crash occurs because we have linked an ELF file to be executed at address 0x01000000, and as no "-q" switch was used, the relocation did not take place. To better understand why this happens, you can check the crtbegin.o code, i.e. the code added to the binary at the linking stage, which contains all the OS-dependent initialisations. If you know nothing about PPC assembler, you can skip the following part for now and return once you've read the entire article:
6/0.RAM Disk:> objdump -D --no-show-raw-insn --stop-address=0x10000d0 test | grep -A8 "_start"
010000b0 <_start>:
 10000b0:  stwu  r1,-64(r1)    # prologue: reserve a 64-byte stack frame
 10000b4:  mflr  r0            # fetch the return address from lr
 10000b8:  stw   r0,68(r1)     # save it in the caller's frame (end of prologue)
 10000bc:  lis   r9,257        # r9 = 257 << 16 = 0x01010000 (257 goes into the upper half-word)
 10000c0:  stmw  r25,36(r1)    # save r25-r31 at offset 36 in the stack frame
 10000c4:  mr    r25,r3        # save the command-line pointer
 10000c8:  mr    r27,r13       # r13 can be the small data pointer in the V.4-ABI; it is saved too
 10000cc:  stw   r5,20(r9)     # store r5 (the DOSBase pointer) at 0x01010000 + 20 = 0x01010014
The address in the last instruction points into a data segment starting at 0x01010000. But without any relocation, that address is invalid: there is no data there, so the MMU raises a data storage interrupt (DSI).
Of course it is possible to make a working binary without relocation, if the program doesn't need to relocate and you are lucky enough to have the 0x01000000 address free of important contents. And of course you can use a different address for the entry point, by hex-editing the binary or at build time using self-made ldscripts. Making a non-relocatable binary will be discussed later in the text.
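The address arithmetic in the listing can be checked with a small C sketch that mimics what the two instructions compute: lis shifts its immediate into the upper half-word, and a D-form store such as stw adds a sign-extended 16-bit displacement to the base register. The function names are made up for this example, and the sketch assumes the base register is not r0 (in D-form loads and stores, rA = r0 means the literal value 0).

```c
#include <stdint.h>

/* lis rD,imm: rD = imm << 16 ("load immediate shifted"). */
uint32_t ppc_lis(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* Effective address of a D-form access such as stw rS,d(rA):
   EA = rA + sign-extended d. Assumes rA is not r0. */
uint32_t ppc_dform_ea(uint32_t ra, int16_t d)
{
    return ra + (uint32_t)(int32_t)d;
}
```

With r9 = ppc_lis(257) = 0x01010000, the store goes to ppc_dform_ea(r9, 20) = 0x01010014. The faulting instruction sits at 0x10000cc, and 0x10000cc - 0x10000b0 = 0x1c, which is exactly the offset in _start reported by the DSI.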
PowerPC assembly
In case you are not familiar and have no experience with PowerPC assembly, the following section will explain some basic terms and concepts.
Registers
The PowerPC processor architecture provides 32 general-purpose registers and 32 floating-point registers. We’ll only be interested in certain general-purpose registers and a couple of special ones. The following overview describes the registers as they are used under AmigaOS4 (not UNIX):
General-purpose registers
r0 - volatile register that may be modified during function linkage
r1 - stack-frame pointer, always valid
r2 - system reserved register
r3 - command-line pointer
r4 - command-line length
r5 - DOSBase pointer
(The contents of registers r3-r5 are only valid when the program starts.)
r6 - r10 - volatile registers used for parameter passing
r11 - r12 - volatile registers that may be modified during function linkage
r13 - small data area pointer register
r14 - r30 - registers used for local variables; they are non-volatile; functions have to save and restore them
r31 - preferred by GCC in position-independent code (e.g. in shared objects) as a base pointer into the GOT section; however, the pointer can also be stored in another register
Important note: This general-purpose register description shows that arguments can only be passed in registers r3 and above (that is, not in r0, r1 or r2). You need to keep that in mind when assembling/disassembling under AmigaOS4.
Some special registers
lr - link register; stores the return address (i.e. the address to which a called function normally returns)
cr - condition register
Instructions
There are many different PowerPC instructions that serve many different purposes: there are branch instructions, condition register instructions, instructions for storage access, integer arithmetic, comparison, logic, rotation, cache control, processor management, and so on. In fact there are so many instructions that it would make no sense to cover them all here. You can download Freescale’s Green Book (see the Links section at the end of the article) if you are interested in a more detailed description but we’ll just stick to a number of instructions that are interesting and useful for our purposes.
b Relative branch on address (example: "b 0x7fcc7244"). Note that there are both relative and absolute branches (ba). Relative branches can branch to a distance of -32 to +32MB. Absolute branches can jump to 0x00000000 - 0x01fffffc and 0xfe000000 - 0xfffffffc. However, absolute branches will not be used in AmigaOS programs.
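The ±32MB limit follows from the instruction format: "b" (primary opcode 18) stores a 24-bit signed word offset, i.e. a 26-bit byte displacement whose low two bits are always zero. The following sketch encodes a relative, non-linking branch under that rule; the helper names are hypothetical, and the AA (absolute) and LK (link) bits are simply left at zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* A relative "b" can reach byte displacements that are multiples of 4
   in the range -0x2000000 .. +0x1fffffc (about +/-32MB). */
bool ppc_branch_in_range(int32_t disp)
{
    return (disp % 4 == 0) && disp >= -0x2000000 && disp <= 0x1fffffc;
}

/* Encode "b to" placed at address 'from' (AA = 0, LK = 0).
   Returns 0 when the target is unreachable; a valid encoding is never 0
   because the opcode bits (18 << 26 = 0x48000000) are always set. */
uint32_t ppc_encode_b(uint32_t from, uint32_t to)
{
    int32_t disp = (int32_t)(to - from);

    if (!ppc_branch_in_range(disp))
        return 0;
    return (18u << 26) | ((uint32_t)disp & 0x03fffffcu);
}
```

For example, a branch to itself encodes as 0x48000000, a branch 16 bytes forward as 0x48000010, and a branch 4 bytes backward as 0x4bfffffc.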
bctr Branch to count register. It uses the count register as the target address, so that the link register, holding, say, our return address, remains unmodified.
lis Stands for "load immediate shifted". The PowerPC instruction set doesn't allow loading an arbitrary 32-bit constant with a single instruction. You will need two instructions that load the upper and the lower 16-bit half, respectively. For example, if you want to load 0x12345678 into register r3, you need to do the following:
lis %r3,0x1234
ori %r3,%r3,0x5678
Later in the article you’ll notice that this kind of construction is used all the time.
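The split itself is simple arithmetic, sketched below with hypothetical helper names. With ori the lower half is zero-extended, so the upper half is just the top 16 bits. For comparison, the sketch also shows the variant you will often meet in disassembly, lis followed by addi (the @ha/@l operators in GNU as): addi sign-extends its immediate, so the upper half must be rounded up whenever bit 15 of the constant is set. Note that addi treats rA = r0 as the value 0, so that form cannot build the constant in r0.

```c
#include <stdint.h>

/* "lis rD,hi ; ori rD,rD,lo": ori zero-extends, so hi is just the top half. */
void split_lis_ori(uint32_t value, uint16_t *hi, uint16_t *lo)
{
    *hi = (uint16_t)(value >> 16);
    *lo = (uint16_t)(value & 0xffffu);
}

uint32_t rebuild_lis_ori(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* "lis rD,ha ; addi rD,rD,lo": addi sign-extends lo, so ha is rounded up
   when bit 15 of the value is set (the @ha adjustment). */
void split_lis_addi(uint32_t value, uint16_t *ha, int16_t *lo)
{
    *ha = (uint16_t)((value + 0x8000u) >> 16);
    *lo = (int16_t)(value & 0xffffu);
}

uint32_t rebuild_lis_addi(uint16_t ha, int16_t lo)
{
    return ((uint32_t)ha << 16) + (uint32_t)(int32_t)lo;
}
```

For 0x12345678 both forms use 0x1234/0x5678, but for 0x1234ABCD the addi form needs ha = 0x1235 and lo = -0x5433, because 0x12350000 - 0x5433 = 0x1234ABCD.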
mtlr "move to link register". In reality this is just a mnemonic for "mtspr 8,rS" (special-purpose register 8 is the link register). The instruction is typically used for transferring an address from register r0 to the link register (lr), but you can of course move contents to lr from other registers, not just r0.
stwu "store word and update" (all instructions starting with “st” are for storing). For example, stwu %r1,-16(%r1) stores the contents of register r1 into a memory location whose effective address is calculated by subtracting 16 from r1. At the same time, r1 is updated to contain that effective address. As we already know, register r1 contains the stack-frame pointer, so our instruction stores the contents of the register at offset -16 from the current top of stack and then decrements the stack pointer by 16.
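The claim that mtlr is only a mnemonic can be checked at the bit level. The sketch below encodes "mtspr spr,rS" (primary opcode 31, extended opcode 467), where the 10-bit SPR number is stored with its two 5-bit halves swapped; mtlr is then simply the same encoding with SPR 8. The function names are hypothetical.

```c
#include <stdint.h>

/* Encode "mtspr spr,rS": opcode 31, rS in bits 6-10, the SPR number with
   its 5-bit halves swapped in bits 11-20, extended opcode 467. */
uint32_t ppc_encode_mtspr(unsigned spr, unsigned rs)
{
    uint32_t sprf = ((spr & 0x1fu) << 5) | ((spr >> 5) & 0x1fu);
    return (31u << 26) | ((rs & 0x1fu) << 21) | (sprf << 11) | (467u << 1);
}

/* mtlr rS is just mtspr 8,rS (SPR 8 = link register). */
uint32_t ppc_encode_mtlr(unsigned rs)
{
    return ppc_encode_mtspr(8, rs);
}
```

The result for mtlr r0 is 0x7c0803a6, the word you will see in disassembly; mtspr 9,r0 (that is, mtctr r0) differs only in the SPR field, giving 0x7c0903a6.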
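A toy simulation makes the "store, then update" behaviour visible. This is an illustrative model only (a 256-byte big-endian memory and a register file; ToyPPC and toy_stwu are invented names), but it follows the instruction's definition: EA = rA + d, the word in rS is stored at EA, then rA becomes EA. When rS and rA are both r1, the old stack pointer ends up at the new top of the stack, which is how the V.4-ABI back chain is formed.

```c
#include <stdint.h>
#include <string.h>

/* Toy machine state: 32 general-purpose registers and a tiny big-endian memory. */
typedef struct {
    uint32_t gpr[32];
    unsigned char mem[256];
} ToyPPC;

/* stwu rS,d(rA): EA = rA + d; store rS at EA; then rA = EA.
   rA must not be r0 for stwu. Returns -1 if EA falls outside the toy memory. */
int toy_stwu(ToyPPC *cpu, unsigned rs, int16_t d, unsigned ra)
{
    uint32_t value = cpu->gpr[rs];   /* read before rA changes (rS may equal rA) */
    uint32_t ea = cpu->gpr[ra] + (uint32_t)(int32_t)d;

    if (ea > sizeof cpu->mem - 4)
        return -1;
    cpu->mem[ea]     = (unsigned char)(value >> 24);
    cpu->mem[ea + 1] = (unsigned char)(value >> 16);
    cpu->mem[ea + 2] = (unsigned char)(value >> 8);
    cpu->mem[ea + 3] = (unsigned char)value;
    cpu->gpr[ra] = ea;
    return 0;
}
```

Starting from a zeroed ToyPPC with r1 = 0x80, toy_stwu(&cpu, 1, -16, 1) leaves r1 = 0x70 and the bytes 00 00 00 80 (the old r1) at address 0x70.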
The PowerPC processor has many more instructions and various kinds of mnemonics, all of which are well covered in numerous PPC-related tutorials, so to avoid copying-and-pasting (and wasting space here) we have described a few that happen to be used very often. You’ll need to refer to the relevant documentation if you want to read more about the PowerPC instruction set (see Links below).