Copyright (c) Hyperion Entertainment and contributors.

The Hacking Way: Part 1 - First Steps

From AmigaOS Documentation Wiki
Revision as of 19:16, 24 April 2012 by Steven Solie

Author

Roman Kargin
Copyright (c) 2012 Roman Kargin
Proofread and grammar corrections by Daniel Jedlicka.
Used by permission.

Introduction

Back in the past, I wanted to make the smallest possible executables on UNIX-ish operating systems (SunOS, Tru64, OS9, OpenVMS and others). As a result of my research I wrote a couple of small tutorials for various hacking-related magazines (like Phrack or x25zine). Doing the same on AmigaOS naturally became a topic of interest for me - even more so when I started seeing, in Amiga forums, questions like "Why are AmigaOS4 binaries bigger than they should be?" Therefore I believe that producing small OS4 executables could make an interesting topic for an article. Further in the text I'll explain how ldscripts can help the linker make non-aligned binaries, and cover various other aspects associated with the topic. I hope that at least for programmers the article will be an interesting and thought-provoking read.

Before you go on, please note that it is assumed here that you have basic programming skills and understanding of C and assembler, that you are familiar with BSD syntax, know how UNIX and AmigaOS3/4 work, and that you have the PPC V.4-ABI and ELF specification at hand. But if you don't, there's no need to stop reading as I'll try to cover the basics where necessary.

The Basics

To begin with, let's present and discuss some basic terms and concepts. We'll also dispel some popular myths.

The C standard library (libc)

Thirty years ago, when the C language had developed so far that its different implementations started to pose a practical problem, the American National Standards Institute (ANSI) formed a committee for the standardization of the language. The standard, generally referred to as ANSI C, was finally adopted in 1989 (this is why it is sometimes called C89). Part of this standard was a library including common functions, called the "C standard library", or "C library", or "libc". The library has been an inherent part of all subsequently adopted C standards.

Libc is platform-independent in the sense that it provides the same functionality regardless of operating system - be it UNIX, Linux, AmigaOS, OpenVMS, AROS, whatever. The actual implementation may vary from OS to OS. For example in UNIX, the most popular implementation of the C standard library is glibc (the GNU C Library). But there are others: uClibc (for embedded Linux systems, without MMU), dietlibc (as the name suggests, it is meant to compile/link programs to the smallest possible size) or Newlib. Originally developed for a wide range of embedded systems, Newlib is the preferred C standard library in AmigaOS4 and is now part of the kernel.

On AmigaOS4, three implementations of libc are used: clib2, newlib and vclib. The GCC compiler supports clib2 and newlib, the VBCC compiler supports newlib and vclib.

clib2

This is an Amiga-specific implementation originally written from scratch by Olaf Barthel, with some ideas borrowed from the BSD libc implementation, libnix, etc. Under AmigaOS4, clib2 is being phased out. The GCC compiler distributed as part of the OS4 SDK uses Newlib by default (as if you used the -mcrt=newlib switch). An important note: clib2 is only available for static linking, while Newlib is opened at runtime (thus making your executables smaller). Clib2 is open source; the latest version can be found here: http://sourceforge.net/projects/clib2/

Newlib

A better and more modern libc implementation. While the AmigaOS4 version is closed source (all adaptations and additional work are done by the OS development team), it's based on the open source version of Newlib. The original version is maintained by Red Hat developer Jeff Johnston, and is used in most commercial and non-commercial GCC ports for non-Linux embedded systems: http://www.sourceware.org/newlib/

Newlib is not limited to the ANSI C99 standard: it's an expanded library that also includes common POSIX functions (clib2 implements them as well). But certain POSIX functions - such as glob(), globfree(), or fork() - are missing; and while some of them are easy to implement, others are not - fork() being an example of the latter.

Newlib is also available as a shared object.

vclib

This library was made for the vbcc compiler. Like clib2 it is linked statically, but only provides ANSI C/C99 functions (i.e. no POSIX).

Myth #1: AmigaOS4 behaves like UNIX

From time to time you can hear voices saying that AmigaOS4 is becoming UNIX. This popular myth stems from three main sources. First, many games, utilities and libraries are ported over from the UNIX world. Second, AmigaOS4 uses genuine ELF, the standard binary file format used in UNIX and UNIX-like systems. Third, the OS supports, as of version 4.1, shared objects. All of this enables AmigaOS4 to provide more stuff for both programmers and users, and to complement native applications made for OS4. Today, it is quite normal that an operating system provides all the popular third-party libraries like SDL, OpenGL, Cairo, Boost, OpenAL, FreeType etc. Not only do they make software development faster, but they also allow platform-independent programming.

Yet getting close to UNIX or Linux in terms of software or programming tools does not mean that AmigaOS4 behaves in the same way as regards, for example, library initialization, passing arguments or system calls. On AmigaOS4 there are no "system calls" as they are on UNIXes, where you can simply pass arguments in registers and then use an instruction (like "int 0x80" on x86 Linux, "trap 0" on m68k Linux, or "sc" on some PPC/POWER CPU based OSes), which will cause a software interrupt and enter the kernel in supervisor mode. The concept of AmigaOS is completely different. There is no kernel as such; Amiga's Kickstart is actually a collection of libraries (of which "kernel.kmod" is just one module - a new incarnation of the old exec.library). Also, an AmigaOS program, when calling a library function, won’t enter supervisor mode but rather stays in user mode while the function is executed.

HackingWayPart1-1.png

Since the very first version of the OS that came with the Amigas in 1985, you must open a library and use its vector table to execute a library function, so there’s no "system call" involved. The pointer to the first library (exec.library) is always at address 4 and that hasn’t changed in AmigaOS4. By the way, the Quark kernel on MorphOS uses the "sc" instruction for system calls (so it does support them) but the programmers will never use them because they work with the libraries (just like you do on AmigaOS4).

When you program in assembler under AmigaOS4, you cannot do much until you initialize and open all the needed libraries (unlike, for example, on UNIX where the kernel does all the necessary initialisation for you).

Myth #2: AmigaOS4 binaries are fat

This misunderstanding stems from the fact that the latest AmigaOS4 SDK uses a newer version of binutils, which now aligns ELF segments to 64K so that they can be easily loaded with mmap(). Binutils are, naturally, developed with regard to UNIX-like OSes where the mmap() function actually exists so the modifications make sense - but since mmap() isn’t a genuine AmigaOS function (it’s just a wrapper using AllocVec() etc.), this kind of alignment is not needed for AmigaOS.

Luckily, the size difference is only noticeable in small programs, like Hello World, where the resulting executable grows to 65KB. That of course looks unbelievable, as if something were wrong. But once you start programming for real and produce bigger programs, the code fills up the ELF segments as required, there’s little need for padding, and so there’s little size difference in the end. The worst-case scenario is ~64KB of extra padding, which only happens (as we said) in very small programs, or when you’re out of luck and your code only just exceeds a boundary between two segments.
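The padding arithmetic behind these numbers can be sketched in C. This is only a minimal illustration, assuming that a single segment is rounded up to the next 64K boundary; the names align_up and padding_for are made up for this example and are not taken from binutils.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of align (align must be a power of two). */
uint32_t align_up(uint32_t size, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Number of padding bytes needed to bring size up to the alignment boundary. */
uint32_t padding_for(uint32_t size, uint32_t align)
{
    return align_up(size, align) - size;
}
```

Under this simplified model, padding_for(0x10001, 0x10000) is 0xFFFF: code that exceeds a boundary by a single byte pays for almost a full 64K of padding, while code that ends exactly on the boundary pays nothing.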

It is likely that a newer SDK will adapt binutils for AmigaOS4 and the padding will no longer be needed. Currently, to avoid alignment you can use the "-N" switch, which tells the linker to use an ldscript that builds non-aligned binaries. Check the SDK:gcc/ppc-AmigaOS/lib/ldscripts directory; all the files ending with an "n" (like “AmigaOS.xn” or “ELF32ppc.xbn”) are linker scripts that ensure non-aligned builds. Such a script will be used when the GCC compiler receives the “-N” switch. See the following:

7/0.RAM Disk:> type hello.c
#include <stdio.h>
main()
{
  printf("aaaa");
}
6/1.Work:> gcc hello.c -o hello
6/1.Work:> strip hello
6/1.Work:> filesize format=%s hello 
65k
6/1.Work:> hello
aaaa
6/1.Work:> gcc -N hello.c -o hello
6/1.Work:> strip hello
6/1.Work:> filesize format=%s hello 
5480
6/1.work:> hello
aaaa

Genuine ELF executables

Just like libc, the Executable and Linkable Format (ELF) is a common standard. It is a file format used for executables, objects and shared libraries. It gets the most attention in connection with UNIX but it is really used on numerous other operating systems: all UNIX derivatives (Solaris, Irix, Linux, BSD, etc.), OpenVMS, several OSes used in mobile phones/devices, game consoles such as the PlayStation, the Wii and others. PowerUP, the PPC Amiga kernel made by Phase5 back in the 1990s, used the ELF format as well.

A more detailed description of the ELF internals will be given later; all you need to know for now is that the executable ELF file contains headers (the main header, and headers for the various sections) and sections/segments. The ELF file layout looks like this:

HackingWayPart1-2.png

Compared to other Amiga and Amiga-like operating systems, AmigaOS4 uses genuine ELF executables, while for example MorphOS uses relocatable objects (their own BFD backend), which contain the __abox__ symbol.

The advantage of objects is that they are smaller and that relocations are always included. But there is a drawback as well: the linker will not tell you automatically whether all symbols have been resolved because an object is allowed to have unresolved references. (On the other hand, vlink could always detect unresolved references when linking PowerUP and MorphOS objects because it sees them as a new format.) This is why ELF shared objects cannot be used easily (though it’s still kind of possible using some hacks), and it explains why the OS4 team decided to go for real executables.
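The difference between a real executable and a relocatable object is visible in the ELF header itself. The following is a minimal sketch, assuming the first 20 bytes of a file have already been read into a buffer; the function names are invented for this example, while the field offsets and the values ET_REL (1), ET_EXEC (2) and EM_PPC (20) come from the ELF specification.

```c
#include <stddef.h>
#include <stdint.h>

enum { DEMO_ET_REL = 1, DEMO_ET_EXEC = 2, DEMO_EM_PPC = 20 };

/* Read a big-endian 16-bit value from two consecutive bytes. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the e_type field if buf starts with a 32-bit, big-endian
 * PowerPC ELF header, or -1 otherwise.
 */
int ppc_elf_type(const uint8_t *buf, size_t len)
{
    if (len < 20)
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;                       /* magic number missing */
    if (buf[4] != 1)                     /* EI_CLASS: 1 = 32-bit */
        return -1;
    if (buf[5] != 2)                     /* EI_DATA: 2 = big-endian */
        return -1;
    if (read_be16(buf + 18) != DEMO_EM_PPC)  /* e_machine at offset 18 */
        return -1;
    return read_be16(buf + 16);          /* e_type at offset 16 */
}
```

A result of 2 (ET_EXEC) indicates an executable, a result of 1 (ET_REL) a relocatable object.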

By specification, ELF files are meant to be executed from a fixed absolute address, and so AmigaOS4 programs need to be relocated (because all processes share the same address space). To do that, the compiler is passed the -q switch ("keep relocations"). Relocations are handled by the MMU, which will create a new virtual address space for each new process.

If you look at the linker scripts provided to build OS4 executables (in the SDK:gcc/ppc-AmigaOS/lib/ldscripts directory), you’ll find the following piece of code:

ENTRY(_start)
....
SECTIONS
{
 PROVIDE (__executable_start = 0x01000000); . = 0x01000000 + SIZEOF_HEADERS;
[...]

As you can see, AmigaOS4 executables look like they are linked to be executed at an absolute address of 0x01000000. But this is only faked; the ELF loader and relocations will recalculate all absolute addresses in the program before it executes. Without relocations, each new process would be loaded at 0x01000000, where it would crash happily due to overwriting certain important areas, and because of other reasons. You may ask why 0x01000000 is used at all, considering that it’s just a placeholder and any number (be it 0x00000000, 0x99999999, 0xDEADBEEF or 0xFEEDFACE) can be used instead. We can speculate and assume that 0x01000000 was chosen because it is the beginning of the memory map accessible for instruction execution. But anyway, the value is currently not important.
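What "recalculating absolute addresses" means can be sketched in C. This is a deliberately simplified model, not the actual AmigaOS4 ELF loader: it assumes that every relocation patches a whole 32-bit word, whereas real PowerPC relocations come in several types and often patch only a 16-bit half of an instruction. The names relocate and LINK_BASE are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LINK_BASE 0x01000000u   /* placeholder address from the ldscript */

/*
 * Adjust every word listed in reloc_index by the distance between
 * the address the image was linked for and the address it was loaded at.
 */
void relocate(uint32_t *image, const size_t *reloc_index,
              size_t n_relocs, uint32_t load_base)
{
    uint32_t delta = load_base - LINK_BASE;
    size_t i;

    for (i = 0; i < n_relocs; i++)
        image[reloc_index[i]] += delta;
}
```

Words that are not listed in the relocation table (plain constants, for example) are left untouched, which is why the loader needs the relocation information kept by the "-q" switch in the first place.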

To perform a test, let’s see what happens if we build our binary without the "-q" switch (that is, without making the binary relocatable):

7/0.RAM Disk:> type test.c
#include <stdio.h>
main()
{
  printf("aaaa");
}
shell:> gcc test.c -S -o test.s
shell:> as test.s -o test.o
shell:> ld test.o -o test /SDK/newlib/lib/crtbegin.o /SDK/newlib/lib/LibC.a  /SDK/newlib/lib/crtend.o

When you run the executable, you get a DSI with the 80000003 error, on the 0x1c offset in _start (i.e. the code from the crtbegin.o). Ignoring the error will produce a yellow recoverable alert. The crash occurs because we have compiled an ELF file to be executed at the 0x01000000 address, and as no "-q" switch was used, the remapping did not take place. To better understand why it happens you can check the crtbegin.o code, i.e. the code added to the binary at linking stage, which contains all the OS-dependent initialisations. If you know nothing about PPC assembler you can skip the following part for now and return when you’ve read the entire article:

6/0.RAM Disk:> objdump -D --no-show-raw-insn --stop-address=0x10000d0 test | grep -A8 "_start"
010000b0 <_start>:
 
10000b0:       stwu    r1,-64(r1)    #
10000b4:       mflr    r0            # prologue (reserve 64 byte stack frame)
10000b8:       stw     r0,68(r1)     #
 
10000bc:       lis     r9,257        # 257 is loaded into the higher half-word (msw) of r9 (257 << 16)
10000c0:       stmw    r25,36(r1)    # offset into the stack frame 
10000c4:       mr      r25,r3        # save command-line pointer
10000c8:       mr      r27,r13       # r13 can be used as small data pointer in the V.4-ABI, and it is also saved here
10000cc:       stw     r5,20(r9)     # store r5 (DOSBase pointer) at address (257 << 16) + 20 = 0x01010014

The address in the last instruction points into a data segment starting at 0x01010000. But the address is invalid because, without any relocation, there is no data there and the MMU produces a data storage interrupt (DSI) error.

Of course it is possible to make a working binary without relocation, if the program doesn’t need to relocate and you are lucky enough to have the 0x01000000 address free of important contents. And of course you can use a different address for the entry point, by hex-editing the binary or at build-time using self-made ldscripts. Making a non-relocatable binary will be discussed further in the text.

PowerPC assembly

In case you are not familiar and have no experience with PowerPC assembly, the following section will explain some basic terms and concepts.

Registers

The PowerPC processor architecture provides 32 general-purpose registers and 32 floating-point registers. We’ll only be interested in certain general-purpose registers and a couple of special ones. The following overview describes the registers as they are used under AmigaOS4 (not UNIX):

General-purpose registers

r0 - volatile register that may be modified during function linkage

r1 - stack-frame pointer, always valid

r2 - system reserved register

r3 - command-line pointer

r4 - command-line length

r5 - DOSBase pointer

  • The contents of registers r3-r5 are only valid when the program starts.

r6 - r10 - volatile registers used for parameter passing

r11 - r12 - volatile registers that may be modified during function linkage

r13 - small data area pointer register

r14 - r30 - registers used for local variables; they are non-volatile; functions have to save and restore them

r31 - preferred by GCC in position-independent code (e.g. in shared objects) as a base pointer into the GOT section; however, the pointer can also be stored in another register

Important note: This general-purpose register description shows that arguments can only be passed in registers r3 and above (that is, not in r0, r1 or r2). You need to keep that in mind when assembling/disassembling under AmigaOS4.

Some special registers

lr - link register; stores the "ret address" (i.e. the address to which a called function normally returns)

cr - condition register

Instructions

There are many different PowerPC instructions that serve many different purposes: there are branch instructions, condition register instructions, instructions for storage access, integer arithmetic, comparison, logic, rotation, cache control, processor management, and so on. In fact there are so many instructions that it would make no sense to cover them all here. You can download Freescale’s Green Book (see the Links section at the end of the article) if you are interested in a more detailed description but we’ll just stick to a number of instructions that are interesting and useful for our purposes.

b Relative branch on address (example: "b 0x7fcc7244"). Note that there are both relative and absolute branches (ba). Relative branches can branch to a distance of -32 to +32MB. Absolute branches can jump to 0x00000000 - 0x01fffffc and 0xfe000000 - 0xfffffffc. However, absolute branches will not be used in AmigaOS programs.

bctr Branch with count register. It uses the count register as a target address, so that the link register with, say, our return address remains unmodified.

lis Stands for "load immediate shifted". The PowerPC instruction set doesn’t allow loading a 32-bit constant with a single instruction. You will always need two instructions that load the upper and the lower 16-bit half, respectively. For example, if you want to load 0x12345678 into register r3, you need to do the following:

lis %r3,0x1234
ori %r3,%r3,0x5678

Later in the article you’ll notice that this kind of construction is used all the time.

mtlr "move to link register". In reality this is just a mnemonic for "mtspr 8,r". The instruction is typically used for transferring an address from register r0 to the link register (lr), but you can of course move contents to lr from other registers, not just r0.

stwu "store word and update" (all instructions starting with “st” are for storing). For example, stwu %r1, -16(%r1) stores the contents of register r1 into a memory location whose effective address is calculated by subtracting 16 from the value in r1. At the same time, r1 is updated to contain the effective address. As we already know, register r1 contains the stack-frame pointer, so our instruction stores the old stack pointer at offset -16 from the current top of stack and then decrements the stack pointer by 16.
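The -32 to +32MB range quoted for the relative branch follows directly from the instruction encoding: a "b" instruction holds a 24-bit word offset, which is shifted left by two bits and sign-extended. The C sketch below encodes and decodes such a branch. The function names are invented for this example, and the absolute-address (AA) bit is always left at 0 because absolute branches are not used in AmigaOS programs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode a relative branch: "b" when lk is 0, "bl" when lk is 1.
 * offset is the distance in bytes from the branch instruction itself.
 */
uint32_t encode_branch(int32_t offset, int lk)
{
    assert((offset & 3) == 0);                              /* word aligned */
    assert(offset >= -0x02000000 && offset <= 0x01fffffc);  /* 26-bit signed range */
    return (18u << 26) | ((uint32_t)offset & 0x03fffffcu) | (lk ? 1u : 0u);
}

/* Recover the signed byte offset from an encoded b/bl instruction. */
int32_t branch_offset(uint32_t insn)
{
    int32_t li = (int32_t)(insn & 0x03fffffcu);

    /* Sign-extend from 26 bits: values with bit 25 set are negative. */
    return (li & 0x02000000) ? li - 0x04000000 : li;
}
```

A forward branch over one instruction, encode_branch(8, 0), yields 0x48000008; setting the link bit turns it into "bl" (0x48000009), which additionally stores the return address in the link register.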

The PowerPC processor has many more instructions and various kinds of mnemonics, all of which are well covered in numerous PPC-related tutorials, so to avoid copying-and-pasting (and wasting space here) we have described a few that happen to be used very often. You’ll need to refer to the relevant documentation if you want to read more about the PowerPC instruction set (see Links below).
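To make the lis/ori pair described above more tangible, here is a small C model of the two instructions. It is only a sketch: the functions mimic what the instructions do to a 32-bit register value, and the helper load_imm32 is a made-up name for the two-step sequence.

```c
#include <stdint.h>

/* lis rD,imm: put the 16-bit immediate into the upper half, clear the lower half. */
uint32_t lis(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* ori rA,rS,imm: OR the register with a zero-extended 16-bit immediate. */
uint32_t ori(uint32_t rs, uint16_t imm)
{
    return rs | imm;
}

/* Load an arbitrary 32-bit constant the way the two-instruction sequence does. */
uint32_t load_imm32(uint32_t value)
{
    uint16_t hi = (uint16_t)(value >> 16);
    uint16_t lo = (uint16_t)(value & 0xffffu);

    return ori(lis(hi), lo);
}
```

With these definitions, ori(lis(0x1234), 0x5678) gives 0x12345678, and lis(257) gives 0x01010000.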

Function prologue and epilogue

When a C function executes, its code – seen from the assembler perspective – will contain two parts called the prologue (at the beginning of the function) and the epilogue (at the end of the function). The purpose of these parts is to save the return address so that the function knows where to jump after the subroutine is finished.

stwu %r1,-16(%r1)    
mflr %r0             # prologue, reserve 16 byte stack frame
stw %r0,20(%r1)      
 
...
 
lwz %r0,20(%r1)      
addi %r1,%r1,16      #  epilogue, restore back
mtlr %r0              
blr        

The prologue code generally opens a stack frame with a stwu instruction that decrements register r1 (the stack grows towards lower addresses) and stores the old value at the first address of the new frame. The epilogue code just restores r1 to the old stack value.
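The effect of the prologue and epilogue listed above can be traced with a tiny simulation in C. This is a sketch under simplifying assumptions: the Cpu structure, its 256-byte "memory" and the function names are hypothetical and model only the three registers involved.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t r0, r1, lr;
    uint8_t  mem[256];   /* tiny simulated stack, addresses 0..255 */
} Cpu;

static void store_word(Cpu *c, uint32_t addr, uint32_t value)
{
    memcpy(&c->mem[addr], &value, sizeof value);
}

static uint32_t load_word(const Cpu *c, uint32_t addr)
{
    uint32_t value;

    memcpy(&value, &c->mem[addr], sizeof value);
    return value;
}

void prologue(Cpu *c)
{
    uint32_t ea = c->r1 - 16;          /* stwu %r1,-16(%r1)                 */
    store_word(c, ea, c->r1);          /*   old r1 goes to the frame start  */
    c->r1 = ea;                        /*   r1 now points to the new frame  */
    c->r0 = c->lr;                     /* mflr %r0                          */
    store_word(c, c->r1 + 20, c->r0);  /* stw %r0,20(%r1): save return addr */
}

void epilogue(Cpu *c)
{
    c->r0 = load_word(c, c->r1 + 20);  /* lwz %r0,20(%r1)  */
    c->r1 += 16;                       /* addi %r1,%r1,16  */
    c->lr = c->r0;                     /* mtlr %r0         */
}
```

Even if the link register is overwritten between the two calls (which is what every nested "bl" does), the epilogue brings back both the original stack pointer and the original return address.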

C programmers needn’t worry at all about the prologue and epilogue because the compiler will add them to their functions automatically. When you write your programs in pure assembler you can skip the prologue and the epilogue if you don’t need to keep the return address.

Plus, a new stack frame doesn’t need to be allocated for functions that do not call any subroutine. By the way, the V.4-ABI (application binary interface) defines a specific layout of the stack frame and stipulates that it should be aligned to 16 bytes.

Writing programs in assembler

There are two ways to write assembler programs under AmigaOS4:

  • using libc (all initializations are done by crtbegin.o/crtend.o and libc is attached to the binary)
  • the old way (all initializations - opening libraries, interfaces etc. - have to be done manually in the code)

The advantage of using libc is that you can run your code "out of the box" and that all you need to know is the correct offsets to the function pointers. On the minus side, the full library is attached to the binary, making it bigger. Sure, a size difference of ten or even a hundred kilobytes doesn’t play a big role these days – but here in this article we’re going down the old hacking way (that’s why we’re fiddling with assembler at all) so let’s call it a drawback.

The advantage of not using libc is that you gain full control of your program, you use only the functions you need, and the resulting binary will be as small as possible (a fully working binary can be as small as 100 bytes). The drawback is that you have to initialize everything manually.

We’ll first discuss assembler programming with the use of libc.

Assembler programming using libc

To illustrate how this works we’ll compile a Newlib-based binary (the default GCC setting) using the -g switch (“include debugging information”) and then put the GDB debugger on the job:

#include <stdio.h>
 
main()
{
   printf("aaaa");
   exit(0);
}
6/0.RAM Disk:> gcc -gstabs -O2 2.c -o 2
2.c: In function 'main':
2.c:6: warning: incompatible implicit declaration of built-in function 'exit'
 
6/0.RAM Disk:> GDB -q 2
(GDB) break main
Breakpoint 1 at 0x7fcc7208: file 2.c, line 4.
(GDB) r
Starting program: /RAM Disk/2 
BS 656d6ed8
Current action: 2
 
Breakpoint 1, main () at 2.c:4
4       {
(GDB) disas
Dump of assembler code for function main:
0x7fcc7208 <main+0>:    stwu    r1,-16(r1)
0x7fcc720c <main+4>:    mflr    r0
0x7fcc7210 <main+8>:    lis     r3,25875         ; load the address
0x7fcc7214 <main+12>:   addi    r3,r3,-16328     ; of our string into r3
0x7fcc7218 <main+16>:   stw     r0,20(r1)
0x7fcc721c <main+20>:   crclr   4*cr1+eq
0x7fcc7220 <main+24>:   bl      0x7fcc7234 <printf>
0x7fcc7224 <main+28>:   li      r3,0
0x7fcc7228 <main+32>:   bl      0x7fcc722c <exit>
End of assembler dump.
(GDB) 

Now we’ll use GDB to disassemble the printf() and exit() functions from Newlib’s LibC.a. As mentioned above, Newlib is used by default, so there’s no need to use the -mcrt switch unless we want clib2 instead (in which case we’d compile the source with “-mcrt=clib2”).

(GDB) disas printf
Dump of assembler code for function printf:
0x7fcc723c <printf+0>:  li      r12,1200
0x7fcc7240 <printf+4>:  b       0x7fcc7244 <__NewLibCall>
End of assembler dump.
(GDB)
 
(GDB) disas exit
Dump of assembler code for function exit:
0x7fcc7234 <exit+0>:    li      r12,1620
0x7fcc7238 <exit+4>:    b       0x7fcc7244 <__NewLibCall>
End of assembler dump.
(GDB) 

You can see that register r12 contains some values depending on the function - they are function pointer offsets in Newlib’s interface structure (INewLib). Then there’s the actual jump to __NewLibCall, so let’s have a look at it:

(GDB) disas __NewLibCall
Dump of assembler code for function __NewLibCall:
0x7fcc7244 <__NewLibCall+0>:    lis     r11,26006
0x7fcc7248 <__NewLibCall+4>:    lwz     r0,-25500(r11)
0x7fcc724c <__NewLibCall+8>:    lwzx    r11,r12,r0
0x7fcc7250 <__NewLibCall+12>:   mtctr   r11
0x7fcc7254 <__NewLibCall+16>:   bctr
End of assembler dump.
(GDB)

Of course you can use "objdump" (like MorphOS developers do):

6/0.RAM Disk:> objdump -d 1 | grep -A5 "<__NewLibCall>:"
01000280 <__NewLibCall>:
1000280:       3d 60 01 01     lis     r11,257
1000284:       80 0b 00 24     lwz     r0,36(r11)
1000288:       7d 6c 00 2e     lwzx    r11,r12,r0
100028c:       7d 69 03 a6     mtctr   r11
1000290:       4e 80 04 20     bctr

But using GDB is more comfortable: you don’t need to scroll through the full objdump output, or search in it with grep, etc. You can also obtain assembler output by compiling the source with the -S switch, but GDB makes it possible to get as deep into the code as you wish (in fact down to the kernel level).
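The dispatch performed by __NewLibCall (take a byte offset in r12, load the function pointer stored at that offset in the interface structure, and jump to it) can be modelled in portable C. The sketch below is an illustration only: DemoInterface, its two functions and lib_call are hypothetical stand-ins and do not reflect the real layout of INewLib.

```c
#include <stddef.h>
#include <string.h>

typedef int (*lib_fn)(const char *);

/* Hypothetical interface structure: a table of function pointers. */
struct DemoInterface {
    void  *reserved[2];
    lib_fn text_length;
    lib_fn first_char;
};

static int demo_length(const char *s) { return (int)strlen(s); }
static int demo_first(const char *s)  { return (unsigned char)s[0]; }

struct DemoInterface demo_iface = { { NULL, NULL }, demo_length, demo_first };

/* Counterpart of __NewLibCall: look up the pointer by byte offset, then call it. */
int lib_call(const void *iface, size_t offset, const char *arg)
{
    lib_fn fn;

    /* lwzx r11,r12,r0: load the pointer stored at (interface + offset) */
    memcpy(&fn, (const unsigned char *)iface + offset, sizeof fn);

    /* mtctr r11 ; bctr: jump to the function */
    return fn(arg);
}
```

The caller only has to know the offset, for example lib_call(&demo_iface, offsetof(struct DemoInterface, text_length), "aaaa"). In the same way the printf() and exit() stubs only have to load 1200 or 1620 into r12; with 4-byte pointers on 32-bit PowerPC these byte offsets would correspond to slots 300 and 405 of the table.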

We will now remove the prologue (because we don’t need it in this case) and reorganize the code a bit:

   .globl main
main:
        lis %r3,.msg@ha          #
        la %r3,.msg@l(%r3)       # printf("aaaa");
        bl printf                #
 
        li %r3,0                 # exit(0);
        bl exit                  #  
 
.msg:
        .string "aaaa"
6/0.RAM Disk:> as test.s -o test.o
6/0.RAM Disk:> ld -N -q test.o -o test /SDK/newlib/lib/crtbegin.o /SDK/newlib/lib/LibC.a /SDK/newlib/lib/crtend.o
6/0.RAM Disk:> strip test 
6/0.RAM Disk:> filesize format=%s test
5360
6/0.RAM Disk:> test
aaaa
6/0.RAM Disk:> 
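The @ha and @l operators used in the listing above split the address of .msg into two 16-bit halves. Because "la" (like "addi") sign-extends its immediate, the upper half has to be adjusted by one whenever the lower half is 0x8000 or more; that is what the "a" in @ha stands for. A minimal C sketch of this arithmetic, with made-up function names, looks like this:

```c
#include <stdint.h>

/* sym@l: the low 16 bits, interpreted as a signed value by addi/la. */
int32_t lo16(uint32_t addr)
{
    int32_t l = (int32_t)(addr & 0xffffu);

    return l >= 0x8000 ? l - 0x10000 : l;
}

/* sym@ha: the high 16 bits, incremented when the low half is negative. */
uint16_t ha16(uint32_t addr)
{
    return (uint16_t)((addr + 0x8000u) >> 16);
}

/* lis rD,ha followed by addi/la rD,rD,l */
uint32_t lis_addi(uint16_t ha, int32_t l)
{
    return ((uint32_t)ha << 16) + (uint32_t)l;
}
```

The values from the GDB session fit this model: lis_addi(25875, -16328) gives 0x6512C038, and splitting that address again with ha16() and lo16() returns exactly 25875 and -16328.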

When we compile our Hello World program in C (with the -N switch and stripping, of course) it is 5504 bytes in size; our assembler code gives 5360 bytes. Nice, but let’s try to reduce it some more (even if we’ll still keep libc attached). Instead of branching to the functions themselves (“bl function”) we’ll use function pointer offsets and branch to __NewLibCall:

   .globl main
main:
        #printf("aaaa")
 
        lis %r3,.msg@ha          # arg1 part1
        la %r3,.msg@l(%r3)       # arg1 part2
        li %r12, 1200            # 1200 - pointer offset to function
        bl __NewLibCall          # bl (not b), so that printf() returns to the code below
 
        #exit(0)
 
        li %r3, 0               # arg1
        li %r12, 1620           # 1620 - pointer offset to function
        b __NewLibCall          
 
.msg:
        .string "aaaa"
6/0.RAM Disk:> as test.s -o test.o
6/0.RAM Disk:> ld -N -q test.o -o test /SDK/newlib/lib/crtbegin.o /SDK/newlib/lib/LibC.a /SDK/newlib/lib/crtend.o
6/0.RAM Disk:> strip test 
6/0.RAM Disk:> filesize format=%s test
5336
6/0.RAM Disk:> test
aaaa
6/0.RAM Disk:>