Copyright (c) Hyperion Entertainment and contributors.

Exec Memory Allocation


Exec manages all of the free memory currently available in the system. Using a slab allocation system, Exec keeps track of memory and provides the functions to allocate and access it.

When an application needs some memory, it can either declare the memory statically within the program or it can ask Exec for some memory. When Exec receives a request for memory, it searches its free memory regions to find a suitably sized block that matches the size and attributes requested.

Prior to AmigaOS 4.0, the OS did not make use of the CPU's memory management unit and used memory "as-is". That is, if you have different memory expansions plugged into your system, the memory will be seen as chunks located somewhere in the 4 gigabyte address space. Since version 4.0, the MMU is used to "map" memory pages from their physical location to a virtual address. There are multiple reasons why this is better than using the verbatim physical addresses - among other things, it reduces the effects of memory fragmentation and makes it possible to swap currently unused memory pages out to persistent storage such as a hard disk.

The "downside" is that the virtual address of a memory block is almost never identical to the physical address. This isn't much of a downside, since an application will never really need to care about it. If an application allocates a block of memory of n bytes, it will get a pointer back that points to at least n continuous addresses as expected. The pages that "fill" this memory block may come from different physical locations scattered throughout the physical memory but a program will never noticed that. For all intent and purpose, the application sees a single continuous block of memory.

Program Address Space

It is important to remember that, just like in classic AmigaOS, a single address space is used for all programs. Sometimes the mention of an MMU can lead people to assume that each process on the Amiga has its own personal, partitioned address space. The following two programs demonstrate that, even though they are separate processes, it is possible for one to read and write the other's memory. The memory locations are the same virtual address, and that virtual address maps onto the same physical address.

This program has a global variable. It prints out the virtual address of the global, the value at that address (which can be optionally specified), waits for a keypress and, finally, prints out the value at that same address again in case it has been externally updated (which is done by the subsequent program listing):

#include <stdlib.h>
#include <stdio.h>

volatile char x = 127;

/* Usage: a [VAL] */
int main(int argc, char *argv[])
{
    if(argc==2)
        x = (char)atoi(argv[1]);

    printf("Virtual Address    of `x': %p\n", (void*)&x);
    printf("Dereferenced Value of `x': %d\n", x);
    (void)getchar();
    printf("Final Value        of `x': %d\n", x);
    return 0;
}

This program reads in an address as an argument and prints the value at that address even though it does not "own" the memory. By adding an additional argument, this program can also write to that "foreign" address. After this program is complete, you can press a key on the previous program and see that the value changed:

#include <stdlib.h>
#include <stdio.h>

/* Usage: b ADDR [VAL_TO_WRITE] */
int main(int argc, char *argv[])
{
    if(!(argc==2 || argc==3))
        return 10;

    volatile char *byte = (volatile char*)strtol(argv[1],NULL,16);
    printf("Selected Virtual Address    : %p\n",(void*)byte);
    printf("Dereferenced Value of Address: %d\n",*byte);
    if(argc==3)
    {
        printf("Writing Value `%d' to Address: %p\n",atoi(argv[2]),(void*)byte);
        *byte=(char)atoi(argv[2]);
    }

    return 0;
}

The same applies for stack and heap allocated memory.

Slab allocation

(Diagram: slab allocation)

The AmigaOS memory architecture is based on the "slab allocator" system or "object cache". In essence, the slab allocator only allocates objects of a single size, allocating these in larger batches ("slabs") from the low-level page allocator. These slabs are then divided up into buffers of the required size, and kept within a list in the slab allocator.

Allocating an object with the slab allocator becomes a process of simple node removal: the first node in the first slab containing free nodes is removed and returned for use. Since the slab allocator keeps free slabs or partially free slabs in a separate list from the full slabs, this operation can be carried out in constant time. Freeing memory is accomplished by returning the buffer to its cache, and adding it to its original slab's free list. Slabs that are completely free can be returned to the system's page pool (this operation is actually driven by demand, and timestamps are used to avoid unnecessary loading of data, or "thrashing"). External fragmentation is minimal, and internal fragmentation is controlled and guaranteed not to exceed a certain amount.
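
The following standalone sketch illustrates the idea in C. It is not Exec's actual implementation: the names (ObjectCache, cache_alloc and so on) are invented for illustration, malloc() stands in for the low-level page allocator, and only a single slab is managed.

#include <stdlib.h>
#include <stdio.h>

/* One fixed-size buffer inside a slab; free buffers are chained together. */
struct BufNode { struct BufNode *next; };

/* A cache of objects of a single size, backed by one slab for simplicity. */
struct ObjectCache {
    size_t objSize;           /* size of each object (>= sizeof(struct BufNode)) */
    void *slab;               /* backing slab obtained from the page allocator */
    struct BufNode *freeList; /* list of free buffers carved out of the slab */
};

/* Carve a slab into buffers and put them all on the free list. */
static int cache_init(struct ObjectCache *c, size_t objSize, size_t slabSize)
{
    c->objSize = objSize;
    c->slab = malloc(slabSize);     /* stand-in for the low-level page allocator */
    if (c->slab == NULL) return 0;

    c->freeList = NULL;
    for (size_t off = 0; off + objSize <= slabSize; off += objSize) {
        struct BufNode *n = (struct BufNode *)((char *)c->slab + off);
        n->next = c->freeList;      /* push the buffer onto the free list */
        c->freeList = n;
    }
    return 1;
}

/* Allocation is simple node removal from the free list: O(1). */
static void *cache_alloc(struct ObjectCache *c)
{
    struct BufNode *n = c->freeList;
    if (n != NULL) c->freeList = n->next;
    return n;
}

/* Freeing returns the buffer to its slab's free list: also O(1). */
static void cache_free(struct ObjectCache *c, void *obj)
{
    struct BufNode *n = (struct BufNode *)obj;
    n->next = c->freeList;
    c->freeList = n;
}

int main(void)
{
    struct ObjectCache cache;
    if (!cache_init(&cache, 64, 4096)) return 1;

    void *a = cache_alloc(&cache);
    void *b = cache_alloc(&cache);
    printf("allocated %p and %p from the same slab\n", a, b);

    cache_free(&cache, a);
    cache_free(&cache, b);
    free(cache.slab);
    return 0;
}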

Object caching

The slab allocator can also be used to cache objects. In the real world a lot of memory allocation operations are used to allocate the same kind of object. The system has a number of data structures which are allocated frequently (semaphores, message ports and the like). Every time such a structure is allocated, it must be initialized, and when it is deleted again, it must be cleaned up. It is likely, however, that such a structure will be needed again in the future, so it can be kept in its initialized state and re-used later. This further reduces the load on the allocator routines, and thus improves system performance.

The object caches work on memory that has already been mapped into the virtual memory space.

More Advantages

Another advantage is the possibility to improve CPU cache usage. Usually, most objects have "hot spots", i.e. a few fields that are used often. Since most of the time a little memory is left unused in a slab (the slab size is usually not an exact multiple of the object size), this additional memory can be used to "shift" the hot spots by a few bytes to optimise the memory structure, leading to better cache usage.

Finally, the system can be expanded to multiple CPUs with next to no overhead. On multi-CPU systems, these expanded slab allocators scale almost linearly with the number of CPUs employed, making them an ideal choice for such systems.

The combination of object caching and keeping caches for different memory blocks (for AllocVec/FreeVec emulation) makes the memory management more efficient, faster, and generally more future-proof than the old free list approach used in AmigaOS 3.x and earlier.

See Wikipedia for more information on slab allocator systems.

Physical page allocation

Every memory location in a computer system has its own, unique address. That is, there is a byte location at address x where you can store and retrieve a single byte. This address is fixed; there is no way to change it without physically changing the hardware. Therefore, this address is called the “physical” address.

The physical address of a memory page is most often completely irrelevant to the application. The CPU will typically only see the virtual address. AmigaOS takes care of assigning virtual addresses to memory pages; this is often called "mapping" a page. Only in very special cases is the physical address relevant, for example in device drivers that want to pass memory to a hardware device via DMA. Since the MMU is part of the CPU, any external hardware like an IDE controller will not see the virtual addresses but only the physical ones.

A common operation in memory allocation is the assignment of virtual addresses to physical memory locations. Allocation of physical memory is usually done differently from virtual allocations, since it must be possible to free up only part of an allocation (when, for example, the pager kicks in).

The de-facto standard for allocating physical pages is a method described by Knuth, called the "buddy system". Basically every modern operating system uses it and AmigaOS is no exception.

Buddy systems are in essence size-segregated free lists. To allocate, the system searches for a free block of at least the size of the allocation. Then, if the block is too large, it is split into two equal-sized blocks. These blocks are called "buddies". One block is returned to its appropriate free list, and the other is considered further, possibly being split again until its size matches that of the allocation.

In a buddy system, it is easy to determine whether the "buddy" of a block is free or not, because its address can simply be computed from the address of the block being freed.
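
As a sketch of that address arithmetic (assuming power-of-two block sizes and offsets measured from the start of the managed area; the names here are illustrative only), the buddy of a block is found by toggling the address bit that corresponds to the block size:

#include <stdio.h>

/* For a block at offset 'off' of size 'size' (a power of two), the buddy's
   offset differs only in the bit corresponding to the block size. */
static unsigned buddy_of(unsigned off, unsigned size)
{
    return off ^ size;
}

int main(void)
{
    /* A 64 KB block at offset 0x20000 split into two 32 KB (0x8000) buddies. */
    unsigned left  = 0x20000;
    unsigned right = buddy_of(left, 0x8000);   /* 0x28000 */

    printf("buddy of 0x%x (size 0x8000) is 0x%x\n", left, right);
    printf("buddy of 0x%x (size 0x8000) is 0x%x\n", right, buddy_of(right, 0x8000));
    return 0;
}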

Virtual address space allocation

Most CPUs come with a special unit that is called a "memory management unit" or "MMU" for short. The MMU's primary job is to rearrange the physical memory within the 4 gigabytes of address space in a way that is convenient for the operating system and/or applications. To do that effectively, it divides the memory into blocks (called "pages"). For every page the MMU has an entry in a table that specifies where the CPU should "see" this page, and what special attributes the page has. The address where the CPU "sees" this page is a 32 bit address as well, but since the memory is not really located there, we call that a "virtual" address.

AmigaOS uses a resource map allocator for allocating virtual address space. Basically this is a means of managing a set of resources (not necessarily memory). For performance reasons, it uses several optimization techniques.

For one, all free resource blocks are held in size-segregated lists, i.e. there is a list for each power-of-two resource group. This makes allocations a lot faster by providing an upper and lower bound for a search. For example, if you want to allocate a block of 2^10 bytes, you can skip searching any block below 2^10 bytes in size simply because it won't fit. Similarly, there is little point in searching lists of blocks larger than, say, twice the requested size, since a block of a size closer to what is needed is likely to be found in the lower lists. Size-segregated free lists help narrow down the search, making the search itself faster and the result better in terms of fragmentation.
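
A minimal sketch of how such a power-of-two size class might be computed (purely illustrative, not the actual resource map code):

#include <stdio.h>

/* Map an allocation size to the index of the smallest power-of-two
   free list that can hold it, so only a few lists need to be searched. */
static unsigned size_class(unsigned size)
{
    unsigned idx = 0;
    unsigned cls = 1;
    while (cls < size) {   /* find the first power of two >= size */
        cls <<= 1;
        idx++;
    }
    return idx;
}

int main(void)
{
    printf("size 1000 -> list %u (blocks of %u bytes and up)\n",
           size_class(1000), 1u << size_class(1000));
    printf("size 1024 -> list %u\n", size_class(1024));
    printf("size 1025 -> list %u\n", size_class(1025));
    return 0;
}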

In addition, the resource maps use object caches for accelerating "small" allocations. Most allocations are below a certain size. For example, the virtual addresses are always allocated in chunks of at least one "page" in memory (4096 bytes). So it's common to allocate blocks of one, two, four, or eight pages. The object caches provide an easy method for keeping these common sizes, making every allocation of these sizes an exact fit, further reducing fragmentation.

Page cache

A lot of the time needed to allocate memory used to be spent looking up the appropriate pages in memory. A 256 MB memory system has 65536 4 KB pages, and these pages have to be searched for from time to time. Originally, hash tables were used, but it turned out that distributing 65536 page entries over a few hash buckets still produced lists of several thousand pages that had to be traversed to find a page. The hash table was replaced with a radix tree. These trees are rather broad but shallow, making traversal very fast. Under usual circumstances, the tree does not grow more than 4 to 5 levels deep, making the lookup of a page a matter of at most 4 to 5 compare operations.
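
The following toy sketch shows why such a tree stays shallow: each level consumes a fixed number of key bits, so a lookup takes only a handful of steps regardless of how many pages are stored. The node layout and fan-out are invented for illustration and do not reflect Exec's internal structures.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* 16-bit page numbers, 4 bits consumed per level -> at most 4 steps per lookup. */
enum { BITS = 4, FANOUT = 1 << BITS, KEYBITS = 16, LEVELS = KEYBITS / BITS };

struct RadixNode { void *slot[FANOUT]; };   /* inner levels hold child pointers,
                                               the last level holds page data */

static void insert(struct RadixNode *root, uint16_t page, void *data)
{
    struct RadixNode *n = root;
    for (int level = LEVELS - 1; level > 0; level--) {
        unsigned i = (page >> (level * BITS)) & (FANOUT - 1);
        if (n->slot[i] == NULL) {
            n->slot[i] = calloc(1, sizeof(struct RadixNode));
            if (n->slot[i] == NULL) return;   /* out of memory; give up in this sketch */
        }
        n = n->slot[i];
    }
    n->slot[page & (FANOUT - 1)] = data;
}

static void *lookup(struct RadixNode *root, uint16_t page)
{
    struct RadixNode *n = root;
    for (int level = LEVELS - 1; level > 0; level--) {
        unsigned i = (page >> (level * BITS)) & (FANOUT - 1);
        if (n->slot[i] == NULL) return NULL;   /* page not present */
        n = n->slot[i];
    }
    return n->slot[page & (FANOUT - 1)];
}

int main(void)
{
    struct RadixNode root = { { NULL } };
    int dummy = 42;

    insert(&root, 12345, &dummy);
    printf("page 12345 -> %p (found in at most %d steps)\n", lookup(&root, 12345), LEVELS);
    printf("page 54321 -> %p (not stored)\n", lookup(&root, 54321));
    return 0;
}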

Pager

AmigaOS can swap out parts of memory to disk in order to free up more memory for other applications. This feature allows applications to use more memory than is actually physically installed in the system.

Paging is commonly referred to as "virtual memory" by users and sometimes even software developers. The fact that AmigaOS uses virtual addressing, however, does not imply the use of the pager.

The system can be tuned to different strategies, either page out only on demand (for highly interactive tasks), or based on other needs (lots of free memory in core for disk caches etc.).

The optimized data structures allow the memory system to operate at a very high speed.

The time for a memory allocation is now in the order of a few microseconds. This is especially true for small allocations (below 8096 bytes). During system testing it was observed that by the time the system has booted up to Workbench, there have already been 40,000 allocations of blocks below 2096 bytes from the global memory pool.

Memory Functions

Normally, an application uses the AllocVecTags() function to ask for memory:

void *AllocVecTags(uint32 size, uint32 tag1, ...);

The size argument is the amount of memory the application needs and the tag list specifies the type of memory and any special memory characteristics (described later). If AllocVecTags() is successful, it returns a pointer to a block of memory. The memory allocation will fail if the system cannot find a big enough block with the requested attributes. If AllocVecTags() fails, it returns NULL.

Because the system only keeps track of how much free memory is available and not how much is in use, it has no idea what memory has been allocated by any task. This means an application has to explicitly return, or deallocate, any memory it has allocated so the system can reuse it. If an application does not return a block of memory to the system, the system will not be able to reallocate that memory to some other task. That block of memory will be lost until the Amiga is reset. If you are using AllocVecTags() to allocate memory, a call to FreeVec() will return that memory to the system:

void FreeVec(void *memoryBlock);

Here memoryBlock is a pointer to the memory block the application is returning to the system. The size of the memory block is tracked internally by the system.

Unlike some compiler memory allocation functions, the Amiga system memory allocation functions return memory blocks that are at least longword aligned. This means that the allocated memory will always start on an address which is at least evenly divisible by four. This alignment makes the memory suitable for any system structures or buffers which require word or long word alignment, and also provides optimal alignment for stacks and memory copying.
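
As a quick illustrative check (a sketch only, using the interface style of the examples later in this article), the alignment guarantee can be observed directly:

APTR block = IExec->AllocVecTags(100, TAG_END);
 
if (block != NULL)
{
    if (((uint32)block & 3) == 0)
        {  /* the address is evenly divisible by four, as guaranteed */ }
 
    IExec->FreeVec(block);
}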

Memory Types

There are three primary types of memory in AmigaOS which are summarized in the following table:

Type Use
MEMF_PRIVATE This memory is private and only accessible within the context of the Task which allocated it. Private memory should always be preferred. Private memory is also swappable by default (i.e. not locked). This memory will not be visible to any other address space.

In a future version of AmigaOS, it is planned to have Task specific address spaces. This means each task could potentially address up to 4 GB of private memory each.

MEMF_SHARED The memory is shared and accessible by any Task in the system without restriction. This memory can be shared between all address spaces and will always appear at the same address in any address space. Shared memory is locked by default and thus is not swappable.
MEMF_EXECUTABLE The memory is used to store executable PowerPC code. This is used two-fold in AmigaOS. First, it allows the system to determine if a function pointer points to real native PowerPC code as opposed to 68k code which needs to be emulated. Second, it prevents common exploits that use stack overflows to execute malicious code. Executable memory is locked by default and thus is not swappable.

Memory Attributes

Memory allocation on AmigaOS has traditionally been rather complicated. Over time, as new hardware models were released, more memory attribute flags were introduced.

All the various memory allocation functions and strategies have been consolidated and distilled into a single AllocVecTags() function call. Programmers are strongly encouraged to stop using any other function to allocate system memory. Using AllocVecTags() is the only way to guarantee future compatibility with more advanced AmigaOS features yet to come.

If an application does not specify any attributes when allocating memory, the system tries to satisfy the request with the fastest memory available on the system memory lists.

Make Sure You Have Memory
Always check the result of any memory allocation to be sure the type and amount of memory requested is available. Failure to do so will lead to trying to use an invalid pointer.

Using Tags

The AllocVecTags() function uses a tag list to define what attributes a block of memory must have. The currently supported tags are listed below.

Tag (Default) Description
AVT_Type

(MEMF_PRIVATE)

MEMF_PRIVATE: Allocate from the task private heap. This memory will not be visible to any other address space.
MEMF_SHARED: Allocate from the system shared heap. This memory can be shared between all address spaces and will always appear at the same address in any address space. This memory is locked by default (see AVT_Lock tag below).
MEMF_EXECUTABLE: Allocate memory that is marked executable. This memory is locked by default (see AVT_Lock tag below).
AVT_Contiguous

(FALSE)

Memory allocated with this property is allocated from a contiguous block of physical memory. This makes the memory suitable for DMA purposes when the DMA device does not support scatter/gather operation. For devices that do support scatter/gather use StartDMA() instead.

Note: Physical pages can move at any time, be removed from memory due to paging, or otherwise be made unavailable unless the memory pages are locked (see the LockMem() function or the AVT_Lock tag).

AVT_Lock

(TRUE or FALSE)

After allocating memory, lock the associated pages in memory. This will prevent the pages from being moved, swapped out or otherwise being made unavailable. This is useful in conjunction with the AVT_Contiguous tag since it will ensure that the memory will stay contiguous after allocation.

This tag defaults to FALSE for MEMF_PRIVATE allocations and to TRUE for MEMF_SHARED or MEMF_EXECUTABLE.

AVT_Alignment

(16)

Define an alignment constraint for the allocated memory block. The returned memory block will be aligned to the given size (in bytes). Its virtual address will be at least a multiple of the AVT_Alignment value.

Note: Alignment values must be powers of two. MEMF_EXECUTABLE memory is always aligned to the current page size.

AVT_PhysicalAlignment

(16)

Define an alignment constraint for the physical address of the allocated memory block. See AVT_Alignment for more information.

Note: This functionality is mainly used for DMA drivers that require a specific alignment. Alignment values must be powers of two. MEMF_EXECUTABLE memory is always aligned to the current page size.

AVT_ClearWithValue Clear the newly allocated memory with the given byte value. If this tag is not given the memory block is not cleared.
AVT_Wait

(TRUE)

Wait for the memory subsystem to be available. If TRUE, the calling task will be suspended until the memory subsystem is available. This might cause a Forbid() to break. When FALSE and the memory subsystem is not currently available, the function will return immediately and the allocation will fail.
AVT_NoExpunge

(FALSE)

If allocation fails because of unavailability, prevent invocation of low memory cleanup handlers. The default is FALSE which means that when memory is not available, cleanup handlers are invoked to try and satisfy the allocation.
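
As an illustrative sketch combining several of the tags above (the buffer size and alignment values are arbitrary), a driver might request a physically contiguous, locked and cleared buffer like this:

APTR dmaBuffer = IExec->AllocVecTags(4096,
    AVT_Type,              MEMF_SHARED,
    AVT_Contiguous,        TRUE,
    AVT_Lock,              TRUE,
    AVT_PhysicalAlignment, 4096,
    AVT_ClearWithValue,    0,
    TAG_END);
 
if (dmaBuffer == NULL)
    {  /* COULDN'T GET MEMORY, EXIT */ }
 
/* ... use the buffer ... */
 
IExec->FreeVec(dmaBuffer);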

Using Flags

The use of memory flags was the only supported way to allocate memory prior to AmigaOS 4.0. It may still be useful to know about this obsolete system when porting older applications. For more information about the memory flags system see Obsolete Exec Memory Allocation.

Allocating System Memory

The following examples show how to allocate memory.

APTR apointer = IExec->AllocVecTags(100, TAG_END);
 
if (apointer == NULL)
    {  /* COULDN'T GET MEMORY, EXIT */ }

AllocVecTags() returns the address of the first byte of a memory block that is at least 100 bytes in size or NULL if there is not that much free memory. Because there are no tags specified, private memory is assumed so this memory cannot be shared with any other tasks.

In addition to allocating a block of memory, this function keeps track of the size of the memory block, so your application doesn't have to remember it when it deallocates that memory block. The AllocVecTags() function allocates a little more memory to store the size of the memory allocation request.

Make No Assumptions
It is not legal to peek the longword in front of the returned memory pointer to find out how big the block is. This has always been illegal regardless of what any other documentation may have stated to the contrary. Your application has access to the memory returned by AllocVecTags() and the only guarantee made is that the returned block has n bytes of contiguous address space starting at the returned address. Anything outside this area is not to be touched.
APTR anotherptr = IExec->AllocVecTags(1000,
  AVT_Type, MEMF_SHARED,
  AVT_Lock, FALSE,
  AVT_ClearWithValue, 0,
  TAG_END);
 
if (anotherptr == NULL)
    {  /* COULDN'T GET MEMORY, EXIT */ }

The example above allocates shared memory which is accessible by any task in the system and clears the memory contents to zero. MEMF_SHARED memory is locked by default for compatibility with the obsolete MEMF_PUBLIC flag. This is by far the most common case when allocating shared memory.

APTR lockedmem = IExec->AllocVecTags(3000,
  AVT_Type, MEMF_SHARED,
  TAG_END);
 
if (lockedmem == NULL)
    {  /* COULDN'T GET MEMORY, EXIT */ }

The example above allocates shared memory which is accessible by any task in the system. MEMF_SHARED memory is locked by default so the underlying memory pages are not moveable and cannot be swapped out. Such memory could be used, for example, to share data between a Process and an interrupt routine.

If the system free memory list does not contain enough contiguous memory in an area matching your requirements, AllocVecTags() returns NULL. You must check for this condition.

APTR yap = IExec->AllocVec(500, MEMF_CHIP);
 
if (yap == NULL)
    {  /* COULDN'T GET MEMORY, EXIT */ }

The deprecated AllocVec() function is used in the example above because it is the only way to allocate MEMF_CHIP memory on a classic Amiga system.

Locking System Memory

The LockMem() function is used to explicitly lock a memory block. This function will make sure your memory block is not unmapped, swapped out or somehow made inaccessible. Use this function wisely. If there is no good reason to lock memory then do not do it. It will prevent the memory system from optimizing memory layout and may lead to poor performance.

IExec->LockMem(mem, 1000);

Before deallocating the memory, it must be unlocked as well.

IExec->UnlockMem(mem, 1000);
Do not forget to UnlockMem()
Failure to unlock memory will have adverse effects on the memory system. Locked memory pages cannot be moved, so the system cannot optimize the layout of locked memory pages.

Be careful to always match the number of locks and the number of unlocks. Something else may have locked memory in the same page; if an extraneous UnlockMem() decreases the lock reference counter to 0, that page can be paged out or moved. If that happens to, for example, data used in an interrupt handler or by the device driver which handles the swap partition, the system will crash.
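
A sketch of the matched pattern (the buffer size and the work done while the pages are locked are placeholders):

APTR mem = IExec->AllocVecTags(1000, TAG_END);
 
if (mem != NULL)
{
    IExec->LockMem(mem, 1000);      /* the underlying pages are now pinned in place */
 
    /* ... hand the buffer to an interrupt handler or a DMA device ... */
 
    IExec->UnlockMem(mem, 1000);    /* exactly one unlock per lock */
    IExec->FreeVec(mem);            /* free only after unlocking */
}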

Freeing System Memory

The following examples free the memory chunks shown in the previous calls to AllocVecTags().

IExec->FreeVec(apointer);
 
IExec->FreeVec(anotherptr);
 
IExec->FreeVec(lockedmem);
 
IExec->FreeVec(yap);

A memory block allocated with AllocVecTags() or AllocVec() must be returned to the system pool with the FreeVec() function. This function uses the stored size in the allocation to free the memory block, so there is no need to specify the size of the memory block to free.

Any memory that is locked must be explicitly unlocked with UnlockMem() prior to freeing it. Failure to unlock memory will not cause any immediate problems but any pages which are locked cannot be moved or swapped out which can decrease system performance.

FreeVec() returns no status. However, if you attempt to free a memory block in the middle of a chunk that the system believes is already free, you will cause a system crash. It is also illegal to free the same memory block twice and this will lead to a system crash.

Leave Memory Allocations Out Of Interrupt Code
Do not allocate or deallocate system memory from within interrupt code. The Exec Interrupts section explains that an interrupt may occur at any time, even during a memory allocation process. As a result, system data structures may not be internally consistent at this time.

Memory may be Locked by Default

By default, the system memory allocation routines will allocate MEMF_SHARED and MEMF_EXECUTABLE memory and implicitly lock it. This memory must not be explicitly unlocked. Let the system handle the unlocking internally.

Here is the right way to allocate and free MEMF_SHARED memory:

APTR ptr = IExec->AllocVecTags(10,
           AVT_Type, MEMF_SHARED,
           TAG_END);
IExec->FreeVec(ptr);

Here is the wrong way:

APTR ptr = IExec->AllocVecTags(10,
           AVT_Type, MEMF_SHARED,
           TAG_END);
IExec->UnlockMem(ptr, 10);  // Doing this could lead to undefined system behaviour.
IExec->FreeVec(ptr);

Unlocking memory not explicitly locked does not immediately cause any issues. However, in a future version of AmigaOS the memory pages may be handled differently than they are handled today. This could lead to incompatibilities with your applications and AmigaOS. The simple rule is that if you called LockMem() you must also call UnlockMem(). All other times you should not call UnlockMem().

Note
Memory locking is done at the page level. Even if a single byte of memory is locked in a page that entire page is locked. The programmer has no control over which pages may be used to satisfy a memory allocation.
Note
Shared and executable memory is locked implicitly for 68K backwards compatibility reasons. Use the AVT_Lock tag (set to FALSE) to ensure your shared or executable memory is not locked. MEMF_PRIVATE memory has no backwards compatibility issues and is always unlocked by default.

Memory Information Functions

The memory information routines AvailMem() and TypeOfMem() can provide the amount of memory available in the system, and the attributes of a particular block of memory.

Memory Requirements

The same attribute flags used in memory allocation routines are valid for the memory information routines. There is also an additional flag, MEMF_LARGEST, which can be used in the AvailMem() routine to find out what the largest available memory block of a particular type is. Specifying the MEMF_TOTAL flag will return the total amount of memory currently available.

Calling Memory Information Functions

The following example shows how to find out how much memory of a particular type is available.

uint32 size = IExec->AvailMem(MEMF_CHIP | MEMF_LARGEST);

AvailMem() returns the size of the largest chunk of available chip memory.

AvailMem() May Not Be Totally Accurate
Because of multitasking, the return value from AvailMem() may be inaccurate by the time you receive it.
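
Similarly, a sketch using the MEMF_TOTAL flag mentioned above:

uint32 total = IExec->AvailMem(MEMF_TOTAL);
 
IDOS->Printf("Total memory: %lu bytes\n", (unsigned long)total);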

The following example shows how to determine the type of memory of a specified memory address.

uint32 memtype = IExec->TypeOfMem((APTR)0x090000);
if ((memtype & MEMF_CHIP) == MEMF_CHIP) {  /*  ... It's chip memory ...  */   }

TypeOfMem() returns the attributes of the memory at a specific address. If it is passed an invalid memory address, TypeOfMem() returns NULL. This routine is normally used to determine if a particular chunk of memory is in chip memory.

Using Memory Copy Functions

For memory block copies, the CopyMem() and CopyMemQuick() functions can be used.

Copying System Memory

The following samples show how to use the copying routines.

APTR source = IExec->AllocVecTags(1000, AVT_ClearWithValue, 0, TAG_END);
APTR target = IExec->AllocVecTags(1000, AVT_Type, MEMF_SHARED, TAG_END);
IExec->CopyMem(source, target, 1000);

CopyMem() copies the specified number of bytes from the source data region to the target data region. The pointers to the regions can be aligned on arbitrary address boundaries. CopyMem() will attempt to copy the memory as efficiently as it can according to the alignment of the memory blocks, and the amount of data that it has to transfer. These functions are optimized for copying large blocks of memory which can result in unnecessary overhead if used to transfer very small blocks of memory.

CopyMemQuick() is now identical to CopyMem(). In previous versions of the operating system, CopyMemQuick() performed more optimized copying of the specified number of bytes from the source data region to the target data region. The source and target pointers must be longword aligned and the size (in bytes) must be divisible by four. There are no such restrictions starting with AmigaOS 4.0.

Not All Copies Are Supported
Neither CopyMem() nor CopyMemQuick() supports copying between regions that overlap. For overlapping regions see MoveMem() in the Utility Library.

System Memory Pools

A construct carried over from the AmigaOS 3.x Exec is the memory pool. The code handling memory pools uses an algorithm based on boundary tags and size-segregated memory lists. The speed gain is tremendous: even for the "dumb" case of just allocating 100000 blocks and freeing them again, the new implementation is ten times faster than the previous one. Thanks to the size-segregated free lists and the easy coalescing made possible by the boundary tags, the real-life performance gain is even higher, between 10 and 20 times.

Two types of memory pools are available depending on the needs of the application:

  • Item Pools are built for speed and are used for allocating large numbers of items that are all the same size.
  • Generic Memory Pools can handle allocations of different sizes at the expense of some speed.

Summary of System Controlled Memory Handling Routines

AllocVecTags() and FreeVec()
These are system-wide memory allocation and deallocation routines. They use a memory free-list owned and managed by the system.
LockMem() and UnlockMem()
These routines explicitly lock and unlock underlying memory pages.
AvailMem()
This routine returns the number of free bytes in a specified type of memory.
TypeOfMem()
This routine returns the memory attributes of a specified memory address.
CopyMem() and CopyMemQuick()
CopyMem() is a general purpose memory copy routine. CopyMemQuick() is an optimized version of CopyMem(), but has restrictions on the size and alignment of the arguments.

Allocating DMA Memory

Device drivers often use DMA to transfer data to and from the device. The StartDMA(), GetDMAList() and EndDMA() functions are used to create a scatter/gather list suitable for such DMA transfers.

The following conditions are guaranteed to be met when using these DMA functions:

  • The memory region given is guaranteed to be mapped to physical memory.
  • The mapping will not change as long as EndDMA() is not called.
  • All cache entries in this region will be flushed out.

The example code below demonstrates how to perform a DMA write transfer:

APTR addr;            /* address of the buffer to transfer (assumed to be set up elsewhere) */
uint32 size;          /* size of the buffer in bytes */
uint32 endFlags = 0;  /* flags to pass to EndDMA(); 0 assumed here for a normal transfer */
 
/* Tell the system to prepare for DMA */
uint32 arraySize = IExec->StartDMA(addr, size, 0);
if (arraySize > 0)
{
    /* The memory area is prepared, allocate and retrieve the DMA list */
    struct DMAEntry *DMAList = IExec->AllocSysObject(ASOT_DMAENTRY,
      ASODMAE_NumEntries,  arraySize,
      TAG_END);
 
    if (DMAList != NULL)
    {
        IExec->GetDMAList(addr, size, 0, DMAList);
 
        /* Feed the DMA controller and do stuff */
        ...
        /* Get rid of the DMAList's memory */
 
        IExec->EndDMA(addr, size, endFlags);
        IExec->FreeSysObject(ASOT_DMAENTRY, DMAList);
    }
    else
    {
        // Note We still call EndDMA() even though the actual
        // transfer didn't happen.
        IExec->EndDMA(addr, size, DMAF_NoModify);
 
        IDOS->Printf("Can't allocate DMA list\n");
    }
}
else
{
    IDOS->Printf("Can't initiate DMA transfer\n");
}

Syncing and Memory Access

The PowerPC is a pure load/store architecture. That means it never operates directly on memory operands the way x86 does; to modify data, you have to load it, modify it, and store it back.

All PowerPC/POWER CPUs have a load/store queue, i.e. a queue where load and store operations are held to reduce memory latency. If more than one store is made to the same long word (e.g. you write the first byte and then the second byte), those stores are combined. As you can see, this causes a problem for a chip register, since both bytes are then written at the same time, which is likely not what you want.

The same applies to reading: a read might short-cut through the load/store queue and use a value that has already been read and is still present in the queue. Again, for chip registers this causes a problem, because you might read a stale value.

There are two instructions that deal with this: eieio (Ensure In-order Execution of I/O) and sync.

The eieio instruction simply inserts a "barrier" into the load/store queue. When combining stores the CPU never searches past this barrier for possible combines. This means that the sequence

  1. store to some address
  2. eieio
  3. store to some address + 1

will never be combined because of the eieio barrier between them.

The sync instruction will simply halt all execution and flush the load/store queue, executing and finishing all loads and stores.

As you can imagine, sync is MUCH MORE costly than eieio.

When to use eieio and sync

If you want to write, follow your writes with an eieio instruction.

If you want to read, prefix your reads with a sync instruction.

This is also where the "GUARDED" memory flag comes in. GUARDED memory simply ensures program order of loads and stores, that is, it's an implicit eieio.
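
Here is a GCC-style sketch for the PowerPC (the barrier macros and the register address are invented for illustration and are not part of any AmigaOS header):

/* Illustrative memory barrier macros using GCC inline assembly. */
#define EIEIO() __asm__ __volatile__ ("eieio" ::: "memory")
#define SYNC()  __asm__ __volatile__ ("sync"  ::: "memory")
 
volatile uint32 *reg = (volatile uint32 *)0xA0000000;  /* hypothetical chip register bank */
 
/* Writing: the eieio barriers prevent the two stores from being combined or reordered. */
reg[0] = 0x0001;
EIEIO();
reg[1] = 0x0002;
EIEIO();
 
/* Reading: sync drains the load/store queue so the value really comes from the hardware. */
SYNC();
uint32 status = reg[2];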

Note
Knowing when and how to use the eieio and sync instructions is vital for driver developers.

Allocating Multiple Memory Blocks

Exec provides the routines AllocTaskMemEntry() and FreeEntry() to allocate and free multiple memory blocks in a single call.

AllocTaskMemEntry() accepts a data structure called a MemList, which contains the information about the size of the memory blocks to be allocated and the requirements, if any, that you have regarding the allocation.

The MemList structure is found in the include file <exec/memory.h> and is defined as follows:

struct MemList
{
    struct Node     ml_Node;
    UWORD           ml_NumEntries;      /* number of MemEntrys */
    struct MemEntry ml_ME[1];           /* where the MemEntrys begin*/
};
ml_Node
allows you to link together multiple MemLists. However, the node is ignored by the routines AllocTaskMemEntry() and FreeEntry().
ml_NumEntries
tells the system how many MemEntry sets are contained in this MemList. Notice that a MemList is a variable-length structure and can contain as many sets of entries as you wish.

The MemEntry structure looks like this:

struct MemEntry
{
    union {
        ULONG   meu_Reqs;   /* the AllocMem requirements */
        APTR    meu_Addr;   /* address of your memory */
        } me_Un;
    ULONG   me_Length;      /* the size of this request */
};

Sample Code for Allocating Multiple Memory Blocks

Here's an example showing how to use AllocTaskMemEntry() with multiple blocks of memory.

// alloctaskmementry.c - example of allocating several memory areas.
#include <exec/types.h>
#include <exec/memory.h>
#include <proto/exec.h>
#include <proto/dos.h>
 
struct MemList *memlist;             /* pointer to a MemList structure        */
 
struct MemBlocks /* define a new structure because C cannot initialize unions */
{
    struct MemList  mn_head;         /* one entry in the header               */
    struct MemEntry mn_body[3];      /* additional entries follow directly as */
} memblocks;                         /* part of the same data structure       */
 
int main()
{
    memblocks.mn_head.ml_NumEntries = 4; /* 4 entries: 1 inside the MemList plus 3 in mn_body */
 
    /* Describe the first piece of memory we want.  Because of our MemBlocks structure */
    /* setup, we reference the first MemEntry differently when initializing it.        */
    memblocks.mn_head.ml_ME[0].me_Reqs   = MEMF_CLEAR;
    memblocks.mn_head.ml_ME[0].me_Length = 4000;
 
    memblocks.mn_body[0].me_Reqs   = MEMF_PRIVATE | MEMF_CLEAR;/* Describe the other pieces of    */
    memblocks.mn_body[0].me_Length = 100000;                   /* memory we want. Additional      */
    memblocks.mn_body[1].me_Reqs   = MEMF_SHARED | MEMF_CLEAR; /* MemEntries are initialized this */
    memblocks.mn_body[1].me_Length = 200000;                   /* way. If we wanted even more en- */
    memblocks.mn_body[2].me_Reqs   = MEMF_EXECUTABLE;          /* tries, we would need to declare */
    memblocks.mn_body[2].me_Length = 25000;                    /* a larger MemEntry array in our  */
                                                               /* MemBlocks structure.            */
 
    memlist = (struct MemList *)IExec->AllocTaskMemEntry((struct MemList *)&memblocks);
 
    if (memlist == NULL)
    {
       IDOS->Printf("AllocTaskMemEntry FAILED\n");
       return RETURN_FAIL;
    }
 
    /* We got all memory we wanted.  Use it and call FreeEntry() to free it */
    IDOS->Printf("AllocTaskMemEntry succeeded - now freeing all allocated blocks\n");
    IExec->FreeEntry(memlist);
 
    return 0;
}

AllocTaskMemEntry() returns a pointer to a new MemList of the same size as the MemList that you passed to it. For example, ROM code can provide a MemList containing the requirements of a task and create a RAM-resident copy of the list containing the addresses of the allocated entries. The pointer to the MemList is used as the argument for FreeEntry() to free the memory blocks.

Result of Allocating Multiple Memory Blocks

The MemList created by AllocTaskMemEntry() contains MemEntry entries. MemEntrys are defined by a union statement, which allows one memory space to be defined in more than one way.

If AllocTaskMemEntry() returns a non-NULL value then all of the meu_Addr positions in the returned MemList will contain valid memory addresses meeting the requirements you have provided. To use this memory area, you would use code similar to the following:

struct MemList *mlist = IExec->AllocTaskMemEntry(&ML);   /* ML is a MemList set up as shown above */
APTR memory = NULL;
 
if ( mlist != NULL )
{
  memory = mlist->ml_ME[0].me_Un.meu_Addr;
}

Multiple Memory Blocks and Tasks

If you want to take advantage of Exec's automatic cleanup, use the MemList and AllocTaskMemEntry() facility to do your dynamic memory allocation.

In the Task control block structure, there is a list header named tc_MemEntry.

This is the list header that you initialize to include MemLists that your task has created by call(s) to AllocTaskMemEntry(). Here is a short program segment that handles task memory list header initialization only. It assumes that you have already run AllocTaskMemEntry() as shown in the simple AllocTaskMemEntry() example above.

struct MemList *ml;   /* the MemList returned by AllocTaskMemEntry() */
 
struct Task *tc = IExec->FindTask(0);
 
IExec->AddTail(&tc->tc_MemEntry, (struct Node *)ml);

Assuming that you have only used the AllocTaskMemEntry() method (or used AllocVecTags() and built your own custom MemList), the system now knows where to find the blocks of memory that your task has dynamically allocated. The RemTask() function automatically frees all memory found on tc_MemEntry.

CreateTask() Sets Up A MemList.
The CreateTask() function, and other system task and process creation functions use a MemList in tc_MemEntry so that the Task structure and stack will be automatically deallocated when the Task is removed.

Summary of Multiple Memory Blocks Allocation Routines

AllocTaskMemEntry() and FreeEntry()
These are routines for allocating and freeing multiple memory blocks with a single call.
InitStruct()
This routine initializes memory from data and offset values in a table. Typically only assembly language programs benefit from using this routine. See the SDK for more details.

Allocating Memory at an Absolute Address

For special advanced applications, AllocAbs() is provided. Using AllocAbs(), an application can allocate a memory block starting at a specified absolute memory address. If the memory is already allocated or if there is not enough memory available for the request, AllocAbs() returns a zero.

Be aware that an absolute memory address which happens to be available on one Amiga may not be available on a machine with a different configuration or a different operating system revision, or even on the same machine at a different time. For example, a piece of memory that is available during expansion board configuration might not be available at earlier or later times. Here is an example call to AllocAbs():

APTR absoluteptr = IExec->AllocAbs(10000, (APTR)0x2F0000);
if (!(absoluteptr))
    { /* Couldn't get memory, act accordingly. */  }
 
/* After we're done using it, we call FreeMem() to free the memory block. */
IExec->FreeMem(absoluteptr, 10000);

Function Reference

The following are brief descriptions of the Exec functions that handle memory management. See the SDK for details on each call.

Memory Function Description
AllocMem() Allocate memory with specified attributes. This function is obsolete.
AllocAbs() Allocate memory at a specified location.
AllocTaskMemEntry() Allocate multiple memory blocks.
AllocVec() Allocate memory with specified attributes and keep track of the size. This function is obsolete.
AllocVecTags() Allocate memory with specified attributes defined by tags and keep track of the size. If an application needs to allocate some memory, it will usually use this function.
AvailMem() Return the amount of free memory, given certain conditions.
CopyMem() Copy memory block, which can be non-aligned and of arbitrary length.
CopyMemQuick() Copy aligned memory block.
FreeEntry() Free multiple memory blocks, allocated with AllocTaskMemEntry().
FreeMem() Free a memory block of specified size, allocated with AllocMem() or AllocAbs().
FreeVec() Free a memory block allocated with AllocVecTags() or AllocVec().
InitStruct() Initialize memory from a table.
LockMem() Lock the underlying pages given a memory block address and size.
TypeOfMem() Determine attributes of a specified memory address.
UnlockMem() Unlock the underlying pages given a memory block address and size.