Copyright (c) Hyperion Entertainment and contributors.
DMA Resource
DMA Engine
Some hardware targets include a DMA engine which can be used for general purpose copying. This article describes the DMA engines available and how to use them.
Hardware Features
The Direct Memory Access (DMA) Engines found in the NXP/Freescale p5020, p5040 and p1022 System on a Chip (SoC) devices, as used in the AmigaONE X5000/20, X5000/40 and A1222 respectively, are flexible and powerful. Each of these chips contains two distinct engines with four data channels each. This gives a total of eight DMA Channels that can be active at once, with up to two DMA transactions actually being executed at the same time (one on each of the two DMA Engines).
Further, each of the four DMA Channels found in a DMA Engine may be individually programmed to handle a single transaction, a Chain of transactions, or even Lists of Chains of transactions. The DMA Engines automatically arbitrate control between the DMA Channels according to the bandwidth setting programmed for each Channel (typically 1024 bytes).
This means that after completing a transfer of 1024 bytes (for example), the hardware will consider switching to the next Channel to allow it to move another block of data, and so on in a round-robin fashion. If all other DMA Channels on a given DMA Engine are idle when arbitration would take place, the hardware will not arbitrate control to another Channel, but simply continue processing the transaction(s) for the Channel it is on.
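The arbitration behavior described above can be illustrated with a small software model. This is only a sketch of the round-robin idea, not the hardware's actual logic: the function name `simulate_arbitration` and the choice to start with the lowest-numbered busy Channel are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CHANNELS 4   /* Channels per DMA Engine, as stated in the text */

/*
 * Toy model of round-robin arbitration on one DMA Engine.
 * remaining[] holds the bytes each Channel still has to move and is
 * updated in place. The Channel serviced in each slot is written to
 * order[]. Returns the number of slots used (at most max_slots).
 */
size_t simulate_arbitration(uint32_t remaining[NUM_CHANNELS],
                            uint32_t quantum,
                            int order[], size_t max_slots)
{
    size_t slots = 0;
    int current = -1;

    assert(quantum > 0);

    /* Assumption: start with the lowest-numbered Channel that has work. */
    for (int ch = 0; ch < NUM_CHANNELS; ch++) {
        if (remaining[ch] > 0) { current = ch; break; }
    }

    while (current >= 0 && slots < max_slots) {
        /* Move up to one bandwidth quantum on the current Channel. */
        uint32_t moved = remaining[current] < quantum ? remaining[current]
                                                      : quantum;
        remaining[current] -= moved;
        order[slots++] = current;

        /* Look for the next busy Channel, checking the current one last.
           If all other Channels are idle, the current one just continues. */
        int next = -1;
        for (int step = 1; step <= NUM_CHANNELS; step++) {
            int candidate = (current + step) % NUM_CHANNELS;
            if (remaining[candidate] > 0) { next = candidate; break; }
        }
        current = next;
    }
    return slots;
}
```

With a quantum of 1024 and two busy Channels holding 2048 and 1024 bytes, the model services Channel 0, then Channel 1, then Channel 0 again. With only one busy Channel, that Channel is serviced in every slot without interruption.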
DMA Copy Memory - Execution Flow Diagram
What a call to perform a DMA copy does internally
As shown in the above diagram, when the user makes a call to request a memory copy be performed by the DMA hardware, the next available DMA Channel is selected for use and a DMA Transaction (source, destination and size) is constructed. The DMA Transaction is then programmed into the DMA Engine which owns the available DMA Channel.
At this point the calling task will Wait() until it is notified that the transaction has completed. It then returns to the caller with the result. This provides a basic blocking function, which only returns to the caller once the data has been copied. This synchronous behavior is the simplest to use and is what most applications expect from a memory copy function.
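The first two steps of this flow (picking a free DMA Channel and building the transaction) can be modeled in a few lines of portable C. This is a hypothetical sketch only: `struct ExampleTransaction`, `select_channel()` and `build_transaction()` are invented for illustration, the "first idle Channel wins" policy is an assumption, and the resource's real internal data structures are not documented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ENGINES         2
#define CHANNELS_PER_ENGINE 4
#define TOTAL_CHANNELS      (NUM_ENGINES * CHANNELS_PER_ENGINE)

/* Hypothetical description of one copy request. */
struct ExampleTransaction {
    uintptr_t source;       /* physical source address */
    uintptr_t destination;  /* physical destination address */
    uint32_t  size;         /* number of bytes to copy */
    int       channel;      /* Channel index, 0..7 */
    int       engine;       /* DMA Engine that owns the Channel, 0..1 */
};

/* Return the index of the first idle Channel, or -1 if all are busy. */
int select_channel(const bool busy[TOTAL_CHANNELS])
{
    for (int ch = 0; ch < TOTAL_CHANNELS; ch++) {
        if (!busy[ch]) return ch;
    }
    return -1;
}

/*
 * Claim the next available Channel and fill in a transaction for it.
 * Returns false if the size is zero or no Channel is free.
 */
bool build_transaction(bool busy[TOTAL_CHANNELS],
                       uintptr_t src, uintptr_t dst, uint32_t size,
                       struct ExampleTransaction *out)
{
    assert(out != NULL);

    if (size == 0) return false;

    int ch = select_channel(busy);
    if (ch < 0) return false;

    busy[ch] = true;                       /* mark the Channel as in use */
    out->source      = src;
    out->destination = dst;
    out->size        = size;
    out->channel     = ch;
    out->engine      = ch / CHANNELS_PER_ENGINE;
    return true;
}
```

In the real resource the transaction would then be programmed into the owning DMA Engine and the caller would Wait() for completion; those steps depend on the hardware and are left out of the model.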
Diagram of multitasking DMA Copies
How multiple simultaneous DMA copies are handled
fsldma.resource
The fsldma.resource API is provided automatically in the kernel for all supported machines (currently the AmigaONE X5000/20, X5000/40 and A1222).
The FslDMA API
The API provided by the fsldma.resource breaks down into three main parts:
- Memory management
- Copy Memory functions
- Utility / Miscellaneous
Memory Management Functions
The FslDMA resource API provides convenience functions for the allocation and freeing of DMA Compliant blocks of memory. Two functions are provided for this purpose: DMAAllocPhysicalMemory() and DMAFreePhysicalMemory().
The memory allocation function also comes in two tag-based versions that allow additional parameters to be passed in, using either a variable set of Tags or a fixed TagList. Currently the only supported Tag is FSLDMA_APM_ClearWithValue, which is the equivalent of the IExec->AllocVecTagList() function's AVT_ClearWithValue tag. If the FSLDMA_APM_ClearWithValue tag is not provided, the requested memory block is cleared with zeroes before being returned. DMAAllocPhysicalMemory() returns the Physical Address of the allocated memory, or NULL if it is unable to allocate the requested memory.
APTR DMAAllocPhysicalMemoryTagList( uint32 lSize, const struct TagItem *tags );
APTR DMAAllocPhysicalMemoryTags( uint32 lSize, uint32 Tag1, ... );
APTR DMAAllocPhysicalMemory( uint32 lSize );
The counterpart to allocating a DMA Compliant memory block is DMAFreePhysicalMemory(). Any memory allocated using DMAAllocPhysicalMemory() must eventually be freed using DMAFreePhysicalMemory().
void DMAFreePhysicalMemory( APTR pPhysicalMemoryBlock );
Copy Memory Functions
Two API calls make up the core of the FslDMA API: DMAPhysicalCopyMem() and DMACopyMem(). Both of these calls will "block" (not return to the user) until the requested transfer has either succeeded or failed. A value of TRUE is returned upon success and FALSE if an error occurred.
BOOL DMAPhysicalCopyMem( CONST_APTR pPhysicalSourceBuffer, APTR pPhysicalDestBuffer, uint32 lBufferSize );
BOOL DMACopyMem( CONST_APTR pSourceBuffer, APTR pDestBuffer, uint32 lBufferSize );
As the name implies, the first (and core) copy memory function, DMAPhysicalCopyMem(), accepts the Physical Addresses of the source and destination buffers (as returned by DMAAllocPhysicalMemory()), along with an unsigned 32-bit value for the number of bytes to copy. lBufferSize must be greater than zero and no larger than the size of either the source or the destination buffer.
DMAPhysicalCopyMem() is the most direct and efficient means of using the DMA hardware to effect a single memory copy. The DMACopyMem() function is provided as an alternative way to request a DMA transfer by passing in the Virtual Addresses of the source and destination buffers instead of the Physical Addresses. DMACopyMem() will attempt to determine whether the supplied memory buffers are DMA Compliant and, if so, it will internally use DMAPhysicalCopyMem() to handle the transaction. If DMACopyMem() determines that the supplied memory is not DMA Compliant, it will return FALSE and no data copy will occur.
In theory any contiguous block of memory from 1 Byte up to 4GB in size may be transferred using either one of these calls. In practice the DMA hardware can directly accept memory blocks up to FSLDMA_MAXBLOCK_SIZE (or 64MB - 1 Byte). Therefore any data blocks greater than FSLDMA_MAXBLOCK_SIZE will automatically be fed to the DMA hardware in a series of smaller chunks. For maximum efficiency, transfer sizes should be at least 256 Bytes and an exact multiple of 64 Bytes. Other sizes are handled by the hardware but will degrade performance.
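The size rules above reduce to some simple arithmetic, sketched below in portable C. The helper names are hypothetical, and `EXAMPLE_MAXBLOCK_SIZE` is a local stand-in that mirrors the 64MB - 1 Byte figure quoted above; real code should use FSLDMA_MAXBLOCK_SIZE from the fsldma headers. How the resource actually divides an oversized transfer is not documented here, so `chunk_count()` only shows the minimum number of hardware-sized pieces required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for FSLDMA_MAXBLOCK_SIZE (64MB - 1 Byte). */
#define EXAMPLE_MAXBLOCK_SIZE ((64u * 1024u * 1024u) - 1u)

/* Minimum number of chunks needed to move `size` bytes when a single
   hardware transfer can carry at most `max_block` bytes. */
uint32_t chunk_count(uint32_t size, uint32_t max_block)
{
    assert(max_block > 0);
    return size / max_block + ((size % max_block) != 0 ? 1u : 0u);
}

/* True if the size follows the efficiency guidance in the text:
   at least 256 bytes and a multiple of 64 bytes. */
bool is_efficient_size(uint32_t size)
{
    return size >= 256u && (size % 64u) == 0u;
}

/* Round a size up to the next multiple of 64 bytes, for example when
   deciding how large a buffer to allocate. */
uint32_t round_up_to_64(uint32_t size)
{
    assert(size <= UINT32_MAX - 63u);   /* avoid wrap-around */
    return (size + 63u) & ~63u;
}
```

For instance, a 1000 byte payload is not an efficient size, but allocating and transferring `round_up_to_64(1000)` bytes (1024) satisfies both conditions, provided the extra bytes are acceptable to the application.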
If you are transferring many large blocks of data in series and wish to manually send them to the FslDMA API's memory copy functions one block at a time, then setting your block sizes to FSLDMA_OPTIMAL_BLKSIZE (or 64 MB - 64 Bytes) and exclusively using DMAPhysicalCopyMem() will provide the fastest transfer speeds with the least amount of CPU overhead.
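A manual block-at-a-time loop of this kind might look like the sketch below. To keep the example portable, the copy routine is passed in as a function pointer and a memcpy() based stand-in is used; in real code that callback would wrap IfslDMA->DMAPhysicalCopyMem() and the block size would be FSLDMA_OPTIMAL_BLKSIZE. The names `block_copy_fn`, `copy_in_blocks`, `cpu_block_copy` and `block_copy_calls` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-block copy routine; returns nonzero on success. */
typedef int (*block_copy_fn)(const void *src, void *dst, uint32_t size);

/* Counts how many blocks the stand-in routine has copied. */
unsigned block_copy_calls = 0;

/* Stand-in for a DMA copy call, implemented with the CPU. */
int cpu_block_copy(const void *src, void *dst, uint32_t size)
{
    memcpy(dst, src, size);
    block_copy_calls++;
    return 1;
}

/*
 * Copy `total` bytes from src to dst in pieces of at most `block_size`
 * bytes, stopping at the first failed block.
 * Returns 1 if every block was copied, 0 otherwise.
 */
int copy_in_blocks(const uint8_t *src, uint8_t *dst, size_t total,
                   uint32_t block_size, block_copy_fn copy)
{
    size_t offset = 0;

    assert(block_size > 0 && copy != NULL);

    while (offset < total) {
        size_t left = total - offset;
        /* The final block may be shorter than block_size. */
        uint32_t n = left < block_size ? (uint32_t)left : block_size;

        if (!copy(src + offset, dst + offset, n)) return 0;
        offset += n;
    }
    return 1;
}
```

Only the last block can be shorter than the chosen block size, so choosing a block size that is a multiple of 64 Bytes keeps every full block within the efficiency guidance given earlier.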
Utility / Miscellaneous Functions
The last section of the FslDMA API provides a single convenience function call: DMAGetVirtualAddress().
APTR DMAGetVirtualAddress( APTR pPhysicalAddress );
The DMAGetVirtualAddress() function returns the Virtual memory location for the Physical memory location that was itself allocated and returned by the DMAAllocPhysicalMemory() function. DMAGetVirtualAddress() will not work on physical addresses returned by IMMU->GetPhysicalAddress() on memory not allocated using DMAAllocPhysicalMemory().
The main purpose of this function (other than debugging) is to provide the "normal" Virtual Address of memory allocated by the FslDMA API for use by functions that expect "normal" Virtual Addresses; for example IExec->CopyMemQuick().
Further reference - The fsldma.resource AutoDoc
See the [fsldma.resource AutoDoc] file for more details on each API call.
Example usage
#include <interfaces/fsldma.h>

// Obtain the fsldma.resource
struct fslDMAIFace *IfslDMA = IExec->OpenResource(FSLDMA_NAME);
if ( NULL != IfslDMA )
{
    uint32 lTestSize = 1024;
    CONST_APTR pPhysicalSrcAddr = NULL;
    APTR pPhysicalDestAddr = NULL;

    // Allocate the Source Buffer (for DMA)
    // and set the contents to 0xB3 (just an example value)
    pPhysicalSrcAddr = (CONST_APTR)IfslDMA->DMAAllocPhysicalMemoryTags(lTestSize,
        FSLDMA_APM_ClearWithValue, 0xB3,
        TAG_END);
    if ( NULL != pPhysicalSrcAddr )
    {
        // Allocate the Destination Buffer (for DMA)
        // - contents will be cleared by default
        pPhysicalDestAddr = IfslDMA->DMAAllocPhysicalMemory(lTestSize);
        if ( NULL != pPhysicalDestAddr )
        {
            // Call IfslDMA->DMAPhysicalCopyMem()
            // to perform the memory copy using the DMA hardware
            if ( TRUE == IfslDMA->DMAPhysicalCopyMem(pPhysicalSrcAddr,
                                                     pPhysicalDestAddr,
                                                     lTestSize) )
            {
                // Success - Do something with the copied data
            }
            else
            {
                // Fallback and use CPU Copy instead (or do something else)
                // Since we allocated the memory buffers using the FslDMA API,
                // which returns the Physical address to that memory, and the
                // IExec->CopyMemQuick() function expects the Virtual address,
                // we first need to obtain the Virtual address for the Physical
                // ones (using the FslDMA API again) before we can call our
                // fallback copy function.
                CONST_APTR pVirtualSrcAddr = NULL;
                APTR pVirtualDestAddr = NULL;
                pVirtualSrcAddr = IfslDMA->DMAGetVirtualAddress((APTR)pPhysicalSrcAddr);
                pVirtualDestAddr = IfslDMA->DMAGetVirtualAddress(pPhysicalDestAddr);
                IExec->CopyMemQuick(pVirtualSrcAddr,pVirtualDestAddr,lTestSize);
            }

            // We *must* use DMAFreePhysicalMemory() to free the memory
            // that we allocated with DMAAllocPhysicalMemory()
            IfslDMA->DMAFreePhysicalMemory(pPhysicalDestAddr);
        }

        // We *must* use DMAFreePhysicalMemory() to free the memory
        // that we allocated with DMAAllocPhysicalMemory()
        IfslDMA->DMAFreePhysicalMemory((APTR)pPhysicalSrcAddr);
    }
}
Obtaining the fsldma.resource
Breaking the above example down, the first thing we do is include the interface header for the fsldma.resource and obtain the resource itself.
#include <interfaces/fsldma.h>
struct fslDMAIFace *IfslDMA = IExec->OpenResource(FSLDMA_NAME);
Allocating DMA Compliant Memory
Once we have successfully obtained the DMA resource we can directly use its API. The next step we need to do is to allocate some memory which we know is DMA compliant for use by the resource. The easiest way to accomplish this is to use the IfslDMA->DMAAllocPhysicalMemory() function. This will automatically take care of ensuring the memory that is returned is properly aligned, contiguous, cache-inhibited and coherent.
We use the Tags version of the allocate memory function first so we can allocate a block of memory for our source buffer and fill it with some test value (in this case the byte value 0xB3).
pPhysicalSrcAddr = (CONST_APTR)IfslDMA->DMAAllocPhysicalMemoryTags(lTestSize, FSLDMA_APM_ClearWithValue, 0xB3, TAG_END);
We also need a destination buffer for our test. Since it only needs to start out being cleared (filled with zeroes), we can use the simplest form of the allocate memory function here as the allocated memory is cleared by default.
pPhysicalDestAddr = IfslDMA->DMAAllocPhysicalMemory(lTestSize);
The core function - Copy Physical Memory
Now that we have two DMA Compliant memory buffers available we can get to the heart of the API usage and make the call to IfslDMA->DMAPhysicalCopyMem().
IfslDMA->DMAPhysicalCopyMem(pPhysicalSrcAddr,pPhysicalDestAddr,lTestSize);
Looking at the full Example source above you can see that the DMAPhysicalCopyMem() call returns a Boolean value to indicate success or failure. TRUE is returned once the DMA hardware has completed copying the requested data (it is a blocking call) and FALSE is returned if a problem occurred.
Since the memory buffers we are using were allocated using the FslDMA API and our transfer size was greater than zero and less than or equal to the total size of either buffer (in other words, a legal copy request), there is very little chance that the DMAPhysicalCopyMem() function will fail. In fact, about the only reason the DMA hardware would fail to handle a legal transaction would be if the physical RAM installed in the system was faulty.
So even though it is extremely unlikely for our DMA copy to have failed and returned FALSE, let's take a look at how we might handle the failure by falling back to a CPU based copy function to complete the data move.
Since we are reverting back to using a normal system copy memory function here, we first need to obtain the Virtual Addresses to both our source and destination buffers. This is accomplished by using the DMAGetVirtualAddress() function and passing in the Physical Address that was returned by DMAAllocPhysicalMemory().
CONST_APTR pVirtualSrcAddr = NULL;
APTR pVirtualDestAddr = NULL;
pVirtualSrcAddr = IfslDMA->DMAGetVirtualAddress((APTR)pPhysicalSrcAddr);
pVirtualDestAddr = IfslDMA->DMAGetVirtualAddress(pPhysicalDestAddr);
Now that we have the Virtual Address equivalents of the Physical Addresses that were returned by DMAAllocPhysicalMemory(), we can proceed to call the Exec CopyMemQuick() function to complete the copy. Since IExec->CopyMemQuick() does not return a result code, we can only trust that the call cannot fail.
IExec->CopyMemQuick(pVirtualSrcAddr,pVirtualDestAddr,lTestSize);
Cleaning up - Freeing DMA Compliant memory
It is essential that you free any memory that was allocated using the DMAAllocPhysicalMemory() function using the corresponding DMAFreePhysicalMemory() call. The reason for this is two-fold: first, additional resource tracking is maintained by the FslDMA API whenever it allocates memory, and this must itself be released; second, the allocate call returns the Physical Address of the memory and not the Virtual Address, so passing the address returned by DMAAllocPhysicalMemory() directly into IExec->FreeVec() would likely result in a crash.
IfslDMA->DMAFreePhysicalMemory((APTR)pPhysicalSrcAddr);
IfslDMA->DMAFreePhysicalMemory(pPhysicalDestAddr);