This article discusses the use of direct memory access (DMA) in embedded systems programming and how DMA interacts with peripherals and memory modules to make processors work more efficiently.

This article explains the use cases, advantages, and disadvantages of using direct memory access (DMA) in embedded systems programming. It describes how DMA interacts with peripherals and memory modules to make processors work more efficiently, and it introduces the different DMA bus access architectures and the advantages of each.

One task that is common to embedded systems is controlling an external input. Input control can put a lot of unnecessary computational strain on the processor, causing longer periods in active power modes and slow response times. To optimize power, maintain fast event responses, and manage large continuous data transfers, a microcontroller with direct memory access (DMA) can offer the best solution.

Direct Memory Access (DMA)

In system applications involving peripherals, there are many points at which the microprocessor can become bogged down. For example, when driving an ADC that is continuously sending data, the CPU may be interrupted so often that it struggles to perform other tasks. DMA is a method of moving data while minimizing CPU involvement in large or fast data transactions. You can think of the DMA controller as a coprocessor whose sole purpose is to interface with memory and peripherals. This allows the main processor to keep up with a data-hungry peripheral, focus on another task, or even go to sleep and conserve power while data transactions take place in the background. For example, on Arm® architectures, a DMA module can operate during LP2 (sleep) or LP3 (active) modes. This can give a distinct advantage in applications that require extended battery life, such as wearable sensor hubs and smartwatches.
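To make this concrete, the sketch below shows how firmware might hand a continuous ADC stream to a DMA channel and then put the CPU to sleep until the block is complete. It is a minimal illustration under assumed names: the dma_* and adc_* calls, the channel structure, and cpu_sleep() are hypothetical placeholders rather than the API of any particular vendor SDK.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define ADC_SAMPLE_COUNT 256

/* Hypothetical DMA channel descriptor; real controllers expose
 * equivalent fields through memory-mapped registers. */
typedef struct {
    const volatile uint16_t *src;   /* peripheral data register   */
    uint16_t                *dst;   /* destination buffer in SRAM */
    size_t                   len;   /* number of samples to move  */
    void (*on_complete)(void);      /* completion interrupt hook  */
} dma_channel_t;

static uint16_t adc_buffer[ADC_SAMPLE_COUNT];
static volatile bool transfer_done = false;

/* Hypothetical HAL calls -- stand-ins for vendor-specific drivers. */
extern void dma_configure(dma_channel_t *ch);
extern void dma_start(dma_channel_t *ch);
extern const volatile uint16_t *adc_data_register(void);
extern void cpu_sleep(void);        /* e.g. wait-for-interrupt on Arm cores */

static void dma_done_isr(void)
{
    transfer_done = true;           /* wake-up source for the CPU */
}

void capture_adc_block(void)
{
    dma_channel_t ch = {
        .src         = adc_data_register(),
        .dst         = adc_buffer,
        .len         = ADC_SAMPLE_COUNT,
        .on_complete = dma_done_isr,
    };

    dma_configure(&ch);
    dma_start(&ch);

    /* The CPU can sleep here; the DMA controller moves every sample
     * and only the completion interrupt wakes the core. */
    while (!transfer_done) {
        cpu_sleep();
    }
}
```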

Advantages and disadvantages

DMA is useful in many digital systems and is sometimes even required to handle large amounts of bus traffic. It has been used in network cards, graphics cards, and even some of the original IBM PCs. That said, incorporating DMA into a design involves some trade-offs.

Table 1. Advantages of using DMA

CPU time: DMA minimizes the need for CPU execution and interrupts, reducing the processing time required for data transactions.
Power consumption: DMA creates opportunities to reduce power consumption by allowing the CPU to sleep during transfers.
Parallel work: Depending on the architecture of the system bus, the processor may be able to perform other operations while peripheral transactions are in progress.

Table 2. Disadvantages of using DMA

Cost: Adding DMA requires a dedicated DMA controller, which can make the system more expensive.
Complexity: While DMA can reduce interrupt frequency, it can increase the size and complexity of application firmware.
Platform dependency: DMA controllers have different internal architectures between (and even within) manufacturers and can behave differently depending on their bus access schemes.
Cache incoherence: DMA transactions can cause logic errors by modifying memory that the CPU has cached, leaving stale data in the cache. This can be resolved with a coherent cache architecture or by invalidating the cache after the DMA transfer completes (see the sketch after this table).
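As a sketch of the cache-incoherence point above: after a DMA engine writes received data directly into RAM, the CPU's data cache may still hold stale copies of that region, so the driver invalidates those cache lines before reading the buffer. The dma_receive() and cache_invalidate_range() calls are hypothetical stand-ins for the equivalent routines in a real BSP or CMSIS-style library.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helpers standing in for vendor cache and DMA routines. */
extern void dma_receive(void *dst, size_t len);           /* starts DMA, blocks until done */
extern void cache_invalidate_range(void *addr, size_t len);

#define RX_LEN 512
static uint8_t rx_buffer[RX_LEN];

void read_packet(void)
{
    /* The DMA controller writes straight to SRAM, bypassing the CPU data cache. */
    dma_receive(rx_buffer, RX_LEN);

    /* Discard any stale cached copies of the buffer so the CPU
     * sees the data the DMA controller actually wrote. */
    cache_invalidate_range(rx_buffer, RX_LEN);

    /* rx_buffer is now safe to parse. */
}
```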

Bus access and CPU cycles

While DMA controllers can be incredibly efficient at saving power or speeding up embedded systems, their implementation is not highly standardized. There are several schemes for ensuring that the DMA controller is not granted access to the internal bus at the same time as the CPU. The primary purpose of a bus access scheme is to avoid simultaneous access to the same memory locations, which can lead to cache incoherence and logic errors. A given DMA controller is typically configured to use one of these schemes, since each may require different hardware or firmware control. The bus access schemes used by most DMA controllers are burst, cycle-stealing, and transparent DMA.
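As a rough illustration of how a driver might expose the choice of bus access scheme, the fragment below defines a configuration enum and applies it when a channel is set up. The enum values and dma_set_bus_mode() are invented for this sketch; real controllers name and expose these modes differently, if they expose them at all.

```c
#include <stdint.h>

/* Hypothetical bus access schemes, mirroring the three types in the text. */
typedef enum {
    DMA_MODE_BURST,          /* block the bus briefly, move a large chunk  */
    DMA_MODE_CYCLE_STEAL,    /* one operation between two CPU cycles       */
    DMA_MODE_TRANSPARENT,    /* only when the CPU is off the memory buses  */
} dma_bus_mode_t;

/* Hypothetical register write selecting the scheme for one channel. */
extern void dma_set_bus_mode(unsigned channel, dma_bus_mode_t mode);

void dma_init_channel0(void)
{
    /* Favor raw throughput for a large one-shot transfer. */
    dma_set_bus_mode(0, DMA_MODE_BURST);
}
```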

Transparent DMA can only perform one operation at a time and must wait until the processor is executing instructions that do not require access to the data or address buses. Additional logic is required to check this access restriction, and this type of DMA is usually the slowest. Transparent DMA can be advantageous in applications where much of the processing does not require access to the memory buses; its advantage is that the CPU is never throttled, since it never has to stop working.

Table 3. Summary of DMA types and their pros/cons

Burst DMA: Pros: The fastest type of DMA. Cons: Relatively long periods of CPU inactivity.
Cycle-stealing DMA: Pros: The processor is never idle for long contiguous periods. Cons: Slower than burst DMA.
Transparent DMA: Pros: No CPU throttling required. Cons: The slowest form of DMA.

Figure 1. Architecture diagram of burst DMA during DMA operations.

Burst DMA moves data in infrequent, large bursts, during which the DMA controller sends as much data to the target buffer as the buffer can hold. The DMA controller blocks the processor for a very short period of time to move a large chunk of memory, then releases the bus back to the main processor, repeating until the transfer is complete. Burst DMA is generally considered the fastest type.
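Purely as a conceptual model (not firmware you would write, since the hardware performs this sequence itself), the loop below mimics how a burst controller alternates bus ownership: grab the bus, move one full chunk, hand the bus back, and repeat until the transfer count is exhausted. The bus_acquire()/bus_release() calls and the chunk size are invented for illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define BURST_CHUNK 32u   /* bytes moved per bus grant (illustrative) */

/* Invented stand-ins for the hardware bus arbitration. */
extern void bus_acquire(void);   /* CPU is stalled off the bus here  */
extern void bus_release(void);   /* CPU resumes until the next burst */

/* Conceptual model of a burst-mode transfer of len bytes. */
void burst_dma_model(uint8_t *dst, const uint8_t *src, size_t len)
{
    while (len > 0) {
        size_t chunk = (len < BURST_CHUNK) ? len : BURST_CHUNK;

        bus_acquire();               /* block the CPU briefly        */
        memcpy(dst, src, chunk);     /* move a whole chunk at once   */
        bus_release();               /* give the bus back to the CPU */

        dst += chunk;
        src += chunk;
        len -= chunk;
    }
}
```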


Figure 2. Cycle-stealing DMA performs each operation between two processor cycles.

In contrast, single-byte-transfer or cycle-stealing DMA is processor-oriented and only performs operations between processor instructions. It inserts each operation between two CPU cycles and thus effectively “steals” CPU time. Because it performs only one operation at a time, it is usually slower than burst DMA.


Figure 3. Transparent DMA operations occur while the processor is working on tasks that do not require access to the data or address buses.

An example of a burst DMA architecture


Figure 4. Architecture diagram of the MAX32660 DMA controller.

An example of a burst DMA controller can be found in the MAX32660 (see Figure 4). The upper path corresponds to the data flow and the lower path represents the control/status flow between the Advanced High-performance Bus (AHB) and the DMA logic. The DMA controller can act as a buffer interface between the AHB and memory or peripherals, depending on how it is configured. DMA logic resides between the DMA buffer and each peripheral to independently manage each unique peripheral bus during transactions. A DMA burst can move up to 32 bytes at a time, provided the source/destination buffers can hold that much data. A single transfer can move up to 16 MB and can be configured for I2C, SPI, I2S, and UART transmit or receive, in addition to internal memory transfers. DMA control programming may vary slightly between protocols, but peripheral transactions are controlled exclusively by the DMA controller. An arbiter module manages bus access between the four DMA channels and the processor, granting requests according to a priority system.
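The fragment below sketches what setting up one channel of a burst DMA controller like this might look like from firmware, using the limits quoted above (a burst of up to 32 bytes, a transfer of up to 16 MB, and a per-channel priority for the arbiter). The structure, field names, request line, and FIFO address are illustrative assumptions, not the MAX32660 register map or its SDK driver API; consult the device user guide for real code.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative channel configuration for a burst DMA controller.
 * Field names are hypothetical, not the MAX32660 register map. */
typedef struct {
    uint32_t source;        /* peripheral FIFO or memory address      */
    uint32_t dest;          /* destination buffer address             */
    uint32_t count;         /* bytes to move, up to 16 MB             */
    uint8_t  burst_size;    /* bytes per burst, up to 32              */
    uint8_t  priority;      /* arbiter priority among the 4 channels  */
    uint8_t  request;       /* peripheral request line, e.g. UART RX  */
} dma_cfg_t;

/* Hypothetical driver entry points. */
extern void dma_channel_setup(unsigned channel, const dma_cfg_t *cfg);
extern void dma_channel_enable(unsigned channel);

#define UART0_RX_REQ   1u            /* hypothetical request-line number */
#define UART0_FIFO     0x40042000u   /* placeholder FIFO address         */

static uint8_t uart_rx_buf[1024];

void start_uart_rx_dma(void)
{
    const dma_cfg_t cfg = {
        .source     = UART0_FIFO,
        .dest       = (uint32_t)(uintptr_t)uart_rx_buf,
        .count      = sizeof(uart_rx_buf),
        .burst_size = 32,      /* fill the 32-byte DMA buffer per burst */
        .priority   = 0,       /* highest of the four channels          */
        .request    = UART0_RX_REQ,
    };

    dma_channel_setup(0, &cfg);
    dma_channel_enable(0);
}
```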

Modern DMA options

In summary, DMA is a critical feature for modern embedded systems that manage an abundance of sensors and require high performance, efficiency, and low-power operation. It behaves as a coprocessor dedicated exclusively to memory and peripheral bus transactions.

The use of DMA is imperative for many applications that need to minimize power consumption and lighten CPU loads. For example, healthcare and wearable devices deal with large amounts of data but also need to conserve as much battery power as possible while processing sensitive data. Analog Devices offers fast burst DMA microcontroller architectures well-suited to low-power wearable designs, such as the MAX32660 and MAX32670. In addition, DARWIN Arm microcontrollers such as the MAX32666 are designed for wearable and IoT applications with integrated Bluetooth® 5. These devices have two 8-channel burst DMA controllers with integrated support for event-based transactions. They also feature best-in-class security hardware with a secure bootloader and a Trust Protection Unit (TPU) to accelerate ECDSA, SHA-2, and AES operations. From early IBM PCs to network cards, and now to secure low-power wearables and IoT devices, DMA is an essential feature of modern digital systems.

Note: All graphics courtesy of Maxim Integrated/Analog Devices.


Brandon Hurst is a hardware and firmware engineer working with the training and technical services group at Maxim Integrated, now part of Analog Devices. He graduated with a BS in Electrical Engineering from Cal Poly, San Luis Obispo and joined Maxim in January 2021. Brandon previously interned with both Maxim's TTS team and the Product Safety Engineering team at Apple Inc. He can be reached at brandon.hurst@analog.com.

Related Content:

For more Embedded, subscribe to Embedded's weekly email newsletter.




Accelerating peripheral monitoring in wearables with DMA
