Revision as of 13:40, 29 June 2020 by Elias (talk | contribs) (Shared memory interface)

AR100 is an OpenRISC 1000 core implemented in a range of Allwinner SoCs. This page documents the use of the AR100 for real-time applications. Real time in this sense means predictable latency when controlling GPIO. The goal is to run a regular Linux kernel on the standard ARM-based cores (CPUX) and use a shared memory region to transfer data from the CPUX to the AR100 core (CPUS).

Overview of the A64 system

The overview shows the critical path that the AR100 needs to take in order to set registers in a GPIO bank (marked in green). In that path there are two arbiters that can introduce unpredictable delays. It is not known at this point how the arbiters work or what worst-case delay they can cause.

Also seen in the overview is the R_GPIO (marked in brown). This is a GPIO bank that is dedicated to the CPUS. Its access latency is lower than that of the regular GPIO bank. However, the TWI (I2C) peripheral used by CPUX to communicate with the PMIC (R_TWI) is located on the same bus, and Linux uses it to poll for changes in the PMIC. If all the bandwidth is used by the CPUS, the CPUX is blocked from accessing the R_TWI, which in turn causes errors in Linux.

[Image: A64-overview.png]

One option to remove the need for CPUX to access the R_TWI peripheral is to use a different I2C bus for the CPUX-PMIC communication. Using TWI0 with pins PH0/PH1 on the A64 might remove the need for CPUX to access any of the R_ peripherals.

A second option is to let the AR100 and Linux share access to the bus. Linux should very rarely need to change anything, especially during a print. Once the voltage levels are set, which happens during boot, not much should have to change.

Comparison between PIO and R_PIO

The R_PIO is faster than the PIO. A comparison shows that R_PIO takes about 13 ns to set the GPIO pins, while PIO takes about 60 ns.

[Images: SDS5034X PNG 71.png, SDS5034X PNG 72.png]
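To make concrete what "setting the GPIO pins" involves, here is a minimal sketch in C of a masked write to a port data register. The base addresses, the port stride and the data register offset are assumptions taken from a reading of the A64 memory map and should be checked against the user manual. The helper names gpio_data_addr and gpio_write_masked are hypothetical and do not come from Crust or any existing firmware.

```c
#include <stdint.h>

/* Assumed values from the A64 memory map; verify before use. */
#define PIO_BASE    0x01C20800u  /* main GPIO controller (ports PB..PH) */
#define R_PIO_BASE  0x01F02C00u  /* CPUS-side GPIO controller (port PL) */
#define PORT_STRIDE 0x24u        /* size of one port's register block   */
#define DATA_OFFSET 0x10u        /* data register inside a port block   */

/* Address of the data register for port index `port` of a controller. */
static inline uintptr_t gpio_data_addr(uintptr_t base, unsigned port)
{
    return base + port * PORT_STRIDE + DATA_OFFSET;
}

/* Update only the pins selected by `mask`; other pins keep their state.
 * This is a read-modify-write, so it costs one bus read plus one write. */
static inline void gpio_write_masked(volatile uint32_t *data_reg,
                                     uint32_t mask, uint32_t value)
{
    uint32_t cur = *data_reg;
    *data_reg = (cur & ~mask) | (value & mask);
}
```

On the AR100 the pointer would be obtained by casting gpio_data_addr(R_PIO_BASE, 0) to a volatile uint32_t pointer. If the firmware owns the whole port, a plain store of the full 32-bit value avoids the extra read.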

Shared memory interface

In order to make data available for the CPUS to consume, it is useful to reserve a region in memory for the CPUX to write to and for the CPUS to read from. This can be either DDR3 memory (plentiful, but slow and unpredictable) or SRAM (much less memory available, but the CPUS needs only 3 cycles to read it).

As a first approach, it is suggested to reserve a region in SRAM A2 for the shared data.
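One possible way to organize such a shared region is a single-producer, single-consumer ring buffer, with the CPUX as the only writer of the head index and the CPUS as the only writer of the tail index. The following is a minimal sketch under stated assumptions, not the layout used by any existing firmware. The names struct step, struct ring, ring_push and ring_pop, the slot count, and the two 32-bit fields per entry are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 64u  /* hypothetical size; must be a power of two */

/* Hypothetical entry: pin states to apply, then a delay to wait. */
struct step {
    uint32_t pins;
    uint32_t delay;
};

struct ring {
    volatile uint32_t head;  /* advanced only by the producer (CPUX) */
    volatile uint32_t tail;  /* advanced only by the consumer (CPUS) */
    struct step slots[RING_SLOTS];
};

/* Producer side: returns false when the ring is full. */
static bool ring_push(struct ring *r, struct step s)
{
    uint32_t head = r->head;
    if (head - r->tail == RING_SLOTS)
        return false;
    r->slots[head % RING_SLOTS] = s;
    /* On real hardware a write barrier belongs here, so that the slot
     * contents are visible to the other core before head advances. */
    r->head = head + 1;
    return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_pop(struct ring *r, struct step *out)
{
    uint32_t tail = r->tail;
    if (r->head == tail)
        return false;
    *out = r->slots[tail % RING_SLOTS];
    r->tail = tail + 1;
    return true;
}
```

The indices are free-running counters, so full and empty can be told apart without wasting a slot. Because each index has exactly one writer, no lock is needed, which keeps the CPUS read path short and predictable.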


The SRAM A2 has a total size of 0x10000 bytes (64 K). The bottom part of the memory space is reserved for ATF (Arm Trusted Firmware), so in reality only 0x4000 bytes (16 K) are usable for firmware and shared memory. Preliminary tests show that starting a simple program uses 0x1800 bytes (6 K). The standard Crust uses 0x400 bytes for stack, which might not be needed. That leaves some 0x2500 bytes (just under 10 K) for shared memory. Each (pin state, delay) entry requires 8 bytes (without optimization), giving a total of 1184 transitions.
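The transition count above can be checked with a few lines of C. This is a minimal sketch that takes the 0x2500-byte budget from the text as given and assumes a hypothetical 8-byte entry made of two 32-bit fields. The names struct transition and max_transitions are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define SHARED_BYTES 0x2500u  /* shared-memory budget quoted in the text */

/* Hypothetical unoptimized entry: 4 bytes of pin state, 4 bytes of delay. */
struct transition {
    uint32_t pin_state;
    uint32_t delay;
};

/* Fail at compile time if the compiler pads the entry beyond 8 bytes. */
static_assert(sizeof(struct transition) == 8, "entry must be 8 bytes");

/* Number of whole entries that fit in a region of `bytes` bytes. */
static inline unsigned max_transitions(unsigned bytes)
{
    return bytes / (unsigned)sizeof(struct transition);
}
```

With these assumptions, max_transitions(SHARED_BYTES) evaluates to 1184, matching the figure in the text. Keeping the budget as a parameter makes it easy to recompute the count if the firmware or stack size changes.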