Jump to: navigation, search

Difference between revisions of "AR100"

(Shared memory interface)
(SRAM A2)
 
(One intermediate revision by the same user not shown)
Line 28: Line 28:
 
That leaves some 0x2500 bytes, (just under 10 K) for shared memory.  
 
That leaves some 0x2500 bytes, (just under 10 K) for shared memory.  
 
Each pin-state, delay requires 8 bytes (without optimization) giving a total of 1184 transitions.
 
Each pin-state, delay requires 8 bytes (without optimization) giving a total of 1184 transitions.
 +
===SRAM A1===
 +
SRAM A1 has a size of 32 K, the memory is not in use during normal operations. See below for benchmarks on latency for data read.
 +
The base address as seen from AR100 is 0x40000, this is not well documented.
 +
 +
==Benchmarks from ar100-info==
 +
Included is the results from running the ar100-info program: [https://github.com/skristiansson/ar100-info]
 +
<pre>
 +
OpenRISC-1200 (rev 1)
 +
D-Cache: no
 +
I-Cache: 4096 bytes, 16 bytes/line, 1 way(s)
 +
DMMU: no
 +
IMMU: no
 +
MAC unit: yes
 +
Debug unit: yes
 +
Performance counters: no
 +
Power management: yes
 +
Interrupt controller: yes
 +
Timer: yes
 +
Custom unit(s): no
 +
ORBIS32: yes
 +
ORBIS64: no
 +
ORFPX32: no
 +
ORFPX64: no
 +
Test support for l.addc...yes
 +
Test support for l.cmov...yes
 +
Test support for l.cust1...no
 +
Test support for l.cust2...no
 +
Test support for l.cust3...no
 +
Test support for l.cust4...no
 +
Test support for l.cust5...no
 +
Test support for l.cust6...no
 +
Test support for l.cust7...no
 +
Test support for l.cust8...no
 +
Test support for l.div...yes
 +
Test support for l.divu...yes
 +
Test support for l.extbs...yes
 +
Test support for l.extbz...yes
 +
Test support for l.exths...yes
 +
Test support for l.exthz...yes
 +
Test support for l.extws...yes
 +
Test support for l.extwz...yes
 +
Test support for l.ff1...yes
 +
Test support for l.fl1...yes
 +
Test support for l.lws...no
 +
Test support for l.mac...yes
 +
Test support for l.maci...yes
 +
Test support for l.macrc...yes
 +
Test support for l.mul...yes
 +
Test support for l.muli...yes
 +
Test support for l.mulu...yes
 +
Test support for l.ror...yes
 +
Test support for l.rori...yes
 +
Test timer functionality...OK
 +
Set clock source to LOSC...done
 +
Test CLK freq...27 KHz
 +
Set clock source to HOSC (POSTDIV=0, DIV=0)...done
 +
Test CLK freq...24 MHz
 +
Set clock source to HOSC (POSTDIV=0, DIV=1)...done
 +
Test CLK freq...12 MHz
 +
Set clock source to HOSC (POSTDIV=1, DIV=1)...done
 +
Test CLK freq...12 MHz
 +
Setup PLL6 (M=1, K=1, N=24)...done
 +
Set clock source to PLL6 (POSTDIV=1, DIV=0)...done
 +
Test CLK freq...300 MHz
 +
 +
  == Benchmark ==
 +
  == Code in SRAM A2 (I-cache ON), data in SRAM A2 ==
 +
      Instructions fetch : 1.0 cycles per instruction
 +
      Back-to-back L.LWZ : 3.0 cycles per 32-bit read
 +
          L.LWZ + L.NOP : 4.0 cycles per 32-bit read
 +
  == Code in SRAM A2 (I-cache ON), data in SRAM A1 ==
 +
      Instructions fetch : 1.0 cycles per instruction
 +
      Back-to-back L.LWZ : 3.0 cycles per 32-bit read
 +
          L.LWZ + L.NOP : 15.9 cycles per 32-bit read
 +
 +
  == Code in SRAM A2 (I-cache OFF), data in SRAM A2 ==
 +
      Instructions fetch : 3.0 cycles per instruction
 +
      Back-to-back L.LWZ : 4.5 cycles per 32-bit read
 +
          L.LWZ + L.NOP : 8.0 cycles per 32-bit read
 +
  == Code in SRAM A2 (I-cache OFF), data in SRAM A1 ==
 +
      Instructions fetch : 3.0 cycles per instruction
 +
      Back-to-back L.LWZ : 16.0 cycles per 32-bit read
 +
          L.LWZ + L.NOP : 19.3 cycles per 32-bit read
 +
 +
Set clock source to PLL6 (POSTDIV=2, DIV=0)...done
 +
Test CLK freq...200 MHz
 +
Set clock source to PLL6 (POSTDIV=2, DIV=1)...done
 +
Test CLK freq...100 MHz
 +
Setup PLL6 (M=2, K=1, N=24)...done
 +
Test CLK freq...100 MHz
 +
Setup PLL6 (M=1, K=2, N=24)...done
 +
Test CLK freq...75 MHz
 +
Setup PLL6 (M=1, K=1, N=12)...done
 +
Test CLK freq...52 MHz
 +
Restore clock config to original state...done
 +
All tests completed!
 +
</pre>

Latest revision as of 11:59, 1 July 2020

AR100 is an OpenRISK 1000 implemented in a range of Allwinner SoCs. This page is dedicated to documenting the use of the AR100 for real time applications. Real time in this sense means predictable latency on the control of GPIO. The goal is to run a regular Linux kernel on the standard ARM based cores (CPUX) and use a shared memory region to transfer data from the CPUX to the AR100 core (CPUS).


Overview of the A64 system

The overview shows the critical path that the AR100 needs to take in order to set registers in a GPIO bank (marked in green). In that path there are two Arbeiters that can cause unpredictability in the chain. It is not known at this point how the arbiters work and the worst case delay caused by the arbiter.

Also seen in the overview is the R_GPIO (marked in brown). This is a GPIO bank that is dedicated to CPUS. The access latency is less than what the GPIO bank is, but since the TWI (I2C) bus used by CPUX to communicate with the PMIC is located on the same bus, Linux uses this bus to poll for changes in the PMIC. If the all bandwith is used by the CPUS, this blocks the CPUX from accessing the R_TWI which in turn causes errors in Linux. A64-overview.png

One option to remove the need for CPUX to access the R_TWI peripheral is to use a different I2C bus for the CPUX-PMIC communication. Using TWI0 with pin PH0/PH1 an the A64 might remove the need for CPUX to access any of the R_ peripherals.

A second option is to let AR100 and Linux share access to the bus. It should be very seldom that Linux needs to change anything, especially during a print. Once the voltage levels are set, which happens during boot, not much should have to change.

Comparison between PIO and R_PIO

The R_PIO is a bit faster than the PIO. A comparison shows that R_PIO uses about 13 ns to set the GPIO pins while PIO uses 60 ns. SDS5034X PNG 71.png SDS5034X PNG 72.png

Shared memory interface

In order to make data available for the CPUS to consume, it's good to reserve a region in memory for the CPUX to write to and for the CPUS to read from. This can be either DDR3 memory (plentyful, but slow and unpredictable) or SRAM (much less memory available, but requires 3 cycles to read by the CPUS).

As a first approach, it is suggested to reserve a region in SRAM A2 for the shared data.

SRAM A2

The SRAM A2 has a total size of 0x10000 bytes, 64 K. The bottom part of the memory space is reserved for ATF (Arm Trusted Firmware), so in reality only 0x4000 bytes (16 K) is usable for firmware and shared memory. Preliminary tests with a simple program shows that 0x1800 (6 K) is used for starting a simple program. The standard Crust uses 0x400 bytes for stack, which might not be needed. That leaves some 0x2500 bytes, (just under 10 K) for shared memory. Each pin-state, delay requires 8 bytes (without optimization) giving a total of 1184 transitions.

SRAM A1

SRAM A1 has a size of 32 K, the memory is not in use during normal operations. See below for benchmarks on latency for data read. The base address as seen from AR100 is 0x40000, this is not well documented.

Benchmarks from ar100-info

Included is the results from running the ar100-info program: [1]

OpenRISC-1200 (rev 1)
D-Cache: no
I-Cache: 4096 bytes, 16 bytes/line, 1 way(s)
DMMU: no
IMMU: no
MAC unit: yes
Debug unit: yes
Performance counters: no
Power management: yes
Interrupt controller: yes
Timer: yes
Custom unit(s): no
ORBIS32: yes
ORBIS64: no
ORFPX32: no
ORFPX64: no
Test support for l.addc...yes
Test support for l.cmov...yes
Test support for l.cust1...no
Test support for l.cust2...no
Test support for l.cust3...no
Test support for l.cust4...no
Test support for l.cust5...no
Test support for l.cust6...no
Test support for l.cust7...no
Test support for l.cust8...no
Test support for l.div...yes
Test support for l.divu...yes
Test support for l.extbs...yes
Test support for l.extbz...yes
Test support for l.exths...yes
Test support for l.exthz...yes
Test support for l.extws...yes
Test support for l.extwz...yes
Test support for l.ff1...yes
Test support for l.fl1...yes
Test support for l.lws...no
Test support for l.mac...yes
Test support for l.maci...yes
Test support for l.macrc...yes
Test support for l.mul...yes
Test support for l.muli...yes
Test support for l.mulu...yes
Test support for l.ror...yes
Test support for l.rori...yes
Test timer functionality...OK
Set clock source to LOSC...done
Test CLK freq...27 KHz
Set clock source to HOSC (POSTDIV=0, DIV=0)...done
Test CLK freq...24 MHz
Set clock source to HOSC (POSTDIV=0, DIV=1)...done
Test CLK freq...12 MHz
Set clock source to HOSC (POSTDIV=1, DIV=1)...done
Test CLK freq...12 MHz
Setup PLL6 (M=1, K=1, N=24)...done
Set clock source to PLL6 (POSTDIV=1, DIV=0)...done
Test CLK freq...300 MHz

  == Benchmark ==
   == Code in SRAM A2 (I-cache ON), data in SRAM A2 ==
      Instructions fetch : 1.0 cycles per instruction
      Back-to-back L.LWZ : 3.0 cycles per 32-bit read
           L.LWZ + L.NOP : 4.0 cycles per 32-bit read
   == Code in SRAM A2 (I-cache ON), data in SRAM A1 ==
      Instructions fetch : 1.0 cycles per instruction
      Back-to-back L.LWZ : 3.0 cycles per 32-bit read
           L.LWZ + L.NOP : 15.9 cycles per 32-bit read

   == Code in SRAM A2 (I-cache OFF), data in SRAM A2 ==
      Instructions fetch : 3.0 cycles per instruction
      Back-to-back L.LWZ : 4.5 cycles per 32-bit read
           L.LWZ + L.NOP : 8.0 cycles per 32-bit read
   == Code in SRAM A2 (I-cache OFF), data in SRAM A1 ==
      Instructions fetch : 3.0 cycles per instruction
      Back-to-back L.LWZ : 16.0 cycles per 32-bit read
           L.LWZ + L.NOP : 19.3 cycles per 32-bit read

Set clock source to PLL6 (POSTDIV=2, DIV=0)...done
Test CLK freq...200 MHz
Set clock source to PLL6 (POSTDIV=2, DIV=1)...done
Test CLK freq...100 MHz
Setup PLL6 (M=2, K=1, N=24)...done
Test CLK freq...100 MHz
Setup PLL6 (M=1, K=2, N=24)...done
Test CLK freq...75 MHz
Setup PLL6 (M=1, K=1, N=12)...done
Test CLK freq...52 MHz
Restore clock config to original state...done
All tests completed!