Cube Buffer Hierarchy

The cube core (AIC) operates on a dedicated buffer hierarchy distinct from the Unified Buffer (UB) that Vector blocks use. Cube operands move through L1 (cbuf) → L0A / L0BL0C → writeback, with optional BT (bias table) and FB (FIXPIPE buffer) helpers.

Address Spaces

Space Role Layout Typical Producer Typical Consumer
gm Global Memory (off-chip HBM/DDR) ND row-major host / kernel DMA loaders
l1 Cube CBUF, ~1 MB on-chip NZ fractal pto.mte_gm_l1, pto.mte_gm_l1_frac, pto.mte_ub_l1 pto.mte_l1_l0a, pto.mte_l1_l0b, pto.mte_l1_ub, pto.mte_l1_bt
l0a Cube left-operand scratchpad FRACTAL_NZ (A5) / FRACTAL_ZZ (A3) pto.mte_l1_l0a pto.mad*
l0b Cube right-operand scratchpad FRACTAL_ZN (K innermost) pto.mte_l1_l0b pto.mad*
l0c Cube accumulator FRACTAL_NZ output of MMAD pto.mad* FIXPIPE writeback (pto.mte_l0c_*)
bt Bias Table element-type-matched vector pto.mte_l1_bt pto.mad_bias, pto.mad_mx_bias
fb FIXPIPE auxiliary buffer implementation-defined pto.mte_l1_fb FIXPIPE writeback ops
ub Vector Unified Buffer ND DMA loaders vector pipe

See NZ Fractal Layout for the precise per-buffer NZ index orders.

Data-Flow Contract

                +----------------- AIC issue queues -----------------+
                |    MTE2     MTE1    CUBE (MMAD)    FIXP            |
                |     |         |        |             |             |
GM (ND) --- pto.mte_gm_l1 / pto.mte_gm_l1_frac        |             |
              |   |                                                  |
              v   v                                                  |
              L1 (NZ) <-- pto.mte_ub_l1 --- UB                       |
              |                                                      |
       +------+-----+---------------------+                          |
       |            |                     |                          |
   mte_l1_l0a   mte_l1_l0b           mte_l1_bt / mte_l1_fb            |
       |            |                     |                          |
       v            v                     |                          |
      L0A          L0B                    |                          |
       |            |                     |                          |
       +-----+------+                     |                          |
             |                            |                          |
             |     pto.mad / pto.mad_acc / pto.mad_bias / *_mx*       |
             |     <----------------------+                          |
             v                                                       |
            L0C                                                      |
             |                                                       |
             +-- pto.mte_l0c_l1 / pto.mte_l0c_gm / pto.mte_l0c_ub ---+
                  (FIXPIPE writeback)

Alignment and Sizing Conventions

  • All cube buffer pointers (L1 / L0A / L0B / L0C / BT / FB) are 32-byte aligned.
  • L0A and L0B fractal tiles are 512B (one 32B-wide × 16-row block in the appropriate inner orientation).
  • L0C accumulator tiles use the N1 M1 M0 N0 order so that FIXPIPE can stream out one M-row of results at a time.
  • Element-type-derived inner widths (K0 = N0 = C0 / sizeof(T)) follow NZ Fractal Layout.

Synchronization

The cube programs are issued from the AIC's Scalar Unit (SU) into the MTE2 / MTE1 / CUBE / FIXP issue queues. Synchronization with the Vector blocks happens through the System Controller (SC) semaphores and the dedicated 1:2 fixpipe broadcast path. See:

  • Pipeline Synchronization for the intra-block (pto.set_flag / pto.wait_flag) primitives that order MTE2 → MTE1 → CUBE → FIXP within the AIC.
  • Cluster Programming Model for inter-block (pto.set_intra_block / pto.wait_intra_core) primitives used between AIC and AIV.