DMA Copy

These pto.* forms configure and execute scalar-side DMA movement between GM, UB, and L1. They are grouped with the scalar and control instructions because they describe DMA configuration and copy behavior rather than vector-register compute.

What This Instruction Set Covers

  • Grouped GM↔UB transfers with inline nburst(...) / loop(...) / pad(...) clauses
  • Grouped UB↔UB and UB→L1 copies
  • (Pre-v0.6) standalone loop-size and loop-stride configuration registers

v0.6 Grouped Transfer Ops

These are the four public grouped DMA interfaces in the PTO ISA v0.6 micro-instruction surface. Each instruction expresses its repetition structure via inline nburst(...) / loop(...) clauses on the op itself; standalone loop / stride configuration registers are no longer required.

  • pto.mte_gm_ub — GM → UB, with optional pad(...) for 32B-aligned row padding
  • pto.mte_ub_gm — UB → GM, strips padding added during load
  • pto.mte_ub_ub — intra-UB copy in 32B-unit bursts with gap fields
  • pto.mte_ub_l1 — UB → L1 (cube CBUF), 32B-unit bursts with gap fields
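The 32B granularity in the list above comes down to simple rounding arithmetic. The sketch below is plain Python, not PTO syntax. It shows how a row length could be rounded up to a 32-byte boundary and how many pad bytes that implies. The helper names padded_row_bytes and pad_bytes are hypothetical, and the exact padding rules of pad(...) are assumed here to be a plain round-up; check the per-op pages for the authoritative behavior.

```python
BLOCK_BYTES = 32  # the 32-byte unit mentioned in the op list above


def padded_row_bytes(row_bytes: int, block: int = BLOCK_BYTES) -> int:
    """Round a row length up to the next multiple of `block` bytes.

    Hypothetical helper: assumes padding is a plain round-up.
    """
    if row_bytes < 0:
        raise ValueError("row_bytes must be non-negative")
    # Ceiling division written with integer arithmetic only.
    return -(-row_bytes // block) * block


def pad_bytes(row_bytes: int, block: int = BLOCK_BYTES) -> int:
    """Number of pad bytes appended to reach the aligned length."""
    return padded_row_bytes(row_bytes, block) - row_bytes
```

Under this assumption a 100-byte row occupies 128 bytes in UB, with 28 pad bytes that a store back to GM would need to strip. A row that is already a multiple of 32 bytes needs no padding.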

Deprecated Pre-v0.6 Configuration Ops

These ops correspond to the older surface where loop counts and per-level strides were programmed via standalone configuration registers and then consumed by a separate copy op. In v0.6 the same information lives inline on the grouped transfer op (nburst(...) and outer loop(...) clauses). The pages below are retained for historical reference and pre-v0.6 ports.

The legacy execution ops pto.copy_gm_to_ubuf / pto.copy_ubuf_to_gm / pto.copy_ubuf_to_ubuf have been replaced by the v0.6 grouped forms pto.mte_gm_ub / pto.mte_ub_gm / pto.mte_ub_ub linked above. Their per-op pages (URL slugs preserved) now document the v0.6 surface.
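Whether loop counts and per-level strides live in standalone registers or inline on the op, they describe the same kind of strided repetition. The sketch below is a generic strided-DMA model in plain Python, not the PTO definition. It enumerates the byte offset at which each burst would start, given a burst count and stride plus optional outer loop levels. The function name burst_offsets, its parameters, and the assumption that every level contributes index * stride are illustrative only.

```python
from itertools import product


def burst_offsets(n_burst, burst_stride, loops=()):
    """Yield the start offset (in bytes) of every burst in a nested transfer.

    Generic model, not the PTO semantics:
      n_burst      -- number of bursts in the innermost level
      burst_stride -- byte distance between consecutive burst starts
      loops        -- (count, stride) pairs for outer levels, outermost first
    """
    counts = [count for count, _ in loops] + [n_burst]
    strides = [stride for _, stride in loops] + [burst_stride]
    # Iterate the loop nest with the outermost level varying slowest.
    for index in product(*(range(count) for count in counts)):
        yield sum(i * s for i, s in zip(index, strides))
```

For example, list(burst_offsets(2, 64, loops=[(2, 1024)])) yields the four offsets 0, 64, 1024, and 1088: two bursts 64 bytes apart, repeated twice at a 1024-byte outer stride. In the pre-v0.6 surface the counts and strides would have been written to configuration registers ahead of the copy op; in v0.6 the same values appear as clauses on the grouped op.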