pto.mte_gm_l1
pto.mte_gm_l1 is part of Cube Data Movement Ops.
Summary
Structured GM→L1 (cube CBUF) copy. Copies grouped byte ranges from %src in GM to %dst in L1 without any layout transform: the source bytes are written to L1 verbatim.
Use pto.mte_gm_l1_frac instead when the source is row-major ND data that needs an ND→NZ fractal repack before it can serve as a cube operand.
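Mechanism
Like the scalar pto.mte_gm_ub, this op uses the grouped nburst(...) [loop(...)]* model. The nburst(...) group issues %count burst rows of %len_burst bytes each. Between consecutive rows, the source address advances by %src_stride bytes and the destination address advances by %dst_stride bytes. Each optional outer loop(...) group wraps the pattern inside it, repeating it %count_i times and advancing the source and destination by %src_stride_i and %dst_stride_i bytes per repetition.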
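The addressing model above can be sketched as a small Python helper. It lists the byte offset, relative to %src and %dst, of every burst row. The helper name burst_offsets is hypothetical and not a PTO API. The nesting order for several loop(...) groups is an assumption, because the text does not state it: the first group wraps nburst directly, and each later group wraps the previous one.

```python
from itertools import product

def burst_offsets(len_burst, nburst, loops=()):
    """List (src_offset, dst_offset, nbytes) for every burst row, as byte
    offsets relative to %src and %dst.

    nburst and each loop entry are (count, src_stride, dst_stride) triples.
    ASSUMPTION: loops[0] wraps nburst directly, loops[1] wraps loops[0], etc.
    """
    levels = [nburst, *loops]  # innermost level first
    rows = []
    # product() varies its last iterable fastest, so feed the levels outermost-first.
    for idx in product(*(range(count) for count, _, _ in reversed(levels))):
        src = dst = 0
        # Re-pair each index with its level (innermost first) and accumulate strides.
        for i, (_, src_stride, dst_stride) in zip(reversed(idx), levels):
            src += i * src_stride
            dst += i * dst_stride
        rows.append((src, dst, len_burst))
    return rows

print(burst_offsets(32, (4, 64, 32)))
# [(0, 0, 32), (64, 32, 32), (128, 64, 32), (192, 96, 32)]
```

Under this model, each burst row reads and writes the same number of bytes. Only the row start addresses differ between GM and L1, as set by the two stride columns.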
Syntax
```
pto.mte_gm_l1 %src, %dst, %len_burst
  nburst(%count, %src_stride, %dst_stride)
  [loop(%count_i, %src_stride_i, %dst_stride_i)]*
  : !pto.ptr<T, gm>, !pto.ptr<T, l1>, i64, i64, i64, i64
```
The trailing type list shown covers %src, %dst, %len_burst, and the three nburst operands.
Inputs
| Parameter | Type | Description |
|---|---|---|
| %src | ptr | GM source base pointer (!pto.ptr<T, gm>) |
| %dst | ptr | L1 destination base pointer (!pto.ptr<T, l1>) |
| %len_burst | i64 | Bytes copied per burst row |
| nburst(%count, %src_stride, %dst_stride) | i64 triple | Innermost burst count and byte strides between row starts |
| loop(%count_i, %src_stride_i, %dst_stride_i) | i64 triple | Optional outer repetition; byte advances between enclosed patterns |
Expected Outputs
| Result | Type | Description |
|---|---|---|
| None | n/a | No SSA result; the op writes data into the L1 destination region. |
Side Effects
Reads GM-visible storage; writes L1-visible storage. Engages the AIC MTE2 pipe.
Constraints
- nburst(...) is required.
- Each loop(...) group must provide all three operands.
- %len_burst and all strides are byte counts. For example, a contiguous 16-element f16 vector occupies 16 × 2 = 32 bytes, so use %len_burst = 32.
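Every length and stride is a byte count, so element counts must be scaled by the element size first. The Python helpers below are a minimal sketch of that conversion. They also restate the two operand-shape rules above as checks. Both helpers (to_bytes, check_groups) and the ELEM_BYTES table are illustrative only and not part of PTO.

```python
# Hypothetical element-size table used only for this illustration.
ELEM_BYTES = {"f16": 2, "f32": 4}

def to_bytes(num_elems, dtype):
    """Convert an element count (or an element stride) to bytes."""
    return num_elems * ELEM_BYTES[dtype]

def check_groups(nburst, loops=()):
    """Hypothetical check mirroring the constraints: nburst(...) is required
    and every group is a full (count, src_stride, dst_stride) triple."""
    if nburst is None:
        raise ValueError("nburst(...) is required")
    for group in (nburst, *loops):
        if len(group) != 3:
            raise ValueError(f"expected (count, src_stride, dst_stride), got {group!r}")

len_burst = to_bytes(16, "f16")                     # 16 f16 elements -> 32 bytes
check_groups((4, to_bytes(32, "f16"), len_burst))   # strides of 64 and 32 bytes
print(len_burst)  # 32
```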
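Examples
The following copies four 32-byte rows (16 f16 elements each) whose starts are 64 bytes apart in GM. It packs them back to back in L1, with a destination stride of 32 bytes:
```
pto.mte_gm_l1 %bias_gm, %l1_bias, %c32_i64
  nburst(%c4_i64, %c64_i64, %c32_i64)
  : !pto.ptr<f16, gm>, !pto.ptr<f16, l1>, i64, i64, i64, i64
```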
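The following hypothetical Python reference model makes the verbatim-copy semantics concrete. The model, mte_gm_l1_model, is not part of PTO. It treats GM and L1 as flat bytearrays and the base pointers as byte offsets into them, and it replays the example above on arbitrary byte contents. For loop(...) groups, it assumes that the first group wraps nburst directly and that each later group wraps the previous one. The example uses no loops, so that assumption does not affect it.

```python
def mte_gm_l1_model(gm, l1, src, dst, len_burst, nburst, loops=()):
    """Hypothetical byte-level model of pto.mte_gm_l1 (a sketch, not the hardware).

    gm, l1        : bytearrays standing in for GM and L1
    src, dst      : base byte offsets into gm and l1
    nburst, loops : (count, src_stride, dst_stride) triples, all in bytes
    ASSUMPTION: loops[0] wraps nburst, loops[1] wraps loops[0], and so on.
    """
    levels = [nburst, *loops]  # innermost level first

    def walk(level, s, d):
        count, s_stride, d_stride = levels[level]
        for i in range(count):
            s_i, d_i = s + i * s_stride, d + i * d_stride
            if level == 0:
                # One burst row: copy len_burst bytes verbatim, no layout transform.
                l1[d_i:d_i + len_burst] = gm[s_i:s_i + len_burst]
            else:
                walk(level - 1, s_i, d_i)

    walk(len(levels) - 1, src, dst)


# Replay of the example: 4 rows of 32 bytes, GM row stride 64, L1 row stride 32.
gm = bytearray(range(256))  # byte k of GM holds the value k
l1 = bytearray(128)
mte_gm_l1_model(gm, l1, src=0, dst=0, len_burst=32, nburst=(4, 64, 32))
expected = gm[0:32] + gm[64:96] + gm[128:160] + gm[192:224]
print(l1 == expected)  # True
```

The GM bytes between the rows (for example GM[32:64]) are skipped. The selected rows land in L1 in order and unchanged.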
Related Ops
- ND→NZ repack: pto.mte_gm_l1_frac
- L1 → UB: pto.mte_l1_ub
- L1 → cube operand tiles: pto.mte_l1_l0a, pto.mte_l1_l0b