pto.mte_gm_l1¶

pto.mte_gm_l1 is part of Cube Data Movement Ops.

Summary¶

Structured GM→L1 (cube CBUF) copy. Copies grouped byte ranges from %src in GM to %dst in L1 without performing any layout transform — the source bytes are written to L1 verbatim.

Use pto.mte_gm_l1_frac when the source is row-major ND data that needs ND→NZ fractal repack before it can serve as a cube operand.

Mechanism¶

Like the scalar pto.mte_gm_ub, this op uses the grouped nburst(...) [loop(...)]* model. For each nburst row, source and destination advance by src_stride / dst_stride. Optional outer loop(...) groups wrap the inner transfer.

Syntax¶

pto.mte_gm_l1 %src, %dst, %len_burst
  nburst(%count, %src_stride, %dst_stride)
  [loop(%count_i, %src_stride_i, %dst_stride_i)]*
  : !pto.ptr<T, gm>, !pto.ptr<T, l1>, i64, i64, i64, i64

Inputs¶

Parameter	Width	Description
`%src`	ptr	GM source base pointer
`%dst`	ptr	L1 destination base pointer (`!pto.ptr<T, l1>`)
`%len_burst`	i64	Bytes copied per burst row
`nburst(%count, %src_stride, %dst_stride)`	i64 triple	Innermost burst count and byte strides between row starts
`loop(%count_i, %src_stride_i, %dst_stride_i)`	i64 triple	Optional outer repetition; byte advances between enclosed patterns

Expected Outputs¶

Result	Type	Description
None	`—`	Writes data into the L1 destination region.

Side Effects¶

Reads GM-visible storage; writes L1-visible storage. Engages the AIC MTE2 pipe.

Constraints¶

Constraints

nburst(...) is required.
Each loop(...) group must provide all three operands.
All strides are bytes. For a contiguous 16-element f16 vector, use %len_burst = 32.

Examples¶

pto.mte_gm_l1 %bias_gm, %l1_bias, %c32_i64
  nburst(%c4_i64, %c64_i64, %c32_i64)
  : !pto.ptr<f16, gm>, !pto.ptr<f16, l1>, i64, i64, i64, i64

ND→NZ repack: pto.mte_gm_l1_frac
L1 → UB: pto.mte_l1_ub
L1 → cube operand tiles: pto.mte_l1_l0a, pto.mte_l1_l0b