pto.mte_l0c_l1
pto.mte_l0c_l1 is part of Cube Data Movement Ops. It is one of the three FIXPIPE writeback ops; see FIXPIPE Model for the shared writeback pipeline, layout modes, and clause semantics.
Summary
FIXPIPE writeback from L0C (`l0c`) to L1 (`l1`). Applies the optional pre-quant, pre-ReLU/clip, layout transform, outer-loop repeat, and saturation clauses in canonical order, then stores the converted result to L1.
Syntax
pto.mte_l0c_l1 %src, %dst, %m, %n, %src_stride, %dst_stride
[, unit_flag(check_only | check_and_clear)]?
[, pre_quant(%payload, mode = <quant_pre_mode>)]?
[, pre_relu([%payload, ]mode = <relu_pre_mode> [, clip = %clip])]?
[, nz2nd | nz2dn(%loop0_src_stride) | nz2nz(%split)?]
[, loop3(%count, %src_stride3, %dst_stride3)]?
[, sat | sat(preserve_nan) | nosat]?
: ...
Inputs
| Parameter | Width | Description |
|---|---|---|
| `%src` | buffer-like | Accumulator source in `l0c` |
| `%dst` | buffer-like | L1 destination in `l1` |
| `%m` | i64 | Logical M element count |
| `%n` | i64 | Logical N element count |
| `%src_stride` | i64 | Source stride in C0-size units (1 unit = 32 bytes) |
| `%dst_stride` | i64 | Destination stride in destination elements |
See FIXPIPE Common Clauses and FIXPIPE Layout Model for the optional clauses.
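The two stride operands use different units, which is easy to get wrong when computing byte offsets. The following Python sketch converts each stride to bytes. The helper names `src_stride_bytes` and `dst_stride_bytes` are hypothetical and not part of PTO, and the element width is passed in explicitly because this page does not list per-type sizes.

```python
# Illustrative helpers only; these names are not part of the PTO dialect.

C0_UNIT_BYTES = 32  # from the Inputs table: 1 C0-size unit = 32 bytes


def src_stride_bytes(src_stride_units: int) -> int:
    """Convert %src_stride (C0-size units) to bytes."""
    return src_stride_units * C0_UNIT_BYTES


def dst_stride_bytes(dst_stride_elems: int, dst_elem_bits: int) -> int:
    """Convert %dst_stride (destination elements) to bytes.

    The width is given in bits so that sub-byte destinations can be expressed.
    Rejecting strides that do not span a whole number of bytes is an
    assumption of this sketch, not a documented rule.
    """
    total_bits = dst_stride_elems * dst_elem_bits
    if total_bits % 8 != 0:
        raise ValueError("stride does not span a whole number of bytes")
    return total_bits // 8


print(src_stride_bytes(16))      # 512
print(dst_stride_bytes(32, 16))  # 64 (32 elements of a 16-bit type such as f16)
```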
Expected Outputs
| Result | Type | Description |
|---|---|---|
| None | — | Writes converted M x N result to L1. |
Side Effects
Reads L0C; writes L1. Engages the AIC FIXP pipe. Consumers in L1 must synchronize through pipe events.
Constraints
- Clauses must appear in canonical order: `unit_flag` → `pre_quant` → `pre_relu` → layout → `loop3` → `sat`/`nosat`.
- `pre_quant` requires payload and mode together.
- Vector `pre_quant` modes require a `fb` pointer with `f16`, `bf16`, or `f32` element type.
- Scalar `pre_quant` modes require an `f16`, `bf16`, or `f32` scalar payload.
- `pre_quant` source element type must be `f32` or `i32`, and the selected mode must be compatible with the source and destination element types.
- `no_relu` and `normal_relu` do not accept a payload.
- `scalar_relu` requires an `f16`/`bf16`/`f32` scalar payload.
- `vector_relu` requires a `fb` pointer with `f16`/`bf16`/`f32` element type.
- `clip` can appear only inside `pre_relu(...)`.
- `clip` is supported for destination `f16`, `ui8`, and signed/signless 4/8/16-bit integer destinations; payload must match the destination family.
- `nz2dn` requires `%loop0_src_stride`; `nz2nd` and `nz2nz` do not accept it.
- `unit_flag` must be omitted when `nz2dn(%loop0_src_stride)` uses a value other than 1.
- `nz2nz` requires `f32` destination element type and does not accept `loop3`.
- `sat`, `sat(preserve_nan)`, and `nosat` are mutually exclusive.
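A few of the structural constraints above (canonical clause order, one clause per slot, and `nz2nz` rejecting `loop3`) can be checked mechanically. The Python sketch below is a hypothetical linter over a list of clause names; it is not the dialect's verifier, it ignores payloads and element types, and the names `check_clauses` and `clause_slot` are invented for illustration.

```python
# Hypothetical clause linter; it is not the dialect's actual verifier.

CANONICAL_ORDER = ["unit_flag", "pre_quant", "pre_relu", "layout", "loop3", "sat"]
LAYOUT_CLAUSES = {"nz2nd", "nz2dn", "nz2nz"}
SAT_CLAUSES = {"sat", "sat(preserve_nan)", "nosat"}
PLAIN_CLAUSES = {"unit_flag", "pre_quant", "pre_relu", "loop3"}


def clause_slot(name: str) -> str:
    """Map a clause name to its slot in the canonical order."""
    if name in LAYOUT_CLAUSES:
        return "layout"
    if name in SAT_CLAUSES:
        return "sat"
    if name in PLAIN_CLAUSES:
        return name
    raise ValueError(f"unknown clause: {name}")


def check_clauses(clauses: list[str]) -> list[str]:
    """Return a list of violation messages; an empty list means no issue found."""
    errors = []
    slots = [clause_slot(c) for c in clauses]
    ranks = [CANONICAL_ORDER.index(s) for s in slots]
    if ranks != sorted(ranks):
        errors.append("clauses are not in canonical order")
    # Treating every slot as single-use is an assumption read off the syntax;
    # for the sat slot it matches the mutual-exclusion rule.
    for slot in sorted(set(slots)):
        if slots.count(slot) > 1:
            errors.append(f"more than one {slot} clause")
    if "nz2nz" in clauses and "loop3" in clauses:
        errors.append("nz2nz does not accept loop3")
    return errors


print(check_clauses(["pre_quant", "pre_relu", "nz2nd", "sat"]))  # []
print(check_clauses(["sat", "nz2nd"]))  # ['clauses are not in canonical order']
```

The checker returns messages instead of raising on the first problem, so a caller can report every violation for an op at once.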
Examples
pto.mte_l0c_l1 %l0c, %l1_out, %c16_i64, %c32_i64, %c16_i64, %c32_i64,
pre_quant(%c1_f32, mode = qf322f16_pre_scalar),
pre_relu(%c025_f32, mode = scalar_relu),
nz2nd,
sat
: !pto.ptr<f32, l0c>, !pto.ptr<f16, l1>, i64, i64, i64, i64, f32, f32
Related Ops
- FIXPIPE writeback siblings: pto.mte_l0c_gm, pto.mte_l0c_ub
- Parameter payload loader: pto.mte_l1_fb
- MAD producers: pto.mad and variants