pto.mad_mx_acc¶
pto.mad_mx_acc is part of the Cube MAD Ops.
Summary¶
Accumulating MX (microscaled) cube matrix multiply: dst[m, n] = dst[m, n] + mx_product[m, n].
See MX Matmul Model for the per-K-group scaled multiply-accumulate.
Mechanism¶
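Behaves like pto.mad_mx, except that it adds the MX-scaled product to the existing L0C state instead of starting from zero. The typical use is K-axis tiling for MX GEMM, where every K tile after the first accumulates into the same %dst.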
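The accumulation pattern can be sketched with a small reference model in plain Python. This is only an arithmetic illustration, not the hardware behavior: the group size of 32, the explicit `a_scale`/`b_scale` arguments, and the helper names `mx_product` and `k_slice` are all assumptions made for the example (the real op takes no scale operands in its syntax, and the actual scaling rules are defined by the MX Matmul Model).

```python
# Reference model only: Python floats stand in for L0A/L0B/L0C contents.
GROUP = 32  # ASSUMED MX group size along K; not taken from this page


def mx_product(a, a_scale, b, b_scale, m, n, k):
    """Hypothetical helper: M x N product with one scale pair per K group."""
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for g in range(k // GROUP):
                ks = range(g * GROUP, (g + 1) * GROUP)
                partial = sum(a[i][kk] * b[kk][j] for kk in ks)
                out[i][j] += a_scale[i][g] * b_scale[g][j] * partial
    return out


def mad_mx(a, a_scale, b, b_scale, dst, m, n, k):
    """Model of the zero-init form: dst is overwritten with the product."""
    prod = mx_product(a, a_scale, b, b_scale, m, n, k)
    for i in range(m):
        for j in range(n):
            dst[i][j] = prod[i][j]


def mad_mx_acc(a, a_scale, b, b_scale, dst, m, n, k):
    """Model of the accumulating form: dst[m, n] += mx_product[m, n]."""
    prod = mx_product(a, a_scale, b, b_scale, m, n, k)
    for i in range(m):
        for j in range(n):
            dst[i][j] += prod[i][j]


M, N, K, K_TILE = 4, 4, 128, 64
# Small integers and power-of-two scales keep every sum exact in floating point.
a = [[float((i + kk) % 3 - 1) for kk in range(K)] for i in range(M)]
b = [[float((kk + 2 * j) % 5 - 2) for j in range(N)] for kk in range(K)]
a_scale = [[2.0 ** ((i + g) % 3 - 1) for g in range(K // GROUP)] for i in range(M)]
b_scale = [[2.0 ** ((g + j) % 2) for j in range(N)] for g in range(K // GROUP)]


def k_slice(k0):
    """Hypothetical helper: operands and scales for the K tile starting at k0."""
    g0, g1 = k0 // GROUP, (k0 + K_TILE) // GROUP
    return (
        [row[k0:k0 + K_TILE] for row in a],
        [row[g0:g1] for row in a_scale],
        b[k0:k0 + K_TILE],
        b_scale[g0:g1],
    )


# The accumulator starts with stale contents; the first tile must initialize it.
tiled = [[float("nan")] * N for _ in range(M)]
for step, k0 in enumerate(range(0, K, K_TILE)):
    op = mad_mx if step == 0 else mad_mx_acc
    op(*k_slice(k0), tiled, M, N, K_TILE)

# Single-shot product over the full K extent, for comparison.
full = mx_product(a, a_scale, b, b_scale, M, N, K)
```

In this model the tiled result matches the single-shot product because the first tile overwrites the accumulator and the remaining tiles add to it. Replacing the first `mad_mx` with `mad_mx_acc` would instead propagate the stale contents into the result.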
Syntax¶
pto.mad_mx_acc %lhs, %rhs, %dst, %m, %n, %k
unit_flag(check_only | check_and_set)?
disable_gemv?
(sat | nosat)?
n_dir?
: !pto.ptr<A, l0a>, !pto.ptr<B, l0b>, !pto.ptr<C, l0c>, i64, i64, i64
Inputs¶
Same parameter shape as pto.mad_mx.
See MAD Common Clauses for the optional clauses. tf32_mode(...) is not accepted.
Expected Outputs¶
| Result | Type | Description |
|---|---|---|
| None | — | Updates the existing M x N tile in L0C with MX-scaled accumulation. |
Side Effects¶
Same as pto.mad_mx. The caller is responsible for ensuring the L0C tile has been initialized (typically by an initial pto.mad_mx or pto.mad_mx_bias on the same %dst).
Constraints¶
Same as pto.mad_mx.
Examples¶
pto.mad_mx_acc %l0a, %l0b, %l0c, %c16_i64, %c16_i64, %c64_i64
: !pto.ptr<f8E4M3FN, l0a>, !pto.ptr<f8E4M3FN, l0b>, !pto.ptr<f32, l0c>, i64, i64, i64
Related Ops¶
- Zero-init MX form: pto.mad_mx
- Bias-init MX form: pto.mad_mx_bias
- Non-MX accumulating form: pto.mad_acc