pto.mad_mx_bias
pto.mad_mx_bias is part of the Cube MAD Ops.
Summary
Bias-init MX (microscaled) cube matrix multiply: dst[m, n] = mx_product[m, n] + bias[n].
See MX Matmul Model for how mx_product is computed by the per-K-group scaled multiply-accumulate.
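The Python sketch below is a plain-float reference model of the formula above, intended only to make the bias seed and the per-K-group scaling concrete. It assumes, following the OCP Microscaling (MX) convention, one shared power-of-two scale per row of %lhs and per column of %rhs for each K-group, and a default group size of 32. The real group size, scale encoding, accumulation precision, and rounding are defined by MX Matmul Model and pto.mad_mx; the function name and argument layout here are hypothetical.

```python
from typing import List


def mad_mx_bias_ref(a: List[List[float]], b: List[List[float]],
                    a_scale: List[List[int]], b_scale: List[List[int]],
                    bias: List[float], m: int, n: int, k: int,
                    group: int = 32) -> List[List[float]]:
    """Hypothetical reference model: dst[i][j] = bias[j] + mx_product[i][j].

    a:       m x k decoded element values of %lhs
    b:       k x n decoded element values of %rhs
    a_scale: m x (k // group) power-of-two exponents (assumed layout)
    b_scale: (k // group) x n power-of-two exponents (assumed layout)
    group:   K-group size; 32 is an assumption, not taken from this page
    """
    if k % group != 0:
        raise ValueError("k must be a multiple of the K-group size")
    groups = k // group
    dst = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = float(bias[j])  # bias-init seed instead of 0.0
            for g in range(groups):
                # Unscaled dot product over one K-group.
                partial = 0.0
                for kk in range(g * group, (g + 1) * group):
                    partial += a[i][kk] * b[kk][j]
                # Apply both shared scales to the group's partial sum.
                acc += partial * 2.0 ** (a_scale[i][g] + b_scale[g][j])
            dst[i][j] = acc
    return dst


# Tiny example: k=4 split into two groups of 2; the second %lhs group has scale 2^1.
print(mad_mx_bias_ref([[1, 2, 3, 4]], [[1], [1], [1], [1]],
                      [[0, 1]], [[0], [0]], [10.0], 1, 1, 4, group=2))
```

Here the first group contributes 3, the second contributes 7 x 2 = 14, and the bias seed adds 10, so the call prints `[[27.0]]`. Python floats are doubles, so this model does not reproduce hardware f32 accumulation rounding.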
Mechanism
Combines the MX scaling of pto.mad_mx with the bias-init seed of pto.mad_bias: for every row m, the accumulator for column n starts from bias[n] instead of zero.
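To contrast the seeds, the sketch below builds the initial M x N accumulator for the three MX forms listed under Related Ops. The zero and bias seeds follow this page; treating pto.mad_mx_acc as seeding from the existing %dst contents is an assumption based on its description as the accumulating form. The `Seed` enum and `initial_accumulator` helper are hypothetical names for illustration only.

```python
from enum import Enum
from typing import List, Optional


class Seed(Enum):
    ZERO = "pto.mad_mx"       # zero-init
    BIAS = "pto.mad_mx_bias"  # bias-init: row-broadcast of bias[0..n-1]
    ACC = "pto.mad_mx_acc"    # assumed: continue from the existing dst tile


def initial_accumulator(seed: Seed, m: int, n: int,
                        bias: Optional[List[float]] = None,
                        dst: Optional[List[List[float]]] = None) -> List[List[float]]:
    """Return the M x N values the MX product is added to (hypothetical model)."""
    if seed is Seed.ZERO:
        return [[0.0] * n for _ in range(m)]
    if seed is Seed.BIAS:
        if bias is None or len(bias) < n:
            raise ValueError("bias must supply at least n values")
        # Only the first n bias values are read; every row gets the same seed.
        return [[float(bias[j]) for j in range(n)] for _ in range(m)]
    if dst is None:
        raise ValueError("the accumulating form needs the existing dst tile")
    return [[float(dst[i][j]) for j in range(n)] for i in range(m)]


# Extra bias entries beyond n are ignored.
print(initial_accumulator(Seed.BIAS, 2, 3, bias=[1.0, 2.0, 3.0, 99.0]))
```

The call prints `[[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]`: each row of the tile starts from the same bias vector, which is what distinguishes the bias-init form from the other two.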
Syntax
pto.mad_mx_bias %lhs, %rhs, %dst, %bias, %m, %n, %k
unit_flag(check_only | check_and_set)?
disable_gemv?
(sat | nosat)?
n_dir?
: !pto.ptr<A, l0a>, !pto.ptr<B, l0b>, !pto.ptr<C, l0c>, !pto.ptr<C, bt>, i64, i64, i64
Inputs
Takes the same parameters as pto.mad_bias; in addition, %lhs and %rhs must satisfy the MX scale payload requirements of pto.mad_mx.
See MAD Common Clauses for the optional clauses (unit_flag, disable_gemv, sat / nosat, n_dir). The tf32_mode(...) clause is not accepted.
Expected Outputs
| Result | Type | Description |
|---|---|---|
| None | — | Writes the produced M x N MX-scaled tile to L0C with the bias-init seed. |
Side Effects
Engages the CUBE pipe; reads %bias from BT and MX scale payloads associated with %lhs / %rhs; writes to L0C.
Constraints
- All constraints from pto.mad_mx (MX dtype combination, scale payload prerequisites, K grouping rule).
- All %bias constraints from pto.mad_bias: %bias must be in bt space with an element type matching %dst; only N values are consumed.
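The address-space and %bias rules above can be expressed as a simple pre-flight check. This Python sketch is hypothetical tooling, not part of PTO: the `Ptr` record and `check_mad_mx_bias` function are invented for illustration. It checks only what this page states (operand spaces from the type signature, %bias element type matching %dst) plus one labeled assumption, that the K grouping rule of pto.mad_mx requires k to be a multiple of a group size of 32. The MX dtype-combination rule is not modeled because its allowed combinations are defined by pto.mad_mx.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Ptr:
    elem: str   # element type, e.g. "f32"
    space: str  # address space, e.g. "l0c", "bt"


def check_mad_mx_bias(lhs: Ptr, rhs: Ptr, dst: Ptr, bias: Ptr,
                      m: int, n: int, k: int, k_group: int = 32) -> List[str]:
    """Return a list of violated constraints (empty list means OK)."""
    errors = []
    # Operand spaces as given in the op's type signature.
    expected = {"lhs": (lhs, "l0a"), "rhs": (rhs, "l0b"),
                "dst": (dst, "l0c"), "bias": (bias, "bt")}
    for name, (ptr, space) in expected.items():
        if ptr.space != space:
            errors.append(f"%{name} must be in {space}, got {ptr.space}")
    # Inherited from pto.mad_bias: bias element type must match dst.
    if bias.elem != dst.elem:
        errors.append(f"%bias element type {bias.elem} must match %dst {dst.elem}")
    if min(m, n, k) <= 0:
        errors.append("m, n and k must be positive")
    elif k % k_group != 0:
        # Assumed form of the K grouping rule; see pto.mad_mx for the real one.
        errors.append(f"k={k} is not a multiple of the K-group size {k_group}")
    return errors


print(check_mad_mx_bias(Ptr("f8E4M3FN", "l0a"), Ptr("f8E4M3FN", "l0b"),
                        Ptr("f32", "l0c"), Ptr("f16", "bt"), 16, 16, 64))
```

The call reports a single error, because an f16 %bias does not match the f32 %dst.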
Examples
pto.mad_mx_bias %l0a, %l0b, %l0c, %bt, %c16_i64, %c16_i64, %c64_i64
: !pto.ptr<f8E4M3FN, l0a>, !pto.ptr<f8E4M3FN, l0b>, !pto.ptr<f32, l0c>, !pto.ptr<f32, bt>, i64, i64, i64
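For the example above (M = N = 16, K = 64, f32 accumulator and bias), the data touched by the bias seed and the output can be worked out directly. The helper below is a hypothetical illustration; the K-group count again assumes a group size of 32, which this page does not state.

```python
from typing import Dict

# Byte widths of candidate accumulator/bias element types. Which of these
# pto.mad_mx_bias accepts is governed by pto.mad_mx, not by this table.
ELEM_BYTES = {"f32": 4, "f16": 2}


def footprint(m: int, n: int, k: int, dst_elem: str = "f32",
              k_group: int = 32) -> Dict[str, int]:
    """Hypothetical per-instruction data footprint for pto.mad_mx_bias."""
    width = ELEM_BYTES[dst_elem]
    return {
        "bias_elems_read": n,               # only N bias values are consumed
        "bias_bytes_read": n * width,       # %bias shares %dst's element type
        "l0c_bytes_written": m * n * width,
        "k_groups": k // k_group,           # assumes k_group = 32 by default
    }


print(footprint(16, 16, 64))
```

This prints `{'bias_elems_read': 16, 'bias_bytes_read': 64, 'l0c_bytes_written': 1024, 'k_groups': 2}`: 16 f32 bias values (64 bytes) are read from BT, a 16 x 16 f32 tile (1024 bytes) is written to L0C, and each output element accumulates two scaled K-groups.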
Related Ops
- Zero-init MX form: pto.mad_mx
- Accumulating MX form: pto.mad_mx_acc
- Non-MX bias-init form: pto.mad_bias
- Bias staging: pto.mte_l1_bt
- MX scale loaders: pto.mte_l1_l0a_mx, pto.mte_l1_l0b_mx