pto.mad
pto.mad is part of the Cube MAD Ops.
Summary
Zero-init cube matrix multiply: dst[m, n] = sum_k(lhs[m, k] * rhs[k, n]).
Mechanism
Reads tiled operands from L0A and L0B, multiplies them in the cube MMAD pipe, and writes the accumulator tile to L0C. The result overwrites L0C: prior L0C contents are not accumulated. Use pto.mad_acc to accumulate onto existing L0C state, or pto.mad_bias to initialize the accumulator from a bias.
The matrix element types are inferred from %lhs, %rhs, and %dst pointer element types — there is no separate type selector. Unsupported type combinations are invalid programs.
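The overwrite semantics can be modeled in a few lines of plain Python. This is a minimal reference sketch, not a description of the hardware: the names `mad_reference` and `mad_acc_reference` are hypothetical, tiles are modeled as row-major nested lists rather than the L0A/L0B/L0C tile organizations, and arithmetic uses Python floats instead of the op's element types.

```python
def mad_reference(lhs, rhs, dst, m, n, k):
    """Zero-init model: dst[i][j] = sum over p of lhs[i][p] * rhs[p][j].

    Prior contents of dst are ignored and overwritten.
    """
    for i in range(m):
        for j in range(n):
            acc = 0.0  # accumulator starts from zero, not from dst[i][j]
            for p in range(k):
                acc += lhs[i][p] * rhs[p][j]
            dst[i][j] = acc
    return dst


def mad_acc_reference(lhs, rhs, dst, m, n, k):
    """Accumulating model, for contrast: adds the product onto prior dst."""
    for i in range(m):
        for j in range(n):
            acc = dst[i][j]  # accumulator starts from the prior contents
            for p in range(k):
                acc += lhs[i][p] * rhs[p][j]
            dst[i][j] = acc
    return dst


lhs = [[1.0, 2.0], [3.0, 4.0]]            # logical M x K = 2 x 2
rhs = [[5.0, 6.0], [7.0, 8.0]]            # logical K x N = 2 x 2
stale = [[100.0, 100.0], [100.0, 100.0]]  # prior accumulator contents

overwritten = mad_reference(lhs, rhs, [row[:] for row in stale], 2, 2, 2)
accumulated = mad_acc_reference(lhs, rhs, [row[:] for row in stale], 2, 2, 2)
print(overwritten)   # [[19.0, 22.0], [43.0, 50.0]]
print(accumulated)   # [[119.0, 122.0], [143.0, 150.0]]
```

The stale values of 100.0 vanish in the zero-init model and survive in the accumulating one, which is the only difference between the two functions.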
Syntax
pto.mad %lhs, %rhs, %dst, %m, %n, %k
    unit_flag(check_only | check_and_set)?
    disable_gemv?
    (sat | nosat)?
    tf32_mode(round_even | round_away)?
    n_dir?
    : !pto.ptr<A, l0a>, !pto.ptr<B, l0b>, !pto.ptr<C, l0c>, i64, i64, i64
Inputs
| Parameter | Type | Description |
|---|---|---|
| %lhs | !pto.ptr<A, l0a> | Left operand tile in L0A, interpreted as logical M x K |
| %rhs | !pto.ptr<B, l0b> | Right operand tile in L0B, interpreted as logical K x N |
| %dst | !pto.ptr<C, l0c> | Accumulator destination tile in L0C, interpreted as logical M x N |
| %m | i64 | Logical M element count |
| %n | i64 | Logical N element count |
| %k | i64 | Logical K element count |

See MAD Common Clauses for the optional clauses.
Expected Outputs
| Result | Type | Description |
|---|---|---|
| None | — | Writes the produced M x N tile to L0C. No SSA result. |
Side Effects
Engages the CUBE pipe and writes to L0C. Downstream FIXPIPE consumers must synchronize through pto.set_flag / pto.wait_flag (PIPE_CUBE → PIPE_FIXP).
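The producer/consumer ordering this requires can be illustrated by analogy with host threads. The sketch below is only an analogy in plain Python: `threading.Event` stands in for the PIPE_CUBE → PIPE_FIXP flag, the dictionary stands in for L0C, and none of the names correspond to real pto operands, which this section does not specify.

```python
import threading

l0c = {}                        # stand-in for the L0C accumulator tile
cube_done = threading.Event()   # stand-in for the PIPE_CUBE -> PIPE_FIXP flag
fixpipe_seen = []               # what the consumer observed


def cube_producer():
    # Model of the MAD: write the tile, then publish it (set_flag analogue).
    l0c["tile"] = [[19.0, 22.0], [43.0, 50.0]]
    cube_done.set()


def fixpipe_consumer():
    # wait_flag analogue: block until the producer has published the tile.
    cube_done.wait()
    fixpipe_seen.append(l0c["tile"])


consumer = threading.Thread(target=fixpipe_consumer)
producer = threading.Thread(target=cube_producer)
consumer.start()   # started first on purpose; it still waits for the flag
producer.start()
consumer.join()
producer.join()
print(fixpipe_seen[0])
```

Without the wait, the consumer could read L0C before the tile has been written, which is the hazard the flag pair is there to prevent.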
Constraints
- %lhs, %rhs, and %dst must be in l0a, l0b, and l0c.
- %m, %n, and %k must be positive and satisfy the target shape limits for the selected element-type combination.
- tf32_mode(...) requires f32 lhs, rhs, and dst element types.
- sat / nosat requires a floating element-type combination.
- Packed 4-bit integer data requires %k to select an even number of K elements.
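The constraints above can be expressed as a small checker. This is a hedged sketch in Python, not the dialect's verifier: `check_mad` is a hypothetical helper, the type spellings `bf16`, `i4`, and `i32` are assumptions (only `f16` and `f32` appear in this page), "floating element-type combination" is read here as all three types being floating, MX forms are not modeled, and target shape limits are left out because they are target-specific.

```python
FLOAT_TYPES = {"f16", "bf16", "f32"}   # assumed spellings, not exhaustive
PACKED_INT4_TYPES = {"i4"}             # assumed spelling for packed 4-bit data


def check_mad(lhs, rhs, dst, m, n, k, tf32_mode=None, sat=None):
    """Return a list of violated constraints (empty means none found).

    lhs, rhs, dst are (element_type, address_space) pairs.
    Target shape limits are not modeled.
    """
    errors = []
    for name, (_, space), want in (("%lhs", lhs, "l0a"),
                                   ("%rhs", rhs, "l0b"),
                                   ("%dst", dst, "l0c")):
        if space != want:
            errors.append(f"{name} must be in {want}, got {space}")
    for name, value in (("%m", m), ("%n", n), ("%k", k)):
        if value <= 0:
            errors.append(f"{name} must be positive, got {value}")
    elem_types = (lhs[0], rhs[0], dst[0])
    if tf32_mode is not None and any(t != "f32" for t in elem_types):
        errors.append("tf32_mode requires f32 lhs, rhs, and dst")
    if sat is not None and not all(t in FLOAT_TYPES for t in elem_types):
        errors.append("sat/nosat requires a floating element-type combination")
    if (lhs[0] in PACKED_INT4_TYPES or rhs[0] in PACKED_INT4_TYPES) and k % 2:
        errors.append("packed 4-bit data requires an even %k")
    return errors


ok = check_mad(("f16", "l0a"), ("f16", "l0b"), ("f32", "l0c"), 16, 16, 32)
bad = check_mad(("f16", "l0a"), ("f16", "l0b"), ("f32", "l0c"), 16, 16, 32,
                tf32_mode="round_even")
odd_k = check_mad(("i4", "l0a"), ("i4", "l0b"), ("i32", "l0c"), 16, 16, 31)
print(ok)      # []
print(bad)     # ['tf32_mode requires f32 lhs, rhs, and dst']
print(odd_k)   # ['packed 4-bit data requires an even %k']
```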
MAD Common Clauses
| Clause | Values | Effect |
|---|---|---|
| unit_flag(...) | check_only, check_and_set | Participates in producer-side tile synchronization. check_only checks that the producer slot can be used. check_and_set also publishes the produced %dst tile for later consumers. Omit when the schedule does not use unit flags for this tile. |
| disable_gemv | flag | Applies only when %m = 1. Omitted means GEMV A-vector consumption: %lhs must contain the logical 1 x K row in the target GEMV left-tile organization. Present means normal matmul left-tile organization. The mathematical result is still lhs @ rhs; only the required %lhs organization changes. For %m != 1, normal matmul organization is used. |
| sat / nosat | flags | Floating exceptional-value mode for floating and MX MAD forms. With sat, exceptional multiply inputs are normalized before arithmetic (+/-inf to finite type extrema, nan to 0) and finite overflow saturates to the finite type range. With nosat, exceptional inputs are preserved and overflow may produce exceptional outputs. Omit both to use the execution mode selected outside this op. Integer MAD forms do not accept these flags. |
| tf32_mode(...) | round_even, round_away | Valid only for non-MX f32 x f32 -> f32. FP32 inputs are rounded to TF32 precision before multiplication; accumulation and output remain FP32. |
| n_dir | flag | Requests N-direction result production order for schedules that combine compute with unit flags and later layout movement. It does not change dst[m, n]. |
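The input conditioning described for sat and tf32_mode(...) can be modeled on the host. The sketch below is a Python approximation under stated assumptions, not the hardware definition: `sat_normalize` and `round_to_tf32` are hypothetical names, the default finite range is that of IEEE 754 binary16, TF32 is taken to keep 10 explicit mantissa bits, and round_away is assumed to mean round-to-nearest with ties away from zero.

```python
import math
import struct

F16_MAX = 65504.0  # largest finite IEEE 754 binary16 value


def sat_normalize(x, max_finite=F16_MAX):
    """sat model: nan -> 0, +/-inf and finite overflow -> finite extrema."""
    if math.isnan(x):
        return 0.0
    return max(-max_finite, min(max_finite, x))


def round_to_tf32(x, mode):
    """Round an f32-representable value to TF32 precision."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000:
        return x  # inf and nan pass through unchanged in this sketch
    dropped = bits & 0x1FFF   # the 13 low mantissa bits that TF32 lacks
    kept = bits & 0xFFFFE000
    half = 0x1000
    if mode == "round_even":
        kept_lsb = (bits >> 13) & 1
        round_up = dropped > half or (dropped == half and kept_lsb == 1)
    elif mode == "round_away":
        round_up = dropped >= half  # assumed: ties go away from zero
    else:
        raise ValueError(f"unknown tf32 mode: {mode}")
    if round_up:
        kept += 0x2000  # carries into the exponent if the mantissa wraps
    return struct.unpack("<f", struct.pack("<I", kept & 0xFFFFFFFF))[0]


tie = 1.0 + 2.0 ** -11  # exactly halfway between two TF32 neighbors
print(sat_normalize(float("inf")))        # 65504.0
print(sat_normalize(float("nan")))        # 0.0
print(round_to_tf32(tie, "round_even"))   # 1.0
print(round_to_tf32(tie, "round_away"))   # 1.0009765625
```

The two rounding modes differ only on exact ties, so the test value is chosen to sit halfway between 1.0 and the next TF32 value, 1 + 2^-10.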
Examples
pto.mad %l0a, %l0b, %l0c, %c16_i64, %c16_i64, %c32_i64
    : !pto.ptr<f16, l0a>, !pto.ptr<f16, l0b>, !pto.ptr<f32, l0c>, i64, i64, i64
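For a sense of scale, the logical footprint of this example can be worked out from the operand shapes. The Python sketch below uses a hypothetical helper, `logical_tile_bytes`, and counts logical elements only; it assumes 2-byte f16 and 4-byte f32 elements and ignores any padding or alignment the target tile organization may add.

```python
ELEM_BYTES = {"f16": 2, "f32": 4}  # element sizes in bytes


def logical_tile_bytes(m, n, k, lhs_type, rhs_type, dst_type):
    """Logical sizes for an M x K by K x N multiply into an M x N tile."""
    return {
        "lhs": m * k * ELEM_BYTES[lhs_type],   # M x K operand
        "rhs": k * n * ELEM_BYTES[rhs_type],   # K x N operand
        "dst": m * n * ELEM_BYTES[dst_type],   # M x N accumulator
        "macs": m * n * k,                     # multiply-accumulate count
    }


sizes = logical_tile_bytes(16, 16, 32, "f16", "f16", "f32")
print(sizes)
# {'lhs': 1024, 'rhs': 1024, 'dst': 1024, 'macs': 8192}
```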
Related Ops
- Accumulating form: pto.mad_acc
- Bias-init form: pto.mad_bias
- MX variants: pto.mad_mx, pto.mad_mx_acc, pto.mad_mx_bias
- Operand staging: pto.mte_l1_l0a, pto.mte_l1_l0b
- Result writeback: FIXPIPE Model