pto.tmatmul_acc¶
pto.tmatmul_acc is part of the Matrix And Matrix Vector instruction set.
Summary¶
Matrix multiply with accumulator input (fused accumulate).
Mechanism¶
The instruction multiplies the left and right operand tiles and adds the product to an input accumulator tile in a single fused operation. It operates on tile payloads rather than scalar control state, and its legality is constrained by tile shape, layout, valid region, and target-profile support.
Let:
- M = aMatrix.GetValidRow()
- K = aMatrix.GetValidCol()
- N = bMatrix.GetValidCol()
For 0 <= i < M and 0 <= j < N:
\[ \mathrm{C1}_{i,j} = \mathrm{C0}_{i,j} + \sum_{k=0}^{K-1} \mathrm{A}_{i,k} \cdot \mathrm{B}_{k,j} \]
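The formula above can be checked against a scalar reference model. The following is a minimal host-side sketch in plain C++, not PTO code: the `Matrix` type and `matmul_acc_reference` function are hypothetical names introduced for illustration, and the sketch uses `float` throughout rather than the mixed input and accumulator types a real tile configuration would use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major host matrix; a stand-in for illustration, not a PTO tile.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    Matrix(std::size_t r, std::size_t c, float fill = 0.0f)
        : rows(r), cols(c), data(r * c, fill) {}

    float &at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Scalar reference: C1[i][j] = C0[i][j] + sum over k of A[i][k] * B[k][j].
Matrix matmul_acc_reference(const Matrix &c0, const Matrix &a, const Matrix &b) {
    const std::size_t M = a.rows;  // plays the role of aMatrix.GetValidRow()
    const std::size_t K = a.cols;  // plays the role of aMatrix.GetValidCol()
    const std::size_t N = b.cols;  // plays the role of bMatrix.GetValidCol()
    assert(b.rows == K);
    assert(c0.rows == M && c0.cols == N);

    Matrix c1 = c0;
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                sum += a.at(i, k) * b.at(k, j);
            }
            c1.at(i, j) = c0.at(i, j) + sum;
        }
    }
    return c1;
}
```

A model like this is mainly useful as an oracle when comparing device results on small shapes, with a tolerance chosen for the element types in use.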
Syntax¶
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Synchronous form:
%acc1 = tmatmul.acc %acc0, %a, %b : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
AS Level 1 (SSA)¶
%c_out = pto.tmatmul.acc %c_in, %a, %b : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
AS Level 2 (DPS)¶
pto.tmatmul.acc ins(%c_in, %a, %b : !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%c_out : !pto.tile_buf<...>)
IR Level 1 (SSA)¶
%c_out = pto.tmatmul.acc %c_in, %a, %b : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
IR Level 2 (DPS)¶
pto.tmatmul.acc ins(%c_in, %a, %b : !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%c_out : !pto.tile_buf<...>)
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
template <typename TileRes, typename TileLeft, typename TileRight, typename... WaitEvents>
PTO_INST RecordEvent TMATMUL_ACC(TileRes &cOutMatrix, TileRes &cInMatrix, TileLeft &aMatrix, TileRight &bMatrix, WaitEvents &... events);
template <AccPhase Phase, typename TileRes, typename TileLeft, typename TileRight, typename... WaitEvents>
PTO_INST RecordEvent TMATMUL_ACC(TileRes &cOutMatrix, TileRes &cInMatrix, TileLeft &aMatrix, TileRight &bMatrix, WaitEvents &... events);
template <AccPhase Phase = AccPhase::Unspecified, typename TileRes, typename TileLeft, typename TileRight,
typename... WaitEvents>
PTO_INST RecordEvent TMATMUL_ACC(TileRes &cMatrix, TileLeft &aMatrix, TileRight &bMatrix, WaitEvents &... events);
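The last overload takes a single cMatrix, which suggests that the same tile serves as both accumulator input and output. This is an assumption read off the signature; the page does not state it. A typical use of an accumulating multiply is to split a long K dimension into blocks and add the partial products into one accumulator. The sketch below models that pattern on the host in plain C++ with no PTO calls; `accumulate_k_block` and `matmul_k_blocked` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// In-place partial product: c += a[:, k0:k1] * b[k0:k1, :].
// a is M x K, b is K x N, c is M x N, all row-major.
void accumulate_k_block(std::vector<float> &c, const std::vector<float> &a,
                        const std::vector<float> &b, std::size_t M,
                        std::size_t K, std::size_t N, std::size_t k0,
                        std::size_t k1) {
    assert(k0 <= k1 && k1 <= K);
    assert(a.size() == M * K && b.size() == K * N && c.size() == M * N);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (std::size_t k = k0; k < k1; ++k) {
                sum += a[i * K + k] * b[k * N + j];
            }
            c[i * N + j] += sum;
        }
    }
}

// Full product assembled from K blocks of at most `block` columns each.
std::vector<float> matmul_k_blocked(const std::vector<float> &a,
                                    const std::vector<float> &b, std::size_t M,
                                    std::size_t K, std::size_t N,
                                    std::size_t block) {
    assert(block > 0);
    std::vector<float> c(M * N, 0.0f);  // accumulator starts at zero
    for (std::size_t k0 = 0; k0 < K; k0 += block) {
        const std::size_t k1 = std::min(k0 + block, K);
        accumulate_k_block(c, a, b, M, K, N, k0, k1);
    }
    return c;
}
```

For inputs where every partial sum is exactly representable, the result does not depend on the block size. With general floating-point data the summation order changes with the blocking, so results may differ in the last bits.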
Inputs¶
- cIn is the input accumulator tile.
- a is the left operand tile (must be TileLeft location).
- b is the right operand tile (must be TileRight location).
- dst names the output accumulator tile. The operation iterates over dst's valid region.
Expected Outputs¶
dst holds the accumulated matrix multiply result: dst[i,j] = cIn[i,j] + sum over k of a[i,k] * b[k,j].
Side Effects¶
The operation has no architectural side effects beyond producing the destination tile, and it does not implicitly fence unrelated traffic.
Constraints¶
- All constraints from TMATMUL apply to the (cOutMatrix, aMatrix, bMatrix) triple.
Exceptions¶
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions¶
- Implementation notes (A2A3/A5):
  - TMATMUL_ACC_IMPL uses aMatrix.GetValidRow(), aMatrix.GetValidCol(), and bMatrix.GetValidCol() for m/k/n.
  - cInMatrix is not validated by explicit assertions in the current implementations (target-defined behavior).
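Because the current implementations do not assert on cInMatrix, a caller may want to check the accumulator shape itself before issuing the instruction. The following is a minimal sketch of such a check in plain C++. `ValidRegion`, `MatmulDims`, `derive_dims`, and `acc_input_matches` are hypothetical names that are not part of the PTO headers, and the requirement that cIn's valid region equal m x n is an assumption drawn from the formula in the Mechanism section, not a documented rule.

```cpp
#include <cstddef>

// Hypothetical stand-in for a tile's valid region; not a PTO type.
struct ValidRegion {
    std::size_t rows;
    std::size_t cols;
};

// The m/k/n triple used by the multiply.
struct MatmulDims {
    std::size_t m;
    std::size_t k;
    std::size_t n;
};

// Mirrors the documented derivation: m and k come from A, n comes from B.
MatmulDims derive_dims(const ValidRegion &a, const ValidRegion &b) {
    return MatmulDims{a.rows, a.cols, b.cols};
}

// Caller-side check (assumed requirement): the input accumulator covers m x n.
bool acc_input_matches(const ValidRegion &c_in, const MatmulDims &dims) {
    return c_in.rows == dims.m && c_in.cols == dims.n;
}
```

Note that the derivation never reads B's valid row count, so a mismatch between A's valid columns and B's valid rows would also need a separate caller-side check if it matters for the program.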
Examples¶
Auto¶
#include <pto/pto-inst.hpp>

using namespace pto;

void example_auto() {
    using A = TileLeft<half, 16, 16>;
    using B = TileRight<half, 16, 16>;
    using C = TileAcc<float, 16, 16>;
    A a;
    B b;
    C c0, c1;
    TMATMUL_ACC(c1, c0, a, b);
}
Manual¶
#include <pto/pto-inst.hpp>

using namespace pto;

void example_manual() {
    using A = TileLeft<half, 16, 16>;
    using B = TileRight<half, 16, 16>;
    using C = TileAcc<float, 16, 16>;
    A a;
    B b;
    C c0, c1;
    TASSIGN(a, 0x1000);
    TASSIGN(b, 0x2000);
    TASSIGN(c0, 0x3000);
    TASSIGN(c1, 0x4000);
    TMATMUL_ACC(c1, c0, a, b);
}
Auto Mode¶
# Auto mode: compiler/runtime-managed placement and scheduling.
%c_out = pto.tmatmul.acc %c_in, %a, %b : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
Manual Mode¶
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%c_out = pto.tmatmul.acc %c_in, %a, %b : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
PTO Assembly Form¶
%acc1 = tmatmul.acc %acc0, %a, %b : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
# AS Level 2 (DPS)
pto.tmatmul.acc ins(%c_in, %a, %b : !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%c_out : !pto.tile_buf<...>)
Related Ops / Instruction Set Links¶
- Instruction set overview: Matrix And Matrix Vector
- Previous op in instruction set: pto.tmatmul
- Next op in instruction set: pto.tmatmul_bias