pto.tgemv
pto.tgemv is part of the Matrix And Matrix Vector instruction set.
Summary
General Matrix-Vector multiplication producing an accumulator/output tile.
Mechanism
General Matrix-Vector multiplication (GEMV) producing an accumulator/output tile. It operates on tile payloads rather than scalar control state, and its legality is constrained by tile shape, layout, valid-region, and target-profile support.
Let $M = 1$, $K = \texttt{bMatrix.GetValidRow()}$, and $N = \texttt{bMatrix.GetValidCol()}$.
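Here $A$, $B$, and $C$ denote aMatrix, bMatrix, and the destination tile; $C^{\text{in}}$ and $C^{\text{out}}$ are cInMatrix and cOutMatrix, and $\mathrm{bias}$ is biasData.
1. TGEMV (Tile-based GEMV)
For 0 <= j < N (output elements in the effective matmul domain):
$$C_{0,j} = \sum_{k=0}^{K-1} A_{0,k}\, B_{k,j}$$
2. TGEMV_ACC (Tile-based GEMV with Accumulation)
For 0 <= j < N (accumulates into the existing tile):
$$C^{\text{out}}_{0,j} = C^{\text{in}}_{0,j} + \sum_{k=0}^{K-1} A_{0,k}\, B_{k,j}$$
3. TGEMV_BIAS (Tile-based GEMV with Bias)
For 0 <= j < N (adds a bias term to the matrix product):
$$C_{0,j} = \sum_{k=0}^{K-1} A_{0,k}\, B_{k,j} + \mathrm{bias}_{0,j}$$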
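The three variants can be checked against a plain scalar model. The sketch below is a hypothetical host-side reference: the names `ref_tgemv`, `ref_tgemv_acc`, and `ref_tgemv_bias` are illustrative and not part of the PTO API. It stores B row-major in a flat `std::array`, ignores tile layouts and fractals, and accumulates directly in the accumulator type `Acc`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical host-side reference model of the GEMV update rules (not the PTO API).
// a: the 1 x K left row vector; b: the K x N right matrix stored row-major (b[k * N + j]);
// results and bias: 1 x N rows. Acc is the accumulator type, In the input element type.
template <typename Acc, typename In, std::size_t K, std::size_t N>
std::array<Acc, N> ref_tgemv(const std::array<In, K>& a,
                             const std::array<In, K * N>& b) {
    std::array<Acc, N> c{};
    for (std::size_t j = 0; j < N; ++j) {
        Acc sum = 0;
        // C[0, j] = sum over k of A[0, k] * B[k, j], accumulated in Acc.
        for (std::size_t k = 0; k < K; ++k) {
            sum += static_cast<Acc>(a[k]) * static_cast<Acc>(b[k * N + j]);
        }
        c[j] = sum;
    }
    return c;
}

// TGEMV_ACC model: C_out[0, j] = C_in[0, j] + sum over k of A[0, k] * B[k, j].
template <typename Acc, typename In, std::size_t K, std::size_t N>
std::array<Acc, N> ref_tgemv_acc(const std::array<Acc, N>& c_in,
                                 const std::array<In, K>& a,
                                 const std::array<In, K * N>& b) {
    std::array<Acc, N> c = ref_tgemv<Acc, In, K, N>(a, b);
    for (std::size_t j = 0; j < N; ++j) {
        c[j] = c_in[j] + c[j];
    }
    return c;
}

// TGEMV_BIAS model: C[0, j] = sum over k of A[0, k] * B[k, j] + bias[0, j].
template <typename Acc, typename In, std::size_t K, std::size_t N>
std::array<Acc, N> ref_tgemv_bias(const std::array<In, K>& a,
                                  const std::array<In, K * N>& b,
                                  const std::array<Acc, N>& bias) {
    std::array<Acc, N> c = ref_tgemv<Acc, In, K, N>(a, b);
    for (std::size_t j = 0; j < N; ++j) {
        c[j] += bias[j];
    }
    return c;
}
```

For integer types the model is exact. For floating-point accumulators the summation order (and therefore rounding) of real hardware may differ from this sequential loop, so comparisons against it should use a tolerance.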
Accumulator behavior and datatype promotion are target-specific:
- A2/A3: accumulation uses the accumulator tile's native datatype (int32_t or float); int8 inputs are accumulated in 32-bit, and floating-point accumulation uses standard IEEE round-to-nearest-even.
- A5: accumulation is always in the accumulator tile's native type, and floating-point accumulation follows the accumulator's native rounding mode.
- CPU simulator: follows A5 semantics.
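Why 32-bit accumulation suffices for int8 inputs can be checked with a little arithmetic: each int8 × int8 product has magnitude at most 128 × 128 = 16384, and the runtime valid-size constraints cap K at 4095, so the worst-case sum is 4095 × 16384 = 67,092,480, far below INT32_MAX but well beyond INT16_MAX. The helper `dot_i8_acc_i32` below is a hypothetical host-side sketch of that rule, not the hardware datapath.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Worst-case magnitude bound for one int8 GEMV output element at the maximum K.
constexpr std::int64_t kMaxK = 4095;                         // upper bound of the valid K range
constexpr std::int64_t kMaxProduct = 128 * 128;              // |(-128) * (-128)|
constexpr std::int64_t kWorstCaseSum = kMaxK * kMaxProduct;  // 67,092,480

static_assert(kWorstCaseSum <= std::numeric_limits<std::int32_t>::max(),
              "int32 accumulation cannot overflow for K <= 4095");
static_assert(kWorstCaseSum > std::numeric_limits<std::int16_t>::max(),
              "16-bit accumulation would not be enough");

// Hypothetical sketch: int8 dot product accumulated in 32 bits.
std::int32_t dot_i8_acc_i32(const std::vector<std::int8_t>& a,
                            const std::vector<std::int8_t>& b) {
    assert(a.size() == b.size());
    std::int32_t acc = 0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        // Widen both operands before multiplying so product and sum stay in int32.
        acc += static_cast<std::int32_t>(a[k]) * static_cast<std::int32_t>(b[k]);
    }
    return acc;
}
```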
Syntax
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Synchronous form:
%acc = tgemv %a, %b : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
%acc1 = tgemv.acc %acc0, %a, %b : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
%acc = tgemv.bias %a, %b, %bias : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
AS Level 1 (SSA)
%c = pto.tgemv %a, %b : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
%c_out = pto.tgemv.acc %c_in, %a, %b : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
%c = pto.tgemv.bias %a, %b, %bias : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
AS Level 2 (DPS)
pto.tgemv ins(%a, %b : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%c : !pto.tile_buf<...>)
pto.tgemv.acc ins(%c_in, %a, %b : !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%c_out : !pto.tile_buf<...>)
pto.tgemv.bias ins(%a, %b, %bias : !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%c : !pto.tile_buf<...>)
C++ Intrinsic
Declared in include/pto/common/pto_instr.hpp:
template <typename TileRes, typename TileLeft, typename TileRight, typename... WaitEvents>
PTO_INST RecordEvent TGEMV(TileRes &cMatrix, TileLeft &aMatrix, TileRight &bMatrix, WaitEvents&... events);
template <typename TileRes, typename TileLeft, typename TileRight, typename... WaitEvents>
PTO_INST RecordEvent TGEMV_ACC(TileRes &cOutMatrix, TileRes &cInMatrix, TileLeft &aMatrix, TileRight &bMatrix, WaitEvents&... events);
template <typename TileRes, typename TileLeft, typename TileRight, typename TileBias, typename... WaitEvents>
PTO_INST RecordEvent TGEMV_BIAS(TileRes &cMatrix, TileLeft &aMatrix, TileRight &bMatrix, TileBias &biasData, WaitEvents&... events);
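Inputs
- a (aMatrix) is the left operand tile; it must be a TileLeft tile (Loc == Left).
- b (bMatrix) is the right operand tile; it must be a TileRight tile (Loc == Right).
- dst (cMatrix, or cOutMatrix for TGEMV_ACC) names the destination accumulator tile (Loc == Acc). The operation iterates over dst's valid region.
- For TGEMV_ACC, cInMatrix supplies the existing accumulator values; for TGEMV_BIAS, biasData supplies the bias row.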
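Expected Outputs
dst holds the matrix-vector product: dst[0,j] = sum over k in [0, K) of a[0,k] * b[k,j], for 0 <= j < N. TGEMV_ACC additionally adds the existing accumulator value cIn[0,j], and TGEMV_BIAS adds the bias value bias[0,j].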
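A small worked instance of the plain TGEMV formula, with illustrative values chosen here (K = 2, N = 2):

```latex
a = \begin{bmatrix} 1 & 2 \end{bmatrix}, \quad
b = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\;\Rightarrow\;
\mathrm{dst}[0,0] = 1 \cdot 1 + 2 \cdot 3 = 7, \quad
\mathrm{dst}[0,1] = 1 \cdot 2 + 2 \cdot 4 = 10
```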
Side Effects
No architectural side effects beyond producing the destination tile. Does not implicitly fence unrelated traffic.
Constraints
Common shape and location constraints
These constraints apply to TGEMV, TGEMV_ACC, and TGEMV_BIAS unless otherwise noted.
- Static shape constraints:
  - TileLeft::Rows == TileRes::Rows
  - TileLeft::Cols == TileRight::Rows
  - TileRight::Cols == TileRes::Cols
- Tile locations:
  - TileLeft::Loc == Left
  - TileRight::Loc == Right
  - TileRes::Loc == Acc
- Runtime valid-size constraints:
  - m must be 1.
  - k and n (taken from bMatrix.GetValidRow() and bMatrix.GetValidCol()) must be in [1, 4095].
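The shape, location, and valid-size rules above are simple enough to express as predicates. In this sketch `TileDesc` and `Loc` are hypothetical stand-ins for the library's tile types (which are not reproduced here), and `gemv_static_ok` / `gemv_valid_sizes_ok` are illustrative names; they only restate the listed rules and are not the library's actual checks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the tile location tag.
enum class Loc { Left, Right, Acc, Bias };

// Hypothetical compile-time tile descriptor (not the PTO Tile type).
template <Loc L, std::size_t R, std::size_t C>
struct TileDesc {
    static constexpr Loc kLoc = L;
    static constexpr std::size_t Rows = R;
    static constexpr std::size_t Cols = C;
};

// Static shape and location rules shared by TGEMV, TGEMV_ACC, and TGEMV_BIAS.
template <typename TileRes, typename TileLeft, typename TileRight>
constexpr bool gemv_static_ok() {
    return TileLeft::Rows == TileRes::Rows &&
           TileLeft::Cols == TileRight::Rows &&
           TileRight::Cols == TileRes::Cols &&
           TileLeft::kLoc == Loc::Left &&
           TileRight::kLoc == Loc::Right &&
           TileRes::kLoc == Loc::Acc;
}

// Runtime valid-size rule: m must be 1; k and n must lie in [1, 4095].
constexpr bool gemv_valid_sizes_ok(std::size_t m, std::size_t k, std::size_t n) {
    return m == 1 && k >= 1 && k <= 4095 && n >= 1 && n <= 4095;
}

// Shapes used in the Examples section: A is 1 x 16, B is 16 x 16, C is 1 x 16.
static_assert(gemv_static_ok<TileDesc<Loc::Acc, 1, 16>,
                             TileDesc<Loc::Left, 1, 16>,
                             TileDesc<Loc::Right, 16, 16>>(),
              "example shapes satisfy the GEMV constraints");
```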
TGEMV_BIAS additional constraints
- Bias tile datatype must exactly match TileRes::DType.
- Bias tile must be configured as a single row.
- Bias tile location must be TileType::Bias.
Exceptions
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions
- Implementation checks (A2A3):
  - Supported (CType, AType, BType) triples: (int32_t, int8_t, int8_t), (float, half, half), (float, float, float), (float, bfloat16_t, bfloat16_t).
- Implementation checks (A5):
  - Accumulator type must be int32_t or float.
    - If int32_t: AType == int8_t and BType == int8_t.
    - If float: supports half, bfloat16_t, float, and selected fp8 pairs (target-defined).
  - Fractal/layout constraints are enforced:
    - Left: Loc == Left, !isRowMajor, SFractal == RowMajor
    - Right: Loc == Right, isRowMajor, SFractal == ColMajor
    - Acc: Loc == Acc, !isRowMajor, SFractal == RowMajor
- Additional A5 note:
  - No separate explicit m/k/n runtime assertions are enforced in the underlying A5 matmul implementation beyond the GEMV contract described above.
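The A2A3 datatype table and the bias datatype rule can likewise be written as compile-time predicates. In this sketch `Half` and `BFloat16` are placeholder tag types standing in for the target's `half` and `bfloat16_t`, and `a2a3_triple_supported` and `bias_dtype_ok` are hypothetical names; the A5 fp8 pairs are target-defined and deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Placeholder tags for the target's half and bfloat16_t types (hypothetical).
struct Half {};
struct BFloat16 {};

// True when (C, A, B) is one of the listed A2A3 (CType, AType, BType) triples.
template <typename C, typename A, typename B>
constexpr bool a2a3_triple_supported() {
    return (std::is_same<C, std::int32_t>::value && std::is_same<A, std::int8_t>::value &&
            std::is_same<B, std::int8_t>::value) ||
           (std::is_same<C, float>::value && std::is_same<A, Half>::value &&
            std::is_same<B, Half>::value) ||
           (std::is_same<C, float>::value && std::is_same<A, float>::value &&
            std::is_same<B, float>::value) ||
           (std::is_same<C, float>::value && std::is_same<A, BFloat16>::value &&
            std::is_same<B, BFloat16>::value);
}

// TGEMV_BIAS rule: the bias datatype must exactly match the accumulator datatype.
template <typename BiasT, typename C>
constexpr bool bias_dtype_ok() {
    return std::is_same<BiasT, C>::value;
}

// A half bias tile paired with a float accumulator violates the bias rule.
static_assert(!bias_dtype_ok<Half, float>(), "half bias with float accumulator is rejected");
```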
Examples
Auto
1. TGEMV
#include <pto/pto-inst.hpp>
using namespace pto;
void example_auto() {
using A = TileLeft<half, 1, 16>;
using B = TileRight<half, 16, 16>;
using C = TileAcc<float, 1, 16>;
A a;
B b;
C c;
TGEMV(c, a, b);
}
2. TGEMV_ACC
#include <pto/pto-inst.hpp>
using namespace pto;
void example_auto() {
using A = TileLeft<half, 1, 16>;
using B = TileRight<half, 16, 16>;
using C = TileAcc<float, 1, 16>;
A a;
B b;
C c0, c1;
TGEMV_ACC(c1, c0, a, b);
}
3. TGEMV_BIAS
#include <pto/pto-inst.hpp>
using namespace pto;
void example_auto() {
using A = TileLeft<half, 1, 16>;
using B = TileRight<half, 16, 16>;
// Bias datatype must exactly match the accumulator datatype (float).
using Bias = Tile<TileType::Bias, float, 1, 16>;
using C = TileAcc<float, 1, 16>;
A a;
B b;
Bias bias;
C c;
TGEMV_BIAS(c, a, b, bias);
}
Manual
1. TGEMV
#include <pto/pto-inst.hpp>
using namespace pto;
void example_manual() {
using A = TileLeft<half, 1, 16>;
using B = TileRight<half, 16, 16>;
using C = TileAcc<float, 1, 16>;
A a;
B b;
C c;
TASSIGN(a, 0x1000);
TASSIGN(b, 0x2000);
TASSIGN(c, 0x3000);
TGEMV(c, a, b);
}
2. TGEMV_ACC
#include <pto/pto-inst.hpp>
using namespace pto;
void example_manual() {
using A = TileLeft<half, 1, 16>;
using B = TileRight<half, 16, 16>;
using C = TileAcc<float, 1, 16>;
A a;
B b;
C c0, c1;
TASSIGN(a, 0x1000);
TASSIGN(b, 0x2000);
TASSIGN(c0, 0x3000);
TASSIGN(c1, 0x4000);
TGEMV_ACC(c1, c0, a, b);
}
3. TGEMV_BIAS
#include <pto/pto-inst.hpp>
using namespace pto;
void example_manual() {
using A = TileLeft<half, 1, 16>;
using B = TileRight<half, 16, 16>;
// Bias datatype must exactly match the accumulator datatype (float).
using Bias = Tile<TileType::Bias, float, 1, 16>;
using C = TileAcc<float, 1, 16>;
A a;
B b;
Bias bias;
C c;
TASSIGN(a, 0x1000);
TASSIGN(b, 0x2000);
TASSIGN(bias, 0x3000);
TASSIGN(c, 0x4000);
TGEMV_BIAS(c, a, b, bias);
}
Auto Mode
# Auto mode: compiler/runtime-managed placement and scheduling.
%c = pto.tgemv %a, %b : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
Manual Mode
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %a, @tile(0x1000)
# pto.tassign %b, @tile(0x2000)
%c = pto.tgemv %a, %b : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
PTO Assembly Form
%acc = tgemv %a, %b : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
# AS Level 2 (DPS)
pto.tgemv ins(%a, %b : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%c : !pto.tile_buf<...>)
Related Ops / Instruction Set Links
- Instruction set overview: Matrix And Matrix Vector
- Previous op in instruction set: pto.tmatmul_bias
- Next op in instruction set: pto.tgemv_acc