pto.tmatmul_bias

pto.tmatmul_bias is part of the Matrix And Matrix Vector instruction set.

Summary

Matrix multiply with bias add.

Mechanism

Computes the matrix product of the left tile A and the right tile B, then adds a single bias row to every row of the result. It operates on tile payloads rather than scalar control state; whether a given use is legal depends on tile shape, layout, valid region, and target-profile support.

Let:

M = aMatrix.GetValidRow()
K = aMatrix.GetValidCol()
N = bMatrix.GetValidCol()

For 0 <= i < M and 0 <= j < N:

\[ \mathrm{C}_{i,j} = \mathrm{Bias}_{0,j} + \sum_{k=0}^{K-1} \mathrm{A}_{i,k} \cdot \mathrm{B}_{k,j} \]

The bias row is broadcast across the M dimension: every row i of C receives the same Bias_{0,j} term. On A2/A3, the bias must have exactly 1 row and N columns; no other broadcasting configurations are supported. On A5, the bias must have exactly 1 row and N columns and a row-major layout; no other broadcasting configurations are supported. The CPU simulator follows the A5 semantics.
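For reference, the semantics above can be written as a plain host-side loop nest. This is a minimal sketch, not the PTO implementation: `matmul_bias_ref` is a hypothetical helper, the operands are assumed to be dense row-major `float` buffers holding only the valid region, and the accumulation is done in `float` only to mirror a float accumulator tile.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference model (not a PTO API):
//   C[i][j] = Bias[0][j] + sum_{k<K} A[i][k] * B[k][j]
// All buffers are dense, row-major, and hold only the valid region.
std::vector<float> matmul_bias_ref(const std::vector<float>& a,     // M x K
                                   const std::vector<float>& b,     // K x N
                                   const std::vector<float>& bias,  // 1 x N
                                   std::size_t M, std::size_t K, std::size_t N) {
  assert(a.size() == M * K && b.size() == K * N && bias.size() == N);
  std::vector<float> c(M * N);
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      float acc = 0.0f;  // dot product of row i of A and column j of B
      for (std::size_t k = 0; k < K; ++k) {
        acc += a[i * K + k] * b[k * N + j];
      }
      // The same bias row is added to every row i (broadcast across M).
      c[i * N + j] = bias[j] + acc;
    }
  }
  return c;
}
```

For example, `matmul_bias_ref({1, 2, 3, 4}, {5, 6, 7, 8}, {10, 20}, 2, 2, 2)` returns `{29, 42, 53, 70}`: the bias is added once per output element, after the K-length reduction.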

Syntax

Textual spelling is defined by the PTO ISA syntax-and-operands pages.

Synchronous form:

%acc = tmatmul.bias %a, %b, %bias : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

AS Level 1 (SSA)

%c = pto.tmatmul.bias %a, %b, %bias : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

AS Level 2 (DPS)

pto.tmatmul.bias ins(%a, %b, %bias : !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%c : !pto.tile_buf<...>)

IR Level 1 (SSA)

%c = pto.tmatmul.bias %a, %b, %bias : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

IR Level 2 (DPS)

pto.tmatmul.bias ins(%a, %b, %bias : !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%c : !pto.tile_buf<...>)

C++ Intrinsic

Declared in include/pto/common/pto_instr.hpp:

template <typename TileRes, typename TileLeft, typename TileRight, typename TileBias, typename... WaitEvents>
PTO_INST RecordEvent TMATMUL_BIAS(TileRes &cMatrix, TileLeft &aMatrix, TileRight &bMatrix, TileBias &biasData, WaitEvents &... events);

template <AccPhase Phase, typename TileRes, typename TileLeft, typename TileRight, typename TileBias,
          typename... WaitEvents>
PTO_INST RecordEvent TMATMUL_BIAS(TileRes &cMatrix, TileLeft &aMatrix, TileRight &bMatrix, TileBias &biasData, WaitEvents &... events);

Inputs

a is the left operand tile (must reside in the TileLeft location).
b is the right operand tile (must reside in the TileRight location).
bias is the bias tile (must be TileType::Bias, with a single row).
dst is the destination accumulator tile (cMatrix in the C++ intrinsic, %c in the IR forms). The operation iterates over dst's valid region.
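As a hedged illustration of how the four operands fit together, the sketch below checks the valid-region shapes that the formula in the Mechanism section implies: a is M x K, b is K x N, bias is 1 x N, and dst covers M x N. `ValidShape` and `shapes_agree` are hypothetical names standing in for GetValidRow()/GetValidCol(); the authoritative legality checks are those inherited from TMATMUL and the target profile.

```cpp
#include <cstddef>

// Hypothetical valid-region descriptor: `rows`/`cols` stand in for
// GetValidRow()/GetValidCol(). This is not a PTO type.
struct ValidShape {
  std::size_t rows;
  std::size_t cols;
};

// Sketch of the shape agreement implied by the TMATMUL_BIAS formula.
constexpr bool shapes_agree(ValidShape a, ValidShape b, ValidShape bias, ValidShape dst) {
  return b.rows == a.cols &&                      // inner dimension K matches
         bias.rows == 1 && bias.cols == b.cols &&  // one bias entry per output column
         dst.rows == a.rows && dst.cols == b.cols;  // dst spans the full M x N product
}
```

Because the predicate is `constexpr`, a mismatch such as a two-row bias can be caught with a `static_assert` in host-side test code.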

Expected Outputs

dst holds the biased matrix multiply result: dst[i,j] = bias[0,j] + sum over k of a[i,k] * b[k,j].
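A small worked example, with values chosen purely for illustration (M = K = N = 2), shows that the bias is added once per output element rather than once per k term, and that the same bias row is reused for every output row:

\[ a = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad b = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}, \quad \mathrm{bias} = \begin{pmatrix} 10 & 20 \end{pmatrix} \]

\[ \mathrm{dst} = \begin{pmatrix} 1\cdot 5 + 2\cdot 7 + 10 & 1\cdot 6 + 2\cdot 8 + 20 \\ 3\cdot 5 + 4\cdot 7 + 10 & 3\cdot 6 + 4\cdot 8 + 20 \end{pmatrix} = \begin{pmatrix} 29 & 42 \\ 53 & 70 \end{pmatrix} \]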

Side Effects

No architectural side effects beyond producing the destination tile. Does not implicitly fence unrelated traffic.

Constraints

All constraints from TMATMUL apply to the (cMatrix, aMatrix, bMatrix) triple.

Exceptions

Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.

Target-Profile Restrictions

Bias constraints (A2/A3):
- TileBias::DType must match TileRes::DType.
- TileBias::Loc == TileType::Bias and TileBias::Rows == 1.
Bias constraints (A5):
- TileBias::DType must match TileRes::DType.
- TileBias::Loc == TileType::Bias, TileBias::Rows == 1, and TileBias::isRowMajor.
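These bias rules are compile-time properties of the tile types, so they can be pictured as `constexpr` predicates. The sketch below uses hypothetical stand-in types (`TileKind`, `MockTile`, `bias_ok_a2a3`, `bias_ok_a5`) whose member names mirror the constraint text (DType, Loc, Rows, isRowMajor); they are not the actual PTO tile types.

```cpp
#include <type_traits>

// Hypothetical stand-ins for PTO tile metadata; NOT the real PTO types.
enum class TileKind { Left, Right, Acc, Bias };

template <typename T, TileKind L, int R, int C, bool RowMajor = true>
struct MockTile {
  using DType = T;
  static constexpr TileKind Loc = L;
  static constexpr int Rows = R;
  static constexpr int Cols = C;
  static constexpr bool isRowMajor = RowMajor;
};

// A2/A3 rule: bias DType matches the accumulator, Bias location, one row.
template <typename TileRes, typename TileBias>
constexpr bool bias_ok_a2a3() {
  return std::is_same<typename TileBias::DType, typename TileRes::DType>::value &&
         TileBias::Loc == TileKind::Bias && TileBias::Rows == 1;
}

// A5 rule: everything from A2/A3, plus a row-major bias tile.
template <typename TileRes, typename TileBias>
constexpr bool bias_ok_a5() {
  return bias_ok_a2a3<TileRes, TileBias>() && TileBias::isRowMajor;
}
```

Under these rules a half-precision bias paired with a float accumulator tile is rejected on both profiles, because the bias DType must match the accumulator's.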

Examples

Auto

#include <pto/pto-inst.hpp>

using namespace pto;

void example_auto() {
  using A = TileLeft<half, 16, 16>;
  using B = TileRight<half, 16, 16>;
  using Bias = Tile<TileType::Bias, float, 1, 16>;  // bias DType must match the accumulator (float)
  using C = TileAcc<float, 16, 16>;
  A a;
  B b;
  Bias bias;
  C c;
  TMATMUL_BIAS(c, a, b, bias);
}

Manual

#include <pto/pto-inst.hpp>

using namespace pto;

void example_manual() {
  using A = TileLeft<half, 16, 16>;
  using B = TileRight<half, 16, 16>;
  using Bias = Tile<TileType::Bias, float, 1, 16>;  // bias DType must match the accumulator (float)
  using C = TileAcc<float, 16, 16>;
  A a;
  B b;
  Bias bias;
  C c;
  TASSIGN(a, 0x1000);
  TASSIGN(b, 0x2000);
  TASSIGN(bias, 0x3000);
  TASSIGN(c, 0x4000);
  TMATMUL_BIAS(c, a, b, bias);
}

Auto Mode

# Auto mode: compiler/runtime-managed placement and scheduling.
%c = pto.tmatmul.bias %a, %b, %bias : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

Manual Mode

# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %a, @tile(0x1000)
# pto.tassign %b, @tile(0x2000)
# pto.tassign %bias, @tile(0x3000)
%c = pto.tmatmul.bias %a, %b, %bias : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

PTO Assembly Form

%acc = tmatmul.bias %a, %b, %bias : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
# AS Level 2 (DPS)
pto.tmatmul.bias ins(%a, %b, %bias : !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%c : !pto.tile_buf<...>)
