pto.tgemv_mx

pto.tgemv_mx is part of the Matrix And Matrix Vector instruction set.

Summary

GEMV in an MX block-scale format, with explicit left and right scale tiles.

Mechanism

GEMV in an MX block-scale format on supported targets.

This instruction set extends TGEMV with additional scale operands (mx path). Accumulator and scale handling are target-dependent. It operates on tile payloads rather than scalar control state, and its legality is constrained by tile shape, layout, valid-region, and target-profile support.

Conceptually (base GEMV path):

\[ \mathrm{C}_{0,j} = \sum_{k=0}^{K-1} \mathrm{A}_{0,k} \cdot \mathrm{B}_{k,j} \]

For TGEMV_MX, the scale tiles carry the block-scale metadata required by the MX format. The architectural contract is that output corresponds to the target-defined MX GEMV semantics rather than a plain scalar-rescaled GEMV.

Syntax

Textual spelling is defined by the PTO ISA syntax-and-operands pages.

Schematic form:

%acc = tgemv.mx %a, %a_scale, %b, %b_scale : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

AS Level 1 (SSA)

%acc = pto.tgemv.mx %a, %a_scale, %b, %b_scale : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

AS Level 2 (DPS)

pto.tgemv.mx ins(%a, %a_scale, %b, %b_scale : (!pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>)) outs(%acc : !pto.tile_buf<...>)

IR Level 1 (SSA)

%acc = pto.tgemv.mx %a, %a_scale, %b, %b_scale : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

IR Level 2 (DPS)

pto.tgemv.mx ins(%a, %a_scale, %b, %b_scale : (!pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>)) outs(%acc : !pto.tile_buf<...>)

C++ Intrinsic

Declared in include/pto/common/pto_instr.hpp:

template <typename TileRes, typename TileLeft, typename TileLeftScale, typename TileRight, typename TileRightScale,
          typename... WaitEvents>
PTO_INST RecordEvent TGEMV_MX(TileRes &cMatrix, TileLeft &aMatrix, TileLeftScale &aScaleMatrix, TileRight &bMatrix, TileRightScale &bScaleMatrix, WaitEvents &... events);

template <AccPhase Phase, typename TileRes, typename TileLeft, typename TileLeftScale, typename TileRight,
          typename TileRightScale, typename... WaitEvents>
PTO_INST RecordEvent TGEMV_MX(TileRes &cMatrix, TileLeft &aMatrix, TileLeftScale &aScaleMatrix, TileRight &bMatrix, TileRightScale &bScaleMatrix, WaitEvents &... events);

template <typename TileRes, typename TileLeft, typename TileLeftScale, typename TileRight, typename TileRightScale,
          typename... WaitEvents>
PTO_INST RecordEvent TGEMV_MX(TileRes &cOutMatrix, TileRes &cInMatrix, TileLeft &aMatrix, TileLeftScale &aScaleMatrix, TileRight &bMatrix, TileRightScale &bScaleMatrix, WaitEvents &... events);

template <AccPhase Phase, typename TileRes, typename TileLeft, typename TileLeftScale, typename TileRight,
          typename TileRightScale, typename... WaitEvents>
PTO_INST RecordEvent TGEMV_MX(TileRes &cOutMatrix, TileRes &cInMatrix, TileLeft &aMatrix, TileLeftScale &aScaleMatrix, TileRight &bMatrix, TileRightScale &bScaleMatrix, WaitEvents &... events);

template <typename TileRes, typename TileLeft, typename TileLeftScale, typename TileRight, typename TileRightScale,
          typename TileBias, typename... WaitEvents>
PTO_INST RecordEvent TGEMV_MX(TileRes &cMatrix, TileLeft &aMatrix, TileLeftScale &aScaleMatrix, TileRight &bMatrix, TileRightScale &bScaleMatrix, TileBias &biasData, WaitEvents &... events);

template <AccPhase Phase, typename TileRes, typename TileLeft, typename TileLeftScale, typename TileRight,
          typename TileRightScale, typename TileBias, typename... WaitEvents>
PTO_INST RecordEvent TGEMV_MX(TileRes &cMatrix, TileLeft &aMatrix, TileLeftScale &aScaleMatrix, TileRight &bMatrix, TileRightScale &bScaleMatrix, TileBias &biasData, WaitEvents &... events);

Additional overloads support accumulation/bias variants and AccPhase selection.

Inputs

  • a is the left operand tile (must be TileLeft location).
  • aScale is the left scale tile for the MX block-scale format.
  • b is the right operand tile (must be TileRight location).
  • bScale is the right scale tile for the MX block-scale format.
  • bias (optional): bias tile (must be TileType::Bias).
  • cIn (optional): input accumulator tile for accumulation variants.
  • dst names the destination accumulator tile. The operation iterates over dst's valid region.

Expected Outputs

dst holds the MX block-scale matrix-vector result.

Side Effects

No architectural side effects beyond producing the destination tile. Does not implicitly fence unrelated traffic.

Constraints

Constraints

  • Uses backend-specific mx legality checks for data types, tile locations, fractal/layout combinations, and scaling formats.

  • Scale tile compatibility and accumulator promotion are concrete per target. On A2/A3: scale tiles must match the input element type (e.g., float8 scales for float8 inputs) and the accumulator must be float32; no other type combinations are supported. On A5: scale tiles follow MX format block scaling rules where each scale tile encodes per-block scaling factors for both A and B operands, and the accumulator must be float32; fp8 scale tile types follow the selected fp8 variant. On CPU simulator: follows A5 semantics.

  • For portability, validate the exact (A, B, scaleA, scaleB, C) type tuple and tile layout against target implementation constraints.

Exceptions

Exceptions

  • Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
  • Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.

Target-Profile Restrictions

Target-Profile Restrictions
  • pto.tgemv_mx preserves PTO-visible semantics across CPU simulation, A2/A3-class targets, and A5-class targets, but concrete support subsets may differ by profile.

  • Portable code must rely only on the documented type, layout, shape, and mode combinations that the selected target profile guarantees.

Examples

For practical usage patterns, see:

Auto Mode

# Auto mode: compiler/runtime-managed placement and scheduling.
%acc = pto.tgemv.mx %a, %a_scale, %b, %b_scale : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

Manual Mode

# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%acc = pto.tgemv.mx %a, %a_scale, %b, %b_scale : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

PTO Assembly Form

%acc = pto.tgemv.mx %a, %a_scale, %b, %b_scale : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
# AS Level 2 (DPS)
pto.tgemv.mx ins(%a, %a_scale, %b, %b_scale : (!pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>)) outs(%acc : !pto.tile_buf<...>)