pto.tgather
pto.tgather is part of the Irregular And Complex instruction set.
Summary
Gather/select elements using either an index tile or a compile-time mask pattern.
Mechanism
pto.tgather is a tile instruction with architecture-visible behavior that cannot be reduced to a plain elementwise compute pattern: each destination element is read from a source position chosen either by an index tile or by a compile-time mask pattern.
Index-based gather (conceptual):
Let R = dst.GetValidRow() and C = dst.GetValidCol(). For 0 <= i < R and 0 <= j < C:
$$ \mathrm{dst}_{i,j} = \mathrm{src0}\left[\mathrm{indices}_{i,j}\right] $$
Bounds behavior depends on the target. On A2/A3 and A5, out-of-range indices produce undefined results because there is no explicit masking. On the CPU simulator, out-of-range indices wrap modulo the source extent.
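For illustration only, the index-based semantics can be sketched as a host-side C++ reference model. ReferenceGather is a hypothetical helper and is not part of PTO. It assumes that src0 is addressed by a flat element index and that indices and dst are row-major arrays covering dst's valid region. An out-of-range index raises std::out_of_range and does not imitate any target's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical host-side reference model of index-based gather; NOT a PTO API.
// Assumptions: src0 is addressed by a flat (linearized) element index, and
// indices/dst are row-major rows x cols arrays covering dst's valid region.
template <typename T, typename IdxT>
std::vector<T> ReferenceGather(const std::vector<T>& src0,
                               const std::vector<IdxT>& indices,
                               std::size_t rows, std::size_t cols) {
    assert(indices.size() == rows * cols);  // one index per valid dst element
    std::vector<T> dst(rows * cols);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            const std::size_t k = i * cols + j;  // row-major position of (i, j)
            // vector::at throws std::out_of_range for a bad index (a negative
            // index converts to a huge size_t); no target behavior is modeled.
            dst[k] = src0.at(static_cast<std::size_t>(indices[k]));
        }
    }
    return dst;
}
```

As an example, src0 = {10, 20, 30, 40} with indices {3, 0, 2, 1} over a 2 x 2 region yields {40, 10, 30, 20}.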
Mask-pattern gather is a selection controlled by pto::MaskPattern. The mask selects elements from the source in a pattern-defined order, and the same mask semantics apply on A2/A3, A5, and the CPU simulator.
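This page does not state how each pto::MaskPattern value maps to source positions, so the following is only a sketch of pattern-driven selection in general. ReferenceMaskSelect and its string encoding are hypothetical. A source element k is kept when the pattern character at position k mod the pattern length is '1', and kept elements are packed into the result in source order. The sketch makes no claim about which elements P0101 actually selects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of mask-pattern selection; NOT the pto::MaskPattern encoding.
// Assumption: pattern[k % pattern.size()] == '1' keeps source element k, and the
// kept elements are written to the result in ascending source order.
template <typename T>
std::vector<T> ReferenceMaskSelect(const std::vector<T>& src, const std::string& pattern) {
    assert(!pattern.empty());  // an empty pattern has no defined repetition
    std::vector<T> dst;
    for (std::size_t k = 0; k < src.size(); ++k) {
        if (pattern[k % pattern.size()] == '1') {
            dst.push_back(src[k]);
        }
    }
    return dst;
}
```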
Syntax
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Index-based gather:
%dst = tgather %src0, %indices : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
Mask-pattern gather:
%dst = tgather %src {maskPattern = #pto.mask_pattern<P0101>} : !pto.tile<...> -> !pto.tile<...>
AS Level 1 (SSA)
%dst = pto.tgather %src, %indices : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
%dst = pto.tgather %src {maskPattern = #pto.mask_pattern<P0101>} : !pto.tile<...> -> !pto.tile<...>
AS Level 2 (DPS)
pto.tgather ins(%src, %indices : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
pto.tgather ins(%src, {maskPattern = #pto.mask_pattern<P0101>} : !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
C++ Intrinsic
Declared in include/pto/common/pto_instr.hpp:
template <typename TileDataD, typename TileDataS0, typename TileDataS1, typename TileDataTmp, typename... WaitEvents>
PTO_INST RecordEvent TGATHER(TileDataD &dst, TileDataS0 &src0, TileDataS1 &src1, TileDataTmp &tmp, WaitEvents &... events);
template <typename DstTileData, typename SrcTileData, MaskPattern maskPattern, typename... WaitEvents>
PTO_INST RecordEvent TGATHER(DstTileData &dst, SrcTileData &src, WaitEvents &... events);
Inputs
- src0 is the source tile.
- indices (index-based gather): index tile providing the gather indices (src1 in the C++ intrinsic).
- tmp (optional): temporary tile for index-based gather.
- maskPattern (mask-pattern gather): compile-time mask pattern.
- dst names the destination tile. The operation iterates over dst's valid region.
Expected Outputs
dst holds the elements gathered from src0 at the positions specified by indices or by maskPattern.
Side Effects
The instruction has no architectural side effects beyond producing the destination tile, and it does not implicitly fence unrelated traffic.
Constraints
- Bounds / validity:
  - Index bounds are not validated by explicit runtime assertions. On A2/A3 and A5, out-of-range indices produce undefined results; on the CPU simulator, out-of-range indices are clamped to the valid range.
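Index bounds are not checked at runtime, so a caller that cannot guarantee in-range indices may want to validate the index tile on the host before uploading it. The helper below is a hypothetical sketch and not a PTO API. It treats srcExtent as the number of addressable source elements, which is an assumption about how indices are interpreted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical host-side pre-check (NOT a PTO API): returns the position of the
// first index outside [0, srcExtent), or std::nullopt if every index is in range.
template <typename IdxT>
std::optional<std::size_t> FirstOutOfRangeIndex(const std::vector<IdxT>& indices,
                                                std::size_t srcExtent) {
    for (std::size_t k = 0; k < indices.size(); ++k) {
        // Widen to a signed 64-bit value so negative indices are caught too.
        const long long v = static_cast<long long>(indices[k]);
        if (v < 0 || static_cast<unsigned long long>(v) >= srcExtent) {
            return k;
        }
    }
    return std::nullopt;
}
```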
Exceptions
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions
- Index-based gather: implementation checks (A2A3):
  - DstTileData::DType must be one of int16_t, uint16_t, int32_t, uint32_t, half, float.
  - Src1TileData::DType (the index type) must be int32_t or uint32_t.
  - DstTileData::DType must be the same type as Src0TileData::DType.
  - src1.GetValidCol() == Src1TileData::Cols and dst.GetValidCol() == DstTileData::Cols.
- Index-based gather: implementation checks (A5):
  - DstTileData::DType must be one of int16_t, uint16_t, int32_t, uint32_t, half, float.
  - Src1TileData::DType (the index type) must be one of int16_t, uint16_t, int32_t, uint32_t.
  - DstTileData::DType must be the same type as Src0TileData::DType.
  - src1.GetValidCol() == Src1TileData::Cols and dst.GetValidCol() == DstTileData::Cols.
- Mask-pattern gather: implementation checks (A2A3):
  - Source element size must be 2 or 4 bytes.
  - SrcTileData::DType / DstTileData::DType must be one of int16_t, uint16_t, int32_t, uint32_t, half, bfloat16_t, float.
  - dst and src must both be TileType::Vec and row-major.
  - sizeof(dst element) == sizeof(src element) and dst.GetValidCol() == DstTileData::Cols (continuous dst storage).
- Mask-pattern gather: implementation checks (A5):
  - Source element size must be 1, 2, or 4 bytes.
  - dst and src must both be TileType::Vec and row-major.
  - SrcTileData::DType / DstTileData::DType must be one of int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, half, bfloat16_t, float, float8_e4m3_t, float8_e5m2_t, hifloat8_t.
  - Supported dtypes are restricted to a target-defined set (checked via static_assert in the implementation).
  - sizeof(dst element) == sizeof(src element) and dst.GetValidCol() == DstTileData::Cols (continuous dst storage).
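The A2A3 index-based type checks listed above can be mirrored on the host as a constexpr predicate. One possible use is to reject a tile configuration in unit tests before it reaches the device compiler. This is a hypothetical sketch: IndexGatherTypesOkA2A3 and kIsOneOf are illustrative names. The sketch omits half because standard C++ has no portable half type, and it does not model the valid-column checks.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: true if T is exactly one of Ts (C++17 fold expression).
template <typename T, typename... Ts>
inline constexpr bool kIsOneOf = (std::is_same_v<T, Ts> || ...);

// Hypothetical host-side mirror of the A2A3 index-based gather type checks.
// half is omitted because standard C++ has no portable half type.
template <typename DstT, typename Src0T, typename IdxT>
constexpr bool IndexGatherTypesOkA2A3() {
    return kIsOneOf<DstT, std::int16_t, std::uint16_t, std::int32_t, std::uint32_t, float> &&
           kIsOneOf<IdxT, std::int32_t, std::uint32_t> &&  // index element type
           std::is_same_v<DstT, Src0T>;                      // dst and src0 types match
}

// Compile-time use, in the spirit of the implementation's static_assert checks.
static_assert(IndexGatherTypesOkA2A3<float, float, std::int32_t>());
static_assert(!IndexGatherTypesOkA2A3<float, float, std::int16_t>());  // int16_t indices: A5 only
```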
Examples
Auto
#include <pto/pto-inst.hpp>
using namespace pto;
void example_auto() {
using SrcT = Tile<TileType::Vec, float, 16, 16>;
using IdxT = Tile<TileType::Vec, int32_t, 16, 16>;
using DstT = Tile<TileType::Vec, float, 16, 16>;
SrcT src0;
IdxT idx;
DstT dst;
// Auto mode: no TASSIGN, so tile placement is left to the compiler/runtime.
TGATHER(dst, src0, idx);
}
Manual
#include <pto/pto-inst.hpp>
using namespace pto;
void example_manual() {
using SrcT = Tile<TileType::Vec, float, 16, 16>;
using DstT = Tile<TileType::Vec, float, 1, 16>;
SrcT src;
DstT dst;
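// Manual mode: bind each tile to an explicit address before issuing the instruction.
TASSIGN(src, 0x1000);
TASSIGN(dst, 0x2000);
// Mask-pattern gather: the pattern is a compile-time template argument.
TGATHER<DstT, SrcT, MaskPattern::P0101>(dst, src);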
}
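The manual example places src at 0x1000 and dst at 0x2000. The sketch below assumes that each tile occupies Rows x Cols x sizeof(float) contiguous bytes with no padding, because this page does not state the tile footprint. Under that assumption, the arithmetic shows that the two regions do not overlap: src covers [0x1000, 0x1400) and dst covers [0x2000, 0x2040). The constant names and Overlaps are hypothetical, and the check also assumes a 4-byte float.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical worked check for the manual example. Assumes each tile occupies
// Rows * Cols * sizeof(float) contiguous bytes, with no padding or alignment slack.
constexpr std::uint64_t kSrcBase = 0x1000;
constexpr std::uint64_t kDstBase = 0x2000;
constexpr std::uint64_t kSrcBytes = 16 * 16 * sizeof(float);  // 16x16 float tile
constexpr std::uint64_t kDstBytes = 1 * 16 * sizeof(float);   // 1x16 float tile

// Half-open ranges [a, a + na) and [b, b + nb) overlap iff each starts before the other ends.
constexpr bool Overlaps(std::uint64_t a, std::uint64_t na, std::uint64_t b, std::uint64_t nb) {
    return a < b + nb && b < a + na;
}

static_assert(kSrcBase + kSrcBytes == 0x1400);  // holds for a 4-byte float
static_assert(!Overlaps(kSrcBase, kSrcBytes, kDstBase, kDstBytes));
```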
Auto Mode
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tgather %src, %indices : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
Manual Mode
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.tgather %src, %indices : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
PTO Assembly Form
# AS Level 1 (SSA)
%dst = pto.tgather %src, %indices : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
# AS Level 2 (DPS)
pto.tgather ins(%src, %indices : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
Related Ops / Instruction Set Links
- Instruction set overview: Irregular And Complex
- Previous op in instruction set: pto.tsort32
- Next op in instruction set: pto.tci