pto.tgatherb¶
pto.tgatherb is part of the Irregular And Complex instruction set.
Summary¶
Gather elements using byte offsets.
Mechanism¶
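pto.tgatherb is a tile instruction whose architecture-visible behavior is not reducible to a plain elementwise compute pattern: where each element is read from depends on the contents of the offset tile.
For each element (i, j) in the valid region of dst, dst[i, j] receives the element read from src at byte offset offset[i, j].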
On A2/A3 and A5, out-of-range offsets produce undefined results; on the CPU simulator, out-of-range offsets are clamped to the source tile boundary.
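The per-element behavior described above can be modeled on the host with a small scalar loop. The following is a minimal sketch, not PTO code: the function name gather_by_byte_offset, the flat byte-vector view of the source tile, and the exact clamping rule (clamp to the last offset that still holds a whole element) are all assumptions made for illustration. It mirrors the CPU-simulator behavior only loosely; on A2/A3 and A5, out-of-range offsets remain undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical scalar reference model of a byte-offset gather (not PTO code).
// src_bytes: raw bytes of the source tile.
// offsets:   one byte offset per destination element.
// Out-of-range offsets are clamped to the last position that still holds a
// whole element; this is an assumed reading of "clamped to the source tile
// boundary", not a documented rule.
template <typename T>
std::vector<T> gather_by_byte_offset(const std::vector<std::uint8_t>& src_bytes,
                                     const std::vector<std::uint32_t>& offsets) {
    // Precondition: the source must hold at least one element of type T.
    assert(src_bytes.size() >= sizeof(T));
    const std::size_t last_valid = src_bytes.size() - sizeof(T);

    std::vector<T> dst(offsets.size());
    for (std::size_t i = 0; i < offsets.size(); ++i) {
        std::size_t off = offsets[i];
        if (off > last_valid) {
            off = last_valid;  // assumed clamp for out-of-range offsets
        }
        // memcpy avoids alignment and aliasing problems for arbitrary offsets.
        std::memcpy(&dst[i], src_bytes.data() + off, sizeof(T));
    }
    return dst;
}
```

With a four-byte source {10, 20, 30, 40} and offsets {3, 0, 2, 99}, this model yields {40, 10, 30, 40}: the last offset is out of range and is clamped to byte 3.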
Syntax¶
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Synchronous form:
%dst = tgatherb %src, %offsets : !pto.tile<...> -> !pto.tile<...>
AS Level 1 (SSA)¶
%dst = pto.tgatherb %src, %offsets : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
AS Level 2 (DPS)¶
pto.tgatherb ins(%src, %offsets : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
IR Level 1 (SSA)¶
%dst = pto.tgatherb %src, %offsets : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
IR Level 2 (DPS)¶
pto.tgatherb ins(%src, %offsets : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
template <typename TileDataDst, typename TileDataSrc, typename TileDataOffset, typename... WaitEvents>
PTO_INST RecordEvent TGATHERB(TileDataDst &dst, TileDataSrc &src, TileDataOffset &offset, WaitEvents &... events);
Inputs¶
- src is the source tile.
- offset is an offset tile providing byte offsets for each destination element.
- dst names the destination tile. The operation iterates over dst's valid region.
Expected Outputs¶
dst holds elements gathered from src using byte offsets from offset.
Side Effects¶
No architectural side effects beyond producing the destination tile. Does not implicitly fence unrelated traffic.
Constraints¶
- Offset interpretation:
  - Offsets are interpreted as uint32_t values (byte offsets) by the implementation.
  - Offset bounds are not validated by explicit runtime assertions; on A2/A3 and A5, out-of-range offsets produce undefined results; on the CPU simulator, out-of-range offsets are clamped to the source tile boundary.
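Because offset bounds are not validated at runtime, a caller may want to check them on the host before issuing the instruction. The sketch below is a hypothetical helper, not part of the PTO API: the names offsets_in_range and debug_check_offsets, and the choice to require that a whole element fits inside the source, are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side pre-check (not part of PTO): returns true when every
// byte offset leaves room for one whole element of elem_size bytes inside a
// source region of src_size_bytes bytes.
inline bool offsets_in_range(const std::vector<std::uint32_t>& offsets,
                             std::size_t src_size_bytes,
                             std::size_t elem_size) {
    if (elem_size == 0 || elem_size > src_size_bytes) {
        return false;  // no valid offset can exist
    }
    const std::size_t last_valid = src_size_bytes - elem_size;
    for (std::uint32_t off : offsets) {
        if (off > last_valid) {
            return false;
        }
    }
    return true;
}

// Debug-build guard a caller might place before issuing the gather.
inline void debug_check_offsets(const std::vector<std::uint32_t>& offsets,
                                std::size_t src_size_bytes,
                                std::size_t elem_size) {
    assert(offsets_in_range(offsets, src_size_bytes, elem_size));
    (void)offsets; (void)src_size_bytes; (void)elem_size;  // silence NDEBUG warnings
}
```

For a 256-byte source of 4-byte elements, offsets up to 252 pass the check and an offset of 253 fails it, since the element would run past the end of the source.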
Exceptions¶
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions¶
- Implementation checks (A2A3):
  - Destination layout must be row-major (TileDataDst::isRowMajor).
  - Destination element size must be 1, 2, or 4 bytes (enforced via static_assert in the helper).
  - SrcTileData::DType/DstTileData::DType must be one of int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, half, bfloat16_t, or float.
- Implementation checks (A5):
  - Destination element size must be 1, 2, or 4 bytes.
  - SrcTileData::DType/DstTileData::DType must be one of int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, half, bfloat16_t, or float.
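The element-size restriction above is the kind of rule that can be expressed as a compile-time check. The following is a minimal sketch of such a check in standard C++, not the actual PTO helper: the names gatherb_elem_size_ok and check_gatherb_dst are hypothetical, and because half and bfloat16_t are not standard C++ types, only the size rule is modeled here rather than the full type list.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait (not the PTO helper): true when T has one of the
// element sizes the restrictions allow (1, 2, or 4 bytes).
template <typename T>
struct gatherb_elem_size_ok
    : std::integral_constant<bool,
                             sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4> {};

// Hypothetical compile-time guard: instantiating it with an unsupported
// destination element type fails the build with a readable message.
template <typename DstElem>
constexpr bool check_gatherb_dst() {
    static_assert(gatherb_elem_size_ok<DstElem>::value,
                  "destination element size must be 1, 2, or 4 bytes");
    return true;
}
```

Instantiating check_gatherb_dst with an 8-byte type such as double stops compilation at the static_assert, so the mistake surfaces at build time rather than at run time.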
Examples¶
Auto¶
#include <pto/pto-inst.hpp>
using namespace pto;
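void example_auto() {
  // Source and destination hold 256 uint8_t elements; offsets are uint32_t byte offsets.
  using SrcT = Tile<TileType::Vec, uint8_t, 1, 256>;
  using OffT = Tile<TileType::Vec, uint32_t, 1, 256>;
  using DstT = Tile<TileType::Vec, uint8_t, 1, 256>;
  SrcT src;
  OffT off;
  DstT dst;
  // Auto mode: tile placement and scheduling are managed by the compiler/runtime.
  TGATHERB(dst, src, off);
}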
Manual¶
#include <pto/pto-inst.hpp>
using namespace pto;
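void example_manual() {
  // Source and destination hold 256 uint8_t elements; offsets are uint32_t byte offsets.
  using SrcT = Tile<TileType::Vec, uint8_t, 1, 256>;
  using OffT = Tile<TileType::Vec, uint32_t, 1, 256>;
  using DstT = Tile<TileType::Vec, uint8_t, 1, 256>;
  SrcT src;
  OffT off;
  DstT dst;
  // Manual mode: bind each tile to an explicit address before issuing the instruction.
  TASSIGN(src, 0x1000);
  TASSIGN(off, 0x2000);
  TASSIGN(dst, 0x3000);
  TGATHERB(dst, src, off);
}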
Auto Mode¶
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tgatherb %src, %offsets : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
Manual Mode¶
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.tgatherb %src, %offsets : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
PTO Assembly Form¶
%dst = tgatherb %src, %offsets : !pto.tile<...> -> !pto.tile<...>
# AS Level 2 (DPS)
pto.tgatherb ins(%src, %offsets : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
Related Ops / Instruction Set Links¶
- Instruction set overview: Irregular And Complex
- Previous op in instruction set: pto.tpartmin
- Next op in instruction set: pto.tscatter