pto.tscatter¶
pto.tscatter is part of the Irregular And Complex instruction set.
Summary¶
Scatter rows of a source tile into a destination tile using per-element row indices.
Mechanism¶
Scatter source elements into a destination tile using per-element flattened destination offsets. It belongs to the tile instructions and carries architecture-visible behavior that is not reducible to a plain elementwise compute pattern.
For each source element (i, j), let k = idx[i,j] and write:
Here dst_flat denotes the destination tile viewed as a single linear storage sequence. TSCATTER does not interpret idx[i,j] as a destination row selector. On the standard row-major tile layout, this is equivalent to writing the k-th flattened destination element.
If multiple elements map to the same destination location, the final value is undefined. On A2/A3 and A5, the last writer wins according to the hardware scheduling order; on the CPU simulator, the last writer wins according to the iteration order.
Syntax¶
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Synchronous form:
%dst = tscatter %src, %idx : !pto.tile<...>, !pto.tile<...> -> !pto.tile<...>
IR Level 1 (SSA)¶
%dst = pto.tscatter %src, %idx : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
IR Level 2 (DPS)¶
pto.tscatter ins(%src, %idx : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
template <typename TileDataD, typename TileDataS, typename TileDataI, typename... WaitEvents>
PTO_INST RecordEvent TSCATTER(TileDataD &dst, TileDataS &src, TileDataI &indexes, WaitEvents &... events);
Inputs¶
srcis the source tile.indexesis an index tile providing flattened destination offsets.dstnames the destination tile. The operation iterates over src's valid region.
Expected Outputs¶
Elements from src are scattered to positions in dst specified by indexes.
Side Effects¶
No architectural side effects beyond producing the destination tile. Concurrent writes to the same location produce undefined results. On A2/A3 and A5, the final value is determined by hardware scheduling order; on the CPU simulator, the final value is determined by iteration order.
Constraints¶
Constraints
- Operand shape, mode, and state tuples MUST match the documented contract of this operation and its instruction set overview.
Exceptions¶
Exceptions
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions¶
Target-Profile Restrictions
- Implementation checks (A2A3):
TileDataD::Loc,TileDataS::Loc,TileDataI::Locmust beTileType::Vec.TileDataD::DType,TileDataS::DTypemust be one of:int32_t,int16_t,int8_t,half,float32_t,uint32_t,uint16_t,uint8_t,bfloat16_t.TileDataI::DTypemust be one of:int16_t,int32_t,uint16_toruint32_t.indexesvalues are interpreted as flattened destination element offsets in destination tile storage order.- No bounds checks are enforced on
indexesvalues. - Static valid bounds:
TileDataD::ValidRow <= TileDataD::Rows,TileDataD::ValidCol <= TileDataD::Cols,TileDataS::ValidRow <= TileDataS::Rows,TileDataS::ValidCol <= TileDataS::Cols,TileDataI::ValidRow <= TileDataI::Rows,TileDataI::ValidCol <= TileDataI::Cols. TileDataD::DTypeandTileDataS::DTypemust be the same.- When size of
TileDataD::DTypeis 4 bytes, the size ofTileDataI::DTypemust be 4 bytes. - When size of
TileDataD::DTypeis 2 bytes, the size ofTileDataI::DTypemust be 2 bytes. -
When size of
TileDataD::DTypeis 1 bytes, the size ofTileDataI::DTypemust be 2 bytes. -
Implementation checks (A5):
TileDataD::Loc,TileDataS::Loc,TileDataI::Locmust beTileType::Vec.TileDataD::DType,TileDataS::DTypemust be one of:int32_t,int16_t,int8_t,half,float32_t,uint32_t,uint16_t,uint8_t,bfloat16_t.TileDataI::DTypemust be one of:int16_t,int32_t,uint16_toruint32_t.indexesvalues are interpreted as flattened destination element offsets in destination tile storage order.- No bounds checks are enforced on
indexesvalues. - Static valid bounds:
TileDataD::ValidRow <= TileDataD::Rows,TileDataD::ValidCol <= TileDataD::Cols,TileDataS::ValidRow <= TileDataS::Rows,TileDataS::ValidCol <= TileDataS::Cols,TileDataI::ValidRow <= TileDataI::Rows,TileDataI::ValidCol <= TileDataI::Cols. TileDataD::DTypeandTileDataS::DTypemust be the same.- When size of
TileDataD::DTypeis 4 bytes, the size ofTileDataI::DTypemust be 4 bytes. - When size of
TileDataD::DTypeis 2 bytes, the size ofTileDataI::DTypemust be 2 bytes. - When size of
TileDataD::DTypeis 1 bytes, the size ofTileDataI::DTypemust be 2 bytes.
Examples¶
Auto¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_auto() {
using TileT = Tile<TileType::Vec, float, 16, 16>;
using IdxT = Tile<TileType::Vec, uint16_t, 16, 16>;
TileT src, dst;
IdxT idx;
TSCATTER(dst, src, idx);
}
Manual¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_manual() {
using TileT = Tile<TileType::Vec, float, 16, 16>;
using IdxT = Tile<TileType::Vec, uint16_t, 16, 16>;
TileT src, dst;
IdxT idx;
TASSIGN(src, 0x1000);
TASSIGN(dst, 0x2000);
TASSIGN(idx, 0x3000);
TSCATTER(dst, src, idx);
}
Auto Mode¶
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tscatter %src, %idx : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
Manual Mode¶
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.tscatter %src, %idx : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
PTO Assembly Form¶
%dst = tscatter %src, %idx : !pto.tile<...>, !pto.tile<...> -> !pto.tile<...>
# IR Level 2 (DPS)
pto.tscatter ins(%src, %idx : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
Related Ops / Instruction Set Links¶
- Instruction set overview: Irregular And Complex
- Previous op in instruction set: pto.tgatherb