pto.mscatter
pto.mscatter is part of the Memory And Data Movement instruction set.
Summary
Scatter-store elements from a tile into global memory using per-element indices.
Mechanism
pto.mscatter stores each element of a source tile to a location in global memory (GM). The location is selected by the corresponding entry of an index tile. Because it is a tile memory/data-movement instruction, its visible behavior is an explicit transfer from tile-visible state to GM-visible data.
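For each element (i, j) in the source valid region, the value is stored at the destination position selected by the corresponding index:

$$ \mathrm{dst}[\mathrm{idx}_{i,j}] = \mathrm{src}_{i,j} $$

Here idx_{i,j} is the destination position that indexes supplies for element (i, j). How that index is interpreted is target-defined (see Constraints).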
If multiple elements map to the same destination location, the final value is undefined on A2/A3 and A5 (MSCATTER aliases are illegal and must not occur in correct programs); on the CPU simulator, the last writer in row-major iteration order wins.
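The per-element rule can be modeled on the host with a plain loop. What follows is a minimal sketch, not the PTO API. The function name reference_mscatter, the flat std::vector buffers, and the explicit rows/cols parameters are all hypothetical. The sketch assumes element-indexed scatter, where indexes has the same shape as src. It follows the documented CPU-simulator behavior: each index is a linear element offset into the destination, and elements are visited in row-major order, so a later duplicate index overwrites an earlier one. Unlike the simulator, the sketch uses std::vector::at, so an out-of-range index throws instead of going unchecked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference model of element-indexed MSCATTER.
// Not the PTO intrinsic: the name and the flat row-major buffers are assumptions.
// Mirrors the documented CPU-simulator behavior: indices are linear element
// offsets into dst, and row-major visiting order means the last writer wins.
template <typename T, typename Index>
void reference_mscatter(std::vector<T>& dst, const std::vector<T>& src,
                        const std::vector<Index>& indexes,
                        std::size_t rows, std::size_t cols) {
    // src and indexes must both hold a rows x cols row-major tile.
    assert(src.size() == rows * cols && indexes.size() == rows * cols);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            const std::size_t k = i * cols + j;  // row-major position of (i, j)
            // dst[idx(i, j)] = src(i, j). at() throws on an out-of-range index;
            // the CPU simulator itself performs no bounds check.
            dst.at(static_cast<std::size_t>(indexes[k])) = src[k];
        }
    }
}

// Usage: a 2x2 tile whose second and fourth elements both target dst[0].
std::vector<float> example_last_writer_wins() {
    std::vector<float> dst(8, 0.0f);
    const std::vector<float> src = {1.0f, 2.0f, 3.0f, 4.0f};
    const std::vector<std::int32_t> idx = {6, 0, 3, 0};
    reference_mscatter(dst, src, idx, 2, 2);
    return dst;  // {4, 0, 0, 3, 0, 0, 1, 0}: src(1,1) overwrote src(0,1) at dst[0]
}
```

The duplicate index in this example would be illegal on A2/A3 and A5. The model only reproduces the CPU simulator's tie-breaking rule.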
Syntax
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Synchronous form:
mscatter %src, %mem, %idx : !pto.memref<...>, !pto.tile<...>, !pto.tile<...>
AS Level 1 (SSA)
pto.mscatter %src, %idx, %mem : (!pto.tile<...>, !pto.tile<...>, !pto.partition_tensor_view<MxNxdtype>) -> ()
AS Level 2 (DPS)
pto.mscatter ins(%src, %idx : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%mem : !pto.partition_tensor_view<MxNxdtype>)
C++ Intrinsic
Declared in include/pto/common/pto_instr.hpp:
template <typename GlobalData, typename TileSrc, typename TileInd, typename... WaitEvents>
PTO_INST RecordEvent MSCATTER(GlobalData &dst, TileSrc &src, TileInd &indexes, WaitEvents &... events);
Inputs
- src is the source tile.
- indexes is an index tile providing per-element indices into dst.
- dst is the destination GlobalTensor.
Expected Outputs
Elements from src are scattered to positions in dst specified by indexes.
Side Effects
This operation writes to global memory. Concurrent writes to the same location produce undefined behavior on A2/A3 and A5 (MSCATTER concurrent aliases are illegal and must not occur in correct programs); on the CPU simulator, the last writer wins.
Constraints
- Supported data types:
  - src/dst element type must be one of: int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, half, bfloat16_t, float.
  - On AICore targets, float8_e4m3_t and float8_e5m2_t are also supported.
  - indexes element type must be int32_t or uint32_t.
- Tile and memory types:
  - src must be a vector tile (TileType::Vec).
  - indexes must be a vector tile (TileType::Vec).
  - src and indexes must use row-major layout.
  - dst must be a GlobalTensor in global memory (GM).
  - dst must use ND layout.
- Atomic operation constraints:
  - Non-atomic scatter is supported for all supported element types.
  - Add atomic mode requires int32_t, uint32_t, float, or half.
  - Max/Min atomic mode requires int32_t or float.
- Shape constraints:
  - src.Rows == indexes.Rows.
  - indexes must be shaped as [N, 1] for row-indexed scatter or [N, M] for element-indexed scatter.
  - src row width must be 32-byte aligned; that is, src.Cols * sizeof(DType) must be a multiple of 32.
  - dst static shape must satisfy Shape<1, 1, 1, TableRows, RowWidth>.
- Index interpretation:
  - Index interpretation is target-defined. The CPU simulator treats indices as linear element indices into dst.data().
  - The CPU simulator does not enforce bounds checks on indexes.
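The constraints above can be checked on the host before issuing the instruction. The sketch below is hypothetical: ElemType, AtomicMode, ScatterShape, atomic_mode_ok, shape_ok, and indices_in_bounds are illustrative names, not PTO declarations. It encodes four documented rules:
- the atomic-mode type rules,
- src.Rows == indexes.Rows,
- the [N, 1] / [N, M] index shapes,
- the 32-byte row alignment.

It also adds a bounds check on linear indices, because the CPU simulator does not perform one. It assumes that M in an [N, M] index tile equals src.Cols. The AICore-only FP8 types are omitted for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative element types (AICore-only FP8 types omitted).
enum class ElemType { I8, U8, I16, U16, I32, U32, F16, BF16, F32 };
// Illustrative scatter modes: plain store or one of the atomic modes.
enum class AtomicMode { None, Add, Max, Min };

constexpr std::size_t elem_size(ElemType t) {
    switch (t) {
        case ElemType::I8: case ElemType::U8: return 1;
        case ElemType::I16: case ElemType::U16:
        case ElemType::F16: case ElemType::BF16: return 2;  // half, bfloat16_t
        case ElemType::I32: case ElemType::U32: case ElemType::F32: return 4;
    }
    return 0;
}

// Atomic-mode type rules from the Constraints list.
bool atomic_mode_ok(ElemType t, AtomicMode m) {
    switch (m) {
        case AtomicMode::None:
            return true;  // non-atomic scatter: any supported type
        case AtomicMode::Add:
            return t == ElemType::I32 || t == ElemType::U32 ||
                   t == ElemType::F32 || t == ElemType::F16;
        case AtomicMode::Max:
        case AtomicMode::Min:
            return t == ElemType::I32 || t == ElemType::F32;
    }
    return false;
}

// Tile shapes in elements (hypothetical struct).
struct ScatterShape {
    std::size_t src_rows, src_cols, idx_rows, idx_cols;
};

bool shape_ok(const ScatterShape& s, ElemType t) {
    if (s.src_rows != s.idx_rows) return false;          // src.Rows == indexes.Rows
    const bool row_indexed = s.idx_cols == 1;            // indexes is [N, 1]
    const bool elem_indexed = s.idx_cols == s.src_cols;  // [N, M], M assumed == src.Cols
    if (!row_indexed && !elem_indexed) return false;
    return (s.src_cols * elem_size(t)) % 32 == 0;        // 32-byte aligned src rows
}

// Linear-index bounds check (the CPU simulator performs none).
template <typename Index>
bool indices_in_bounds(const std::vector<Index>& idx, std::size_t dst_elems) {
    for (const Index i : idx) {
        const long long v = static_cast<long long>(i);
        if (v < 0 || static_cast<unsigned long long>(v) >= dst_elems) return false;
    }
    return true;
}
```

Under the 32-byte row rule, a float tile needs a multiple of 8 columns, a half or bfloat16_t tile a multiple of 16, and an int8_t tile a multiple of 32.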
Exceptions
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions
- pto.mscatter preserves PTO-visible semantics across CPU simulation, A2/A3-class targets, and A5-class targets, but the concrete supported subsets may differ by profile.
- Portable code must rely only on the type, layout, shape, and mode combinations that the selected target profile documents and guarantees.
Examples
See related examples in docs/isa/ and docs/coding/tutorials/.
Auto Mode
# Auto mode: compiler/runtime-managed placement and scheduling.
pto.mscatter %src, %idx, %mem : (!pto.tile<...>, !pto.tile<...>, !pto.partition_tensor_view<MxNxdtype>) -> ()
Manual Mode
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %src, @tile(0x1000)
# pto.tassign %idx, @tile(0x2000)
pto.mscatter %src, %idx, %mem : (!pto.tile<...>, !pto.tile<...>, !pto.partition_tensor_view<MxNxdtype>) -> ()
PTO Assembly Form
mscatter %src, %mem, %idx : !pto.memref<...>, !pto.tile<...>, !pto.tile<...>
# AS Level 2 (DPS)
pto.mscatter ins(%src, %idx : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%mem : !pto.partition_tensor_view<MxNxdtype>)
Related Ops / Instruction Set Links
- Instruction set overview: Memory And Data Movement
- Previous op in instruction set: pto.mgather