pto.tsort32¶
pto.tsort32 is part of the Irregular And Complex instruction set.
Summary¶
Sort a fixed-size 32-element block and produce an index mapping.
Mechanism¶
Sort each 32-element block of src together with the corresponding indices from idx, and write the sorted value-index pairs into dst. It belongs to the tile instructions and carries architecture-visible behavior that is not reducible to a plain elementwise compute pattern.
For each row, TSORT32 processes src in independent 32-element blocks. Let block b cover columns 32b ... 32b+31, and let n_b = min(32, C - 32b) be the valid element count of that block.
For each valid element in the block, form a pair
Then sort the pairs by value and write the sorted value-index pairs to dst. The exact packing layout in dst is target-defined, but semantically the output of each block is the reordered sequence
where π is the permutation produced by the implementation for that 32-element block.
Notes:
idxis an input tile, not an output tile.dststores sorted value-index pairs, not just sorted values.- The CPU simulation sorts in descending order by value, and for equal values keeps smaller indices first.
Syntax¶
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Synchronous form:
%dst = tsort32 %src, %idx : !pto.tile<...>, !pto.tile<...> -> !pto.tile<...>
AS Level 1 (SSA)¶
%dst = pto.tsort32 %src, %idx : !pto.tile<...>, !pto.tile<...> -> !pto.tile<...>
AS Level 2 (DPS)¶
pto.tsort32 ins(%src, %idx : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
template <typename DstTileData, typename SrcTileData, typename IdxTileData>
PTO_INST RecordEvent TSORT32(DstTileData &dst, SrcTileData &src, IdxTileData &idx);
template <typename DstTileData, typename SrcTileData, typename IdxTileData, typename TmpTileData>
PTO_INST RecordEvent TSORT32(DstTileData &dst, SrcTileData &src, IdxTileData &idx, TmpTileData &tmp);
Inputs¶
srcis the source tile containing values to sort.idxis the index tile providing initial indices.tmp(optional): temporary tile for non-32-aligned tails.dstnames the destination tile. The operation iterates over dst's valid region.
Expected Outputs¶
dst holds sorted value-index pairs from src according to idx order.
Side Effects¶
No architectural side effects beyond producing the destination tile. Does not implicitly fence unrelated traffic.
Constraints¶
Constraints
-
TSORT32does not takeWaitEvents&...and does not callTSYNC(...)internally; synchronize explicitly if needed. -
idxis a required input operand in both overloads; it provides the indices that are permuted together withsrc. -
Valid region:
- The implementation uses
dst.GetValidRow()as the row count. - The implementation uses
src.GetValidCol()to determine how many elements participate in sorting in each row. - Sorting is performed independently per 32-element block; the 4-argument overload additionally supports non-32-aligned tails with
tmp.
- The implementation uses
Exceptions¶
Exceptions
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions¶
Target-Profile Restrictions
- Implementation checks (A2A3/A5):
DstTileData::DTypemust behalforfloat.SrcTileData::DTypemust matchDstTileData::DType.IdxTileData::DTypemust beuint32_t.dst/src/idxtile location must beTileType::Vec, and all must be row-major (isRowMajor).
Examples¶
Auto¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_auto() {
using SrcT = Tile<TileType::Vec, float, 1, 32>;
using IdxT = Tile<TileType::Vec, uint32_t, 1, 32>;
using DstT = Tile<TileType::Vec, float, 1, 64>;
SrcT src;
IdxT idx;
DstT dst;
TSORT32(dst, src, idx);
}
Manual¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_manual() {
using SrcT = Tile<TileType::Vec, float, 1, 32>;
using IdxT = Tile<TileType::Vec, uint32_t, 1, 32>;
using DstT = Tile<TileType::Vec, float, 1, 64>;
SrcT src;
IdxT idx;
DstT dst;
TASSIGN(src, 0x1000);
TASSIGN(idx, 0x2000);
TASSIGN(dst, 0x3000);
TSORT32(dst, src, idx);
}
Auto Mode¶
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tsort32 %src, %idx : !pto.tile<...>, !pto.tile<...> -> !pto.tile<...>
Manual Mode¶
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
# pto.tassign %arg2, @tile(0x3000)
%dst = pto.tsort32 %src, %idx : !pto.tile<...>, !pto.tile<...> -> !pto.tile<...>
PTO Assembly Form¶
%dst = tsort32 %src, %idx : !pto.tile<...>, !pto.tile<...> -> !pto.tile<...>
# AS Level 2 (DPS)
pto.tsort32 ins(%src, %idx : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
Related Ops / Instruction Set Links¶
- Instruction set overview: Irregular And Complex
- Previous op in instruction set: pto.tmrgsort
- Next op in instruction set: pto.tgather