pto.tmrgsort¶
pto.tmrgsort is part of the Irregular And Complex instruction set.
Summary¶
Merge sort for multiple sorted lists. On A2/A3 and A5, elements are stored in row-major order with a single row (Rows == 1); on the CPU simulator, elements are stored in the natural tile row-major order.
Mechanism¶
Merge sort for multiple sorted lists. On A2/A3 and A5, elements are stored in row-major order with a single row (Rows == 1); on the CPU simulator, elements are stored in the natural tile row-major order. It belongs to the tile instructions and carries architecture-visible behavior that is not reducible to a plain elementwise compute pattern.
Merges sorted input lists into dst. On A2/A3 and A5, the merge produces a single sorted list in row-major order; the CPU simulator follows the natural tile row-major ordering.
Syntax¶
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Synchronous form (conceptual):
%dst, %executed = tmrgsort %src0, %src1 {exhausted = false}
: !pto.tile<...>, !pto.tile<...> -> (!pto.tile<...>, vector<4xi16>)
AS Level 1 (SSA)¶
%dst = pto.tmrgsort %src, %blockLen : (!pto.tile<...>, dtype) -> !pto.tile<...>
%dst, %executed = pto.tmrgsort %src0, %src1, %src2, %src3 {exhausted = false}
: (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> (!pto.tile<...>, vector<4xi16>)
AS Level 2 (DPS)¶
pto.tmrgsort ins(%src, %blockLen : !pto.tile_buf<...>, dtype) outs(%dst : !pto.tile_buf<...>)
pto.tmrgsort ins(%src0, %src1, %src2, %src3 {exhausted = false} : !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>)
outs(%dst, %executed : !pto.tile_buf<...>, vector<4xi16>)
IR Level 1 (SSA)¶
%dst = pto.tmrgsort %src, %blockLen : (!pto.tile<...>, dtype) -> !pto.tile<...>
%dst, %executed = pto.tmrgsort %src0, %src1, %src2, %src3 {exhausted = false}
: (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> (!pto.tile<...>, vector<4xi16>)
IR Level 2 (DPS)¶
pto.tmrgsort ins(%src, %blockLen : !pto.tile_buf<...>, dtype) outs(%dst : !pto.tile_buf<...>)
pto.tmrgsort ins(%src0, %src1, %src2, %src3 {exhausted = false} : !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>)
outs(%dst, %executed : !pto.tile_buf<...>, vector<4xi16>)
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
template <typename DstTileData, typename TmpTileData, typename Src0TileData, typename Src1TileData,
typename Src2TileData, typename Src3TileData, bool exhausted, typename... WaitEvents>
PTO_INST RecordEvent TMRGSORT(DstTileData &dst, MrgSortExecutedNumList &executedNumList, TmpTileData &tmp, Src0TileData &src0, Src1TileData &src1, Src2TileData &src2, Src3TileData &src3, WaitEvents &... events);
template <typename DstTileData, typename TmpTileData, typename Src0TileData, typename Src1TileData,
typename Src2TileData, bool exhausted, typename... WaitEvents>
PTO_INST RecordEvent TMRGSORT(DstTileData &dst, MrgSortExecutedNumList &executedNumList, TmpTileData &tmp, Src0TileData &src0, Src1TileData &src1, Src2TileData &src2, WaitEvents &... events);
template <typename DstTileData, typename TmpTileData, typename Src0TileData, typename Src1TileData, bool exhausted,
typename... WaitEvents>
PTO_INST RecordEvent TMRGSORT(DstTileData &dst, MrgSortExecutedNumList &executedNumList, TmpTileData &tmp, Src0TileData &src0, Src1TileData &src1, WaitEvents &... events);
template <typename DstTileData, typename SrcTileData, typename... WaitEvents>
PTO_INST RecordEvent TMRGSORT(DstTileData &dst, SrcTileData &src, uint32_t blockLen, WaitEvents &... events);
Inputs¶
src0...src3are source tiles (sorted lists to merge).tmpis a temporary tile used during merge.executedNumListoutputs the number of consumed elements from each source.blockLen(single-list variant): length of each sorted block.dstnames the destination tile. The operation iterates over dst's valid region.
Expected Outputs¶
dst holds the merged sorted output. executedNumList reports consumed counts.
Side Effects¶
No architectural side effects beyond producing the destination tile. Does not implicitly fence unrelated traffic.
Constraints¶
Constraints
-
Single-list variant (
TMRGSORT(dst, src, blockLen)):blockLenmust be a multiple of 64 (as checked by the implementation).src.GetValidCol()must be an integer multiple ofblockLen * 4.repeatTimes = src.GetValidCol() / (blockLen * 4)must be in[1, 255].
-
Multi-list variants:
tmpis required andexecutedNumListis written by the implementation; supported list counts and exact semantics are target-defined.
Exceptions¶
Exceptions
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions¶
Target-Profile Restrictions
- Implementation checks (A2A3/A5):
- Element type must be
halforfloatand must match acrossdst/tmp/src*tiles. - All tiles must be
TileType::Vec, row-major, and haveRows == 1(list stored in a single row). - UB memory usage is checked (compile-time and runtime) against target limits (single
Colsacross inputs plustmp/dst).
- Element type must be
Examples¶
Auto¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_auto() {
using SrcT = Tile<TileType::Vec, float, 1, 256>;
using DstT = Tile<TileType::Vec, float, 1, 256>;
SrcT src;
DstT dst;
TMRGSORT(dst, src, /*blockLen=*/64);
}
Manual¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_manual() {
using SrcT = Tile<TileType::Vec, float, 1, 256>;
using DstT = Tile<TileType::Vec, float, 1, 256>;
SrcT src;
DstT dst;
TASSIGN(src, 0x1000);
TASSIGN(dst, 0x2000);
TMRGSORT(dst, src, /*blockLen=*/64);
}
Auto Mode¶
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tmrgsort %src, %blockLen : (!pto.tile<...>, dtype) -> !pto.tile<...>
Manual Mode¶
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.tmrgsort %src, %blockLen : (!pto.tile<...>, dtype) -> !pto.tile<...>
PTO Assembly Form¶
%dst = pto.tmrgsort %src, %blockLen : (!pto.tile<...>, dtype) -> !pto.tile<...>
# AS Level 2 (DPS)
pto.tmrgsort ins(%src, %blockLen : !pto.tile_buf<...>, dtype) outs(%dst : !pto.tile_buf<...>)
Related Ops / Instruction Set Links¶
- Instruction set overview: Irregular And Complex
- Previous op in instruction set: pto.tprint
- Next op in instruction set: pto.tsort32