pto.trowargmin¶
pto.trowargmin is part of the Reduce And Expand instruction set.
Summary¶
Get the column index of the minimum element for each row.
Mechanism¶
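For each row of src, find the column index at which that row's minimum element occurs and write it to column 0 of the same row of dst.
Let R = src.GetValidRow() and C = src.GetValidCol(). For 0 <= i < R:
dst[i, 0] = argmin over 0 <= j < C of src[i, j]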
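The row-wise argmin described here can be sketched as a host-side reference loop. This is only an illustration, not part of the PTO API: the function name rowArgMinReference, the flat row-major buffer, and the rowStride parameter are hypothetical. The tie-breaking rule (lowest column index wins) is also an assumption, since this page does not state one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference; not part of the PTO API.
// src is a flat row-major buffer whose rows are rowStride elements apart.
// Only the first validCols elements of each of the first validRows rows are read.
// Assumption: on ties, the lowest column index is returned.
inline std::vector<std::uint32_t> rowArgMinReference(const std::vector<float>& src,
                                                     std::size_t validRows,
                                                     std::size_t validCols,
                                                     std::size_t rowStride) {
    assert(validRows != 0 && validCols != 0);
    assert(rowStride >= validCols);
    assert(src.size() >= (validRows - 1) * rowStride + validCols);

    std::vector<std::uint32_t> dst(validRows);
    for (std::size_t i = 0; i < validRows; ++i) {
        std::size_t best = 0;  // column of the smallest value seen so far in row i
        for (std::size_t j = 1; j < validCols; ++j) {
            if (src[i * rowStride + j] < src[i * rowStride + best]) {
                best = j;
            }
        }
        dst[i] = static_cast<std::uint32_t>(best);
    }
    return dst;
}
```

For a 2 x 3 valid region stored with a row stride of 4, the rows {3, 1, 2} and {5, 5, 4} yield indices 1 and 2 respectively.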
Syntax¶
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Synchronous form:
%dst = trowargmin %src : !pto.tile<...> -> !pto.tile<...>
Lowering may introduce internal scratch tiles; the C++ intrinsic requires an explicit tmp operand.
IR Level 1 (SSA)¶
%dst = pto.trowargmin %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
IR Level 2 (DPS)¶
pto.trowargmin ins(%src, %tmp : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
template <typename TileDataOut, typename TileDataIn, typename TileDataTmp, typename... WaitEvents>
PTO_INST RecordEvent TROWARGMIN(TileDataOut& dst, TileDataIn& src, TileDataTmp& tmp, WaitEvents&... events);
Inputs¶
- src is the source tile.
- tmp is a temporary tile used for intermediate storage.
- dst names the destination tile. The operation iterates over dst's valid region.
Expected Outputs¶
dst holds the column index of the row-wise minimum: for each row i, dst[i,0] = argmin of elements in row i of src.
Side Effects¶
No architectural side effects beyond producing the destination tile. Does not implicitly fence unrelated traffic.
Constraints¶
General constraints / checks¶
- dst and src must be TileType::Vec.
- Supported source element types: half, float.
- Supported destination element types: uint32_t, int32_t.
- src must use standard ND layout: row-major and non-fractal (BLayout::RowMajor, SLayout::NoneBox).
- dst and src must satisfy the shared row-reduce-index check path used by TRowArgMin.
- The temporary tile is not used when srcValidCol <= ElementPerRepeat; it is used when srcValidCol > ElementPerRepeat.
- tmp must have the same number of rows as src.
- When src is small, simply make tmp the same size as src.
- The stride of tmp can be computed from src's validCol using the following formula:
repeats = ceil(validCol / elementPerRepeat)
stride = ceil(repeats * 2 / elementPerBlock) * elementPerBlock + ceil(repeats / elementPerBlock) * elementPerBlock
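As a rough sketch, the stride formula can be evaluated with integer ceiling division. The helper names ceilDiv, tmpStride, and tmpIsUsed are hypothetical and not part of the PTO headers. The values of elementPerRepeat and elementPerBlock depend on the target and element type, so they are passed in as parameters; the numbers in the usage note are illustrative assumptions only.

```cpp
#include <cassert>
#include <cstddef>

// Integer ceiling division; b must be non-zero.
inline std::size_t ceilDiv(std::size_t a, std::size_t b) {
    assert(b != 0);
    return (a + b - 1) / b;
}

// Hypothetical helper mirroring the documented formula:
//   repeats = ceil(validCol / elementPerRepeat)
//   stride  = ceil(repeats * 2 / elementPerBlock) * elementPerBlock
//           + ceil(repeats / elementPerBlock) * elementPerBlock
inline std::size_t tmpStride(std::size_t validCol,
                             std::size_t elementPerRepeat,
                             std::size_t elementPerBlock) {
    const std::size_t repeats = ceilDiv(validCol, elementPerRepeat);
    return ceilDiv(repeats * 2, elementPerBlock) * elementPerBlock +
           ceilDiv(repeats, elementPerBlock) * elementPerBlock;
}

// Per the constraint above, tmp is only used when validCol exceeds one repeat.
inline bool tmpIsUsed(std::size_t validCol, std::size_t elementPerRepeat) {
    return validCol > elementPerRepeat;
}
```

Assuming, purely for illustration, elementPerRepeat = 64 and elementPerBlock = 8, a source with validCol = 1000 gives repeats = 16 and stride = 32 + 16 = 48, while validCol = 100 gives repeats = 2 and stride = 8 + 8 = 16.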
Exceptions¶
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions¶
- Runtime checks follow the shared row-reduce check path:
  - src.GetValidRow() != 0
  - src.GetValidCol() != 0
  - src.GetValidRow() == dst.GetValidRow()
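The three runtime conditions can be expressed as a single predicate. The sketch below is a stand-alone illustration: ValidShape and both function names are hypothetical stand-ins for the tile accessors GetValidRow() and GetValidCol(), and the actual check path in the implementation may be structured differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a tile's valid region; not a PTO type.
struct ValidShape {
    std::size_t validRow;
    std::size_t validCol;
};

// Returns true when the documented runtime conditions hold:
// non-empty source rows and columns, and matching valid row counts.
inline bool passesRowReduceChecks(const ValidShape& src, const ValidShape& dst) {
    return src.validRow != 0 && src.validCol != 0 && src.validRow == dst.validRow;
}

// Debug-build guard that stops on a violated condition.
inline void assertRowReduceChecks(const ValidShape& src, const ValidShape& dst) {
    assert(passesRowReduceChecks(src, dst));
}
```

A 16 x 16 source paired with a 16 x 1 destination passes; an empty source or a destination with a different valid row count does not.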
A2A3 implementation checks¶
- dst is checked through the shared row-reduce-index path and may use either of these non-fractal layouts:
  - DN layout with one column (BLayout::ColMajor, Cols == 1), or
  - ND layout whose valid column count is 1.
A5 implementation checks¶
- In the checked A5 implementation path, tmp is accepted by the interface but not used by TROWARGMIN_IMPL.
About temporary tile tmp for A3¶
Examples¶
Auto¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_auto() {
    // 16x16 float source; one uint32_t index per row in a single-column DN destination.
    using SrcT = Tile<TileType::Vec, float, 16, 16>;
    using DstT = Tile<TileType::Vec, uint32_t, 16, 1, BLayout::ColMajor>;
    using TmpT = Tile<TileType::Vec, float, 16, 16>;  // same size as src
    SrcT src;
    DstT dst;
    TmpT tmp;
    TROWARGMIN(dst, src, tmp);
}
Manual¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_manual() {
    using SrcT = Tile<TileType::Vec, float, 16, 16>;
    using DstT = Tile<TileType::Vec, uint32_t, 16, 1, BLayout::ColMajor>;
    using TmpT = Tile<TileType::Vec, float, 16, 16>;  // same size as src
    SrcT src;
    DstT dst;
    TmpT tmp;
    // Bind each tile explicitly before issuing the instruction.
    TASSIGN(src, 0x1000);
    TASSIGN(dst, 0x2000);
    TASSIGN(tmp, 0x3000);
    TROWARGMIN(dst, src, tmp);
}
Auto Mode¶
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.trowargmin %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
Manual Mode¶
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.trowargmin %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
PTO Assembly Form¶
%dst = trowargmin %src : !pto.tile<...> -> !pto.tile<...>
# IR Level 2 (DPS)
pto.trowargmin ins(%src, %tmp : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
Related Ops / Instruction Set Links¶
- Instruction set overview: Reduce And Expand
- Previous op in instruction set: pto.trowargmax
- Next op in instruction set: pto.trowexpand