pto.tcolargmin¶
pto.tcolargmin is part of the Reduce And Expand instruction set.
Summary¶
Get the row index of the minimum element for each column. A value+index variant also returns the minimum value for each column.
Mechanism¶
Get the row index of the minimum element for each column. The 4-operand overload returns both the minimum value and the row index for each column.
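Let R = src.GetValidRow() and C = src.GetValidCol(). For 0 <= j < C:
$$ \mathrm{dst}_{0,j} = \underset{0 \le i < R}{\operatorname{argmin}} \; \mathrm{src}_{i,j} $$
For value+index mode:
$$ \mathrm{dstVal}_{0,j} = \min_{0 \le i < R} \mathrm{src}_{i,j}, \qquad \mathrm{dstIdx}_{0,j} = \underset{0 \le i < R}{\operatorname{argmin}} \; \mathrm{src}_{i,j} $$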
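The index-only semantics can be sketched as a scalar host-side reference. This is an illustration, not the PTO implementation: the function name ColArgMinRef and the row-major std::vector layout are hypothetical, and the tie-breaking rule (lowest row index wins) is an assumption that this page does not state.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference for the index-only form.
// `src` holds an R x C matrix in row-major order; R must be at least 1.
// Assumption: ties resolve to the lowest row index (strict `<` comparison).
std::vector<uint32_t> ColArgMinRef(const std::vector<float>& src,
                                   std::size_t R, std::size_t C) {
    std::vector<uint32_t> dst(C, 0);
    for (std::size_t j = 0; j < C; ++j) {
        float best = src[j];  // element in row 0 of column j
        for (std::size_t i = 1; i < R; ++i) {
            const float v = src[i * C + j];
            if (v < best) {
                best = v;
                dst[j] = static_cast<uint32_t>(i);
            }
        }
    }
    return dst;
}
```

A reference like this is mainly useful for checking device results on small tiles.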
Syntax¶
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Synchronous form:
%dst = tcolargmin %src : !pto.tile<...> -> !pto.tile<...>
%dstVal, %dstIdx = tcolargmin %src : !pto.tile<...> -> !pto.tile<...>, !pto.tile<...>
Lowering may introduce internal scratch tiles; the C++ intrinsic requires an explicit tmp operand.
Assembly¶
%dst = tcolargmin %src : !pto.tile<...> -> !pto.tile<...>
AS Level 1 (SSA)¶
%dst = pto.tcolargmin %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
%dstVal, %dstIdx = pto.tcolargmin %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> (!pto.tile<...>, !pto.tile<...>)
AS Level 2 (DPS)¶
pto.tcolargmin ins(%src, %tmp : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
pto.tcolargmin ins(%src, %tmp : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dstVal, %dstIdx : !pto.tile_buf<...>, !pto.tile_buf<...>)
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
template <typename TileDataOut, typename TileDataIn, typename TileDataTmp, typename... WaitEvents>
PTO_INST RecordEvent TCOLARGMIN(TileDataOut& dst, TileDataIn& src, TileDataTmp& tmp, WaitEvents&... events);
template <typename TileDataOutVal, typename TileDataOutIdx, typename TileDataIn, typename TileDataTmp,
typename... WaitEvents>
PTO_INST RecordEvent TCOLARGMIN(TileDataOutVal& dstVal, TileDataOutIdx& dstIdx, TileDataIn& src, TileDataTmp& tmp,
WaitEvents&... events);
Inputs¶
- src is the source tile.
- tmp is a temporary tile used for intermediate storage.
- dst names the destination tile. The operation writes the column-wise argmin to dst[0, j] for each column j.
- In value+index mode, dstVal names the value output tile and dstIdx names the index output tile.
Expected Outputs¶
dst holds the row index of the column-wise minimum: for each column j, dst[0,j] = argmin of all elements in column j of src. The output tile has shape (1, C) where C is the number of columns in src.
In value+index mode, dstVal[0,j] holds the minimum value and dstIdx[0,j] holds its row index.
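For the value+index outputs, a minimal host-side sketch is shown here. The struct ColMinResult and the function ColMinWithIndexRef are hypothetical names used only for illustration, and first-occurrence tie-breaking is again an assumption rather than documented behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the two (1, C) outputs.
struct ColMinResult {
    std::vector<float> val;    // plays the role of dstVal[0, j]
    std::vector<int32_t> idx;  // plays the role of dstIdx[0, j]
};

// `src` holds an R x C matrix in row-major order; R must be at least 1.
// Assumption: ties resolve to the lowest row index.
ColMinResult ColMinWithIndexRef(const std::vector<float>& src,
                                std::size_t R, std::size_t C) {
    ColMinResult out{std::vector<float>(C), std::vector<int32_t>(C, 0)};
    for (std::size_t j = 0; j < C; ++j) {
        out.val[j] = src[j];  // start from row 0
        for (std::size_t i = 1; i < R; ++i) {
            const float v = src[i * C + j];
            if (v < out.val[j]) {
                out.val[j] = v;
                out.idx[j] = static_cast<int32_t>(i);
            }
        }
    }
    return out;
}
```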
Side Effects¶
No architectural side effects beyond producing the destination tile. Does not implicitly fence unrelated traffic.
Constraints¶
General constraints / checks¶
- dst and src must be TileType::Vec.
- src may use ND or DN non-fractal layout because the checked helper only requires SLayout::NoneBox.
- dst must use standard ND layout: row-major and non-fractal (BLayout::RowMajor, SLayout::NoneBox).
- Supported destination element types: uint32_t, int32_t.
- Compile-time check: TileDataIn::ValidCol == 1 || TileDataIn::ValidCol == -1.
- Runtime checks:
  - src.GetValidRow() != 0
  - src.GetValidCol() != 0
  - dst.GetValidRow() == 1
  - src.GetValidCol() == dst.GetValidCol()
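The runtime shape checks above can be mirrored on the host before a kernel is launched. The sketch below is only an illustration: ValidShape and CheckColArgMinShapes are hypothetical names, not part of the PTO headers, and the error strings are invented for readability.

```cpp
#include <string>

// Hypothetical stand-in for a tile's valid region (GetValidRow / GetValidCol).
struct ValidShape {
    int rows;
    int cols;
};

// Returns an empty string when the shapes satisfy the listed runtime checks,
// otherwise a short description of the first check that fails.
std::string CheckColArgMinShapes(ValidShape src, ValidShape dst) {
    if (src.rows == 0) return "src valid row count must be non-zero";
    if (src.cols == 0) return "src valid column count must be non-zero";
    if (dst.rows != 1) return "dst valid row count must be 1";
    if (src.cols != dst.cols) return "src and dst valid column counts must match";
    return "";
}
```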
A2A3 implementation checks¶
- Supported source element types: half, float, uint16_t, uint32_t.
- tmp must use the same element type as src.
- In the checked A2A3 implementation path, tmp is used as scratch storage for index tracking and current comparison values.
A5 implementation checks¶
- Supported source element sizes are 8-bit, 16-bit, or 32-bit; the checked implementation therefore covers int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, half, float.
- In the checked A5 implementation path, tmp is accepted by the interface but not used by TCOLARGMIN_IMPL.
About temporary tile tmp for A2A3¶
- tmp is always used in the A2A3 implementation as scratch space for intermediate results (current index, argmin index, and current min elements).
- The tmp tile's data type must be the same as src's data type.
- The tmp tile is organized into three regions within a single row:
  - Region 0 ([0, tmpGapEles)): current row index counter (incremented per row).
  - Region 1 ([tmpGapEles, 2 * tmpGapEles)): current minimum elements for comparison.
  - Region 2 ([2 * tmpGapEles, 3 * tmpGapEles)): argmin index result (before final conversion to dst).
- tmpGapEles is determined as follows:
  - When srcValidCol >= elemPerRpt: tmpGapEles = elemPerRpt.
  - When srcValidCol < elemPerRpt: tmpGapEles = ceil(srcValidCol / elemPerBlock) * elemPerBlock.
- Set the tmp tile size to match src when src is small, or calculate the required stride from src's validCol using the following formula:
repeats = ceil(validCol / elementPerRepeat)
stride = ceil(repeats * 2 / elementPerBlock) * elementPerBlock + ceil(repeats / elementPerBlock) * elementPerBlock
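The sizing rules above translate directly into integer arithmetic. The sketch below transcribes the tmpGapEles rule and the stride formula as written; the helper names CeilDiv, TmpGapEles, and TmpStride are hypothetical, and elemPerRpt and elemPerBlock are passed in as parameters because this page does not give their values.

```cpp
#include <cstddef>

// Integer ceiling division, used for every ceil(a / b) in the formulas.
constexpr std::size_t CeilDiv(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Width of each of the three tmp regions, per the tmpGapEles rule.
constexpr std::size_t TmpGapEles(std::size_t srcValidCol,
                                 std::size_t elemPerRpt,
                                 std::size_t elemPerBlock) {
    return srcValidCol >= elemPerRpt
               ? elemPerRpt
               : CeilDiv(srcValidCol, elemPerBlock) * elemPerBlock;
}

// Stride formula transcribed term by term from the text above.
constexpr std::size_t TmpStride(std::size_t validCol,
                                std::size_t elemPerRpt,
                                std::size_t elemPerBlock) {
    return CeilDiv(CeilDiv(validCol, elemPerRpt) * 2, elemPerBlock) * elemPerBlock +
           CeilDiv(CeilDiv(validCol, elemPerRpt), elemPerBlock) * elemPerBlock;
}
```

As a usage illustration, with assumed values elemPerRpt = 64 and elemPerBlock = 8 (plausible for a 4-byte element type, but not taken from this page), TmpGapEles(255, 64, 8) evaluates to 64 and TmpGapEles(20, 64, 8) evaluates to 24.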
About temporary tile tmp for A5¶
- The tmp tile is not used in the A5 implementation. A5 uses vector register-based computation (__VEC_SCOPE__) and does not require scratch tile storage.
- tmp is retained in the C++ intrinsic signature solely for API compatibility with A2A3.
Value+index mode¶
- dstVal must be a TileType::Vec tile with standard ND layout.
- dstVal element type must match src.
- 8-bit source element types are not supported by the value+index overload.
- dstVal.GetValidRow() == 1
- dstVal.GetValidCol() == dstIdx.GetValidCol()
- dstVal.GetValidCol() == src.GetValidCol()
- For 16-bit source element types, dstIdx must use uint16_t or int16_t.
- For 32-bit source element types, dstIdx must use uint32_t or int32_t.
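The pairing between source element width and index element type can be expressed as a compile-time predicate. This is a sketch under stated assumptions: IsLegalValueIndexPair is a hypothetical helper, not a PTO API, and uint16_t is used as a stand-in for half because the sketch avoids device-specific types.

```cpp
#include <cstdint>
#include <type_traits>

// True when Idx is a legal dstIdx element type for a source element type Src
// in value+index mode, following the rules listed above:
//   16-bit source -> uint16_t or int16_t index
//   32-bit source -> uint32_t or int32_t index
//   8-bit source  -> not supported
template <typename Src, typename Idx>
constexpr bool IsLegalValueIndexPair() {
    return (sizeof(Src) == 2 && (std::is_same<Idx, uint16_t>::value ||
                                 std::is_same<Idx, int16_t>::value)) ||
           (sizeof(Src) == 4 && (std::is_same<Idx, uint32_t>::value ||
                                 std::is_same<Idx, int32_t>::value));
}

// Example compile-time checks.
static_assert(IsLegalValueIndexPair<float, int32_t>(), "32-bit source with int32_t index");
static_assert(!IsLegalValueIndexPair<float, int16_t>(), "index width must match source width");
```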
Exceptions¶
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions¶
- Supported source element types: half, float, int16_t, int32_t.
- If src.GetValidRow() == 0 or src.GetValidCol() == 0, the implementation returns early.
Examples¶
Auto¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_auto() {
    using SrcT = Tile<TileType::Vec, float, 16, 256, BLayout::RowMajor, -1, -1>;
    using DstT = Tile<TileType::Vec, uint32_t, 1, 256, BLayout::RowMajor, -1, -1>;
    using TmpT = Tile<TileType::Vec, float, 1, 32, BLayout::RowMajor, -1, -1>;
    SrcT src(16, 255);
    DstT dst(1, 255);
    TmpT tmp(1, 32);
    TCOLARGMIN(dst, src, tmp);
}
Auto Value + Index¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_auto_value_index() {
    using SrcT = Tile<TileType::Vec, float, 16, 256, BLayout::RowMajor, -1, -1>;
    using DstValT = Tile<TileType::Vec, float, 1, 256, BLayout::RowMajor, -1, -1>;
    using DstIdxT = Tile<TileType::Vec, int32_t, 1, 256, BLayout::RowMajor, -1, -1>;
    using TmpT = Tile<TileType::Vec, float, 1, 32, BLayout::RowMajor, -1, -1>;
    SrcT src(16, 255);
    DstValT dstVal(1, 255);
    DstIdxT dstIdx(1, 255);
    TmpT tmp(1, 32);
    TCOLARGMIN(dstVal, dstIdx, src, tmp);
}
Manual¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_manual() {
    using SrcT = Tile<TileType::Vec, float, 16, 256, BLayout::RowMajor, -1, -1>;
    using DstT = Tile<TileType::Vec, uint32_t, 1, 256, BLayout::RowMajor, -1, -1>;
    using TmpT = Tile<TileType::Vec, float, 1, 32, BLayout::RowMajor, -1, -1>;
    SrcT src(16, 255);
    DstT dst(1, 255);
    TmpT tmp(1, 32);
    TASSIGN(src, 0x0);
    TASSIGN(dst, 0x1000);
    TASSIGN(tmp, 0x2000);
    TCOLARGMIN(dst, src, tmp);
}
Auto Mode¶
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tcolargmin %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
Manual Mode¶
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.tcolargmin %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
PTO Assembly Form¶
%dst = tcolargmin %src : !pto.tile<...> -> !pto.tile<...>
%dstVal, %dstIdx = tcolargmin %src : !pto.tile<...> -> !pto.tile<...>, !pto.tile<...>
# AS Level 2 (DPS)
pto.tcolargmin ins(%src, %tmp : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
pto.tcolargmin ins(%src, %tmp : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dstVal, %dstIdx : !pto.tile_buf<...>, !pto.tile_buf<...>)
Related Ops / Instruction Set Links¶
- Instruction set overview: Reduce And Expand
- Previous op in instruction set: pto.tcolmax
- Next op in instruction set: pto.tcolexpand