pto.tcolargmax¶
pto.tcolargmax is part of the Reduce And Expand instruction set.
Summary¶
Get the row index of the maximum element for each column. A value+index variant also returns the maximum value for each column.
Mechanism¶
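The index-only form writes, for each column of src, the row index of that column's maximum element. The 4-operand overload additionally writes the maximum value itself.
Let R = src.GetValidRow() and C = src.GetValidCol(). For 0 <= j < C:
$$ \mathrm{dst}_{0,j} = \underset{0 \le i < R}{\operatorname{argmax}} \; \mathrm{src}_{i,j} $$
For value+index mode:
$$ \mathrm{dstVal}_{0,j} = \max_{0 \le i < R} \mathrm{src}_{i,j}, \qquad \mathrm{dstIdx}_{0,j} = \underset{0 \le i < R}{\operatorname{argmax}} \; \mathrm{src}_{i,j} $$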
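The per-column reduction described in this section can be modelled with a scalar reference routine. The sketch below is plain C++ and does not use the PTO API; the names ColArgMaxResult and ColArgMaxRef are hypothetical. It assumes that ties resolve to the smallest row index, which this page does not specify, and it does not model NaN handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference model; not part of the PTO API.
struct ColArgMaxResult {
    std::vector<float> val;          // val[j]: maximum of column j
    std::vector<std::uint32_t> idx;  // idx[j]: row index of that maximum
};

// src is a row-major rows x cols matrix stored in a flat vector.
// Assumption: on ties, the smallest row index wins (strict '>' comparison).
ColArgMaxResult ColArgMaxRef(const std::vector<float>& src,
                             std::size_t rows, std::size_t cols) {
    assert(rows > 0 && cols > 0 && src.size() == rows * cols);
    ColArgMaxResult out;
    out.val.resize(cols);
    out.idx.resize(cols);
    for (std::size_t j = 0; j < cols; ++j) {
        std::size_t best = 0;
        for (std::size_t i = 1; i < rows; ++i) {
            if (src[i * cols + j] > src[best * cols + j]) {
                best = i;
            }
        }
        out.val[j] = src[best * cols + j];
        out.idx[j] = static_cast<std::uint32_t>(best);
    }
    return out;
}
```

The idx vector corresponds to dst in the index-only form; val and idx together correspond to dstVal and dstIdx in value+index mode.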
Syntax¶
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Synchronous form:
%dst = tcolargmax %src : !pto.tile<...> -> !pto.tile<...>
%dstVal, %dstIdx = tcolargmax %src : !pto.tile<...> -> !pto.tile<...>, !pto.tile<...>
Lowering may introduce internal scratch tiles; the C++ intrinsic requires an explicit tmp operand.
AS Level 1 (SSA)¶
%dst = pto.tcolargmax %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
%dstVal, %dstIdx = pto.tcolargmax %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> (!pto.tile<...>, !pto.tile<...>)
AS Level 2 (DPS)¶
pto.tcolargmax ins(%src, %tmp : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
pto.tcolargmax ins(%src, %tmp : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dstVal, %dstIdx : !pto.tile_buf<...>, !pto.tile_buf<...>)
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
template <typename TileDataOut, typename TileDataIn, typename TileDataTmp, typename... WaitEvents>
PTO_INST RecordEvent TCOLARGMAX(TileDataOut& dst, TileDataIn& src, TileDataTmp& tmp, WaitEvents&... events);
template <typename TileDataOutVal, typename TileDataOutIdx, typename TileDataIn, typename TileDataTmp,
typename... WaitEvents>
PTO_INST RecordEvent TCOLARGMAX(TileDataOutVal& dstVal, TileDataOutIdx& dstIdx, TileDataIn& src, TileDataTmp& tmp,
WaitEvents&... events);
Inputs¶
- src is the source tile.
- tmp is a temporary tile used for intermediate storage.
- dst names the destination tile. The operation iterates over dst's valid region.
- In value+index mode, dstVal names the value output tile and dstIdx names the index output tile.
Expected Outputs¶
dst holds the row index of the column-wise maximum: for each column j, dst[0,j] = argmax of all elements in column j of src.
In value+index mode, dstVal[0,j] holds the maximum value and dstIdx[0,j] holds its row index.
Side Effects¶
No architectural side effects beyond producing the destination tile. Does not implicitly fence unrelated traffic.
Constraints¶
General constraints / checks¶
- dst and src must be TileType::Vec.
- Supported source element types: half, float, int32_t, int16_t.
- Supported destination element types: uint32_t, int32_t.
- src must use standard ND layout: row-major and non-fractal (BLayout::RowMajor, SLayout::NoneBox).
- dst and src must satisfy the shared column-reduce-index check path used by TColArgMax.
- The temporary tile is used only when srcValidRow > ElementPerRepeat; it is not used when srcValidRow <= ElementPerRepeat.
- The tmp tile's column count is the same as src's.
- When src is small, tmp can simply be given the same size as src.
- The tmp tile's stride can be computed from src's validRow with the following formula:
repeats = ceil(validRow / elementPerRepeat)
stride = ceil(repeats * 2 / elementPerBlock) * elementPerBlock + ceil(repeats / elementPerBlock) * elementPerBlock
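The stride formula can be evaluated at compile time with a few small helpers. This is a sketch only: CeilDiv, AlignUp, and TmpStride are hypothetical names, not PTO library functions, and the values 64 and 8 used in the usage note are illustrative assumptions for elementPerRepeat and elementPerBlock, which in practice depend on the target and the element type.

```cpp
#include <cstddef>

// Integer ceiling division for positive operands.
constexpr std::size_t CeilDiv(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Round n up to the next multiple of block.
constexpr std::size_t AlignUp(std::size_t n, std::size_t block) {
    return CeilDiv(n, block) * block;
}

// Hypothetical helper mirroring the documented formula:
//   repeats = ceil(validRow / elementPerRepeat)
//   stride  = ceil(repeats * 2 / elementPerBlock) * elementPerBlock
//           + ceil(repeats / elementPerBlock) * elementPerBlock
constexpr std::size_t TmpStride(std::size_t validRow,
                                std::size_t elementPerRepeat,
                                std::size_t elementPerBlock) {
    return AlignUp(CeilDiv(validRow, elementPerRepeat) * 2, elementPerBlock) +
           AlignUp(CeilDiv(validRow, elementPerRepeat), elementPerBlock);
}
```

With the assumed values elementPerRepeat = 64 and elementPerBlock = 8, a source with validRow = 1000 gives repeats = 16 and a stride of 32 + 16 = 48 elements.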
Value+index mode¶
- dstVal must be a TileType::Vec tile with standard ND layout.
- dstVal element type must match src.
- 8-bit source element types are not supported by the value+index overload.
- dstVal.GetValidRow() == 1
- dstVal.GetValidCol() == dstIdx.GetValidCol()
- dstVal.GetValidCol() == src.GetValidCol()
- For 16-bit source element types, dstIdx must use uint16_t or int16_t.
- For 32-bit source element types, dstIdx must use uint32_t or int32_t.
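The index-width rule for value+index mode can be expressed as a compile-time predicate. The sketch below is illustrative and is not the library's actual check; IsLegalIdxType is a hypothetical name. It tests only the pairing between source width and dstIdx element type, not whether the source type is itself in the supported list.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical predicate: true when IdxT is a legal dstIdx element type
// for a source element type SrcT under the width rule listed above.
// 8-bit sources (sizeof == 1) match neither branch and are rejected.
template <typename SrcT, typename IdxT>
constexpr bool IsLegalIdxType() {
    return (sizeof(SrcT) == 2 &&
            (std::is_same<IdxT, std::uint16_t>::value ||
             std::is_same<IdxT, std::int16_t>::value)) ||
           (sizeof(SrcT) == 4 &&
            (std::is_same<IdxT, std::uint32_t>::value ||
             std::is_same<IdxT, std::int32_t>::value));
}
```

For example, IsLegalIdxType<float, std::int32_t>() is true, while IsLegalIdxType<float, std::uint16_t>() is false because a 32-bit source requires a 32-bit index type.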
Exceptions¶
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions¶
- Runtime checks follow the shared column-reduce check path:
  - src.GetValidRow() != 0
  - src.GetValidCol() != 0
  - src.GetValidCol() == dst.GetValidCol()
- dst is checked through the shared column-reduce-index path and may use either of these non-fractal layouts:
  - ND layout with one row (BLayout::RowMajor, Rows == 1), or
  - DN layout whose valid row count is 1.
- In the checked A5 implementation path, tmp is accepted by the interface but not used by TCOLARGMAX_IMPL.
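The three runtime shape checks listed above amount to a simple predicate over the valid shapes of src and dst. The following sketch uses a hypothetical ValidShape struct and PassesColReduceChecks function in place of real tile objects; it does not cover the layout conditions on dst.

```cpp
// Hypothetical stand-in for a tile's valid region.
struct ValidShape {
    int rows;
    int cols;
};

// Mirrors the listed checks:
//   src valid rows != 0, src valid cols != 0, src valid cols == dst valid cols.
constexpr bool PassesColReduceChecks(ValidShape src, ValidShape dst) {
    return src.rows != 0 && src.cols != 0 && src.cols == dst.cols;
}
```

A 16 x 255 source paired with a 1 x 255 destination passes; pairing the same source with a 1 x 256 destination fails the column-count check.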
Examples¶
Auto¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_auto() {
    using SrcT = Tile<TileType::Vec, float, 16, 16>;
    using DstT = Tile<TileType::Vec, uint32_t, 1, 16>;
    using TmpT = Tile<TileType::Vec, float, 16, 16>;
    SrcT src;
    DstT dst;
    TmpT tmp;
    TCOLARGMAX(dst, src, tmp);
}
Auto Value + Index¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_auto_value_index() {
    using SrcT = Tile<TileType::Vec, float, 16, 256, BLayout::RowMajor, -1, -1>;
    using DstValT = Tile<TileType::Vec, float, 1, 256, BLayout::RowMajor, -1, -1>;
    using DstIdxT = Tile<TileType::Vec, int32_t, 1, 256, BLayout::RowMajor, -1, -1>;
    using TmpT = Tile<TileType::Vec, float, 1, 32, BLayout::RowMajor, -1, -1>;
    SrcT src(16, 255);
    DstValT dstVal(1, 255);
    DstIdxT dstIdx(1, 255);
    TmpT tmp(1, 32);
    TCOLARGMAX(dstVal, dstIdx, src, tmp);
}
Manual¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_manual() {
    using SrcT = Tile<TileType::Vec, float, 16, 16>;
    using DstT = Tile<TileType::Vec, uint32_t, 1, 16>;
    using TmpT = Tile<TileType::Vec, float, 16, 16>;
    SrcT src;
    DstT dst;
    TmpT tmp;
    TASSIGN(src, 0x1000);
    TASSIGN(dst, 0x2000);
    TASSIGN(tmp, 0x3000);
    TCOLARGMAX(dst, src, tmp);
}
Auto Mode¶
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tcolargmax %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
Manual Mode¶
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.tcolargmax %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
PTO Assembly Form¶
%dst = tcolargmax %src : !pto.tile<...> -> !pto.tile<...>
%dstVal, %dstIdx = tcolargmax %src : !pto.tile<...> -> !pto.tile<...>, !pto.tile<...>
# AS Level 2 (DPS)
pto.tcolargmax ins(%src, %tmp : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
pto.tcolargmax ins(%src, %tmp : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dstVal, %dstIdx : !pto.tile_buf<...>, !pto.tile_buf<...>)
Related Ops / Instruction Set Links¶
- Instruction set overview: Reduce And Expand
- Previous op in instruction set: pto.tcolmin
- Next op in instruction set: pto.trowmax