TROWEXPANDMAX

Instruction diagram

[Figure: TROWEXPANDMAX tile operation]

Overview

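Row-broadcast maximum: each element of a row is replaced by the maximum of that element and the row's scalar taken from src1.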

Mathematical semantics

Let R = dst.GetValidRow() and C = dst.GetValidCol(). Let s_i be the per-row scalar taken from src1 (one value per row).

For 0 <= i < R and 0 <= j < C:

\[ \mathrm{dst}_{i,j} = \max(\mathrm{src0}_{i,j}, s_i) \]
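The formula can be illustrated with a small host-side reference model. This is a sketch, not the PTO intrinsic. It assumes that tiles are row-major float buffers with a row stride that may be larger than the valid column count C. The function name `trowexpandmax_ref` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference model (not the PTO intrinsic) of
//   dst[i][j] = max(src0[i][j], s[i])  for 0 <= i < R, 0 <= j < C.
// Assumption: tiles are row-major float buffers whose row stride may exceed
// the valid column count C; elements outside the valid region are untouched.
void trowexpandmax_ref(std::vector<float>& dst, const std::vector<float>& src0,
                       const std::vector<float>& rowScalars, std::size_t R,
                       std::size_t C, std::size_t rowStride) {
  assert(C <= rowStride);
  assert(rowScalars.size() >= R);  // one scalar s_i per valid row
  assert(src0.size() >= R * rowStride && dst.size() >= R * rowStride);
  for (std::size_t i = 0; i < R; ++i) {
    const float s = rowScalars[i];  // s_i, broadcast across row i
    for (std::size_t j = 0; j < C; ++j) {
      dst[i * rowStride + j] = std::max(src0[i * rowStride + j], s);
    }
  }
}
```

Consider R = 2, C = 3, row scalars {2, 4}, and src0 rows {1, 5, 3} and {0, 7, 2}. The valid part of dst becomes {2, 5, 3} and {4, 7, 4}.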

Assembly syntax

PTO-AS form: see the PTO-AS Specification.

Synchronous form:

%dst = trowexpandmax %src0, %src1 : !pto.tile<...>, !pto.tile<...> -> !pto.tile<...>

AS Level 1 (SSA)

%dst = pto.trowexpandmax %src0, %src1 : !pto.tile<...>, !pto.tile<...> -> !pto.tile<...>

AS Level 2 (DPS)

pto.trowexpandmax ins(%src0, %src1 : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)

C++ intrinsic interface

Declared in include/pto/common/pto_instr.hpp:

template <typename TileDataDst, typename TileDataSrc0, typename TileDataSrc1, typename... WaitEvents>
PTO_INST RecordEvent TROWEXPANDMAX(TileDataDst &dst, TileDataSrc0 &src0, TileDataSrc1 &src1, WaitEvents &... events);

template <typename TileDataDst, typename TileDataSrc0, typename TileDataSrc1, typename TileDataTmp,
          typename... WaitEvents>
PTO_INST RecordEvent TROWEXPANDMAX(TileDataDst &dst, TileDataSrc0 &src0, TileDataSrc1 &src1, TileDataTmp &tmp, WaitEvents &... events);

Constraints

  • TileDataDst::DType == TileDataSrc0::DType == TileDataSrc1::DType
  • TileDataDst::DType, TileDataSrc0::DType, TileDataSrc1::DType must be one of: half, float.
  • Tile shape/layout constraint (compile-time): TileDataDst::isRowMajor must hold, so the destination tile must be row-major.
  • Mode 1: src1 is expected to provide one scalar per row (i.e., its valid shape must cover R values).
  • Mode 2: src1 is expected to provide 32 bytes of data per row.
  • Exact layout/fractal constraints are target-specific; see backend headers under include/pto/npu/*/TRowExpand*.hpp.
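The compile-time rules above can be expressed as a type-level check. This is a sketch only. `FakeTile`, `Half`, `trowexpandmax_constraints_ok`, and `elems_per_32_bytes` are hypothetical names. They model only the `DType` and `isRowMajor` members named in the constraints and are not the real PTO tile types. `Half` stands in for the target's 16-bit `half` type. The 32-byte helper only does the arithmetic for Mode 2 (8 float or 16 half elements per row). It says nothing about how the backend uses those bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a PTO tile type: only the members used by the
// documented constraints. Real PTO tile types carry much more information.
template <typename T, bool RowMajor>
struct FakeTile {
  using DType = T;
  static constexpr bool isRowMajor = RowMajor;
};

// Hypothetical placeholder for the target's 16-bit 'half' type.
struct Half {
  std::uint16_t bits;
};
static_assert(sizeof(Half) == 2, "placeholder half must be 2 bytes");

// True when all three DTypes match, the DType is half or float, and the
// destination tile is row-major.
template <typename Dst, typename Src0, typename Src1>
constexpr bool trowexpandmax_constraints_ok() {
  using T = typename Dst::DType;
  return std::is_same<T, typename Src0::DType>::value &&
         std::is_same<T, typename Src1::DType>::value &&
         (std::is_same<T, Half>::value || std::is_same<T, float>::value) &&
         Dst::isRowMajor;
}

// Mode 2 arithmetic: how many elements of type T fit in 32 bytes per row.
template <typename T>
constexpr std::size_t elems_per_32_bytes() {
  return 32 / sizeof(T);
}
```

With float tiles, `elems_per_32_bytes<float>()` evaluates to 8 on targets with a 4-byte float. A column-major destination or mismatched DTypes make the check return false.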

Examples

See related examples in docs/isa/ and docs/coding/tutorials/.