pto.tdivs
pto.tdivs is part of the Tile Scalar And Immediate instruction set.
Summary
Elementwise division with a scalar (tile/scalar or scalar/tile).
Mechanism
In the tile/scalar form, every element of the source tile is divided by the scalar; in the scalar/tile form, the scalar is divided by every element of the source tile. The operation works on tile payloads rather than scalar control state, and its legality is constrained by tile shape, layout, valid region, and target-profile support.
For each element (i, j) in the valid region:
- Tile/scalar:
$$ \mathrm{dst}_{i,j} = \frac{\mathrm{src}_{i,j}}{\mathrm{scalar}} $$
- Scalar/tile:
$$ \mathrm{dst}_{i,j} = \frac{\mathrm{scalar}}{\mathrm{src}_{i,j}} $$
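The two formulas differ only in which operand is the divisor. A minimal host-side sketch of that distinction, assuming plain float elements held in a std::vector; the function names tdivs_tile_scalar and tdivs_scalar_tile are hypothetical and are not part of the PTO API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference model of the tile/scalar form: dst[k] = src[k] / scalar.
std::vector<float> tdivs_tile_scalar(const std::vector<float>& src, float scalar) {
    std::vector<float> dst(src.size());
    for (std::size_t k = 0; k < src.size(); ++k) {
        dst[k] = src[k] / scalar;  // the scalar is the divisor
    }
    return dst;
}

// Hypothetical reference model of the scalar/tile form: dst[k] = scalar / src[k].
std::vector<float> tdivs_scalar_tile(float scalar, const std::vector<float>& src) {
    std::vector<float> dst(src.size());
    for (std::size_t k = 0; k < src.size(); ++k) {
        dst[k] = scalar / src[k];  // each tile element is the divisor
    }
    return dst;
}
```

With src = {1, 2, 4, 8} and scalar = 8, the tile/scalar form gives {0.125, 0.25, 0.5, 1} and the scalar/tile form gives {8, 4, 2, 1}.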
Syntax
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Tile/scalar form:
```text
%dst = tdivs %src, %scalar : !pto.tile<...>, f32
```
Scalar/tile form:
```text
%dst = tdivs %scalar, %src : f32, !pto.tile<...>
```
AS Level 1 (SSA)
```text
%dst = pto.tdivs %src, %scalar : (!pto.tile<...>, dtype) -> !pto.tile<...>
%dst = pto.tdivs %scalar, %src : (dtype, !pto.tile<...>) -> !pto.tile<...>
```
AS Level 2 (DPS)
```text
pto.tdivs ins(%src, %scalar : !pto.tile_buf<...>, dtype) outs(%dst : !pto.tile_buf<...>)
pto.tdivs ins(%scalar, %src : dtype, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
```
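The two assembly levels express the same computation in different styles: at Level 1 the op yields a new SSA value, while at Level 2 (destination-passing style) the caller supplies the output buffer through outs(...). A conceptual C++ analogy of that difference, using a hypothetical Buf type rather than any PTO type and showing only the tile/scalar form:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a small tile buffer; not a PTO type.
using Buf = std::array<float, 4>;

// SSA-style analogy (Level 1): produce and return a fresh result value.
Buf tdivs_ssa(const Buf& src, float scalar) {
    Buf dst{};
    for (std::size_t k = 0; k < src.size(); ++k) dst[k] = src[k] / scalar;
    return dst;
}

// DPS-style analogy (Level 2): write into a destination the caller already owns.
void tdivs_dps(const Buf& src, float scalar, Buf& dst) {
    for (std::size_t k = 0; k < src.size(); ++k) dst[k] = src[k] / scalar;
}
```

Both produce the same values; the difference is only in who owns the result storage.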
C++ Intrinsic
Declared in include/pto/common/pto_instr.hpp:
```cpp
// Tile/scalar form: dst = src0 / scalar.
template <auto PrecisionType = DivAlgorithm::DEFAULT, typename TileDataDst, typename TileDataSrc,
          typename... WaitEvents>
PTO_INST RecordEvent TDIVS(TileDataDst &dst, TileDataSrc &src0, typename TileDataSrc::DType scalar,
                           WaitEvents &... events);

// Scalar/tile form: dst = scalar / src0.
template <auto PrecisionType = DivAlgorithm::DEFAULT, typename TileDataDst, typename TileDataSrc,
          typename... WaitEvents>
PTO_INST RecordEvent TDIVS(TileDataDst &dst, typename TileDataDst::DType scalar, TileDataSrc &src0,
                           WaitEvents &... events);
```
PrecisionType accepts the following values:
- DivAlgorithm::DEFAULT: the normal algorithm; faster, but with lower precision.
- DivAlgorithm::HIGH_PRECISION: the high-precision algorithm; more precise, but slower.
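The two TDIVS overloads are told apart by argument order, and the scalar parameter's type is taken from the tile's DType (a non-deduced context) rather than from the argument. The sketch below mocks that shape with hypothetical types (MockTile, MockDivAlgorithm, tdivs_mock) to show overload selection and compile-time dispatch on an auto template parameter. The two division bodies are stand-ins only; the source does not say how DEFAULT or HIGH_PRECISION are implemented.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of DivAlgorithm; not the PTO enum.
enum class MockDivAlgorithm { DEFAULT, HIGH_PRECISION };

// Hypothetical 1-D tile exposing a DType alias, as PTO tile types do.
struct MockTile {
    using DType = float;
    DType data[4];
};

// Stand-in element kernel: the branch is selected at compile time from Precision.
template <auto Precision>
float div_elem(float num, float den) {
    if constexpr (Precision == MockDivAlgorithm::HIGH_PRECISION) {
        return num / den;           // stand-in for the slower, precise path
    } else {
        return num * (1.0f / den);  // stand-in for the faster path
    }
}

// Tile/scalar overload: scalar's type is TileDataSrc::DType, so 2.0 converts to float.
template <auto Precision = MockDivAlgorithm::DEFAULT, typename TileDataDst, typename TileDataSrc>
int tdivs_mock(TileDataDst& dst, TileDataSrc& src0, typename TileDataSrc::DType scalar) {
    for (std::size_t k = 0; k < 4; ++k) dst.data[k] = div_elem<Precision>(src0.data[k], scalar);
    return 0;  // tag: tile/scalar overload selected
}

// Scalar/tile overload: scalar comes second and has type TileDataDst::DType.
template <auto Precision = MockDivAlgorithm::DEFAULT, typename TileDataDst, typename TileDataSrc>
int tdivs_mock(TileDataDst& dst, typename TileDataDst::DType scalar, TileDataSrc& src0) {
    for (std::size_t k = 0; k < 4; ++k) dst.data[k] = div_elem<Precision>(scalar, src0.data[k]);
    return 1;  // tag: scalar/tile overload selected
}
```

For a call such as tdivs_mock(dst, 8.0f, src), the tile/scalar overload drops out because deducing TileDataSrc = float makes float::DType ill-formed, so only the scalar/tile overload remains.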
Inputs
- src is the source tile.
- scalar is the scalar value broadcast to all lanes.
- dst names the destination tile.
- The operation iterates over dst's valid region.
Expected Outputs
dst holds the result tile (or updated tile payload) produced by the operation.
Side Effects
No architectural side effects beyond producing the destination tile. The operation does not implicitly fence unrelated traffic.
Constraints
- Valid region:
  - The op uses dst.GetValidRow() / dst.GetValidCol() as the iteration domain.
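Because the iteration domain is dst's valid region, elements outside it are not written. A host-side sketch of that loop, assuming a row-major tile whose row stride equals its static column count; HostTile and tdivs_valid_region are hypothetical names, and the shape check mirrors the src0/dst valid-region equality listed under Target-Profile Restrictions:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major host tile: static rows x cols storage plus a valid region.
struct HostTile {
    std::size_t rows, cols;          // static shape (row stride == cols)
    std::size_t validRow, validCol;  // runtime valid region
    std::vector<float> data;         // rows * cols elements
};

// Tile/scalar division over dst's valid region only; returns false on a shape mismatch.
bool tdivs_valid_region(HostTile& dst, const HostTile& src0, float scalar) {
    if (src0.validRow != dst.validRow || src0.validCol != dst.validCol) {
        return false;  // mirrors the runtime valid-region equality check
    }
    for (std::size_t i = 0; i < dst.validRow; ++i) {
        for (std::size_t j = 0; j < dst.validCol; ++j) {
            dst.data[i * dst.cols + j] = src0.data[i * src0.cols + j] / scalar;
        }
    }
    return true;  // elements outside the valid region keep their previous values
}
```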
Exceptions
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions
- Implementation checks (A2A3), both overloads:
  - TileData::DType must be one of: int32_t, int, int16_t, half, float16_t, float, float32_t.
  - Tile location must be vector (TileData::Loc == TileType::Vec).
  - Static valid bounds: TileData::ValidRow <= TileData::Rows and TileData::ValidCol <= TileData::Cols.
  - Runtime: src0.GetValidRow() == dst.GetValidRow() and src0.GetValidCol() == dst.GetValidCol().
  - Tile layout must be row-major (TileData::isRowMajor).
- Implementation checks (A5), both overloads:
  - TileData::DType must be one of: uint8_t, int8_t, uint16_t, int16_t, uint32_t, int32_t, half, float.
  - Tile location must be vector (TileData::Loc == TileType::Vec).
  - Static valid bounds: TileData::ValidRow <= TileData::Rows and TileData::ValidCol <= TileData::Cols.
  - Runtime: src0.GetValidRow() == dst.GetValidRow() and src0.GetValidCol() == dst.GetValidCol().
  - Tile layout must be row-major (TileData::isRowMajor).
- Division by zero:
  - Behavior is target-defined. On A5, the tile/scalar form maps to multiply-by-reciprocal and uses 1/0 -> +inf when scalar == 0.
- High-precision algorithm:
  - Only available on A5; the PrecisionType option is ignored on A3.
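The dtype checks above can be expressed as compile-time predicates, and the A5 division-by-zero note follows from computing the reciprocal first. A sketch under stated assumptions: a2a3_dtype_ok, a5_dtype_ok, and tdivs_by_reciprocal are hypothetical names; half, float16_t, and float32_t are omitted because they are not standard C++ types; and the reciprocal path uses host IEEE-754 float arithmetic, which need not match any target bit for bit.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>

// The host model below relies on IEEE-754 infinities.
static_assert(std::numeric_limits<float>::is_iec559, "IEEE-754 float assumed");

// Hypothetical predicate mirroring the A2A3 dtype list (non-standard types omitted).
template <typename T>
constexpr bool a2a3_dtype_ok() {
    return std::is_same_v<T, std::int32_t> || std::is_same_v<T, int> ||
           std::is_same_v<T, std::int16_t> || std::is_same_v<T, float>;
}

// Hypothetical predicate mirroring the A5 dtype list (half omitted).
template <typename T>
constexpr bool a5_dtype_ok() {
    return std::is_same_v<T, std::uint8_t> || std::is_same_v<T, std::int8_t> ||
           std::is_same_v<T, std::uint16_t> || std::is_same_v<T, std::int16_t> ||
           std::is_same_v<T, std::uint32_t> || std::is_same_v<T, std::int32_t> ||
           std::is_same_v<T, float>;
}

static_assert(a5_dtype_ok<std::uint8_t>() && !a2a3_dtype_ok<std::uint8_t>(),
              "uint8_t appears only in the A5 list");

// Hypothetical host model of tile/scalar division as multiply-by-reciprocal.
// With scalar == 0 the reciprocal is 1/0 -> +inf, so a nonzero element becomes +/-inf.
float tdivs_by_reciprocal(float src, float scalar) {
    float recip = 1.0f / scalar;  // +inf when scalar == +0.0f on an IEEE-754 host
    return src * recip;
}
```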
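Examples
Auto
```cpp
#include <pto/pto-inst.hpp>
using namespace pto;

void example_auto() {
    using TileT = Tile<TileType::Vec, float, 16, 16>;
    TileT src, dst;
    // Tile/scalar form: dst = src / 2.
    TDIVS(dst, src, 2.0f);
    // Same computation with the high-precision algorithm (A5 only; ignored on A3).
    TDIVS<DivAlgorithm::HIGH_PRECISION>(dst, src, 2.0f);
}
```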
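Manual
```cpp
#include <pto/pto-inst.hpp>
using namespace pto;

void example_manual() {
    using TileT = Tile<TileType::Vec, float, 16, 16>;
    TileT src, dst;
    // Bind each tile to an explicit address before issuing the instruction.
    TASSIGN(src, 0x1000);
    TASSIGN(dst, 0x2000);
    // Scalar/tile form: dst = 2 / src.
    TDIVS(dst, 2.0f, src);
    // Same computation with the high-precision algorithm (A5 only; ignored on A3).
    TDIVS<DivAlgorithm::HIGH_PRECISION>(dst, 2.0f, src);
}
```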
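In the manual example, TASSIGN places src at 0x1000 and dst at 0x2000. A quick arithmetic check, assuming each tile occupies Rows × Cols × sizeof(DType) contiguous bytes from its assigned address (an assumption; padding and alignment rules are not stated here): a 16 × 16 float tile needs 16 × 16 × 4 = 1024 bytes (0x400), so the ranges [0x1000, 0x1400) and [0x2000, 0x2400) do not overlap. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical footprint of a dense row-major tile (no padding assumed).
constexpr std::size_t tile_bytes(std::size_t rows, std::size_t cols, std::size_t elemSize) {
    return rows * cols * elemSize;
}

// Half-open ranges [a, a + na) and [b, b + nb) overlap iff each starts before the other ends.
constexpr bool ranges_overlap(std::uint64_t a, std::size_t na, std::uint64_t b, std::size_t nb) {
    return a < b + nb && b < a + na;
}

constexpr std::size_t kTileBytes = tile_bytes(16, 16, sizeof(float));
static_assert(kTileBytes == 0x400, "16x16 float tile assumed to be 1024 bytes");
static_assert(!ranges_overlap(0x1000, kTileBytes, 0x2000, kTileBytes),
              "src at 0x1000 and dst at 0x2000 do not overlap");
```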
Auto Mode
```text
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tdivs %src, %scalar : (!pto.tile<...>, dtype) -> !pto.tile<...>
```
Manual Mode
```text
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.tdivs %src, %scalar : (!pto.tile<...>, dtype) -> !pto.tile<...>
```
PTO Assembly Form
```text
# AS Level 1 (SSA)
%dst = pto.tdivs %src, %scalar : (!pto.tile<...>, dtype) -> !pto.tile<...>
# AS Level 2 (DPS)
pto.tdivs ins(%src, %scalar : !pto.tile_buf<...>, dtype) outs(%dst : !pto.tile_buf<...>)
```
Related Ops / Instruction Set Links
- Instruction set overview: Tile Scalar And Immediate
- Previous op in instruction set: pto.tsubs
- Next op in instruction set: pto.tmuls