pto.tfmods
pto.tfmods is part of the Tile Scalar And Immediate instruction set.
Summary
Elementwise remainder with a scalar: fmod(src, scalar).
Mechanism
Computes the floating-point remainder of each source element divided by a scalar. The operation works on tile payloads rather than scalar control state, and its legality is constrained by tile shape, layout, valid region, and target-profile support.
For each element (i, j) in the valid region:
$$\mathrm{dst}_{i,j} = \operatorname{fmod}(\mathrm{src}_{i,j},\ \mathrm{scalar})$$
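For reference, assuming fmod here follows the C and IEEE-754 definition (quotient truncated toward zero, result carrying the sign of the dividend), which this page does not state explicitly, the per-element computation expands as:

```latex
\begin{aligned}
\operatorname{fmod}(x, y) &= x - \operatorname{trunc}(x / y)\, y \\
\operatorname{fmod}(7.5,\ 2) &= 7.5 - 3 \cdot 2 = 1.5 \\
\operatorname{fmod}(-7.5,\ 2) &= -7.5 - (-3) \cdot 2 = -1.5
\end{aligned}
```

The DEFAULT algorithm is described as lower precision, so its results may differ slightly from this exact definition.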
Syntax
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Synchronous form:
%dst = tfmods %src, %scalar : !pto.tile<...>, f32
AS Level 1 (SSA)
%dst = pto.tfmods %src, %scalar : !pto.tile<...>, f32
AS Level 2 (DPS)
pto.tfmods ins(%src, %scalar : !pto.tile_buf<...>, f32) outs(%dst : !pto.tile_buf<...>)
C++ Intrinsic
Declared in include/pto/common/pto_instr.hpp:
template <auto PrecisionType = FmodSAlgorithm::DEFAULT, typename TileDataDst, typename TileDataSrc,
typename... WaitEvents>
PTO_INST RecordEvent TFMODS(TileDataDst &dst, TileDataSrc &src, typename TileDataSrc::DType scalar,
WaitEvents &... events);
PrecisionType selects the scalar remainder algorithm:
- FmodSAlgorithm::DEFAULT: normal algorithm, faster but with lower precision.
- FmodSAlgorithm::HIGH_PRECISION: high-precision algorithm, slower and supported only for float.
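Because PrecisionType is a non-type template argument (template <auto PrecisionType = ...>), the algorithm is chosen at compile time. The sketch below mimics that pattern with a hypothetical SketchFmodAlgorithm enum and SketchFmodS function; neither is part of PTO. Its DEFAULT branch uses a division-based formula picked only to illustrate how a faster path can lose precision, and it is not the PTO algorithm.

```cpp
#include <cmath>
#include <type_traits>

// Hypothetical stand-in for FmodSAlgorithm; not the PTO definition.
enum class SketchFmodAlgorithm { DEFAULT, HIGH_PRECISION };

// Same shape as the TFMODS template head: a defaulted `auto` selector
// followed by a deduced type parameter.
template <auto Alg = SketchFmodAlgorithm::DEFAULT, typename T>
T SketchFmodS(T x, T scalar) {
    if constexpr (Alg == SketchFmodAlgorithm::HIGH_PRECISION) {
        // Mirrors the documented restriction: high precision is float-only.
        static_assert(std::is_same_v<T, float>, "HIGH_PRECISION supports only float");
        return std::fmod(x, scalar);  // exact remainder
    } else {
        // Division-based formula: cheaper, but x / scalar is rounded first,
        // so the result can drift from the exact remainder when |x / scalar|
        // is large.
        return x - std::trunc(x / scalar) * scalar;
    }
}
```

Selecting the precise path looks like SketchFmodS<SketchFmodAlgorithm::HIGH_PRECISION>(x, s), which parallels TFMODS<FmodSAlgorithm::HIGH_PRECISION>(dst, src, s) given the declaration above.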
Inputs
- src is the source tile.
- scalar is the scalar value broadcast to all lanes.
- dst names the destination tile.
- The operation iterates over dst's valid region.
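As a host-side illustration of the broadcast and the valid-region iteration, the sketch below uses a hypothetical ReferenceTfmods function over plain row-major float arrays standing in for tiles. It calls std::fmod, so it models the exact remainder rather than either PTO algorithm, and leaving elements outside the valid region unwritten is a modeling assumption.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical reference model, not the PTO implementation. A row-major
// Rows x Cols float array stands in for a tile; validRow/validCol play the
// role of dst.GetValidRow()/dst.GetValidCol().
template <std::size_t Rows, std::size_t Cols>
void ReferenceTfmods(float (&dst)[Rows][Cols], const float (&src)[Rows][Cols],
                     float scalar, std::size_t validRow, std::size_t validCol) {
    for (std::size_t i = 0; i < validRow && i < Rows; ++i) {
        for (std::size_t j = 0; j < validCol && j < Cols; ++j) {
            // The same scalar divisor is broadcast to every element.
            dst[i][j] = std::fmod(src[i][j], scalar);
        }
    }
    // Elements outside the valid region are not written (modeling assumption).
}
```

For example, with a 2x3 buffer and a 1x2 valid region, only dst[0][0] and dst[0][1] are computed; the other four entries keep their previous values.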
Expected Outputs
dst holds fmod(src[i, j], scalar) for every element (i, j) in its valid region.
Side Effects
No architectural side effects beyond producing the destination tile. Does not implicitly fence unrelated traffic.
Constraints
- Division by zero: behavior is target-defined; the CPU simulator asserts in debug builds.
- Valid region: the op uses dst.GetValidRow() / dst.GetValidCol() as the iteration domain.
- High-precision algorithm: only available on A5; A2A3 ignores the PrecisionType option.
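Since division-by-zero behavior is target-defined, portable code should not depend on it. The hypothetical GuardedFmodScalar helper below is a host-side sketch, not part of PTO, that rejects a zero divisor instead of computing a result. It checks a single value; because TFMODS broadcasts one scalar to every lane, the same test applied to that scalar before issuing the op would cover the whole tile.

```cpp
#include <cmath>
#include <optional>

// Hypothetical host-side helper, not part of PTO: refuse a zero divisor up
// front rather than relying on target-defined behavior. (Under C Annex F /
// IEC 60559 rules, std::fmod(x, 0.0f) would return NaN.)
std::optional<float> GuardedFmodScalar(float x, float scalar) {
    if (scalar == 0.0f) {  // also true for -0.0f
        return std::nullopt;  // caller decides how to handle a zero divisor
    }
    return std::fmod(x, scalar);
}
```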
Exceptions
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions
- Implementation checks (A2A3):
  - dst and src must use the same element type.
  - Supported element types are float and float32_t.
  - dst and src must be vector tiles.
  - dst and src must be row-major.
  - Runtime: dst.GetValidRow() == src.GetValidRow() and dst.GetValidCol() == src.GetValidCol(), and both valid extents must be greater than 0.
- Implementation checks (A5):
  - dst and src must use the same element type.
  - Supported element types are 2-byte or 4-byte types supported by the target implementation (including half and float).
  - dst and src must be vector tiles.
  - Static valid bounds must satisfy ValidRow <= Rows and ValidCol <= Cols for both tiles.
  - Runtime: dst.GetValidRow() == src.GetValidRow() and dst.GetValidCol() == src.GetValidCol().
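The A5 checks split into compile-time conditions (element type, static valid bounds) and run-time conditions (matching valid extents). A minimal sketch of that split follows; SketchTile and SketchCheckA5 are hypothetical stand-ins for the real PTO tile type and verifier, modeling only the fields these checks read (vector-tile kind is not modeled).

```cpp
#include <type_traits>

// Hypothetical tile descriptor, not the PTO Tile type: static capacity and
// static valid bounds are template parameters, run-time valid extents are fields.
template <typename T, int Rows, int Cols, int ValidRow = Rows, int ValidCol = Cols>
struct SketchTile {
    using DType = T;
    static constexpr int kRows = Rows;
    static constexpr int kCols = Cols;
    static constexpr int kValidRow = ValidRow;
    static constexpr int kValidCol = ValidCol;
    int validRow = ValidRow;  // run-time valid row count
    int validCol = ValidCol;  // run-time valid column count
    int GetValidRow() const { return validRow; }
    int GetValidCol() const { return validCol; }
};

// Mirrors the A5 rules: violations of the static rules fail to compile,
// and the function returns whether the run-time rule holds.
template <typename Dst, typename Src>
bool SketchCheckA5(const Dst& dst, const Src& src) {
    using T = typename Dst::DType;
    static_assert(std::is_same_v<T, typename Src::DType>,
                  "dst and src must use the same element type");
    static_assert(sizeof(T) == 2 || sizeof(T) == 4,
                  "element type must be 2 or 4 bytes");
    static_assert(Dst::kValidRow <= Dst::kRows && Dst::kValidCol <= Dst::kCols,
                  "dst static valid bounds exceed its shape");
    static_assert(Src::kValidRow <= Src::kRows && Src::kValidCol <= Src::kCols,
                  "src static valid bounds exceed its shape");
    return dst.GetValidRow() == src.GetValidRow() &&
           dst.GetValidCol() == src.GetValidCol();
}
```

An A2A3 variant of the same sketch would additionally require float elements, row-major layout, and strictly positive valid extents.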
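Examples
#include <pto/pto-inst.hpp>
using namespace pto;
void example() {
    // 16x16 vector tile with float elements (float is supported on A2A3 and A5).
    using TileT = Tile<TileType::Vec, float, 16, 16>;
    TileT x, out;
    // out = fmod(x, 3.0f) elementwise over out's valid region, DEFAULT algorithm.
    TFMODS(out, x, 3.0f);
}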
Auto Mode
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tfmods %src, %scalar : !pto.tile<...>, f32
Manual Mode
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.tfmods %src, %scalar : !pto.tile<...>, f32
PTO Assembly Form
%dst = tfmods %src, %scalar : !pto.tile<...>, f32
# AS Level 2 (DPS)
pto.tfmods ins(%src, %scalar : !pto.tile_buf<...>, f32) outs(%dst : !pto.tile_buf<...>)
Related Ops / Instruction Set Links
- Instruction set overview: Tile Scalar And Immediate
- Previous op in instruction set: pto.tmuls
- Next op in instruction set: pto.trems