pto.tmul¶
pto.tmul is part of the Elementwise Tile Tile instruction set.
Summary¶
Elementwise multiply of two tiles.
Mechanism¶
For each element (i, j) in the valid region of dst, the destination element is the product of the corresponding source elements:
\[ \mathrm{dst}_{i,j} = \mathrm{src0}_{i,j} \cdot \mathrm{src1}_{i,j} \]
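The formula above can be sketched as a host-side reference model. This is only an illustration of the semantics, not the PTO implementation: `RefTile` and `ref_tmul` are hypothetical names, the element type is fixed to `float`, and storage is assumed to be row-major.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a tile (not a PTO type):
// row-major storage plus a valid region.
struct RefTile {
    std::size_t rows, cols;          // allocated shape
    std::size_t validRow, validCol;  // valid region, assumed <= rows / cols
    std::vector<float> data;         // rows * cols elements, row-major

    RefTile(std::size_t r, std::size_t c, std::size_t vr, std::size_t vc, float fill = 0.0f)
        : rows(r), cols(c), validRow(vr), validCol(vc), data(r * c, fill) {}

    float &at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Reference semantics: dst[i][j] = src0[i][j] * src1[i][j] for every (i, j)
// in dst's valid region. This model does not visit elements outside it.
void ref_tmul(RefTile &dst, const RefTile &src0, const RefTile &src1) {
    for (std::size_t i = 0; i < dst.validRow; ++i) {
        for (std::size_t j = 0; j < dst.validCol; ++j) {
            dst.at(i, j) = src0.at(i, j) * src1.at(i, j);
        }
    }
}
```

A model like this can serve as a golden reference when comparing device results over the valid region.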
Syntax¶
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Synchronous form:
%dst = tmul %src0, %src1 : !pto.tile<...>
AS Level 1 (SSA)¶
%dst = pto.tmul %src0, %src1 : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
AS Level 2 (DPS)¶
pto.tmul ins(%src0, %src1 : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
template <typename TileDataDst, typename TileDataSrc0, typename TileDataSrc1, typename... WaitEvents>
PTO_INST RecordEvent TMUL(TileDataDst &dst, TileDataSrc0 &src0, TileDataSrc1 &src1, WaitEvents &... events);
Inputs¶
| Operand | Role | Description |
|---|---|---|
| %src0 | Left tile | First source tile; read at (i, j) for each (i, j) in dst valid region |
| %src1 | Right tile | Second source tile; read at (i, j) for each (i, j) in dst valid region |
| WaitEvents... | Optional synchronisation | RecordEvent tokens to wait on before issuing the operation |
Expected Outputs¶
| Result | Type | Description |
|---|---|---|
| %dst | !pto.tile<...> | Destination tile; all (i, j) in its valid region contain src0[i,j] * src1[i,j] after the operation |
Side Effects¶
No architectural side effects beyond producing the destination tile. Does not implicitly fence unrelated traffic.
Constraints¶
- Valid region:
  - The op uses dst.GetValidRow()/dst.GetValidCol() as the iteration domain.
Exceptions¶
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions¶
- Implementation checks (A2A3):
  - TileData::DType must be one of: int32_t, int16_t, half, float.
  - Tile location must be vector (TileData::Loc == TileType::Vec).
  - Static valid bounds: TileData::ValidRow <= TileData::Rows and TileData::ValidCol <= TileData::Cols.
  - Tile layout must be row-major (TileData::isRowMajor).
  - Runtime: src0, src1 and dst tiles should have the same validRow/validCol.
- Implementation checks (A5):
  - TileData::DType must be one of: int32_t, uint32_t, float, int16_t, uint16_t, half.
  - Tile location must be vector (TileData::Loc == TileType::Vec).
  - Static valid bounds: TileData::ValidRow <= TileData::Rows and TileData::ValidCol <= TileData::Cols.
  - Tile layout must be row-major (TileData::isRowMajor).
  - Runtime: src0, src1 and dst tiles should have the same validRow/validCol.
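The static checks listed above could be expressed as compile-time predicates along the following lines. This is a sketch under stated assumptions, not the code in the PTO headers: `MockTile`, the `TileType` stand-in, `half_t`, `is_one_of`, `tmul_ok_a2a3` and `tmul_ok_a5` are all hypothetical names introduced for illustration, and the snippet assumes C++17. The runtime check on matching validRow/validCol is not modelled here.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for illustration; the real definitions live in the PTO headers.
enum class TileType { Vec, Other };
struct half_t { std::uint16_t bits; };  // placeholder for the device `half` type

template <TileType L, typename T, int R, int C, int VR = R, int VC = C, bool RowMajor = true>
struct MockTile {
    using DType = T;
    static constexpr TileType Loc = L;
    static constexpr int Rows = R, Cols = C;
    static constexpr int ValidRow = VR, ValidCol = VC;
    static constexpr bool isRowMajor = RowMajor;
};

// True when T is exactly one of the listed types.
template <typename T, typename... Allowed>
inline constexpr bool is_one_of = (std::is_same_v<T, Allowed> || ...);

// Checks shared by both profiles: vector location, valid bounds, row-major layout.
template <typename TileData>
constexpr bool tmul_common_ok() {
    return TileData::Loc == TileType::Vec &&
           TileData::ValidRow <= TileData::Rows &&
           TileData::ValidCol <= TileData::Cols &&
           TileData::isRowMajor;
}

// A2A3 profile: int32_t, int16_t, half, float.
template <typename TileData>
constexpr bool tmul_ok_a2a3() {
    return is_one_of<typename TileData::DType, std::int32_t, std::int16_t, half_t, float> &&
           tmul_common_ok<TileData>();
}

// A5 profile: additionally accepts the unsigned 32-bit and 16-bit integer types.
template <typename TileData>
constexpr bool tmul_ok_a5() {
    return is_one_of<typename TileData::DType, std::int32_t, std::uint32_t, float,
                     std::int16_t, std::uint16_t, half_t> &&
           tmul_common_ok<TileData>();
}
```

Because the predicates are `constexpr`, they can be used inside a `static_assert` so that an unsupported tile type fails at compile time rather than at run time.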
Performance¶
A2/A3 Throughput¶
TMUL compiles to CCE vector instructions. The TBinOp.hpp performance model gives the following costs, in cycles:
| Metric | Value (FP) | Value (INT) |
|---|---|---|
| Startup latency | 14 | 14 |
| Completion latency | 20 | 18 |
| Per-repeat throughput | 2 | 2 |
| Pipeline interval | 18 | 18 |
Repeat calculation: R = validRow × validCol / 8 (RowMajor layout, continuous path).
Example: 16×64 FP32 tile with RowMajor layout:
R = 16 × 64 / 8 = 128
total ≈ 14 + 20 + 128 × 2 + (128 − 1) × 18 = 34 + 256 + 2286 = 2576 cycles
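The estimate can be wrapped in a small helper. The formula below is an assumption inferred from the worked example (startup + completion + R × per-repeat throughput + (R − 1) × pipeline interval); it is not taken from TBinOp.hpp, and `TmulCost`, `kFpCost`, `kIntCost` and `estimate_tmul_cycles` are hypothetical names. Integer division for R and the handling of R ≤ 1 are also assumptions.

```cpp
#include <cstdint>

// Cost parameters copied from the A2/A3 table (values in cycles).
struct TmulCost {
    std::uint64_t startup;
    std::uint64_t completion;
    std::uint64_t perRepeat;
    std::uint64_t interval;
};

constexpr TmulCost kFpCost{14, 20, 2, 18};
constexpr TmulCost kIntCost{14, 18, 2, 18};

// Assumed model, inferred from the worked example:
//   R     = validRow * validCol / 8   (RowMajor layout, continuous path)
//   total = startup + completion + R * perRepeat + (R - 1) * interval
// The (R - 1) term is treated as zero when R <= 1 (an assumption).
constexpr std::uint64_t estimate_tmul_cycles(std::uint64_t validRow,
                                             std::uint64_t validCol,
                                             TmulCost cost) {
    const std::uint64_t repeats = validRow * validCol / 8;
    const std::uint64_t gaps = repeats > 1 ? repeats - 1 : 0;
    return cost.startup + cost.completion + repeats * cost.perRepeat + gaps * cost.interval;
}
```

Calling `estimate_tmul_cycles(16, 64, kFpCost)` evaluates the same terms as the example above; passing `kIntCost` swaps in the integer column of the table.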
Examples¶
Auto¶
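#include <pto/pto-inst.hpp>
using namespace pto;

void example_auto() {
    // 16x16 float tiles in vector memory; placement is left to the compiler/runtime.
    using TileT = Tile<TileType::Vec, float, 16, 16>;
    TileT src0, src1, dst;
    TMUL(dst, src0, src1);  // dst = src0 * src1, elementwise
}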
Manual¶
#include <pto/pto-inst.hpp>
using namespace pto;

void example_manual() {
    using TileT = Tile<TileType::Vec, float, 16, 16>;
    TileT src0, src1, dst;
    // Bind each tile to an explicit address before issuing the instruction.
    TASSIGN(src0, 0x1000);
    TASSIGN(src1, 0x2000);
    TASSIGN(dst, 0x3000);
    TMUL(dst, src0, src1);  // dst = src0 * src1, elementwise
}
Auto Mode¶
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tmul %src0, %src1 : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
Manual Mode¶
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.tmul %src0, %src1 : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
PTO Assembly Form¶
%dst = tmul %src0, %src1 : !pto.tile<...>
# AS Level 2 (DPS)
pto.tmul ins(%src0, %src1 : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
Related Ops / Instruction Set Links¶
- Instruction set overview: Elementwise Tile Tile
- Previous op in instruction set: pto.tsub
- Next op in instruction set: pto.tmin