pto.tinsert
pto.tinsert is part of the Layout And Rearrangement instruction set.
Summary
TINSERT writes a source tile into a sub-region of a destination tile, starting at a configurable (indexRow, indexCol) offset. It is the natural counterpart to TEXTRACT: TEXTRACT reads a sub-region out of a source tile, while TINSERT writes a tile into a sub-region of a destination.
The instruction comes in two forms, distinguished by suffix:

| Variant | Suffix | Description | Typical Use |
|---|---|---|---|
| Standard insert | (none) | Insert a sub-window with optional ReLU | Acc→Mat with row/col offsets |
| Fix-pipe insert | _fp | Insert through the fix-pipe quantization path | Acc→Mat with per-channel scaling |
Mechanism
Conceptually, TINSERT copies the source tile's valid region into a window of dst starting at (indexRow, indexCol). For 0 <= i < SrcRows and 0 <= j < SrcCols:

$$
\mathrm{dst}[\mathrm{indexRow} + i,\; \mathrm{indexCol} + j] = \mathrm{src}[i,\; j]
$$

Fix-pipe variant (TINSERT_FP): the data is routed through the hardware fix-pipe quantization pipeline, and the fp tile carries the per-channel quantization parameters applied along the way.
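To make the indexing concrete, here is a minimal host-side reference model of the copy rule, written in plain C++ rather than against the PTO headers. `HostTile` and `insertRef` are hypothetical names used only for this sketch; the model treats every tile as a dense row-major array and ignores layouts, fractals and memory locations, so it describes only which destination elements receive which source values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a tile: a dense row-major matrix.
// Real PTO tiles also carry layout, fractal and memory-location details,
// which this sketch ignores.
template <typename T>
struct HostTile {
    std::size_t rows;
    std::size_t cols;
    std::vector<T> data;

    HostTile(std::size_t r, std::size_t c, T fill = T{})
        : rows(r), cols(c), data(r * c, fill) {}

    T &at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    const T &at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Reference semantics of a plain insert:
// dst[indexRow + i][indexCol + j] = src[i][j] for every source element.
// This model never writes dst elements outside the window.
template <typename T>
void insertRef(HostTile<T> &dst, const HostTile<T> &src,
               std::size_t indexRow, std::size_t indexCol) {
    // Mirrors the documented runtime bounds constraint.
    assert(indexRow + src.rows <= dst.rows);
    assert(indexCol + src.cols <= dst.cols);
    for (std::size_t i = 0; i < src.rows; ++i) {
        for (std::size_t j = 0; j < src.cols; ++j) {
            dst.at(indexRow + i, indexCol + j) = src.at(i, j);
        }
    }
}
```

For example, inserting a 2x2 tile of ones into a 4x4 tile of zeros at (1, 2) sets exactly rows 1..2, columns 2..3.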
Variants
Variant 1: Standard Insert
TINSERT(dst, src, indexRow, indexCol) — plain sub-tile insertion.
Variant 2: ReLU Insert
TINSERT<..., reluMode>(dst, src, indexRow, indexCol) — insert with ReLU pre-processing.
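ReLU pre-processing clamps negative source values to zero before they are written. The sketch below models that on plain row-major buffers, assuming NormalRelu is the standard max(x, 0). `ReluPreModeSketch` and `insertWithReluRef` are hypothetical names defined here; they are not the PTO `ReluPreMode` enum or the TINSERT API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of ReluPreMode, for this sketch only.
enum class ReluPreModeSketch { NoRelu, NormalRelu };

// Apply the optional ReLU element by element, then place the result into a
// row-major destination buffer (dstCols wide) at (indexRow, indexCol).
void insertWithReluRef(std::vector<float> &dst, std::size_t dstCols,
                       const std::vector<float> &src, std::size_t srcRows,
                       std::size_t srcCols, std::size_t indexRow,
                       std::size_t indexCol, ReluPreModeSketch mode) {
    assert(src.size() == srcRows * srcCols);
    assert(indexCol + srcCols <= dstCols);
    assert((indexRow + srcRows) * dstCols <= dst.size());
    for (std::size_t i = 0; i < srcRows; ++i) {
        for (std::size_t j = 0; j < srcCols; ++j) {
            float v = src[i * srcCols + j];
            if (mode == ReluPreModeSketch::NormalRelu) {
                v = std::max(v, 0.0f);  // ReLU: negatives become zero
            }
            dst[(indexRow + i) * dstCols + indexCol + j] = v;
        }
    }
}
```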
Variant 3: Scalar-Quant Insert
TINSERT<..., reluMode>(dst, src, preQuantScalar, indexRow, indexCol) — insert with a scalar quantization parameter.
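A rough model of scalar quantization: every element shares one scale, followed by rounding and saturation to int8. This is a sketch under loud assumptions. The real preQuantScalar is a uint64_t whose bit encoding this page does not describe, so the model takes a plain float scale instead, and the rounding (std::lround, half away from zero) and int8 saturation are illustrative choices, not documented hardware behaviour. `quantizeScalar` and `insertScalarQuantRef` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed behaviour: multiply, round to nearest (half away from zero),
// then saturate to the int8 range [-128, 127].
std::int8_t quantizeScalar(float v, float scale) {
    long q = std::lround(v * scale);
    q = std::max(-128L, std::min(127L, q));
    return static_cast<std::int8_t>(q);
}

// Scalar-quant insert model: all source elements use the same scale.
void insertScalarQuantRef(std::vector<std::int8_t> &dst, std::size_t dstCols,
                          const std::vector<float> &src, std::size_t srcRows,
                          std::size_t srcCols, float scale,
                          std::size_t indexRow, std::size_t indexCol) {
    assert(src.size() == srcRows * srcCols);
    assert(indexCol + srcCols <= dstCols);
    assert((indexRow + srcRows) * dstCols <= dst.size());
    for (std::size_t i = 0; i < srcRows; ++i) {
        for (std::size_t j = 0; j < srcCols; ++j) {
            dst[(indexRow + i) * dstCols + indexCol + j] =
                quantizeScalar(src[i * srcCols + j], scale);
        }
    }
}
```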
Variant 4: Fix-Pipe Insert (TINSERT_FP)
TINSERT_FP(dst, src, fp, indexRow, indexCol) — insert through the fix-pipe quantization path. The _fp suffix means fix pipe, not floating point.
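Per-channel quantization differs from the scalar case only in where the scale comes from. Assuming one parameter per source column (consistent with the 1x16 fp tile paired with a 16-column accumulator in the fix-pipe pattern further down), the data flow can be modelled as follows. The float scales stand in for the encoded uint64_t entries of the real fp tile, rounding and saturation are assumptions, and `insertPerChannelRef` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-channel insert model: source column j is scaled by scales[j],
// rounded to nearest and saturated to int8 (both assumptions).
void insertPerChannelRef(std::vector<std::int8_t> &dst, std::size_t dstCols,
                         const std::vector<float> &src, std::size_t srcRows,
                         std::size_t srcCols, const std::vector<float> &scales,
                         std::size_t indexRow, std::size_t indexCol) {
    assert(src.size() == srcRows * srcCols);
    assert(scales.size() == srcCols);  // one parameter per channel (column)
    assert(indexCol + srcCols <= dstCols);
    assert((indexRow + srcRows) * dstCols <= dst.size());
    for (std::size_t i = 0; i < srcRows; ++i) {
        for (std::size_t j = 0; j < srcCols; ++j) {
            long q = std::lround(src[i * srcCols + j] * scales[j]);
            q = std::max(-128L, std::min(127L, q));
            dst[(indexRow + i) * dstCols + indexCol + j] = static_cast<std::int8_t>(q);
        }
    }
}
```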
Variant 5: A5 Mode-Specific Insert
On A5 only, an additional template parameter selects specialized insertion modes:
```cpp
template <TInsertMode mode, typename DstTileData, typename SrcTileData, typename... WaitEvents>
PTO_INST RecordEvent TINSERT(DstTileData &dst, SrcTileData &src,
                             uint32_t indexRow = 0, uint32_t indexCol = 0, WaitEvents &... events);
```
TInsertMode values:
| Mode | Source Tile | Destination Tile | Notes |
|---|---|---|---|
| ND | Row-major Vec | Matrix in ND layout | Vec → Mat with ND layout |
| ND_VEC | Row-major Vec | Row-major Vec | Vec → Vec in-row insertion |
| NZ | NZ-format Vec | Matrix | Inserts NZ vector into matrix tile |
| NZ_PLUS_1 | NZ-format Vec | Matrix | Like NZ with +1 offset |
| SPLIT2_NZ_PLUS_1 | NZ-format Vec | Matrix | Split-by-2 variant |
| SPLIT4_NZ_PLUS_1 | NZ-format Vec | Matrix | Split-by-4 variant |
Supported Tile-Type Pairs
A2/A3

| Source Type | Destination Type | Notes |
|---|---|---|
| TileType::Acc | TileType::Mat | Accumulator-to-matrix insertion |

A5

| Source Type | Destination Type | Notes |
|---|---|---|
| TileType::Acc | TileType::Mat | Accumulator-to-matrix insertion |
| TileType::Vec | TileType::Mat | Vector-to-matrix via mode parameter |
| TileType::Vec | TileType::Vec | Vector-to-vector via mode parameter |
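The two tables can be encoded as a compile-time predicate, which can be handy for static checks in host-side tooling. `Arch`, `TileKind` and `isSupportedInsertPair` are hypothetical stand-ins defined only for this sketch; they are not PTO types.

```cpp
#include <cassert>

// Hypothetical enums for this sketch; not the PTO definitions.
enum class Arch { A2A3, A5 };
enum class TileKind { Vec, Mat, Acc };

// True if (src -> dst) appears in the supported-pairs tables.
constexpr bool isSupportedInsertPair(Arch arch, TileKind src, TileKind dst) {
    return (src == TileKind::Acc && dst == TileKind::Mat)      // A2/A3 and A5
        || (arch == Arch::A5 && src == TileKind::Vec &&
            (dst == TileKind::Mat || dst == TileKind::Vec));   // A5, via TInsertMode
}

static_assert(isSupportedInsertPair(Arch::A2A3, TileKind::Acc, TileKind::Mat), "Acc->Mat on A2/A3");
static_assert(!isSupportedInsertPair(Arch::A2A3, TileKind::Vec, TileKind::Mat), "no Vec source on A2/A3");
```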
Supported Element Types
int8_t, hifloat8_t, float8_e5m2_t, float8_e4m3_t, half, bfloat16_t, float, float4_e2m1x2_t, float4_e1m2x2_t, float8_e8m0_t
Syntax
PTO Assembly Form
Standard insert:
```
%dst = tinsert %src[%r0, %r1] : !pto.tile<...> -> !pto.tile<...>
```
Fix-pipe insert:
```
%dst = tinsert.fp %src, %fp[%r0, %r1] : !pto.tile<...>, !pto.tile<...> -> !pto.tile<...>
```
AS Level 1 (SSA)
```
// Standard
%dst = pto.tinsert %src, %idxrow, %idxcol : (!pto.tile<...>, dtype, dtype) -> !pto.tile<...>
// Fix-pipe
%dst = pto.tinsert_fp %src, %fp, %idxrow, %idxcol : (!pto.tile<...>, !pto.tile<...>, dtype, dtype) -> !pto.tile<...>
```
AS Level 2 (DPS)
```
// Standard
pto.tinsert ins(%src, %idxrow, %idxcol : !pto.tile_buf<...>, dtype, dtype) outs(%dst : !pto.tile_buf<...>)
// Fix-pipe
pto.tinsert_fp ins(%src, %fp, %idxrow, %idxcol : !pto.tile_buf<...>, !pto.tile_buf<...>, dtype, dtype) outs(%dst : !pto.tile_buf<...>)
```
C++ Intrinsic
```cpp
#include <pto/pto-inst.hpp>
using namespace pto;

// Variant 1: Plain insert
template <typename DstTileData, typename SrcTileData, typename... WaitEvents>
PTO_INST RecordEvent TINSERT(DstTileData &dst, SrcTileData &src,
                             uint16_t indexRow, uint16_t indexCol, WaitEvents &... events);

// Variant 2: ReLU insert
template <typename DstTileData, typename SrcTileData, ReluPreMode reluMode,
          typename... WaitEvents>
PTO_INST RecordEvent TINSERT(DstTileData &dst, SrcTileData &src,
                             uint16_t indexRow, uint16_t indexCol, WaitEvents &... events);

// Variant 3: Scalar-quant insert
template <typename DstTileData, typename SrcTileData, ReluPreMode reluMode = ReluPreMode::NoRelu,
          typename... WaitEvents>
PTO_INST RecordEvent TINSERT(DstTileData &dst, SrcTileData &src,
                             uint64_t preQuantScalar, uint16_t indexRow, uint16_t indexCol, WaitEvents &... events);

// Variant 4: Fix-pipe insert (TINSERT_FP)
template <typename DstTileData, typename SrcTileData, typename FpTileData,
          ReluPreMode reluMode = ReluPreMode::NoRelu, typename... WaitEvents>
PTO_INST RecordEvent TINSERT_FP(DstTileData &dst, SrcTileData &src, FpTileData &fp,
                                uint16_t indexRow, uint16_t indexCol, WaitEvents &... events);

// Variant 5: A5 mode-specific insert
#ifdef PTO_NPU_ARCH_A5
template <TInsertMode mode, typename DstTileData, typename SrcTileData, typename... WaitEvents>
PTO_INST RecordEvent TINSERT(DstTileData &dst, SrcTileData &src,
                             uint32_t indexRow = 0, uint32_t indexCol = 0, WaitEvents &... events);
#endif
```
Inputs
- `src`: the source tile to insert.
- `indexRow`: starting row offset in `dst` where insertion begins.
- `indexCol`: starting column offset in `dst` where insertion begins.
- `dst`: the destination tile. The operation writes `src` into the `(indexRow, indexCol)` sub-region of `dst`.
- `fp` (fix-pipe variant only): auxiliary fix-pipe tile. Must be `TileType::Scaling`.
- `reluMode` (optional): `ReluPreMode::{NoRelu, NormalRelu}`.
- `preQuantScalar` (scalar-quant variant only): scalar quantization factor.
Constraints
- Runtime bounds: `indexRow + SrcTileData::Rows <= DstTileData::Rows` and `indexCol + SrcTileData::Cols <= DstTileData::Cols`.
- Fp tile location: `FpTileData::Loc` must be `TileType::Scaling` (enforced on both A2/A3 and A5).
- Fix-pipe destination (A2/A3): the destination must be `TileType::Mat` with fractal size 512 and a column-width byte count divisible by 32.
- Fix-pipe on A5: the destination must be `TileType::Mat` with `BLayout::ColMajor` + `SLayout::RowMajor`; the source must be a `float` or `int32_t` `Acc` tile.
- CPU simulator: `TINSERT_FP` accepts the interface but ignores the `fp` parameter, falling back to standard `TINSERT`.
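The runtime bounds rule and the A2/A3 column-width rule reduce to simple arithmetic. The sketch below reads "column-width byte count" as DstCols x sizeof(element), which is an interpretation rather than a documented definition; `fitsInDst` and `colBytesAligned` are hypothetical helper names. The static_asserts exercise a 16x16 source at (8, 16) in a 32x32 destination and a 32-column int8_t destination (32 bytes per row).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Runtime bounds rule: the source window must fit inside dst.
constexpr bool fitsInDst(std::size_t indexRow, std::size_t indexCol,
                         std::size_t srcRows, std::size_t srcCols,
                         std::size_t dstRows, std::size_t dstCols) {
    return indexRow + srcRows <= dstRows && indexCol + srcCols <= dstCols;
}

// A2/A3 fix-pipe destination rule, interpreted here (assumption) as
// dstCols * sizeof(element) being a multiple of 32 bytes.
template <typename DstElem>
constexpr bool colBytesAligned(std::size_t dstCols) {
    return (dstCols * sizeof(DstElem)) % 32 == 0;
}

static_assert(fitsInDst(8, 16, 16, 16, 32, 32), "8+16 <= 32 and 16+16 <= 32");
static_assert(!fitsInDst(8, 24, 16, 16, 32, 32), "24+16 > 32: column overflow");
static_assert(colBytesAligned<std::int8_t>(32), "32 int8 columns = 32 bytes");
```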
Common Patterns
Pattern 1: Accumulator Insert into Matrix
```cpp
// Insert a small accumulator tile into a larger matrix at a specific position.
// The destination element type matches the float accumulator, so the plain
// (non-quantizing) insert needs no scale parameter.
using AccT = TileAcc<float, 16, 16>;
using MatT = Tile<TileType::Mat, float, 32, 32, BLayout::ColMajor, -1, -1, SLayout::RowMajor, 512>;
AccT acc;
MatT mat;
TASSIGN(acc, 0x1000);
TASSIGN(mat, 0x2000);
// Insert the 16x16 accumulator tile into the matrix at row=8, col=16
// (8 + 16 <= 32 and 16 + 16 <= 32, so the window stays in bounds)
TINSERT(mat, acc, /*indexRow=*/8, /*indexCol=*/16);
```
Pattern 2: Fix-Pipe Quantized Insert
```cpp
// Accumulator tile quantized via fix-pipe and inserted into matrix at offset
using AccT = TileAcc<float, 16, 16>;
using MatT = Tile<TileType::Mat, int8_t, 32, 32, BLayout::ColMajor, -1, -1, SLayout::RowMajor, 512>;
using FpT = Tile<TileType::Scaling, uint64_t, 1, 16>;
AccT acc;
MatT mat;
FpT fp(16);
TASSIGN(acc, 0x1000);
TASSIGN(mat, 0x2000);
TASSIGN(fp, 0x3000);
// Insert with fix-pipe quantization applied
TINSERT_FP(mat, acc, fp, /*indexRow=*/0, /*indexCol=*/0);
```
Pattern 3: Accumulator Scatter via Staged Inserts
```cpp
// Scatter four accumulator results into the quadrants of a large output matrix
using AccT = TileAcc<float, 32, 32>;  // one quadrant: 32x32 floats = 4 KiB
using MatT = Tile<TileType::Mat, float, 64, 64>;
MatT outMat;
TASSIGN(outMat, 0x4000);
// Four temporary accumulators at non-overlapping 4 KiB (0x1000) offsets
AccT accQ1, accQ2, accQ3, accQ4;
TASSIGN(accQ1, 0x0000);
TASSIGN(accQ2, 0x1000);
TASSIGN(accQ3, 0x2000);
TASSIGN(accQ4, 0x3000);
// Each 32x32 accumulator fills exactly one 32x32 quadrant of the 64x64 matrix
TINSERT(outMat, accQ1, 0, 0);    // Top-left
TINSERT(outMat, accQ2, 0, 32);   // Top-right
TINSERT(outMat, accQ3, 32, 0);   // Bottom-left
TINSERT(outMat, accQ4, 32, 32);  // Bottom-right
```
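A quick way to sanity-check a scatter layout is to count how many times each destination element is written. The host-side sketch below does that for square tiles; `writeCounts` and `coveredElements` are hypothetical helpers that model only the offset arithmetic, not the hardware transfer. With quadrant offsets (0, 0), (0, 32), (32, 0) and (32, 32), a 32x32 source covers all 4096 elements of a 64x64 destination exactly once, whereas a 16x16 source touches only the top-left 16x16 corner of each quadrant (1024 elements in total).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many times each element of a dstN x dstN matrix is written when a
// tileN x tileN source is inserted at each {row, col} offset.
std::vector<int> writeCounts(std::size_t dstN, std::size_t tileN,
                             const std::vector<std::array<std::size_t, 2>> &offsets) {
    std::vector<int> counts(dstN * dstN, 0);
    for (const auto &off : offsets) {
        assert(off[0] + tileN <= dstN && off[1] + tileN <= dstN);  // bounds rule
        for (std::size_t i = 0; i < tileN; ++i) {
            for (std::size_t j = 0; j < tileN; ++j) {
                ++counts[(off[0] + i) * dstN + off[1] + j];
            }
        }
    }
    return counts;
}

// Number of destination elements written at least once.
std::size_t coveredElements(const std::vector<int> &counts) {
    std::size_t n = 0;
    for (int c : counts) {
        if (c > 0) {
            ++n;
        }
    }
    return n;
}
```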
See Also
- Layout And Rearrangement
- pto.textract: the inverse operation (reads a sub-tile out of a source tile)
- pto.tmov: full tile-to-tile movement, including the TMOV_FP fix-pipe variant
- Assembly Spelling And Operands