pto.tci¶
pto.tci is part of the Irregular And Complex instruction set.
Summary¶
Generate a contiguous integer sequence into a destination tile.
Mechanism¶
pto.tci is a tile instruction. Each output element depends on its position in the tile rather than on an input element, so the operation carries architecture-visible behavior that is not reducible to a plain elementwise compute pattern.
For a zero-based linearized index k over the valid elements and a scalar start value S:
- Ascending:
$$ \mathrm{dst}_{k} = S + k $$
- Descending:
$$ \mathrm{dst}_{k} = S - k $$
The linearization order depends on the tile layout. On A2/A3 and A5, the linearization order follows row-major order: elements are visited left-to-right within each row, then top-to-bottom across rows.
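The two formulas can be sketched as a host-side reference model. This is an illustration only, not the device implementation: `tci_reference`, `linear_index`, and `tci_reference_demo` are hypothetical names, and the model assumes the valid elements have already been linearized in row-major order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model (not part of the PTO API).
// Element k of the result holds start + k (ascending) or start - k (descending).
template <typename T>
std::vector<T> tci_reference(T start, std::size_t count, bool descending) {
    std::vector<T> dst(count);
    for (std::size_t k = 0; k < count; ++k) {
        const T step = static_cast<T>(k);
        dst[k] = descending ? static_cast<T>(start - step)
                            : static_cast<T>(start + step);
    }
    return dst;
}

// Row-major linearized index of element (row, col) in a tile with `cols` columns:
// left-to-right within a row, then top-to-bottom across rows.
constexpr std::size_t linear_index(std::size_t row, std::size_t col, std::size_t cols) {
    return row * cols + col;
}

// Small self-check mirroring the ascending and descending formulas.
inline void tci_reference_demo() {
    const std::vector<std::int32_t> up = tci_reference<std::int32_t>(0, 4, false);
    const std::vector<std::int32_t> down = tci_reference<std::int32_t>(100, 4, true);
    assert(up[0] == 0 && up[3] == 3);
    assert(down[0] == 100 && down[3] == 97);
}
```

A model like this can serve as the expected-value generator when comparing device output against the documented formulas.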
Syntax¶
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Synchronous form:
%dst = tci %S {descending = false} : !pto.tile<...>
AS Level 1 (SSA)¶
%dst = pto.tci %scalar {descending = false} : dtype -> !pto.tile<...>
AS Level 2 (DPS)¶
pto.tci ins(%scalar {descending = false} : dtype) outs(%dst : !pto.tile_buf<...>)
IR Level 1 (SSA)¶
%dst = pto.tci %scalar {descending = false} : dtype -> !pto.tile<...>
IR Level 2 (DPS)¶
pto.tci ins(%scalar {descending = false} : dtype) outs(%dst : !pto.tile_buf<...>)
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
template <typename TileData, typename T, int descending, typename... WaitEvents>
PTO_INST RecordEvent TCI(TileData &dst, T start, WaitEvents &... events);
template <typename TileData, typename TileDataTmp, typename T, int descending, typename... WaitEvents>
PTO_INST RecordEvent TCI(TileData &dst, T start, TileDataTmp &tmp, WaitEvents &... events);
Inputs¶
- start is the starting integer value for the sequence.
- descending (template parameter): if true, generates a descending sequence.
- dst names the destination tile. The operation iterates over dst's valid region.
Expected Outputs¶
dst holds a contiguous integer sequence starting from start, stepping by +1 per element (ascending) or -1 per element (descending).
Side Effects¶
No architectural side effects beyond producing the destination tile. Does not implicitly fence unrelated traffic.
Constraints¶
- Valid region:
  - The implementation uses dst.GetValidCol() as the sequence length and does not consult dst.GetValidRow().
- Temporary tile:
  - A2/A3: The C++ API provides an overload with an explicit tmp tile for the vectorized implementation path. The no-tmp overload uses a scalar loop. TileDataTmp::DType must be a 4-byte type (float, int32_t, or uint32_t). The implementation casts tmp to float *; size the tile by bytes, independent of the declared TileDataTmp::DType.
    - b32 element types (int32_t, uint32_t): minimum tmp size = 768 bytes (192 float elements). The vectorized path uses two float sub-buffers within tmp: tmp0 at offset 0 and tmp1 at offset +128 floats. tmp0 holds up to 64 float elements (256 bytes) for the initial fractional sequence, and tmp1 holds up to 64 float elements (256 bytes) for the accumulated result. The highest accessed byte is offset 128 x 4 + 64 x 4 = 768 bytes (192 float elements).
    - b16 element types (int16_t, uint16_t): minimum tmp size = 1792 bytes (448 float elements). The vectorized path uses four sub-buffers within tmp: tmp0/tmp1 (float) at offsets 0 and +128, and tmp2/tmp3 (half) at offsets +256 and +384 (in float-index units). tmp0/tmp1 each hold up to 64 floats (256 bytes) for the fractional sequence generation. tmp2 holds up to 16 half elements (32 bytes) for the float-to-half conversion. tmp3 holds up to 128 half elements (256 bytes) for the final half-precision accumulation. The highest accessed byte is offset 384 x 4 + 128 x 2 = 1792 bytes (448 float elements).
    - A convenient shape-independent allocation is 2048 bytes (2 KiB), for example Tile<TileType::Vec, float, 1, 512>.
  - A5: The tmp tile is accepted and ignored. A5 hardware uses the vci vector instruction directly without requiring a scratch buffer.
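The A2/A3 scratch-size arithmetic above can be checked at compile time. The following is a minimal sketch: the helper names are hypothetical, and the offsets and element counts are copied from the constraints as stated rather than taken from the implementation source.

```cpp
#include <cstddef>

// Hypothetical helpers (not part of the PTO headers).
constexpr std::size_t kFloatBytes = 4;  // tmp is addressed as float *
constexpr std::size_t kHalfBytes = 2;   // half-precision sub-buffers in the b16 path

// b32 path: the last sub-buffer (tmp1) starts at float index 128 and holds 64 floats.
constexpr std::size_t tmp_min_bytes_b32() {
    return 128 * kFloatBytes + 64 * kFloatBytes;
}

// b16 path: the last sub-buffer (tmp3) starts at float index 384 and holds 128 halves.
constexpr std::size_t tmp_min_bytes_b16() {
    return 384 * kFloatBytes + 128 * kHalfBytes;
}

// Byte size of a 1 x 512 float tile, the suggested shape-independent allocation.
constexpr std::size_t tmp_suggested_bytes() {
    return 1 * 512 * kFloatBytes;
}

static_assert(tmp_min_bytes_b32() == 768, "b32 minimum is 768 bytes");
static_assert(tmp_min_bytes_b16() == 1792, "b16 minimum is 1792 bytes");
static_assert(tmp_suggested_bytes() == 2048, "suggested allocation is 2 KiB");
static_assert(tmp_suggested_bytes() >= tmp_min_bytes_b16(), "2 KiB covers both paths");
```

Because the larger b16 requirement fits inside 2 KiB, a single 1 x 512 float tile covers both element widths.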
Exceptions¶
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions¶
- Implementation checks (A2/A3 and A5):
  - TileData::DType must be exactly the same type as the scalar template parameter T.
  - dst/scalar element types must be identical, and must be one of: int32_t, uint32_t, int16_t, uint16_t.
  - TileData::Cols != 1 (this is the condition enforced by the implementation).
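These checks can be mirrored in user code before instantiating the intrinsic. The traits below are a hypothetical sketch, not the library's own validation: `is_tci_element_type` and `tci_operands_ok` are illustrative names, and the conditions are the three listed above.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait: true when T is one of the four supported element types.
template <typename T>
struct is_tci_element_type
    : std::integral_constant<bool,
          std::is_same<T, std::int32_t>::value || std::is_same<T, std::uint32_t>::value ||
          std::is_same<T, std::int16_t>::value || std::is_same<T, std::uint16_t>::value> {};

// Hypothetical trait combining the documented checks:
// identical dst/scalar types, a supported element type, and Cols != 1.
template <typename DstElem, typename Scalar, int Cols>
struct tci_operands_ok
    : std::integral_constant<bool,
          std::is_same<DstElem, Scalar>::value &&
          is_tci_element_type<DstElem>::value &&
          (Cols != 1)> {};

static_assert(tci_operands_ok<std::int32_t, std::int32_t, 16>::value, "legal tuple");
static_assert(!tci_operands_ok<std::int32_t, std::int16_t, 16>::value, "type mismatch");
static_assert(!tci_operands_ok<float, float, 16>::value, "unsupported element type");
static_assert(!tci_operands_ok<std::int32_t, std::int32_t, 1>::value, "Cols == 1 rejected");
```

A guard of this kind fails at the call site with a readable message instead of deep inside the intrinsic's template instantiation.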
Examples¶
Auto¶
#include <pto/pto-inst.hpp>
using namespace pto;

void example_auto() {
    using TileT = Tile<TileType::Vec, int32_t, 1, 16>;
    TileT dst;
    // Ascending sequence starting at 0 over dst's valid region (no-tmp overload).
    TCI<TileT, int32_t, /*descending=*/0>(dst, /*S=*/0);
}
Auto With Tmp¶
#include <pto/pto-inst.hpp>
using namespace pto;

void example_auto_tmp() {
    using TileT = Tile<TileType::Vec, int32_t, 1, 16>;
    using TmpT = Tile<TileType::Vec, float, 1, 512>;  // 2 KiB scratch tile
    TileT dst;
    TmpT tmp;
    // Ascending sequence starting at 0, using the overload with an explicit tmp tile.
    TCI<TileT, TmpT, int32_t, /*descending=*/0>(dst, /*S=*/0, tmp);
}
Manual¶
#include <pto/pto-inst.hpp>
using namespace pto;

void example_manual() {
    using TileT = Tile<TileType::Vec, int32_t, 1, 16>;
    TileT dst;
    // Bind dst explicitly before issuing the instruction.
    TASSIGN(dst, 0x1000);
    // Descending sequence starting at 100.
    TCI<TileT, int32_t, /*descending=*/1>(dst, /*S=*/100);
}
Manual With Tmp¶
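#include <pto/pto-inst.hpp>
using namespace pto;

void example_manual_tmp() {
    using TileT = Tile<TileType::Vec, int32_t, 1, 16>;
    using TmpT = Tile<TileType::Vec, float, 1, 512>;  // 2 KiB scratch tile
    TileT dst;
    TmpT tmp;
    // Bind both tiles explicitly before issuing the instruction.
    TASSIGN(dst, 0x1000);
    TASSIGN(tmp, 0x2000);
    // Descending sequence starting at 100, using the overload with an explicit tmp tile.
    TCI<TileT, TmpT, int32_t, /*descending=*/1>(dst, /*S=*/100, tmp);
}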
Auto Mode¶
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tci %scalar {descending = false} : dtype -> !pto.tile<...>
Manual Mode¶
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.tci %scalar {descending = false} : dtype -> !pto.tile<...>
PTO Assembly Form¶
%dst = tci %S {descending = false} : !pto.tile<...>
# AS Level 2 (DPS)
pto.tci ins(%scalar {descending = false} : dtype) outs(%dst : !pto.tile_buf<...>)
Related Ops / Instruction Set Links¶
- Instruction set overview: Irregular And Complex
- Previous op in instruction set: pto.tgather
- Next op in instruction set: pto.ttri