pto.tci

pto.tci is part of the Irregular And Complex instruction set.

Summary

Generate a contiguous integer sequence into a destination tile.

Mechanism

pto.tci is a tile instruction. Its behavior is architecture-visible and is not reducible to a plain elementwise compute pattern.

For a linearized index k over the valid elements, with k = 0 for the first element and S the start value:

  • Ascending:

$$ \mathrm{dst}_{k} = S + k $$

  • Descending:

$$ \mathrm{dst}_{k} = S - k $$

The linearization order depends on the tile layout. On A2/A3 and A5 it is row-major: elements are visited left-to-right within each row, then top-to-bottom across rows.
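
The two formulas above can be modelled on the host in plain C++. The sketch below is a reference model only, not the PTO implementation: tci_reference and its parameters are hypothetical names, a std::vector stands in for the valid elements of the destination tile, and the model assumes that S + k and S - k stay within the range of T.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical host-side reference model of dst_k = S + k / dst_k = S - k.
// It does not use the PTO headers; the returned vector stands in for the
// linearized valid elements of the destination tile.
template <typename T>
std::vector<T> tci_reference(T start, std::size_t count, bool descending) {
    // Assumption of this sketch: every index k must be representable in T.
    assert(count == 0 ||
           count - 1 <= static_cast<std::uintmax_t>(std::numeric_limits<T>::max()));

    std::vector<T> dst(count);
    for (std::size_t k = 0; k < count; ++k) {
        const T offset = static_cast<T>(k);
        // Overflow of S + k or S - k is outside the scope of this model.
        dst[k] = descending ? static_cast<T>(start - offset)
                            : static_cast<T>(start + offset);
    }
    return dst;
}
```

For example, tci_reference<std::int32_t>(100, 4, true) yields 100, 99, 98, 97, and the same call with descending = false and start 0 yields 0, 1, 2, 3.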

Syntax

Textual spelling is defined by the PTO ISA syntax-and-operands pages.

Synchronous form:

%dst = tci %S {descending = false} : !pto.tile<...>

AS Level 1 (SSA)

%dst = pto.tci %scalar {descending = false} : dtype -> !pto.tile<...>

AS Level 2 (DPS)

pto.tci ins(%scalar {descending = false} : dtype) outs(%dst : !pto.tile_buf<...>)

IR Level 1 (SSA)

%dst = pto.tci %scalar {descending = false} : dtype -> !pto.tile<...>

IR Level 2 (DPS)

pto.tci ins(%scalar {descending = false} : dtype) outs(%dst : !pto.tile_buf<...>)

C++ Intrinsic

Declared in include/pto/common/pto_instr.hpp:

template <typename TileData, typename T, int descending, typename... WaitEvents>
PTO_INST RecordEvent TCI(TileData &dst, T start, WaitEvents &... events);

template <typename TileData, typename TileDataTmp, typename T, int descending, typename... WaitEvents>
PTO_INST RecordEvent TCI(TileData &dst, T start, TileDataTmp &tmp, WaitEvents &... events);

Inputs

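  • start is the starting integer value S of the sequence.
  • descending (int template parameter): 0 generates an ascending sequence, 1 generates a descending sequence.
  • dst names the destination tile. The operation iterates over dst's valid region.
  • tmp (second overload only) names a scratch tile; see Constraints for its sizing and target-specific use.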

Expected Outputs

dst holds a contiguous integer sequence that starts at start and steps by +1 (ascending) or -1 (descending) per element.

Side Effects

The instruction has no architectural side effects beyond producing the destination tile. It does not implicitly fence unrelated traffic.

Constraints

  • Valid region:
    • The implementation uses dst.GetValidCol() as the sequence length and does not consult dst.GetValidRow().
  • Temporary tile:
    • A2/A3: The C++ API provides an overload with an explicit tmp tile for the vectorized implementation path; the no-tmp overload uses a scalar loop. TileDataTmp::DType must be a 4-byte type (float, int32_t, or uint32_t). The implementation casts tmp to float *, so size the tile in bytes regardless of the declared TileDataTmp::DType.
    • b32 element types (int32_t, uint32_t): minimum tmp size = 768 bytes (192 float elements). The vectorized path uses two float sub-buffers within tmp: tmp0 at offset 0 and tmp1 at offset +128 floats. tmp0 holds up to 64 float elements (256 bytes) for the initial fractional sequence, and tmp1 holds up to 64 float elements (256 bytes) for the accumulated result. The accessed range therefore ends at byte offset 128 × 4 + 64 × 4 = 768 (192 float elements).
    • b16 element types (int16_t, uint16_t): minimum tmp size = 1792 bytes (448 float elements). The vectorized path uses four sub-buffers within tmp: tmp0/tmp1 (float) at offsets 0 and +128, and tmp2/tmp3 (half) at offsets +256 and +384, all in float-index units. tmp0 and tmp1 each hold up to 64 floats (256 bytes) for the fractional sequence generation. tmp2 holds up to 16 half elements (32 bytes) for the float-to-half conversion. tmp3 holds up to 128 half elements (256 bytes) for the final half-precision accumulation. The accessed range therefore ends at byte offset 384 × 4 + 128 × 2 = 1792 (448 float elements).
    • A convenient allocation that covers both minimums is 2048 bytes (2 KiB), for example Tile<TileType::Vec, float, 1, 512>.
    • A5: The tmp tile is accepted and ignored. A5 hardware uses the vci vector instruction directly without requiring a scratch buffer.
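
The A2/A3 tmp sizing arithmetic above can be checked at compile time. The following is a minimal sketch in plain C++ with hypothetical helper names (min_tmp_bytes_b32, min_tmp_bytes_b16, tmp_tile_bytes, tmp_is_large_enough); the offsets and capacities are copied from the bullets above, and nothing here is part of the PTO API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers; offsets and capacities are taken from the constraints text.
constexpr std::size_t kFloatBytes = 4;  // tmp is addressed as float *
constexpr std::size_t kHalfBytes = 2;   // half-precision sub-buffers

// b32 path: tmp1 starts at float index 128 and holds up to 64 floats.
constexpr std::size_t min_tmp_bytes_b32() {
    return 128 * kFloatBytes + 64 * kFloatBytes;
}

// b16 path: tmp3 starts at float index 384 and holds up to 128 half elements.
constexpr std::size_t min_tmp_bytes_b16() {
    return 384 * kFloatBytes + 128 * kHalfBytes;
}

// Bytes provided by a tmp tile of 1 x cols elements of a 4-byte type.
constexpr std::size_t tmp_tile_bytes(std::size_t cols) {
    return cols * kFloatBytes;
}

// Runtime check: is a tmp buffer large enough for a dst element of the given size?
inline bool tmp_is_large_enough(std::size_t tmp_bytes, std::size_t dst_elem_bytes) {
    assert(dst_elem_bytes == 2 || dst_elem_bytes == 4);
    const std::size_t needed =
        (dst_elem_bytes == 4) ? min_tmp_bytes_b32() : min_tmp_bytes_b16();
    return tmp_bytes >= needed;
}

static_assert(min_tmp_bytes_b32() == 768, "b32 minimum is 768 bytes");
static_assert(min_tmp_bytes_b16() == 1792, "b16 minimum is 1792 bytes");
static_assert(tmp_tile_bytes(512) == 2048, "a 1 x 512 float tile is 2 KiB");
static_assert(tmp_tile_bytes(512) >= min_tmp_bytes_b16(), "2 KiB covers both paths");
```

A 1 x 192 float tile (768 bytes) satisfies the b32 minimum exactly but is too small for the b16 path, which is why the 2 KiB allocation is the simpler choice when one tmp tile serves both.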

Exceptions

  • Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
  • Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.

Target-Profile Restrictions

  • Implementation checks (A2/A3 and A5):
    • TileData::DType must be exactly the same type as the scalar template parameter T.
    • That shared element type must be one of: int32_t, uint32_t, int16_t, uint16_t.
    • TileData::Cols != 1 (this is the condition enforced by the implementation).
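
The three checks above can be expressed as a compile-time predicate. This is a sketch under stated assumptions, not the code in the PTO headers: MockTile is a hypothetical stand-in for a tile type that exposes DType and Cols, and is_supported_tci_type and tci_operands_ok are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a tile type; the real pto::Tile is not used here.
template <typename DTypeT, int RowsV, int ColsV>
struct MockTile {
    using DType = DTypeT;
    static constexpr int Rows = RowsV;
    static constexpr int Cols = ColsV;
};

// True for the four element types listed in the restrictions.
template <typename T>
constexpr bool is_supported_tci_type() {
    return std::is_same<T, std::int32_t>::value ||
           std::is_same<T, std::uint32_t>::value ||
           std::is_same<T, std::int16_t>::value ||
           std::is_same<T, std::uint16_t>::value;
}

// Combines the documented checks: matching types, supported type, Cols != 1.
template <typename TileData, typename T>
constexpr bool tci_operands_ok() {
    return std::is_same<typename TileData::DType, T>::value &&
           is_supported_tci_type<T>() &&
           TileData::Cols != 1;
}

static_assert(tci_operands_ok<MockTile<std::int32_t, 1, 16>, std::int32_t>(),
              "int32_t tile with int32_t start passes");
static_assert(!tci_operands_ok<MockTile<std::int32_t, 1, 16>, std::int16_t>(),
              "mismatched scalar type is rejected");
static_assert(!tci_operands_ok<MockTile<std::int32_t, 16, 1>, std::int32_t>(),
              "Cols == 1 is rejected");
```

A predicate of this shape can guard a wrapper with static_assert so that an unsupported operand tuple fails at compile time with a readable message.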

Examples

Auto

#include <pto/pto-inst.hpp>

using namespace pto;

void example_auto() {
  using TileT = Tile<TileType::Vec, int32_t, 1, 16>;
  TileT dst;
  TCI<TileT, int32_t, /*descending=*/0>(dst, /*S=*/0);
}

Auto With Tmp

#include <pto/pto-inst.hpp>

using namespace pto;

void example_auto_tmp() {
  using TileT = Tile<TileType::Vec, int32_t, 1, 16>;
  using TmpT = Tile<TileType::Vec, float, 1, 512>;
  TileT dst;
  TmpT tmp;
  TCI<TileT, TmpT, int32_t, /*descending=*/0>(dst, /*S=*/0, tmp);
}

Manual

#include <pto/pto-inst.hpp>

using namespace pto;

void example_manual() {
  using TileT = Tile<TileType::Vec, int32_t, 1, 16>;
  TileT dst;
  TASSIGN(dst, 0x1000);
  TCI<TileT, int32_t, /*descending=*/1>(dst, /*S=*/100);
}

Manual With Tmp

#include <pto/pto-inst.hpp>

using namespace pto;

void example_manual_tmp() {
  using TileT = Tile<TileType::Vec, int32_t, 1, 16>;
  using TmpT = Tile<TileType::Vec, float, 1, 512>;
  TileT dst;
  TmpT tmp;
  TASSIGN(dst, 0x1000);
  TASSIGN(tmp, 0x2000);
  TCI<TileT, TmpT, int32_t, /*descending=*/1>(dst, /*S=*/100, tmp);
}

Auto Mode

# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tci %scalar {descending = false} : dtype -> !pto.tile<...>

Manual Mode

# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.tci %scalar {descending = false} : dtype -> !pto.tile<...>

PTO Assembly Form

%dst = tci %S {descending = false} : !pto.tile<...>
# AS Level 2 (DPS)
pto.tci ins(%scalar {descending = false} : dtype) outs(%dst : !pto.tile_buf<...>)