pto.vci¶

pto.vci is part of the Conversion Ops instruction set.

Summary¶

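This page is the standalone contract for pto.vci, which fills a vector register with consecutive integer indices generated from a scalar seed.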

Mechanism¶

pto.vci is an index-generation operation. It produces a vector of indices that starts at the scalar seed %index and increases (order = "ASC") or decreases (order = "DESC") by 1 per lane. The generated indices support indexed access patterns (gather/scatter) and argsort preparation.

Syntax¶

PTO Assembly Form¶

vci %index, %mask {order = "ORDER"} : !pto.vreg<Nxi32> -> !pto.vreg<Nxi32>

AS Level 1 (SSA)¶

%indices = pto.vci %index {order = "ASC"} : i32 -> !pto.vreg<64xi32>

AS Level 2 (DPS)¶

pto.vci ins(%index : i32) outs(%indices : !pto.vreg<64xi32>) {order = "ASC"}

Inputs¶

Operand	Type	Description
`%index`	scalar `i32`	Scalar seed or base index for index generation
`%mask`	`!pto.mask<G>`	Predication mask (optional in some forms); inactive lanes may produce zero or preserve existing values

Attributes:

Attribute	Values	Description
`order`	`"ASC"` / `"DESC"`	Direction of index generation; `ASC` generates increasing indices, `DESC` generates decreasing indices
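The mask row above leaves two behaviors open for inactive lanes. The following Python sketch models both so the difference is visible. It is illustrative only: the function name `vci_masked_ref`, the `policy` parameter, and the assumption that active lane i still receives base ± i (that is, indices are not compacted over active lanes) are not taken from the PTO specification.

```python
def vci_masked_ref(base, mask, order="ASC", policy="zero", previous=None):
    """Hypothetical model of masked index generation.

    mask     -- list of booleans, one per lane
    policy   -- "zero": inactive lanes read 0
                "preserve": inactive lanes keep the value from `previous`
    Assumption: lane i receives base +/- i even when earlier lanes are inactive.
    """
    if order not in ("ASC", "DESC"):
        raise ValueError(f"illegal order: {order!r}")
    if policy not in ("zero", "preserve"):
        raise ValueError(f"unknown policy: {policy!r}")
    step = 1 if order == "ASC" else -1
    out = []
    for lane, active in enumerate(mask):
        if active:
            out.append(base + step * lane)
        elif policy == "zero":
            out.append(0)
        else:
            out.append(previous[lane])
    return out


mask = [True, False, True, False]
zeroed = vci_masked_ref(5, mask, policy="zero")
kept = vci_masked_ref(5, mask, policy="preserve", previous=[9, 9, 9, 9])
```

Which of the two policies applies on a given target is not stated on this page, so code that depends on inactive-lane contents should not assume either one.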

Expected Outputs¶

Operand	Type	Description
`%result`	`!pto.vreg<Nxi32>`	Generated index vector

C Semantics¶

// ASC order: indices = base, base+1, base+2, ..., base+N-1
// DESC order: indices = base, base-1, base-2, ..., base-(N-1)

Here base is the value of the %index scalar and N is the lane count; lane i (0 <= i < N) produces base + i (ASC) or base - i (DESC).

This is an index-generation family, not a numeric conversion. ORDER and the result element type together determine how indices are generated.
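The lane-wise rule above can be written as a small executable reference model. This is a minimal sketch in Python rather than the operation's actual implementation; the name `vci_ref` is hypothetical, and 32-bit overflow is deliberately not modelled because this page does not specify it.

```python
def vci_ref(base, lanes, order="ASC"):
    """Reference model: lane i holds base + i (ASC) or base - i (DESC)."""
    if order not in ("ASC", "DESC"):
        raise ValueError(f"illegal order: {order!r}")
    step = 1 if order == "ASC" else -1
    # Python integers do not wrap; i32 overflow behavior is left unmodelled.
    return [base + step * i for i in range(lanes)]


asc = vci_ref(0, 64, "ASC")      # 0, 1, ..., 63
desc = vci_ref(63, 64, "DESC")   # 63, 62, ..., 0
```

A model like this is useful as an oracle when checking simulator or hardware output lane by lane.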

Side Effects¶

This operation has no architectural side effect beyond producing its SSA results. It does not implicitly reserve buffers, signal events, or establish memory fences.

Constraints¶

%result uses an integer element type (i32 in the common form).
The scalar %index type matches the result element type.
The order attribute is required when using sorted index generation.
For the standard form, N (lane count) is derived from the result type.

Exceptions¶

The verifier rejects illegal operand shapes, unsupported element types, and attribute combinations that are not valid for the selected instruction set or target profile.
Illegal order values are rejected by the verifier.
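The constraints and rejection rules above can be summarized as a checklist. The Python sketch below is a hypothetical stand-in for such a check, not the real PTO verifier: the function name `check_vci`, the string encoding of types, and the error messages are all assumptions made for illustration.

```python
import re

VALID_ORDERS = ("ASC", "DESC")


def check_vci(index_type, result_elem_type, order):
    """Return a list of constraint violations (empty list means accepted).

    Types are passed as strings such as "i32" or "f32" (an assumption of
    this sketch, not the verifier's real data model).
    """
    errors = []
    # %result must use an integer element type.
    if not re.fullmatch(r"i\d+", result_elem_type):
        errors.append("result element type must be an integer type")
    # The scalar %index type must match the result element type.
    if index_type != result_elem_type:
        errors.append("%index type must match the result element type")
    # Only the documented order values are legal.
    if order not in VALID_ORDERS:
        errors.append(f"illegal order value: {order!r}")
    return errors


ok = check_vci("i32", "i32", "ASC")
bad = check_vci("i32", "f32", "UP")
```

Target-profile restrictions are not modelled here; a real verifier may reject additional combinations for a given profile.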

Target-Profile Restrictions¶

A5 is the most detailed concrete profile in the current manual; CPU simulation and A2/A3-class targets may support narrower subsets or emulate the behavior while preserving the visible PTO contract.
Under the currently documented A5 profile contract, the sampled veccore0 hardware trace for pto.vci contains no vector RV_* entries.

Performance¶

Execution Model¶

pto.vci is an index-generation operation executed within a pto.vecscope region. It produces lane-wise index values without invoking the Vector Core's main ALU, so its latency is dominated by mask setup and predicate generation rather than by the compute units.

A5 Execution¶

On A5, the sampled veccore0 trace for pto.vci contains no RV_* entries: the operation is implemented in the predicate/materialization layer, not as a standard vector compute instruction.

A2/A3 Throughput¶

vci does not map to a direct CCE vector instruction in the A2/A3 cost model. It is compiled as a scalar index-generation loop within the vecscope:

Metric	Value	Notes
Startup	~10 cycles	mask setup + loop overhead
Per-element	O(1)	simple arithmetic per lane
Complexity	O(N)	one operation per output lane

The actual throughput depends on the surrounding loop structure and the number of iterations in the vecscope.
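As a rough illustration of how the table combines, the sketch below adds the fixed startup cost to a linear per-lane term. The startup default of 10 cycles comes from the table; the per-lane cost of 1 cycle is a placeholder assumption, since the table only states that the per-element cost is constant. The function name `estimate_vci_cycles` is hypothetical.

```python
def estimate_vci_cycles(lanes, startup_cycles=10, per_lane_cycles=1):
    """Linear cost sketch: fixed startup plus a constant cost per output lane.

    per_lane_cycles is a placeholder; the real constant is not documented here.
    """
    return startup_cycles + per_lane_cycles * lanes


cycles_64 = estimate_vci_cycles(64)
```

With these placeholder values a 64-lane result costs about 74 cycles; treat the number as an order-of-magnitude guide rather than a measurement.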

Execution Note¶

vci is commonly used to initialize index buffers for gather/scatter operations and argsort:

// Initialize ascending index buffer: [0, 1, 2, 3, ..., 63] for 64-element gather
%base_idx = pto.vci %c0 {order = "ASC"} : i32 -> !pto.vreg<64xi32>
// Generates: [0, 1, 2, 3, ..., 63], one value per lane (lane i holds i)

Examples¶

Generate ascending indices (common use for gather/scatter)¶

%indices = pto.vci %c0 {order = "ASC"} : i32 -> !pto.vreg<64xi32>
// Result: [0, 1, 2, 3, ..., 63]

Generate descending indices¶

%indices = pto.vci %c63 {order = "DESC"} : i32 -> !pto.vreg<64xi32>
// Result: [63, 62, 61, 60, ..., 0]

Use with gather (indexed load)¶

// Generate indices, then use for indexed load
%idx = pto.vci %c0 {order = "ASC"} : i32 -> !pto.vreg<64xi32>
%data = pto.vgather2 %ub_table[%c0], %idx {dist = "DIST"} : !pto.ptr<f32, ub> -> !pto.vreg<64xf32>
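To show what the index vector contributes to an indexed load, the Python sketch below models a plain element-indexed gather. It is an assumption-laden illustration: `vci_ref` and `gather_ref` are hypothetical names, and the real addressing rules of pto.vgather2 (including the effect of the dist attribute) are not described on this page and are not modelled.

```python
def vci_ref(base, lanes, order="ASC"):
    """Lane i holds base + i (ASC) or base - i (DESC)."""
    step = 1 if order == "ASC" else -1
    return [base + step * i for i in range(lanes)]


def gather_ref(table, offset, indices):
    """Simplified gather: lane i loads table[offset + indices[i]]."""
    return [table[offset + i] for i in indices]


table = [float(10 * k) for k in range(8)]

# Ascending indices from 0 reproduce the table in order.
forward = gather_ref(table, 0, vci_ref(0, 8, "ASC"))

# Descending indices from 7 read the same table in reverse.
backward = gather_ref(table, 0, vci_ref(7, 8, "DESC"))
```

The identity and reversed patterns are the simplest cases; in practice the generated indices are usually transformed (scaled, offset, or permuted) before being passed to the gather.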

Use with vsort32 (argsort)¶

// Generate ascending indices as sort keys
%indices = pto.vci %c0 {order = "ASC"} : i32 -> !pto.vreg<64xi32>
pto.vsort32 %sorted_indices, %indices, %config : !pto.ptr<i32, ub>, !pto.ptr<i32, ub>, i64

Detailed Notes¶

pto.vci generates lane indices from a scalar seed. The two primary use cases are:

Indexed access: Generate indices for vgather2 / vscatter operations to access arbitrary elements.
Argsort preparation: Generate sequential indices before sorting, then rearrange data based on sorted indices.

The generated indices are deterministic: the same %index seed and order attribute always produce the same vector, which makes them suitable as the index operand of indirect sort (argsort) operations.
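The argsort preparation described above can be sketched in Python as follows. This models the idea only (generate sequential indices, order them by the data values, then rearrange the data through the resulting permutation); it does not reproduce the operand layout or configuration of pto.vsort32, and the names `vci_ref` and `argsort_ref` are hypothetical.

```python
def vci_ref(base, lanes):
    """Ascending index vector: lane i holds base + i."""
    return [base + i for i in range(lanes)]


def argsort_ref(values):
    """Return the permutation that sorts `values` in ascending order."""
    indices = vci_ref(0, len(values))
    # sorted() is stable, so equal values keep their original index order.
    return sorted(indices, key=lambda i: values[i])


values = [3.0, 1.0, 2.0, 1.0]
perm = argsort_ref(values)
ordered = [values[i] for i in perm]
```

Because the index vector starts as 0, 1, 2, ..., the sorted indices record where each output element came from, which is what later gather or scatter steps consume.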

Instruction set overview: Conversion Ops
Next op in instruction set: pto.vcvt
Related index-generation: pto.vsort32 — argsort using index vectors
Related gather/scatter: pto.vgather2, pto.vscatter