Vector Instruction Set: Data Rearrangement¶

pto.v* rearrangement instruction sets are defined here. These operations permute or repack vector-visible data without turning into tile movement or DMA, so they remain part of the vector instructions.

Category: In-register data movement and permutation Pipeline: PIPE_V (Vector Core)

Operations that rearrange data within or between vector registers without memory access.

Common Operand Model¶

%lhs / %rhs are source vector register values.
%src is a single source vector register value.
%result is the destination vector register value unless an op explicitly returns multiple vectors.
These instruction sets do not access UB directly; they only rearrange register contents.

Interleave / Deinterleave¶

`pto.vintlv`¶

syntax: %low, %high = pto.vintlv %lhs, %rhs : !pto.vreg<NxT>, !pto.vreg<NxT> -> !pto.vreg<NxT>, !pto.vreg<NxT>
semantics: Interleave elements from two sources.

// Interleave: merge even/odd elements from two sources
// low  = {src0[0], src1[0], src0[1], src1[1], ...}
// high = {src0[N/2], src1[N/2], src0[N/2+1], src1[N/2+1], ...}

inputs: %lhs and %rhs are the two source vectors.
outputs: %low and %high are the two destination vectors.
constraints and limitations: The two outputs form a paired interleave result. The PTO ISA vector instructions representation exposes that pair as two SSA results, and the pair ordering MUST be preserved.

`pto.vdintlv`¶

syntax: %low, %high = pto.vdintlv %lhs, %rhs : !pto.vreg<NxT>, !pto.vreg<NxT> -> !pto.vreg<NxT>, !pto.vreg<NxT>
semantics: Deinterleave elements into even/odd.

// Deinterleave: separate even/odd elements
// low  = {src0[0], src0[2], src0[4], ...}  // even
// high = {src0[1], src0[3], src0[5], ...}  // odd

inputs: %lhs and %rhs represent the interleaved source stream in the current PTO ISA vector instructions representation.
outputs: %low and %high are the separated destination vectors.
constraints and limitations: The two outputs form the even/odd deinterleave result pair, and their ordering MUST be preserved.

Slide / Shift¶

`pto.vslide`¶

syntax: %result = pto.vslide %src0, %src1, %amt : !pto.vreg<NxT>, !pto.vreg<NxT>, i16 -> !pto.vreg<NxT>
semantics: Concatenate two vectors and extract N-element window at offset.

// Conceptually: tmp[0..2N-1] = {src1, src0}
// dst[i] = tmp[amt + i]
if (amt >= 0)
    for (int i = 0; i < N; i++)
        dst[i] = (i >= amt) ? src0[i - amt] : src1[N - amt + i];

Use case: Sliding window operations, shift register patterns.

inputs: %src0 and %src1 provide the concatenated source window and %amt selects the extraction offset.
outputs: %result is the extracted destination window.
constraints and limitations: pto.vslide operates on the logical concatenation of %src1 and %src0. The source order and extraction offset MUST be preserved exactly.

`pto.vshift`¶

syntax: %result = pto.vshift %src, %amt : !pto.vreg<NxT>, i16 -> !pto.vreg<NxT>
semantics: Single-source slide (shift with zero fill).

for (int i = 0; i < N; i++)
    dst[i] = (i >= amt) ? src[i - amt] : 0;

inputs: %src is the source vector and %amt is the slide amount.
outputs: %result is the shifted vector.
constraints and limitations: This instruction set represents the single-source slide/shift instruction set. Zero-fill versus other fill behavior MUST match the selected form.

Compress / Expand¶

`pto.vsqz`¶

syntax: %result = pto.vsqz %src, %mask : !pto.vreg<NxT>, !pto.mask<G> -> !pto.vreg<NxT>
semantics: Compress — pack active lanes to front.

int j = 0;
for (int i = 0; i < N; i++)
    if (mask[i]) dst[j++] = src[i];
while (j < N) dst[j++] = 0;

Use case: Sparse data compaction, filtering.

inputs: %src is the source vector and %mask selects which elements are kept.
outputs: %result is the compacted vector.
constraints and limitations: This is a reduction-style compaction instruction set. Preserved element order MUST match source lane order.

`pto.vusqz`¶

syntax: %result = pto.vusqz %mask : !pto.mask<G> -> !pto.vreg<NxT>
semantics: Expand — scatter front elements to active positions.

int j = 0;
for (int i = 0; i < N; i++)
    if (mask[i]) dst[i] = src_front[j++];
    else dst[i] = 0;

inputs: %mask is the expansion/placement predicate.
outputs: %result is the expanded vector image.
constraints and limitations: The source-front stream is implicit in the current instruction set. Lane placement for active and inactive positions MUST be preserved exactly.

Permutation¶

`pto.vperm`¶

syntax: %result = pto.vperm %src, %index : !pto.vreg<NxT>, !pto.vreg<NxI> -> !pto.vreg<NxT>
semantics: In-register permute (table lookup). Not memory gather.

for (int i = 0; i < N; i++)
    dst[i] = src[index[i] % N];

Note: This operates on register contents, unlike pto.vgather2 which reads from UB memory.

inputs: %src is the source vector and %index supplies per-lane source indices.
outputs: %result is the permuted vector.
constraints and limitations: This is an in-register permutation instruction set. %index values outside the legal range follow the wrap/clamp behavior of the selected form.

`pto.vselr`¶

syntax: %result = pto.vselr %src0, %src1 : !pto.vreg<NxT>, !pto.vreg<NxI> -> !pto.vreg<NxT>
semantics: Register select with reversed mask semantics.

for (int i = 0; i < N; i++)
    dst[i] = mask[i] ? src1[i] : src0[i];

inputs: %src0 and %src1 are source vectors.
outputs: %result is the selected vector.
constraints and limitations: The rearrangement use of the instruction set; the compare/select page documents the same name from the predicate selection perspective.

Pack / Unpack¶

`pto.vpack`¶

syntax: %result = pto.vpack %src0, %src1, %part : !pto.vreg<NxT_wide>, !pto.vreg<NxT_wide>, index -> !pto.vreg<2NxT_narrow>
semantics: Narrowing pack — two wide vectors to one narrow vector.

// e.g., two vreg<64xi32> → one vreg<128xi16>
for (int i = 0; i < N; i++) {
    dst[i]     = truncate(src0[i]);
    dst[N + i] = truncate(src1[i]);
}

inputs: %src0 and %src1 are wide source vectors and %part selects the packing submode.
outputs: %result is the packed narrow vector.
constraints and limitations: Packing is a narrowing conversion. Source values that do not fit the destination width follow the truncation semantics of the selected packing mode.

`pto.vsunpack`¶

syntax: %result = pto.vsunpack %src, %part : !pto.vreg<NxT_narrow>, index -> !pto.vreg<N/2xT_wide>
semantics: Sign-extending unpack — narrow to wide (half).

// e.g., vreg<128xi16> → vreg<64xi32> (one half)
for (int i = 0; i < N/2; i++)
    dst[i] = sign_extend(src[part_offset + i]);

inputs: %src is the packed narrow vector and %part selects which half is unpacked.
outputs: %result is the widened vector.
constraints and limitations: This is the sign-extending unpack instruction set.

`pto.vzunpack`¶

syntax: %result = pto.vzunpack %src, %part : !pto.vreg<NxT_narrow>, index -> !pto.vreg<N/2xT_wide>
semantics: Zero-extending unpack — narrow to wide (half).

for (int i = 0; i < N/2; i++)
    dst[i] = zero_extend(src[part_offset + i]);

inputs: %src is the packed narrow vector and %part selects which half is unpacked.
outputs: %result is the widened vector.
constraints and limitations: This is the zero-extending unpack instruction set.

Typical Usage¶

// AoS → SoA conversion using deinterleave
%even, %odd = pto.vdintlv %interleaved0, %interleaved1
    : !pto.vreg<64xf32>, !pto.vreg<64xf32> -> !pto.vreg<64xf32>, !pto.vreg<64xf32>

// Filter: keep only elements passing condition
%pass_mask = pto.vcmps %values, %threshold, %all, "gt"
    : !pto.vreg<64xf32>, f32, !pto.mask<b32> -> !pto.mask<b32>
%compacted = pto.vsqz %values, %pass_mask
    : !pto.vreg<64xf32>, !pto.mask<b32> -> !pto.vreg<64xf32>

// Sliding window sum
%prev_window = pto.vslide %curr, %prev, %c1
    : !pto.vreg<64xf32>, !pto.vreg<64xf32>, i16 -> !pto.vreg<64xf32>
%window_sum = pto.vadd %curr, %prev_window, %all
    : !pto.vreg<64xf32>, !pto.vreg<64xf32>, !pto.mask<b32> -> !pto.vreg<64xf32>

// Type narrowing via pack
%packed_i16 = pto.vpack %wide0_i32, %wide1_i32, %c0
    : !pto.vreg<64xi32>, !pto.vreg<64xi32>, index -> !pto.vreg<128xi16>

V2 Interleave Forms¶

`pto.vintlvv2`¶

syntax: %result = pto.vintlvv2 %lhs, %rhs, "PART" : !pto.vreg<NxT>, !pto.vreg<NxT> -> !pto.vreg<NxT>
inputs: %lhs and %rhs are source vectors and PART selects the returned half of the V2 interleave result.
outputs: %result is the selected interleave half.
constraints and limitations: This op exposes only one half of the V2 result in SSA form.

`pto.vdintlvv2`¶

syntax: %result = pto.vdintlvv2 %lhs, %rhs, "PART" : !pto.vreg<NxT>, !pto.vreg<NxT> -> !pto.vreg<NxT>
inputs: %lhs and %rhs are source vectors and PART selects the returned half of the V2 deinterleave result.
outputs: %result is the selected deinterleave half.
constraints and limitations: This op exposes only one half of the V2 result in SSA form.

Vector Instruction Set: Data Rearrangement¶

Common Operand Model¶

Interleave / Deinterleave¶

pto.vintlv¶

pto.vdintlv¶

Slide / Shift¶

pto.vslide¶

pto.vshift¶

Compress / Expand¶

pto.vsqz¶

pto.vusqz¶