Vector Instruction Set: Data Rearrangement¶
pto.v* rearrangement instruction sets are defined here. These operations permute or repack vector-visible data without turning into tile movement or DMA, so they remain part of the vector instructions.
Category: In-register data movement and permutation Pipeline: PIPE_V (Vector Core)
Operations that rearrange data within or between vector registers without memory access.
Common Operand Model¶
%lhs/%rhsare source vector register values.%srcis a single source vector register value.%resultis the destination vector register value unless an op explicitly returns multiple vectors.- These instruction sets do not access UB directly; they only rearrange register contents.
Interleave / Deinterleave¶
pto.vintlv¶
- syntax:
%low, %high = pto.vintlv %lhs, %rhs : !pto.vreg<NxT>, !pto.vreg<NxT> -> !pto.vreg<NxT>, !pto.vreg<NxT> - semantics: Interleave elements from two sources.
// Interleave: merge even/odd elements from two sources
// low = {src0[0], src1[0], src0[1], src1[1], ...}
// high = {src0[N/2], src1[N/2], src0[N/2+1], src1[N/2+1], ...}
- inputs:
%lhsand%rhsare the two source vectors. - outputs:
%lowand%highare the two destination vectors. - constraints and limitations: The two outputs form a paired interleave result. The PTO ISA vector instructions representation exposes that pair as two SSA results, and the pair ordering MUST be preserved.
pto.vdintlv¶
- syntax:
%low, %high = pto.vdintlv %lhs, %rhs : !pto.vreg<NxT>, !pto.vreg<NxT> -> !pto.vreg<NxT>, !pto.vreg<NxT> - semantics: Deinterleave elements into even/odd.
// Deinterleave: separate even/odd elements
// low = {src0[0], src0[2], src0[4], ...} // even
// high = {src0[1], src0[3], src0[5], ...} // odd
- inputs:
%lhsand%rhsrepresent the interleaved source stream in the current PTO ISA vector instructions representation. - outputs:
%lowand%highare the separated destination vectors. - constraints and limitations: The two outputs form the even/odd deinterleave result pair, and their ordering MUST be preserved.
Slide / Shift¶
pto.vslide¶
- syntax:
%result = pto.vslide %src0, %src1, %amt : !pto.vreg<NxT>, !pto.vreg<NxT>, i16 -> !pto.vreg<NxT> - semantics: Concatenate two vectors and extract N-element window at offset.
// Conceptually: tmp[0..2N-1] = {src1, src0}
// dst[i] = tmp[amt + i]
if (amt >= 0)
for (int i = 0; i < N; i++)
dst[i] = (i >= amt) ? src0[i - amt] : src1[N - amt + i];
Use case: Sliding window operations, shift register patterns.
- inputs:
%src0and%src1provide the concatenated source window and%amtselects the extraction offset. - outputs:
%resultis the extracted destination window. - constraints and limitations:
pto.vslideoperates on the logical concatenation of%src1and%src0. The source order and extraction offset MUST be preserved exactly.
pto.vshift¶
- syntax:
%result = pto.vshift %src, %amt : !pto.vreg<NxT>, i16 -> !pto.vreg<NxT> - semantics: Single-source slide (shift with zero fill).
for (int i = 0; i < N; i++)
dst[i] = (i >= amt) ? src[i - amt] : 0;
- inputs:
%srcis the source vector and%amtis the slide amount. - outputs:
%resultis the shifted vector. - constraints and limitations: This instruction set represents the single-source slide/shift instruction set. Zero-fill versus other fill behavior MUST match the selected form.
Compress / Expand¶
pto.vsqz¶
- syntax:
%result = pto.vsqz %src, %mask : !pto.vreg<NxT>, !pto.mask<G> -> !pto.vreg<NxT> - semantics: Compress — pack active lanes to front.
int j = 0;
for (int i = 0; i < N; i++)
if (mask[i]) dst[j++] = src[i];
while (j < N) dst[j++] = 0;
Use case: Sparse data compaction, filtering.
- inputs:
%srcis the source vector and%maskselects which elements are kept. - outputs:
%resultis the compacted vector. - constraints and limitations: This is a reduction-style compaction instruction set. Preserved element order MUST match source lane order.
pto.vusqz¶
- syntax:
%result = pto.vusqz %mask : !pto.mask<G> -> !pto.vreg<NxT> - semantics: Expand — scatter front elements to active positions.
int j = 0;
for (int i = 0; i < N; i++)
if (mask[i]) dst[i] = src_front[j++];
else dst[i] = 0;
- inputs:
%maskis the expansion/placement predicate. - outputs:
%resultis the expanded vector image. - constraints and limitations: The source-front stream is implicit in the current instruction set. Lane placement for active and inactive positions MUST be preserved exactly.
Permutation¶
pto.vperm¶
- syntax:
%result = pto.vperm %src, %index : !pto.vreg<NxT>, !pto.vreg<NxI> -> !pto.vreg<NxT> - semantics: In-register permute (table lookup). Not memory gather.
for (int i = 0; i < N; i++)
dst[i] = src[index[i] % N];
Note: This operates on register contents, unlike pto.vgather2 which reads from UB memory.
- inputs:
%srcis the source vector and%indexsupplies per-lane source indices. - outputs:
%resultis the permuted vector. - constraints and limitations: This is an in-register permutation instruction set.
%indexvalues outside the legal range follow the wrap/clamp behavior of the selected form.
pto.vselr¶
- syntax:
%result = pto.vselr %src0, %src1 : !pto.vreg<NxT>, !pto.vreg<NxI> -> !pto.vreg<NxT> - semantics: Register select with reversed mask semantics.
for (int i = 0; i < N; i++)
dst[i] = mask[i] ? src1[i] : src0[i];
- inputs:
%src0and%src1are source vectors. - outputs:
%resultis the selected vector. - constraints and limitations: The rearrangement use of the instruction set; the compare/select page documents the same name from the predicate selection perspective.
Pack / Unpack¶
pto.vpack¶
- syntax:
%result = pto.vpack %src0, %src1, %part : !pto.vreg<NxT_wide>, !pto.vreg<NxT_wide>, index -> !pto.vreg<2NxT_narrow> - semantics: Narrowing pack — two wide vectors to one narrow vector.
// e.g., two vreg<64xi32> → one vreg<128xi16>
for (int i = 0; i < N; i++) {
dst[i] = truncate(src0[i]);
dst[N + i] = truncate(src1[i]);
}
- inputs:
%src0and%src1are wide source vectors and%partselects the packing submode. - outputs:
%resultis the packed narrow vector. - constraints and limitations: Packing is a narrowing conversion. Source values that do not fit the destination width follow the truncation semantics of the selected packing mode.
pto.vsunpack¶
- syntax:
%result = pto.vsunpack %src, %part : !pto.vreg<NxT_narrow>, index -> !pto.vreg<N/2xT_wide> - semantics: Sign-extending unpack — narrow to wide (half).
// e.g., vreg<128xi16> → vreg<64xi32> (one half)
for (int i = 0; i < N/2; i++)
dst[i] = sign_extend(src[part_offset + i]);
- inputs:
%srcis the packed narrow vector and%partselects which half is unpacked. - outputs:
%resultis the widened vector. - constraints and limitations: This is the sign-extending unpack instruction set.
pto.vzunpack¶
- syntax:
%result = pto.vzunpack %src, %part : !pto.vreg<NxT_narrow>, index -> !pto.vreg<N/2xT_wide> - semantics: Zero-extending unpack — narrow to wide (half).
for (int i = 0; i < N/2; i++)
dst[i] = zero_extend(src[part_offset + i]);
- inputs:
%srcis the packed narrow vector and%partselects which half is unpacked. - outputs:
%resultis the widened vector. - constraints and limitations: This is the zero-extending unpack instruction set.
Typical Usage¶
// AoS → SoA conversion using deinterleave
%even, %odd = pto.vdintlv %interleaved0, %interleaved1
: !pto.vreg<64xf32>, !pto.vreg<64xf32> -> !pto.vreg<64xf32>, !pto.vreg<64xf32>
// Filter: keep only elements passing condition
%pass_mask = pto.vcmps %values, %threshold, %all, "gt"
: !pto.vreg<64xf32>, f32, !pto.mask<b32> -> !pto.mask<b32>
%compacted = pto.vsqz %values, %pass_mask
: !pto.vreg<64xf32>, !pto.mask<b32> -> !pto.vreg<64xf32>
// Sliding window sum
%prev_window = pto.vslide %curr, %prev, %c1
: !pto.vreg<64xf32>, !pto.vreg<64xf32>, i16 -> !pto.vreg<64xf32>
%window_sum = pto.vadd %curr, %prev_window, %all
: !pto.vreg<64xf32>, !pto.vreg<64xf32>, !pto.mask<b32> -> !pto.vreg<64xf32>
// Type narrowing via pack
%packed_i16 = pto.vpack %wide0_i32, %wide1_i32, %c0
: !pto.vreg<64xi32>, !pto.vreg<64xi32>, index -> !pto.vreg<128xi16>
V2 Interleave Forms¶
pto.vintlvv2¶
- syntax:
%result = pto.vintlvv2 %lhs, %rhs, "PART" : !pto.vreg<NxT>, !pto.vreg<NxT> -> !pto.vreg<NxT> - inputs:
%lhsand%rhsare source vectors andPARTselects the returned half of the V2 interleave result. - outputs:
%resultis the selected interleave half. - constraints and limitations: This op exposes only one half of the V2 result in SSA form.
pto.vdintlvv2¶
- syntax:
%result = pto.vdintlvv2 %lhs, %rhs, "PART" : !pto.vreg<NxT>, !pto.vreg<NxT> -> !pto.vreg<NxT> - inputs:
%lhsand%rhsare source vectors andPARTselects the returned half of the V2 deinterleave result. - outputs:
%resultis the selected deinterleave half. - constraints and limitations: This op exposes only one half of the V2 result in SSA form.