Vector Instruction Set: Vector-Scalar Instructions
This page defines the pto.v* instructions that combine one vector register with one scalar operand. Scalar broadcasting, carry-chain rules, and active-lane behavior are architecture-visible constraints.
Category: Vector-scalar operations
Pipeline: PIPE_V (Vector Core)
Operations that combine a vector with a scalar value, applying the scalar to every lane.
Common Operand Model
- %input is the source vector register value.
- %scalar is the scalar operand in SSA form.
- %mask is the predicate operand.
- %result is the destination vector register value.
- For 32-bit scalar forms, the scalar source MUST satisfy the backend's legal scalar-source constraints for this instruction set.
Arithmetic
pto.vadds
- syntax:
%result = pto.vadds %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>
- semantics:
for (int i = 0; i < N; i++)
    dst[i] = src[i] + scalar;
- inputs:
%input is the source vector, %scalar is broadcast logically to each active lane, and %mask selects active lanes.
- outputs:
%result is the lane-wise sum.
- constraints and limitations: Inactive lanes follow the predication behavior defined for this instruction set. On the current instruction set, inactive lanes are treated as zeroing lanes.
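The interaction between scalar broadcast and the zeroing of inactive lanes can be sketched as a scalar reference loop. This is only an illustrative model in C, not the hardware definition: the function name `ref_vadds_f32`, the representation of the predicate as a `bool` array, and the choice of f32 lanes are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar reference model of pto.vadds on f32 lanes.
 * Assumption: mask[i] == true marks an active lane, and inactive
 * lanes are written as zero (the zeroing behavior noted above). */
void ref_vadds_f32(float *dst, const float *src, float scalar,
                   const bool *mask, size_t n)
{
    assert(dst != NULL && src != NULL && mask != NULL);
    for (size_t i = 0; i < n; i++) {
        /* The same scalar is applied to every active lane. */
        dst[i] = mask[i] ? src[i] + scalar : 0.0f;
    }
}
```

With `src = {1, 2, 3, 4}`, `scalar = 0.5`, and the second lane inactive, this model yields `{1.5, 0, 3.5, 4.5}`.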
pto.vsubs
- syntax:
%result = pto.vsubs %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>
- semantics:
for (int i = 0; i < N; i++)
    dst[i] = src[i] - scalar;
- inputs:
%input, %scalar, and %mask as above.
- outputs:
%result is the lane-wise difference.
- constraints and limitations: Whether the integer or floating-point form is legal depends on the type instruction set selected during lowering.
pto.vmuls
- syntax:
%result = pto.vmuls %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>
- semantics:
for (int i = 0; i < N; i++)
    dst[i] = src[i] * scalar;
- inputs:
%input, %scalar, and %mask as above.
- outputs:
%result is the lane-wise product.
- constraints and limitations: Supported element types are specific to the hardware instruction set; the current PTO ISA vector instructions documentation covers the common numeric cases.
pto.vmaxs
- syntax:
%result = pto.vmaxs %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>
- semantics:
for (int i = 0; i < N; i++)
    dst[i] = (src[i] > scalar) ? src[i] : scalar;
- inputs:
%input, %scalar, and %mask as above.
- outputs:
%result is the lane-wise maximum.
- constraints and limitations: Input and result types MUST match.
pto.vmins
- syntax:
%result = pto.vmins %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>
- semantics:
for (int i = 0; i < N; i++)
    dst[i] = (src[i] < scalar) ? src[i] : scalar;
- inputs:
%input, %scalar, and %mask as above.
- outputs:
%result is the lane-wise minimum.
- constraints and limitations: Input and result types MUST match.
Bitwise
pto.vands
- syntax:
%result = pto.vands %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>
- semantics:
for (int i = 0; i < N; i++)
    dst[i] = src[i] & scalar;
- inputs:
%input, %scalar, and %mask as above.
- outputs:
%result is the lane-wise bitwise AND.
- constraints and limitations: Integer element types only.
pto.vors
- syntax:
%result = pto.vors %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>
- semantics:
for (int i = 0; i < N; i++)
    dst[i] = src[i] | scalar;
- inputs:
%input, %scalar, and %mask as above.
- outputs:
%result is the lane-wise bitwise OR.
- constraints and limitations: Integer element types only.
pto.vxors
- syntax:
%result = pto.vxors %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>
- semantics:
for (int i = 0; i < N; i++)
    dst[i] = src[i] ^ scalar;
- inputs:
%input, %scalar, and %mask as above.
- outputs:
%result is the lane-wise bitwise XOR.
- constraints and limitations: Integer element types only.
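The three bitwise forms differ only in the operator applied between each lane and the broadcast scalar. A small C model makes that explicit. It is a sketch under stated assumptions: the enum `ref_bitop`, the function `ref_vbitop_u32`, the `bool`-array mask, and the zeroing of inactive lanes are all illustrative choices, and lanes are treated as unsigned 32-bit patterns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selector for the three bitwise vector-scalar forms. */
typedef enum { REF_VANDS, REF_VORS, REF_VXORS } ref_bitop;

/* Hypothetical reference model on 32-bit lanes.
 * Assumption: inactive lanes are written as zero. */
void ref_vbitop_u32(ref_bitop op, uint32_t *dst, const uint32_t *src,
                    uint32_t scalar, const bool *mask, size_t n)
{
    assert(dst != NULL && src != NULL && mask != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t r;
        switch (op) {
        case REF_VANDS: r = src[i] & scalar; break;
        case REF_VORS:  r = src[i] | scalar; break;
        default:        r = src[i] ^ scalar; break; /* REF_VXORS */
        }
        dst[i] = mask[i] ? r : 0u;
    }
}
```

For example, a scalar of `0xFF` keeps the low byte of each lane under `REF_VANDS`, sets it under `REF_VORS`, and inverts it under `REF_VXORS`.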
Shift
pto.vshls
- syntax:
%result = pto.vshls %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>
- semantics:
for (int i = 0; i < N; i++)
    dst[i] = src[i] << scalar;
- inputs:
%input is the value vector, %scalar is the uniform shift amount, and %mask selects active lanes.
- outputs:
%result is the shifted vector.
- constraints and limitations: Integer element types only. The shift amount SHOULD stay within the source element width.
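The shift-amount recommendation matters when modeling the instruction in C, because shifting a 32-bit value by 32 or more is undefined behavior there. The following sketch enforces the bound as a precondition. The name `ref_vshls_u32`, the `bool`-array mask, the zeroing of inactive lanes, and the treatment of lanes as unsigned 32-bit patterns are assumptions for illustration; what the hardware does with an out-of-range amount is not stated here and is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of pto.vshls on 32-bit lanes.
 * Assumptions: lanes are unsigned bit patterns, inactive lanes are
 * zeroed, and the caller keeps the shift below the element width. */
void ref_vshls_u32(uint32_t *dst, const uint32_t *src, uint32_t shift,
                   const bool *mask, size_t n)
{
    assert(dst != NULL && src != NULL && mask != NULL);
    assert(shift < 32u); /* a shift of 32 or more is undefined in C */
    for (size_t i = 0; i < n; i++) {
        /* Every active lane is shifted by the same uniform amount. */
        dst[i] = mask[i] ? (src[i] << shift) : 0u;
    }
}
```

Bits shifted past the top of the lane are discarded in this model, so `0x80000000 << 4` produces `0`.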
pto.vshrs
- syntax:
%result = pto.vshrs %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>
- semantics:
for (int i = 0; i < N; i++)
    dst[i] = src[i] >> scalar;
- inputs:
%input is the value vector, %scalar is the uniform shift amount, and %mask selects active lanes.
- outputs:
%result is the shifted vector.
- constraints and limitations: Integer element types only.
pto.vlrelu
- syntax:
%result = pto.vlrelu %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>
- semantics:
for (int i = 0; i < N; i++)
    dst[i] = (src[i] >= 0) ? src[i] : scalar * src[i];
- inputs:
%input is the activation vector, %scalar is the leaky slope, and %mask selects active lanes.
- outputs:
%result is the lane-wise leaky-ReLU result.
- constraints and limitations: Only f16 and f32 forms are currently documented for pto.vlrelu.
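A scalar model of the leaky-ReLU form shows how the slope applies only to negative lanes. This is a minimal sketch for the f32 case: `ref_vlrelu_f32`, the `bool`-array mask, and the zeroing of inactive lanes are assumptions introduced for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference model of pto.vlrelu on f32 lanes.
 * Assumption: inactive lanes are written as zero. */
void ref_vlrelu_f32(float *dst, const float *src, float slope,
                    const bool *mask, size_t n)
{
    assert(dst != NULL && src != NULL && mask != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Non-negative inputs pass through; negative inputs are scaled. */
        float r = (src[i] >= 0.0f) ? src[i] : slope * src[i];
        dst[i] = mask[i] ? r : 0.0f;
    }
}
```

With a slope of `0.5`, an active lane holding `-2` becomes `-1`, while an active lane holding `3` is unchanged.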
Carry Operations
pto.vaddcs
- syntax:
%result, %carry = pto.vaddcs %lhs, %rhs, %carry_in, %mask : !pto.vreg<NxT>, !pto.vreg<NxT>, !pto.mask<G>, !pto.mask<G> -> !pto.vreg<NxT>, !pto.mask<G>
- semantics: Add with carry-in and carry-out.
for (int i = 0; i < N; i++) {
    // Widen to 64 bits so the carry bit is not lost (assumes bitwidth < 64).
    uint64_t r = (uint64_t)src0[i] + src1[i] + carry_in[i];
    dst[i] = (T)r;                   // low bitwidth bits of the sum
    carry_out[i] = (r >> bitwidth);  // 1 if the sum overflowed the lane
}
- inputs:
%lhs and %rhs are the value vectors, %carry_in is the incoming carry predicate, and %mask selects active lanes.
- outputs:
%result is the arithmetic result and %carry is the carry-out predicate.
- constraints and limitations: This is the scalar-extended carry-chain instruction set. Treat it as an unsigned integer operation unless the verifier states a wider legal domain.
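The carry-out predicate is what allows wide additions to be built from narrower lanes: the carry produced by the low limbs is fed in as the carry-in of the high limbs. The C model below sketches one step of such a chain on 32-bit lanes. The name `ref_vaddcs_u32`, the `bool`-array predicates, and the choice to zero the result and clear the carry on inactive lanes are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of pto.vaddcs on unsigned 32-bit lanes.
 * Assumption: inactive lanes produce a zero result and no carry. */
void ref_vaddcs_u32(uint32_t *dst, bool *carry_out,
                    const uint32_t *lhs, const uint32_t *rhs,
                    const bool *carry_in, const bool *mask, size_t n)
{
    assert(dst != NULL && carry_out != NULL);
    assert(lhs != NULL && rhs != NULL && carry_in != NULL && mask != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!mask[i]) {
            dst[i] = 0u;
            carry_out[i] = false;
            continue;
        }
        /* Widen to 64 bits so bit 32 holds the carry. */
        uint64_t r = (uint64_t)lhs[i] + rhs[i] + (carry_in[i] ? 1u : 0u);
        dst[i] = (uint32_t)r;
        carry_out[i] = (r >> 32) != 0;
    }
}
```

A 64-bit addition over split limbs would call this model twice: once on the low limbs with an all-false carry-in, then on the high limbs with the carry-out of the first call as carry-in.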
pto.vsubcs
- syntax:
%result, %borrow = pto.vsubcs %lhs, %rhs, %borrow_in, %mask : !pto.vreg<NxT>, !pto.vreg<NxT>, !pto.mask<G>, !pto.mask<G> -> !pto.vreg<NxT>, !pto.mask<G>
- semantics: Subtract with borrow-in and borrow-out.
for (int i = 0; i < N; i++) {
    // Widen before comparing so src1[i] + borrow_in[i] cannot wrap to zero.
    uint64_t subtrahend = (uint64_t)src1[i] + borrow_in[i];
    dst[i] = (T)((uint64_t)src0[i] - subtrahend);
    borrow_out[i] = ((uint64_t)src0[i] < subtrahend);
}
- inputs:
%lhs and %rhs are the value vectors, %borrow_in is the incoming borrow predicate, and %mask selects active lanes.
- outputs:
%result is the arithmetic result and %borrow is the borrow-out predicate.
- constraints and limitations: This is the scalar-extended borrow-chain instruction set and SHOULD be treated as an unsigned integer operation.
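The borrow test is easy to get wrong at the lane width: if the right-hand lane is all ones and the borrow-in is set, their sum wraps to zero in 32-bit arithmetic and the borrow is missed. The C model below sketches the subtraction on 32-bit lanes and forms the subtrahend in 64 bits to avoid that. The name `ref_vsubcs_u32`, the `bool`-array predicates, and the handling of inactive lanes (zero result, no borrow) are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of pto.vsubcs on unsigned 32-bit lanes.
 * Assumption: inactive lanes produce a zero result and no borrow. */
void ref_vsubcs_u32(uint32_t *dst, bool *borrow_out,
                    const uint32_t *lhs, const uint32_t *rhs,
                    const bool *borrow_in, const bool *mask, size_t n)
{
    assert(dst != NULL && borrow_out != NULL);
    assert(lhs != NULL && rhs != NULL && borrow_in != NULL && mask != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!mask[i]) {
            dst[i] = 0u;
            borrow_out[i] = false;
            continue;
        }
        /* 64-bit subtrahend: rhs + borrow_in can reach 2^32 without wrapping. */
        uint64_t subtrahend = (uint64_t)rhs[i] + (borrow_in[i] ? 1u : 0u);
        dst[i] = (uint32_t)((uint64_t)lhs[i] - subtrahend);
        borrow_out[i] = (uint64_t)lhs[i] < subtrahend;
    }
}
```

In this model, `0 - 0xFFFFFFFF` with a borrow-in of 1 gives a result of `0` and a borrow-out of 1, which a comparison done entirely in 32 bits would report as no borrow.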
Typical Usage
// Add bias to all elements
%biased = pto.vadds %activation, %bias_scalar, %mask : !pto.vreg<64xf32>, f32, !pto.mask<b32> -> !pto.vreg<64xf32>
// Scale by constant
%scaled = pto.vmuls %input, %scale, %mask : !pto.vreg<64xf32>, f32, !pto.mask<b32> -> !pto.vreg<64xf32>
// Clamp to [0, 255] for uint8 quantization
%clamped_low = pto.vmaxs %input, %c0, %mask : !pto.vreg<64xf32>, f32, !pto.mask<b32> -> !pto.vreg<64xf32>
%clamped = pto.vmins %clamped_low, %c255, %mask : !pto.vreg<64xf32>, f32, !pto.mask<b32> -> !pto.vreg<64xf32>
// Shift right by fixed amount
%shifted = pto.vshrs %data, %c4, %mask : !pto.vreg<64xi32>, i32, !pto.mask<b32> -> !pto.vreg<64xi32>
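The clamp sequence above (a pto.vmaxs against the lower bound followed by a pto.vmins against the upper bound) can be checked against a scalar model. The sketch below assumes all lanes are active and uses f32 values; `ref_clamp_f32` and its parameter names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of the vmaxs + vmins clamp sequence.
 * Assumption: every lane is active, so no mask is modeled. */
void ref_clamp_f32(float *dst, const float *src, float lo, float hi,
                   size_t n)
{
    assert(dst != NULL && src != NULL);
    assert(lo <= hi);
    for (size_t i = 0; i < n; i++) {
        /* Step 1 (vmaxs): raise values below the lower bound. */
        float clamped_low = (src[i] > lo) ? src[i] : lo;
        /* Step 2 (vmins): lower values above the upper bound. */
        dst[i] = (clamped_low < hi) ? clamped_low : hi;
    }
}
```

For bounds `[0, 255]`, the inputs `{-5, 100, 300.5}` map to `{0, 100, 255}`.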