Vector Instruction Set: Vector-Scalar Instructions¶

pto.v* instruction sets that combine one vector register with one scalar operand are defined here. Scalar broadcasting, carry-chain rules, and active-lane behavior are architecture-visible constraints.

Category: Vector-scalar operations Pipeline: PIPE_V (Vector Core)

Operations that combine a vector with a scalar value, applying the scalar to every lane.

Common Operand Model¶

%input is the source vector register value.
%scalar is the scalar operand in SSA form.
%mask is the predicate operand.
%result is the destination vector register value.
For 32-bit scalar forms, the scalar source MUST satisfy the backend's legal scalar-source constraints for this instruction set.

Arithmetic¶

`pto.vadds`¶

syntax: %result = pto.vadds %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>

for (int i = 0; i < N; i++)
    dst[i] = src[i] + scalar;

inputs: %input is the source vector, %scalar is broadcast logically to each active lane, and %mask selects active lanes.
outputs: %result is the lane-wise sum.
constraints and limitations: Inactive lanes follow the predication behavior defined for this instruction set. On the current instruction set, inactive lanes are treated as zeroing lanes.

`pto.vsubs`¶

syntax: %result = pto.vsubs %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>

for (int i = 0; i < N; i++)
    dst[i] = src[i] - scalar;

inputs: %input, %scalar, and %mask as above.
outputs: %result is the lane-wise difference.
constraints and limitations: Integer or floating-point legality depends on the selected type instruction set in lowering.

`pto.vmuls`¶

syntax: %result = pto.vmuls %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>

for (int i = 0; i < N; i++)
    dst[i] = src[i] * scalar;

inputs: %input, %scalar, and %mask as above.
outputs: %result is the lane-wise product.
constraints and limitations: Supported element types are hardware-instruction set specific; the current PTO ISA vector instructions documentation covers the common numeric cases.

`pto.vmaxs`¶

syntax: %result = pto.vmaxs %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>

for (int i = 0; i < N; i++)
    dst[i] = (src[i] > scalar) ? src[i] : scalar;

inputs: %input, %scalar, and %mask as above.
outputs: %result is the lane-wise maximum.
constraints and limitations: Input and result types MUST match.

`pto.vmins`¶

syntax: %result = pto.vmins %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>

for (int i = 0; i < N; i++)
    dst[i] = (src[i] < scalar) ? src[i] : scalar;

inputs: %input, %scalar, and %mask as above.
outputs: %result is the lane-wise minimum.
constraints and limitations: Input and result types MUST match.

Bitwise¶

`pto.vands`¶

syntax: %result = pto.vands %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>

for (int i = 0; i < N; i++)
    dst[i] = src[i] & scalar;

inputs: %input, %scalar, and %mask as above.
outputs: %result is the lane-wise bitwise AND.
constraints and limitations: Integer element types only.

`pto.vors`¶

syntax: %result = pto.vors %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>

for (int i = 0; i < N; i++)
    dst[i] = src[i] | scalar;

inputs: %input, %scalar, and %mask as above.
outputs: %result is the lane-wise bitwise OR.
constraints and limitations: Integer element types only.

`pto.vxors`¶

syntax: %result = pto.vxors %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>

for (int i = 0; i < N; i++)
    dst[i] = src[i] ^ scalar;

inputs: %input, %scalar, and %mask as above.
outputs: %result is the lane-wise bitwise XOR.
constraints and limitations: Integer element types only.

Shift¶

`pto.vshls`¶

syntax: %result = pto.vshls %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>

for (int i = 0; i < N; i++)
    dst[i] = src[i] << scalar;

inputs: %input is the value vector, %scalar is the uniform shift amount, and %mask selects active lanes.
outputs: %result is the shifted vector.
constraints and limitations: Integer element types only. The shift amount SHOULD stay within the source element width.

`pto.vshrs`¶

syntax: %result = pto.vshrs %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>

for (int i = 0; i < N; i++)
    dst[i] = src[i] >> scalar;

inputs: %input is the value vector, %scalar is the uniform shift amount, and %mask selects active lanes.
outputs: %result is the shifted vector.
constraints and limitations: Integer element types only.

`pto.vlrelu`¶

syntax: %result = pto.vlrelu %input, %scalar, %mask : !pto.vreg<NxT>, T, !pto.mask<G> -> !pto.vreg<NxT>

for (int i = 0; i < N; i++)
    dst[i] = (src[i] >= 0) ? src[i] : scalar * src[i];

inputs: %input is the activation vector, %scalar is the leaky slope, and %mask selects active lanes.
outputs: %result is the lane-wise leaky-ReLU result.
constraints and limitations: Only f16 and f32 forms are currently documented for pto.vlrelu.

Carry Operations¶

`pto.vaddcs`¶

syntax: %result, %carry = pto.vaddcs %lhs, %rhs, %carry_in, %mask : !pto.vreg<NxT>, !pto.vreg<NxT>, !pto.mask<G>, !pto.mask<G> -> !pto.vreg<NxT>, !pto.mask<G>
semantics: Add with carry-in and carry-out.

for (int i = 0; i < N; i++) {
    uint64_t r = (uint64_t)src0[i] + src1[i] + carry_in[i];
    dst[i] = (T)r;
    carry_out[i] = (r >> bitwidth);
}

inputs: %lhs and %rhs are the value vectors, %carry_in is the incoming carry predicate, and %mask selects active lanes.
outputs: %result is the arithmetic result and %carry is the carry-out predicate.
constraints and limitations: This is the scalar-extended carry-chain instruction set. Treat it as an unsigned integer operation unless the verifier states a wider legal domain.

`pto.vsubcs`¶

syntax: %result, %borrow = pto.vsubcs %lhs, %rhs, %borrow_in, %mask : !pto.vreg<NxT>, !pto.vreg<NxT>, !pto.mask<G>, !pto.mask<G> -> !pto.vreg<NxT>, !pto.mask<G>
semantics: Subtract with borrow-in and borrow-out.

for (int i = 0; i < N; i++) {
    dst[i] = src0[i] - src1[i] - borrow_in[i];
    borrow_out[i] = (src0[i] < src1[i] + borrow_in[i]);
}

inputs: %lhs and %rhs are the value vectors, %borrow_in is the incoming borrow predicate, and %mask selects active lanes.
outputs: %result is the arithmetic result and %borrow is the borrow-out predicate.
constraints and limitations: This is the scalar-extended borrow-chain instruction set and SHOULD be treated as an unsigned integer operation.

Typical Usage¶

// Add bias to all elements
%biased = pto.vadds %activation, %bias_scalar, %mask : !pto.vreg<64xf32>, f32, !pto.mask<b32> -> !pto.vreg<64xf32>

// Scale by constant
%scaled = pto.vmuls %input, %scale, %mask : !pto.vreg<64xf32>, f32, !pto.mask<b32> -> !pto.vreg<64xf32>

// Clamp to [0, 255] for uint8 quantization
%clamped_low = pto.vmaxs %input, %c0, %mask : !pto.vreg<64xf32>, f32, !pto.mask<b32> -> !pto.vreg<64xf32>
%clamped = pto.vmins %clamped_low, %c255, %mask : !pto.vreg<64xf32>, f32, !pto.mask<b32> -> !pto.vreg<64xf32>

// Shift right by fixed amount
%shifted = pto.vshrs %data, %c4, %mask : !pto.vreg<64xi32>, i32, !pto.mask<b32> -> !pto.vreg<64xi32>

Vector Instruction Set: Vector-Scalar Instructions¶

Common Operand Model¶

Arithmetic¶

pto.vadds¶

pto.vsubs¶

pto.vmuls¶

pto.vmaxs¶

pto.vmins¶

Bitwise¶

pto.vands¶

pto.vors¶

pto.vxors¶