pto.vshl¶

pto.vshl is part of the Binary Vector Instructions instruction set.

Summary¶

Lane-wise left shift: dst[i] = lhs[i] << rhs[i] for each active lane. The shift count is per-lane and unsigned.

Mechanism¶

Shifts each element of the left-hand vector left by the per-lane count from the right-hand vector. For each lane i where the predicate is true:

\[ \mathrm{dst}_i = \mathrm{lhs}_i \ll \mathrm{rhs}_i \]

The shift count rhs[i] is treated as unsigned. Bits shifted out of the most significant end are discarded, and the vacated low-order bits are filled with zeros. Inactive lanes leave the destination unchanged.
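The masked behaviour described above can be sketched as a scalar reference model. This is an illustration only, not the simulator's implementation: the function name `vshl_ref_u32`, the `bool` array used for the predicate, and the choice to return 0 for counts of 32 or more are all assumptions made so the sketch is well defined in C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference model of pto.vshl for 32-bit lanes.
 * Active lanes (mask[i] == true) receive lhs[i] << rhs[i]; inactive
 * lanes keep whatever dst[i] already held. */
void vshl_ref_u32(uint32_t *dst, const uint32_t *lhs,
                  const uint32_t *rhs, const bool *mask, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!mask[i])
            continue; /* inactive lane: dst[i] is left unchanged */

        /* Counts >= 32 are target-defined per the spec and undefined in C.
         * ASSUMPTION: this sketch yields 0 for them so the model is total. */
        dst[i] = (rhs[i] < 32u) ? (lhs[i] << rhs[i]) : 0u;
    }
}
```

Pre-filling `dst` with a sentinel value before the call makes it easy to see which lanes were written and which were skipped by the mask.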

Syntax¶

PTO Assembly Form¶

vshl %dst, %lhs, %rhs, %mask : !pto.vreg<NxT>

AS Level 1 (SSA)¶

%result = pto.vshl %lhs, %rhs, %mask : (!pto.vreg<NxT>, !pto.vreg<NxT>, !pto.mask<G>) -> !pto.vreg<NxT>

AS Level 2 (DPS)¶

pto.vshl ins(%lhs, %rhs, %mask : !pto.vreg<NxT>, !pto.vreg<NxT>, !pto.mask<G>)
          outs(%result : !pto.vreg<NxT>)

Supported element types: all integer types (i8–i64, u8–u64).

Inputs¶

Operand	Type	Description
`%lhs`	`!pto.vreg<NxT>`	Value to be shifted (left operand)
`%rhs`	`!pto.vreg<NxT>`	Per-lane unsigned shift count
`%mask`	`!pto.mask<G>`	Predicate mask; lanes where mask bit is 1 are active

Both source registers MUST have the same integer element type and the same vector width N. The mask width MUST match N.

Expected Outputs¶

Result	Type	Description
`%result`	`!pto.vreg<NxT>`	Lane-wise left shift: `dst[i] = lhs[i] << rhs[i]` on active lanes; inactive lanes are unmodified

Side Effects¶

This operation has no architectural side effect beyond producing its destination vector register. It does not implicitly reserve buffers, signal events, or establish memory fences.

Constraints¶

Type: Integer element types only (no floating-point).
Type match: %lhs, %rhs, and %result MUST have identical element types.
Width match: All three registers MUST have the same vector width N.
Mask width: %mask MUST have width equal to N.
Shift count: Shift counts SHOULD stay within [0, bitwidth(T) - 1]; out-of-range behavior is target-defined unless the verifier narrows it further.
Active lanes: Only lanes where the mask bit is 1 participate.
Inactive lanes: Destination elements at inactive lanes are unmodified.
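Because out-of-range shift counts are target-defined, portable code may want to check counts before issuing the instruction. A minimal sketch of such a check follows; the helper name `vshl_counts_in_range` and the array-based representation of the count vector and mask are hypothetical and not part of the PTO toolchain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pre-check: returns true when every ACTIVE lane's shift
 * count lies in [0, bitwidth - 1]. Inactive lanes are ignored because
 * they do not participate in the operation. */
bool vshl_counts_in_range(const uint32_t *rhs, const bool *mask,
                          size_t n, unsigned bitwidth)
{
    for (size_t i = 0; i < n; i++) {
        if (mask[i] && rhs[i] >= bitwidth)
            return false; /* active lane with an out-of-range count */
    }
    return true;
}
```

Only the upper bound needs testing, since the counts are unsigned and cannot fall below 0.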

Exceptions¶

The verifier rejects non-integer element types, element-type mismatches, vector-width mismatches, and mask-width mismatches.
Any additional illegality stated in the Binary Vector Instructions instruction set page is also part of the contract.
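The rejection rules listed above can be sketched as a small checking routine. Everything here is hypothetical: the `VRegType` descriptor, the `VshlVerifyResult` codes, the order of the checks, and the modelling of the mask as a lane count are assumptions for illustration, since the real verifier's data structures are not described on this page.

```c
#include <stdbool.h>

/* Hypothetical descriptor for !pto.vreg<NxT>. */
typedef struct {
    unsigned lanes;     /* N: number of lanes */
    unsigned elem_bits; /* bit width of T: 8, 16, 32, or 64 */
    bool is_integer;    /* false for floating-point element types */
    bool is_signed;     /* distinguishes iK from uK */
} VRegType;

/* Hypothetical result codes, one per rejection rule. */
typedef enum {
    VSHL_OK,
    VSHL_ERR_NON_INTEGER,
    VSHL_ERR_TYPE_MISMATCH,
    VSHL_ERR_WIDTH_MISMATCH,
    VSHL_ERR_MASK_WIDTH
} VshlVerifyResult;

/* True when two registers share the same element type (ignores lanes). */
static bool same_elem(VRegType a, VRegType b)
{
    return a.elem_bits == b.elem_bits &&
           a.is_integer == b.is_integer &&
           a.is_signed == b.is_signed;
}

/* Applies the four checks in the order the rules are listed.
 * ASSUMPTION: the mask is modelled only by the lane count it covers. */
VshlVerifyResult vshl_verify(VRegType lhs, VRegType rhs, VRegType result,
                             unsigned mask_lanes)
{
    if (!lhs.is_integer || !rhs.is_integer || !result.is_integer)
        return VSHL_ERR_NON_INTEGER;
    if (!same_elem(lhs, rhs) || !same_elem(lhs, result))
        return VSHL_ERR_TYPE_MISMATCH;
    if (lhs.lanes != rhs.lanes || lhs.lanes != result.lanes)
        return VSHL_ERR_WIDTH_MISMATCH;
    if (mask_lanes != lhs.lanes)
        return VSHL_ERR_MASK_WIDTH;
    return VSHL_OK;
}
```

Returning a distinct code per rule keeps each failure traceable to the constraint it violates.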

Target-Profile Restrictions¶

Element Type	CPU Simulator	A2/A3	A5
Integer types	Simulated	Simulated	Supported

A5 is the primary concrete profile for the vector instructions.

Performance¶

A5 Latency¶

Element Type	Latency (cycles)	A5 RV
`i32`	7	`RV_VSHL`

A2/A3 Throughput¶

Metric	Value	Constant
Startup latency	14	`A2A3_STARTUP_BINARY`
Completion latency	17	`A2A3_COMPL_INT_BINOP`
Per-repeat throughput	2	`A2A3_RPT_2`
Pipeline interval	18	`A2A3_INTERVAL`

Examples¶

C Semantics¶

for (int i = 0; i < N; i++)
    if (mask[i])  /* inactive lanes keep their previous dst[i] */
        dst[i] = lhs[i] << rhs[i];

MLIR Usage¶

// Left shift by scalar count (broadcast to all lanes)
%count = pto.vbroadcast %c3 : i32 -> !pto.vreg<64xi32>
%shifted = pto.vshl %data, %count, %active : (!pto.vreg<64xi32>, !pto.vreg<64xi32>, !pto.mask<b32>) -> !pto.vreg<64xi32>

// Per-lane variable shift
%shifted2 = pto.vshl %data, %counts, %active : (!pto.vreg<64xi32>, !pto.vreg<64xi32>, !pto.mask<b32>) -> !pto.vreg<64xi32>

Instruction set overview: Binary Vector Instructions
Previous op in instruction set: pto.vxor
Next op in instruction set: pto.vshr