pto.vshr¶

pto.vshr is part of the Binary Vector Instructions instruction set.

Summary¶

Lane-wise right shift: dst[i] = lhs[i] >> rhs[i] for each active lane. Signed types use arithmetic shift; unsigned types use logical shift.

Mechanism¶

Shifts each element of the left-hand vector right by the per-lane count from the right-hand vector. For each lane i where the predicate is true:

\[ \mathrm{dst}_i = \mathrm{lhs}_i \gg \mathrm{rhs}_i \]

Signed element types (i8–i64): arithmetic shift — the sign bit is replicated.
Unsigned element types (u8–u64): logical shift — zeros are shifted in.
The shift count rhs[i] is treated as unsigned.
Bits shifted out are discarded.
Inactive lanes leave the destination unchanged.

Syntax¶

PTO Assembly Form¶

vshr %dst, %lhs, %rhs, %mask : !pto.vreg<NxT>

AS Level 1 (SSA)¶

%result = pto.vshr %lhs, %rhs, %mask : (!pto.vreg<NxT>, !pto.vreg<NxT>, !pto.mask<G>) -> !pto.vreg<NxT>

AS Level 2 (DPS)¶

pto.vshr ins(%lhs, %rhs, %mask : !pto.vreg<NxT>, !pto.vreg<NxT>, !pto.mask<G>)
          outs(%result : !pto.vreg<NxT>)

Supported element types: all integer types (i8–i64, u8–u64).

Inputs¶

Operand	Type	Description
`%lhs`	`!pto.vreg<NxT>`	Value to be shifted (left operand)
`%rhs`	`!pto.vreg<NxT>`	Per-lane unsigned shift count
`%mask`	`!pto.mask<G>`	Predicate mask; lanes where mask bit is 1 are active

Both source registers MUST have the same integer element type and the same vector width N. The mask width MUST match N.

Expected Outputs¶

Result	Type	Description
`%result`	`!pto.vreg<NxT>`	Lane-wise right shift on active lanes; inactive lanes are unmodified

Side Effects¶

This operation has no architectural side effect beyond producing its destination vector register. It does not implicitly reserve buffers, signal events, or establish memory fences.

Constraints¶

Constraints

Type: Integer element types only (no floating-point). Signedness of the element type determines arithmetic vs logical behavior.
Type match: %lhs, %rhs, and %result MUST have identical element types.
Width match: All three registers MUST have the same vector width N.
Mask width: %mask MUST have width equal to N.
Shift count: Shift counts SHOULD stay within [0, bitwidth(T) - 1]; out-of-range behavior is target-defined.
Active lanes: Only lanes where the mask bit is 1 participate.
Inactive lanes: Destination elements at inactive lanes are unmodified.

Exceptions¶

Exceptions

The verifier rejects non-integer element types, type mismatches, width mismatches, or mask width mismatches.
Any additional illegality stated in the Binary Vector Instructions instruction set page is also part of the contract.

Target-Profile Restrictions¶

Target-Profile Restrictions

	Element Type	CPU Simulator	A2/A3	A5
	Integer types	Simulated	Simulated	Supported

A5 is the primary concrete profile for the vector instructions.

Performance¶

A5 Latency¶

	Element Type	Latency (cycles)	A5 RV
	`i32`	7	`RV_VSHR`

A2/A3 Throughput¶

Metric	Value	Constant
Startup latency	14	`A2A3_STARTUP_BINARY`
Completion latency	17	`A2A3_COMPL_INT_BINOP`
Per-repeat throughput	2	`A2A3_RPT_2`
Pipeline interval	18	`A2A3_INTERVAL`

Examples¶

C Semantics¶

for (int i = 0; i < N; i++)
    dst[i] = src0[i] >> src1[i];  // arithmetic for signed, logical for unsigned

MLIR Usage¶

// Right shift by scalar count (broadcast to all lanes)
%count = pto.vbroadcast %c2 : i32 -> !pto.vreg<64xi32>
%shifted = pto.vshr %data, %count, %active : (!pto.vreg<64xi32>, !pto.vreg<64xi32>, !pto.mask<b32>) -> !pto.vreg<64xi32>

// Per-lane variable shift
%shifted2 = pto.vshr %data, %counts, %active : (!pto.vreg<64xi32>, !pto.vreg<64xi32>, !pto.mask<b32>) -> !pto.vreg<64xi32>

Instruction set overview: Binary Vector Instructions
Previous op in instruction set: pto.vshl
Next op in instruction set: pto.vaddc