pto.vshr¶
pto.vshr is part of the Binary Vector Instructions instruction set.
Summary¶
Lane-wise right shift: dst[i] = lhs[i] >> rhs[i] for each active lane. Signed types use arithmetic shift; unsigned types use logical shift.
Mechanism¶
Shifts each element of the left-hand vector right by the per-lane count from the right-hand vector. For each lane i where the predicate is true:
- Signed element types (i8–i64): arithmetic shift; the sign bit is replicated.
- Unsigned element types (u8–u64): logical shift; zeros are shifted in.
- The shift count rhs[i] is treated as unsigned.
- Bits shifted out are discarded.
- Inactive lanes leave the destination unchanged.
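The rules above can be sketched as a scalar reference model in C. This is only an illustration under stated assumptions, not the PTO implementation: the function names `vshr_i32`, `vshr_u32`, and `asr_i32` are hypothetical, the mask is modeled as one `bool` per lane, and every active shift count is assumed to lie in [0, 31], since larger counts are undefined in C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: portable arithmetic right shift for a 32-bit lane.
   For negative inputs, shifting the complement and complementing again
   replicates the sign bit without relying on implementation-defined
   behavior of >> on negative values. Assumes count is in [0, 31]. */
static int32_t asr_i32(int32_t value, uint32_t count) {
    return value < 0 ? ~(~value >> count) : value >> count;
}

/* Signed lanes: arithmetic shift on active lanes only. */
void vshr_i32(int32_t *dst, const int32_t *lhs, const int32_t *rhs,
              const bool *mask, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (!mask[i])
            continue;                       /* inactive lane: dst[i] unchanged */
        uint32_t count = (uint32_t)rhs[i];  /* shift count is treated as unsigned */
        dst[i] = asr_i32(lhs[i], count);
    }
}

/* Unsigned lanes: logical shift on active lanes only. */
void vshr_u32(uint32_t *dst, const uint32_t *lhs, const uint32_t *rhs,
              const bool *mask, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (mask[i])
            dst[i] = lhs[i] >> rhs[i];      /* zeros are shifted in */
    }
}
```

With the same bit pattern 0xFFFFFFF8 and a count of 1, the signed variant yields -4 (sign bit replicated) while the unsigned variant yields 0x7FFFFFFC (zero shifted in).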
Syntax¶
PTO Assembly Form¶
vshr %dst, %lhs, %rhs, %mask : !pto.vreg<NxT>
AS Level 1 (SSA)¶
%result = pto.vshr %lhs, %rhs, %mask : (!pto.vreg<NxT>, !pto.vreg<NxT>, !pto.mask<G>) -> !pto.vreg<NxT>
AS Level 2 (DPS)¶
pto.vshr ins(%lhs, %rhs, %mask : !pto.vreg<NxT>, !pto.vreg<NxT>, !pto.mask<G>)
outs(%result : !pto.vreg<NxT>)
Supported element types: all integer types (i8–i64, u8–u64).
Inputs¶
| Operand | Type | Description |
|---|---|---|
| %lhs | !pto.vreg<NxT> | Value to be shifted (left operand) |
| %rhs | !pto.vreg<NxT> | Per-lane unsigned shift count |
| %mask | !pto.mask<G> | Predicate mask; lanes where mask bit is 1 are active |
Both source registers MUST have the same integer element type and the same vector width N. The mask width MUST match N.
Expected Outputs¶
| Result | Type | Description |
|---|---|---|
| %result | !pto.vreg<NxT> | Lane-wise right shift on active lanes; inactive lanes are unmodified |
Side Effects¶
This operation has no architectural side effect beyond producing its destination vector register. It does not implicitly reserve buffers, signal events, or establish memory fences.
Constraints¶
- Type: Integer element types only (no floating-point). Signedness of the element type determines arithmetic vs logical behavior.
- Type match: %lhs, %rhs, and %result MUST have identical element types.
- Width match: All three registers MUST have the same vector width N.
- Mask width: %mask MUST have width equal to N.
- Shift count: Shift counts SHOULD stay within [0, bitwidth(T) - 1]; out-of-range behavior is target-defined.
- Active lanes: Only lanes where the mask bit is 1 participate.
- Inactive lanes: Destination elements at inactive lanes are unmodified.
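Because out-of-range shift counts are target-defined, code that must behave the same across profiles may want to check counts before issuing the shift. The following C sketch shows one way to do that on the host side. The helper name `vshr_counts_in_range` is hypothetical, the mask is modeled as one `bool` per lane, and the sketch assumes that counts on inactive lanes do not matter because those lanes do not participate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pre-check: returns true when every active lane's shift
   count lies in [0, bitwidth - 1]. Counts are already unsigned, so only
   the upper bound needs testing. Inactive lanes are skipped (assumption:
   their counts are never used). */
bool vshr_counts_in_range(const uint32_t *counts, const bool *mask,
                          size_t n, unsigned bitwidth) {
    for (size_t i = 0; i < n; i++) {
        if (mask[i] && counts[i] >= bitwidth)
            return false;
    }
    return true;
}
```

For a 32-bit element type, a count of 32 on an active lane fails the check, while the same count on an inactive lane is ignored.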
Exceptions¶
- The verifier rejects non-integer element types, type mismatches, width mismatches, or mask width mismatches.
- Any additional illegality stated in the Binary Vector Instructions instruction set page is also part of the contract.
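The verifier checks listed above can be pictured with a small C sketch. Everything here is illustrative rather than the actual verifier: `VRegType`, `VshrStatus`, and `verify_vshr` are hypothetical names, and the mask is reduced to a plain lane count `mask_lanes`, since this page does not spell out how the `G` parameter of !pto.mask<G> maps to a width.

```c
#include <stdbool.h>

/* Hypothetical description of a !pto.vreg<NxT> type; not a real PTO API. */
typedef struct {
    bool is_integer;   /* false for floating-point element types */
    bool is_signed;    /* i8..i64 versus u8..u64 */
    unsigned bits;     /* element bit width */
    unsigned lanes;    /* vector width N */
} VRegType;

typedef enum {
    VSHR_OK,
    VSHR_ERR_NON_INTEGER,
    VSHR_ERR_TYPE_MISMATCH,
    VSHR_ERR_WIDTH_MISMATCH,
    VSHR_ERR_MASK_WIDTH
} VshrStatus;

/* Two element types match when kind, signedness, and bit width agree. */
static bool same_element_type(VRegType a, VRegType b) {
    return a.is_integer == b.is_integer &&
           a.is_signed == b.is_signed &&
           a.bits == b.bits;
}

/* Applies the four rejection rules in the order they are listed. */
VshrStatus verify_vshr(VRegType lhs, VRegType rhs, VRegType result,
                       unsigned mask_lanes) {
    if (!lhs.is_integer || !rhs.is_integer || !result.is_integer)
        return VSHR_ERR_NON_INTEGER;
    if (!same_element_type(lhs, rhs) || !same_element_type(lhs, result))
        return VSHR_ERR_TYPE_MISMATCH;
    if (lhs.lanes != rhs.lanes || lhs.lanes != result.lanes)
        return VSHR_ERR_WIDTH_MISMATCH;
    if (mask_lanes != lhs.lanes)
        return VSHR_ERR_MASK_WIDTH;
    return VSHR_OK;
}
```

The order of the checks is a design choice of this sketch; the page only states which conditions are rejected, not which diagnostic takes precedence.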
Target-Profile Restrictions¶
| Element Type | CPU Simulator | A2/A3 | A5 |
|---|---|---|---|
| Integer types | Simulated | Simulated | Supported |
A5 is the primary concrete profile for the vector instructions.
Performance¶
A5 Latency¶
| Element Type | Latency (cycles) | A5 RV |
|---|---|---|
| i32 | 7 | RV_VSHR |
A2/A3 Throughput¶
| Metric | Value | Constant |
|---|---|---|
| Startup latency | 14 | A2A3_STARTUP_BINARY |
| Completion latency | 17 | A2A3_COMPL_INT_BINOP |
| Per-repeat throughput | 2 | A2A3_RPT_2 |
| Pipeline interval | 18 | A2A3_INTERVAL |
Examples¶
C Semantics¶
for (int i = 0; i < N; i++)
    if (mask[i])  // inactive lanes keep their previous dst value
        dst[i] = src0[i] >> src1[i];  // arithmetic for signed, logical for unsigned
MLIR Usage¶
// Right shift by scalar count (broadcast to all lanes)
%count = pto.vbroadcast %c2 : i32 -> !pto.vreg<64xi32>
%shifted = pto.vshr %data, %count, %active : (!pto.vreg<64xi32>, !pto.vreg<64xi32>, !pto.mask<b32>) -> !pto.vreg<64xi32>
// Per-lane variable shift
%shifted2 = pto.vshr %data, %counts, %active : (!pto.vreg<64xi32>, !pto.vreg<64xi32>, !pto.mask<b32>) -> !pto.vreg<64xi32>
Related Ops / Instruction Set Links¶
- Instruction set overview: Binary Vector Instructions
- Previous op in instruction set: pto.vshl
- Next op in instruction set: pto.vaddc