pto.vmul¶
pto.vmul is part of the Binary Vector Instructions instruction set.
Summary¶
%result is the lane-wise product.
Mechanism¶
pto.vmul is a pto.v* compute operation.
Syntax¶
PTO Assembly Form¶
vmul %dst, %lhs, %rhs, %mask : !pto.vreg<NxT>
AS Level 1 (SSA)¶
%result = pto.vmul %lhs, %rhs, %mask : !pto.vreg<NxT>, !pto.vreg<NxT>, !pto.mask<G> -> !pto.vreg<NxT>
AS Level 2 (DPS)¶
pto.vmul ins(%lhs, %rhs, %mask : !pto.vreg<NxT>, !pto.vreg<NxT>, !pto.mask<G>)
outs(%result : !pto.vreg<NxT>)
C++ Intrinsic¶
PTO_VMUL_IMPL(result, lhs, rhs, mask);
Inputs¶
| Operand | Type | Description |
|---|---|---|
%lhs |
!pto.vreg<NxT> |
First source vector |
%rhs |
!pto.vreg<NxT> |
Second source vector (multiplied with %lhs lane-wise) |
%mask |
!pto.mask<G> |
Predication mask; inactive lanes produce zero |
Documented A5 types: i16-i32, f16, bf16, f32 (NOT i8/u8).
Expected Outputs¶
| Operand | Type | Description |
|---|---|---|
%result |
!pto.vreg<NxT> |
Lane-wise product of %lhs and %rhs |
Side Effects¶
This operation has no architectural side effect beyond producing its SSA results. It does not implicitly reserve buffers, signal events, or establish memory fences.
Constraints¶
Constraints
- Source and result element types MUST match.
- The A5 profile excludes
i8/u8forms from this instruction set. mask[i] == 0lanes in the result are set to zero (zero-merge predication model).
Exceptions¶
Exceptions
- The verifier rejects illegal operand shapes, unsupported element types, and attribute combinations that are not valid for the selected instruction set or target profile.
- Any additional illegality stated in the constraints section is also part of the contract.
Target-Profile Restrictions¶
Target-Profile Restrictions
- Documented A5 coverage:
i16-i32,f16,bf16,f32(NOTi8/u8). - A5 is the most detailed concrete profile in the current manual; CPU simulation and A2/A3-class targets may support narrower subsets or emulate the behavior while preserving the visible PTO contract.
- Code that depends on an instruction-set-specific type list, distribution mode, or fused form should treat that dependency as target-profile-specific unless the PTO manual states cross-target portability explicitly.
Performance¶
A5 Latency¶
| Element Type | Latency (cycles) | A5 RV |
|---|---|---|
f32 |
8 | RV_VMUL |
f16 |
8 | RV_VMUL |
i32 |
8 | RV_VMUL |
i16 |
8 | RV_VMUL |
A2/A3 Throughput¶
| Metric | Value | Constant |
|---|---|---|
| Startup latency | 14 | A2A3_STARTUP_BINARY |
| Completion: FP | 20 | A2A3_COMPL_FP_MUL |
| Completion: INT | 18 | A2A3_COMPL_INT_MUL |
| Per-repeat throughput | 2 | A2A3_RPT_2 |
| Pipeline interval | 18 | A2A3_INTERVAL |
Examples¶
C Semantics¶
for (int i = 0; i < N; i++)
dst[i] = src0[i] * src1[i];
Integer overflow follows the target-defined behavior. Predicated lanes (where mask[i] == 0) produce zero in the destination.
Related Ops / Instruction Set Links¶
- Instruction set overview: Binary Vector Instructions
- Previous op in instruction set: pto.vsub
- Next op in instruction set: pto.vdiv