pto.vgather2_bc¶
pto.vgather2_bc is part of the Vector Load Store instruction set.
Summary¶
Gather with broadcast, conditioned by mask.
Mechanism¶
As a PTO vector memory/data-movement instruction, pto.vgather2_bc keeps UB addressing, distribution, mask behavior, and any alignment-state threading explicit in SSA form instead of hiding those details in backend-specific lowering.
Syntax¶
PTO Assembly Form¶
vgather2_bc %result, %source, %offsets, %mask
AS Level 1 (SSA)¶
%result = pto.vgather2_bc %source, %offsets, %mask : !pto.ptr<T, ub>, !pto.vreg<NxI>, !pto.mask<G> -> !pto.vreg<NxT>
Inputs¶
- %source is the UB base pointer (!pto.ptr<T, ub>).
- %offsets contains the gather indices (!pto.vreg<NxI>).
- %mask gates which lanes participate (!pto.mask<G>).
Expected Outputs¶
%result is the gathered vector, of type !pto.vreg<NxT>. Destination lanes whose mask bit is off are zero-filled.
Side Effects¶
This operation reads UB-visible storage and returns SSA results. It does not by itself allocate buffers, signal events, or establish a fence.
Constraints¶
pto.vgather2_bc is a backward-compatible instruction. Masked-off lanes do not participate in address coalescing and do not trigger address overflow exceptions; their destination lanes are zero-filled.
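The masked-lane rule can be illustrated with a small reference model. This is a minimal sketch and not the PTO implementation: the function name `vgather2_bc_model`, the use of a Python list to stand in for UB, element-granular (not byte) offsets, and one mask bit per lane are all assumptions made for illustration. The broadcast distribution and the mask granularity G are not modelled.

```python
from typing import List, Sequence


def vgather2_bc_model(
    ub: Sequence[float],
    base: int,
    offsets: Sequence[int],
    mask: Sequence[bool],
) -> List[float]:
    """Hypothetical per-lane model of the masked gather contract.

    Assumptions (not from the PTO manual): `ub` is a flat list of elements,
    `base` and `offsets` are element indices, and `mask` has one bit per lane.
    """
    if len(offsets) != len(mask):
        raise ValueError("offsets and mask must have the same lane count")

    result: List[float] = []
    for offset, active in zip(offsets, mask):
        if not active:
            # Masked-off lane: the address is never formed or checked,
            # and the destination lane is zero-filled.
            result.append(0.0)
            continue
        addr = base + offset
        if not 0 <= addr < len(ub):
            # Only active lanes can raise an out-of-range error in this model.
            raise IndexError(f"active lane addresses element {addr}, outside the UB model")
        result.append(ub[addr])
    return result


ub = [10.0, 11.0, 12.0, 13.0]
# Lane 1 is masked off, so its out-of-range offset (99) is harmless.
print(vgather2_bc_model(ub, 0, [3, 99, 1, 0], [True, False, True, True]))
# prints [13.0, 0.0, 11.0, 10.0]
```

In this model, flipping lane 1 to active would raise `IndexError`, which mirrors the statement that only participating lanes are subject to address checks.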
Exceptions¶
- It is illegal to use addresses outside the required UB-visible space or to violate the alignment/distribution contract of the selected form.
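- Masked-off lanes or inactive blocks do not make an otherwise-illegal address valid unless the operation text explicitly says so. For pto.vgather2_bc, the Constraints section states that masked-off lanes do not trigger address overflow exceptions.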
- Any additional illegality stated in the constraints section is also part of the contract.
Target-Profile Restrictions¶
- A5 is the most detailed concrete profile in the current manual; CPU simulation and A2/A3-class targets may support narrower subsets or emulate the behavior while preserving the visible PTO contract.
- Code that depends on an instruction-set-specific type list, distribution mode, or fused form should treat that dependency as target-profile-specific unless the PTO manual states cross-target portability explicitly.
Performance¶
Timing Disclosure¶
The current public VPTO timing material for PTO micro instructions remains limited.
For pto.vgather2_bc, those public sources describe the instruction semantics, operand legality, and pipeline placement, but they do not publish a numeric latency or steady-state throughput.
| Metric | Status | Source Basis |
|---|---|---|
| A5 latency | Not publicly published | Current public VPTO timing material |
| Steady-state throughput | Not publicly published | Current public VPTO timing material |
If software scheduling or performance modeling depends on the exact cost of pto.vgather2_bc, treat that cost as target-profile-specific and measure it on the concrete backend rather than inferring a manual constant.
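One way to follow this advice is to time the operation empirically. The sketch below is a generic host-side harness and uses no PTO API: `measure_ns` and `run_once` are hypothetical names, and `run_once` is assumed to be a callable that launches a kernel containing pto.vgather2_bc and blocks until it completes. Host-side timing also includes launch overhead, so it gives an upper bound rather than the instruction latency.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_ns(
    run_once: Callable[[], None],
    warmup: int = 5,
    repeats: int = 50,
) -> Dict[str, float]:
    """Time a blocking callable and report simple statistics in nanoseconds."""
    # Warm-up runs are executed but not recorded.
    for _ in range(warmup):
        run_once()

    samples: List[int] = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        run_once()
        samples.append(time.perf_counter_ns() - start)

    return {
        "median_ns": statistics.median(samples),
        "min_ns": min(samples),
        "samples": len(samples),
    }


# Stand-in workload; replace with a callable that runs the real kernel.
stats = measure_ns(lambda: sum(range(1000)), warmup=2, repeats=10)
print(stats["samples"])  # prints 10
```

Reporting the median alongside the minimum helps separate steady-state cost from one-off outliers when comparing target profiles.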
Examples¶
%result = pto.vgather2_bc %source, %offsets, %mask : !pto.ptr<T, ub>, !pto.vreg<NxI>, !pto.mask<G> -> !pto.vreg<NxT>
Related Ops / Instruction Set Links¶
- Instruction set overview: Vector Load Store
- Previous op in instruction set: pto.vgatherb
- Next op in instruction set: pto.vsts