pto.pstu¶
pto.pstu is part of the Predicate Load Store instruction set.
Summary¶
Stream predicate register to UB with alignment state tracking. High-throughput variant that relaxes alignment requirements at the cost of weaker write atomicity guarantees.
Mechanism¶
pto.pstu writes a predicate word from !pto.mask<G> to a UB address while tracking and updating alignment state. Unlike psts, this operation does not require 64-bit alignment and may batch multiple predicate writes into a single DMA transaction.
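For alignment state align_in, predicate mask, and base address base, the operation stores the predicate at base and returns the updated alignment state %align_out together with %base_out, the base address advanced by the predicate width in bytes.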
The %align_out state carries forward into the next pstu call, enabling streaming writes without per-op synchronization.
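The chaining rule can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not the hardware behavior: `AlignState`, `PstuResult`, `model_pstu`, and `stream_two` are hypothetical names that are not part of the PTO API, the predicate width is assumed to be 8 bytes (one 64-bit word), and the alignment state is assumed to record nothing more than the misalignment of the next address.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model; these names are NOT part of the PTO API.
constexpr std::size_t kPredBytes = 8;  // assumption: one 64-bit predicate word per store

struct AlignState {
    bool initialized = false;  // an uninitialized state must not be passed to pstu
    std::size_t misalign = 0;  // assumption: next address modulo the predicate width
};

struct PstuResult {
    AlignState align_out;
    std::size_t base_out;
};

// Models a single pstu: write the mask at any byte offset, then return the
// updated alignment state and the base advanced by the predicate width.
PstuResult model_pstu(const AlignState& align_in, std::uint64_t mask,
                      std::size_t base_in, std::vector<std::uint8_t>& ub) {
    if (!align_in.initialized) {
        throw std::logic_error("pstu: uninitialized alignment state");
    }
    if (base_in > ub.size() || ub.size() - base_in < kPredBytes) {
        throw std::out_of_range("pstu: store exceeds the modeled UB");
    }
    std::memcpy(ub.data() + base_in, &mask, kPredBytes);  // byte copy, no alignment needed
    const std::size_t base_out = base_in + kPredBytes;
    return {AlignState{true, base_out % kPredBytes}, base_out};
}

// Chains two stores: align_out and base_out of the first feed the second.
std::size_t stream_two(std::vector<std::uint8_t>& ub, std::size_t base0,
                       std::uint64_t mask0, std::uint64_t mask1) {
    AlignState align0{true, base0 % kPredBytes};  // stands in for real initialization
    PstuResult r1 = model_pstu(align0, mask0, base0, ub);
    PstuResult r2 = model_pstu(r1.align_out, mask1, r1.base_out, ub);
    return r2.base_out;
}
```

In this model, passing a default-constructed `AlignState` throws, which mirrors the rule that an uninitialized alignment state is illegal. The model does not attempt to reproduce batching or the weaker atomicity of the real instruction.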
Syntax¶
PTO Assembly Form¶
%align_out, %base_out = pto.pstu %align_in, %mask, %base_in : !pto.align, !pto.mask<G>, !pto.ptr<T, ub> -> !pto.align, !pto.ptr<T, ub>
AS Level 1 (SSA)¶
%align_out, %base_out = pto.pstu %align_in, %mask, %base_in : !pto.align, !pto.mask<G>, !pto.ptr<T, ub> -> !pto.align, !pto.ptr<T, ub>
AS Level 2 (DPS)¶
pto.pstu ins(%align_in, %mask, %base_in : !pto.align, !pto.mask<G>, !pto.ptr<T, ub>)
outs(%align_out, %base_out : !pto.align, !pto.ptr<T, ub>)
C++ Intrinsic¶
vector_align alignData;
vector_bool src;
__ubuf__ uint32_t *base;
pstu(alignData, src, base);
Inputs¶
| Operand | Type | Description |
|---|---|---|
| %align_in | !pto.align | Alignment state from previous pstu or pld-instruction set operation |
| %mask | !pto.mask<G> | Predicate register to stream-store |
| %base_in | !pto.ptr<T, ub> | UB base address (no alignment requirement) |
Expected Outputs¶
| Result | Type | Description |
|---|---|---|
| %align_out | !pto.align | Updated alignment state for next pstu call |
| %base_out | !pto.ptr<T, ub> | Incremented base address (base + predicate width in bytes) |
Side Effects¶
- Writes the predicate register value to UB memory at the target address.
- Updates alignment state for use by subsequent pstu calls.
- UB memory at the target address is modified; write atomicity per 64-bit word is not guaranteed.
Constraints¶
- Alignment state: %align_in MUST be the alignment state from the previous pstu call, or from a pld-instruction set operation. Using an uninitialized alignment state is illegal.
- Alignment state chaining: Programs MUST pass %align_out from one pstu to the %align_in of the next. Breaking the chain without re-initializing the alignment state is illegal.
- Write atomicity: Unlike psts, the 64-bit predicate word is NOT guaranteed to be atomically written. Programs that require exact predicate state restoration MUST use psts, not pstu.
- UB address space: %base_in MUST have address space ub.
Exceptions¶
- Illegal if %align_in is not initialized from a prior pstu or pld operation.
- Illegal if the alignment state chain is broken.
- Illegal if %base_in is not a UB-space pointer.
- pstu MUST NOT be used when exact predicate save/restore is required.
Target-Profile Restrictions¶
| Aspect | CPU Sim | A2/A3 | A5 |
|---|---|---|---|
| Stream predicate store | Not supported | Supported | Supported |
| Alignment state tracking | Not applicable | Supported | Supported |
| Write atomicity guarantee | Not applicable | Not guaranteed | Not guaranteed |
CPU simulator does not implement pstu. Portable programs MUST use psts for exact predicate persistence or provide a CPU-sim fallback.
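The selection rule implied by the table and the atomicity constraint can be written as a small decision helper. This is a sketch, not part of the PTO headers: `Target`, `PredStore`, `supports_pstu`, and `choose_store` are hypothetical names, and the sketch assumes psts is available on every profile, which this page implies but does not state.

```cpp
// Hypothetical helper types; these names are NOT part of the PTO API.
enum class Target { CpuSim, A2A3, A5 };
enum class PredStore { Psts, Pstu };

// Mirrors the "Stream predicate store" row of the table above.
constexpr bool supports_pstu(Target t) {
    return t != Target::CpuSim;
}

// Falls back to psts when the target lacks pstu or when the caller needs
// exact predicate save/restore; otherwise picks the streaming store.
constexpr PredStore choose_store(Target t, bool needs_exact_restore) {
    return (needs_exact_restore || !supports_pstu(t)) ? PredStore::Psts
                                                      : PredStore::Pstu;
}

// Compile-time checks of the two fallback cases.
static_assert(choose_store(Target::CpuSim, false) == PredStore::Psts,
              "CPU sim has no pstu");
static_assert(choose_store(Target::A5, true) == PredStore::Psts,
              "exact restore requires psts");
```

Keeping the decision in a `constexpr` function lets a build select the store variant at compile time, so no pstu call is emitted for the CPU simulator.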
Examples¶
Streaming predicate writes¶
#include <pto/pto-inst.hpp>
using namespace pto;
void stream_masks(Ptr<ub_space_t, ub_t> dst_base,
                  predicate_t* masks,
                  int count) {
    predicate_t align_state = 0;
    for (int i = 0; i < count; ++i) {
        PSTU(masks[i], dst_base, align_state, align_state);
        dst_base = dst_base + (predicate_width_bytes);
    }
}
SSA form — chaining stream stores¶
// Initialize alignment state (e.g., from a dummy load or zero)
%align0 = pto.plds %ub_dummy : !pto.ptr<i64, ub> -> !pto.mask<G>
// Stream store first predicate; align_out carries forward
%align1, %base1 = pto.pstu %align0, %mask0, %base0 : !pto.align, !pto.mask<G>, !pto.ptr<i64, ub> -> !pto.align, !pto.ptr<i64, ub>
// Stream store second predicate using updated alignment state
%align2, %base2 = pto.pstu %align1, %mask1, %base1 : !pto.align, !pto.mask<G>, !pto.ptr<i64, ub> -> !pto.align, !pto.ptr<i64, ub>
Note
For exact predicate save/restore across kernel boundaries, use psts instead. pstu is intended for high-throughput streaming scenarios where some loss of per-word atomicity is acceptable.
Related Ops / Instruction Set Links¶
- Instruction set overview: Predicate Load Store
- Previous op in instruction set: pto.psti
- Next op in instruction set: (none — last in instruction set)
- Control-shell overview: Control and configuration