pto.ppack¶
pto.ppack is part of the Predicate Generation And Algebra instruction set.
Summary¶
Narrowing pack: concatenate two N-bit predicate segments into one 2N-bit predicate register, selecting one segment by a partition token.
Mechanism¶
pto.ppack takes a source predicate register and a partition token, and writes a 2N-bit predicate register by filling the selected half with the source bits and zero-filling the other half. It is the inverse of punpack.
For source predicate src with N bits and partition token P:
\[ \mathrm{dst}_{2N} = \begin{cases} \mathrm{ZERO}(N) \Vert \mathrm{src}_N & \text{if } P = \text{LOWER} \\ \mathrm{src}_N \Vert \mathrm{ZERO}(N) & \text{if } P = \text{HIGHER} \end{cases} \]
Syntax¶
PTO Assembly Form¶
%dst = pto.ppack %src, "PART" : !pto.mask<G> -> !pto.mask<G>
AS Level 1 (SSA)¶
%dst = pto.ppack %src, "PART" : !pto.mask<G> -> !pto.mask<G>
AS Level 2 (DPS)¶
pto.ppack ins(%src, "PART" : !pto.mask<G>) outs(%dst : !pto.mask<G>)
C++ Intrinsic¶
vector_bool dst;
vector_bool src;
ppack(dst, src, __cce_simd::LOWER);
Inputs¶
| Operand | Type | Description |
|---|---|---|
%src |
!pto.mask<G> |
Source N-bit predicate |
"PART" |
string attribute | Partition token: "LOWER" or "HIGHER" |
Expected Outputs¶
| Result | Type | Description |
|---|---|---|
%dst |
!pto.mask<G> |
2N-bit predicate with the source in the selected half |
Side Effects¶
None.
Constraints¶
Constraints
- Partition token: MUST be
"LOWER"or"HIGHER". Other tokens are illegal. - Destination width: The destination predicate is always 2N bits. Programs MUST ensure the destination context expects a 2N-bit predicate. Attempting to use a 2N-bit result in an N-bit context without explicit extraction via
punpackis illegal. - Source width: The source predicate MUST be N bits (half the destination width). Mismatched widths are illegal.
- Zero-fill behavior: The non-selected half of the destination is always zero-filled, not sign-extended or replicated.
Exceptions¶
Exceptions
- Illegal if the partition token is not
"LOWER"or"HIGHER". - Illegal if source and destination predicate widths are not in a 1:2 ratio.
- Illegal if the operation is used in a context that does not expect a 2N-bit result.
Target-Profile Restrictions¶
Target-Profile Restrictions
| Aspect | CPU Sim | A2/A3 | A5 |
|---|---|---|---|
| Pack operation | Simulated | Supported | Supported |
LOWER / HIGHER tokens |
Supported | Supported | Supported |
Examples¶
Combine two b32 predicates for f32 (64 lanes)¶
#include <pto/pto-inst.hpp>
using namespace pto;
void pack_for_f32(RegBuf<predicate_t>& dst,
const RegBuf<predicate_t>& lo,
const RegBuf<predicate_t>& hi) {
// dst = [ZERO(32) | lo] = hi concatenated with zero
PPACK(dst, lo, "LOWER");
}
SSA form¶
// %rem = 47
// %lo: lanes 0-31 active (from plt_b32 iteration 1)
// %hi: lanes 0-14 active (from plt_b32 iteration 2, rem = 15)
// Pack %lo into lower half of 64-bit predicate
%full_lo = pto.ppack %lo, "LOWER" : !pto.mask<G> -> !pto.mask<G>
// Pack %hi into upper half of 64-bit predicate
%full_hi = pto.ppack %hi, "HIGHER" : !pto.mask<G> -> !pto.mask<G>
// OR them together to get full 64-lane tail mask
%tail = pto.por %full_lo, %full_hi, %full_lo : !pto.mask<G>, !pto.mask<G>, !pto.mask<G> -> !pto.mask<G>
Construct a full-width mask from two half-width masks¶
// Pack lower half
%dst_lower = pto.ppack %src_lower, "LOWER" : !pto.mask<G> -> !pto.mask<G>
// Pack upper half
%dst_upper = pto.ppack %src_upper, "HIGHER" : !pto.mask<G> -> !pto.mask<G>
// Combine with OR to get full-width predicate
%combined = pto.por %dst_lower, %dst_upper, %dst_lower : !pto.mask<G>, !pto.mask<G>, !pto.mask<G> -> !pto.mask<G>
Related Ops / Instruction Set Links¶
- Instruction set overview: Predicate Generation And Algebra
- Previous op in instruction set: pto.plt_b32
- Next op in instruction set: pto.punpack
- Control-shell overview: Control and configuration