pto.vcgmin

pto.vcgmin belongs to the Reduction Instructions group.

Summary

Per-VLane-group minimum reduction.

Mechanism

The instruction reduces each hardware 32-byte VLane group independently, so a group holds 32 / sizeof(T) lanes (8 lanes for f32, 16 for f16). Within each group, it finds the minimum of the lanes enabled by the mask and writes the result to the group's low lane; all other lanes of the group are zero-filled.
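The mechanism can be modelled with a scalar loop. The sketch below is a hypothetical C model for f32 elements only, not a reference implementation; it assumes the mask is represented as one bool per lane, that the register splits into whole 32-byte groups, and that +INFINITY is the starting value (so a group with no active lanes would yield +INFINITY in this model, a case the manual does not specify).

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical scalar model of pto.vcgmin for f32 elements.
 * Assumptions: one bool per lane in the mask, and +INFINITY as the starting
 * value, so a group with no active lanes yields +INFINITY in this model. */
#define VLANE_BYTES 32
#define F32_LANES_PER_GROUP (VLANE_BYTES / (int)sizeof(float)) /* 8 lanes */

static void vcgmin_f32_model(float *dst, const float *src, const bool *mask, int n) {
    /* Assumed precondition: the register splits into whole 32-byte groups. */
    assert(n % F32_LANES_PER_GROUP == 0);
    for (int base = 0; base < n; base += F32_LANES_PER_GROUP) {
        float mn = INFINITY;
        for (int i = 0; i < F32_LANES_PER_GROUP; i++) {
            /* Inactive lanes never take part in the comparison. */
            if (mask[base + i] && src[base + i] < mn)
                mn = src[base + i];
            dst[base + i] = 0.0f; /* zero-fill every lane of the group */
        }
        dst[base] = mn; /* the group minimum lands in its low lane */
    }
}
```

For a 64 x f32 register, calling `vcgmin_f32_model(dst, src, mask, 64)` leaves the eight group minima in `dst[0]`, `dst[8]`, ..., `dst[56]`. Each `src` lane is read before the matching `dst` lane is zeroed, so the model also works in place (`dst == src`).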

Syntax

PTO Assembly Form

vcgmin %dst, %src, %mask : !pto.vreg<NxT>

AS Level 1 (SSA)

%result = pto.vcgmin %input, %mask : !pto.vreg<NxT>, !pto.mask<G> -> !pto.vreg<NxT>

Inputs

Operand   Type              Description
%input    !pto.vreg<NxT>    Source vector register to reduce per VLane group
%mask     !pto.mask<G>      Predicate mask; inactive lanes do not participate

Expected Outputs

Result    Type              Description
%result   !pto.vreg<NxT>    One minimum per 32-byte VLane group, written to the low lane of each group; all other lanes are zero

Side Effects

This operation has no architectural side effect beyond producing its destination values. It does not implicitly reserve buffers, signal events, or establish memory fences.

Constraints

  • Grouping is by the hardware 32-byte VLane, not by an arbitrary software subvector.
  • The mask width MUST match N.
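Because grouping follows the 32-byte hardware VLane, the group layout depends only on the element size. The helper below is a hypothetical sketch (the names `vcg_layout`, `vcg_layout_for`, and `vcg_result_lane` are illustrative, not part of PTO) that derives the lanes per group, the group count for an N-element register, and the lane that receives each group's minimum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: derives the vcgmin group layout from the element size.
 * Grouping follows the hardware 32-byte VLane, so lanes per group depends
 * only on sizeof(T): 16 for i16/f16, 8 for i32/f32. */
#define VLANE_BYTES 32

typedef struct {
    int lanes_per_group; /* elements in one 32-byte VLane group */
    int groups;          /* number of groups in an N-element register */
} vcg_layout;

static vcg_layout vcg_layout_for(int n, size_t elem_bytes) {
    vcg_layout l;
    l.lanes_per_group = (int)(VLANE_BYTES / elem_bytes);
    l.groups = n / l.lanes_per_group;
    return l;
}

/* Index of the lane that receives group g's minimum (the group's low lane). */
static int vcg_result_lane(vcg_layout l, int g) {
    assert(g >= 0 && g < l.groups);
    return g * l.lanes_per_group;
}
```

For the `!pto.vreg<64xf32>` example, `vcg_layout_for(64, sizeof(float))` gives 8 groups of 8 lanes, and group 7's minimum lands in lane 56.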

Exceptions

  • The verifier rejects illegal operand shapes, unsupported element types, and attribute combinations that are not valid for the selected instruction set or target profile.
  • Any additional illegality stated in the constraints section is also part of the contract.

Target-Profile Restrictions

  • Documented A5 coverage: i16-i32, f16, f32.
  • A5 is the most detailed concrete profile in the current manual; CPU simulation and A2/A3-class targets may support narrower subsets or emulate the behavior while preserving the visible PTO contract.

Examples

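// Reference semantics. LANES_PER_GROUP = 32 / sizeof(T); GROUPS = N / LANES_PER_GROUP.
for (int g = 0; g < GROUPS; g++) {
    int base = g * LANES_PER_GROUP;
    T mn = INF;                       // +INF; the type's maximum for integer T
    for (int i = 0; i < LANES_PER_GROUP; i++) {
        if (mask[base + i] && src[base + i] < mn)
            mn = src[base + i];
        dst[base + i] = 0;            // lanes other than the low lane are zero-filled
    }
    dst[base] = mn;                   // group minimum goes to the low lane
}

// 64 x f32 = 256 bytes: eight 32-byte groups of 8 lanes each.
%result = pto.vcgmin %input, %mask : !pto.vreg<64xf32>, !pto.mask<b32> -> !pto.vreg<64xf32>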
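When a single minimum over the whole register is needed, the per-group results still have to be combined. One option, sketched below as an assumption rather than a documented PTO sequence, is a scalar pass over the low lanes only; the hypothetical function `min_of_group_results` assumes f32 elements (8 lanes per group) and reads a result laid out as described above.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical follow-up step: fold the per-group minima that a vcgmin-style
 * result leaves in each group's low lane into one scalar minimum.
 * Assumes f32 elements, so each 32-byte group holds 8 lanes. */
#define F32_LANES_PER_GROUP 8

static float min_of_group_results(const float *result, int n) {
    assert(n % F32_LANES_PER_GROUP == 0);
    float mn = INFINITY;
    for (int base = 0; base < n; base += F32_LANES_PER_GROUP) {
        /* Only the low lane carries data; the other lanes are zero-filled
         * and must be skipped, or a zero would beat positive minima. */
        if (result[base] < mn)
            mn = result[base];
    }
    return mn;
}
```

Stepping by the group size, rather than scanning every lane, is what keeps the zero-filled lanes out of the final result.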