pto.set_cross_core¶
pto.set_cross_core is part of the Pipeline Sync instruction set.
Summary¶
Signal an event to another core in a cluster (A2A3). Uses mode2 broadcast semantics: one signal reaches both vector subblocks simultaneously; the Cube blocks until both subblocks have signaled back.
Mechanism¶
pto.set_cross_core signals an event between execution units in a Core Cluster on the A2A3 platform.
Mode2 semantics (A2A3 cluster, 1 Cube : 2 Vector subblocks):
- C→V broadcast: One
set_cross_corefrom the Cube (AIC) atomically increments the semaphore for both AIV0 and AIV1 subblocks simultaneously. - V→C reduce: When the Cube calls
wait_flag_dev, it blocks until both AIV0 and AIV1 have calledset_cross_coreon the same semaphore. Only then does the Cube unblock.
This is a hardware reduce operation: the Cube need only issue one wait_flag_dev to synchronize with both subblocks, rather than one per subblock.
The semaphore is a counter: incremented by set_cross_core, decremented by wait_flag_dev. A wait_flag_dev unblocks when the counter reaches zero.
Syntax¶
PTO Assembly Form¶
pto.set_cross_core %core_id, %event_id : i64, i64
AS Level 1 (SSA)¶
pto.set_cross_core %core_id, %event_id : i64, i64
C++ Intrinsic¶
The installed 3510 public CCE headers do not expose a same-name set_cross_core(...) intrinsic. The shipped sync implementation uses the internal cross-core helper shown below.
ffts_cross_core_sync(PIPE_MTE3, AscendC::GetffstMsg(0x02, AscendC::SYNC_AIC_AIV_FLAG));
Inputs¶
| Operand | Type | Description |
|---|---|---|
%core_id |
i64 |
Target core identifier (subblock selector: 0 = AIV0, 1 = AIV1) |
%event_id |
i64 |
Semaphore/event identifier |
Expected Outputs¶
None. This form is defined by its side effect on inter-core synchronization state.
Side Effects¶
- Atomically signals the named event to the target core.
- On mode2: increments semaphore for both subblocks simultaneously.
Constraints¶
Constraints
- A2A3 only:
set_cross_coreis only available on the A2A3 profile. Programs that use this operation MUST provide a fallback path for other profiles. - Semaphore pool: The pool has 16 physical semaphore IDs per cluster. The hardware implements a 4-bit counter (0–15).
set_cross_coreincrements the counter;wait_flag_devdecrements it. On A2/A3: if the counter would overflow past 15, the hardware wraps the counter and signals an overflow error. On CPU simulator: not applicable. - Broadcast vs. per-subblock: The broadcast behavior is specific to mode2. Other modes (if supported) may have different semantics.
- core_id meaning:
core_id = 0targets AIV0 subblock;core_id = 1targets AIV1 subblock. Other values are illegal.
Exceptions¶
Exceptions
- Illegal on non-A2A3 profiles.
- Illegal if
%event_idis outside the valid range (0–15). - Illegal if the hardware counter would overflow.
Target-Profile Restrictions¶
Target-Profile Restrictions
| Aspect | CPU Sim | A2/A3 | A5 |
|---|---|---|---|
set_cross_core |
Not available | Supported (mode2) | Use set_intra_block |
| Mode2 broadcast semantics | Not applicable | Supported | Not applicable |
| Semaphore pool size | Not applicable | 16 IDs | Not applicable |
| Per-subblock signaling | Not applicable | 1 set reaches both | 1 set per subblock |
CPU simulator does not implement set_cross_core. Portable programs MUST guard this operation with profile checks or provide CPU-sim fallback.
Examples¶
A2A3: Cube broadcasts to both vector subblocks¶
#include <pto/pto-inst.hpp>
using namespace pto;
// Cube signals both AIV subblocks simultaneously
SET_CROSS_CORE(/* core_id */ 0, /* event_id */ 0); // broadcast to both AIV0 and AIV1
SSA form — Cube→Vector broadcast¶
// Cube: broadcast completion signal to both AIV0 and AIV1
pto.set_cross_core %c0_i64, %c0_i64 : i64, i64
// Both AIV subblocks receive the signal (atomic broadcast)
// AIV0: signals back when its work is done
pto.set_cross_core %c0_i64, %c1_i64 : i64, i64 // signals event 1
// AIV1: signals back when its work is done
pto.set_cross_core %c1_i64, %c1_i64 : i64, i64 // signals event 1
// Cube: waits for BOTH AIV0 and AIV1 (reduce)
pto.wait_flag_dev %c0_i64, %c1_i64 : i64, i64
// Unblocks only when both subblocks have signaled
SSA form — Vector→Cube reduce¶
// AIV0: signal Cube that vector work segment is done
pto.set_cross_core %c0_i64, %c2_i64 : i64, i64
// AIV1: signal Cube that vector work segment is done
pto.set_cross_core %c1_i64, %c2_i64 : i64, i64
// Cube: waits for both subblocks on one semaphore (reduce)
pto.wait_flag_dev %c2_i64 : i64
Related Ops / Instruction Set Links¶
- Instruction set overview: Pipeline Sync
- Previous op in instruction set: pto.mem_bar
- Next op in instruction set: pto.wait_flag_dev
- Control-shell overview: Control and configuration