pto.trowexpand

pto.trowexpand is part of the Reduce And Expand instruction set.

Summary

Broadcast the first element of each source row across the destination row.

Mechanism

Each destination row is filled with one value: the element in column 0 of the matching source row. No other source column affects the result.

Let R = dst.GetValidRow() and C = dst.GetValidCol(). For 0 <= i < R and 0 <= j < C:

\[ \mathrm{dst}_{i,j} = \mathrm{src}_{i,0} \]
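The formula can be read as a simple double loop. The sketch below is a host-side reference model only, not the PTO implementation. `trowexpand_ref` and its stride-based row-major tile modeling are illustrative assumptions. `validRow` and `validCol` play the roles of R and C.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference model of dst[i][j] = src[i][0]; not the PTO
// implementation. Each tile is modeled as a row-major std::vector<T> whose rows
// are `stride` elements apart. validRow/validCol stand for R and C (dst's valid region).
template <typename T>
void trowexpand_ref(std::vector<T>& dst, std::size_t dstStride,
                    const std::vector<T>& src, std::size_t srcStride,
                    std::size_t validRow, std::size_t validCol) {
  assert(validCol <= dstStride);  // a row's valid columns must fit within its stride
  for (std::size_t i = 0; i < validRow; ++i) {
    const T v = src[i * srcStride];  // src_{i,0}: only column 0 of the source row is read
    for (std::size_t j = 0; j < validCol; ++j) {
      dst[i * dstStride + j] = v;    // dst_{i,j} = src_{i,0}
    }
  }
}
```

In this model, elements of `dst` outside the R × C region are left untouched.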

Syntax

Textual spelling is defined by the PTO ISA syntax-and-operands pages.

Synchronous form:

%dst = trowexpand %src : !pto.tile<...> -> !pto.tile<...>

AS Level 1 (SSA)

%dst = pto.trowexpand %src : !pto.tile<...> -> !pto.tile<...>

AS Level 2 (DPS)

pto.trowexpand ins(%src : !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)

C++ Intrinsic

Declared in include/pto/common/pto_instr.hpp:

template <typename TileDataDst, typename TileDataSrc, typename... WaitEvents>
PTO_INST RecordEvent TROWEXPAND(TileDataDst &dst, TileDataSrc &src, WaitEvents &... events);

Inputs

  • src is the source tile.
  • dst is the destination tile. Its valid region (R × C) determines which elements are written.

Expected Outputs

dst holds the row-wise broadcast: each row i of dst is filled with src[i,0].
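As an illustration with made-up values, take R = 2 and C = 3 with a source whose valid region is also 2 × 3. Only the first column of src reaches dst:

\[ \mathrm{src} = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \;\Longrightarrow\; \mathrm{dst} = \begin{pmatrix} 1 & 1 & 1 \\ 4 & 4 & 4 \end{pmatrix} \]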

Side Effects

No architectural side effects beyond producing the destination tile. The operation does not implicitly fence unrelated traffic.

Constraints


  • Tile Type: dst and src must be TileType::Vec.

  • Tile layout: both src and dst must use the ND fractal layout (isRowMajor with SLayout::NoneBox).

Exceptions


  • Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
  • Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.

Target-Profile Restrictions


Implementation Checks (NPU)

  • Data type: on A2A3 and A5, the element type must be one of int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, half, bfloat16_t, or float.

  • Runtime valid-region checks:

    • A2A3: returns early if any of dstValidRow, dstValidCol, srcValidRow, srcValidCol is zero.
    • A5: asserts that srcValidRow == dstValidRow, and that srcValidRow != 0 and srcValidCol != 0.
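The two targets handle empty or mismatched valid regions differently. A2A3 skips the work silently, while A5 traps on an assertion. The sketch below models only the checks listed above as plain host functions. `ValidRegion`, `a2a3_should_run`, and `a5_check` are hypothetical names, not PTO APIs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a tile's valid region (not a PTO type).
struct ValidRegion {
  std::size_t rows;
  std::size_t cols;
};

// A2A3 model: the expansion runs only if no valid extent is zero;
// otherwise the instruction returns early.
bool a2a3_should_run(ValidRegion dst, ValidRegion src) {
  return dst.rows != 0 && dst.cols != 0 && src.rows != 0 && src.cols != 0;
}

// A5 model: violations trip an assertion instead of returning early.
// dst's column count is not checked here, matching the listed A5 checks.
void a5_check(ValidRegion dst, ValidRegion src) {
  assert(src.rows == dst.rows);
  assert(src.rows != 0 && src.cols != 0);
}
```

Because `assert` compiles away under `NDEBUG`, this host model only mirrors the A5 checks in debug builds.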

Examples

Auto

#include <pto/pto-inst.hpp>

using namespace pto;

void example_auto() {
  using SrcT = Tile<TileType::Vec, float, 16, 16>;
  using DstT = Tile<TileType::Vec, float, 16, 16>;
  SrcT src;
  DstT dst;
  TROWEXPAND(dst, src);
}

Manual

#include <pto/pto-inst.hpp>

using namespace pto;

void example_manual() {
  using SrcT = Tile<TileType::Vec, float, 16, 16>;
  using DstT = Tile<TileType::Vec, float, 16, 16>;
  SrcT src;
  DstT dst;
  TASSIGN(src, 0x1000);
  TASSIGN(dst, 0x2000);
  TROWEXPAND(dst, src);
}

Auto Mode

# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.trowexpand %src : !pto.tile<...> -> !pto.tile<...>

Manual Mode

# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.trowexpand %src : !pto.tile<...> -> !pto.tile<...>

PTO Assembly Form

%dst = trowexpand %src : !pto.tile<...> -> !pto.tile<...>
# AS Level 2 (DPS)
pto.trowexpand ins(%src : !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)