pto.tlog

pto.tlog is part of the Elementwise Tile-Tile instruction set.

Summary

Elementwise natural logarithm of a tile.

Mechanism

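pto.tlog reads each element of the source tile, takes its natural logarithm, and writes the result to the same position in the destination tile.

For each element (i, j) in the destination tile's valid region: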

\[ \mathrm{dst}_{i,j} = \log(\mathrm{src}_{i,j}) \]
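The per-element rule above can be mirrored on the host with a plain scalar loop, which may be useful as a reference when checking results. This is a minimal sketch, not the PTO API: `HostTile` and `reference_tlog` are hypothetical names introduced for illustration, `std::log` stands in for the device logarithm, and leaving elements outside the valid region untouched is an assumption of the sketch rather than documented behavior.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a tile: row-major storage plus a valid region.
struct HostTile {
    std::size_t rows, cols;              // allocated shape
    std::size_t valid_rows, valid_cols;  // valid region, must fit in the allocated shape
    std::vector<float> data;             // rows * cols elements, row-major

    HostTile(std::size_t r, std::size_t c, std::size_t vr, std::size_t vc, float fill = 0.0f)
        : rows(r), cols(c), valid_rows(vr), valid_cols(vc), data(r * c, fill) {
        assert(vr <= r && vc <= c);
    }

    float &at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    const float &at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Reference model: dst(i, j) = log(src(i, j)) for every (i, j) in dst's valid region.
// Assumption: elements outside the valid region are not written.
inline void reference_tlog(HostTile &dst, const HostTile &src) {
    assert(src.valid_rows == dst.valid_rows && src.valid_cols == dst.valid_cols);
    for (std::size_t i = 0; i < dst.valid_rows; ++i) {
        for (std::size_t j = 0; j < dst.valid_cols; ++j) {
            dst.at(i, j) = std::log(src.at(i, j));
        }
    }
}
```

A device result can then be compared element by element against `reference_tlog` within a tolerance appropriate to the element type and the selected algorithm.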

Syntax

Textual spelling is defined by the PTO ISA syntax-and-operands pages.

Synchronous form:

%dst = tlog %src : !pto.tile<...>

AS Level 1 (SSA)

%dst = pto.tlog %src : !pto.tile<...> -> !pto.tile<...>

AS Level 2 (DPS)

pto.tlog ins(%src : !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)

C++ Intrinsic

Declared in include/pto/common/pto_instr.hpp:

template <auto PrecisionType = LogAlgorithm::DEFAULT, typename TileDataDst, typename TileDataSrc,
          typename... WaitEvents>
PTO_INST RecordEvent TLOG(TileDataDst &dst, TileDataSrc &src, WaitEvents &... events);

PrecisionType accepts the following values:

  • LogAlgorithm::DEFAULT: Default algorithm; faster, but less precise.
  • LogAlgorithm::HIGH_PRECISION: High-precision algorithm; more precise, but slower.

Inputs

Operand        Role                      Description
%src           Source tile               Source tile; read at (i, j) for each (i, j) in dst valid region
%dst           Destination tile          Destination tile receiving the result
WaitEvents...  Optional synchronisation  RecordEvent tokens to wait on before issuing the operation

Expected Outputs

Result  Type            Description
%dst    !pto.tile<...>  Destination tile; all (i, j) in its valid region contain log(src[i,j]) after the operation

Side Effects

No architectural side effects beyond producing the destination tile. Does not implicitly fence unrelated traffic.

Constraints

  • Valid region:

    • The op uses dst.GetValidRow() / dst.GetValidCol() as the iteration domain.
  • Domain / NaN:

    • Results for inputs outside the logarithm's domain (e.g., log(<=0)) are target-defined.
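Because out-of-domain results are target-defined, portable code may want to confirm that every input in the valid region is strictly positive before issuing the operation. The following is a host-side sketch under stated assumptions: `all_in_log_domain` is a hypothetical helper, not part of the PTO API, and it assumes the tile contents are available as a row-major `std::vector<float>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side pre-check, not part of the PTO API.
// `tile` is row-major with `cols` elements per row; only the valid region is inspected.
inline bool all_in_log_domain(const std::vector<float> &tile, std::size_t cols,
                              std::size_t valid_rows, std::size_t valid_cols) {
    assert(valid_cols <= cols);
    assert(valid_rows * cols <= tile.size());
    for (std::size_t i = 0; i < valid_rows; ++i) {
        for (std::size_t j = 0; j < valid_cols; ++j) {
            // `x > 0` is false for zero, negative values, and NaN alike.
            if (!(tile[i * cols + j] > 0.0f)) {
                return false;
            }
        }
    }
    return true;
}
```

Elements outside the valid region are deliberately ignored, since the operation does not iterate over them.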

Exceptions

  • Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
  • Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.

Target-Profile Restrictions

  • Implementation checks (NPU):

    • TileData::DType must be one of: float or half;
    • Tile location must be vector (TileData::Loc == TileType::Vec);
    • Static valid bounds: TileData::ValidRow <= TileData::Rows and TileData::ValidCol <= TileData::Cols;
    • Runtime: src.GetValidRow() == dst.GetValidRow() and src.GetValidCol() == dst.GetValidCol();
    • Tile layout must be row-major (TileData::isRowMajor).
  • High precision algorithm:

    • Only available on A5. PrecisionType is ignored on A3.
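The NPU implementation checks listed above can be pictured as a set of compile-time predicates plus one runtime comparison. The sketch below only illustrates that split; `SketchLoc`, `SketchHalf`, `SketchTile`, `tile_static_ok`, and `tlog_runtime_ok` are hypothetical stand-ins and do not reproduce the PTO headers. It also assumes the element-type check applies to each tile independently, which the list does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for illustration only; they are not the PTO types.
enum class SketchLoc { Vec, Other };
struct SketchHalf { unsigned short bits; };  // placeholder for a 16-bit float

template <typename T, SketchLoc L, std::size_t R, std::size_t C,
          std::size_t VR = R, std::size_t VC = C, bool RowMajor = true>
struct SketchTile {
    using DType = T;
    static constexpr SketchLoc Loc = L;
    static constexpr std::size_t Rows = R;
    static constexpr std::size_t Cols = C;
    static constexpr std::size_t ValidRow = VR;
    static constexpr std::size_t ValidCol = VC;
    static constexpr bool isRowMajor = RowMajor;
    // Runtime valid region; defaults to the static one.
    std::size_t valid_row = VR;
    std::size_t valid_col = VC;
};

// Static checks for one tile type: element type, location, bounds, layout.
template <typename Tile>
constexpr bool tile_static_ok() {
    return (std::is_same<typename Tile::DType, float>::value ||
            std::is_same<typename Tile::DType, SketchHalf>::value) &&
           Tile::Loc == SketchLoc::Vec &&
           Tile::ValidRow <= Tile::Rows && Tile::ValidCol <= Tile::Cols &&
           Tile::isRowMajor;
}

// Runtime check: source and destination valid regions must match.
template <typename Dst, typename Src>
bool tlog_runtime_ok(const Dst &dst, const Src &src) {
    static_assert(tile_static_ok<Dst>(), "destination tile fails the static checks");
    static_assert(tile_static_ok<Src>(), "source tile fails the static checks");
    assert(dst.valid_row <= Dst::Rows && dst.valid_col <= Dst::Cols);
    return src.valid_row == dst.valid_row && src.valid_col == dst.valid_col;
}
```

Keeping the static predicates separate from the runtime comparison mirrors the list: type, location, bounds, and layout are known at compile time, while the valid-region match depends on values set at run time.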

Performance

A2/A3 Throughput

TLOG compiles to CCE vector instructions. The TUnaryOp.hpp performance model gives the following figures:

Metric                 Value
Startup latency        13
Completion latency     26 (FP transcendental)
Per-repeat throughput  1
Pipeline interval      18

Examples

#include <pto/pto-inst.hpp>

using namespace pto;

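void example() {
  // 16x16 float tile located in the vector unit (TileType::Vec).
  using TileT = Tile<TileType::Vec, float, 16, 16>;
  TileT x, out;
  TLOG(out, x);                                // default algorithm
  TLOG<LogAlgorithm::HIGH_PRECISION>(out, x);  // high-precision algorithm; A5 only
}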

Auto Mode

# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tlog %src : !pto.tile<...> -> !pto.tile<...>

Manual Mode

# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.tlog %src : !pto.tile<...> -> !pto.tile<...>

PTO Assembly Form

%dst = tlog %src : !pto.tile<...>
# AS Level 2 (DPS)
pto.tlog ins(%src : !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)