Tile-Scalar And Immediate Instruction Set¶
Tile-scalar operations combine a tile operand with a scalar value or immediate operand. The scalar is broadcast to match the tile shape. Comparison variants produce a predicate tile.
Operations¶
| Operation | Description | Category | C++ Intrinsic |
|---|---|---|---|
| pto.tadds | Elementwise addition with scalar | Binary | TADDS(dst, src, scalar) |
| pto.taxpy | AXPY-style fused tile-scalar update | Fused binary | TAXPY(dst, src0, scalar, src1) |
| pto.tsubs | Elementwise subtraction with scalar | Binary | TSUBS(dst, src, scalar) |
| pto.tmuls | Elementwise multiplication with scalar | Binary | TMULS(dst, src, scalar) |
| pto.tpows | Elementwise power with scalar exponent | Binary | TPOWS(dst, base, exp, tmp) |
| pto.tdivs | Elementwise division with scalar | Binary | TDIVS(dst, src, scalar) |
| pto.tfmods | Elementwise modulo with scalar | Binary | TFMODS(dst, src, scalar) |
| pto.trems | Elementwise remainder with scalar | Binary | TREMS(dst, src, scalar) |
| pto.tmins | Elementwise minimum with scalar | Binary | TMINS(dst, src, scalar) |
| pto.tmaxs | Elementwise maximum with scalar | Binary | TMAXS(dst, src, scalar) |
| pto.tands | Elementwise AND with scalar | Binary | TANDS(dst, src, scalar) |
| pto.tors | Elementwise OR with scalar | Binary | TORS(dst, src, scalar) |
| pto.txors | Elementwise XOR with scalar | Binary | TXORS(dst, src, scalar) |
| pto.tshls | Shift left by scalar | Binary | TSHLS(dst, src, shift) |
| pto.tshrs | Shift right by scalar | Binary | TSHRS(dst, src, shift) |
| pto.tlrelu | Leaky ReLU with scalar slope | Binary | TLRELU(dst, src, slope) |
| pto.taddsc | Saturating add with scalar | Binary | TADDSC(dst, src, scalar) |
| pto.tsubsc | Saturating subtract with scalar | Binary | TSUBSC(dst, src, scalar) |
| pto.texpands | Compare tile with scalar, produce predicate | Comparison | TEXPMDS(dst, src, scalar) |
| pto.tcmps | Compare tile with scalar, produce predicate | Comparison | TCMPS(dst, src, scalar, cmp) |
| pto.tsels | Select from two tiles based on scalar predicate | Selection | TSELS(dst, src0, src1, pred) |
Mechanism¶
For each lane (r, c) in the destination's valid region:
The scalar operand is broadcast to all lanes. Comparison operations produce a predicate tile: lane (r, c) is 1 where the condition holds, 0 otherwise.
TAXPY belongs in this family because its defining contract is a tile payload update using a scalar coefficient:
Scalar Operand¶
The scalar operand may be:
- A scalar register value (
!pto.scalar<T>) - A compile-time immediate constant
- A runtime scalar value passed as a parameter
The scalar type must be compatible with the tile element type. No implicit type conversion is performed.
Saturating Variants¶
TADDSC and TSUBSC perform saturating arithmetic (clamp to type min/max on overflow/underflow), in contrast to TADDS/TSUBS which use wrapping semantics.
Type Support by Target Profile¶
| Element Type | CPU Simulator | A2/A3 | A5 |
|---|---|---|---|
| f32 (float) | Yes | Yes | Yes |
| f16 (half) | Yes | Yes | Yes |
| bf16 (bfloat16_t) | Yes | Yes | Yes |
| i8 / u8 | Yes | Yes | Yes |
| i16 / u16 | Yes | Yes | Yes |
| i32 / u32 | Yes | Yes | Yes |
| i64 / u64 | Yes | Yes | Yes |
Constraints¶
Constraints
- The scalar type MUST be compatible with the tile element type.
- Shift operations (
TSHLS,TSHRS) interpret the scalar as an unsigned integer shift count. - Saturating variants (
TADDSC,TSUBSC) clamp results to type min/max on overflow/underflow. - Comparison variants produce a predicate tile, not a numeric tile.
Cases That Are Not Allowed¶
Cases That Are Not Allowed
- MUST NOT use a scalar type that is not compatible with the tile element type.
- MUST NOT use a shift count
>=element bit-width. - MUST NOT rely on implicit type promotion between scalar and tile types.
C++ Intrinsic¶
#include <pto/pto-inst.hpp>
using namespace pto;
// Arithmetic with scalar
template <typename TileDst, typename TileSrc, typename ScalarT>
PTO_INST RecordEvent TADDS(TileDst& dst, TileSrc& src, ScalarT scalar);
template <typename TileDst, typename TileSrc, typename ScalarT>
PTO_INST RecordEvent TMULS(TileDst& dst, TileSrc& src, ScalarT scalar);
template <typename TileDst, typename TileSrc, typename ScalarT>
PTO_INST RecordEvent TMAXS(TileDst& dst, TileSrc& src, ScalarT scalar);
// Saturating arithmetic with scalar
template <typename TileDst, typename TileSrc, typename ScalarT>
PTO_INST RecordEvent TADDSC(TileDst& dst, TileSrc& src, ScalarT scalar);
// Shift by scalar (shift is unsigned integer)
template <typename TileDst, typename TileSrc>
PTO_INST RecordEvent TSHLS(TileDst& dst, TileSrc& src, uint32_t shift);
// Leaky ReLU: dst[i,j] = (src[i,j] > 0) ? src[i,j] : slope * src[i,j]
template <typename TileDst, typename TileSrc, typename ScalarT>
PTO_INST RecordEvent TLRELU(TileDst& dst, TileSrc& src, ScalarT slope);
// Comparison with scalar (produces predicate tile)
template <typename TileDst, typename TileSrc, typename ScalarT>
PTO_INST RecordEvent TCMPS(TileDst& dst, TileSrc& src, ScalarT scalar, CompareMode cmp);
See Also¶
- Tile instruction set — Instruction set overview
- Tile instruction set — Instruction Set description