Type System¶

PTO uses a compact visible type system, but legality does not stop at raw type names. Type classes tell you what kind of architectural object you are dealing with. Other legality dimensions such as layout, location, valid region, and target profile determine whether a use is actually allowed.

Element Types¶

PTO supports a rich set of element types across floating-point, integer, and specialized categories.

Floating-Point Types¶

Type	SSA Name	Bits	Description	A2/A3	A5
IEEE FP16	`f16` / `half`	16	IEEE 754 half-precision	Yes	Yes
BF16	`bf16` / `bfloat16_t`	16	Brain float 16 (8-bit exponent)	Yes	Yes
IEEE FP32	`f32`	32	IEEE 754 single-precision	Yes	Yes
FP8 E4M3	`f8e4m3` / `float8_e4m3_t`	8	4-bit exponent, 3-bit mantissa	No	Yes
FP8 E5M2	`f8e5m2` / `float8_e5m2_t`	8	5-bit exponent, 2-bit mantissa	No	Yes
HI Float8	`hifloat8_t`	8	High-precision float8	No	Yes
Float4 E1M2x2	`float4_e1m2x2_t`	4	4-bit float4, packed 2x2	No	Yes
Float4 E2M1x2	`float4_e2m1x2_t`	4	4-bit float4, packed 2x2	No	Yes

Integer Types¶

Type	SSA Name	Bits	Signedness	A2/A3	A5
int8	`i8`	8	Signed	Yes	Yes
uint8	`u8`	8	Unsigned	Yes	Yes
int16	`i16`	16	Signed	Yes	Yes
uint16	`u16`	16	Unsigned	Yes	Yes
int32	`i32`	32	Signed	Yes	Yes
uint32	`u32`	32	Unsigned	Yes	Yes
int64	`i64`	64	Signed	Yes	Yes
uint64	`u64`	64	Unsigned	Yes	Yes

Vector Width¶

The vector register width N (the number of lanes) is determined by the element type and the target profile:

Element Type	Vector Width N	Bytes/Register	Notes
f32	64	256 B	64 × 32-bit
f16, bf16	128	256 B	128 × 16-bit
i16, u16	128	256 B	128 × 16-bit
i8, u8	256	256 B	256 × 8-bit
f8e4m3, f8e5m2	256	256 B	256 × 8-bit

Vector width is portable across all profiles: CPU simulation, A2/A3, and A5 all present the same N value for each element type. The difference is that A5 executes vector operations natively on hardware, while CPU/A2/A3 emulate them.

Vector Register Types¶

Vector register SSA type: !pto.vreg<NxDTYPE>

!pto.vreg<64xf32>   -- 64 lanes of f32
!pto.vreg<128xf16>  -- 128 lanes of f16
!pto.vreg<256xi8>   -- 256 lanes of i8

Tile Buffer Types¶

Tile buffer SSA type (see Tiles And Valid Regions for full parameter list):

!pto.tile<loc=vec, f32, 16, 16, RowMajor, NoneBox, None, Zero>
!pto.tile_buf<loc=mat, bf16, 16, 16, RowMajor, NoneBox, None, Null>
!pto.tile_buf<loc=left, int8, 16, 16, RowMajor, RowMajor, NZ, Null>

NaN and Inf Behavior¶

For floating-point types, PTO follows IEEE 754 semantics with the following hardware-specific variation points:

Behavior	Rule
Quiet NaN propagation	Quiet NaN in → quiet NaN out (preserves signaling bit)
Signaling NaN	Signaling NaN may be quieted by hardware before use
Inf arithmetic	Inf is produced and propagated as IEEE 754 requires
Denormalized numbers	Hardware may flush denormals to zero (FTZ behavior)
Rounding	Controlled by `rnd` attribute: `rne` (default), `rz`, `rp`, `rm`

The FTZ (flush-to-zero) behavior for denormals is hardware-specific:

A2/A3: Denormals are flushed to zero by default. Hardware does not generate subnormal results during arithmetic; operands that are denormal are treated as zero.
A5: Denormals are flushed to zero by default. The sethf32mode/settf32mode instructions can modify this behavior on A5-class targets.
CPU simulator: Denormals may be preserved or flushed depending on the host CPU's SSE/AVX floating-point configuration; behavior is not guaranteed to match the hardware targets.

Type Conversion Rules¶

Between Floating-Point Types¶

Source	Dest	Behavior
f16 → bf16	Conversion	Reinterpret f16 bits as bf16 (no numerical conversion)
bf16 → f16	Conversion	Reinterpret bf16 bits as f16 (no numerical conversion)
f16/bf16 → f32	Promotion	Extend to f32; exact representable values are preserved
f32 → f16/bf16	Narrowing	Round according to `rnd` attribute; NaN/Inf handled per IEEE 754
f8 → f16/f32	Promotion	Extend; exact representable values are preserved
f16/f32 → f8	Narrowing	Round according to `rnd` attribute; may overflow to Inf

Between Integer Types¶

Source	Dest	Behavior
Widening (e.g., i8 → i16)	Zero/_sign extend	Zero-extend for unsigned; sign-extend for signed
Narrowing (e.g., i16 → i8)	Truncation	Truncate high bits; may lose significant bits
i32 → f32	Conversion	Exact for values in [-2^24, 2^24]; may lose precision outside
f32 → i32	Conversion	Truncates toward zero; on overflow, result is undefined behavior on A2/A3 and A5 (saturates on CPU simulator)

Between Float and Integer¶

Source	Dest	Behavior
f32 → i8/u8/i16/u16	Narrowing	Truncate; may overflow
f32 → i32/u32	Narrowing	Truncate; may overflow
i8/u8 → f32	Promotion	Exact for small values; may lose precision for large values

Type Conversion Operations¶

Operation	Instruction Set	Description
`pto.tcvt`	Tile	Elementwise type conversion on tile buffers
`pto.vcvt`	Vector	Vector register type conversion
`pto.vtrc`	Vector	Vector truncate/round (e.g., f32 → f16)
`pto.vci`	Vector	Compress to integer (vector → integer result)

Constraints¶

Constraints

Instruction set pages must define accepted operand/result classes.
Type errors must stay distinguishable from deeper legality failures (shape, layout, location intent, target profile).
Vector instruction docs must make vector-register, mask, pointer, and alignment state explicit.
Tile instruction docs must make tile role, shape, and valid-region interactions explicit.
No implicit type promotion: tadd(t, i8_tile, f32_immediate) is illegal unless an explicit tcvt converts one operand first.

Cases That Are Not Allowed¶

Cases That Are Not Allowed

Treating type class checks as though they cover every backend legality fact.
Conflating scalar state with tile or vector payload state.
Documenting vector and tile payload classes as if they were interchangeable.
Relying on implicit type conversion without an explicit tcvt/vcvt.