pto.tpartmul¶
pto.tpartmul is part of the Irregular And Complex instruction set.
Summary¶
Partial elementwise multiply with handling of mismatched valid regions. When only one input is valid at an element, the result copies that input's value; this holds on A2/A3, A5, and the CPU simulator.
Mechanism¶
Performs elementwise multiplication over the destination valid region. The instruction belongs to the tile instructions and carries architecture-visible behavior that is not reducible to a plain elementwise compute pattern.
For each element (i, j) in the destination valid region:
- if both src0 and src1 are valid at (i, j), the result is the product src0[i, j] * src1[i, j];
- if only one input is valid at (i, j), the result copies that input's value;
- if neither input is valid at (i, j), the result is undefined on A2/A3, A5, and the CPU simulator.
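The per-element rule can be sketched as a scalar, host-side reference model. This is a minimal illustration, not the device implementation: `RefTile` and `refPartMul` are hypothetical names that do not come from the PTO headers, and the model assumes each valid region is the top-left `validRows` x `validCols` rectangle of its tile. Where neither input is valid the documented result is undefined; the sketch simply leaves `dst` unchanged there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a tile (NOT pto::Tile): a row-major
// rows x cols buffer whose valid region is assumed to be the top-left
// validRows x validCols rectangle.
struct RefTile {
    std::size_t rows, cols;
    std::size_t validRows, validCols;
    std::vector<float> data;

    RefTile(std::size_t r, std::size_t c, std::size_t vr, std::size_t vc, float fill = 0.0f)
        : rows(r), cols(c), validRows(vr), validCols(vc), data(r * c, fill) {}

    bool isValid(std::size_t i, std::size_t j) const { return i < validRows && j < validCols; }
    float &at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Scalar reference for the documented rule, iterating over dst's valid region.
void refPartMul(RefTile &dst, const RefTile &src0, const RefTile &src1) {
    // Valid regions must lie inside their buffers.
    assert(dst.validRows <= dst.rows && dst.validCols <= dst.cols);
    assert(src0.validRows <= src0.rows && src0.validCols <= src0.cols);
    assert(src1.validRows <= src1.rows && src1.validCols <= src1.cols);

    // A zero valid region means there is nothing to compute.
    if (dst.validRows == 0 || dst.validCols == 0) {
        return;
    }
    for (std::size_t i = 0; i < dst.validRows; ++i) {
        for (std::size_t j = 0; j < dst.validCols; ++j) {
            const bool v0 = src0.isValid(i, j);
            const bool v1 = src1.isValid(i, j);
            if (v0 && v1) {
                dst.at(i, j) = src0.at(i, j) * src1.at(i, j);  // both valid: product
            } else if (v0) {
                dst.at(i, j) = src0.at(i, j);                  // only src0 valid: copy
            } else if (v1) {
                dst.at(i, j) = src1.at(i, j);                  // only src1 valid: copy
            }
            // Neither valid: undefined per the documentation; left untouched here.
        }
    }
}
```

For a fully valid 2x3 `src0` filled with 2 and a `src1` filled with 5 whose valid region is 1x2, this model writes 10 to the two first-row positions where both inputs are valid and 2 everywhere else in the destination valid region.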
Syntax¶
Textual spelling is defined by the PTO ISA syntax-and-operands pages.
Synchronous form:
%dst = tpartmul %src0, %src1 : !pto.tile<...> -> !pto.tile<...>
AS Level 1 (SSA)¶
%dst = pto.tpartmul %src0, %src1 : !pto.tile<...> -> !pto.tile<...>
AS Level 2 (DPS)¶
pto.tpartmul ins(%src0, %src1 : !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
IR Level 1 (SSA)¶
%dst = pto.tpartmul %src0, %src1 : !pto.tile<...> -> !pto.tile<...>
IR Level 2 (DPS)¶
pto.tpartmul ins(%src0, %src1 : !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
template <typename TileDataDst, typename TileDataSrc0, typename TileDataSrc1, typename... WaitEvents>
PTO_INST RecordEvent TPARTMUL(TileDataDst &dst, TileDataSrc0 &src0, TileDataSrc1 &src1, WaitEvents &... events);
Inputs¶
- src0 is the first source tile.
- src1 is the second source tile.
- dst names the destination tile. The operation iterates over dst's valid region.
Expected Outputs¶
dst holds the elementwise partial product: where both inputs are valid, their product; where only one input is valid, that input's value.
Side Effects¶
No architectural side effects beyond producing the destination tile. Does not implicitly fence unrelated traffic.
Constraints¶
General constraints / checks¶
- dst, src0, and src1 must use the same element type.
- The destination valid region defines the result domain.
- For each element in the destination valid region:
  - if both inputs are valid, the result is their product;
  - if only one input is valid, the result copies that input value.
- If dst has a zero valid region, the instruction returns early.
- Supported partial-validity patterns require at least one source tile to have a valid region exactly equal to that of dst, while the other source tile's valid region must not exceed that of dst in either dimension.
- Supported element types: int32_t, int16_t, half, float.
- Supported element types: uint8_t, int8_t, uint16_t, int16_t, uint32_t, int32_t, half, float, bfloat16_t.
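The partial-validity constraint listed above can be sketched as a compile-time predicate. This is an illustrative check under stated assumptions, not the verifier's actual logic: `ValidExtent`, `sameExtent`, `fitsWithin`, and `isSupportedPartialPattern` are hypothetical names, and each valid region is modeled only by its (rows, cols) extent.

```cpp
#include <cstddef>

// Hypothetical description of a valid region by its extent only.
struct ValidExtent {
    std::size_t rows;
    std::size_t cols;
};

// True when both extents are identical.
constexpr bool sameExtent(ValidExtent a, ValidExtent b) {
    return a.rows == b.rows && a.cols == b.cols;
}

// True when `inner` does not exceed `outer` in either dimension.
constexpr bool fitsWithin(ValidExtent inner, ValidExtent outer) {
    return inner.rows <= outer.rows && inner.cols <= outer.cols;
}

// Documented pattern: at least one source matches dst exactly, and the other
// source does not exceed dst in either dimension.
constexpr bool isSupportedPartialPattern(ValidExtent dst, ValidExtent src0, ValidExtent src1) {
    return (sameExtent(src0, dst) && fitsWithin(src1, dst)) ||
           (sameExtent(src1, dst) && fitsWithin(src0, dst));
}

// src0 matches dst and src1 is narrower in rows: supported.
static_assert(isSupportedPartialPattern({16, 16}, {16, 16}, {8, 16}), "supported pattern");
// Neither source matches dst exactly: not a listed pattern.
static_assert(!isSupportedPartialPattern({16, 16}, {8, 16}, {16, 8}), "unsupported pattern");
```

A predicate like this could guard a call site in host-side tests; patterns it rejects fall outside the documented legal domain and should not be relied on.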
Exceptions¶
- Illegal operand tuples, unsupported types, invalid layout combinations, or unsupported target-profile modes are rejected by the verifier or by the selected backend instruction set.
- Programs must not rely on behavior outside the documented legal domain of this operation, even if one backend currently accepts it.
Target-Profile Restrictions¶
- Validity patterns not explicitly listed above result in undefined behavior on A2/A3, A5, and the CPU simulator.
- dst, src0, and src1 must all be row-major (isRowMajor).
No additional restriction is documented for this target.
Examples¶
Auto¶
#include <pto/pto-inst.hpp>

using namespace pto;

void example_auto() {
    using TileT = Tile<TileType::Vec, float, 16, 16>;
    TileT src0, src1, dst;
    TPARTMUL(dst, src0, src1);
}
Manual¶
#include <pto/pto-inst.hpp>

using namespace pto;

void example_manual() {
    using TileT = Tile<TileType::Vec, float, 16, 16>;
    TileT src0, src1, dst;
    TASSIGN(src0, 0x1000);
    TASSIGN(src1, 0x2000);
    TASSIGN(dst, 0x3000);
    TPARTMUL(dst, src0, src1);
}
Auto Mode¶
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tpartmul %src0, %src1 : !pto.tile<...> -> !pto.tile<...>
Manual Mode¶
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.tpartmul %src0, %src1 : !pto.tile<...> -> !pto.tile<...>
PTO Assembly Form¶
%dst = tpartmul %src0, %src1 : !pto.tile<...> -> !pto.tile<...>
# AS Level 2 (DPS)
pto.tpartmul ins(%src0, %src1 : !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
Related Ops / Instruction Set Links¶
- Instruction set overview: Irregular And Complex
- Previous op in instruction set: pto.tpartadd
- Next op in instruction set: pto.tpartmax