Current PTO ISA Scope
The current PTO ISA manual defines the instruction inventory and architecture surfaces listed below. This page serves as the scope statement for the manual.
Instruction Sets
PTO defines five named instruction sets with explicit per-op reference pages:
- Tile instructions: pto.t* operations together with pto.mgather and pto.mscatter
- Vector micro instructions: pto.v* operations
- Scalar and control instructions: pto.* operations used for synchronization, DMA control, predicate construction, and machine-visible control
- Communication instructions: inter-NPU collective, point-to-point, and notification operations under docs/isa/comm/
- System scheduling instructions: TPipe/TMPipe producer-consumer and resource-lifetime operations under docs/isa/system/
Inventory Summary
The current manual documents:
- 127 tile instructions
- 100 vector micro instructions
- 48 scalar and control instructions
- 11 communication instructions
- 3 system scheduling instructions
That yields 289 named instructions in the current reference set.
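The total can be checked by summing the five per-set counts. The following is a minimal Python sketch of that arithmetic. The dictionary and function names are hypothetical and are not part of the PTO toolchain; only the counts come from this page.

```python
# Per-set counts copied from the inventory summary above.
# INVENTORY_COUNTS and total_instructions are illustrative names only.
INVENTORY_COUNTS = {
    "tile": 127,
    "vector micro": 100,
    "scalar and control": 48,
    "communication": 11,
    "system scheduling": 3,
}


def total_instructions(counts):
    """Return the sum of all per-set instruction counts."""
    return sum(counts.values())


total = total_instructions(INVENTORY_COUNTS)
print(total)  # 289
```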
Tile Instruction Inventory
View and Tile Buffer
make_tensor_view, get_tensor_view_dim, get_tensor_view_stride, tensor_view_addr, partition_view, alloc_tile, subset, set_validshape, tile_buf_addr
Sync And Config
tsync, tassign, talias, sethf32mode, settf32mode, setfmatrix, set_img2col_rpt, set_img2col_padding, subview, get_scale_addr
Elementwise Tile-Tile
tabs, tadd, taddc, tand, tcmp, tcvt, tdiv, texp, tpow, tfmod, tlog, tmax, tmin, tmul, tneg, tnot, tor, tprelu, trecip, trelu, trem, trsqrt, tsel, tshl, tshr, tsqrt, tsub, tsubc, txor
Tile-Scalar And Immediate
tadds, taddsc, taxpy, tands, tcmps, tdivs, texpands, tfmods, tlrelu, tmaxs, tmins, tmuls, tpows, tors, trems, tsels, tshls, tshrs, tsubs, tsubsc, txors
Reduce And Expand
tcolexpand, tcolexpandadd, tcolexpanddiv, tcolexpandexpdif, tcolexpandmax, tcolexpandmin, tcolexpandmul, tcolexpandsub, tcolargmax, tcolargmin, tcolmax, tcolmin, tcolprod, tcolsum, trowargmax, trowargmin, trowexpand, trowexpandadd, trowexpanddiv, trowexpandexpdif, trowexpandmax, trowexpandmin, trowexpandmul, trowexpandsub, trowmax, trowmin, trowprod, trowsum
Memory And Data Movement
tload, tprefetch, tstore, mgather, mscatter
Matrix And Matrix-Vector
tgemv, tgemv_acc, tgemv_bias, tgemv_mx, tmatmul, tmatmul_acc, tmatmul_bias, tmatmul_mx
Layout And Rearrangement
tconcat, textract, tfillpad, tfillpad_expand, tfillpad_inplace, timg2col, tinsert, tmov, tpack, treshape, ttrans
Irregular And Complex
tci, tdequant, tgather, tgatherb, thistogram, tmrgsort, tpartadd, tpartmax, tpartmin, tpartmul, tprint, tquant, trandom, tscatter, tsort32, ttri
Vector Micro-Instruction Inventory
Vector Load-Store
vgather2, vgather2_bc, vgatherb, vldas, vlds, vldus, vldsx2, vscatter, vsld, vsldb, vsst, vsstb, vsta, vstar, vstas, vsts, vstu, vstur, vstus, vstsx2
Predicate And Materialization
vbr, vdup
Unary Vector Operations
vabs, vbcnt, vcls, vexp, vln, vmov, vneg, vnot, vrec, vrelu, vrsqrt, vsqrt
Binary Vector Operations
vadd, vaddc, vand, vdiv, vmax, vmin, vmul, vor, vshl, vshr, vsub, vsubc, vxor
Vector-Scalar Operations
vaddcs, vadds, vands, vlrelu, vmaxs, vmins, vmuls, vors, vshls, vshrs, vsubcs, vsubs, vxors
Conversion Operations
vci, vcvt, vtrc
Reduction Operations
vcadd, vcgadd, vcgmax, vcgmin, vcmax, vcmin, vcpadd
Compare And Select
vcmp, vcmps, vsel, vselr, vselrv2
Data Rearrangement
vdintlv, vdintlvv2, vintlv, vintlvv2, vpack, vperm, vshift, vslide, vsqz, vsunpack, vusqz, vzunpack
SFU And DSA Operations
vaddrelu, vaddreluconv, vaxpy, vexpdif, vmrgsort, vmula, vmulconv, vmull, vprelu, vsort32, vsubrelu, vtranspose
Scalar And Control Instruction Inventory
Pipeline Sync
get_buf, mem_bar, pipe_barrier, rls_buf, set_cross_core, set_flag, set_intra_block, wait_flag, wait_flag_dev, wait_intra_core
DMA Copy
copy_gm_to_ubuf, copy_ubuf_to_gm, copy_ubuf_to_ubuf, set_loop_size_outtoub, set_loop_size_ubtoout, set_loop1_stride_outtoub, set_loop1_stride_ubtoout, set_loop2_stride_outtoub, set_loop2_stride_ubtoout
Predicate Load-Store
pld, pldi, plds, pst, psti, psts, pstu
Predicate Generation And Algebra
pand, pdintlv_b8, pge_b16, pge_b32, pge_b8, pintlv_b16, plt_b16, plt_b32, plt_b8, pnot, por, ppack, psel, pset_b16, pset_b32, pset_b8, punpack, pxor
Communication Instruction Inventory
tbroadcast, tget, tget_async, tgather, tnotify, tput, tput_async, treduce, tscatter, ttest, twait
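Two names in this list, tgather and tscatter, also appear in the tile inventory under Irregular And Complex, so a bare mnemonic may not be enough to identify the instruction set. The Python sketch below finds the shared names from the two lists as printed on this page. The variable and function names are hypothetical and do not correspond to any PTO tooling.

```python
# Names copied from the "Irregular And Complex" tile group and from the
# communication inventory on this page. All identifiers here are illustrative.
TILE_IRREGULAR = {
    "tci", "tdequant", "tgather", "tgatherb", "thistogram", "tmrgsort",
    "tpartadd", "tpartmax", "tpartmin", "tpartmul", "tprint", "tquant",
    "trandom", "tscatter", "tsort32", "ttri",
}
COMMUNICATION = {
    "tbroadcast", "tget", "tget_async", "tgather", "tnotify", "tput",
    "tput_async", "treduce", "tscatter", "ttest", "twait",
}


def shared_names(first, second):
    """Return the names present in both sets, sorted alphabetically."""
    return sorted(first & second)


overlap = shared_names(TILE_IRREGULAR, COMMUNICATION)
print(overlap)  # ['tgather', 'tscatter']
```

When looking up one of these two names, it is safer to qualify it with its instruction set, for example by the reference directory the page lives under.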
System Scheduling Instruction Inventory
tfree, tpop, tpush