Scalable Analyses<p>We have upstreamed a code generator for Arm's Scalable Matrix Extension (SME) to the LIBXSMM library; it emits kernels for tensor processing primitives. Small matrix-matrix multiplications are one of the supported primitives, and you can try the code generation with a few commands:</p><p>```<br>git clone <a href="https://github.com/libxsmm/libxsmm.git" rel="nofollow noopener noreferrer" translate="no" target="_blank">https://github.com/libxsmm/libxsmm.git</a><br>cd libxsmm; make -j BLAS=0<br>cd samples/xgemm; make -j<br>./gemm_kernel F32 F32 F32 F32 512 512 512 512 512 512 \<br> 1 1 0 0 0 1 0 0 0 nopf nobr 0 1 10000 0<br>```</p><p><a href="https://scalable.uni-jena.de/research/2025/03/31/sme-kernels.html" rel="nofollow noopener noreferrer" translate="no" target="_blank">https://scalable.uni-jena.de/research/2025/03/31/sme-kernels.html</a></p><p><a href="https://fosstodon.org/tags/aarch64" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>aarch64</span></a> <a href="https://fosstodon.org/tags/sme" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>sme</span></a> <a href="https://fosstodon.org/tags/apple" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>apple</span></a> <a href="https://fosstodon.org/tags/m4" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>m4</span></a> <a href="https://fosstodon.org/tags/hpc" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>hpc</span></a> <a href="https://fosstodon.org/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a></p>
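<p>When checking a generated kernel's results or interpreting its reported throughput, it can help to have a plain-C baseline that shows what a small GEMM computes. The sketch below is not LIBXSMM's API and is not how the SME kernels work. It assumes column-major storage as in BLAS, no transposition, and the update C = alpha*A*B + beta*C. The names <code>ref_sgemm</code> and <code>gemm_flops</code> are hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical naive reference GEMM (not part of LIBXSMM):
 * C = alpha * A * B + beta * C, single precision, column-major,
 * no transposes. A is m x k, B is k x n, C is m x n. */
static void ref_sgemm(int m, int n, int k,
                      float alpha, const float *a, int lda,
                      const float *b, int ldb,
                      float beta, float *c, int ldc)
{
  /* In column-major storage each leading dimension must cover its rows. */
  assert(lda >= m && ldb >= k && ldc >= m);
  for (int j = 0; j < n; ++j) {
    for (int i = 0; i < m; ++i) {
      float acc = 0.0f;
      /* Dot product of row i of A with column j of B. */
      for (int p = 0; p < k; ++p)
        acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
      c[i + (size_t)j * ldc] = alpha * acc + beta * c[i + (size_t)j * ldc];
    }
  }
}

/* Operation count conventionally used for GEMM throughput:
 * one multiply and one add per (i, j, p) triple, i.e. 2*M*N*K. */
static double gemm_flops(int m, int n, int k)
{
  return 2.0 * m * n * k;
}
```

<p>For the 512 x 512 x 512 shape in the command above, <code>gemm_flops</code> gives 2 * 512^3 = 268,435,456 floating-point operations per call. Dividing that count by the measured time per call gives FLOP/s. The triple loop is deliberately naive. It is only meant as a correctness baseline and should not be used as a performance comparison.</p>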