Matrix Multiplication Optimization


The examples show two ways of performing matrix multiplication: a simple version that gives moderate performance, and a slightly more complex version that achieves optimal performance with the e-gcc compiler. The optimized code is partially unrolled so that the compiler can take advantage of the architecture's double-word load/store instructions and avoid unnecessary pipeline stalls.

Naive Code:

unsigned matmul_naive(float * restrict a, float * restrict b, float * restrict c)
	int i, j, k;

	for (i=0; i
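For reference, a naive triple loop of this kind can be sketched as follows. This is a minimal sketch, not the original listing: it assumes square N x N matrices stored in row-major order, and the size N = 32 and the name `matmul_naive_sketch` are hypothetical choices made for illustration.

```c
/* Hypothetical size: not taken from the original listing. */
#define N 32

/* Sketch: c = a * b for N x N row-major float matrices.
 * Each output element is a dot product of row i of a and column j of b. */
void matmul_naive_sketch(const float *restrict a, const float *restrict b,
                         float *restrict c)
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            float sum = 0.0f;
            /* Walks row i of a (stride 1) and column j of b (stride N). */
            for (int k = 0; k < N; k++)
                sum += a[i * N + k] * b[k * N + j];
            c[i * N + j] = sum;
        }
    }
}
```

Every multiply-add in the inner loop depends on the previous one through `sum`, which is one source of the pipeline stalls that the optimized version is meant to avoid.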

Optimized Code:

unsigned matmul(float * restrict aa, float * restrict bb, float * restrict cc)
    int i = 0;

	for (i=0; i
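The partial unrolling described above can be illustrated with a sketch that computes two adjacent output columns per iteration. This is an assumption-laden illustration, not the post's optimized routine: N = 32 and the name `matmul_unrolled_sketch` are hypothetical, N must be even, and `b` is assumed to be 8-byte aligned. Under those assumptions `b[k * N + j]` and `b[k * N + j + 1]` form an aligned pair that the compiler may be able to fetch with a single double-word load, and the two independent accumulators give the pipeline two multiply-add chains to interleave. Whether e-gcc actually emits double-word loads for this loop is not guaranteed.

```c
/* Hypothetical size: not taken from the original listing. */
#define N 32

#if N % 2 != 0
#error "This sketch unrolls the j loop by 2 and needs an even N"
#endif

/* Sketch: c = a * b for N x N row-major float matrices, with the
 * j loop unrolled by 2. For each (i, k) it reads one element of a and an
 * adjacent pair of elements of b. */
void matmul_unrolled_sketch(const float *restrict a, const float *restrict b,
                            float *restrict c)
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j += 2) {
            /* Two independent accumulators: neither chain waits on the other. */
            float s0 = 0.0f, s1 = 0.0f;
            for (int k = 0; k < N; k++) {
                float aik = a[i * N + k];
                /* Adjacent pair b[k][j], b[k][j+1]; j is even, so the pair
                 * is 8-byte aligned whenever b itself is. */
                s0 += aik * b[k * N + j];
                s1 += aik * b[k * N + j + 1];
            }
            /* Adjacent pair of stores into row i of c. */
            c[i * N + j] = s0;
            c[i * N + j + 1] = s1;
        }
    }
}
```

Because each output element still sums its products in the same k order, the results should match the naive loop, apart from any differences introduced by fused multiply-add contraction under `-ffp-contract=fast`.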

Compile Switches:
-Wall -O3 -std=c99 -mlong-calls -mfp-mode=round-nearest -ffp-contract=fast -funroll-loops

Posted in White Papers.


  1. I want to try these out. Is there a repository with a complete example?
    If not, what are N and K? I've been trying different values, and the multiplication is not giving me the correct results.
