Matrix multiplication is a fundamental operation in linear algebra and is applied in many engineering and scientific areas. The popular SGEMM function is an example of a standard API that implements this operation in its most general form. Products of very large matrices are not uncommon. For example, in problems solved with finite element methods, relationships between the model elements are expressed as elements of the state matrices. Stimuli to the state are expressed as input matrices or vectors, from which the new state can be calculated.

In a previous paper we demonstrated how the Epiphany multicore processor can perform an N×N multiplication written as a single-thread program in pure standard C. The compiled C program achieved about 70% of the theoretical peak performance. Another implementation, written in native assembly, achieved about 90% of the theoretical peak. The amount of per-core on-chip memory limited these benchmark implementations to relatively small matrices (up to 45×45). In this paper we present a multi-core, highly parallel implementation of matrix multiplication for matrices of arbitrary size. To implement this operation, we used the single-thread C multiplication mentioned above as a building block for the larger operation.
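To make the building-block idea concrete, the following is a minimal sketch of a blocked (tiled) multiplication in plain C. It is not the code from the project archive: the names `TILE`, `tile_matmul_acc`, and `blocked_matmul` are hypothetical, the tile size is chosen arbitrarily, and the sequential loops over tiles only stand in for the work that a multi-core version would distribute among cores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge, chosen for illustration only. On real hardware it
   would be bounded by the per-core on-chip memory. */
#define TILE 4

/* Single-thread building block: c_tile += a_tile * b_tile.
   Each pointer addresses the top-left element of a TILE x TILE tile that
   lives inside a row-major matrix with leading dimension ld. */
void tile_matmul_acc(const float *a, const float *b, float *c, size_t ld)
{
    for (size_t i = 0; i < TILE; i++) {
        for (size_t k = 0; k < TILE; k++) {
            float aik = a[i * ld + k];
            for (size_t j = 0; j < TILE; j++) {
                c[i * ld + j] += aik * b[k * ld + j];
            }
        }
    }
}

/* C = A * B for n x n row-major matrices.
   Assumption of this sketch: n is a multiple of TILE. */
void blocked_matmul(const float *a, const float *b, float *c, size_t n)
{
    assert(n % TILE == 0);

    /* Clear the result, since the tile kernel accumulates into it. */
    for (size_t i = 0; i < n * n; i++) {
        c[i] = 0.0f;
    }

    /* Every output tile (bi, bj) is the sum over bk of tile products.
       Output tiles are independent of each other, so a parallel version
       could assign them to different cores. */
    for (size_t bi = 0; bi < n; bi += TILE) {
        for (size_t bj = 0; bj < n; bj += TILE) {
            for (size_t bk = 0; bk < n; bk += TILE) {
                tile_matmul_acc(&a[bi * n + bk], &b[bk * n + bj],
                                &c[bi * n + bj], n);
            }
        }
    }
}
```

Calling `blocked_matmul(a, b, c, 8)` on 8×8 arrays multiplies them as a 2×2 grid of 4×4 tiles. Because only the small tile kernel touches the data directly, the overall matrix size is no longer tied to the size that a single kernel invocation can handle.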
Here is the source code for the project: (matmul.tar)