Matrix Multiplication in C: Guide from Basics to Optimization

目次

1. Introduction

C is a language widely used in system development and embedded programming, and matrix calculations also play an important role as part of it. This article explains how to implement matrix multiplication in C, covering everything from basics to optimization and acceleration. Target audience for this article:
  • Beginners learning C
  • People who want to know how to implement matrix calculations
  • People interested in optimizing or speeding up matrix multiplication
By reading this article, you can learn the basic methods for implementing matrix multiplication in C and techniques for computing efficiently. First, let’s learn the fundamentals of matrix multiplication.

2. Basics of Matrix Multiplication

2.1 What is a Matrix? A Clear Explanation of the Basics of Matrix Multiplication

A matrix (Matrix) is a collection of data with numbers arranged in rows and columns. It is generally represented as a two-dimensional array with rows and columns. For example, here is a 2×3 matrix:
A =
[ 1  2  3 ]
[ 4  5  6 ]
Matrix multiplication is an operation that computes the result of multiplying two matrices, and it must satisfy the following condition:
  • The number of columns of the left matrix A and the number of rows of the right matrix B must be equal.
For example, multiplying a 2×3 matrix by a 3×2 matrix yields a 2×2 matrix.
A =
[ 1  2  3 ]
[ 4  5  6 ]

B =
[ 7  8 ]
[ 9 10 ]
[11 12 ]
The matrix product C = A × B is computed as follows.
C =
[ 58  64 ]
[139 154 ]

2.2 How to Compute Matrix Multiplication | Understanding with Concrete Examples and Formulas

Matrix multiplication is obtained by multiplying each row element with each column element and summing the results. In general, when multiplying matrix A (size m × n) by matrix B (size n × p), the resulting matrix C has size m × p. General formula for matrix multiplication:
C[i][j] = Σ(A[i][k] * B[k][j])  (k=0 to n-1)
Here,
  • C[i][j] is the element in row i, column j of matrix C
  • A[i][k] is the element in row i, column k of matrix A
  • B[k][j] is the element in row k, column j of matrix B

2.3 Properties of Matrix Multiplication (Associativity, Distributive Law, etc.)

Matrix multiplication has the following properties.
  1. Associativity (Associativity)
   (A × B) × C = A × (B × C)
However, matrix multiplication is not commutative.
  1. Distributive Law (Distributive Law)
   A × (B + C) = A × B + A × C
  1. Identity Matrix (Identity Matrix) The identity matrix I is a matrix with all diagonal elements equal to 1 and all other elements equal to 0, and it satisfies the following relationship for any matrix A:
   A × I = I × A = A
Understanding the properties of matrix multiplication enables optimization of calculations and broadens its applications.

3. How to Implement Matrix Multiplication in C | Sample Code Included

3.1 How to Represent a Matrix as a 2D Array in C

Using Fixed-Size 2D Arrays

Representing a matrix with a fixed-size 2D array is simple, easy to understand, and suitable for beginners.
#include <stdio.h>

#define ROWS 2
#define COLS 3

int main() {
    int matrix[ROWS][COLS] = {
        {1, 2, 3},
        {4, 5, 6}
    };

    // Display matrix
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            printf("%d ", matrix[i][j]);
        }
        printf("n");
    }
    return 0;
}

Using Dynamic Memory Allocation

If you want to handle matrix sizes flexibly, you use malloc() to allocate memory dynamically.
#include <stdio.h>
#include <stdlib.h>

int main() {
    int rows = 2, cols = 3;
    int **matrix;

    // Allocate memory
    matrix = (int **)malloc(rows * sizeof(int *));
    for (int i = 0; i < rows; i++) {
        matrix[i] = (int *)malloc(cols * sizeof(int));
    }

    // Input data
    int value = 1;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            matrix[i][j] = value++;
        }
    }

    // Display matrix
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            printf("%d ", matrix[i][j]);
        }
        printf("n");
    }

    // Free memory
    for (int i = 0; i < rows; i++) {
        free(matrix[i]);
    }
    free(matrix);

    return 0;
}

3.2 Implementing Matrix Multiplication in C | Explanation of Basic Code

Here we present a basic program that calculates the product of two matrices.
#include <stdio.h>

#define N 3  // Size of the matrix

// Function to calculate matrix product
void matrixMultiplication(int A[N][N], int B[N][N], int C[N][N]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            C[i][j] = 0; // Initialize
            for (int k = 0; k < N; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
}

// Function to display a matrix
void printMatrix(int matrix[N][N]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            printf("%d ", matrix[i][j]);
        }
        printf("n");
    }
}

int main() {
    int A[N][N] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9}
    };

    int B[N][N] = {
        {9, 8, 7},
        {6, 5, 4},
        {3, 2, 1}
    };

    int C[N][N];  // Result matrix

    // Compute matrix product
    matrixMultiplication(A, B, C);

    // Display results
    printf("Matrix A:n");
    printMatrix(A);
    printf("Matrix B:n");
    printMatrix(B);
    printf("Matrix product C:n");
    printMatrix(C);

    return 0;
}
In this code, the matrixMultiplication() function uses three nested loops to compute the product and sum element.

4. Matrix Multiplication Optimization Techniques

4.1 Speeding Up Matrix Multiplication with Loop Optimization

In basic matrix multiplication code, a triple loop is used for the computation. However, by reordering these loops, it is possible to improve cache efficiency and increase processing speed.

Memory Access and Cache Efficiency

CPU caches operate efficiently when memory access locality is high. In matrix multiplication, the following loop order changes are effective.
#include <stdio.h>

#define N 3  // matrix size

// Change loop order to improve cache efficiency
void optimizedMatrixMultiplication(int A[N][N], int B[N][N], int C[N][N]) {
    for (int i = 0; i < N; i++) {
        for (int k = 0; k < N; k++) {  // make k outermost
            for (int j = 0; j < N; j++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
}

int main() {
    int A[N][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    int B[N][N] = {{9, 8, 7}, {6, 5, 4}, {3, 2, 1}};
    int C[N][N] = {0};

    optimizedMatrixMultiplication(A, B, C);

    // display results
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            printf("%d ", C[i][j]);
        }
        printf("n");
    }

    return 0;
}

4.2 Optimizing Matrix Multiplication with the Strassen Method

The Strassen method (Strassen Algorithm) is an algorithm that can reduce the computational complexity of matrix multiplication from O(N³) → O(N^{2.81}).

C Implementation of the Strassen Method

#include <stdio.h>
#include <stdlib.h>

// Strassen method for 2×2 matrices
void strassen(int A[2][2], int B[2][2], int C[2][2]) {
    int M1 = (A[0][0] + A[1][1]) * (B[0][0] + B[1][1]);
    int M2 = (A[1][0] + A[1][1]) * B[0][0];
    int M3 = A[0][0] * (B[0][1] - B[1][1]);
    int M4 = A[1][1] * (B[1][0] - B[0][0]);
    int M5 = (A[0][0] + A[0][1]) * B[1][1];
    int M6 = (A[1][0] - A[0][0]) * (B[0][0] + B[0][1]);
    int M7 = (A[0][1] - A[1][1]) * (B[1][0] + B[1][1]);

    C[0][0] = M1 + M4 - M5 + M7;
    C[0][1] = M3 + M5;
    C[1][0] = M2 + M4;
    C[1][1] = M1 - M2 + M3 + M6;
}

int main() {
    int A[2][2] = {{1, 2}, {3, 4}};
    int B[2][2] = {{5, 6}, {7, 8}};
    int C[2][2];

    strassen(A, B, C);

    // display results
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            printf("%d ", C[i][j]);
        }
        printf("n");
    }

    return 0;
}

4.3 Parallel Processing with OpenMP

Since each element of a matrix multiplication can be computed independently, parallel processing can accelerate it. In C, using OpenMP makes it easy to implement parallel processing.
#include <stdio.h>
#include <omp.h>

#define N 3  // matrix size

// Parallel matrix multiplication using OpenMP
void parallelMatrixMultiplication(int A[N][N], int B[N][N], int C[N][N]) {
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            C[i][j] = 0;
            for (int k = 0; k < N; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
}

int main() {
    int A[N][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    int B[N][N] = {{9, 8, 7}, {6, 5, 4}, {3, 2, 1}};
    int C[N][N] = {0};

    parallelMatrixMultiplication(A, B, C);

    // display results
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            printf("%d ", C[i][j]);
        }
        printf("n");
    }

    return 0;
}

5. Common Errors and Solutions in C Matrix Multiplication

5.1 How to Prevent Memory Leaks

What Is a Memory Leak?

If you allocate dynamic memory (malloc) and then free is not used to release it, the unnecessary memory remains allocated even after the program ends, resulting in a memory leak. This can cause memory usage to continuously grow and may make the program’s operation unstable.

Incorrect Code Example

#include <stdio.h>
#include <stdlib.h>

int main() {
    int rows = 3, cols = 3;
    int **matrix;

    // Allocate memory
    matrix = (int **)malloc(rows * sizeof(int *));
    for (int i = 0; i < rows; i++) {
        matrix[i] = (int *)malloc(cols * sizeof(int));
    }

    return 0;  // Memory leak occurs because free is omitted
}

Solution

Allocated memory must always be freed.
#include <stdio.h>
#include <stdlib.h>

int main() {
    int rows = 3, cols = 3;
    int **matrix;

    // Allocate memory
    matrix = (int **)malloc(rows * sizeof(int *));
    for (int i = 0; i < rows; i++) {
        matrix[i] = (int *)malloc(cols * sizeof(int));
    }

    // Free memory
    for (int i = 0; i < rows; i++) {
        free(matrix[i]);
    }
    free(matrix);

    return 0;
}

5.2 Causes and Solutions for Out-of-Bounds Array Access Errors

What Is an Out-of-Bounds Array Access?

Accessing memory outside the bounds of an array can cause unexpected behavior and, in some cases, result in a segmentation fault (Segmentation Fault) error.

Incorrect Code Example

#include <stdio.h>

#define N 3

int main() {
    int matrix[N][N] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9}
    };

    // Out-of-bounds access (when N = 3, matrix[3][0] does not exist)
    printf("%dn", matrix[3][0]);  

    return 0;
}

Solution

#include <stdio.h>

#define N 3

int main() {
    int matrix[N][N] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9}
    };

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            printf("%d ", matrix[i][j]);
        }
        printf("n");
    }

    return 0;
}

5.3 Error Analysis Using the GDB Debugger

Using the C language debugging tool GDB (GNU Debugger) is handy for pinpointing where errors occur.

Basic Usage of GDB

  1. Include debug information when compiling
   gcc -g program.c -o program
  1. Start GDB
   gdb ./program
  1. Run the program
   run
  1. Check the stack trace when an error occurs
   backtrace

Example Using GDB

(gdb) run
Starting program: ./program

Program received signal SIGSEGV, Segmentation fault.
0x0000555555555142 in main () at program.c:10
10    printf("%dn", matrix[3][0]);  // Error occurs here

6. Applications of Matrix Multiplication

6.1 Transformation Matrices in Graphics

2D/3D Transformations Using Matrices

In computer graphics (CG), object translation (linear translation), rotation, and scaling (enlargement/reduction) are computed using matrices. 2D transformation matrix 2D coordinate transformations use a 3×3 matrix as follows.
[x']   [ a  b  tx ]   [x]
[y'] = [ c  d  ty ] * [y]
[ 1]   [ 0  0   1 ]   [1]

Applying 2D Transformation Matrices in C

#include <stdio.h>
#include <math.h>

#define PI 3.14159265

// Function to rotate a 2D coordinate
void rotate2D(float x, float y, float angle, float *x_out, float *y_out) {
    float rad = angle * PI / 180.0;  // Convert angle to radians
    float rotationMatrix[2][2] = {
        {cos(rad), -sin(rad)},
        {sin(rad), cos(rad)}
    };

    // Compute matrix product
    *x_out = rotationMatrix[0][0] * x + rotationMatrix[0][1] * y;
    *y_out = rotationMatrix[1][0] * x + rotationMatrix[1][1] * y;
}

int main() {
    float x = 1.0, y = 0.0;
    float angle = 90.0;
    float x_new, y_new;

    rotate2D(x, y, angle, &x_new, &y_new);
    printf("Rotated coordinates: (%.2f, %.2f)n", x_new, y_new);

    return 0;
}

6.2 Weight Computation in Neural Networks for Machine Learning

In a neural network, the input data X is multiplied by the weight W to compute the input to the next layer.
Y = X × W

Weight Computation for a Simple Neural Network in C

#include <stdio.h>

// Function to compute matrix product
void matrixMultiply(float X[2][3], float W[3][2], float Y[2][2]) {
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            Y[i][j] = 0;
            for (int k = 0; k < 3; k++) {
                Y[i][j] += X[i][k] * W[k][j];
            }
        }
    }
}

int main() {
    float X[2][3] = {
        {1, 2, 3},
        {4, 5, 6}
    };

    float W[3][2] = {
        {0.1, 0.2},
        {0.3, 0.4},
        {0.5, 0.6}
    };

    float Y[2][2];

    matrixMultiply(X, W, Y);

    // Display results
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            printf("%.2f ", Y[i][j]);
        }
        printf("n");
    }

    return 0;
}

6.3 State Transition in Physical Simulations

State Transition Matrix

For example, if a 2D object’s position is (x, y) and its velocity is (vx, vy), the new state S' after a time interval t can be expressed by the following matrix product.
S' = T × S
[x']   [ 1  0  t  0 ]   [ x ]
[y'] = [ 0  1  0  t ] * [ y ]
[vx']  [ 0  0  1  0 ]   [vx ]
[vy']  [ 0  0  0  1 ]   [vy ]

Implementation Example in C

#include <stdio.h>

// Compute state transition
void stateTransition(float T[4][4], float S[4], float S_new[4]) {
    for (int i = 0; i < 4; i++) {
        S_new[i] = 0;
        for (int j = 0; j < 4; j++) {
            S_new[i] += T[i][j] * S[j];
        }
    }
}

int main() {
    float T[4][4] = {
        {1, 0, 1, 0},
        {0, 1, 0, 1},
        {0, 0, 1, 0},
        {0, 0, 0, 1}
    };

    float S[4] = {0, 0, 1, 1};  // Initial position (0,0) velocity (1,1)
    float S_new[4];

    stateTransition(T, S, S_new);

    printf("New state: %.2f %.2f %.2f %.2fn", S_new[0], S_new[1], S_new[2], S_new[3]);

    return 0;
}

7. FAQ (Frequently Asked Questions)

7.1 Why does the computation speed slow down when the matrix size becomes larger?

Because the computational complexity grows as O(N³)

The basic matrix multiplication uses three nested loops, so the computational complexity is O(N³). When the matrix size doubles, the number of operations becomes 8 times larger, causing processing time to increase dramatically as the size grows.

Solutions

  1. Leverage optimization techniques
  • Change loop order (improve cache efficiency)
  • Use Strassen’s algorithm (reduce complexity to O(N^{2.81}))
  • Leverage parallelization technologies such as OpenMP
  1. Use specialized numerical computation libraries
  • BLAS (Basic Linear Algebra Subprograms)
  • Intel MKL (Math Kernel Library)
  • OpenBLAS (open-source high-performance linear algebra library)

7.2 When is Strassen’s algorithm effective?

When computing the product of large matrices

Strassen’s algorithm is one where the larger the matrix size, the greater the speedup effect. However, there are the following limitations.

Advantages

  • The complexity becomes O(N^{2.81}), enabling fast computation of large matrix products
  • Compared to the usual O(N³) method, it is especially advantageous when N is large

Disadvantages

  • For small matrices, overhead is large and it can actually be slower
  • Recursive calls can increase memory usage

7.3 What numerical errors should be watched for in matrix multiplication?

Accumulation of errors due to floating-point arithmetic

In C, calculations using floating-point types (float or double) can produce rounding errors. Since matrix multiplication involves many multiplications and additions, these errors tend to accumulate.

Ways to reduce numerical errors

  1. Use double type
  • float (32bit) vs double (64bit) which has higher precision
  • Example: float precision is about 7 digits, double about 15 digits
  1. Leverage high-precision arithmetic libraries
  • GNU MP (GMP) and Intel MKL can be used to perform high-precision calculations

7.4 How is matrix multiplication implemented in languages other than C?

Python (using NumPy)

In Python, using NumPy makes it very easy to compute matrix products.
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])

C = np.dot(A, B)  # matrix multiplication

print(C)

Java (using standard library)

In Java, an implementation using 2D arrays is also possible.
public class MatrixMultiplication {
    public static void main(String[] args) {
        int[][] A = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        int[][] B = { {9, 8, 7}, {6, 5, 4}, {3, 2, 1} };
        int[][] C = new int[3][3];

        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                for (int k = 0; k < 3; k++) {
                    C[i][j] += A[i][k] * B[k][j];
                }
            }
        }

        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                System.out.print(C[i][j] + " ");
            }
            System.out.println();
        }
    }
}