← Linear Algebra Intuition Matrix Transformations →

Vectors, Matrices & Operations

> Every neural network is just matrix multiplication with extra steps.

Type: Build

Languages: Python, Julia

Prerequisites: Phase 1, Lesson 01 (Linear Algebra Intuition)

Time: ~60 minutes

Learning Objectives

Build a Matrix class with element-wise operations, matrix multiplication, transpose, determinant, and inverse
Distinguish element-wise multiplication from matrix multiplication and explain when each applies
Implement a single dense neural network layer (relu(W @ x + b)) using only the from-scratch Matrix class
Explain broadcasting rules and how bias addition works in neural network frameworks

The Problem

You want to build a neural network. You read the code and see this:

output = activation(weights @ input + bias)

That @ is matrix multiplication. The weights are a matrix. The input is a vector. If you do not know what those operations do, this line is magic. If you do know, it is the entire forward pass of a layer in three operations.

Every image your model processes is a matrix of pixel values. Every word embedding is a vector. Every layer of every neural network is a matrix transformation. You cannot build AI systems without being fluent in matrix operations the same way you cannot write code without understanding variables.

This lesson builds that fluency from scratch.

The Concept

Vectors: ordered lists of numbers

A vector is a list of numbers with a direction and magnitude. In AI, vectors represent data points, features, or parameters.

v = [3, 4]        -- a 2D vector
w = [1, 0, -2]    -- a 3D vector

A 2D vector [3, 4] points to coordinates (3, 4) on a plane. Its length (magnitude) is 5 (the 3-4-5 triangle).

Matrices: grids of numbers

A matrix is a 2D grid. Rows and columns. An m x n matrix has m rows and n columns.

A = | 1  2  3 |     -- 2x3 matrix (2 rows, 3 columns)
    | 4  5  6 |

In neural networks, weight matrices transform input vectors into output vectors. A layer with 784 inputs and 128 outputs uses a 128x784 weight matrix.

Why shapes matter

Matrix multiplication has a strict rule: (m x n) @ (n x p) = (m x p). The inner dimensions must match.

(128 x 784) @ (784 x 1) = (128 x 1)
  weights       input       output

Inner dimensions: 784 = 784  -- valid

If you get a shape mismatch error in PyTorch, this is why.

The operations map

Operation	What it does	Neural network use
Addition	Element-wise combine	Adding bias to output
Scalar multiply	Scale every element	Learning rate * gradients
Matrix multiply	Transform vectors	Layer forward pass
Transpose	Flip rows and columns	Backpropagation
Determinant	Single number summary	Checking invertibility
Inverse	Undo a transformation	Solving linear systems
Identity	Do-nothing matrix	Initialization, residual connections

Element-wise vs matrix multiplication

This distinction trips up beginners constantly.

Element-wise: multiply matching positions. Both matrices must be the same shape.

| 1  2 |   | 5  6 |   | 5  12 |
| 3  4 | * | 7  8 | = | 21 32 |

Matrix multiplication: dot products of rows and columns. Inner dimensions must match.

| 1  2 |   | 5  6 |   | 1*5+2*7  1*6+2*8 |   | 19  22 |
| 3  4 | @ | 7  8 | = | 3*5+4*7  3*6+4*8 | = | 43  50 |

Different operations, different results, different rules.

Broadcasting

When you add a bias vector to a matrix of outputs, the shapes do not match. Broadcasting stretches the smaller array to fit.

| 1  2  3 |   +   [10, 20, 30]
| 4  5  6 |

Broadcasting stretches the vector across rows:

| 1  2  3 |   | 10  20  30 |   | 11  22  33 |
| 4  5  6 | + | 10  20  30 | = | 14  25  36 |

Every modern framework does this automatically. Understanding it prevents confusion when shapes seem wrong but the code runs.

Build It

Step 1: Vector class

class Vector:
    def __init__(self, data):
        self.data = list(data)
        self.size = len(self.data)

    def __repr__(self):
        return f"Vector({self.data})"

    def __add__(self, other):
        return Vector([a + b for a, b in zip(self.data, other.data)])

    def __sub__(self, other):
        return Vector([a - b for a, b in zip(self.data, other.data)])

    def __mul__(self, scalar):
        return Vector([x * scalar for x in self.data])

    def dot(self, other):
        return sum(a * b for a, b in zip(self.data, other.data))

    def magnitude(self):
        return sum(x ** 2 for x in self.data) ** 0.5

Step 2: Matrix class with core operations

class Matrix:
    def __init__(self, data):
        self.data = [list(row) for row in data]
        self.rows = len(self.data)
        self.cols = len(self.data[0])
        self.shape = (self.rows, self.cols)

    def __repr__(self):
        rows_str = "\n  ".join(str(row) for row in self.data)
        return f"Matrix({self.shape}):\n  {rows_str}"

    def __add__(self, other):
        return Matrix([
            [self.data[i][j] + other.data[i][j] for j in range(self.cols)]
            for i in range(self.rows)
        ])

    def __sub__(self, other):
        return Matrix([
            [self.data[i][j] - other.data[i][j] for j in range(self.cols)]
            for i in range(self.rows)
        ])

    def scalar_multiply(self, scalar):
        return Matrix([
            [self.data[i][j] * scalar for j in range(self.cols)]
            for i in range(self.rows)
        ])

    def element_wise_multiply(self, other):
        return Matrix([
            [self.data[i][j] * other.data[i][j] for j in range(self.cols)]
            for i in range(self.rows)
        ])

    def matmul(self, other):
        return Matrix([
            [
                sum(self.data[i][k] * other.data[k][j] for k in range(self.cols))
                for j in range(other.cols)
            ]
            for i in range(self.rows)
        ])

    def transpose(self):
        return Matrix([
            [self.data[j][i] for j in range(self.rows)]
            for i in range(self.cols)
        ])

    def determinant(self):
        if self.shape == (1, 1):
            return self.data[0][0]
        if self.shape == (2, 2):
            return self.data[0][0] * self.data[1][1] - self.data[0][1] * self.data[1][0]
        det = 0
        for j in range(self.cols):
            minor = Matrix([
                [self.data[i][k] for k in range(self.cols) if k != j]
                for i in range(1, self.rows)
            ])
            det += ((-1) ** j) * self.data[0][j] * minor.determinant()
        return det

    def inverse_2x2(self):
        det = self.determinant()
        if det == 0:
            raise ValueError("Matrix is singular, no inverse exists")
        return Matrix([
            [self.data[1][1] / det, -self.data[0][1] / det],
            [-self.data[1][0] / det, self.data[0][0] / det]
        ])

    @staticmethod
    def identity(n):
        return Matrix([
            [1 if i == j else 0 for j in range(n)]
            for i in range(n)
        ])

Step 3: See it work

A = Matrix([[1, 2], [3, 4]])
B = Matrix([[5, 6], [7, 8]])

print("A + B =", (A + B).data)
print("A @ B =", A.matmul(B).data)
print("A^T =", A.transpose().data)
print("det(A) =", A.determinant())
print("A^-1 =", A.inverse_2x2().data)

I = Matrix.identity(2)
print("A @ A^-1 =", A.matmul(A.inverse_2x2()).data)

Step 4: Connect to neural networks

import random

inputs = Matrix([[0.5], [0.8], [0.2]])
weights = Matrix([
    [random.uniform(-1, 1) for _ in range(3)]
    for _ in range(2)
])
bias = Matrix([[0.1], [0.1]])

def relu_matrix(m):
    return Matrix([[max(0, val) for val in row] for row in m.data])

pre_activation = weights.matmul(inputs) + bias
output = relu_matrix(pre_activation)

print(f"Input shape: {inputs.shape}")
print(f"Weight shape: {weights.shape}")
print(f"Output shape: {output.shape}")
print(f"Output: {output.data}")

This is a single dense layer: output = relu(W @ x + b). Every dense layer in every neural network does exactly this.

Use It

NumPy does everything above in fewer lines and orders of magnitude faster.

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print("A + B =\n", A + B)
print("A * B (element-wise) =\n", A * B)
print("A @ B (matrix multiply) =\n", A @ B)
print("A^T =\n", A.T)
print("det(A) =", np.linalg.det(A))
print("A^-1 =\n", np.linalg.inv(A))
print("I =\n", np.eye(2))

inputs = np.random.randn(3, 1)
weights = np.random.randn(2, 3)
bias = np.array([[0.1], [0.1]])
output = np.maximum(0, weights @ inputs + bias)

print(f"\nNeural network layer: {weights.shape} @ {inputs.shape} = {output.shape}")
print(f"Output:\n{output}")

The @ operator in Python calls __matmul__. NumPy implements it with optimized BLAS routines written in C and Fortran. Same math, 100x faster.

Broadcasting in NumPy:

matrix = np.array([[1, 2, 3], [4, 5, 6]])
bias = np.array([10, 20, 30])
print(matrix + bias)

NumPy automatically broadcasts the 1D bias across both rows. This is how bias addition works in every neural network framework.

Ship It

This lesson produces a prompt for teaching matrix operations through geometric intuition. See outputs/prompt-matrix-operations.md.

The Matrix class built here is the foundation for the mini neural network framework we build in Phase 3, Lesson 10.

Exercises

Verify the inverse. Multiply A @ A.inverse_2x2() and confirm you get the identity matrix. Try it with three different 2x2 matrices. What happens when the determinant is zero?

Implement 3x3 inverse. Extend the Matrix class to compute inverses for 3x3 matrices using the adjugate method. Test it against NumPy's np.linalg.inv.

Build a two-layer network. Using only your Matrix class (no NumPy), create a two-layer neural network: input (3) -> hidden (4) -> output (2). Initialize random weights, run a forward pass, and verify all shapes are correct.

Key Terms

Term	What people say	What it actually means
Vector	"An arrow"	An ordered list of numbers. In AI: a point in high-dimensional space.
Matrix	"A table of numbers"	A linear transformation. It maps vectors from one space to another.
Matrix multiply	"Just multiply the numbers"	Dot products between every row of the first matrix and every column of the second. Order matters.
Transpose	"Flip it"	Swap rows and columns. Turns an m x n matrix into n x m. Critical in backpropagation.
Determinant	"Some number from the matrix"	Measures how much the matrix scales area (2D) or volume (3D). Zero means the transformation crushes a dimension.
Inverse	"Undo the matrix"	The matrix that reverses the transformation. Only exists when the determinant is not zero.
Identity matrix	"The boring matrix"	The matrix equivalent of multiplying by 1. Used in residual connections (ResNets).
Broadcasting	"Magic shape fixing"	Stretching a smaller array to match a larger one by repeating along missing dimensions.
Element-wise	"Regular multiplication"	Multiply matching positions. Both arrays must have the same shape (or be broadcastable).