Visual explainer · Neural networks

Weights & Harness

The two essential ideas behind every trained model — what the numbers are, how they're shaped, and what keeps the whole process from flying apart.

The numbers that hold knowledge

A neural network is, at its core, a large pile of floating-point numbers — the weights. They live on the connections between nodes. When input flows in, each weight scales the signal passing through its connection. The pattern of those numbers is what the model has learned.

Think of each weight as a dial. Turn it up and that signal path becomes loud. Turn it toward zero and the path goes quiet. A negative weight flips the signal.
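The dial analogy fits in a few lines of Python. This is a sketch of one connection; the signal and weight values are hypothetical.

```python
# One connection of the network; the signal and weight values are
# hypothetical.
def connection(signal, weight):
    """One edge: the weight scales whatever signal passes through."""
    return weight * signal

print(connection(0.6, 2.0))    # dial turned up: the path is loud
print(connection(0.6, 0.01))   # dial near zero: the path goes quiet
print(connection(0.6, -1.0))   # negative weight: the signal flips
```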

Network signal flow
[Interactive diagram spanning four layers: input, hidden 1, hidden 2, output]

Forward pass: a signal (coloured bead) travels from each input, multiplied at every connection, accumulating at each node before reaching the output.
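The accumulation the diagram animates can be sketched as plain Python. The two hidden layers and every weight value below are made up for illustration.

```python
# Signals accumulate layer by layer, as in the diagram. The layer
# sizes and all weight values here are hypothetical.
def layer(inputs, weights):
    """weights[j][i] connects input i to node j; each node sums its
    weighted inputs (no activation, to keep the flow visible)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

x = [1.0, 0.5]                              # input layer
h1 = layer(x, [[0.4, -0.2], [0.7, 0.1]])    # hidden 1
h2 = layer(h1, [[0.5, 0.5], [-0.3, 0.8]])   # hidden 2
y = layer(h2, [[1.0, -1.0]])                # output
print(y)
```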

Tune the weights, change the answer

Below is a tiny 3-input, 1-output network. Drag any input weight and watch the prediction shift in real time. This is what gradient descent does — but automatically, guided by the error gradient, across millions of weights at once.

[Interactive demo · starting state]
Input weights: w₁ = +1.20, w₂ = −0.50, w₃ = +0.80 · bias b = +0.10
net = σ(w₁·x₁ + w₂·x₂ + w₃·x₃ + b)
Prediction: 73% class A (positive), versus class B

Insight: flip w₁ to a large negative value while w₃ is strongly positive — the model becomes uncertain. This is the same tension a trained model resolves by finding weights that minimise error across thousands of examples.
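A minimal sketch of this network and the w₁ flip. The input values x are an assumption (the demo's 73% depends on inputs it doesn't show), so the exact probabilities below are illustrative.

```python
import math

# The tiny network above: net = σ(w₁·x₁ + w₂·x₂ + w₃·x₃ + b).
# The inputs x are assumed; the widget does not show them.
def predict(w, x, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

x = [0.5, 0.2, 0.9]                          # hypothetical inputs
p = predict([1.2, -0.5, 0.8], x, 0.1)        # slider values: leans class A
p_flip = predict([-1.2, -0.5, 0.8], x, 0.1)  # w₁ flipped negative
print(round(p, 2), round(p_flip, 2))         # p_flip lands near 0.5: uncertain
```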

The system that shapes the weights

A harness is the scaffolding that controls the training process — the data pipelines, loss functions, optimisers, and evaluation loops that steer a network toward useful weight values.

Without a harness, you have random weights and noise. With a well-designed harness, those same weights converge to something that can recognise cats, translate sentences, or write code.

Training loop — step through the cycle

01 Forward pass: input flows through the network; weights multiply each signal.
02 Loss: how wrong was the output? Loss measures the gap from truth.
03 Backward pass: gradients flow back; each weight learns its share of the blame.
04 Update: the optimiser nudges every weight slightly in the right direction.
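The four steps can be run on the smallest possible model: one weight, one training example, squared-error loss. The data values and learning rate are illustrative.

```python
# One weight, one example, squared-error loss; all values are
# illustrative.
w = 0.0                  # starting weight
x, target = 2.0, 1.0     # a single training example
lr = 0.1                 # step size for the update

for step in range(20):
    y = w * x                     # 01 forward pass
    loss = (y - target) ** 2      # 02 loss: squared gap from truth
    grad = 2 * (y - target) * x   # 03 backward pass: d(loss)/dw
    w -= lr * grad                # 04 update: nudge w downhill

print(w)   # converges toward 0.5, where w·x matches the target
```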
[Interactive: epoch/step counter, a loss-over-steps chart, and a 4×4 hidden-layer weight matrix coloured from negative to positive]

What the harness contains

A production training harness is more than a loop. It includes the tools that keep training stable, reproducible, and measurable.

DATA PIPELINE
Feeds the network
Shuffles, batches, augments. Keeps GPUs fed and prevents the network from memorising the order examples arrive.
LOSS FUNCTION
Defines "wrong"
Cross-entropy for classification, MSE for regression, a learned reward for RLHF alignment. The loss condenses "wrong" into one number; its gradient is the needle pointing at how to improve.
OPTIMISER
Moves the weights
Adam, SGD, Adafactor. Uses gradients (and often momentum) to decide how large each weight update should be.
CHECKPOINTING
Saves the state
Snapshots of weights at intervals. If training diverges, you roll back. The checkpoint is the model — it ships weights, not code.
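A toy harness can wire all four components together. Every piece below is a stand-in for what a real framework provides: shuffling and batching for the data pipeline, MSE for the loss, plain SGD for the optimiser, and a list of weight snapshots for checkpointing.

```python
import random

# Toy harness around the smallest model: one weight, prediction y = w·x.
# All functions and values are stand-ins, not a real framework's API.

def batches(data, size):
    random.shuffle(data)                  # data pipeline: shuffle + batch
    for i in range(0, len(data), size):
        yield data[i:i + size]

data = [(x / 10, 0.5 * x / 10) for x in range(10)]  # truth: y = 0.5·x
w, lr = 0.0, 0.5
checkpoints = []                          # checkpointing: weight snapshots

for epoch in range(30):
    for batch in batches(data, size=2):
        # gradient of mean squared error over the batch
        grad = sum(2 * (w * x - t) * x for x, t in batch) / len(batch)
        w -= lr * grad                    # optimiser: vanilla SGD step
    checkpoints.append(w)                 # snapshot after each epoch

print(checkpoints[-1])   # the harness's product: a trained weight near 0.5
```

When training diverges in a real system, you restore the last good entry of `checkpoints`; here it is just a list, but the role is the same.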

The key insight: the harness is transient — it exists only during training. What it produces, the weights, is permanent. When you download a model or call an API, you're using nothing but the final weight values that the harness shaped.