MathLabs

High school & University · 29 Rooms · Neural networks · AI

Teaching objectives

Neural networks are the engine behind voice recognition, machine translation, and language models. This lab guides upper-secondary/university students from a single artificial neuron to a multilayer network capable of learning on its own, building every piece with pencil and calculator before watching it work on screen. Across 29 rooms in eight sections, the journey goes from "what is a neuron?" all the way to training a real network with backpropagation.

What you'll learn

Classification is introduced visually: separating one region from another inside a square.
An artificial neuron (weighted sum + threshold) is modelled and its output computed by hand for concrete inputs.
Weights and bias are designed to implement the AND and OR logical functions; students discover that XOR is not linearly separable with a single neuron (one straight line isn't enough).
The sigmoid function σ(z) = 1/(1+e⁻ᶻ) is studied: its shape, approximate computation, and its elegant derivative σ′ = σ(1−σ).
Squared error E = ½(y−ŷ)² is introduced and computed numerically for different weights.
Gradient descent is understood: the "walking downhill" metaphor, the effect of the learning rate η, and the problem of neuron saturation.
A 2-2-1 network (hidden layer) is analysed: the full forward pass and why stacking layers solves XOR.
Backpropagation is understood: error flows backwards, each weight is adjusted in proportion to its contribution to the mistake.
The lab closes with linear regression as a limiting case: a neuron without activation trained on data points recovers least squares.
Extra rooms offer a free Playground to explore architectures and datasets in real time.

Key mathematical ideas

A neuron's output is y = f(w₁x₁ + w₂x₂ + b), where f is the activation function.
The sigmoid σ(z) = 1/(1+e⁻ᶻ) compresses any real value into the interval (0, 1); its derivative is σ′(z) = σ(z)·(1−σ(z)).
Squared error E = ½(y−ŷ)², where y is the neuron's output and ŷ the target, measures the distance between prediction and target; minimising it drives learning.
Gradient descent updates each weight as w ← w − η·∂E/∂w; η (learning rate) controls the step size.
Backpropagation applies the chain rule layer by layer to compute ∂E/∂w throughout the network.
A linear neuron trained with gradient descent converges to the least-squares solution.
Linear separability marks the boundary between what a single neuron can learn and what requires more layers: XOR needs at least one hidden layer.

Room-by-room contents

Room 1 · What is a neural network?

Laboratory introduction: what neural networks are, what they are used for, and what will be explored room by room.

Student tasks

Read the introduction and get a general sense of the journey ahead.

Room 2 · What is a neuron?

An interactive artificial neuron is shown: inputs, weights, bias and a step activation function. The student moves the sliders and observes when the output switches from 0 to 1.

Student tasks

Move the weight and bias sliders.
Identify the input combinations that cause the neuron to fire (output = 1).

Room 3 · Calculate the output

Neuron with weights w=(0.3, 0.4, −0.6) and bias b=0.2, step function. The student evaluates all binary input combinations and marks those that cause the neuron to fire.

Student tasks

Calculate the weighted sum for each input combination (0/1).
Mark all inputs for which the output is 1.
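
The table the student builds by hand can be cross-checked with a short script. This is only a sketch: the weights and bias come from the room description, but the firing convention (output 1 only when z is strictly positive) is an assumption, and the names `weighted_sum`, `step` and `table` are illustrative. Exact fractions are used so that a sum landing exactly on zero is not blurred by floating-point rounding.

```python
from fractions import Fraction
from itertools import product

# Values from Room 3, stored as exact fractions.
WEIGHTS = (Fraction(3, 10), Fraction(4, 10), Fraction(-6, 10))
BIAS = Fraction(2, 10)

def weighted_sum(x):
    """z = w1*x1 + w2*x2 + w3*x3 + b."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS

def step(z):
    """Assumed convention: fire only when z is strictly positive."""
    return 1 if z > 0 else 0

# Map every binary input to (z, output).
table = {x: (weighted_sum(x), step(weighted_sum(x)))
         for x in product((0, 1), repeat=3)}

for x, (z, y) in table.items():
    print(x, float(z), y)
```

Note that the input (0, 1, 1) gives exactly z = 0, so its output depends on whether the step function fires at z ≥ 0 or only at z > 0; it is worth checking which convention the lab uses before discussing the answers.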

Room 4 · The majority function

With weights (1, 1, 1) and bias −1.5 the neuron implements the majority function. The student is asked which logical function is obtained if the weight of x₁ changes to 2.

Student tasks

Study the truth table with the new weight w₁ = 2.
Choose which logical function describes the resulting behaviour.
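
For checking the answer in class, the two truth tables can be generated side by side. The sketch below assumes the neuron fires when the weighted sum is strictly positive; the function names are illustrative and not part of the lab.

```python
from itertools import product

def fires(x, weights, bias):
    """Step neuron: 1 when the weighted sum plus bias is positive (assumed convention)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0

def truth_table(weights, bias=-1.5):
    """Output of the neuron for all eight binary inputs."""
    return {x: fires(x, weights, bias) for x in product((0, 1), repeat=3)}

majority = truth_table((1, 1, 1))   # original weights from Room 4
modified = truth_table((2, 1, 1))   # weight of x1 changed to 2

for x in majority:
    print(x, majority[x], modified[x])
```

With w₁ = 2, the input x₁ = 1 is enough on its own to pass the threshold, so the modified neuron computes x₁ OR (x₂ AND x₃).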

Room 5 · The decision boundary

A neuron with 2 inputs draws a dividing line in the plane. The student moves the weights and bias and observes how that boundary rotates and shifts.

Student tasks

Move the weights and bias to explore different decision boundaries.
Identify which logical function is produced by the current configuration.

Room 6 · Implement AND

The student must find weights and a bias such that the neuron implements the AND function: it fires only when both inputs are 1. The truth table turns green once achieved.

Student tasks

Adjust weights and bias until the AND truth table turns green.

Room 7 · Implement OR (and discover XOR)

The student implements OR (fires if any input is 1) and then attempts XOR (fires only if exactly one input is 1), discovering that a single neuron cannot separate XOR with a straight line.

Student tasks

Find weights and bias for OR.
Attempt XOR and observe why it is impossible with a single neuron.
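
Why XOR fails can be argued in a few lines. Assuming the neuron fires when z > 0, XOR would require b ≤ 0, w₁ + b > 0, w₂ + b > 0 and w₁ + w₂ + b ≤ 0. Adding the two middle inequalities gives w₁ + w₂ + 2b > 0, hence w₁ + w₂ + b > −b ≥ 0, which contradicts the last condition. The sketch below illustrates the same fact by brute force over a grid of candidate weights; the grid and the helper names are assumptions made for the example, and a grid search illustrates the result rather than proving it.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
OR = [0, 1, 1, 1]
XOR = [0, 1, 1, 0]

def neuron(x, w1, w2, b):
    """Two-input step neuron (assumed convention: fire when z > 0)."""
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

def implements(target, w1, w2, b):
    """True when the neuron reproduces the whole truth table."""
    return [neuron(x, w1, w2, b) for x in INPUTS] == target

def search(target, grid):
    """Return the first (w1, w2, b) on the grid that matches, or None."""
    for w1, w2, b in product(grid, repeat=3):
        if implements(target, w1, w2, b):
            return (w1, w2, b)
    return None

grid = [k / 2 for k in range(-8, 9)]   # -4.0, -3.5, ..., 4.0

print(implements(AND, 1, 1, -1.5))   # True
print(implements(OR, 1, 1, -0.5))    # True
print(search(XOR, grid))             # None
```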

Room 8 · The sigmoid

The sigmoid function σ(z) = 1/(1+e⁻ᶻ) is presented as a smooth version of the step function: it equals 0.5 at z=0 and tends to 0 or 1 at the extremes. The student moves a point along the curve and observes its behaviour.

Student tasks

Move the point along the curve and observe the values of σ(z).

Room 9 · What is σ(1)?

Mental calculation of σ(1) using the approximation e ≈ 2.7, giving σ(1) ≈ 0.73. The student applies the same idea to calculate σ(−1).

Student tasks

Follow the reasoning e⁻¹ ≈ 0.37 → σ(1) ≈ 0.73.
Calculate σ(−1) by symmetry or by the same approach.
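
The mental estimate can be compared with the exact value in a few lines. This is a minimal sketch; `E_APPROX` and the variable names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

E_APPROX = 2.7                              # rounded value of e used in the room
approx = 1.0 / (1.0 + 1.0 / E_APPROX)       # hand estimate of sigma(1)
exact = sigmoid(1.0)

print(round(approx, 2), round(exact, 2))    # 0.73 0.73
print(round(sigmoid(-1.0), 2))              # 0.27
```

The second line reflects the symmetry σ(−z) = 1 − σ(z), which gives σ(−1) ≈ 1 − 0.73 = 0.27 without any further calculation.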

Room 10 · The elegant derivative

The remarkable property σ′(z) = σ(z)·(1−σ(z)) is presented. The student verifies that the formula holds for three different values of z.

Student tasks

Calculate σ(z) and σ′(z) for three values of z and check the identity.
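
One way to check the identity numerically is to compare it with a finite-difference slope. The sketch below does this for three values of z chosen arbitrarily for illustration; the room may use different ones.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative via the identity sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def numerical_derivative(f, z, h=1e-5):
    """Central-difference estimate of f'(z)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

for z in (-2.0, 0.0, 1.0):
    print(z, sigmoid_prime(z), numerical_derivative(sigmoid, z))
```

The two columns agree to many decimal places, and at z = 0 the derivative reaches its maximum value of 0.25.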

Room 11 · Other activation functions

Interactive gallery with sigmoid, tanh, ReLU, Leaky ReLU, step and linear. The student selects each function from a dropdown and observes the graph and its derivative.

Student tasks

Select each activation in the dropdown.
Observe the shape of the curve and its derivative for each function.
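
For reference, the six functions in the gallery and their derivatives can be written down compactly. This is a sketch under assumptions: the Leaky ReLU slope of 0.01 is a common choice but not stated in the lab, and the derivatives of step and ReLU at z = 0 (where they are not defined) are set to 0 by convention.

```python
import math

LEAK = 0.01   # assumed Leaky ReLU slope for negative inputs

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# name -> (function, derivative)
ACTIVATIONS = {
    "sigmoid": (sigmoid, lambda z: sigmoid(z) * (1.0 - sigmoid(z))),
    "tanh": (math.tanh, lambda z: 1.0 - math.tanh(z) ** 2),
    "relu": (lambda z: max(0.0, z), lambda z: 1.0 if z > 0 else 0.0),
    "leaky_relu": (lambda z: z if z > 0 else LEAK * z,
                   lambda z: 1.0 if z > 0 else LEAK),
    "step": (lambda z: 1.0 if z > 0 else 0.0, lambda z: 0.0),
    "linear": (lambda z: z, lambda z: 1.0),
}

for name, (f, df) in ACTIVATIONS.items():
    print(name, [round(f(z), 3) for z in (-2.0, 0.0, 2.0)], round(df(2.0), 3))
```

The derivative column is the part that matters for training: the step function has zero slope wherever it is defined, which is why gradient-based learning needs a smooth or piecewise-linear alternative.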

Room 12 · A network is not exact mathematics

Conceptual bridge: a neural network does not seek the perfect answer — it accepts small errors and reduces them step by step. The idea is introduced before formalising error measurement.

Student tasks

Read and observe the simple case that illustrates iterative error reduction.

Room 13 · The quadratic error

The loss function E = ½(y − ŷ)² is presented. The student moves the actual-output and target sliders and observes how E varies, including when it equals zero.

Student tasks

Move the y and ŷ sliders and observe E.
Identify the condition under which the error is zero and when it is maximum.

Room 14 · Calculate the error

Sigmoid neuron with w=(2, −1), b=0.5, input (1,1) and target ŷ=0. The student calculates the output y and determines which range the error E falls in.

Student tasks

Calculate z = w·x + b and then y = σ(z).
Determine E = ½(y − 0)² and indicate the correct range.
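
The hand calculation can be verified step by step with the values given in the room; only the variable names below are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w = (2.0, -1.0)   # weights from Room 14
b = 0.5           # bias
x = (1.0, 1.0)    # input
target = 0.0      # the document writes the target as y-hat

z = w[0] * x[0] + w[1] * x[1] + b   # 2 - 1 + 0.5 = 1.5
y = sigmoid(z)                      # about 0.818
E = 0.5 * (y - target) ** 2         # about 0.334

print(z, round(y, 3), round(E, 3))
```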

Room 15 · Error with sigmoid AND

With a sigmoid AND neuron (w=(1, 1), b=−1.5) and target ŷ=0, the student compares the error for inputs (1, 0) and (0, 0) to understand which example produces more loss.

Student tasks

Calculate the output and error for each input.
Answer which of the two cases produces greater error E.
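
The comparison can be reproduced with a small helper. The sketch uses the weights, bias and target from the room; `output_and_error` is an illustrative name.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def output_and_error(x, w=(1.0, 1.0), b=-1.5, target=0.0):
    """Return (y, E) for the sigmoid AND neuron, with E = 1/2 (y - target)^2."""
    z = w[0] * x[0] + w[1] * x[1] + b
    y = sigmoid(z)
    return y, 0.5 * (y - target) ** 2

y10, e10 = output_and_error((1.0, 0.0))   # z = -0.5
y00, e00 = output_and_error((0.0, 0.0))   # z = -1.5

print(round(y10, 3), round(e10, 3))   # 0.378 0.071
print(round(y00, 3), round(e00, 3))   # 0.182 0.017
```

Both inputs should produce 0, but (1, 0) sits closer to the decision boundary, so its output is further from the target and its error is about four times larger.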

Room 16 · Going downhill

Visual analogy: standing at the top of a hill and taking steps downhill following the slope. The learning rate η is introduced and the student tries different values to see how they affect convergence.

Student tasks

Try different values of η (small, medium, large).
Observe how the speed and stability of the descent changes.
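
The effect of η can also be seen without the animation. The sketch below runs gradient descent on a hypothetical one-dimensional loss E(w) = w², chosen only because its gradient 2w is easy to follow by hand; the lab's own hill may have a different shape.

```python
def descend(eta, w0=4.0, steps=20):
    """Gradient descent on the hypothetical loss E(w) = w**2 (gradient 2*w)."""
    w = w0
    path = [w]
    for _ in range(steps):
        w = w - eta * 2.0 * w   # update rule w <- w - eta * dE/dw
        path.append(w)
    return path

for eta in (0.01, 0.3, 1.1):
    print(eta, descend(eta)[-1])
```

For this loss each step multiplies w by (1 − 2η): with η = 0.01 the descent is stable but slow, with η = 0.3 it reaches the minimum quickly, and with η = 1.1 the factor is −1.2, so the iterate overshoots and moves further away at every step.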

Room 17 · Two valleys: learning rate

A loss function can have more than one minimum; here it has two valleys. With a small η the algorithm gets stuck in the first valley it reaches; with a large η it may jump between them. The strategy that works is to start with a large η and reduce it over time.

Student tasks

Explore trajectories with small, large and variable η.
Identify when the global minimum is reached and why.

Room 18 · Backpropagation live

Animated demo of how a sigmoid neuron is trained: each step adjusts the weights by distributing the error across the inputs, with larger changes for more active inputs. The student moves the target and observes the weights in real time.

Student tasks

Move the target and observe how the weights change step by step.

Room 19 · Saturated neurons

The variables x (input), z (weighted sum) and y (sigmoid output) are distinguished. The student studies 4 typical saturation cases and trains 3 generations in each, observing how the sigmoid's derivative slows down learning.

Student tasks

Select each of the 4 cases and train for 3 generations.
Observe the size of the update step in each case.

Room 20 · One gradient-descent step

Numerical calculation of one gradient-descent step with η = 1. The student calculates the new value of w₁ after applying the update rule.

Student tasks

Apply the formula w₁ ← w₁ − η·∂E/∂w₁ and calculate the new w₁.
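
The derivative in the update rule follows from the chain rule. As a sketch, assuming the neuron in this room is a sigmoid neuron with z = w₁x₁ + w₂x₂ + b, output y = σ(z) and error E = ½(y − ŷ)² (the room description does not state the concrete values):

```latex
\frac{\partial E}{\partial w_1}
  = \frac{\partial E}{\partial y}\,
    \frac{\partial y}{\partial z}\,
    \frac{\partial z}{\partial w_1}
  = (y - \hat{y})\; y\,(1 - y)\; x_1
```

The middle factor is the sigmoid derivative σ′(z) = σ(z)·(1−σ(z)) written in terms of the output y.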

Room 21 · Second step and saturation

A second descent step is applied and the student compares its size with the first. The room explains why each step gets smaller as the neuron approaches the saturated region of the sigmoid.

Student tasks

Calculate the second descent step.
Explain why the step is smaller than the first.
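
The shrinking step can be reproduced numerically. The room does not list its values here, so the numbers below (weights, bias, input and a target of 1) are hypothetical and chosen so that the neuron moves towards saturation; the gradient used is (y − ŷ)·y(1 − y)·x₁ for E = ½(y − ŷ)².

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_steps(w, b, x, target, eta=1.0, steps=2):
    """Run gradient-descent steps on a sigmoid neuron; return the size of each w1 update."""
    w = list(w)
    sizes = []
    for _ in range(steps):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        y = sigmoid(z)
        delta = (y - target) * y * (1.0 - y)        # dE/dz
        sizes.append(abs(eta * delta * x[0]))       # |change applied to w1|
        w = [wi - eta * delta * xi for wi, xi in zip(w, x)]
        b = b - eta * delta
    return sizes

# Hypothetical setup: output starts near 0.82 and is pushed towards 1.
sizes = train_steps(w=(2.0, -1.0), b=0.5, x=(1.0, 1.0), target=1.0)
print(sizes)
```

The second update is smaller than the first for two reasons that multiply together: the remaining error (y − ŷ) has shrunk, and the factor y(1 − y) gets smaller as y approaches 1.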

Room 22 · Expanding the hidden layer

XOR is not linearly separable, as shown in Room 7. The solution is to stack neurons: architecture 2 → 2 → 1 (2 inputs, 2 hidden neurons, 1 output). Each layer transforms the space until it becomes separable.

Student tasks

Read the explanation of why XOR requires more than one layer.
Observe the 2-2-1 architecture and its connections.

Room 23 · Forward pass — input (1, 0)

The 2-2-1 network processes the input in two stages: the hidden neurons calculate h₁ and h₂, and the output combines them. An animation of the forward pass for (1, 0) is shown and the student predicts what will happen with (0, 1).

Student tasks

Follow the forward pass animation for input (1, 0).
Predict the result of the same network for input (0, 1).
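
A forward pass through a 2-2-1 network can be traced in a few lines. The weights below are hypothetical, hand-picked so that one hidden neuron behaves like OR and the other like AND, and step activations are used to keep the arithmetic readable; the network in the lab may use different weights and a sigmoid.

```python
def step(z):
    return 1 if z > 0 else 0

def forward(x1, x2):
    """Forward pass of a hand-built 2-2-1 network; returns (h1, h2, y)."""
    h1 = step(x1 + x2 - 0.5)    # hidden neuron 1 behaves like OR
    h2 = step(x1 + x2 - 1.5)    # hidden neuron 2 behaves like AND
    y = step(h1 - h2 - 0.5)     # output fires when OR is on and AND is off
    return h1, h2, y

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, forward(*x))
```

In the hidden space (h₁, h₂) the four inputs collapse onto three points, (0, 0), (1, 0) and (1, 1), and a single straight line now separates the middle one from the other two. The inputs (1, 0) and (0, 1) map to the same hidden point, so this particular network gives them the same output.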

Room 24 · How does a network learn?

Conceptual bridge before backpropagation: so far weights have been set by hand; in practice the network adjusts them on its own. The idea of automatic parameter learning is introduced.

Student tasks

Read the transition from "manual weights" to "learned weights".

Room 25 · The idea of backpropagation

To train a multi-layer network, the error calculated at the output flows backwards through the weights: it is distributed among the hidden neurons and reaches the weights attached to the inputs. The student watches the animation of error flow.

Student tasks

Watch the animation of the error propagating backwards layer by layer.
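
The backward flow shown in the animation corresponds to a short computation. The sketch below applies the chain rule to a 2-2-1 sigmoid network with hypothetical starting weights and compares one of the resulting gradients with a finite-difference estimate; the parameter names (`W`, `c`, `v`, `d`) are illustrative and not taken from the lab.

```python
import copy
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward pass of a 2-2-1 sigmoid network; returns hidden outputs and final output."""
    W, c, v, d = params["W"], params["c"], params["v"], params["d"]
    h = [sigmoid(W[j][0] * x[0] + W[j][1] * x[1] + c[j]) for j in range(2)]
    y = sigmoid(v[0] * h[0] + v[1] * h[1] + d)
    return h, y

def loss(params, x, target):
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2

def backprop(params, x, target):
    """Gradients of E = 1/2 (y - target)^2 via the chain rule, output layer first."""
    h, y = forward(params, x)
    delta_out = (y - target) * y * (1.0 - y)          # dE/dz at the output neuron
    grad_v = [delta_out * h[j] for j in range(2)]
    # Share of the error sent back to hidden neuron j through weight v[j].
    delta_hid = [delta_out * params["v"][j] * h[j] * (1.0 - h[j]) for j in range(2)]
    grad_W = [[delta_hid[j] * x[i] for i in range(2)] for j in range(2)]
    return {"W": grad_W, "c": delta_hid, "v": grad_v, "d": delta_out}

def numeric_grad_W00(params, x, target, h=1e-6):
    """Central-difference estimate of dE/dW[0][0], used only as a check."""
    up, down = copy.deepcopy(params), copy.deepcopy(params)
    up["W"][0][0] += h
    down["W"][0][0] -= h
    return (loss(up, x, target) - loss(down, x, target)) / (2.0 * h)

# Hypothetical starting weights, chosen only for illustration.
params = {"W": [[0.5, -0.4], [0.3, 0.8]], "c": [0.1, -0.2], "v": [0.7, -0.6], "d": 0.05}
x, target = (1.0, 0.0), 1.0

grads = backprop(params, x, target)
print(grads["W"][0][0], numeric_grad_W00(params, x, target))
```

The weights attached to x₂ receive a zero gradient for this input because x₂ = 0: an inactive input contributes nothing to the mistake, so nothing is corrected.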

Room 26 · Regression and least squares

A linear neuron (no activation) trained on a scatter of points performs linear regression. The student drags points and presses Train, observing how the line gradually shifts towards the optimum.

Student tasks

Drag one or more points on the graph.
Press Train and observe how the line converges step by step.
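
What the Train button does can be sketched as gradient descent on a linear neuron y = w·x + b, compared against the closed-form least-squares line. The data points, learning rate and number of steps below are hypothetical, and the loss is taken as the average of ½(y − ŷ)² over the points.

```python
# Hypothetical data points (x, target).
points = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 7.1), (4.0, 8.8)]

def least_squares(points):
    """Closed-form slope and intercept of the least-squares line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    mt = sum(t for _, t in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxt = sum((x - mx) * (t - mt) for x, t in points)
    slope = sxt / sxx
    return slope, mt - slope * mx

def train(points, eta=0.1, steps=5000):
    """Gradient descent on a linear neuron y = w*x + b, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        grad_w = sum((w * x + b - t) * x for x, t in points) / n
        grad_b = sum((w * x + b - t) for x, t in points) / n
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b

print(least_squares(points))
print(train(points))
```

For these points both methods give a slope of about 1.96 and an intercept of about 1.10. Each press of Train in the room corresponds to one or more iterations of the loop.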

Room 27 · Activation functions: which one and when?

In-depth comparison of the five activations (sigmoid, tanh, ReLU, Leaky ReLU, linear) with a shared slider. Below each panel the typical use in real networks is shown.

Student tasks

Move the shared slider and compare the curves of all five functions.
Read the typical use of each function and associate it with its shape.

Room 28 · Playground

Free exploration zone: the student chooses a dataset, adjusts the network architecture (number of layers and neurons) and trains it, observing the prediction map in real time.

Student tasks

Choose a dataset and an architecture.
Train the network and observe how the prediction map changes.

Room 29 · Playground 2.0: two hidden layers

Extended playground version with two hidden layers: the first detects features, the second combines them. The student configures the network, chooses the activation function and tackles more complex challenges.

Student tasks

Configure a network with two hidden layers.
Choose the activation and train on non-linear datasets.
Compare the results with those of a single hidden layer.

Rooms to project

The most striking ones to show and discuss in class.

★ Room 7 · Implement OR (and discover XOR): The "impossible" moment of the lab: the class tries to achieve XOR with a single neuron and discovers there is no line that can separate it. Ideal for projecting and discussing why linearity is a fundamental limitation.

★ Room 18 · Backpropagation live: The animation shows in real time how each step adjusts the weights by distributing the error. Projecting it while moving the target lets the whole class see and discuss the intuition behind the gradient.

★ Room 25 · The idea of backpropagation: The algorithmic core of the lab: error flowing backwards layer by layer. The animation is very visual and allows reasoning in class about why the chain rule works before diving into the formulas.

★ Room 29 · Playground 2.0: two hidden layers: Experimental ending: each student builds their own network and tests it on difficult datasets. Projecting two different configurations and comparing their decision boundaries is a perfect synthesis of the complete journey.