Markov chains

← Back to specs
High school & University 19 Rooms Probability Modelling

Teaching objectives

Can a single mathematical idea predict the weather, your next WhatsApp message, which web pages matter, and whether Bitcoin will keep falling tomorrow? Yes — and that idea is the Markov chain: what happens tomorrow depends only on what happens today. This upper-secondary lab covers 19 rooms, moving from concrete intuition (your own mood, Barcelona's weather) all the way to real-world applications like PageRank and financial time series.

What you'll learn

Students build intuition for memoryless dependence using their own real data, then formalize it with matrices to predict, simulate, and optimize. The core mechanic is a live graph with a jumping frog and real-time counters that make transition probabilities visible.

  • Count real transitions (actual weather observations made over one month in Barcelona) and normalize rows to obtain the transition matrix P.
  • Multiply a probability vector by P to predict 1 and 2 steps ahead; repeat the operation to watch the initial state get forgotten in ~9 days.
  • Identify the stationary distribution (eigenvector) π such that π·P = π by moving sliders until the fixed point is reached.
  • Run 100 parallel simulations and confirm that the proportions converge to π.
  • Distinguish absorbing chains and states with no return.
  • Compare order-1 vs. order-2 models (yesterday-and-the-day-before vs. yesterday only) and discuss the trade-off between predictive power and data requirements.
  • Explore PageRank (Google's web-page ranking algorithm) as a Markov chain.
  • Generate text with an order-1/2/3 predictor trained on 3 million Spanish words and choose the optimal order for a real keyboard.
  • Encode Bitcoin into 5 daily/weekly/monthly variation states, read the diagonal of P as market inertia, and run an honest back-test comparing Markov order 1, order 2, and buy-and-hold on 2026 test data.

Key mathematical ideas

  • A Markov chain is a stochastic process where P(next state | full history) = P(next state | current state) — the Markov property, or "memorylessness".
  • The transition matrix P has non-negative rows that sum to 1 (stochastic matrix); each entry P[i][j] is the probability of moving from state i to state j.
  • The distribution after n steps is v·Pⁿ; as n→∞ the distribution converges to π regardless of the initial state (for ergodic chains).
  • The stationary distribution π satisfies π·P = π and can be found as the left eigenvector of P with eigenvalue 1.
  • An absorbing state has P[i][i] = 1; if one exists, any long trajectory ends up trapped there with probability 1.
  • Increasing the order of the model expands the state space (pairs, triples of days) and captures more dependence, but requires exponentially more data to estimate each row well.

Room-by-room contents

Room 1 · How do we predict the weather?

Two real January series (Madrid and A Coruña) showing day-by-day weather are presented: Sunny, Cloudy, or Rainy. The student observes streak patterns before the system makes a prediction for day 31.

Student tasks

  • Choose a city and observe the generated sequence of days.
  • Answer what weather you think day 31 will bring and see whether the model anticipates it.

Room 2 · A chain of states

The student builds their own mood chain (Happy / Sad / Neutral) by recording at least 50 transitions. With that record the key idea is demonstrated: the next state depends only on the current one.

Student tasks

  • Record at least 20 different states and 5 unique transitions until reaching 50.
  • Click "Done" to see the diagnosis of the generated chain.

Room 3 · How I build the prediction

Animation explaining how to turn a chain into a frequency table (transition matrix). Then the student reads their own table and answers a concrete probability question.

Student tasks

  • Watch the animation showing how the table is built.
  • Read your own table and write the probability of Sun the next day given that it is raining today (value between 0 and 1).

Room 4 · Real data: Barcelona in October

30 real days of Barcelona weather are presented (first entries of the 2019-2023 training set). The student classifies each consecutive pair of days to build matrix P1 by hand, then reflects on why the 30-day sample differs from the reference matrix.

Student tasks

  • Click the correct cell for each consecutive pair of days (Sun→Cld, Cld→Rn, etc.).
  • Check the score (maximum 20) and choose the correct explanation for the difference.

Room 5 · The frog jumps between clouds

Live graph with the frog mascot jumping according to the P1 matrix for Barcelona (Sunny/Cloudy/Rainy). Visit counters update in real time; the student estimates by eye the long-run percentage of Sunny days.

Student tasks

  • Watch the graph and the counters as the frog jumps.
  • Write the percentage of Sunny visits you expect in the long run (target ≈ 37 %).

Room 6 · Multiplying by the matrix

The operation v' = v · P is introduced: starting from today's probability vector, tomorrow's is obtained. An animated demo shows the pure-Sunny case; then the student practises with another starting point.

Student tasks

  • Watch the animated vector-multiplication example.
  • Move the "today" sliders and check the "tomorrow" result computed by the matrix.

Room 7 · Two steps ahead

Extension of the previous step: applying the matrix twice (P²) to predict the day after tomorrow. The student starts from "it is raining today" and adjusts the sliders to obtain the probability of Sun in two days (target ≈ 0.29).

Student tasks

  • Move the sliders to apply P twice starting from Rain.
  • Read the probability of Sun the day after tomorrow and verify it.

Room 8 · The past is forgotten

Three convergence curves, one for each possible starting state (Sunny, Cloudy, Rainy), iterating the matrix repeatedly. The curves merge: the past stops mattering. The student identifies from which day that occurs.

Student tasks

  • Observe how the three curves converge to the same value.
  • Write from which day they almost completely overlap (target ≈ day 9).

Room 9 · One hundred frogs

Simulation of 100 independent frogs starting from different states. In the long run all of them redistribute according to the stationary distribution π ≈ (0.37 / 0.37 / 0.26). The student predicts the Sunny percentage.

Student tasks

  • Watch the simulation of 100 frogs jumping in parallel.
  • Estimate and record the long-run percentage in Sunny (target 37 ± 7 %).

Room 10 · The fixed point

The stationary distribution π such that π · P = π is introduced. The student moves three sliders to find this special distribution; the screen confirms when it is found within a 2 % tolerance.

Student tasks

  • Move the sliders until the output vector matches the input vector.
  • Memorise or note the fixed-point values (π ≈ 0.37 / 0.37 / 0.26).

Room 11 · Order 1 vs order 2

An order-1 model (P1) and an order-2 model (P2, rows = pairs of days) are trained on 5 years of October in Barcelona and evaluated on 2024 (31 unseen days). Order 2 gets 4 more correct (16 vs 20). The student selects the correct explanation of the trade-off.

Student tasks

  • Compare the number of correct predictions of both models on the test set.
  • Choose the explanation of why more memory does not always win with few data points.

Room 12 · The frog drowns

Absorbing chain: 5 stepping stones in a row plus a puddle (trap state). Each jump has only a 5 % chance of falling in. The student predicts where the frog will be after 1 000 jumps, then simulates and discovers that it always ends up in the puddle because 0.95^1000 ≈ 0.

Student tasks

  • Make an intuitive prediction before simulating.
  • Simulate and compare with the prediction; read the explanation of the absorbing state.

Room 13 · The random surfer

Six web pages connected with a PageRank matrix (damping 0.85). The frog follows links 85 % of the time and teleports 15 %. The student observes which page is visited most (A) and explains why, based on the link structure.

Student tasks

  • Let the simulation run and read the accumulated visits.
  • Choose the correct explanation of why A wins (it receives more links).

Room 14 · Design your own Google

Interactive editor of 5 pages (A–E): the student draws links by clicking and simulates 1 000 surfer jumps. Phase 1 goal: make E the most visited. Phase 2 goal: achieve the same result using only pages that are already "trusted sources".

Student tasks

  • Connect the pages with links and simulate until E is the most popular.
  • In phase 2, achieve the same result by leveraging pages with high reputation.

Room 15 · WhatsApp predictor

A Markov chain trained on 3 million words of Spanish subtitles. The student chooses the order (1, 2, or 3) and a starting word, generates dialogues, and compares fluency; then answers which is the best compromise for a real keyboard.

Student tasks

  • Try all three orders with several starting words and read the generated texts.
  • Choose which order offers the best balance between coherence and data coverage.

Room 16 · Bitcoin in 5 colours

One year of Bitcoin (2025) encoded in 5 variation states: large rise, rise, flat, fall, large fall. The view can be switched between daily, weekly, and monthly resolution. The matrix shows the autocorrelation of the market.

Student tasks

  • Switch between day, week, and month views and observe how the matrix diagonal changes.
  • Answer what a high diagonal means (clustering: similar states chain together).

Room 17 · Does the market have memory?

Honest out-of-sample test on 2026: the model trained on 2025 generates price curves with P1 and P2 and compares them with the real curve (MSE over 15 points). The student repeats several runs and draws conclusions about the model's predictive power.

Student tasks

  • Run several simulations and observe the MSE of each model.
  • Choose the most sensible conclusion: whether Markov predicts prices with consistent success.

Room 18 · Honest back-test

Comparison of three investment strategies with 1 bitcoin: buy and hold, Markov order 1, and Markov order 2. Each day the models say LONG or OUT. The final financial result is simulated over the 2026 test period.

Student tasks

  • Run several simulations and compare the final balance of the three strategies.
  • Choose what happens to the Markov strategy versus buy-and-hold and reason why.

Room 19 · Conclusions

Six statements about Markov chains (absorbing states, price prediction, the Markov property, stationary distribution, etc.) that the student classifies as true or false; each answer reveals a real-world example.

Student tasks

  • Read each statement and mark it as true or false.
  • Read the real-world examples that appear after each answer.

Rooms to project

The most striking ones to show and discuss in class.

Room 5 · The frog jumps between clouds — The live graph with the jumping frog and real-time counters is highly visual: ideal for projecting so the class can intuit the stationary distribution before calculating it.
Room 12 · The frog drowns — The "aha" moment of the lab: the probability of falling in is only 5 % per jump, but intuition fails. Projecting the simulation and debating why the puddle always wins sparks a very rich discussion about absorbing states.
Room 14 · Design your own Google — The link editor turns PageRank into a concrete design challenge. The class can propose graphs on the board and watch in real time how votes circulate, connecting the mathematics with real SEO.
Room 15 · WhatsApp predictor — Generating sentences with orders 1, 2, and 3 in real time is surprising and fun; perfect for projecting and discussing why higher order is not always better and how a phone keyboard works.