1. The problem

Given a 10-player ranked lobby (5 allies, 5 enemies, each with a known op.gg peak rank), output one number: the probability the ally team wins. No champion picks, no role context, no team comp logic — just rank vs rank.

This document walks the whole journey: how the first version was built, what was wrong with it, and how it was calibrated against real match outcomes to land on the current numbers.

2. First swing — the intuition-based model

The model I’ll call the “OG” version was built by hand. Eight discrete buckets (six for Master-and-above split by LP, one for Diamond, one catch-all for Emerald-and-below), each assigned an MMR value by gut feel:

Bucket | Peak rank range | OG MMR
HC | Master 2500+ LP | +1275
LC | Master 2000–2500 LP | +1000
GM | Master 1500–2000 LP | +750
HM | Master 1000–1500 LP | +500
MM | Master 500–1000 LP | +250
LM | Master 0–500 LP | 0
DM | Diamond (any division) | −150
EM | Emerald or below | −350

Plus two model-wide knobs: T = 400 (the softmax “temperature” — more on this later) and S = 400 (the Elo scale — same). The values were plausible-looking guesses. None of them were tested against real outcomes.

The OG model is going to be the reference point throughout the rest of this doc. Every time we say “calibrated” or “new” version, the implicit comparison is to this hand-tuned starting point.

3. The math under the hood — Elo (chess)

The model is an Elo system — the same family of skill ratings developed by Arpad Elo around 1960 for ranking chess players (adopted by the US Chess Federation that year, by FIDE in 1970).

Trivia: Elo was a real person — a Hungarian-American physics professor. “ELO” is not an acronym, despite a persistent backronym (“Expected Level of Opposition”). It’s just his surname. The same rating family is used in everything from chess to League to competitive Scrabble to (until 2019) Tinder’s matchmaking algorithm.

In the chess (1v1) case, each player has a numeric rating R. Higher is better. If player A (rating RA) plays player B (rating RB), the predicted probability A wins is:

P(A wins) = 1 / (1 + 10^((RB − RA) / S))

S is the scale parameter. Chess uses S = 400, which has a concrete interpretation: a 400-point gap means the higher-rated player wins ~91% of the time, and an 800-point gap means ~99%. Every additional S points of gap multiplies the favourite’s odds by 10.

The base-10 isn’t fundamental — you could rewrite the same model in natural log with a rescaled S. Elo picked base 10 for the “decade of odds per S-point” interpretability. It’s a convention, not a deep mathematical truth.

For perspective: Magnus Carlsen’s peak FIDE Elo is 2882. Bobby Fischer’s was 2785 — a 97-point gap. Plugging into the formula: prime Magnus would beat prime Fischer ~64% of the time. The gap between the two greatest chess players ever is, by Elo’s own scale, a slight favourite — not a layup.
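The prediction formula above is small enough to check by hand. A minimal sketch (the function name `elo_win_prob` is ours, not part of the tool) that reproduces the 400-point, 800-point and Carlsen-vs-Fischer figures quoted in this section:

```python
def elo_win_prob(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Predicted probability that A beats B under the Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


# A 400-point gap gives the favourite 10:1 odds, i.e. 10/11.
gap_400 = elo_win_prob(400, 0)
# An 800-point gap gives 100:1 odds, i.e. 100/101.
gap_800 = elo_win_prob(800, 0)
# Peak ratings quoted in the text: Carlsen 2882, Fischer 2785.
carlsen_vs_fischer = elo_win_prob(2882, 2785)

print(f"{gap_400:.3f} {gap_800:.3f} {carlsen_vs_fischer:.3f}")
```

Passing a different `scale` is all it takes to move from chess Elo to the S values discussed later.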

After each game, ratings update toward the actual result. In chess:

RA,new = RA,old + K · (actual − expected)

where K (typically 10–40) controls how fast ratings move. League’s actual MMR system is proprietary and not pure Elo (it adds decay, smurf-queue heuristics, and uncertainty bands à la Microsoft’s TrueSkill). But the core mechanic — a numeric skill rating, a logistic prediction curve, update steps — is Elo. Op.gg’s “Top Tier” (peak rank in the current season) is the closest visible proxy, so that’s what the model reads as input.
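For completeness, here is a sketch of the chess-style update step. It assumes the standard zero-sum convention (whatever A gains, B loses), and K = 20 is just an illustrative value inside the 10–40 range mentioned above. The rank-vs-rank model in this document never updates ratings; it only uses the prediction curve.

```python
def elo_expected(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B (same logistic curve as the prediction)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def elo_update(r_a: float, r_b: float, a_won: bool,
               k: float = 20.0, scale: float = 400.0):
    """Return the new (r_a, r_b) after one decisive game."""
    actual = 1.0 if a_won else 0.0
    delta = k * (actual - elo_expected(r_a, r_b, scale))
    # Zero-sum assumption: B moves by the same amount in the other direction.
    return r_a + delta, r_b - delta


# Two equally rated players: the winner gains K/2, the loser drops K/2.
new_a, new_b = elo_update(1500, 1500, a_won=True, k=20)
print(new_a, new_b)
```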

4. Adapting Elo to a 5v5 team game

The chess formula expects one rating per side. League has 5. How do you turn five player MMRs into one team rating?

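Two obvious extremes: take the mean of the five MMRs (every player counts equally), or take the max (only the strongest player counts).
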
Both are wrong in opposite directions. The truth lives between them, and the model needs a knob to dial it.

5. The softmax-Elo trick (Dehpanah-style aggregation)

The clean way to interpolate between mean and max is the log-sum-exp (softmax) aggregation. This is the approach Dehpanah et al. and others use for multiplayer Elo extensions in MOBA / team-shooter research:

Rteam = T · log( Σ exp(Ri / T) )

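where Ri are the 5 player MMRs and T is a temperature parameter: as T → 0 the aggregate approaches the maximum MMR on the team, and as T → ∞ it approaches the mean MMR plus a constant T · log 5, which is the same for both teams and cancels in the rating difference.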

Why softmax is the natural choice: it’s the smooth, differentiable interpolation between min/mean/max that comes out of statistical mechanics (it’s the Boltzmann sum). For MOBA games specifically, the Dehpanah-style research argues that skill in team games is asymmetrically transferable — a stronger player elevates teammates more than weaker players drag them down. The softmax with a moderate T captures this: the strongest player’s MMR gets the heaviest exponential weight, but everyone still contributes.

The OG model picked T = 400 — a guess. We’ll see how the data feels about that.
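To make the role of T concrete, here is a minimal sketch of the log-sum-exp aggregate. The function name and the example lobby are illustrative; subtracting the maximum before exponentiating is a standard numerical-stability trick, not something the write-up specifies.

```python
import math


def team_rating(mmrs, temperature):
    """Log-sum-exp aggregate: T * log(sum(exp(R_i / T)))."""
    peak = max(mmrs)
    # Shift by the max so exp() never overflows; the shift is added back below.
    total = sum(math.exp((r - peak) / temperature) for r in mmrs)
    return peak + temperature * math.log(total)


team = [1125, 49, 49, 49, 49]  # one strong player, four weaker ones
mean = sum(team) / len(team)

# Very small T: the aggregate collapses onto the strongest player.
cold = team_rating(team, 1.0)
# Very large T: the aggregate is the mean plus the constant T * log(5).
hot = team_rating(team, 1e6) - 1e6 * math.log(len(team))

print(cold, hot, mean)
```

Because the T · log(5) term is identical for both five-player teams, only the mean-like part survives in the rating difference at high temperature.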

6. Why discrete buckets instead of continuous LP

In principle each player could be assigned a continuous MMR from their exact peak LP. Two reasons we stuck with 8 discrete buckets:

Gauge choice: the model is anchored so Master 0 LP = MMR 0 exactly. This is a free choice — adding a constant to all 8 MMRs doesn’t change any prediction (predictions only see differences) — so we have to fix one. Master 0 LP being the zero point is a natural pick.
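The gauge freedom is easy to verify numerically. The sketch below uses the OG bucket values with T = 400 and S = 400 on a made-up lobby (the lobby and the function names are illustrative) and shifts every MMR by the same constant:

```python
import math


def team_rating(mmrs, t):
    return t * math.log(sum(math.exp(r / t) for r in mmrs))


def p_ally_wins(ally, enemy, t, s):
    diff = team_rating(enemy, t) - team_rating(ally, t)
    return 1.0 / (1.0 + 10.0 ** (diff / s))


# Hypothetical lobby expressed in OG bucket MMRs.
ally = [0, 0, 250, -150, -350]
enemy = [500, 0, 0, -150, -150]

shift = 1000.0  # any constant added to every player
base = p_ally_wins(ally, enemy, t=400, s=400)
shifted = p_ally_wins([r + shift for r in ally],
                      [r + shift for r in enemy], t=400, s=400)

print(base, shifted)  # the two agree up to floating-point error
```

The shift moves both team ratings by exactly the same amount, so the difference that feeds the Elo curve is untouched.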

7. Time to calibrate

The OG model’s parameters (8 bucket MMRs + T + S) were 10 numbers, all guesses. The plan: collect a large set of (lobby ranks, actual outcome) pairs and let the data tell us what they should be.

Data collection

120 hand-picked EUW players, 15 per bucket × 8 buckets, with at least 50 ranked solo/duo games since the 2026-04-29 season reset. For each player we pulled their first up-to-200 games chronologically, then scraped every unique lobby participant’s op.gg “Top Tier” peak rank — about 44,000 unique players, scraped via headless browser at ~30/min over many hours.

Final dataset: 13,739 games, each with 10 known peak ranks and a known winner. Bucketed by peak LP into the 8 buckets.

Fitting method

Maximum likelihood. For each game, the model predicts P(ally wins) given the lobby’s bucket counts and the current parameters. Each game contributes log P(observed outcome) to the total log-likelihood. We minimise the negative log-likelihood (i.e., maximise the probability the model assigns to the actual outcomes) using L-BFGS-B (a standard quasi-Newton optimiser from scipy).
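As a rough sketch of what that objective looks like in code: the data layout (tuples of bucket names plus a win flag), the function names and the starting values are all assumptions made for illustration, and the Master 0 LP anchoring is omitted for brevity. The `fit` wrapper needs SciPy and is imported lazily so the likelihood itself runs on the standard library alone.

```python
import math


def p_ally_wins(ally, enemy, mmr, t, s):
    """ally/enemy: lists of bucket names; mmr: dict bucket -> MMR."""
    def rating(team):
        return t * math.log(sum(math.exp(mmr[b] / t) for b in team))
    return 1.0 / (1.0 + 10.0 ** ((rating(enemy) - rating(ally)) / s))


def negative_log_likelihood(games, mmr, t, s, eps=1e-12):
    """games: iterable of (ally_buckets, enemy_buckets, ally_won)."""
    nll = 0.0
    for ally, enemy, ally_won in games:
        p = p_ally_wins(ally, enemy, mmr, t, s)
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        nll -= math.log(p if ally_won else 1.0 - p)
    return nll


def fit(games, buckets, start_mmr, t0=400.0, s0=400.0):
    """Hypothetical wrapper: minimise the NLL over bucket MMRs, T and S."""
    from scipy.optimize import minimize  # requires SciPy

    def objective(x):
        mmr = dict(zip(buckets, x[:-2]))
        return negative_log_likelihood(games, mmr, x[-2], x[-1])

    x0 = [start_mmr[b] for b in buckets] + [t0, s0]
    # Keep T and S strictly positive; bucket MMRs are unbounded.
    bounds = [(None, None)] * len(buckets) + [(1.0, None), (1.0, None)]
    return minimize(objective, x0, method="L-BFGS-B", bounds=bounds)


# A mirrored lobby is a coin flip, so one game costs log(2).
mirror = [(["LM"] * 5, ["LM"] * 5, True)]
print(negative_log_likelihood(mirror, {"LM": 0.0}, t=400.0, s=400.0))
```

A real fit would also pin one value to remove the gauge freedom described in §6; without that, the optimiser has a flat direction to wander along.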

Overfitting controls

The model ladder

We didn’t just fit “the model” — we fit five nested versions, each adding flexibility on top of the previous one, and picked the best by cross-validation. This guards against overfitting: a more complex model only wins if it actually helps held-out predictions.

Level | What’s free | # params
L0 | nothing (OG baseline, all values fixed) | 0
L1 | 8 bucket MMRs (anchored to Master 0 LP = 0) | 7
L2 | L1 + T + S (global values) | 9
L3 | L2 + rank-dependent T: T(R) = T0 + a·R | 11
L4 | L3 + rank-dependent S | 13

8. What came out

Brier scores (lower = better calibration)

Model | 5-fold CV Brier | Sealed holdout Brier | Holdout Δ vs OG
L0 (OG) | 0.2425 | 0.2481 | (baseline)
L1 | 0.2265 | 0.2309 | −0.0172
L2 | 0.2260 | 0.2303 | −0.0178
L3 (winner) | 0.2260 | 0.2303 | −0.0178
L4 | 0.2260 | 0.2303 | −0.0178

L3 wins on the strict CV tiebreak, but L2, L3 and L4 are statistically identical — rank-dependence of T and S didn’t help. L2 is the practically simplest equivalent and what the website ships: same 8 bucket MMRs, one global T, one global S.
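For readers who have not met the Brier score: it is the mean squared error between the predicted probability and the 0/1 outcome, so a model that always says 50% scores exactly 0.25. A minimal sketch (the function name is ours) that also reproduces the relative holdout improvement from the table:

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted P(ally wins) and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Always predicting a coin flip scores 0.25 whatever the outcomes are.
coin_flip = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1])

# Relative holdout improvement of L2 over the OG, from the table above.
og_holdout, l2_holdout = 0.2481, 0.2303
improvement_pct = (og_holdout - l2_holdout) / og_holdout * 100

print(coin_flip, round(improvement_pct, 1))
```

The 0.25 reference point is why differences in the second decimal place are meaningful here: there is not much room between a coin flip and the best achievable score on noisy solo-queue games.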

Final calibrated parameters

Parameter | Calibrated | OG (intuition) | Δ
HC MMR | +1125 | +1275 | −150
LC MMR | +930 | +1000 | −70
GM MMR | +793 | +750 | +43
HM MMR | +654 | +500 | +155
MM MMR | +398 | +250 | +148
LM MMR | +49 | 0 | +49
DM MMR | −34 | −150 | +116
EM MMR | −312 | −350 | +38
T (temperature) | 1317 | 400 | +917 (≈3.3×)
S (Elo scale) | 510 | 400 | +110

Side note — the HC vindication: earlier exploratory work on a much smaller apex-only dataset suggested HC ≈ 1109 (via a joint MLE on apex games). The new fit on the full 120-player dataset lands at HC = +1125 — within 16 MMR of that prior. Two independent datasets, same answer. Good sign.

9. Worked example — putting the calibrated model to work

Five Low Masters (each with MMR +49) on the ally team. Four Low Masters plus one High Challenger (MMR +1125) on the enemy team. Using the calibrated T = 1317 and S = 510:

Ally rating = 1317 · log( 5 · exp(49/1317) ) = 1317 · log(5.190) = 2169
Enemy rating = 1317 · log( 4·exp(49/1317) + exp(1125/1317) ) = 1317 · log(6.501) = 2465

Difference: the enemy team is +297 ahead (296.8 before the two ratings are rounded). Plug into the Elo formula:

P(ally wins) = 1 / (1 + 10^(297 / 510)) = 1 / (1 + 3.82) = 0.21

So one HC on the enemy side drops your win rate from 50% (mirrored lobbies) to 21%. The same scenario under the OG model (T = 400, S = 400, HC = +1275, LM = 0) would have given you ~2% win rate. The OG T = 400 made carries far too dominant — a single Challenger looked like an automatic loss. The calibrated T = 1317 reflects what the data actually shows: solo carries matter, but less than the intuition-based model assumed.
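The whole worked example fits in a few lines of code. This is a sketch with illustrative function names; the parameter values are the calibrated and OG numbers quoted above.

```python
import math


def team_rating(mmrs, t):
    """Softmax (log-sum-exp) team rating with temperature t."""
    return t * math.log(sum(math.exp(r / t) for r in mmrs))


def p_ally_wins(ally, enemy, t, s):
    """Elo curve applied to the difference of the two team ratings."""
    diff = team_rating(enemy, t) - team_rating(ally, t)
    return 1.0 / (1.0 + 10.0 ** (diff / s))


# Calibrated model: five LM (+49) vs four LM plus one HC (+1125).
calibrated = p_ally_wins([49] * 5, [49] * 4 + [1125], t=1317, s=510)

# OG model: LM = 0, HC = +1275, T = 400, S = 400.
og = p_ally_wins([0] * 5, [0] * 4 + [1275], t=400, s=400)

print(f"calibrated: {calibrated:.2f}, OG: {og:.2f}")  # roughly 0.21 and 0.02
```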

10. What this tells us — human intuition

Diamond and Low Master are basically the same tier

DM = −34, LM = +49 — just 83 MMR apart. Translated into the units you actually see on the client: the median Low Master in our data peaks around Master 300 LP, while the median Diamond peaks around Diamond 2. That’s roughly 4 divisions of climbing distance on the ladder.

By the model: in a hypothetical 1v1 between two players at their bucket medians, the LM wins ~59% of the time, barely above coin-flip. (P = 1 / (1 + 10^(−83/510)) = 0.593.) Four divisions of LP between Diamond 2 and Master 300 buys the higher one almost nothing in true skill. The Master crest at that boundary is mostly cosmetic.

The Emerald cliff

EM = −312, DM = −34 — a 278 MMR gap, 3.3× bigger than the DM↔LM gap above. The median EM peaks around Emerald 2 and the median Diamond around Diamond 2 — about the same 4-division LP distance as DM↔LM, but more than 3× the skill jump.

Same LP distance, vastly different skill jump: P(higher wins 1v1) = 0.78 here, vs only 0.59 for DM↔LM. The real cliff in solo Q lives at the Emerald/Diamond boundary, not the Diamond/Master one.

And the EM bucket lumps everyone Emerald-and-below together — Platinum, Gold, Silver, Bronze, Iron, Unranked. Inside that bucket, the effective skill depends on the player mix; see the caveat in §11.

The compressed apex

HC dropped from the OG +1275 to +1125. The calibrated apex spread is narrower than originally assumed: High Challenger is only ~195 MMR above Low Challenger (+1125 vs +930), not 275. Long-standing intuition about how spread out the apex really is was too wide.

What T = 1317 means in plain terms

T controls the carry-vs-team-play balance. In the softmax sum, each player’s weight is exp(R/T). The intuitive question: how much does a strong player actually carry the team rating? Some concrete examples, all with 4 Low Master teammates (Master ~300 LP):

Translation: in this model, no realistic carry in a Master+ lobby genuinely “1v9s”. The strongest player tilts the sum, but 4-vs-1 weight-by-tier always matters more than any single player’s ceiling.

The OG T = 400 was about 3.3× smaller, so it weighted the same gaps far more sharply: a +349 MMR carry got exp(349/400) ≈ 2.4× weight (vs the new model’s 1.3×). That was way too max-heavy. The data says solo carries are far rarer than the intuition believed. Average team strength matters more than your best player’s ceiling.
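The weight comparison can be reproduced directly. The sketch below also computes each player's share of the softmax sum for a team with one carry sitting 349 MMR above four equal teammates; the helper name and that team layout are illustrative assumptions.

```python
import math


def softmax_weights(mmrs, t):
    """Each player's share of the softmax sum: exp(R_i/T) / sum_j exp(R_j/T)."""
    peak = max(mmrs)
    raw = [math.exp((r - peak) / t) for r in mmrs]  # shifted for stability
    total = sum(raw)
    return [w / total for w in raw]


gap = 349  # carry's MMR lead over each teammate
ratio_og = math.exp(gap / 400)    # carry weight relative to one teammate, OG T
ratio_new = math.exp(gap / 1317)  # same ratio under the calibrated T

team = [gap, 0, 0, 0, 0]
share_og = softmax_weights(team, 400)[0]
share_new = softmax_weights(team, 1317)[0]

print(f"weight ratio: {ratio_og:.2f} vs {ratio_new:.2f}")
print(f"carry share of the sum: {share_og:.2f} vs {share_new:.2f}")
```

Under either temperature the four teammates together still hold most of the sum; the calibrated T just pushes the carry's share closer to the even-split value of 0.2.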

What S = 510 means in plain terms

S is the Elo scale: every S-point MMR gap multiplies the favourite’s odds by 10×. With S = 510, anchored to concrete bucket pairs:

S = 400 (the OG value) would have made every one of those gaps feel sharper — HM vs LM goes from 92% under S = 510 to 96% under S = 400. The calibrated S = 510 says rank gaps in solo Q convert to win rate less steeply than classical chess Elo predicts. That tracks — solo Q has more randomness (4 randos per side) than chess does.

11. Caveats and known limitations

The EM bucket is not uniform

The EM bucket sweeps in anyone with a peak below Diamond — Emerald, Plat, Gold, Silver, Bronze, Iron, Unranked. The calibration data was ~83% Emerald inside the EM bucket. But in lobbies where the EM-bucket share is heavier on Plat / Gold / lower (e.g., smurf accounts queuing in low MMR), the bucket’s effective skill is lower than the calibrated −312, and the model will over-credit the higher-ranked side for beating them.

For analysing such cases (e.g., a Master smurf grinding through Emerald MMR) we extrapolate sub-Diamond tier MMRs from the EM→DM slope of the fitted curve. That gives:

Emerald: −312  |  Plat: −591  |  Gold: −859  |  Silver: −1126  |  Bronze: −1395  |  Iron: −1660

These aren’t part of the calibrated 8-bucket model that ships on the website — they’re an analysis tool for the cases where the EM bucket’s real composition diverges sharply from the calibration mix.

S could be rank-dependent

Earlier work hinted S might grow with rank — fits at the Low-Master end suggested S ≈ 400, fits on Challenger data suggested S ≈ 1325. The L4 model tried fitting S(R) = S0 + a·R but the slope came out essentially zero, because we don’t have enough Challenger games in the dataset to drive the rank-dependence. The shipping value S = 510 is reasonable globally but possibly underfits the very top. Open follow-up.

Bucketing loses information

A Master 5 LP player and a Master 499 LP player both bucket as LM and get the same MMR. They’re not the same. A continuous-LP fit (using exact peak LP through a monotone spline) is the natural next step and would improve sharpness, especially near bucket boundaries.

Op.gg’s “Top Tier” is the input, with all its quirks

We use op.gg’s Top Tier (peak rank in the current season). This is lagging for players who just hit a new high mid-season and is missing entirely for renamed accounts (~1.4% of slots). For unknowns we use a lobby-mean fallback: assign the unknown player the bucket whose MMR is closest to the average of the 9 known peaks in their game.
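A sketch of that fallback, under the assumption that "average of the known peaks" means the mean of the known players' calibrated bucket MMRs. The function name and the example lobby are illustrative, and ties are broken by dictionary order here, which the write-up does not specify.

```python
# Calibrated bucket MMRs from the parameter table in §8.
BUCKET_MMR = {
    "HC": 1125, "LC": 930, "GM": 793, "HM": 654,
    "MM": 398, "LM": 49, "DM": -34, "EM": -312,
}


def fallback_bucket(known_buckets, bucket_mmr=BUCKET_MMR):
    """Bucket whose MMR is closest to the mean MMR of the known players."""
    if not known_buckets:
        raise ValueError("need at least one known player")
    mean = sum(bucket_mmr[b] for b in known_buckets) / len(known_buckets)
    # min() keeps the first bucket in dict order if two are equally close.
    return min(bucket_mmr, key=lambda b: abs(bucket_mmr[b] - mean))


# Nine known players, one unknown: mean MMR is about +60, nearest bucket is LM.
known = ["LM"] * 5 + ["DM"] * 3 + ["MM"]
print(fallback_bucket(known))
```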

12. TL;DR

The model is an Elo system like chess, extended from 1v1 to 5v5 by replacing each side’s single rating with a softmax aggregate of the 5 players’ MMRs. Two knobs: T (temperature, how much the strongest player carries) and S (Elo scale, how rating gaps convert to win probability).

Started from a hand-tuned baseline (the OG model), fit 8 bucket MMRs + T + S on 13,739 EUW games across 120 players spanning Emerald to Challenger, with 25% player-level holdout and 5-fold CV. Result: 7.2% improvement in holdout Brier vs the OG, well-calibrated reliability across the prediction curve.

Three takeaways the numbers force you to accept:
