Design Notes

Designing a Score System That Doesn't Frustrate Players

brainfit, game-design, algorithm, ux

If you have ever played a competitive game, you know the feeling. You grind for hours, climb your rank carefully, then lose three games in a row and watch your progress evaporate. The score you earned over a dozen wins disappears in a fraction of the time. You close the app. Maybe you open it again tomorrow. Maybe you don't.

When I was building BrainFit — a brain training RPG with 32 mini-games across six cognitive areas — I knew that the score system would be the emotional backbone of the entire experience. Not the games themselves, not the space-themed visuals, not the gacha shop. The score. Because the score is the player's mirror. It tells them whether they are getting better or worse, and that single signal determines whether they keep playing or quit.

I spent weeks designing what I call the BQ (Brain Quotient) system: a scoring mechanism that uses asymmetric EMA smoothing and three layers of decline protection to reward improvement while cushioning losses. This post explains the reasoning behind every design decision, with the actual production code from BrainFit's bq_calculator.dart.

What Is BQ?

BQ stands for Brain Quotient. It is BrainFit's player-facing skill metric — the number that players see on their profile, track over time, and compare with friends.

BQ is structured around six cognitive areas, each corresponding to a category of mini-games:

| Area | Code | Games |
|------|------|-------|
| Memory | WIS | PatternMatrix, CardFlip, FaceMemory, SoundSequence, NBack, MemoryPalace |
| Focus | AGI | ColorStroop, SpeedTap, TrackingBall, DualTask, GoNoGo, AttentionalBlink |
| Logic | INT | NumberChain, BlockPuzzle, Balance, PatternMatching, MatrixReasoning, TowerPlanning |
| Language | CHA | WordScramble, AssociationChain, FillBlank, CategorySort, VerbalFluency, VerbalAnalogy |
| Speed | SPD | StarlightSearch, MeteorSort, SignalSwitch, BlackholeDodge |
| Spatial | SPA | PlanetRotation, SpaceMaze, NebulaAssembly, OrbitPredict |

Each area has its own BQ score ranging from 1 to 100. The overall BQ is the average of all active areas — that is, areas where the player has actually played at least one game. This is important: if a player has only played Memory and Logic games, their overall BQ is the average of those two areas, not dragged down to near zero by the four areas they have not touched yet.

Here is the actual implementation:

static int totalBq({
  required int wis,
  required int agi,
  required int int_,
  required int cha,
  int spd = 0,
  int spa = 0,
}) {
  final values = [wis, agi, int_, cha, spd, spa];
  final active = values.where((v) => v > 0).toList();
  if (active.isEmpty) return 0;
  return (active.reduce((a, b) => a + b) / active.length).round();
}

This "active areas only" design was a deliberate choice. BrainFit starts with four core areas (WIS, AGI, INT, CHA) and unlocks Speed and Spatial later. Without the active-area filter, unlocking a new area would immediately tank the player's overall BQ from, say, 65 to roughly 43 — punishing them for exploring new content. That would be a terrible incentive structure.
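To see the effect, here is a quick Python transcription of totalBq (a sketch for illustration; the production code is the Dart above):

```python
import math

def total_bq(wis, agi, int_, cha, spd=0, spa=0):
    """Average of active (non-zero) areas, mirroring BqCalculator.totalBq."""
    active = [v for v in (wis, agi, int_, cha, spd, spa) if v > 0]
    if not active:
        return 0
    # Dart's .round() rounds half away from zero; floor(x + 0.5) matches it
    # for the positive values in play here.
    return math.floor(sum(active) / len(active) + 0.5)

# Hypothetical player with four active areas:
print(total_bq(70, 60, 65, 65))        # 65
# Unlocking Speed/Spatial (still at 0) leaves the overall BQ untouched:
print(total_bq(70, 60, 65, 65, 0, 0))  # 65
# A naive six-way average would have shown (70+60+65+65)/6, roughly 43.
```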

The Elo Foundation

Under the hood, BQ is not calculated directly from game scores. Each area tracks player skill using an Elo rating system — the same mathematical framework used in chess rankings. The Elo system underneath BQ is covered in its own post, so I will keep the summary brief here.

Each area has an independent Elo rating ranging from 600 to 1800, starting at 1000. When a player answers problems correctly or incorrectly, their Elo updates using the standard expected-score formula with asymmetric K-factors. The K-factors are already designed to slow down losses: the K-factor for incorrect answers is half the K-factor for correct answers (16 vs 32 for new players, 8 vs 16 for experienced players).
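The update step looks roughly like this Python sketch. The expected-score formula is standard Elo; the exact internals of EloService (per-problem ratings, clamping, experience thresholds) are my assumptions based on the description above:

```python
def expected_score(player_elo, problem_elo):
    # Standard Elo expectation: probability of a correct answer given the
    # gap between player rating and problem rating.
    return 1.0 / (1.0 + 10 ** ((problem_elo - player_elo) / 400))

def update_elo(elo, problem_elo, correct, experienced=False):
    # Asymmetric K-factors: incorrect answers use half the K of correct ones
    # (16 vs 32 for new players, 8 vs 16 for experienced players).
    k = (16 if experienced else 32) if correct else (8 if experienced else 16)
    actual = 1.0 if correct else 0.0
    return elo + k * (actual - expected_score(elo, problem_elo))

# Evenly matched problem (expectation 0.5): a correct answer gains twice
# what an incorrect one loses.
print(update_elo(1000, 1000, True))   # 1016.0
print(update_elo(1000, 1000, False))  # 992.0
```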

BQ is a linear mapping from this Elo range to a 1-100 scale:

static int eloToBq(double elo) {
  final clamped = elo.clamp(EloService.minElo, EloService.maxElo);
  return ((clamped - EloService.minElo) /
              (EloService.maxElo - EloService.minElo) *
              99 +
          1)
      .round()
      .clamp(1, 100);
}

An Elo of 600 maps to BQ 1. An Elo of 1800 maps to BQ 100. An Elo of 1200 (the midpoint) maps to BQ 51. The mapping is intentionally linear — no curves, no thresholds. The Elo system already handles the nonlinear dynamics of skill estimation. BQ just provides a human-friendly scale.
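Those endpoints are quick to verify with a Python transcription of eloToBq (a sketch; note that Dart's .round() rounds .5 away from zero, which is what lands the midpoint on 51 rather than 50):

```python
import math

MIN_ELO, MAX_ELO = 600.0, 1800.0

def elo_to_bq(elo):
    clamped = min(max(elo, MIN_ELO), MAX_ELO)
    bq = (clamped - MIN_ELO) / (MAX_ELO - MIN_ELO) * 99 + 1
    # floor(x + 0.5) emulates Dart's half-away-from-zero .round()
    return min(max(math.floor(bq + 0.5), 1), 100)

print(elo_to_bq(600))   # 1
print(elo_to_bq(1200))  # 51  (raw 50.5 rounds half-up)
print(elo_to_bq(1800))  # 100
print(elo_to_bq(0))     # 1   (clamped to the Elo floor first)
```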

So why not just show the Elo rating directly? Two reasons. First, a number like "1,247" does not mean anything to most players. Second, and more importantly, raw Elo fluctuates too much between games. A player might go from 1,247 to 1,231 after one bad game. That 16-point drop is statistically insignificant, but it feels like a failure. Players read meaning into numbers that the system did not intend. BQ exists as a smoothed, protected layer on top of Elo — a buffer between the statistical reality and the player's emotional experience.

Why Asymmetric Smoothing: Loss Aversion

The key psychological insight behind BQ's design comes from Daniel Kahneman and Amos Tversky's prospect theory. Their research demonstrated that losses are psychologically approximately twice as painful as equivalent gains are pleasurable. Losing $100 feels roughly twice as bad as gaining $100 feels good. This phenomenon — loss aversion — is one of the most replicated findings in behavioral economics.

In the context of a brain training app, loss aversion means that a player who gains 5 BQ points over three days and then loses 5 points in one bad session does not feel like they are back to even. They feel like they are behind. The loss dominates their emotional accounting. If this happens repeatedly, the player learns that playing the game is emotionally risky. The rational response to that lesson is to stop playing.

This is why BQ uses asymmetric Exponential Moving Average (EMA) smoothing. The core idea is simple: when the player's true skill (raw BQ derived from Elo) goes up, the displayed BQ follows relatively quickly. When it goes down, the displayed BQ follows slowly.

Here is the implementation:

/// Asymmetric EMA: ascent α=0.4 (fast), descent α=0.15 (slow).
/// Minimum +1 guaranteed on ascent.
static int smoothBq(int currentDisplayBq, int newRawBq) {
  if (newRawBq == currentDisplayBq) return currentDisplayBq;
  final alpha = newRawBq > currentDisplayBq ? 0.4 : 0.15;
  final smoothed = alpha * newRawBq + (1 - alpha) * currentDisplayBq;
  if (newRawBq > currentDisplayBq) {
    return max(currentDisplayBq + 1, smoothed.round()).clamp(1, 100);
  }
  return smoothed.round().clamp(1, 100);
}

The EMA formula itself is standard:

$$ \text{smoothed} = \alpha \times \text{newRaw} + (1 - \alpha) \times \text{currentDisplay} $$

The asymmetry is in alpha. For increases, alpha is 0.4 — the displayed BQ moves 40% of the way toward the new raw value in each update. For decreases, alpha is 0.15 — only 15% of the way. This means upward movement is roughly 2.7 times faster than downward movement.
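A Python sketch of smoothBq makes the asymmetry concrete: the same 10-point gap produces very different single-update movements depending on direction.

```python
import math

def smooth_bq(current_display, new_raw):
    """Python transcription of smoothBq (Dart-style half-up rounding)."""
    if new_raw == current_display:
        return current_display
    alpha = 0.4 if new_raw > current_display else 0.15
    rounded = math.floor(alpha * new_raw + (1 - alpha) * current_display + 0.5)
    if new_raw > current_display:
        rounded = max(current_display + 1, rounded)  # minimum +1 on ascent
    return min(max(rounded, 1), 100)

print(smooth_bq(50, 60))  # 54: a 10-point gain moves 4 points up
print(smooth_bq(60, 50))  # 59: a 10-point loss moves only 1 point down
print(smooth_bq(50, 51))  # 51: the +1 floor keeps tiny gains visible
```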

Why These Specific Alpha Values?

I chose alpha = 0.4 for ascent because I wanted improvement to feel responsive. If a player has been working hard and their Elo is climbing, they deserve to see that reflected relatively quickly. But not instantly — instant updates would make the score feel jittery. An alpha of 0.4 provides a smooth ramp up that still feels connected to real performance. When a player with a displayed BQ of 50 achieves a raw BQ of 60, the displayed BQ moves: 50 → 54 → 56 → 58 (rounding to an integer at each step). Three updates close 80% of the gap.

I chose alpha = 0.15 for descent because I wanted to approximate the inverse of loss aversion. If losses feel twice as bad, then losses should happen roughly half as fast. An alpha of 0.15 is not exactly half of 0.4, but it accounts for the fact that the EMA compounds: the effective half-life for descent is about 4.3 updates versus 1.4 for ascent, roughly a 3:1 ratio. That is slightly stronger than the 2:1 K-factor asymmetry in the underlying Elo system, and the two effects compound on the way down.

There is also a subtle but important detail: on ascent, the minimum increase is +1. This is the max(currentDisplayBq + 1, smoothed.round()) line. Without this floor, a player whose raw BQ moves from 50 to 51 would see a smoothed value of 50.4, which rounds back to 50. The player improved but sees no change. That is demoralizing. The +1 minimum ensures that any real improvement is always visible.

Three Layers of Decline Protection

Asymmetric smoothing handles the normal case well: gradual declines during a mediocre session are softened, while improvements are clearly reflected. But there are edge cases where smoothing alone is not enough. A player having a terrible day, or a child who accidentally plays games way above their level, or someone returning after a long break whose skills have temporarily deteriorated — these scenarios can produce rapid, deep drops that no amount of smoothing fully absorbs.

For these cases, I built three layers of decline protection into the BqProtection class. Each layer addresses a different failure mode.

Layer 1: Lose Streak Dampening

/// Lose streak decay threshold.
static const int loseStreakThreshold = 3;

// Inside applyProtection():
// 1. Lose streak: after 3 consecutive losses, cap decline at -1/game
if (consecutiveLosses >= loseStreakThreshold) {
  protected = max(protected, currentBq - 1);
}

After three consecutive losses, the maximum BQ decline is capped at -1 per game. This addresses the frustration spiral: when a player is clearly struggling, further punishment does not help. They are probably already tilted. Limiting the bleeding to -1 per game means they cannot freefall, and a single good game can start the recovery.

Why three losses? Because one or two losses in a row is normal variance. Three is a signal that something is systematically wrong — the difficulty is too high, the player is distracted, or they are having an off day. At that point, the priority shifts from accurate skill tracking to emotional safety.

Why -1 instead of 0? Because completely freezing the score during a lose streak would break the trust contract with the player. BQ should still mean something. If a player keeps losing and their BQ does not change at all, the number feels fake. A decline of -1 per game is small enough to avoid frustration but large enough to maintain credibility.

Layer 2: Daily Floor

/// Maximum daily drop.
static const int dailyFloorDrop = 5;

// 3. Daily floor: today's start BQ - 5
final dailyFloor = (todayStartBq - dailyFloorDrop).clamp(1, 100);

No matter what happens during a single day, a player's BQ cannot drop more than 5 points below where they started the day. If you wake up with a BQ of 72, the absolute worst your BQ can reach by end of day is 67.

This layer protects against the binge-loss scenario. Some players, when they start losing, play more games trying to "win it back" — the same psychological trap that affects gamblers. Without a daily floor, these players can dig themselves into a deep hole in a single frustrated session. The daily floor says: "You can have a bad day. The damage is limited. Come back tomorrow fresh."

Five points is about a week's worth of typical improvement for an active player. Losing a week's progress in one day hurts, but it is recoverable. It is big enough that the score still has meaning within a single day, but small enough that no single session can be catastrophic.

Layer 3: All-Time Floor

/// All-time floor ratio relative to personal best.
static const double historyFloorRatio = 0.8;

// 2. All-time high BQ floor at 80%
final historyFloor =
    (allTimeHighBq * historyFloorRatio).round().clamp(1, 100);

A player's BQ can never drop below 80% of their personal best. If you once reached a BQ of 90, your BQ will never go below 72 (90 * 0.8 = 72), no matter how many bad games you play, no matter how many days you have.

This is the nuclear option — the absolute floor that prevents catastrophic loss of progress. It exists primarily for returning players. Someone who played BrainFit daily for three months, reached a BQ of 85, took a six-month break, and comes back with rusty skills should not see their BQ crater to 30. Their historical achievement deserves permanent recognition. The 80% floor says: "You proved you could do this once. We'll never let the score suggest otherwise."

Why 80%? I tested several thresholds. At 90%, the floor kicks in too early and makes the score feel sticky — players cannot tell if they are genuinely declining or being artificially propped up. At 70%, the floor rarely activates and does not provide enough protection for returning players. 80% hits the sweet spot: it allows meaningful short-term fluctuation while preventing the kind of deep, demoralizing drops that make players feel like their past efforts were wasted.

How the Layers Combine

The three protection layers do not replace each other — they all apply simultaneously, and the highest floor wins:

static int applyProtection({
  required int newBq,
  required int currentBq,
  required int todayStartBq,
  required int allTimeHighBq,
  required int consecutiveLosses,
}) {
  // If rising, no protection needed
  if (newBq >= currentBq) return newBq.clamp(1, 100);

  var protected = newBq;

  // 1. Lose streak: after 3 consecutive losses, cap at -1
  if (consecutiveLosses >= loseStreakThreshold) {
    protected = max(protected, currentBq - 1);
  }

  // 2. All-time high floor at 80%
  final historyFloor =
      (allTimeHighBq * historyFloorRatio).round().clamp(1, 100);

  // 3. Daily floor: today's start - 5
  final dailyFloor = (todayStartBq - dailyFloorDrop).clamp(1, 100);

  // Apply the highest floor
  return max(protected, max(historyFloor, dailyFloor)).clamp(1, 100);
}

Notice the first line: if the new BQ is equal to or higher than the current BQ, protection is bypassed entirely. Protection only activates on decline. This is critical — we never want protection to interfere with improvement.

Here is a concrete example. A player has:

  • Current BQ: 75
  • Today's starting BQ: 78
  • All-time high BQ: 85
  • Consecutive losses: 5 (on a bad streak)

After a loss, the raw BQ calculation produces 69. Here is what each layer does:

  1. Lose streak dampening: 5 consecutive losses exceeds the threshold of 3, so the decline is capped at -1. Protected value: max(69, 75 - 1) = 74.
  2. Daily floor: 78 - 5 = 73. The BQ cannot go below 73 today.
  3. All-time floor: 85 * 0.8 = 68. The BQ can never go below 68.

The highest floor is 74 (from lose streak dampening), so the player's BQ becomes 74. Without protection, it would have been 69 — a 6-point drop that would feel punishing after an already frustrating streak. With protection, it is a 1-point drop that acknowledges the loss without being demoralizing.
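A Python transcription of applyProtection (a sketch of the Dart shown below in full) reproduces the example, and also shows which floor takes over when the lose-streak cap is not in play:

```python
def apply_protection(new_bq, current_bq, today_start_bq,
                     all_time_high_bq, consecutive_losses):
    # Mirrors BqProtection.applyProtection with the constants from above.
    if new_bq >= current_bq:
        return min(max(new_bq, 1), 100)      # rising: no protection needed
    protected = new_bq
    if consecutive_losses >= 3:              # layer 1: cap decline at -1
        protected = max(protected, current_bq - 1)
    history_floor = round(all_time_high_bq * 0.8)   # layer 3: all-time floor
    daily_floor = today_start_bq - 5                # layer 2: daily floor
    return min(max(protected, history_floor, daily_floor, 1), 100)

# The worked example: the lose-streak cap provides the highest floor (74).
print(apply_protection(69, 75, 78, 85, consecutive_losses=5))  # 74
# With only 2 consecutive losses, the daily floor (73) takes over instead.
print(apply_protection(69, 75, 78, 85, consecutive_losses=2))  # 73
```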

Legacy Migration

BrainFit did not always use the Elo-based BQ system. The original version used a simpler scoring mechanism with BQ values ranging from 0 to 999. When I rebuilt the scoring system around Elo, I needed a migration path that would not alienate existing players.

The migration maps the old 0-999 range linearly onto the Elo range of 600-1800:

/// Legacy BQ (0-999) → Elo migration.
static double migrateToElo(int oldBq) => EloService.migrateOldBq(oldBq);

// In EloService:
static double migrateOldBq(int oldBq) {
  assert(oldBq >= 0 && oldBq <= 999, 'oldBq must be in range 0-999');
  return _minElo + (oldBq / 999) * (_maxElo - _minElo);
}

An old BQ of 0 maps to Elo 600 (BQ 1 in the new system). An old BQ of 999 maps to Elo 1800 (BQ 100). An old BQ of 500 maps to roughly Elo 1200 (BQ 51). The linearity preserves relative rankings: if player A had a higher old BQ than player B, player A will have a higher new BQ too.
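A Python transcription of the migration (sketch) makes the endpoint and ordering guarantees easy to check:

```python
MIN_ELO, MAX_ELO = 600.0, 1800.0

def migrate_old_bq(old_bq):
    """Linear map from the legacy 0-999 scale onto the 600-1800 Elo range."""
    assert 0 <= old_bq <= 999, 'oldBq must be in range 0-999'
    return MIN_ELO + (old_bq / 999) * (MAX_ELO - MIN_ELO)

print(migrate_old_bq(0))    # 600.0
print(migrate_old_bq(999))  # 1800.0
# Linearity preserves order: a higher old BQ always means a higher new Elo.
assert migrate_old_bq(700) > migrate_old_bq(300)
```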

This matters for the social features. BrainFit has galaxies (guilds), friend rankings, and PvP challenges. A migration that scrambled the relative order of player scores would create chaos in every leaderboard.

Planet Evolution: Making the Score Tangible

One of the most satisfying design decisions I made was connecting BQ to a visible, evolving planet. In BrainFit's space theme, each player has a planet that evolves through five stages as their BQ improves:

| Stage | Name | Elo Range | Approx BQ |
|-------|------|-----------|-----------|
| 1 | Stardust | < 800 | 1-17 |
| 2 | Asteroid | 800-999 | 17-34 |
| 3 | Planet | 1000-1199 | 34-51 |
| 4 | Star System | 1200-1499 | 51-75 |
| 5 | Galaxy | 1500+ | 75-100 |

The planet evolution stages are tied to the Elo rating thresholds, not directly to BQ. This means the decline protection on BQ does not artificially inflate evolution stages — if a player's true Elo drops below a threshold, their planet can regress. However, because the Elo system itself has asymmetric K-factors (losses produce half the Elo change of gains), even the underlying rating system is gentler on descent.

This creates a nice psychological dynamic. The BQ number on the profile is heavily protected — it rarely drops fast, and it never drops catastrophically. But the planet visual provides a more honest signal about the player's current skill trajectory. Players who are genuinely declining will eventually see their planet stage drop, which serves as a gentle nudge to practice more without the sharp sting of a rapidly falling number.
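The stage lookup itself is just a threshold ladder over raw Elo. A Python sketch based on the table above (the exact boundary handling in the app is my assumption):

```python
# Thresholds from the evolution table, highest first; first match wins.
PLANET_STAGES = [
    (1500, 5, 'Galaxy'),
    (1200, 4, 'Star System'),
    (1000, 3, 'Planet'),
    (800, 2, 'Asteroid'),
    (0, 1, 'Stardust'),
]

def planet_stage(elo):
    for threshold, stage, name in PLANET_STAGES:
        if elo >= threshold:
            return stage, name
    return 1, 'Stardust'

print(planet_stage(750))   # (1, 'Stardust')
print(planet_stage(1250))  # (4, 'Star System')
print(planet_stage(1500))  # (5, 'Galaxy')
```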

The Full Pipeline

Let me trace the complete journey from a game result to the displayed BQ, because seeing how all the pieces fit together reveals the design philosophy:

  1. Game ends. The player answered 15 problems correct, 5 wrong, average level 12.
  2. Elo updates. EloService.updateEloForGame() processes each answer sequentially with asymmetric K-factors. The player's WIS Elo moves from 1,150 to 1,167.
  3. Raw BQ calculated. BqCalculator.eloToBq(1167) maps this to a raw BQ of 48.
  4. Smoothing applied. If the current displayed WIS BQ is 45, smoothBq(45, 48) uses alpha = 0.4 (ascent) to produce 46 (with the +1 minimum guarantee).
  5. Protection check. Since the new BQ (46) is higher than current (45), BqProtection.applyProtection() returns it unchanged — protection only activates on decline.
  6. Overall BQ recalculated. totalBq() averages all active area BQs.
  7. Player sees updated BQ on profile. The planet pulses with a level-up animation if a threshold was crossed.

Now the same pipeline for a loss:

  1. Game ends. 5 correct, 15 wrong, average level 12.
  2. Elo updates. With asymmetric K (halved for losses), the WIS Elo drops from 1,150 to 1,135 — a drop of only 15, compared to a potential gain of 30 for an equivalent win.
  3. Raw BQ calculated. eloToBq(1135) = 45.
  4. Smoothing applied. Current displayed WIS BQ is 48. smoothBq(48, 45) uses alpha = 0.15 (descent): 0.15 * 45 + 0.85 * 48 = 47.55, rounds to 48. The displayed BQ does not change at all.
  5. Protection check. Since 48 >= 48, protection does not activate.

This is the system working as intended. A single bad game against a moderate improvement baseline produces no visible change. The player's true skill estimate (Elo) adjusted appropriately, but the player-facing score absorbed the fluctuation. It takes sustained poor performance — multiple games, over multiple sessions — before the displayed BQ starts to visibly decline. And even then, the three protection layers ensure the decline is bounded.
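Chaining the pieces together, a self-contained Python sketch of steps 3-5 (eloToBq → smoothBq → applyProtection) reproduces both walk-throughs; the today-start and all-time-high values in the demo calls are hypothetical.

```python
import math

MIN_ELO, MAX_ELO = 600.0, 1800.0

def _half_up(x):
    return math.floor(x + 0.5)  # Dart-style .round() for positive values

def elo_to_bq(elo):
    clamped = min(max(elo, MIN_ELO), MAX_ELO)
    bq = (clamped - MIN_ELO) / (MAX_ELO - MIN_ELO) * 99 + 1
    return min(max(_half_up(bq), 1), 100)

def smooth_bq(current, new_raw):
    if new_raw == current:
        return current
    alpha = 0.4 if new_raw > current else 0.15
    rounded = _half_up(alpha * new_raw + (1 - alpha) * current)
    if new_raw > current:
        rounded = max(current + 1, rounded)  # minimum +1 on ascent
    return min(max(rounded, 1), 100)

def apply_protection(new_bq, current_bq, today_start, all_time_high, losses):
    if new_bq >= current_bq:
        return min(max(new_bq, 1), 100)
    protected = new_bq
    if losses >= 3:
        protected = max(protected, current_bq - 1)
    return min(max(protected, round(all_time_high * 0.8),
                   today_start - 5, 1), 100)

def display_bq_after_game(elo, displayed, today_start, all_time_high, losses):
    raw = elo_to_bq(elo)                         # step 3: Elo -> raw BQ
    smoothed = smooth_bq(displayed, raw)         # step 4: asymmetric EMA
    return apply_protection(smoothed, displayed, # step 5: floors on decline
                            today_start, all_time_high, losses)

# Win walk-through: Elo 1167, displayed 45 -> moves up to 46.
print(display_bq_after_game(1167, 45, 45, 50, 0))  # 46
# Loss walk-through: Elo 1135, displayed 48 -> the dip is fully absorbed.
print(display_bq_after_game(1135, 48, 48, 50, 0))  # 48
```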

Design Principles

Looking back at the BQ system, I see three principles that guided every decision:

The displayed score is a communication tool, not a measurement tool. Elo is the measurement. BQ is the communication. Elo needs to be statistically accurate. BQ needs to be emotionally accurate. These are different goals, and they require different properties. Elo should update fast and symmetrically to maintain calibration. BQ should update asymmetrically and with protection to maintain motivation.

Frustration compounds; encouragement does not. A player who gains 3 BQ points and then loses 3 BQ points does not feel neutral — they feel like they lost ground. A player who gains 3 points and keeps them feels slightly better. This asymmetry in emotional impact justifies the asymmetry in the score mechanics. We are not lying to the player about their skill — the underlying Elo is honest. We are choosing to communicate improvements quickly and declines gradually.

Safety nets should be invisible. Most players will never consciously notice the protection layers. They will simply feel that BQ is "stable but responsive" — it goes up when they play well and sort of stays put when they play badly. Only in extreme cases (five losses in a row, massive daily drops) do the safety nets activate visibly. And even then, the player does not see the mechanism — they just see a score that refuses to freefall. That is the goal: the protection should feel natural, not artificial.

What I Would Do Differently

The system is not perfect. One trade-off I am still debating is the daily floor of -5. For highly active players who play 20+ games a day, a 5-point daily floor might be too generous — it means their score can barely move down no matter how poorly they play. I have considered making the daily floor proportional to the number of games played that day, but the added complexity does not yet seem worth it.

Another consideration is that the all-time floor of 80% can create a "soft ceiling" effect. A player who reached BQ 100 can never drop below 80, which means their BQ score has less dynamic range for the rest of their time with the app. This has not been a problem in practice because reaching BQ 100 requires an Elo near 1800 — only the most skilled and dedicated players get there — but it is something I am monitoring.

If you are designing a scoring system for your own app, the key takeaway is this: separate the measurement from the display. Track skill accurately under the hood. Then invest serious thought in how you present that skill to the player. Because at the end of the day, a score system is not really about measuring skill. It is about making the player want to keep playing.
