Technical

Implementing FSRS-5 Spaced Repetition From Scratch in Flutter

Tags: bloomcard · fsrs · algorithm · spaced-repetition

When I started building BloomCard, the algorithm decision felt like the most consequential one I'd face. Spaced repetition is the entire reason the app exists — get it wrong and every study session is quietly lying to the user about when they should next see a card.

I evaluated two serious candidates: SM-2, the 1987 algorithm still powering Anki's default scheduler, and FSRS-5, the latest revision of the FSRS algorithm first released in 2022 by the open-spaced-repetition community. I went with FSRS-5. Then I spent another two weeks deciding exactly how much of it to implement.

Why FSRS-5 Over SM-2

SM-2 has a lot going for it: 35+ years of production use, extremely simple math, and near-universal developer familiarity. But it has a structural problem: it schedules by multiplying the previous interval by a per-card ease factor. Rate a card "Good" and your interval is multiplied by roughly 2.5; rate it "Good" again and it multiplies by 2.5 again. The formula doesn't account for how recently you reviewed, how difficult the card has historically been for you, or the actual probability that you'd forget it at this moment.

FSRS-5 replaces all of that with a model grounded in memory research, built around three measurable quantities:

  • Stability (S): The durability of your memory for that specific card, measured in days: the time until your recall probability drops to 90%. A card with S = 20 means you'd still have a 90% chance of recalling it 20 days after reviewing.
  • Difficulty (D): A 1–10 score representing how intrinsically hard the card is for you personally.
  • Retrievability (R): The probability you can recall it right now, which decreases as a function of elapsed time and stability.

The forgetting curve that drives all scheduling is:

R(t, S) = (1 + 19/81 · t/S)^(-0.5)

This is derived from the empirically observed shape of human forgetting curves. Because interval scheduling targets a desired retention rate (I set desiredRetention = 0.9), every card's next review date reflects its actual memory state rather than an arbitrary multiplier.
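The curve is easy to sanity-check numerically. A quick sketch (Python here for illustration; the app itself is Dart):

```python
# Forgetting curve from above: R(t, S) = (1 + 19/81 * t/S) ** -0.5
def retrievability(elapsed_days: float, stability: float) -> float:
    """Probability of recall after elapsed_days, given stability in days."""
    return (1 + (19 / 81) * (elapsed_days / stability)) ** -0.5

# At t = S the curve passes through 0.9 exactly, which is what
# "stability" means in FSRS: the 90%-recall interval.
print(retrievability(20, 20))   # ~0.9
print(retrievability(40, 20))   # lower: more elapsed time, less recall
```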

Research published alongside FSRS-5 shows it achieves the same recall rate as SM-2 with roughly 30% fewer reviews. That's a meaningful difference when you're building a daily-habit product.

The State Machine: newCard → learning → review ↔ relearning

Every card in BloomCard lives in one of four states:

enum CardState { newCard, learning, review, relearning }

The transitions are deliberately simple:

  • newCard → review: You see a card for the first time and rate it Hard, Good, or Easy — it moves straight into the long-term review cycle.
  • newCard → learning: You rate a brand-new card Again (immediate failure) — it stays in short-term acquisition.
  • learning → review: You pass a learning card with Hard, Good, or Easy, and it graduates into the long-term review cycle.
  • review → relearning: You fail a review card (Again rating) — the card drops out of long-term memory and needs re-acquisition.
  • relearning → review: You pass the card again after a lapse — it graduates back to long-term scheduling.
  • review → review: Normal successful review — stability grows and the interval lengthens.

Here's the core transition logic from fsrs_engine.dart:

if (card.state == CardState.newCard) {
  updated.difficulty = _initDifficulty(rating);
  updated.stability = _initStability(rating);
  updated.reps = 1;
  updated.lapses = rating == Rating.again ? 1 : 0;
  updated.state =
      rating == Rating.again ? CardState.learning : CardState.review;
} else {
  final elapsedDays = card.lastReview != null
      ? now.difference(card.lastReview!).inHours / 24.0
      : 0.0;
  final r = retrievability(elapsedDays, card.stability);

  updated.difficulty = _nextDifficulty(card.difficulty, rating);

  if (rating == Rating.again) {
    updated.stability =
        _nextStabilityAfterFail(card.difficulty, card.stability, r)
            .clamp(0.1, double.infinity);
    updated.lapses = card.lapses + 1;
    updated.state = CardState.relearning;
  } else {
    updated.stability = _nextStabilityAfterSuccess(
        card.difficulty, card.stability, r, rating);
    updated.reps = card.reps + 1;
    updated.state = CardState.review;
  }
}

The reps and lapses counters are the card's permanent history — they never reset. lapses feeds into the achievement system (BloomCard has 37 achievements), and both numbers are used in stats screens.

Rating Buttons: Again, Hard, Good, Easy

Every review ends with one of four ratings. The semantics:

  • Again (~10 min): "I forgot this completely"
  • Hard: "I remembered but it was a struggle"
  • Good: "I remembered with normal effort"
  • Easy: "I remembered without any effort"

For a brand-new card, initial stability comes directly from the first four FSRS-5 default parameters, indexed by rating:

static const List<double> defaultParams = [
  0.40255, 1.18385, 3.173, 15.69105, // w0-w3: initial stability for Again/Hard/Good/Easy
  7.1949,  // w4: initial difficulty base
  0.5345,  // w5: initial difficulty rating weight
  1.4604,  // w6: difficulty update weight
  // ... 12 more parameters (w7-w18)
];

double _initStability(Rating rating) {
  return params[rating.index];
}

Rate a new card Easy: initial stability = 15.7 days. Rate it Hard: 1.2 days. Rate it Again: 0.4 days (about 10 hours). These aren't guesses — they're empirically derived from millions of real review logs in the open-spaced-repetition research corpus.

For the Again case specifically, I bypass the interval formula entirely and hardcode 10 minutes:

final interval = rating == Rating.again
    ? 0.00694 // ~10 minutes in days (10 / 1440)
    : nextInterval(updated.stability);
updated.due = now.add(Duration(minutes: (interval * 24 * 60).round()));

When you forget a card, you see it again in 10 minutes. The session keeps moving.

Stability Concept and the Interval Formula

Stability is the number of days after which your recall probability drops to 90%; plugging t = S into the forgetting curve gives R = (1 + 19/81)^(-0.5) = 0.9 exactly. The interval calculation inverts the curve to find the day on which retrievability hits your desiredRetention:

double nextInterval(double stability) {
  return (stability / (19 / 81) * (pow(desiredRetention, -2.0) - 1))
      .clamp(1.0, 36500.0);
}

With desiredRetention = 0.9, the two factors cancel exactly: 0.9^-2 − 1 = 100/81 − 1 = 19/81, the same constant the stability is divided by, so the interval equals the stability. A card with stability 10 is scheduled 10 days out; stability 60 gives 60 days. That one-to-one relationship makes stability interpretable at a glance: "this card comes back in 60 days" is a statement a non-expert can reason about.
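The cancellation is easy to verify. A sketch mirroring nextInterval (Python for illustration):

```python
def next_interval(stability: float, desired_retention: float = 0.9) -> float:
    """Invert the forgetting curve: days until R drops to desired_retention."""
    raw = stability / (19 / 81) * (desired_retention ** -2.0 - 1)
    return min(max(raw, 1.0), 36500.0)  # same 1..36500-day clamp as the Dart

print(next_interval(10))   # ~10: with retention 0.9 the factors cancel
print(next_interval(60))   # ~60
print(next_interval(10, desired_retention=0.8))  # longer: more forgetting allowed
```

Lowering desired_retention lengthens every interval, which is the single knob a future settings screen could expose.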

The success stability formula shows how stability compounds across correct reviews:

double _nextStabilityAfterSuccess(
    double d, double s, double r, Rating rating) {
  final hardPenalty = (rating == Rating.hard) ? params[15] : 1.0; // w15 ≈ 0.23
  final easyBonus = (rating == Rating.easy) ? params[16] : 1.0; // w16 ≈ 2.99
  return s *
      (1 +
          exp(params[6]) *
              (11 - d) *
              pow(s, -params[7]) *
              (exp((1 - r) * params[8]) - 1) *
              hardPenalty *
              easyBonus);
}

Two things stand out:

  • (11 - d): Easier cards grow stability faster. Difficulty 1 gives maximum growth; difficulty 10 grows slowly. This is why hard cards require more rehearsal to reach the same stability level.
  • (exp((1 - r) * params[8]) - 1): The lower your retrievability at review time, the bigger the stability gain. Reviewing a card right at the forgetting threshold gives more benefit than reviewing it while you still remember it clearly. This is why cramming works short-term but fails long-term — high-R reviews at the beginning of study sessions don't build the stability that time-spaced reviews do.
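Both effects are easy to demonstrate numerically. A sketch (Python for illustration; w6 comes from the defaultParams listing above, while the values I've plugged in for params[7] and params[8] are stand-ins, since the listing elides everything past w6):

```python
import math

# Success-formula weights as indexed in the engine above. W6 is from the
# defaultParams listing; W7 and W8 are assumed stand-in values.
W6, W7, W8 = 1.4604, 0.0046, 1.54575

def next_stability_after_success(d, s, r, hard_penalty=1.0, easy_bonus=1.0):
    """Mirror of _nextStabilityAfterSuccess (Good rating by default)."""
    growth = (math.exp(W6) * (11 - d) * s ** -W7
              * (math.exp((1 - r) * W8) - 1) * hard_penalty * easy_bonus)
    return s * (1 + growth)

s = 10.0
# Reviewing early (r = 0.99) vs. at the forgetting threshold (r = 0.90):
print(next_stability_after_success(5, s, 0.99))  # modest gain
print(next_stability_after_success(5, s, 0.90))  # much larger gain
# Easy card (d = 2) vs. hard card (d = 9), same retrievability:
print(next_stability_after_success(2, s, 0.90))
print(next_stability_after_success(9, s, 0.90))
```

Whatever the exact weight values, the monotone behavior is the point: lower retrievability and lower difficulty both amplify the stability gain.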

Simplified Implementation: What I Kept vs. Cut

The full FSRS-5 specification is considerably more involved. Here's my honest accounting:

What I kept:

  • All 19 default parameters (w0–w18), empirically derived from the research corpus
  • The complete DSR model with separate formulas for success and failure
  • Power-law forgetting curve with the retrievability() function
  • Difficulty mean reversion — hard cards slowly drift back toward average difficulty
  • Hard penalty (w15 = 0.23) and Easy bonus (w16 = 2.99)
  • Stability clamped to 0.1 minimum on failure (memory isn't fully erased)

What I cut:

  • Per-user parameter optimization. The full FSRS-5 spec includes an optimizer that tunes all 19 parameters using gradient descent on each user's actual review history. This is the algorithm's biggest advantage over any fixed-parameter system — your scheduler adapts to you. I skipped it. Building the optimizer adds substantial complexity and requires 500–1000+ reviews per card type before it has enough signal. I use the default parameters for everyone.
  • Fuzz factor. Full FSRS adds small random variations to prevent review bunching — too many cards coming due on the same day. My scheduling is deterministic.
  • Learning sub-states. The spec defines nuanced step sequences for the learning and relearning states (like Anki's configurable "1 min → 10 min → graduated" pipeline). I simplified to a flat 10-minute retry for Again.
  • Same-day review handling. If you review a card multiple times in a day, my engine treats each review independently. Full FSRS has special intra-day logic.
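For what it's worth, some of the cut features are small. A hypothetical fuzz factor, for example, is a few lines (Python for illustration; the ±5% band is my assumption, not the exact ranges the FSRS spec uses):

```python
import random

def fuzzed_interval(interval_days, spread=0.05, rng=None):
    """Jitter a scheduled interval by up to +/- spread so that cards
    created together don't all come due on the same day."""
    rng = rng or random.Random()
    factor = 1 + rng.uniform(-spread, spread)
    return max(1.0, interval_days * factor)

# Seeded for a reproducible example; production code would not seed.
print(fuzzed_interval(30.0, rng=random.Random(42)))  # somewhere in 28.5..31.5
```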

The honest tradeoff: I estimate roughly 10% lower scheduling accuracy compared to a fully personalized FSRS-5 implementation. In exchange, the engine is about 180 lines of Dart, executes in microseconds on any device, has zero ML dependencies, and has 100% branch coverage in the test suite. For a consumer flashcard app where most users have fewer than 1,000 cards, this tradeoff is worth taking.

Growth Visualization: Seed → Sprout → Bud → Bloom

This is where the algorithm connects to what makes BloomCard visually different from other flashcard apps. Stability, the days-number measuring how durable a memory is, maps directly to each card's growth stage in the garden:

// garden_utils.dart
static GrowthStage getGrowthStage(double stability) {
  if (stability < 5) return GrowthStage.seed;
  if (stability < 20) return GrowthStage.sprout;
  if (stability < 60) return GrowthStage.bud;
  return GrowthStage.bloom;
}

The thresholds aren't arbitrary aesthetic choices. They reflect real memory states:

  • Seed (S < 5): Memory hasn't consolidated. A few days without review and it'll wither. You're reviewing every 1–5 days.
  • Sprout (5 ≤ S < 20): Memory is forming. You can go a week or two without losing it. Review intervals of roughly one week.
  • Bud (20 ≤ S < 60): Solid knowledge. Monthly review territory. Approaching mastery.
  • Bloom (S ≥ 60): Long-term memory. The card is durable enough that even without reviewing, you'd likely recall it a month later. Review intervals of two months or more.

A card first rated Easy starts with stability ~15.7 — it becomes a sprout on day one. A card you struggled with starts at ~0.4 — it stays a seed for several review cycles. When a card hits Bloom, it unlocks a unique flower variety for your collection. There are 21 flower species across four rarity tiers (Common, Uncommon, Rare, Legendary). The act of reaching stability 60 is the trigger for the collection unlock — the algorithm and the reward system are directly coupled.
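Tracing the first-rating numbers through the thresholds makes that coupling concrete (a Python mirror of the two Dart snippets above):

```python
# Initial stabilities for Again/Hard/Good/Easy (w0-w3) and the garden
# thresholds from getGrowthStage, mirrored from the Dart for illustration.
INIT_STABILITY = {"again": 0.40255, "hard": 1.18385,
                  "good": 3.173, "easy": 15.69105}

def growth_stage(stability: float) -> str:
    if stability < 5:
        return "seed"
    if stability < 20:
        return "sprout"
    if stability < 60:
        return "bud"
    return "bloom"

for rating, s in INIT_STABILITY.items():
    print(f"first rating {rating!r}: S = {s} -> {growth_stage(s)}")
# Only 'easy' (S = 15.7) starts life as a sprout; everything else is a seed.
```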

The garden screen renders every card in a grid, sorted by creation date (newest at top), each drawn by a CustomPainter class — no emoji anywhere in the app. Cards created within the last 7 days get a green "NEW" badge. When you open the app, you can see at a glance which cards are bloomed, which are budding, and which still need attention. The garden's visual state is a direct readout of your actual memory state, rendered in a form anyone can understand without knowing what "stability" means.

Five Study Modes, One Algorithm

FSRS scheduling runs under all five study modes. The mode determines how you interact with the card; the rating you give runs through the same engine.

Normal: Classic card flip — see front, flip to reveal back, rate yourself. The simplest path through the algorithm.

Reverse: Cards show back-first. You try to recall the front. Useful for language learning where production (generating the target language) matters as much as recognition. The reversed=true flag only affects the UI; FSRS processing is identical.

Typing with Levenshtein matching: You type your answer directly. Correctness is judged by Levenshtein distance ≤ 2 — minor typos and capitalization differences don't count as failures. This removes the "I knew it really" self-deception that plagues self-rated flashcard systems and forces actual retrieval. A correct answer maps to Good, incorrect to Again.
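The matching rule is a standard edit-distance check. A sketch (Python for illustration; the lowercase-and-trim normalization is my assumption about how capitalization differences are forgiven):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def is_correct(typed: str, answer: str) -> bool:
    # Normalize case/whitespace, then tolerate up to 2 edits.
    return levenshtein(typed.lower().strip(), answer.lower().strip()) <= 2

print(is_correct("Recieve ", "receive"))  # i/e swap is 2 edits -> True
```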

Quiz: Four-choice multiple choice. Distractors are generated from other cards in the same deck. Requires at least 4 cards in the deck. Correct → Good, incorrect → Again.

Interleaving: Pulls due cards from all your decks into a single shuffled session. Each card shows a deck-name badge for context. Research consistently demonstrates that interleaved practice produces better long-term retention than blocked practice (studying one subject at a time). This mode applies FSRS scheduling across your entire card library simultaneously.

Extra Study mode — available after a session completes — lets you keep reviewing already-due cards beyond the daily target. Each extra card still runs through FSRS and updates stability normally.

The Data Model

Every FsrsCard carries its full scheduling state as columns in the Drift/SQLite database:

class FsrsCard {
  CardState state;
  double stability;
  double difficulty;
  int reps;
  int lapses;
  DateTime? lastReview;
  DateTime due;
}

Every review also writes a ReviewLog entry — rating, before/after stability values, elapsed days, timestamp. This log is the data that would feed a future parameter optimizer. The infrastructure is there; the optimizer isn't built yet.

Testing the Engine

Because the FSRS engine is a pure class with no Flutter dependencies, testing it is straightforward. The engine takes a card, a rating, and a timestamp; produces an updated card with a new due date. No mocking, no database, no UI framework.

The test suite covers all state machine transitions exhaustively — new cards to learning vs. review, review cards lapsing to relearning, stability growth across multiple successful reviews, difficulty convergence over repeated reviews. The forgetting curve gets dedicated tests verifying that retrievability equals exactly 0.9 when elapsed days equals stability. The interval formula gets boundary tests at both ends of the 1–36,500 day clamp. The engine has 100% branch coverage.

garden_utils_test.dart separately validates the growth stage boundaries — that stability 4.99 maps to seed, 5.0 maps to sprout, 19.99 maps to sprout, 20.0 maps to bud, and so on.

Looking Forward

The most significant missing piece is parameter personalization: building the optimizer that tunes the 19 FSRS parameters to each user's review history. The data infrastructure is already in place (a ReviewLogs table with before/after stability on every review); what remains is the optimizer itself.

The other direction is making more of the stability data visible in the stats screen. Currently you see garden health (ratio of non-seed cards) and review streaks. A stability distribution histogram — showing how many of your cards sit in each stability bucket — would let users understand not just whether their garden looks full, but how deeply they actually know their material.

For more on how the progression system connects garden growth to XP levels, streaks, and the broader reward architecture, see the progression systems post.

The algorithm itself isn't magic. It's a formula that determines: review this card at this specific moment to make the most efficient use of your study time. What made building it interesting was making that formula visible — translating stability numbers into a garden, and turning scheduled intervals into flowers that grow at exactly the right pace.
