Science Library · Pillar 03
Confusion Pairs · Interleaving

Why mixing concepts beats stacking them, by 43 percentage points.

The science of interleaving, the math of discrimination, and why “do all the easy ones first” is the most common curriculum mistake in the world.

TL;DR

The finding: Interleaved practice (mixing related-but-distinct concepts in the same session) produces better long-term retention than blocked practice, where one concept is mastered before moving to the next. In the canonical study, the gap on a one-week delayed test was 43 percentage points (63% vs 20%). The effect is largest precisely for the concepts that learners most commonly confuse.

The mechanism: Learning isn’t just remembering each thing. It’s learning to tell things apart. Adjacent contrast — answering a question about mitosis right after one about meiosis — reveals what makes each one specifically distinct. Blocked practice deprives the learner of this contrast.

The product: The AI Confusion Detector identifies which concept pairs your learners commonly confuse, in your specific content. The AI Memory Coach interleaves those pairs in the same session — exactly when they’re most useful.

Imagine you’re learning the difference between four impressionist painters: Monet, Manet, Cassatt, and Pissarro. You have a stack of paintings — twelve from each artist. Two ways to study them:

Blocked. Look at all twelve Monets. Then all twelve Manets. Then Cassatt. Then Pissarro. Each artist’s work is presented together, so the style soaks in.

Interleaved. Shuffle them. Monet. Manet. Pissarro. Cassatt. Monet. Cassatt. Manet. Pissarro. And so on. You never see two from the same artist consecutively.

If you ask people which approach feels more productive, they’ll almost universally pick blocked. The work feels deeper, more focused. Each artist’s style seems to sit more cleanly in your head.

If you test which approach actually works — say, by showing previously-unseen paintings and asking who painted each — interleaved wins, decisively. Kornell & Bjork ran exactly this study in 2008 [2]. Interleaved learners outperformed blocked learners by a wide margin. Yet 80% of the same participants predicted they’d do worse on the interleaved condition.

This gap — between what feels productive and what is actually productive — is at the heart of the interleaving effect.

The canonical study: Rohrer & Taylor, 2007

The clearest applied demonstration came in 2007. Doug Rohrer and Kelli Taylor [1] ran a study with college students learning to compute the volume of four different geometric solids: spheroid, spherical cone, spherical wedge, and half cone.

Group A practiced in blocks: four spheroid problems, then four spherical cone, etc. Group B got the same sixteen problems in interleaved order. Same content. Same time. Same number of repetitions.

During practice, blocked looked better: group A solved 89% of the practice problems correctly, while group B solved 60%. The blocked group was visibly more confident.

One week later, both groups returned for a final test. Group A (blocked): 20%. Group B (interleaved): 63%. That’s a gap of 43 percentage points in long-term retention, in favor of interleaving, even though the blocked group felt more confident immediately after practice. The Brunmair & Richter 2019 meta-analysis [3] covers 32 such studies; the effect replicates broadly, particularly for category-discrimination tasks.
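
To make the size of the effect concrete, here is the arithmetic on the four scores reported above. It uses no data beyond those numbers, and it shows why the headline figure is a difference in percentage points rather than a relative improvement:

```latex
\begin{align*}
\text{Immediate gap}  &= 89\% - 60\% = 29 \text{ points (favoring blocked)} \\
\text{One-week gap}   &= 63\% - 20\% = 43 \text{ points (favoring interleaved)} \\
\text{One-week ratio} &= \frac{63}{20} = 3.15
\end{align*}
```

Read as a ratio, the interleaved group retained roughly three times as much as the blocked group at one week.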

[Figure: Practice pattern and long-term outcome. Blocked (AAAA BBBB CCCC DDDD): felt easier, scored 89% immediately, 20% retention at 1 week. Interleaved (ABCD ABCD ABCD ABCD): felt harder, scored 60% immediately, 63% retention at 1 week. Long-term gap: +43 percentage points.]
Figure 1. Rohrer & Taylor 2007 (approximate). Same 16 problems, same time, same repetitions. Blocked practice produces higher immediate performance but collapses by one week. Interleaved practice feels harder, but 43 percentage points more sticks. The gap widens further at one month.
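
The two practice patterns in Figure 1 can be sketched in a few lines of Python. This is an illustration only: the item labels are hypothetical, and a strict round-robin is just one simple way to interleave (a shuffled order, as in the study, is another).

```python
from itertools import chain, zip_longest

def blocked_order(items_by_concept):
    """All items for one concept, then all items for the next (AAAA BBBB ...)."""
    return list(chain.from_iterable(items_by_concept.values()))

def interleaved_order(items_by_concept):
    """Round-robin across concepts (ABCD ABCD ...)."""
    rounds = zip_longest(*items_by_concept.values())
    return [item for rnd in rounds for item in rnd if item is not None]

# Hypothetical item labels: four problems for each of four concepts.
problems = {c: [f"{c}{i}" for i in range(1, 5)] for c in "ABCD"}

print(" ".join(blocked_order(problems)))      # A1 A2 A3 A4 B1 B2 ...
print(" ".join(interleaved_order(problems)))  # A1 B1 C1 D1 A2 B2 ...
```

Both orders contain exactly the same items. With equal item counts per concept, the round-robin never places two items from the same concept back to back; with unequal counts, the tail of the session can still repeat a concept.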

The mechanism: learning is discrimination, not just memorization

The classical view of learning is that you memorize concepts one at a time, and the depth of each memory determines later recall. Under that view, blocked practice should win: you build a deep representation of A before moving to B.

The interleaving effect tells us this view is wrong, or at least incomplete. What blocked practice gives you is a strong association between the practice context and concept A. What it doesn’t give you is the ability to discriminate: to look at a new problem and decide whether it’s an A-problem or a B-problem.

Carvalho & Goldstone [4] made this explicit: interleaving works because it creates adjacent contrast. When you see a problem about closures right after a problem about scope, you’re forced to actively notice what’s different — what makes a closure specifically a closure and not just “some scope thing.” Blocked practice doesn’t force this comparison, so the comparison never happens.

This reframes what we should actually want from learning: not “Can the learner recall concept A?” but “Can the learner, faced with a novel problem, correctly decide it’s an A-type problem and not a B-type problem?” That second skill — discrimination — is what predicts real-world performance. And blocked practice produces almost none of it.
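
One way to measure discrimination rather than recall is to log which problem type the learner chose against which type was correct, then count the mix-ups per pair. The sketch below assumes a hypothetical response log of (correct type, chosen type) tuples; the concept names and data are invented for illustration.

```python
from collections import Counter

def confusion_counts(responses):
    """Count (true_concept, chosen_concept) pairs from classification responses."""
    return Counter(responses)

def most_confused_pairs(responses, top=3):
    """Rank unordered concept pairs by how often one was mistaken for the other."""
    pair_errors = Counter()
    for (truth, chosen), n in confusion_counts(responses).items():
        if truth != chosen:
            pair_errors[frozenset((truth, chosen))] += n
    return [(tuple(sorted(pair)), n) for pair, n in pair_errors.most_common(top)]

# Hypothetical responses: (correct problem type, type the learner chose).
responses = [
    ("closure", "scope"), ("closure", "closure"), ("scope", "closure"),
    ("scope", "scope"), ("closure", "scope"), ("recursion", "recursion"),
    ("recursion", "scope"),
]

print(most_confused_pairs(responses))
# [(('closure', 'scope'), 3), (('recursion', 'scope'), 1)]
```

Errors in either direction count toward the same pair, so a learner who calls closures "scope" and a learner who calls scope problems "closure" both raise the closure/scope count.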

“Blocking gives you fluency you don’t have. Interleaving gives you fluency you do.”
R. A. Bjork, paraphrased

Confusion pairs in the wild

The interleaving benefit is largest for the concept pairs learners commonly confuse. The first step in applying interleaving well is identifying those pairs, and they’re more domain-specific than people realize. A few examples across domains:

Confusion pairs: selected examples across domains
| Domain | Pair | What makes them distinct |
| --- | --- | --- |
| Biology | mitosis vs meiosis | Mitosis produces two genetically identical diploid cells (somatic division); meiosis produces four haploid gametes via two divisions. |
| Python | append() vs extend() | append adds the argument as a single element; extend iterates and adds each element from an iterable. |
| English grammar | affect vs effect | Affect is (usually) a verb meaning “influence”; effect is (usually) a noun meaning “result.” |
| Probability | permutation vs combination | Permutations care about order; combinations don’t. P(n,k) = k! × C(n,k), so the two counts differ by a factor of k factorial. |
| Statistics | mean vs median | Both summarize a distribution, but the mean is pulled by outliers while the median isn’t. Trimodal data breaks both. |
| Inference | correlation vs causation | Correlation says two variables move together; causation says one moves the other. Confounders, reverse causality, and selection break the leap. |
| Chemistry | enantiomer vs diastereomer | Enantiomers are non-superimposable mirror images; diastereomers are stereoisomers that aren’t mirror images. Enantiomers can have very different biological effects despite sharing most physical properties; diastereomers differ in physical properties as well. |
| Hindi grammar | है vs हैं | Singular present-tense vs plural present-tense forms of “to be.” Learners commonly switch when subject number changes. |
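
Three of the pairs in the table can be checked directly in Python, which makes the contrast concrete. The numbers below are made-up illustrations; `math.perm` and `math.comb` require Python 3.8 or later.

```python
import math
import statistics

# append() vs extend()
a, b = [1, 2], [1, 2]
a.append([3, 4])   # adds the list itself as one element
b.extend([3, 4])   # adds each element of the iterable
print(a)  # [1, 2, [3, 4]]
print(b)  # [1, 2, 3, 4]

# permutation vs combination: P(n, k) = k! * C(n, k)
n, k = 5, 3
print(math.perm(n, k), math.comb(n, k))  # 60 10
print(math.perm(n, k) == math.factorial(k) * math.comb(n, k))  # True

# mean vs median: one outlier moves the mean, not the median
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [1000]
print(statistics.mean(incomes), statistics.median(incomes))  # 35 35
# The mean jumps to about 195.8, while the median only moves to 36.5.
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

Seeing the two members of each pair run side by side is itself a small instance of adjacent contrast.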

Finding these pairs in your specific content is the operational hard part. Generic confusion-pair lists for English grammar or basic biology are easy to find. Domain-specific pairs for your company’s internal compliance training, your school’s chapter-three math curriculum, or your role-specific certification syllabus — those have to be discovered.

When NOT to interleave

Interleaving isn’t a universal good. Three cases where blocked practice is genuinely better:

  • Procedural skills before fluency. Learning to play a chord on guitar, or to write a basic for-loop, benefits from initial blocked repetition until the basic action is fluent. Interleaving too early creates frustration without payoff.
  • Items that share almost no surface features. If concepts A and B are vastly different, there’s nothing to discriminate. Mixing them doesn’t help and can dilute time-on-task.
  • Very short study sessions with very few items. If you only have 4 minutes and 6 questions, interleaving’s discrimination benefits don’t have room to manifest. Blocked is fine here.

What we don’t yet know

Three open questions our research team is studying:

  • Optimal mixing ratios. If A and B are the confusion pair, what fraction of items should be A vs B vs unrelated items? Empirically the answer depends on the learner’s current discrimination ability — strong learners benefit from harder mixing.
  • Cross-language transfer of confusion pairs. Are confusion pairs the same in Hindi-medium Class 7 Math vs English-medium Class 7 Math? Probably not in all cases. We are currently building bilingual confusion-pair models.
  • LLM-generated confusion-pair detection accuracy. Our AI Confusion Detector uses both response patterns (which items are missed together) and content-similarity (semantic overlap of concept definitions). Each signal alone is noisy; together they’re useful — but the calibration is ongoing work.
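
To make the mixing-ratio question concrete, here is a minimal sketch of a session builder that draws items for A, B, and unrelated concepts according to a chosen split. The function name, the item pool, and the 40/40/20 split are all hypothetical; the right ratio is exactly what remains open.

```python
import random

def build_session(pool, ratios, size, seed=None):
    """Draw `size` items so each category's share approximates `ratios`, then shuffle."""
    rng = random.Random(seed)
    session = []
    for category, share in ratios.items():
        # Rounding means the total can differ slightly from `size` for awkward ratios.
        count = round(size * share)
        session += [(category, item) for item in rng.sample(pool[category], count)]
    rng.shuffle(session)
    return session

# Hypothetical item pool and a hypothetical 40/40/20 split.
pool = {
    "A": [f"a{i}" for i in range(10)],
    "B": [f"b{i}" for i in range(10)],
    "other": [f"x{i}" for i in range(10)],
}
ratios = {"A": 0.4, "B": 0.4, "other": 0.2}

session = build_session(pool, ratios, size=10, seed=0)
print(session)
```

Varying `ratios` per learner, for example by raising the share of the confusable pair as discrimination improves, is one way such a policy could be tested.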

Applied at Future Proof

How the AI Confusion Detector works.

Two signals: response patterns (which concept pairs your learners consistently miss together) and semantic similarity (concepts whose definitions overlap or whose names look alike). The Detector surfaces a confusion-pair score per pair per cohort, and the AI Memory Coach uses that score to choose whether to interleave the pair in the next session. The Knowledge Map visualizes the pairs so L&D teams can see — and adjust — what the AI is treating as related.
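
As a rough illustration of how two such signals might be blended, the sketch below scores a pair from learner miss data and definition text. It is not the Detector's actual method: the function names, the 50/50 weight, the sample data, and the word-overlap stand-in for semantic similarity are all assumptions made for the example.

```python
def co_miss_rate(missed_by, a, b):
    """Jaccard overlap between the learners who missed concept a and those who missed b."""
    union = missed_by[a] | missed_by[b]
    return len(missed_by[a] & missed_by[b]) / len(union) if union else 0.0

def definition_overlap(definitions, a, b):
    """Crude stand-in for semantic similarity: Jaccard overlap of definition words."""
    wa = set(definitions[a].lower().split())
    wb = set(definitions[b].lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def confusion_score(missed_by, definitions, a, b, w_response=0.5):
    """Weighted blend of the two signals; the weight is an illustrative assumption."""
    return (w_response * co_miss_rate(missed_by, a, b)
            + (1 - w_response) * definition_overlap(definitions, a, b))

# Hypothetical cohort data: learner IDs who missed items on each concept.
missed_by = {
    "mitosis": {"u1", "u2", "u3"},
    "meiosis": {"u2", "u3", "u4"},
    "osmosis": {"u5"},
}
definitions = {
    "mitosis": "cell division producing two identical diploid cells",
    "meiosis": "cell division producing four haploid gametes",
    "osmosis": "movement of water across a membrane",
}

for pair in [("mitosis", "meiosis"), ("mitosis", "osmosis")]:
    print(pair, round(confusion_score(missed_by, definitions, *pair), 2))
```

In this toy data the mitosis/meiosis pair scores higher on both signals than mitosis/osmosis, which shares neither learners nor definition words. A production system would more plausibly use embedding-based similarity than raw word overlap.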

References

Selected papers.

The canonical interleaving studies, the discrimination-mechanism work, and the modern meta-analyses.

  1. Rohrer, D., & Taylor, K. (2007). The shuffling of mathematics problems improves learning. Instructional Science 35: 481–498.
  2. Kornell, N., & Bjork, R.A. (2008). Learning concepts and categories: Is spacing the “enemy of induction”? Psychological Science 19(6): 585–592.
  3. Brunmair, M., & Richter, T. (2019). Similarity matters: A meta-analysis of interleaved learning and its moderators. Psychological Bulletin 145(11): 1029–1052.
  4. Carvalho, P.F., & Goldstone, R.L. (2014). Effects of interleaved and blocked study on delayed test of category learning generalization. Frontiers in Psychology 5: 936.
  5. Carvalho, P.F., & Goldstone, R.L. (2015). What you learn is more than what you see: What can sequencing effects tell us about inductive category learning? Frontiers in Psychology 6: 505.
  6. Soderstrom, N.C., & Bjork, R.A. (2015). Learning versus performance: An integrative review. Perspectives on Psychological Science 10(2): 176–199.
  7. Rohrer, D., Dedrick, R.F., & Stershic, S. (2015). Interleaved practice improves mathematics learning. Journal of Educational Psychology 107(3): 900–908.
  8. Birnbaum, M.S., Kornell, N., Bjork, E.L., & Bjork, R.A. (2013). Why interleaving enhances inductive learning: The roles of discrimination and retrieval. Memory & Cognition 41(3): 392–402.

Try it

Want the confusion pairs in your content?

A 20-minute walkthrough on your team’s actual content. We run the AI Confusion Detector and surface the top 10 pairs your learners most commonly miss together — usually with a few that surprise you.

9 citations · Reviewed May 2026 · Open peer review welcomed