Reed-Solomon erasure coding, explained without math
How a 1960s error-correcting code lets you store data in N pieces and recover everything from any K of them.
TL;DR
Reed-Solomon takes K original pieces of data and generates N − K extra "parity" pieces using polynomial math. Store all N somewhere. Lose any N − K of them, and the survivors mathematically reconstruct the missing ones. It powers QR codes, RAID 6, deep-space probe transmissions, and ShardHex's cross-cloud splitting.
The problem it solves
Things go missing. Cloud accounts get banned, hard drives fail, network packets drop, QR codes get coffee spilled on them, satellite signals fade behind clouds. If your data was a single chunk, losing any part of it usually means losing all of it.
We want a way to keep our data even when some of it disappears — without paying the storage cost of just keeping N copies of everything.
The naive approach: replication
The simplest fix is to store the file three times. Lose any two copies and you still have it. This is the same idea as RAID 1 mirroring (usually with two copies), what many storage systems did internally before adopting erasure coding, and what most people mean when they say "back it up to two places".
It works, but it's expensive. To survive losing 2 copies, you need 3× the storage. To survive losing 3 copies, 4×. The cost grows linearly with how much loss you want to tolerate.
And if you lose 3 of 3 copies (a coordinated cloud outage, all your hard drives dying in the same week, an account wiped entirely), there's no way back. Replication is binary: it works or it doesn't.
The smarter approach: erasure coding
What if you could store, say, 5 pieces total and survive losing any 2 of them — at only 1.67× the storage instead of 3×?
That's what erasure coding does. The trick: instead of storing 5 identical copies, you store 3 "data" pieces (containing slices of your original file) and 2 "parity" pieces (containing math derived from the data). Now any 3 of the 5 — in any combination — can rebuild everything.
Reed-Solomon, invented in 1960 by Irving Reed and Gustave Solomon, is the most widely deployed erasure coding scheme. It scales to any (N, K) you want: N total pieces, any K of which are enough to recover the data.
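To see how the overhead scales, here is a quick back-of-the-envelope calculation (plain Python, nothing Reed-Solomon specific): the storage cost is N / K, and the number of losses you survive is N − K. The (N, K) choices below are just illustrative.

```python
# Storage overhead vs. fault tolerance for a few illustrative (N, K) choices.
# (3, 1) is plain 3-way replication; the others are erasure-coded layouts.
configs = [(3, 1), (5, 3), (10, 8), (20, 17)]

for n, k in configs:
    print(f"N={n:2d} K={k:2d}  storage={n / k:.2f}x  survives losing {n - k} pieces")
```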
A tiny worked example
Real Reed-Solomon uses Galois field arithmetic for performance and exactness. We'll use ordinary integer arithmetic instead — it's not how the algorithm actually works internally, but the intuition is the same.
Suppose your "file" is just two numbers: 3 and 5. We want to encode them into 4 pieces such that any 2 can recover the originals.
Encode:
- Piece 1 = 3 (a data piece — just the first number)
- Piece 2 = 5 (a data piece — just the second number)
- Piece 3 = 3 + 5 = 8 (a parity piece)
- Piece 4 = 3 + 2 × 5 = 13 (another parity piece)
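In code, that encoder is nothing more than the four formulas above (a toy sketch in Python with plain integers, not a real implementation):

```python
# Toy encoder for the example: K = 2 data values in, N = 4 pieces out.
# Pieces 1-2 are the data itself; pieces 3-4 are the parity combinations.
def encode(d1, d2):
    return [d1, d2, d1 + d2, d1 + 2 * d2]

print(encode(3, 5))  # [3, 5, 8, 13]
```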
Store these 4 pieces in 4 different places. Now things go wrong.
Scenario A — lose Piece 1:
You still have 5, 8, and 13. Compute Piece 1 = 8 − 5 = 3. ✓ Recovered.
Scenario B — lose Pieces 1 and 2 (both data!):
You still have only 8 and 13, the parity pieces. Solve the two equations:
- data1 + data2 = 8
- data1 + 2 × data2 = 13
Subtract first from second: data2 = 5. Substitute back: data1 = 3. ✓ Recovered.
Scenario C — lose any 3 pieces:
You only have 1 piece. There are infinitely many (data1, data2) pairs consistent with it, so the data is unrecoverable. This matches the rule: "any K = 2 pieces recover, fewer than K cannot".
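Every recovery scenario above is just a small linear system: each piece is a known combination of data1 and data2, and any two surviving pieces pin both values down. A toy decoder, still in plain integers and purely illustrative:

```python
# Each piece is a known linear combination a*data1 + b*data2.
COEFFS = {1: (1, 0), 2: (0, 1), 3: (1, 1), 4: (1, 2)}  # piece number -> (a, b)

def decode(surviving):
    """surviving: dict {piece_number: value} with at least 2 entries."""
    (i, y1), (j, y2) = list(surviving.items())[:2]   # any two pieces are enough
    a1, b1 = COEFFS[i]
    a2, b2 = COEFFS[j]
    det = a1 * b2 - a2 * b1                          # nonzero for every pair here
    data1 = (y1 * b2 - y2 * b1) / det                # Cramer's rule
    data2 = (a1 * y2 - a2 * y1) / det
    return data1, data2

print(decode({3: 8, 4: 13}))  # Scenario B: only parity survives -> (3.0, 5.0)
print(decode({2: 5, 4: 13}))  # lose Pieces 1 and 3 -> (3.0, 5.0)
```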
Why it works (the intuition)
The core mathematical fact: a polynomial of degree K − 1 is uniquely determined by any K points on it.
If I give you 2 points on a line (a degree-1 polynomial, 2 coefficients), you can find that line. If I give you 3 points on a parabola, you can find the parabola. If I give you K points on a degree K − 1 polynomial, you can find the polynomial; Lagrange interpolation reconstructs it, every time.
Reed-Solomon's encoding step:
- Treat your K data pieces as the coefficients of a polynomial.
- Evaluate that polynomial at N different x-values (typically 0, 1, 2, … N − 1).
- Each (x, y) evaluation becomes a "shard". Store the y-values; the x-values are implicit (from the shard index).
Decoding:
- Receive any K of the N (x, y) pairs.
- Lagrange-interpolate to find the original polynomial.
- The polynomial's coefficients are your original data. Done.
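Here is a compact sketch of both steps in Python. To keep the arithmetic exact it works modulo a small prime instead of in GF(256), and it uses the plain "coefficients in, evaluations out" encoding described above, so unlike the toy example earlier the data values are not stored verbatim in any shard. A sketch of the idea, not production code:

```python
P = 257  # a small prime, so modular "division" is exact; real code uses GF(2^8)

def encode(data, n):
    """Treat the k data values as polynomial coefficients and evaluate the
    polynomial at x = 0, 1, ..., n-1. Each (x, y) pair becomes one shard."""
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(data)) % P)
            for x in range(n)]

def decode(shards, k):
    """Lagrange-interpolate any k surviving (x, y) shards back into the
    polynomial's coefficients, i.e. the original data values."""
    xs, ys = zip(*shards[:k])
    coeffs = [0] * k
    for j in range(k):
        # Build the j-th Lagrange basis polynomial, the product over m != j of
        # (x - x_m), in coefficient form, plus its denominator prod of (x_j - x_m).
        basis, denom = [1], 1
        for m in range(k):
            if m == j:
                continue
            nxt = [0] * (len(basis) + 1)
            for i, c in enumerate(basis):        # multiply basis by (x - x_m)
                nxt[i] = (nxt[i] - xs[m] * c) % P
                nxt[i + 1] = (nxt[i + 1] + c) % P
            basis = nxt
            denom = (denom * (xs[j] - xs[m])) % P
        scale = ys[j] * pow(denom, P - 2, P)     # ys[j] / denom  (mod P)
        for i, c in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * c) % P
    return coeffs

shards = encode([3, 5], n=4)     # -> [(0, 3), (1, 8), (2, 13), (3, 18)]
print(decode(shards[2:], k=2))   # decode from the last two shards -> [3, 5]
```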
That's the entire game. The math is high-school algebra plus a finite-field representation (which makes the operations fast and exact for binary computers, but doesn't change the underlying idea).
Where Reed-Solomon shows up in the wild
You're using Reed-Solomon every day even if you've never heard of it. Some places it lives:
- QR codes — that's why a QR code still scans when half of it is obscured by a coffee cup. The black-and-white squares include Reed-Solomon parity covering up to 30% loss, depending on the error correction level.
- CDs, DVDs, Blu-ray — recovers data despite scratches.
- RAID 6 — survives any 2 simultaneous disk failures using two independent parity blocks per stripe.
- ZFS RAID-Z2 / RAID-Z3 — same idea, single-machine variant.
- Deep-space communications — Voyager, the Mars rovers, Cassini all encode telemetry with Reed-Solomon (typically RS(255, 223)) so that signal fade or cosmic-ray flips don't lose the data.
- Backblaze Vaults — their cloud storage internally splits each file into 17 data + 3 parity pieces across 20 disks in 20 different chassis, surviving any 3-disk failure.
- Microsoft Azure Storage — uses local-reconstruction codes (a Reed-Solomon variant) to keep durability with less storage overhead than 3-way replication.
- ShardHex — splits each file into N erasure-coded shards distributed across up to 12 different cloud providers, surviving the loss of any N − K providers entirely.
Tradeoffs vs. plain replication
| Aspect | 3-way replication | Reed-Solomon (5, 3) |
|---|---|---|
| Storage cost | 3.0× | 1.67× |
| Survives losing | 2 of 3 copies | 2 of 5 shards |
| Encoding CPU | ~zero | Modest (several GB/s on modern CPUs) |
| Read performance | Read 1 copy, done | Must read K shards, then decode |
| Partial reads | Trivial: read any copy | Harder: a byte range generally requires fetching and decoding K shards |
| Locations needed | 3 | 5+ |
Erasure coding is great for cold storage (you rarely read it) and large-scale durability. It's worse for hot data that needs random access. ShardHex picks the cold-storage tradeoff because backups are cold storage.
How ShardHex uses it
When you split a file in ShardHex:
- The file is encrypted with AES-256-CTR locally on your device.
- The encrypted bytes are chunked into K equal-sized data shards.
- Reed-Solomon generates N − K parity shards.
- All N shards are uploaded to N different cloud providers (Google Drive, S3, Backblaze, MinIO, etc. — up to 12 supported).
- A small manifest file records which shard went where; you keep this on your device.
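Roughly, the split path looks like the sketch below. The AES part uses the real `cryptography` package; `rs_encode`, `upload`, and the (N, K) = (5, 3) layout are hypothetical stand-ins for illustration, not ShardHex's actual API, and storing the key in the manifest is only so the sketch round-trips.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

K, N = 3, 5                      # illustrative layout: 3 data + 2 parity shards

def split(path, providers):
    with open(path, "rb") as f:
        plaintext = f.read()

    # 1. Encrypt locally with AES-256-CTR before anything leaves the device.
    key, nonce = os.urandom(32), os.urandom(16)
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    ciphertext = enc.update(plaintext) + enc.finalize()

    # 2. Pad and chunk the ciphertext into K equal-sized data shards.
    shard_len = -(-len(ciphertext) // K)                        # ceiling division
    padded = ciphertext.ljust(shard_len * K, b"\0")
    data_shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(K)]

    # 3. Generate N - K parity shards (hypothetical Reed-Solomon helper).
    parity_shards = rs_encode(data_shards, parity=N - K)

    # 4. Upload each shard to a different provider and record where it went.
    manifest = {"key": key.hex(), "nonce": nonce.hex(),
                "size": len(ciphertext), "shards": []}
    for index, shard in enumerate(data_shards + parity_shards):
        provider = providers[index]                             # one cloud per shard
        manifest["shards"].append({"index": index, "provider": provider,
                                   "id": upload(provider, shard)})  # hypothetical
    return manifest                                             # kept locally
```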
When you want the file back:
- Read the manifest. Identify the K closest/fastest shards.
- Download them in parallel from K different clouds.
- Reed-Solomon decodes the K shards back into the original encrypted bytes.
- AES decrypts. Original file restored.
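The restore path, continuing the same sketch (`reachable`, `download`, and `rs_decode` are again made-up helper names):

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def restore(manifest, out_path):
    # Pick any K reachable shards, ideally the fastest, and fetch them in parallel.
    entries = [s for s in manifest["shards"] if reachable(s["provider"])][:K]
    shards = {s["index"]: download(s["provider"], s["id"]) for s in entries}

    # Any K shards decode back into the K data shards; then drop the padding.
    ciphertext = b"".join(rs_decode(shards, k=K, n=N))[:manifest["size"]]

    # CTR-mode decryption applies the same keystream as encryption.
    key, nonce = bytes.fromhex(manifest["key"]), bytes.fromhex(manifest["nonce"])
    dec = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
    with open(out_path, "wb") as f:
        f.write(dec.update(ciphertext) + dec.finalize())
```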
If some shards have gone missing — a cloud banned your account, an anonymous host deleted the file, a provider went offline — ShardHex's repair function downloads the surviving K, regenerates the missing N − K, and re-uploads them to fresh locations. The original file never sees the network in plaintext form during any of this.
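Repair is that same decode followed by a fresh encode, re-uploading only what is missing. In the same hypothetical sketch:

```python
def repair(manifest, fresh_providers):
    # Rebuild all shards from any K survivors (same decode path as restore()).
    data_shards = rs_decode(download_any_k(manifest), k=K, n=N)    # hypothetical
    all_shards = data_shards + rs_encode(data_shards, parity=N - K)

    # Re-upload only the shards whose provider is gone, to fresh locations.
    for entry in (s for s in manifest["shards"] if not reachable(s["provider"])):
        entry["provider"] = fresh_providers.pop()
        entry["id"] = upload(entry["provider"], all_shards[entry["index"]])
```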
If you want to go deeper
- Wikipedia article on Reed-Solomon error correction — solid entry-level reference, covers history and several implementation approaches.
- Backblaze's Reed-Solomon blog series — extremely accessible engineering write-up of how they implemented Reed-Solomon for their Vault storage. Their code was open-sourced and is the basis for many other implementations.
- James Plank's papers — the academic deep end. Galois field optimization, Cauchy variants, regenerating codes. Read these if you want to actually implement one.
- QR code error correction — a real-world walkthrough of how Reed-Solomon parameters are picked for a specific use.