Choosing N and K
How to pick the two numbers that determine your storage overhead, redundancy, and how many cloud accounts you need.
TL;DR
If you don't want to think about this, use N=5, K=3. It survives losing any 2 of your 5 cloud providers, costs about 67% extra storage, and is the right choice for roughly 90% of users.
What N and K actually mean
When you split a file in ShardHex, you choose two numbers:
- N — the total number of shards your file is split into.
- K — the minimum number of shards needed to reconstruct the original. Any K of the N shards is enough — it does not matter which K.
Reed-Solomon erasure coding produces K data shards (each holding a slice of your file) and N − K parity shards (redundant data computed from the data shards). To restore the file, ShardHex needs any K of the original N shards: if x shards are missing, it can rebuild them from the survivors as long as x ≤ N − K.
In other words: you can lose up to N − K shards (and therefore up to N − K cloud providers) without losing your file.
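To make the "any K of N" property concrete, here is a minimal sketch of the simplest possible case, N=3 and K=2, using a single XOR parity shard. This is not Reed-Solomon and not ShardHex's actual code; it is a stand-in with the same shape (two data shards, one parity shard, any two are enough). The function names are hypothetical.

```python
from typing import List, Optional


def split_2_of_3(data: bytes) -> List[bytes]:
    """Split data into 2 data shards plus 1 XOR parity shard (N=3, K=2)."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even length; the caller keeps the true length
    half = len(data) // 2
    a, b = data[:half], data[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]


def rebuild_2_of_3(shards: List[Optional[bytes]], length: int) -> bytes:
    """Reconstruct the original from any 2 of the 3 shards (None = lost shard)."""
    if sum(s is None for s in shards) > 1:
        raise ValueError("more than N - K = 1 shard lost; file is unrecoverable")
    a, b, parity = shards
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, parity))  # a = b XOR parity
    elif b is None:
        b = bytes(x ^ y for x, y in zip(a, parity))  # b = a XOR parity
    return (a + b)[:length]  # strip any padding


original = b"hello shardhex"
shards = split_2_of_3(original)
shards[0] = None  # simulate losing one provider
restored = rebuild_2_of_3(shards, len(original))
print(restored == original)
```

Reed-Solomon generalizes this idea so that the number of parity shards, and therefore the number of tolerable losses, can be larger than one.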
The fundamental tradeoff
Storage cost per shard is roughly file_size / K. Total storage across all providers is roughly file_size × (N / K).
So the two knobs work against each other:
- Higher N → more redundancy, more total storage, more cloud accounts required.
- Higher K (for a fixed N) → smaller shards and less total storage, but fewer providers can be lost without losing data.
- Higher N − K (the gap) → more loss tolerance, but pure overhead.
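The two formulas above are easy to turn into a small calculator. The sketch below is illustrative only: `ShardPlan` and `shard_plan` are hypothetical names, and real shard sizes will be slightly larger because of padding and metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardPlan:
    n: int
    k: int
    per_shard_gb: float
    total_gb: float
    overhead: float  # total storage as a multiple of the original size
    tolerance: int   # shards (and providers) that can be lost


def shard_plan(file_size_gb: float, n: int, k: int) -> ShardPlan:
    """Apply per-shard = size / K and total = size * N / K."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= K <= N")
    per_shard = file_size_gb / k
    return ShardPlan(n, k, per_shard, per_shard * n, n / k, n - k)


for n, k in [(3, 2), (5, 3), (7, 4), (10, 5)]:
    p = shard_plan(10, n, k)
    print(f"N={p.n} K={p.k}: {p.per_shard_gb:.1f} GB per shard, "
          f"{p.total_gb:.1f} GB total ({p.overhead:.2f}x), survives {p.tolerance}")
```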
Cheat sheet
For a 10 GB original file:
| Profile | N | K | Per-shard size | Total storage | Survives losing |
|---|---|---|---|---|---|
| Light | 3 | 2 | 5 GB | 15 GB (1.5×) | 1 provider |
| Balanced (recommended) | 5 | 3 | 3.3 GB | 16.7 GB (1.67×) | 2 providers |
| Strong | 7 | 4 | 2.5 GB | 17.5 GB (1.75×) | 3 providers |
| Paranoid | 10 | 5 | 2 GB | 20 GB (2×) | 5 providers |
Notice how going from N=5, K=3 to N=7, K=4 adds only about 5% to total storage but raises the number of providers you can lose from 2 to 3. The marginal cost of more redundancy is small.
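The marginal cost between two profiles depends only on the N/K ratios, not on the file size. A short sketch (the function name `marginal_cost` is hypothetical) that compares rows of the cheat sheet:

```python
from typing import Tuple


def marginal_cost(old: Tuple[int, int], new: Tuple[int, int]) -> Tuple[float, int]:
    """Compare two (N, K) profiles.

    Returns (fractional change in total storage, change in tolerated losses).
    """
    old_n, old_k = old
    new_n, new_k = new
    storage_change = (new_n / new_k) / (old_n / old_k) - 1
    tolerance_change = (new_n - new_k) - (old_n - old_k)
    return storage_change, tolerance_change


extra, gained = marginal_cost((5, 3), (7, 4))
print(f"Balanced -> Strong: {extra:+.0%} storage, {gained:+d} tolerated losses")

extra, gained = marginal_cost((5, 3), (10, 5))
print(f"Balanced -> Paranoid: {extra:+.0%} storage, {gained:+d} tolerated losses")
```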
How to actually pick
Step 1 — How many providers might you plausibly lose at once?
"At once" means within a window where you can't react. For most people that's 1 to 3:
- 1 provider — single account ban or one cloud goes down for a day
- 2 providers — same vendor (e.g. Google Drive + YouTube) banned together; or two anonymous hosts (Catbox + GoFile) both deleting your file in the same week
- 3+ providers — coordinated takedown, regional cloud outage cascade, or a long absence where you can't intervene
That number is your N − K.
Step 2 — How many cloud accounts do you actually have?
Pick N to match. ShardHex doesn't enforce one shard per provider, but having two shards on the same provider largely defeats the point — if that provider dies, both shards die.
If you have 3 providers and want to survive losing 2, one shard per provider only works with K=1, which is plain replication. With K ≥ 2 you'd need at least 4 providers. Either acquire more accounts, accept a smaller N − K, or use the same provider twice (and accept the risk concentration).
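Steps 1 and 2 boil down to one subtraction and one sanity check. The sketch below assumes one shard per provider and treats K ≥ 2 as the minimum worth using; both the function name `pick_n_k` and the `min_k` default are illustrative assumptions, not ShardHex settings.

```python
from typing import Tuple


def pick_n_k(providers: int, tolerated_losses: int, min_k: int = 2) -> Tuple[int, int]:
    """One shard per provider: N = provider count, K = N - tolerated losses."""
    n = providers
    k = n - tolerated_losses
    if k < min_k:
        needed = tolerated_losses + min_k
        raise ValueError(
            f"{providers} providers cannot tolerate {tolerated_losses} losses "
            f"with K >= {min_k}; at least {needed} providers are needed"
        )
    return n, k


print(pick_n_k(5, 2))  # five accounts, survive two losses
print(pick_n_k(7, 3))  # seven accounts, survive three losses

try:
    pick_n_k(3, 2)     # three accounts cannot survive two losses with K >= 2
except ValueError as err:
    print(err)
```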
Step 3 — How fast does recovery need to be?
Recovery downloads K shards in parallel. Higher K means more parallel connections during restore. For most home users this doesn't matter, but if you're restoring 100 GB and want it back in minutes rather than hours, larger K helps — assuming your bandwidth has the headroom.
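A rough way to see when larger K helps is to model restore time under two assumptions that are not from ShardHex itself: each provider serves its shard at some fixed rate, and your own connection caps the combined rate. All numbers below are illustrative placeholders, and `estimate_restore_seconds` is a hypothetical helper.

```python
def estimate_restore_seconds(file_size_gb: float, k: int,
                             per_provider_mb_s: float, link_mb_s: float) -> float:
    """Estimate restore time when K shards download in parallel.

    The K shards together add up to the original file size, so the total
    amount downloaded is file_size_gb regardless of K. Only the achievable
    throughput changes.
    """
    effective_mb_s = min(k * per_provider_mb_s, link_mb_s)
    return file_size_gb * 1000 / effective_mb_s  # 1 GB = 1000 MB


# 100 GB file, each provider serving 20 MB/s, local link capped at 100 MB/s
for k in (3, 5, 10):
    minutes = estimate_restore_seconds(100, k, 20, 100) / 60
    print(f"K={k}: about {minutes:.0f} minutes")
```

In this model, raising K stops helping once K times the per-provider rate reaches your link capacity, which is the "headroom" caveat in practice.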
Worked examples
Example 1 — Casual user with three free cloud accounts
You have Google Drive (15 GB free), OneDrive (5 GB free), and Dropbox (2 GB free). You want any single one of them to be replaceable without losing data.
Pick: N=3, K=2. Survives losing 1 provider. Each shard is half the file size; you'll need 1.5× the original storage in total, distributed across the three accounts.
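One practical consequence: all shards are the same size, so with one shard per provider the smallest account limits how much you can store. A minimal sketch, assuming the free quotas listed above are entirely available for shards (the helper `max_file_size_gb` is hypothetical):

```python
from typing import Dict


def max_file_size_gb(free_gb: Dict[str, float], k: int) -> float:
    """Largest original data size that fits with one shard per provider.

    Each shard is size / K, and every provider must hold one shard,
    so size / K <= smallest quota.
    """
    return k * min(free_gb.values())


quotas = {"Google Drive": 15, "OneDrive": 5, "Dropbox": 2}
limit = max_file_size_gb(quotas, k=2)
print(f"With N=3, K=2 you can store about {limit} GB of original data")
```

Here the 2 GB Dropbox account is the bottleneck: roughly 4 GB of original data in total, no matter how much room is left on Google Drive.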
Example 2 — Privacy-focused user with five accounts including two anonymous hosts
You have Google Drive, OneDrive, Dropbox, plus Catbox and GoFile (both anonymous, both somewhat unreliable — files can disappear without notice).
Pick: N=5, K=3. Two providers can vanish — including both anonymous ones — without breaking your file. This is the recommended default for a reason.
Example 3 — Self-hosted setup with three MinIO instances and two cloud accounts
You run MinIO on three different machines (home, office, friend's house) plus have Backblaze B2 and Cloudflare R2 paid accounts.
Pick: N=5, K=3. Any combination of two — say one MinIO going offline plus one cloud provider issue — is recoverable. Higher than that and you're paying meaningful storage cost for unlikely simultaneous failures.
Example 4 — Long-term cold archive ("seven years from now")
You're storing important documents for the long haul. You expect at least 1–2 of your providers to disappear or change ownership over that time horizon, and you may not actively monitor them.
Pick: N=7, K=4. Up to three providers can fail and the file is still recoverable. Pair with a calendar reminder to verify and repair shards every 12 months, and you've got a setup that survives a decade of cloud market turbulence.
Common mistakes to avoid
K = N (no redundancy)
All shards are required to restore. Lose any one shard and the file is gone. This is strictly worse than just storing the file once on a single cloud: you've spread it across more providers (more points of failure) without buying any resilience.
K = 1 (full replication)
Each shard is a complete copy of the original file. This works but is just N-way replication — easier to do with rsync, no need for the splitting machinery. ShardHex's value comes from K being meaningfully > 1.
N too high for available providers
If you set N=10 but only have 3 cloud accounts, you'll end up putting 3–4 shards on each provider. When any one of those providers goes down, you lose 3–4 shards at once, which can use up or exceed your entire N − K budget in a single outage. Match N to your actual provider count.
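To check a specific layout, count the shards lost in the worst case. The sketch below assumes shards are spread as evenly as possible across providers and that the fullest providers are the ones that fail; the function names are hypothetical.

```python
def worst_case_shards_lost(n: int, providers: int, failed_providers: int = 1) -> int:
    """Shards lost when the most heavily loaded providers fail."""
    base, extra = divmod(n, providers)
    # 'extra' providers hold base + 1 shards, the rest hold base shards
    counts = [base + 1] * extra + [base] * (providers - extra)
    counts.sort(reverse=True)
    return sum(counts[:failed_providers])


def survives(n: int, k: int, providers: int, failed_providers: int = 1) -> bool:
    """True if the file is still recoverable after the worst-case failure."""
    return worst_case_shards_lost(n, providers, failed_providers) <= n - k


print(worst_case_shards_lost(10, 3))           # shards on the fullest of 3 providers
print(survives(10, 5, 3))                      # N=10, K=5, one provider down
print(survives(10, 7, 3))                      # N=10, K=7, one provider down
print(survives(10, 5, 3, failed_providers=2))  # N=10, K=5, two providers down
```

Even the N=10, K=5 profile, which nominally tolerates 5 lost shards, only survives a single provider outage when packed onto 3 accounts.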
Counting an unreliable provider as a full-weight slot
Anonymous hosts (Catbox, GoFile, Pixeldrain) can delete files for any or no reason. If your N − K is 1 and you put a shard on Catbox, you're effectively betting your data on Catbox specifically. Either don't use anonymous hosts for important files, or budget extra N − K to account for their fragility.
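One way to reason about "budgeting extra" is to estimate the chance that at least K shards survive, given a per-provider loss probability. The sketch below assumes one shard per provider and independent failures, which is optimistic for providers that share a vendor. The probabilities are invented for illustration and are not measurements of any real service.

```python
from typing import Sequence


def survival_probability(loss_probs: Sequence[float], k: int) -> float:
    """P(at least k shards survive), one shard per provider, independent losses."""
    # dist[j] = probability that exactly j shards have survived so far
    dist = [1.0]
    for p_loss in loss_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, prob in enumerate(dist):
            nxt[j] += prob * p_loss            # this shard is lost
            nxt[j + 1] += prob * (1 - p_loss)  # this shard survives
        dist = nxt
    return sum(dist[k:])


# Hypothetical yearly loss probabilities: 2% for a mainstream cloud,
# 30% for an anonymous host.
all_reliable = survival_probability([0.02, 0.02, 0.02], k=2)
one_flaky = survival_probability([0.02, 0.02, 0.30], k=2)
flaky_with_margin = survival_probability([0.02, 0.02, 0.02, 0.30], k=2)

print(f"N=3, K=2, all mainstream:      {all_reliable:.4f}")
print(f"N=3, K=2, one anonymous host:  {one_flaky:.4f}")
print(f"N=4, K=2, one anonymous host:  {flaky_with_margin:.4f}")
```

Under these made-up numbers, swapping one mainstream provider for an anonymous host raises the risk noticeably, and adding one more shard of margin more than makes up for it.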
Picking once and never revisiting
The right N/K depends on which providers you currently use. If you add a new account or close an old one, your old N/K might no longer match reality. Revisit the choice every time your provider lineup changes meaningfully.
If you change your mind later
ShardHex doesn't yet support in-place re-coding to different N/K. To change parameters for a file you've already split:
- Download and merge the original file using the existing manifest.
- Re-split it with the new N/K.
- Re-upload the new shards.
- Delete the old shards from your clouds (or let them age out).
For large files this is bandwidth-expensive. Pick a sane N/K up front to avoid having to do this often.
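To put a number on "bandwidth-expensive": merging needs K old shards, which together equal the original file size, and re-uploading sends the full new shard set. A small sketch under those assumptions (the helper name is hypothetical, and protocol overhead is ignored):

```python
from typing import Tuple


def recode_transfer_gb(file_size_gb: float, new_n: int, new_k: int) -> Tuple[float, float]:
    """Return (download_gb, upload_gb) for re-coding one file to new N/K."""
    download = file_size_gb                # K old shards of size / K each
    upload = file_size_gb * new_n / new_k  # N new shards of size / K each
    return download, upload


down, up = recode_transfer_gb(100, new_n=7, new_k=4)
print(f"Re-coding 100 GB to N=7, K=4: download {down} GB, upload {up} GB")
```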
Summary
- If unsure, use N=5, K=3. Survives 2 simultaneous losses, 67% storage overhead, fits the typical 5-cloud user profile.
- The marginal cost of going from N=5 to N=7 is small — consider it for important long-term archives.
- Match N to your actual provider count. Putting multiple shards on one provider concentrates risk and defeats the point.
- Treat anonymous hosts (Catbox, GoFile) as half-weight slots when budgeting N − K.
- Revisit the choice when your set of cloud providers changes.