Database Sharding: What They Don't Tell You
Database sharding sounds simple in theory: split your data across multiple databases to handle more load. In practice, it's one of the most complex architectural decisions you can make.
The Trigger
Our PostgreSQL instance was handling 50,000 queries per second. Response times were creeping up, and we were already on the largest available instance. Sharding seemed inevitable.
The Decision
We considered three sharding strategies:
We chose hash-based sharding on user_id for even distribution:
typescript
400">"text-primary">function getShardId(userId: string, numShards: number): number {
400">"text-primary">const hash = crypto.createHash(400">'md5').update(userId).digest(400">'hex');
400">"text-primary">return parseInt(hash.substring(0, 8), 16) % numShards;
}The Hidden Costs
What we didn't anticipate:
Would I Do It Again?
Yes, but I'd start with read replicas and caching first. Sharding should be a last resort, not a first solution.