#SystemDesign

Scaling WebSockets to 1M Concurrent Connections

December 15, 2024
12 min read

When I first inherited the WebSocket infrastructure at my previous company, we were handling around 10,000 concurrent connections. The system worked, but barely. Today, I'll walk you through the journey of scaling that same system to handle over 1 million concurrent connections.

The Initial Architecture

The original setup was simple—perhaps too simple:

```typescript
// The "simple" approach that got us to 10K connections
import WebSocket from 'ws';

const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (ws) => {
  ws.on('message', (message) => {
    // Broadcast to all clients
    wss.clients.forEach((client) => {
      if (client.readyState === WebSocket.OPEN) {
        client.send(message);
      }
    });
  });
});
```

This approach has several fundamental problems at scale:

- Single process limitation: Node.js runs a single-threaded event loop, so one process can only use one CPU core (see the sketch after this list)
- Memory pressure: every open connection holds socket buffers and per-connection state in that one process's heap
- Broadcast inefficiency: every incoming message fans out to all n connected clients, so distribution is O(n) per message
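
The single-process ceiling is the most immediate of these. The textbook first step is Node's cluster module, sketched below as a minimal illustration (not what we ultimately shipped; the `ws` package is assumed). It multiplies CPU capacity, but it also exposes the next problem: each worker only sees its own clients.

```typescript
// Minimal sketch: fork one WebSocket server per CPU core.
// Workers share port 8080 (the primary hands connections off),
// but each worker's broadcast reaches only its own clients.
import cluster from 'node:cluster';
import os from 'node:os';
import WebSocket from 'ws';

if (cluster.isPrimary) {
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
} else {
  const wss = new WebSocket.Server({ port: 8080 });
  wss.on('connection', (ws) => {
    ws.on('message', (message) => {
      // Only this worker's clients receive the broadcast
      wss.clients.forEach((client) => {
        if (client.readyState === WebSocket.OPEN) client.send(message);
      });
    });
  });
}
```

That cross-worker blindness is exactly what the message bus further down solves.
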
The First Bottleneck: Connection Limits

The first wall we hit was around 65,000 connections. TCP port numbers are 16 bits, so a single source IP (a load balancer in front of the servers, for example) can open at most ~65K connections to one backend address and port, and Linux's default ephemeral port range is narrower still. The fix was relatively straightforward:

```bash
# Widen the ephemeral port range (as root)
echo "1024 65535" > /proc/sys/net/ipv4/ip_local_port_range

# Raise the file descriptor limit for the current shell
ulimit -n 1000000
```
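
Note that both settings are themselves ephemeral: the sysctl resets on reboot, and `ulimit` only affects the current shell. A minimal sketch of making them persistent (the config file name and `appuser` are placeholders, not from our setup):

```bash
# Persist the port range via sysctl config
echo "net.ipv4.ip_local_port_range = 1024 65535" > /etc/sysctl.d/99-websocket.conf
sysctl --system

# Persist file descriptor limits for the service user
# (or set LimitNOFILE=1000000 in the systemd unit)
echo "appuser soft nofile 1000000" >> /etc/security/limits.conf
echo "appuser hard nofile 1000000" >> /etc/security/limits.conf
```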

But this only bought us time.

Horizontal Scaling with Redis Pub/Sub

The real gains came from scaling horizontally. We introduced Redis as a message bus:

```typescript
import WebSocket from 'ws';
import Redis from 'ioredis';

// A Redis connection in subscriber mode can't issue other commands,
// so publishing and subscribing get separate clients
const publisher = new Redis();
const subscriber = new Redis();

// Track the clients connected to *this* server instance
const localClients = new Set<WebSocket>();

const wss = new WebSocket.Server({ port: 8080 });
wss.on('connection', (ws) => {
  localClients.add(ws);
  ws.on('close', () => localClients.delete(ws));
});

// Each server subscribes to channels
subscriber.subscribe('broadcast');

subscriber.on('message', (channel, message) => {
  // Fan out only to this instance's clients
  localClients.forEach((client) => {
    if (client.readyState === WebSocket.OPEN) client.send(message);
  });
});

// Publishing goes through Redis, so every instance sees the message
function broadcast(message: string) {
  publisher.publish('broadcast', message);
}
```

This allowed us to scale horizontally, but introduced new challenges around connection affinity and message ordering.
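
Ordering is the subtler of the two: two instances can publish interleaved messages, and subscribers may apply them in different orders. One common mitigation, sketched below, is to stamp each message with a global sequence number so clients can at least detect gaps; the key name and helper functions are illustrative, not our production code.

```typescript
import Redis from 'ioredis';

const publisher = new Redis();

// Server side: stamp every broadcast with a global sequence number.
// INCR is atomic, so all instances draw from the same counter.
async function broadcastOrdered(payload: string) {
  const seq = await publisher.incr('broadcast:seq'); // illustrative key name
  await publisher.publish('broadcast', JSON.stringify({ seq, payload }));
}

// Client side: a gap in seq means something was dropped or reordered,
// so re-sync authoritative state instead of trusting the stream.
let lastSeq = 0;
function onBroadcast(raw: string) {
  const { seq, payload } = JSON.parse(raw) as { seq: number; payload: string };
  if (lastSeq !== 0 && seq !== lastSeq + 1) {
    resync();
  }
  lastSeq = seq;
  console.log(payload);
}

function resync() {
  // Re-fetch authoritative state out of band (application-specific)
}
```
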

Lessons Learned

1. Start with observability: we should have instrumented everything from day one (a minimal sketch follows this list)
2. Test at scale early: synthetic load testing saved us from many production incidents
3. Plan for failure: every component will fail; design for graceful degradation
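
On the first lesson, even a single gauge would have helped. A minimal sketch using prom-client (an assumption on my part; any metrics library works):

```typescript
import http from 'node:http';
import WebSocket from 'ws';
import client from 'prom-client';

const wss = new WebSocket.Server({ port: 8080 });

// Track live WebSocket connections as a Prometheus gauge
const activeConnections = new client.Gauge({
  name: 'ws_active_connections',
  help: 'Number of currently open WebSocket connections',
});

wss.on('connection', (ws) => {
  activeConnections.inc();
  ws.on('close', () => activeConnections.dec());
});

// Expose /metrics on a side port for Prometheus to scrape
http
  .createServer(async (_req, res) => {
    res.setHeader('Content-Type', client.register.contentType);
    res.end(await client.register.metrics());
  })
  .listen(9100);
```
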
The best time to plan for scale is before you need it. The second best time is now.

What's Next

In the next post, I'll cover how we implemented connection draining during deployments: a surprisingly complex problem that took us three attempts to get right.
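
As a preview, the rough shape is below; why the naive version fails, and what the three attempts looked like, is the next post's story.

```typescript
import WebSocket from 'ws';

const wss = new WebSocket.Server({ port: 8080 });

process.on('SIGTERM', () => {
  // Stop accepting new connections
  wss.close();
  // 1001 = "going away"; well-behaved clients reconnect to another instance
  wss.clients.forEach((ws) => ws.close(1001, 'server draining'));
  // Give close handshakes a moment to finish before exiting
  setTimeout(() => process.exit(0), 5000);
});
```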
