How CDNs Use Consistent Hashing to Prevent Cache Stampedes
This article explains the concept of consistent hashing, a fundamental technique used in distributed systems like CDNs and databases. It contrasts consistent hashing with simpler modulo hashing, which can cause 'cache stampedes' when the number of servers changes. The piece details how a 'hash ring' and 'virtual nodes' solve this by minimizing the number of keys that need to be remapped when servers are added or removed, thus improving stability and load distribution.
Key Takeaways
- Simple modulo hashing (`hash(key) % N`) causes mass key remapping and cache invalidation when the number of servers (N) changes.
- Consistent hashing solves this by mapping both servers and keys to a 'hash ring,' ensuring only the keys belonging to an added or removed server need remapping.
- To prevent uneven load distribution ('hotspotting'), implementations use 'virtual nodes,' mapping each physical server to multiple points on the ring.
- The technique is foundational to large-scale distributed systems, including CDNs like Akamai and databases like Amazon DynamoDB and Apache Cassandra.
Why It Matters
For any large-scale streaming service, cache stability is critical to performance and cost. Consistent hashing is the underlying mechanism that allows CDNs and backend data stores to scale dynamically without causing catastrophic failures. It directly impacts user experience by ensuring high cache hit ratios and protects origin servers from being overwhelmed during server changes. As services increasingly depend on distributed architectures, understanding how components like virtual nodes address load balancing is key to diagnosing hotspotting issues and ensuring infrastructure resiliency during scaling events.
Read full article at hackernoon.com
