Caching: Types, Patterns, and Trade-offs
What is Caching?
Caching is the practice of keeping copies of data in a faster, temporary storage layer so that later reads can be served more quickly.
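At a systems level, the speed difference is massive (these are order-of-magnitude figures; actual numbers vary by hardware):
- Hard disk access: ~1 ms
- Memory (RAM) access: ~100 ns
With these figures, memory is roughly 10,000x faster (1 ms / 100 ns).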
The entire idea of caching is to avoid expensive operations (disk, DB, network, computation) by serving precomputed or previously fetched data.
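How much a cache helps depends heavily on how often it is hit. The sketch below works through a simplified expected-latency calculation using the figures above. The function name `expected_latency_ns` and the 90% hit ratio are illustrative assumptions, not measurements.

```python
def expected_latency_ns(hit_ratio: float, cache_ns: float, backing_ns: float) -> float:
    """Average access time for a given hit ratio (simplified model)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Hits cost the cache latency; misses cost the backing-store latency.
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * backing_ns


RAM_NS = 100          # ~100 ns, from the figures above
DISK_NS = 1_000_000   # ~1 ms, from the figures above

avg = expected_latency_ns(0.90, RAM_NS, DISK_NS)
print(f"{avg:,.0f} ns")
```

Even at a 90% hit ratio the average stays near 100 µs in this model, because the remaining 10% of misses dominate. This is why raising the hit ratio usually matters more than shaving time off the hit path.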
Types of Caching
1. External Cache
A separate system acting as a shared cache layer.
- Examples: Redis, Memcached
- Shared across multiple servers
- Suitable for distributed systems
- Involves network overhead
Use this when you need scalability and shared state across instances.
2. In-Process Cache
Cache stored within the application process itself.
- Extremely fast (no network calls)
- Limited by server memory
- Not shared across instances
Best for micro-optimizations and ultra-low-latency use cases.
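As a minimal sketch of an in-process cache, Python's standard `functools.lru_cache` decorator keeps results in the memory of the running process. The function `slow_square` and the counter `call_count` are hypothetical stand-ins for an expensive computation.

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually runs


@lru_cache(maxsize=1024)  # bounded, so the cache holds at most 1024 results
def slow_square(n: int) -> int:
    """Stand-in for an expensive computation (hypothetical example)."""
    global call_count
    call_count += 1
    return n * n


slow_square(12)  # miss: runs the function body
slow_square(12)  # hit: served from process memory, no recomputation
info = slow_square.cache_info()
print(info.hits, info.misses, call_count)
```

Each application instance would hold its own copy of this cache, which is the "not shared across instances" limitation noted above.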
3. CDN (Content Delivery Network)
Optimizes delivery by serving content from the servers geographically closest to the user.
- Reduces latency due to network distance
- Ideal for static/media assets
- Offloads traffic from core backend
Commonly used for:
- Images
- Videos
- Static files
4. Client-Side Cache
Caching at the browser level.
- LocalStorage / SessionStorage
- Browser disk caching
- Least controllable: the server cannot directly inspect or purge what a browser has stored
Useful for:
- Reducing repeated API calls
- Improving perceived performance
Cache Architectures
1. Cache-Aside (Lazy Loading)
- App checks cache first
- If miss → fetch from DB → update cache
Pros: Simple, widely used
Cons: Cached entries can go stale if the DB changes and the entry is not invalidated
2. Write-Through
- Each write updates the cache and the DB synchronously, as part of the same operation
Pros: Cache stays consistent with the DB
Cons: Higher write latency
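A minimal sketch of write-through, assuming a dictionary as a stand-in for the DB. The class name `WriteThroughCache` is hypothetical; the point is only that `put` does not return until both layers are updated.

```python
class WriteThroughCache:
    """Every write updates the backing store and the cache before returning."""

    def __init__(self, store: dict):
        self._store = store   # stand-in for the DB
        self._cache: dict = {}

    def put(self, key, value) -> None:
        self._store[key] = value   # synchronous DB write (the slow part)
        self._cache[key] = value   # cache updated in the same call

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._store.get(key)   # fall back to the store on a miss
        if value is not None:
            self._cache[key] = value
        return value


db = {}
users = WriteThroughCache(db)
users.put("user:1", "Ada")
print(db["user:1"], users.get("user:1"))
```

The write latency cost comes from the caller waiting on the DB write inside `put`.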
3. Write-Behind (Write-Back)
- Write goes to cache first
- DB updated asynchronously
Pros: Fast writes
Cons: Risk of data loss if cache fails
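The following sketch illustrates write-behind with an explicit `flush` step so the behaviour is easy to follow. A real system would typically flush from a background worker; the class name `WriteBehindCache` and the dictionary-backed store are assumptions.

```python
class WriteBehindCache:
    """Writes land in the cache immediately and reach the store on flush."""

    def __init__(self, store: dict):
        self._store = store          # stand-in for the DB
        self._cache: dict = {}
        self._dirty: set = set()     # keys written but not yet persisted

    def put(self, key, value) -> None:
        self._cache[key] = value     # fast path: no DB round trip
        self._dirty.add(key)

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        return self._store.get(key)

    def flush(self) -> int:
        """Persist pending writes; returns how many keys were written."""
        flushed = 0
        for key in list(self._dirty):
            self._store[key] = self._cache[key]
            flushed += 1
        self._dirty.clear()
        return flushed


db = {}
cache = WriteBehindCache(db)
cache.put("counter", 41)
cache.put("counter", 42)   # repeated writes to one key collapse into one flush
print("counter" in db)     # the DB has not seen the write yet
print(cache.flush(), db)
```

Anything still in `_dirty` when the cache process dies is lost, which is the data-loss risk listed above.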
4. Read-Through
- Cache layer itself fetches from DB on miss
Pros: Cleaner abstraction
Cons: Less control at application level
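In a read-through setup the caller only talks to the cache, and the cache is configured with a loader. Here is a small sketch, where `ReadThroughCache` and `load_from_db` are hypothetical names.

```python
from typing import Any, Callable


class ReadThroughCache:
    """The cache owns the loading logic; callers never touch the DB directly."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader
        self._data: dict = {}

    def get(self, key: str) -> Any:
        if key not in self._data:
            # Miss: the cache itself fetches from the backing store.
            self._data[key] = self._loader(key)
        return self._data[key]


loads = []  # records which keys actually reached the "DB"


def load_from_db(key: str) -> str:
    loads.append(key)
    return f"row-for-{key}"


products = ReadThroughCache(load_from_db)
products.get("sku-1")
products.get("sku-1")   # second call is a hit, so the loader is not invoked
print(loads)
```

Compared with cache-aside, the application code shrinks to a single `get` call, at the cost of moving the fetch logic out of its direct control.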
Eviction Policies
When the cache is full, something must be removed to make room for new entries.
1. LRU (Least Recently Used)
Evicts the item that was accessed least recently.
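One common way to sketch LRU in Python is with `collections.OrderedDict`, which tracks insertion order and can move a key to the end in constant time. The class name `LRUCache` is an assumption for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used key once capacity is exceeded."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")        # "a" is now the most recently used
lru.put("c", 3)     # over capacity: "b" is evicted, not "a"
print(lru.get("a"), lru.get("b"), lru.get("c"))
```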
2. LFU (Least Frequently Used)
Evicts the item with the lowest access count.
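A deliberately simple LFU sketch follows. It scans all counters on eviction, which is O(n); production implementations usually use more elaborate structures. The class name `LFUCache` and the tie-breaking rule (oldest inserted key loses) are assumptions of this sketch.

```python
class LFUCache:
    """Evicts the key with the lowest access count when full."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._values: dict = {}
        self._counts: dict = {}

    def get(self, key, default=None):
        if key not in self._values:
            return default
        self._counts[key] += 1
        return self._values[key]

    def put(self, key, value) -> None:
        if key not in self._values and len(self._values) >= self.capacity:
            # O(n) scan; on ties, min() returns the earliest-inserted key.
            victim = min(self._counts, key=self._counts.get)
            del self._values[victim]
            del self._counts[victim]
        self._values[key] = value
        self._counts[key] = self._counts.get(key, 0) + 1


lfu = LFUCache(capacity=2)
lfu.put("a", 1)
lfu.put("b", 2)
lfu.get("a")
lfu.get("a")        # "a" now has a higher count than "b"
lfu.put("c", 3)     # cache is full: "b" has the lowest count and is evicted
print(lfu.get("a"), lfu.get("b"), lfu.get("c"))
```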
3. FIFO (First In First Out)
Evicts the oldest entry, regardless of how often or how recently it was used.
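FIFO can be sketched with the same `collections.OrderedDict` building block, except that reads never change the order. The class name `FIFOCache` is hypothetical.

```python
from collections import OrderedDict


class FIFOCache:
    """Evicts the oldest inserted key; reads do not affect eviction order."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        return self._items.get(key, default)     # no reordering on access

    def put(self, key, value) -> None:
        if key not in self._items and len(self._items) >= self.capacity:
            self._items.popitem(last=False)      # drop the oldest insertion
        self._items[key] = value                 # updating keeps the position


fifo = FIFOCache(capacity=2)
fifo.put("a", 1)
fifo.put("b", 2)
fifo.get("a")       # reading "a" does not protect it
fifo.put("c", 3)    # "a" is the oldest insertion, so it is evicted
print(fifo.get("a"), fifo.get("b"), fifo.get("c"))
```

Unlike LRU, a frequently read key gets no protection here, which makes FIFO a poor fit when a small set of keys is read repeatedly.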
4. TTL (Time To Live)
Entries expire after a fixed time, whether or not the cache is full. TTL is often combined with one of the policies above.
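A minimal TTL sketch with lazy expiry (entries are checked and dropped when read). The class name `TTLCache` and the injectable `clock` parameter are assumptions; the clock is injectable only so the example can simulate time passing without sleeping.

```python
import time


class TTLCache:
    """Entries expire a fixed number of seconds after being written."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: dict = {}   # key -> (value, expires_at)

    def put(self, key, value) -> None:
        self._items[key] = (value, self._clock() + self._ttl)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]   # lazy expiry: removed on first stale read
            return default
        return value


now = [0.0]  # fake clock so the example is deterministic
sessions = TTLCache(ttl_seconds=30, clock=lambda: now[0])
sessions.put("token", "abc")
now[0] = 29.0
fresh = sessions.get("token")     # still within the 30 s window
now[0] = 30.0
expired = sessions.get("token")   # TTL reached: treated as a miss
print(fresh, expired)
```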
Choosing the right policy depends on access patterns and data lifecycle.
Common Issues & Deep Dives
1. Cache Stampede (Thundering Herd)
When many requests hit a missing or expired cache entry at the same time, they all fall through to the DB and can overload it.
Mitigation:
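- Request coalescing (one request loads the value, the others wait for its result)
- Locking (only the lock holder rebuilds the entry)
- Staggered TTLs (add random jitter so keys do not all expire at once)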
2. Cache Consistency
The cache may serve stale data after the underlying data changes.
Trade-off:
- Strong consistency → more complexity
- Eventual consistency → simpler but stale reads possible
3. Hot Keys
Some keys get disproportionately high traffic.
Problems:
- Uneven load
- Cache node bottlenecks
Solutions:
- Replication
- Sharding
- Local caching layer
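One way to apply replication to a hot key is to store several suffixed copies, so that in a sharded cache the copies can land on different nodes, and to read from a random copy. The sketch below assumes a dictionary as the cache, and the suffix scheme `#r0`, `#r1`, ... and the function names are hypothetical.

```python
import random

REPLICAS = 4  # assumed replica count; tune to the observed load


def replica_keys(key: str, replicas: int = REPLICAS) -> list:
    """Derive one key per replica, e.g. 'k#r0', 'k#r1', ..."""
    return [f"{key}#r{i}" for i in range(replicas)]


def write_hot(cache: dict, key: str, value, replicas: int = REPLICAS) -> None:
    # Writes fan out to every copy so all replicas stay in step.
    for k in replica_keys(key, replicas):
        cache[k] = value


def read_hot(cache: dict, key: str, replicas: int = REPLICAS):
    # Reads pick a random copy, spreading traffic across replicas.
    return cache.get(random.choice(replica_keys(key, replicas)))


store = {}
write_hot(store, "trending:homepage", "payload")
print(sorted(store))
print(read_hot(store, "trending:homepage"))
```

The cost is write amplification: every update touches all replicas, so this suits keys that are read far more often than they are written.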
NFRs (Non-Functional Requirements): When Should You Use Caching?
Caching is most beneficial when you have:
- Read-heavy workloads
- Expensive queries (joins, aggregations)
- High database CPU usage
- Strict latency requirements
If your system is write-heavy or requires strict consistency, caching needs careful consideration.
How to Introduce Caching (Practical Approach)
1. Identify bottlenecks
- Slow endpoints
- High DB load
- Repeated queries
2. Decide what to cache
- Query results
- Computed data
- API responses
3. Choose a cache architecture
- Cache-aside (most common starting point)
4. Set an eviction policy
- Based on usage patterns
5. Address downsides
- Stampede prevention
- Consistency strategy
- Monitoring
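For the monitoring step, the hit ratio is usually the first metric worth tracking. A minimal sketch, with `CacheStats` as a hypothetical helper that a cache wrapper would call on every lookup:

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Counts hits and misses so the hit ratio can be reported."""
    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0  # avoid dividing by zero


stats = CacheStats()
for hit in (True, True, True, False):
    stats.record(hit)
print(f"hit ratio: {stats.hit_ratio:.0%}")
```

A persistently low hit ratio suggests the wrong data is being cached, or that the TTL or capacity is too small for the access pattern.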
Final Thought
Caching is one of the highest-leverage optimizations in system design.
But it’s not just about adding Redis.
It’s about:
- Understanding access patterns
- Balancing consistency against performance
- Handling edge cases at scale
Done right, caching can transform system performance. Done poorly, it can introduce subtle, hard-to-debug issues.