Vijay Pagare

Design: Activity Feed

The Problem

In an activity feed (like LinkedIn or Instagram), we need to track which posts a user has already seen.

1. The Storage Strategy (The Write Path)

If we use a traditional SQL database backed by a B-tree storage engine, every “seen” event requires the DB to locate a specific row and update it in place. This involves random I/O, which is much slower than sequential writes. At 100k+ events per second, a single relational database node will likely struggle with lock and disk contention.

The Solution: Use an LSM-Tree based storage engine (like Cassandra or ScyllaDB). In this model, “Seen” events are treated as “appends.” Each write is appended to a commit log and buffered in an in-memory table (the memtable), which is eventually flushed to disk as sorted, immutable files (SSTables). It’s significantly faster because the system doesn’t “search” for an existing row during the write; it just adds the data to the end of the log.
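
To make the write path concrete, here is a toy sketch in Python. It is not how Cassandra or ScyllaDB are implemented; the class name TinyLSM, the memtable_limit parameter, and the in-memory list standing in for an on-disk log are all assumptions made for illustration.

```python
import bisect


class TinyLSM:
    """Toy LSM write path: append to a log, buffer in a memtable, flush sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.commit_log = []   # stands in for an append-only file on disk
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable sorted runs, newest last

    def mark_seen(self, user_id, post_id):
        key = (user_id, post_id)
        self.commit_log.append(key)  # sequential append, no lookup of old data
        self.memtable[key] = True
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Sort once at flush time and keep the run immutable afterwards.
        self.sstables.append(sorted(self.memtable))
        self.memtable = {}

    def has_seen(self, user_id, post_id):
        key = (user_id, post_id)
        if key in self.memtable:
            return True
        # Reads pay the cost: each sorted run may need a binary search.
        for run in reversed(self.sstables):
            i = bisect.bisect_left(run, key)
            if i < len(run) and run[i] == key:
                return True
        return False
```

Notice that mark_seen never reads existing data, while has_seen may have to look through several sorted runs. That read cost is the price of cheap writes, and it is the reason the next section adds a filter in front of the database.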

2. The Efficiency Trick: Bloom Filters

When a user refreshes their feed, the system might fetch 50 candidate posts and must check: “Has the user seen any of these?”

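Querying the database 50 times per refresh is too expensive. Instead, we use a Bloom Filter: a compact, in-memory bit array that answers either “possibly seen” or “definitely not seen.” It can produce false positives (an unseen post is occasionally treated as seen) but never false negatives, which is an acceptable trade-off for a feed.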
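
As a rough illustration, a Bloom Filter fits in a few lines of Python. This is a minimal sketch, not a production implementation: the sizes (num_bits, num_hashes), the SHA-256 based hashing scheme, and the post IDs are assumptions chosen for readability, and a real system would size the filter from the expected number of items and the target false-positive rate.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions per item in a fixed bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not added"; True means "possibly added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# One filter per user; the post IDs here are made up.
seen = BloomFilter()
for post_id in ["p1", "p2"]:
    seen.add(post_id)

candidates = ["p1", "p2", "p3"]
unseen = [p for p in candidates if not seen.might_contain(p)]
```

All 50 candidate checks happen in memory against a small bit array, so a refresh doesn't need a database round trip per post.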

3. The Architecture Flow

  1. Ingestion: User scrolls -> Event is sent to an Asynchronous Queue (like Kafka).
  2. Processing: A background worker picks up the event and writes it to the LSM-Tree Database.
  3. Filtering: When the feed is generated, the system checks the Bloom Filter to skip seen content without querying the database.
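
The three steps above can be sketched as a single-process Python example. Everything here is a stand-in: queue.Queue plays the role of Kafka, a dict of sets plays the role of the LSM-Tree database, and a plain set plays the role of the per-user Bloom Filter. Updating the filter inside the worker is also an assumption; the flow above doesn't say where the filter gets populated.

```python
import queue
from collections import defaultdict

events = queue.Queue()          # stand-in for a Kafka topic
seen_store = defaultdict(set)   # stand-in for the LSM-Tree database
seen_filter = defaultdict(set)  # stand-in for a per-user Bloom Filter


def on_scroll(user_id, post_id):
    """1. Ingestion: enqueue the event and return immediately."""
    events.put((user_id, post_id))


def drain_events():
    """2. Processing: a background worker persists queued events."""
    processed = 0
    while True:
        try:
            user_id, post_id = events.get_nowait()
        except queue.Empty:
            return processed
        seen_store[user_id].add(post_id)   # durable write
        seen_filter[user_id].add(post_id)  # assumption: worker updates the filter too
        processed += 1


def build_feed(user_id, candidates):
    """3. Filtering: drop candidates the filter reports as seen."""
    user_filter = seen_filter[user_id]
    return [p for p in candidates if p not in user_filter]
```

Because ingestion is asynchronous, a post the user just scrolled past can still show up in the feed until the worker has processed the event. For a feed, that short delay is usually an acceptable trade-off.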

Summary

When designing for high-volume status tracking, prioritize Write Speed by using append-only storage patterns and protect your Read Latency by using probabilistic data structures like Bloom Filters.

#SystemDesign #Engineering