Reqflow

Web Crawler vs Instagram Feed

Web Crawler

URL frontier, politeness, deduplication, distributed fetching.

Components (8)

  • Seed URLs
  • URL Frontier
  • Fetcher Service
  • Parser Service
  • Bloom Filter
  • robots.txt Cache
  • Content Store
  • URL Metadata DB
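
The Bloom Filter in the list above is the usual way to handle the deduplication mentioned in the summary: before a URL goes into the URL Frontier, the crawler checks whether it has probably been seen already. The following is a minimal sketch of that idea, not the design from this page. The class name, the `should_enqueue` helper, and the sizing parameters are all hypothetical, and a production crawler would size the filter for billions of entries and shard it.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter for 'have we seen this URL?' checks (illustrative only)."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5) -> None:
        # num_bits and num_hashes are assumed values, not figures from the page.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive k bit positions by double hashing one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def should_enqueue(seen: BloomFilter, url: str) -> bool:
    """Return True the first time a URL is offered, False afterwards."""
    if url in seen:
        return False  # probably seen already (false positives are possible)
    seen.add(url)
    return True
```

A Bloom filter never reports a seen URL as new, but it can occasionally report a new URL as seen, so a small fraction of pages may be skipped. That trade-off is what keeps the memory cost far below storing every URL.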

Headline numbers

  • Crawl throughput needed: ~385 pages/sec
  • Storage for content: ~20 TB / month
  • URL Frontier queue depth: ~10B entries
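
The page does not state the inputs behind these figures. One set of assumptions that reproduces the first two is a target of 1 billion pages per 30-day month at an average stored size of 20 KB per page. Both inputs are guesses made for illustration, and the queue-depth figure is not derived here.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds in a 30-day month

# Assumed inputs (not given on the page).
PAGES_PER_MONTH = 1_000_000_000
AVG_PAGE_BYTES = 20_000  # 20 KB per stored page


def pages_per_second(pages_per_month: int) -> float:
    """Average crawl rate needed to hit the monthly page target."""
    return pages_per_month / SECONDS_PER_MONTH


def storage_tb_per_month(pages_per_month: int, avg_page_bytes: int) -> float:
    """Content stored per month, in decimal terabytes (1 TB = 1e12 bytes)."""
    return pages_per_month * avg_page_bytes / 1e12


rate = pages_per_second(PAGES_PER_MONTH)
storage = storage_tb_per_month(PAGES_PER_MONTH, AVG_PAGE_BYTES)
print(f"{rate:.1f} pages/sec, {storage:.0f} TB/month")
```

Under these assumptions the rate comes out just under 386 pages/sec and the storage at 20 TB per month, which lines up with the headline numbers.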

Instagram Feed

Fan-out on write vs read, ranking, CDN.

Components (10)

  • Mobile Client
  • CDN
  • API Gateway
  • Feed Service
  • Post Service
  • Kafka
  • Fan-out Worker
  • Redis
  • Postgres
  • S3

Headline numbers

  • Posts / sec (avg): ~3,000/sec
  • Feed read QPS: ~58,000/sec
  • Fan-out writes / sec: ~600,000/sec
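
The three figures above relate to each other by simple division. The sketch below uses only the numbers on this page. Reading the write ratio as "average followers per post" is an interpretation, not something the page states.

```python
# Headline numbers from the page.
POSTS_PER_SEC = 3_000
FEED_READS_PER_SEC = 58_000
FANOUT_WRITES_PER_SEC = 600_000

# Timeline writes generated per post; under fan-out on write this is
# plausibly the average follower count per post (an interpretation).
writes_per_post = FANOUT_WRITES_PER_SEC / POSTS_PER_SEC

# Feed reads per post published, a rough read-to-write ratio.
reads_per_post = FEED_READS_PER_SEC / POSTS_PER_SEC

# Total feed reads over one day at the stated average rate.
feed_reads_per_day = FEED_READS_PER_SEC * 86_400

print(writes_per_post, round(reads_per_post, 1), feed_reads_per_day)
```

Each post triggers about 200 timeline writes, and the read rate works out to roughly 5 billion feed opens per day. The write path therefore carries about ten times the traffic of the read path, which is the cost fan-out on write pays to keep reads cheap.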

Key differences

Only in Web Crawler
None.
In both
  • Client
  • Queue
  • Service
  • Cache
  • Storage
  • Database
Only in Instagram Feed
  • CDN
  • API Gateway

Flow shape

Web Crawler flows
  • Crawl a page: 8 steps
  • Fetcher worker crashes: 4 steps
Instagram Feed flows
  • Post a photo (fan-out on write): 7 steps
  • Open feed (precomputed timeline): 5 steps
  • Fan-out Worker is down: 7 steps
  • Timeline cache flushed (Redis restart): 4 steps
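
The page lists the Instagram Feed flows only by name, so their individual steps are not reproduced here. As a rough illustration of the idea behind three of them (posting with fan-out on write, opening a precomputed timeline, and recovering after the timeline cache is flushed), here is a minimal in-memory sketch. Plain dictionaries stand in for Redis and Postgres, the fan-out runs synchronously instead of through Kafka and a Fan-out Worker, and every name and the rebuild-on-miss strategy are assumptions made for illustration.

```python
from collections import defaultdict, deque
from itertools import count

TIMELINE_LIMIT = 500  # hypothetical cap on cached timeline length
_seq = count(1)       # monotonically increasing stand-in for a post timestamp

followers = defaultdict(set)         # author -> follower ids (stand-in for Postgres)
following = defaultdict(set)         # user -> authors they follow
posts_by_author = defaultdict(list)  # author -> [(seq, post_id)], oldest first
timelines = {}                       # user -> deque, newest first (stand-in for Redis)


def follow(follower: str, author: str) -> None:
    followers[author].add(follower)
    following[follower].add(author)


def publish(author: str, post_id: str) -> None:
    """Fan-out on write: push the new post into every follower's timeline."""
    entry = (next(_seq), post_id)
    posts_by_author[author].append(entry)
    for follower in followers[author]:
        timeline = timelines.setdefault(follower, deque(maxlen=TIMELINE_LIMIT))
        timeline.appendleft(entry)  # oldest entries fall off the right end


def rebuild_timeline(user: str) -> deque:
    """Cache miss path: recompute the timeline from the durable post store."""
    entries = [e for author in following[user] for e in posts_by_author[author]]
    entries.sort(reverse=True)  # newest first by sequence number
    timelines[user] = deque(entries[:TIMELINE_LIMIT], maxlen=TIMELINE_LIMIT)
    return timelines[user]


def open_feed(user: str, limit: int = 20) -> list:
    """Read the precomputed timeline, rebuilding it if the cache was flushed."""
    timeline = timelines.get(user)
    if timeline is None:
        timeline = rebuild_timeline(user)
    return [post_id for _, post_id in list(timeline)[:limit]]
```

For example, after `follow("bob", "alice")` and `publish("alice", "p1")`, calling `open_feed("bob")` reads straight from the cached timeline. Calling `timelines.clear()` simulates the Redis restart, and the next `open_feed("bob")` falls back to the rebuild path and returns the same posts.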