System Design Notes

From BitTiger


Template: SNAKE

Scenario: case/interface (Enumerate & Sort)

Necessary: constraint/hypothesis (Ask & Predict)

Application: service/algorithm (Replay & Merge)

Kilobit: data (Append & Choose)

Evolve (Analyze & Trackback)


Node vs Server (physical vs virtual)


Server (Machine, physical)

Service (Process, virtual)



Processes per machine (~100 - 300)

Threads per machine (~1000 - 3000)


Monitor system

S: Where? WhatsApp, WeChat, KuWo

A: 1000 machines x 200 processes = 200k

QPS = 1000 / 60 ≈ 16 requests per second (1,000 machines, each reporting once per minute)

K: Clients upload to one server; avoid setting up two servers and having to sync them
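
A rough sketch of the agent side under the assumptions above (each machine aggregates its local processes and uploads one report per minute to a single collector; the endpoint and metric names here are made up for illustration):

```python
# Hypothetical per-machine monitor agent: aggregate local process stats once a
# minute and upload a single report, so 1,000 machines => 1000/60 ≈ 16 QPS
# at the collector.
import json
import time
import urllib.request

COLLECTOR_URL = "http://collector.example.com/report"  # hypothetical endpoint

def collect_local_metrics():
    """Stand-in for reading CPU/memory/disk/connection stats of local processes."""
    return {"host": "machine-001", "ts": int(time.time()),
            "cpu": 0.42, "mem": 0.63, "disk": 0.10, "conns": 118}

def upload(report):
    data = json.dumps(report).encode()
    req = urllib.request.Request(COLLECTOR_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=5)

if __name__ == "__main__":
    while True:
        upload(collect_local_metrics())
        time.sleep(60)  # one report per machine per minute
```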








CPU load / Memory load / Disk load / Network (connections)

Buffer (lock)

Message queue


File (append) / HDFS / Stream / NoSQL / JSON


Collection (Save in file or HDFS) + Display (Dashboard, MySQL as buffer to improve speed)


Kafka: subscribers (persistent storage, message queue)

User -> Dashboard <-> MySQL <- Spark <-(stream)- Kafka -(subscription)-> HDFS (log)
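
A minimal sketch of the subscription leg of that pipeline, assuming the kafka-python package and a hypothetical topic name "metrics"; the local log file stands in for HDFS:

```python
# Kafka subscriber sketch: consume monitoring events and append them to a log
# file (a stand-in for the HDFS log in the pipeline above).
from kafka import KafkaConsumer  # pip install kafka-python (assumed)

consumer = KafkaConsumer("metrics",                    # hypothetical topic
                         bootstrap_servers="localhost:9092",
                         group_id="hdfs-archiver")

with open("metrics.log", "ab") as log:                 # append-only, as in the notes
    for msg in consumer:
        log.write(msg.value + b"\n")
```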



S: which device to use? (Shrink to narrow areas) sync

Big file (sync, delete, …)

Divide into multiple chunks (block number as id, or offset)

  1. Deduplicate storage (if user 1 and user 2 upload the same file, keep one copy; see the sketch after this list)
  2. Version number (e.g., keep 3 versions)
  3. Backup number (e.g., 3 copies, plus history)
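
A minimal sketch of the chunking and deduplication ideas above; the 4 MB block size and the in-memory dicts are assumptions for illustration:

```python
# Split a big file into fixed-size blocks and deduplicate by content hash
# (the hash acts as the block id; the offset list reassembles the file).
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB blocks (assumed)

def split_into_blocks(path, block_store):
    """Return a per-file index of (offset, block_hash); identical blocks are stored once."""
    index = []
    with open(path, "rb") as f:
        offset = 0
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).hexdigest()
            block_store.setdefault(digest, chunk)   # dedup: one copy per distinct block
            index.append((offset, digest))
            offset += len(chunk)
    return index

# store = {}; index = split_into_blocks("video.mp4", store)
# If user 1 and user 2 upload the same file, `store` holds each block only once.
```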



Step 1: CS Fundamentals

Computer Architecture

Operating System

Computer Network

Software Engineering


Step 2: System Design

Design by yourself


Step 3: Open Source Projects

Kafka, Spark, MongoDB, Redis


Step 4: Hands on

Code by yourself


What should I do if I don’t know the answer? (Pattern recognition: map it onto previous questions)

Calendar -> Google Sheet

Pokemon Go -> Uber, Yelp


3 + 1

Divide and conquer


Break Assumption





User System


Login/Log out


  1. Daily Active Users (15% log in => total login users => QPS; if QPS > 1000, we need to optimize)
  2. A hashed name or password has constant length, which is easy to store, change, and transfer (see the sketch after this list)
  3. State (banned by an admin, or set inactive by the user himself, as on Douban)
  4. Session lifecycle (session list)
  5. Hash index to look up a single item (userId); tree index (binary search tree, B+ tree) to search items in a range
  6. Transformer is an example of inheritance (Robot + Car; Talk + Run + Transform)
  7. Write a transaction log for rollback (e.g., the system crashes after deducting money but before updating the time) – Atomicity (all or nothing)
  8. Checks for unknown or duplicate users (foreign key, unique constraint) – Consistency
  9. Single thread in a distributed system, locks with multiple threads, CAS – Isolation (read & write)
  10. 3 copies (2 in the same data center but on different racks, 1 in a different data center) – Durability (ACID)
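
A minimal sketch of points 2 and 4 above: store a fixed-length salted hash instead of the raw password, and keep a session list keyed by a random token. The in-memory dicts and function names are illustrative only; a real system would use a slow hash such as bcrypt and a persistent store.

```python
import hashlib
import os
import secrets
import time

users = {}      # userId -> (salt, password_hash, state)
sessions = {}   # session token -> (userId, expires_at)

def register(user_id, password):
    salt = os.urandom(16)
    pw_hash = hashlib.sha256(salt + password.encode()).hexdigest()  # constant length
    users[user_id] = (salt, pw_hash, "active")

def login(user_id, password, ttl=3600):
    salt, pw_hash, state = users[user_id]
    attempt = hashlib.sha256(salt + password.encode()).hexdigest()
    if state != "active" or attempt != pw_hash:
        return None
    token = secrets.token_hex(16)                  # random, unguessable session id
    sessions[token] = (user_id, time.time() + ttl) # session lifecycle with expiry
    return token
```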


Tiny URL

Scenario: Insert, Lookup

Necessary: Daily Active Users (assume 1,000,000)

Application: Hash (base 10 with digits only, or base 62 with digits and letters)

Kilobit: average long URL is 100 bytes, short URL is an int (4 bytes), state is 4 bytes (for a time-limited service), so roughly 108 bytes per record; daily new URLs: 100K * 108 bytes ≈ 10.8 MB; yearly: 10.8 MB * 365 ≈ 4 GB

Evolve: pick a random id in (0, range) so ids are hard for a cracker to guess, and retry on conflict; cache (preload hot entries, LRU replacement); see the sketch below
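
A sketch of the base-62 idea under the assumptions above: pick a random id, retry on conflict, and encode the id with digits plus letters. The in-memory dicts stand in for the real key-value store.

```python
import random
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols
id_to_long = {}     # numeric id -> long URL
short_to_long = {}  # short token -> long URL

def encode62(num):
    if num == 0:
        return ALPHABET[0]
    out = []
    while num:
        num, rem = divmod(num, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def insert(long_url, id_range=62**6):
    while True:                               # random id resists guessing; retry on conflict
        candidate = random.randrange(1, id_range)
        if candidate not in id_to_long:
            id_to_long[candidate] = long_url
            short = encode62(candidate)
            short_to_long[short] = long_url
            return short

def lookup(short):
    return short_to_long.get(short)
```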


Parking Lot

Vehicle, Parking Spot, Parking Lot, Parking Manager, Parking Ticket
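
A minimal object-oriented sketch of the classes listed above; the attributes and method names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Vehicle:
    plate: str

@dataclass
class ParkingSpot:
    spot_id: int
    vehicle: Optional[Vehicle] = None   # knows whether it is occupied

    def is_free(self):
        return self.vehicle is None

@dataclass
class ParkingLot:
    spots: list = field(default_factory=list)

    def find_free_spot(self):
        return next((s for s in self.spots if s.is_free()), None)

@dataclass
class ParkingTicket:
    plate: str
    spot_id: int

class ParkingManager:
    """Knows the lot, does the assignment, issues tickets."""
    def __init__(self, lot: ParkingLot):
        self.lot = lot

    def park(self, vehicle: Vehicle):
        spot = self.lot.find_free_spot()
        if spot is None:
            return None                 # lot is full
        spot.vehicle = vehicle
        return ParkingTicket(vehicle.plate, spot.spot_id)
```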


Achievement System

3 key points in OO design: a class knows something, does something, and is something

Object, AchieveType, Achieve, Achieve Manager, Player, World


Blackjack with AI




Typeahead (Search Autocomplete)

Scenario: Ex. ‘d’ -> ‘dog’, ‘dress’

Necessaries: Average latency < 100ms, 95% < 50ms, challenge (every millisecond counts)

Application: algorithm: a binary search tree over the word list (m = average word length, n = number of words; time O(log n + k), space O(n * (m * char + 2 * pointer))); a trie; a ternary search tree; or a database 'LIKE' query. Architecture: Browser (local cache) -> Aggregator (cache, content details) -> Global/Personalized ranking. A trie sketch follows.
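
A minimal sketch of the trie option: insert words, then return up to k completions for a prefix. A production system would precompute and cache the top-k list per node and rank globally or per user, as described above.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def suggest(self, prefix, k=10):
        node = self.root
        for ch in prefix:                    # O(len(prefix)) to reach the subtree
            if ch not in node.children:
                return []
            node = node.children[ch]
        results, stack = [], [(node, prefix)]
        while stack and len(results) < k:    # collect up to k words under the prefix
            cur, word = stack.pop()
            if cur.is_word:
                results.append(word)
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        return results

# t = Trie(); t.insert("dog"); t.insert("dress"); t.suggest("d")  # ['dress', 'dog']
```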


Facebook: 1. Search across different types: friends, events, books, movies 2. Levels of cache (e.g., 100 entries for the 1st level, 1000 for the 2nd)

LinkedIn: 1. Check ‘link’ against the inverted index (Searcher) 2. Check ‘linked’ against a Bloom filter (Searcher) 3. Check ‘linked’ against the forward index (Searcher) 4. Assign scores to the docs (Ranker) 5. Merge the results from the different scores (Merger)

News Feed (Twitter Timeline)

Scenario: 3 parts: a feed list, a timeline, and a social graph (one direction)

Necessaries: read timeline: 300K QPS; write tweet: 5K QPS; notify 1 million followers within 1s; concurrent users: 15M; daily tweets: 400M


PUSH model (fan-out on write; the Lady Gaga storm, 49.7M followers)

PULL model (fan-out on read; the “Matrix” extreme of following 1B users)

Set a threshold: push for users with < 90K followers, pull for users with > 100K followers (a mixed model)

How to choose between pull and push: understand the specific scenario; balance latency, computation, and storage; an O(n) fan-out is risky in a distributed system since a single slow node can cost a lot of time; simplicity argues for focusing on one approach, since implementing and improving a single method is easier (e.g., FB uses pull for the feed list, Instagram uses push via fan-out)

How to speed up: keep timelines in memory (save space by keeping only an active-user list), and watch out for cache misses

Interface: Get(userIDList, k, endTime, beginTime)
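
A sketch of the pull model's read path for that interface: merge the most recent tweets of the followed users, each per-user feed already sorted newest-first. The in-memory `feeds` layout is an assumption.

```python
import heapq
from itertools import islice

# userId -> list of (timestamp, tweetId), kept sorted newest-first
feeds = {
    1: [(105, "t5"), (100, "t1")],
    2: [(103, "t3"), (99, "t0")],
}

def get(user_id_list, k, end_time, begin_time):
    """Pull model: merge followees' feed lists and return the k newest tweet ids."""
    per_user = (
        ((ts, tid) for ts, tid in feeds.get(uid, []) if begin_time <= ts <= end_time)
        for uid in user_id_list
    )
    merged = heapq.merge(*per_user, reverse=True)   # lazy k-way merge, newest first
    return [tid for _, tid in islice(merged, k)]

# get([1, 2], k=3, end_time=200, begin_time=0)  ->  ['t5', 't3', 't1']
```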

Indexer -> Blender


Rate Limiter

Limit: QPS < 5

Necessaries: at most 5 requests per second, in any one-second window

Application: time buckets (fixed windows), or a list of recent request timestamps (sliding window)

Evolve: batched queries (tons of requests arriving at once, e.g., 10^9), locks (multi-threading), quotas, token bucket (O(1) space; see the sketch below)
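
A minimal sketch of the token-bucket variant (O(1) space): tokens refill continuously at the target rate up to a fixed capacity, and a request is allowed only if a token is available. In a multi-threaded service, `allow` would be guarded by a lock, as noted above.

```python
import time

class TokenBucket:
    def __init__(self, rate=5, capacity=5):
        self.rate = rate                  # tokens added per second (the QPS limit)
        self.capacity = capacity          # burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(rate=5, capacity=5)
# limiter.allow() -> True for the first 5 calls in a second, then False until refill
```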


WhatsApp

Design the data structures: User table (s), Friend table (s), Channel table (s), Message table (n), where (s) = SQL and (n) = NoSQL as below

Storage: SQL (s) vs NoSQL (n). SQL – relationships, edits, corrections, transactions, combinational state. NoSQL – K/V pairs, append-only writes, error tolerance, performance, sequential records

Memory database (Redis, SQLite)

Architecture: Socket -> Broadcaster -> Router -> User/Channel Manager

Batch messages (one header and one tail for a whole batch of messages), compress the content

Summary: data structure, storage, connection, traffic design

Delay status check (5s)

Data-pull: build a user’s timeline by pulling data from the users he follows.

Data-push: build a user’s timeline by having the users he follows push their new data into it.

Link-pull: the user pulls notifications from the server every k seconds.

Link-push: the server pushes a notification to the user over a WebSocket as soon as it appears.

Combine link pull and push: push only the number of new notifications, pull the notification details (see the sketch below)
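
A sketch of that combination, with the WebSocket push and the HTTP pull replaced by plain function calls; the data layout (append-only message list plus a per-user read cursor) is an assumption.

```python
messages = {}      # channel_id -> append-only list of (seq, sender, text)
read_cursor = {}   # (user_id, channel_id) -> last seq the client has pulled

def on_new_message(channel_id, sender, text):
    """Server side: append the message, return the tiny payload to push (link-push)."""
    seq = len(messages.setdefault(channel_id, [])) + 1
    messages[channel_id].append((seq, sender, text))
    return {"channel": channel_id, "new": 1}       # push only a count, not the content

def pull_details(user_id, channel_id):
    """Client side: on receiving the pushed count, pull the full details (link-pull)."""
    cursor = read_cursor.get((user_id, channel_id), 0)
    new = [m for m in messages.get(channel_id, []) if m[0] > cursor]
    if new:
        read_cursor[(user_id, channel_id)] = new[-1][0]
    return new
```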

Summary: Chat/Feed, Push/Pull, Data/Link




Twitter Scaling

Divide and Conquer

Challenge: MySQL is hard to scale (joins are needed to get data from different tables), a small change requires a full redeploy, performance is poor, and the architecture is messy

Solution: divide storage for tweets, users, timelines (Redis), and the social graph

Separate Routing (TFE), Presentation (Web, API, Monorail), Logic (Tweets, Users, Timelines), Storage

Tweet -> Write API -> Fanout -> Timeline Cache (Redis) -> Timeline Service

Tweet -> Write API -> Ingester -> Earlybird (Search Index) -> Blender (for Search)

Tweet -> Write API -> HTTP/Mobile Push Notification

Open source: Twitter Server, Finagle (communication for services), Service as a Function (call a service)

Separate ‘What to do’ and ‘How to do’ (Functional Design)
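
A sketch of “Service as a Function” in that Finagle spirit (Finagle itself is Scala with futures; this Python analogy just shows the shape): a service is a function from request to response, and filters wrap services to add the “how” without touching the “what”.

```python
from typing import Callable, Dict

Request = Dict[str, str]
Response = Dict[str, str]
Service = Callable[[Request], Response]

def tweet_service(req: Request) -> Response:        # "what to do"
    return {"status": "200", "body": "tweet " + req["id"]}

def logging_filter(service: Service) -> Service:    # "how to do": a composable wrapper
    def wrapped(req: Request) -> Response:
        print("calling with", req)
        resp = service(req)
        print("got status", resp["status"])
        return resp
    return wrapped

app: Service = logging_filter(tweet_service)        # filters compose like functions
# app({"id": "42"}) -> {"status": "200", "body": "tweet 42"}
```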

Architecture: Service (HTTP/Aggregator/Timeline Service/Tweet Store) -> Twitter Server -> Finagle -> JVM -> Operating System -> Hardware


Twitter Heron: Streaming at Scale

(From Storm to Heron)

Storm: real time data processing in milliseconds

Heron: Reduce responsibility for ZooKeeper, separate resources, optimize configuration, control traffic

Improve CPU usage, throughput, delay


Twitter Search Engine Update History

Search Query: Raw Query -> Serialized Query -> Rewritten Query -> Lucene Query

Blender (emphasize the latest tweets, keep the top 2% most related tweets, let other teams handle the implementation)


Uber Architecture

Distributed Web Architectures (from 0 to 1)

Message processing: RequestManager (for a hot update, i.e., updating in real time, don’t shut down the service; just close the port, e.g., 9000)

Hot backup: master/slave

MongoDB Access:

Peak = 5x the average (or 10x within 3 months); e.g., plan for 100 QPS as a future-oriented service (a rapid-growth business)

Snowball/avalanche effect (a failure in the Python API that processes business logic cascades into the Node.js parts)

JS is good stuff (Atwood’s Law: any application that can be written in JavaScript will eventually be written in JavaScript.)


Scaling Uber’s Real-time Market Platform (from 1 to 1K)

Challenge: dynamic demand and supply

TChannel is a good helper for remote calls

Google S2 is a good helper for map service

Ringpop is a good helper for distributed storage and load balancing

Solution for a broken data center: store a backup of the data on the app side (mobile phone, tablet, etc.), so if data center A is damaged, data center B can read the info back from the app


Uber’s Ringpop and the Fight for Flap Dampening (Uber’s internal Ringpop library)

Dispatch service:

Requirement: each driver’s position is updated every 4 seconds (about 1 million writes/sec); riders request nearby cars (about 10K reads/sec)

Challenge: a single machine cannot handle the service; a centralized architecture breaks when the center breaks; how to implement a center-less architecture

Summary: arrange the initial node list in a ring so every node can look up the membership info; ask k neighbors to confirm a suspected failure; hash each key to 2 copies to ensure no data loss; use the actor model to process messages (a mailbox plus a single thread). See the ring sketch below.
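
A minimal sketch of the ring idea behind Ringpop: nodes hash onto a circle, a key is owned by the first node clockwise from its hash, and a second copy goes to the next node so a single failure loses no data. The hash choice and names are assumptions.

```python
import bisect
import hashlib

def h(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)   # positions on the circle
        self.hashes = [pos for pos, _ in self.ring]

    def owners(self, key, copies=2):
        """First node clockwise from the key's hash, plus the next node as backup."""
        i = bisect.bisect(self.hashes, h(key)) % len(self.ring)
        return [self.ring[(i + j) % len(self.ring)][1] for j in range(copies)]

ring = Ring(["node-a", "node-b", "node-c"])
# ring.owners("driver:1234") -> e.g. ['node-b', 'node-c'] (primary + second copy)
```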


Google Geometry

Geometry on the Sphere: Google’s S2 Library

From 3D to 2D: project the sphere onto 6 squares (as if shining a light from the earth’s core out to the faces of a surrounding cube)

From 2D to 1D: enumerate all cells along a Hilbert curve (see the sketch below)
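
A sketch of that 2D-to-1D step using the classic Hilbert-curve index for an n x n grid (n a power of two), following the well-known iterative algorithm; nearby cells tend to get nearby indexes, so a range scan on the index finds geographically close points.

```python
def hilbert_xy_to_d(n, x, y):
    """Map cell (x, y) in an n-by-n grid (n a power of 2) to its Hilbert index d."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                     # rotate/flip the quadrant so the pattern repeats
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# hilbert_xy_to_d(2, 0, 0) == 0, (0, 1) == 1, (1, 1) == 2, (1, 0) == 3
```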



SQL (durability for logs and backup, consistency for transaction, challenge: Logic View vs Physical View)

Challenge: data keeps getting bigger (scale up with a larger, stronger, more expensive server vs scale out with more, smaller, cheaper servers)

NoSQL: schemaless, e.g., Twitter adding tags at any time (2009)

Cluster Database

K/V: Redis, Riak

Document: MongoDB, CouchDB (put related data together in one document)

Column-family: HBase, Cassandra (row key and column together form the key)


Graph: Neo4J


Challenge: data can be clustered by only one key; what if you need to search by other keys?

Solution: Map-Reduce


Consistency (logical consistency, replica/copy consistency)

CAP (Consistency, Availability, Partition tolerance)


To be or not to be (Be yourself)



Micro vs Macro (Fat)

Subset of SOA (Service-oriented architecture)

Component (independent replacement, update)

Component as library (function call or HTTP, deployed together) vs component as service (independent). Teams split by layer (UI, Server, DBA) or by business capability (Orders, Shipping, Catalog).


Microservice Trend:

Component as service; teams organized around business demand; smart endpoints and dumb pipes (weaken the pipe, strengthen the terminal); decentralized data management (relational, document-based, and K/V databases, Kafka); automated deployment (Continuous Delivery, Phoenix Servers, Blue/Green Deployment, Docker); design for failure (0.99^10 ≈ 0.90, 0.99^100 ≈ 0.37); evolutionary design


Macroservice (easy to develop, consistent data, easy to refactor, better in the early stage) vs Microservice (continuous deployment, data availability that tolerates errors/loss, easy to maintain, cross-platform, better in the late stage)



A microservice architecture is a group of independent components, split by demand, communicating via HTTP

Believing in microservices without looking at the specific scenario is a source of pain

Aim high and sweat the details (Buddhism can cure the mind, but it is not a solution for everything)



The Google File System

Stores huge files

A paper is written to solve a specific problem; understand it, don’t memorize it

I/O is core


BigTable (Data Model): A Distributed Storage System for Structured Data

A key-value store built on top of huge files


MapReduce (Algorithm): Simplified Data Processing on Large Clusters

Map (disassemble) Reduce (assemble)


MapReduce is Divide&Conquer
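
A toy word count in that map (disassemble) / reduce (assemble) shape; in a real cluster the map calls run on many workers and the shuffle moves data between them, but the structure is the same.

```python
from collections import defaultdict

def map_phase(doc):
    return [(word, 1) for word in doc.split()]        # disassemble into (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:                          # group all values by key
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}   # assemble results

docs = ["map reduce map", "divide and conquer"]
pairs = [p for doc in docs for p in map_phase(doc)]
print(reduce_phase(shuffle(pairs)))   # {'map': 2, 'reduce': 1, 'divide': 1, ...}
```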



Git Internal

User System



