The core of technology industry forecasting is judging when large-scale commercial adoption will arrive, based on the industry's foundations.
Super cycles are typical characteristics of technological revolutions, driven by a core technology and a series of supporting technologies.
The artificial intelligence revolution has already begun; it will combine with more supporting technologies and spread to more fields, but the timing will vary by area.
Areas worth paying attention to in the near term: intelligent services, robotics, MR+AI.
Consistency: all clients see the same data at the same time no matter what node they connect to
Availability: any client which requests data gets a response even if some of the nodes are down
Partition Tolerance: the system continues to operate despite network partitions (a communication break between two nodes)
Goal and Solution
Ability to store big data: use consistent hashing to spread load across servers
High availability reads: data replication, multi-data center setup
High availability writes: versioning and conflict resolution with vector clocks (serverId, version)
Dataset partition: consistent hashing
Incremental scalability: consistent hashing
Heterogeneity: consistent hashing
Tunable consistency: Quorum consensus (R + W > N)
Failure detection: heartbeat sent to other nodes (mark down if no signal after the threshold time)
Handling temporary failures: sloppy quorum and hinted handoff (relax the strict quorum requirement: use the first W healthy nodes for writes and the first R healthy nodes for reads, ignore offline nodes, and hand the data back once the down node comes up)
Handling permanent failures: Merkle tree (every node holds a hash of its children; compare hashes from the root down to locate the buckets that differ)
Handling data center outage: cross-data center replication
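The consistent-hashing pieces above (dataset partition, incremental scalability, heterogeneity) can be sketched with a small hash ring; the `HashRing` class, its virtual-node count, and the use of MD5 are illustrative assumptions, not from the source.

```python
# A minimal consistent-hashing ring sketch. Keys and nodes are hashed onto the
# same ring; a key is stored on the first node clockwise from its hash.
# Virtual nodes smooth out the load distribution across servers.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=3):
        self.ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}#{i}")
                bisect.insort(self.ring, (h, node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key):
        h = self._hash(key)
        # First virtual node clockwise from the key's hash (wrap around the ring).
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["s0", "s1", "s2"])
node = ring.lookup("user:42")  # deterministic: the same key always maps to the same node
assert node in {"s0", "s1", "s2"}
```

Adding or removing a server only remaps the keys between the affected ring positions, which is why the notes use consistent hashing for incremental scalability.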
Chapter 7 – Design a unique ID generator in a distributed system
Multi master replication: use DB auto_increment feature, increase by k which is the number of DB servers in use (i.e. DB s1 has ID 1, 3, 5… DB s2 has ID 2, 4, 6…)
UUID: universally unique identifier (128 bit, number and alphabetic)
Ticket Server: centralized auto_increment in a single DB server (Ticket Server)
Twitter Snowflake ID Generator: Sign bit (1 bit) + Timestamp (41) + Datacenter ID (5) + Machine ID (5) + Sequence number (12 bits, increment by 1, reset to 0 every millisecond)
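The 64-bit layout above can be sketched directly with bit shifts; the function name and the epoch constant are illustrative assumptions (Twitter used a custom epoch in Nov 2010).

```python
# Snowflake layout sketch:
# 1 sign bit (always 0) | 41-bit timestamp (ms) | 5-bit datacenter | 5-bit machine | 12-bit sequence
import time

EPOCH_MS = 1288834974657  # custom epoch; any fixed past timestamp works

def snowflake_id(datacenter_id, machine_id, sequence, ts_ms=None):
    ts_ms = int(time.time() * 1000) if ts_ms is None else ts_ms
    assert 0 <= datacenter_id < 32 and 0 <= machine_id < 32 and 0 <= sequence < 4096
    return ((ts_ms - EPOCH_MS) << 22) | (datacenter_id << 17) | (machine_id << 12) | sequence

# IDs within one millisecond differ by the sequence; later timestamps sort higher.
a = snowflake_id(1, 2, 0, ts_ms=EPOCH_MS + 1000)
b = snowflake_id(1, 2, 1, ts_ms=EPOCH_MS + 1000)
assert b == a + 1
assert a < snowflake_id(1, 2, 0, ts_ms=EPOCH_MS + 2000)
```

Because the timestamp occupies the high bits, IDs are roughly sortable by creation time, which neither UUIDs nor multi-master auto_increment guarantee.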
UUID
Pros: generating a UUID is simple; easy to scale (each web server generates the IDs it consumes)
Cons: 128 bits long; IDs don't go up with time; IDs could be non-numeric
Ticket Server
Pros: numeric IDs; easy to implement, works for small to medium applications
Cons: single point of failure (the system is down if the ticket server goes down)
Clock synchronization (Network Time Protocol)
Section length tuning (fewer sequence-number bits and more timestamp bits work well for low-concurrency, long-term applications)
High availability
Chapter 8 – Design a URL Shortener
HashMap (memory is limited)
Hash function (CRC32 / MD5 / SHA-1) + collision resolution (append a new predefined string, bloom filter)
Base62 Hash
Hash + collision resolution
Fixed short URL length
Does not need a unique ID generator
Collision is possible and needs to be resolved
Not possible to figure out the next available short URL as it doesn't depend on the ID
Base 62 conversion
Short URL length is not fixed; it goes up with the ID
Depends on a unique ID generator
No collision as the ID is unique
Easy to figure out the next available short URL if the ID increments by 1 for a new entry (this can be a security concern)
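The base-62 conversion takes only a few lines; a sketch using the common 0-9a-zA-Z alphabet ordering (under which ID 11157 maps to "2TX"):

```python
# Map a unique numeric ID to a short string and back using base 62.
CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(num):
    if num == 0:
        return CHARS[0]
    digits = []
    while num:
        num, r = divmod(num, 62)
        digits.append(CHARS[r])
    return "".join(reversed(digits))

def decode(short):
    num = 0
    for c in short:
        num = num * 62 + CHARS.index(c)
    return num

assert encode(11157) == "2TX"
assert decode("2TX") == 11157
```

A 7-character base-62 string covers 62^7 ≈ 3.5 trillion IDs, which is why short URLs stay short even at large scale.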
Chapter 9 – Design a web crawler
URL Frontier: store URLs to be downloaded (politeness – frequency / priority / freshness – update)
HTML Downloader: download web pages from the internet using the HTTP protocol (Robots.txt)
DNS Resolver: translate URL to IP address
Content Parser: parse and validate web pages (malformed pages cause problems and waste storage)
Content Seen?: eliminate data redundancy and shorten processing time (compare hashes rather than character by character)
Content Storage: store HTML content (most in disk, popular ones in memory) – data type, size, access frequency, life span
URL Extractor: parse and extract links from HTML pages (convert relative path to absolute one)
URL Filter: exclude certain content types, file extensions, error links, and URLs in blacklisted sites
URL Seen?: keep track of URLs that are visited before or already in the Frontier, avoid adding the same URL (hash table / bloom filter)
URL Storage: store already visited URLs
Web crawler workflow
DFS vs BFS
Performance optimization (distributed crawl, cache DNS resolver, geo-locality, short timeout)
Robustness (consistent hashing to add / remove servers, save crawl states and data, exception handling, data validation)
Extensibility (modules to download HTML web pages, image, audio, video, PDF, etc / detect copyright and trademark infringements)
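The components above can be wired into a single-threaded BFS crawl loop; `download`, `parse_links`, and `passes_filter` are hypothetical stand-ins for the real HTML Downloader, URL Extractor, and URL Filter.

```python
from collections import deque

def crawl(seed_urls, download, parse_links, passes_filter, max_pages=100):
    frontier = deque(seed_urls)  # URL Frontier (FIFO queue => BFS)
    seen = set(seed_urls)        # "URL Seen?" component
    stored = []                  # URL Storage (already visited URLs)
    while frontier and len(stored) < max_pages:
        url = frontier.popleft()
        html = download(url)     # HTML Downloader (+ DNS resolution)
        if html is None:         # Content Parser rejected a malformed page
            continue
        stored.append(url)
        for link in parse_links(html):  # URL Extractor
            if passes_filter(link) and link not in seen:  # URL Filter + URL Seen?
                seen.add(link)
                frontier.append(link)
    return stored

# Tiny in-memory "web" to show the flow (a page's content is just its own URL here).
graph = {"a": ["b", "c"], "b": ["c"], "c": []}
pages = crawl(["a"],
              download=lambda url: url,
              parse_links=lambda html: graph[html],
              passes_filter=lambda url: True)
assert pages == ["a", "b", "c"]  # each page visited once, in BFS order
```

A production crawler replaces the simple deque with a politeness- and priority-aware frontier, but the seen-set plus queue structure is the same.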
Feed publishing API – POST /v1/feed (content, user_id, auth_token)
Newsfeed retrieval API – GET /v1/feed (user_id, auth_token)
Web servers / Fanout service
Fanout on write (news feed is pre-computed during write time)
Pros: the news feed is generated in real time and can be pushed immediately; fetching the news feed is fast (pre-computed during write time)
Cons: hotkey problem (if a user has many friends, fetching the friend list and generating feeds are slow); for inactive users or those who rarely log in, pre-computing news feeds wastes computing resources
Fanout on read (news feed is generated during read time, on-demand model)
Pros: works better for inactive users or those who rarely log in (no wasted computing resources); no hotkey problem (data is not pushed to friends)
Cons: fetching the news feed is slow (not pre-computed)
Hybrid approach (push for most users, pull for celebrities or users who have many friends / followers)
Fanout service
1 Fetch friend IDs from the graph database (Neo4j)
2 Get friends info from the user cache / DB
3 Send friends list and new post ID to the message queue
4 Fanout workers fetch data from the message queue and store news feed data in the news feed cache
5 Store <post_id, user_id> in news feed cache
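Steps 1-5 above can be sketched as a fanout-on-write worker; in-memory dicts stand in for the graph database, message queue, and news feed cache, and all names are assumptions.

```python
from collections import defaultdict

friends = {"alice": ["bob", "carol"]}  # graph database (step 1: fetch friend IDs)
feed_cache = defaultdict(list)         # news feed cache: user_id -> [post_id, ...]

def fanout_on_write(post_id, author):
    # Steps 4-5: a worker stores the new post ID in each friend's
    # pre-computed feed, newest first.
    for friend in friends.get(author, []):
        feed_cache[friend].insert(0, post_id)

fanout_on_write("post_1", "alice")
fanout_on_write("post_2", "alice")
assert feed_cache["bob"] == ["post_2", "post_1"]  # bob's feed is ready before he asks
```

The hotkey problem is visible here: if `friends["alice"]` held millions of entries, every post would trigger millions of cache writes, which is why celebrities are handled with the pull model instead.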
Newsfeed retrieval
1 A user sends a request to retrieve its news feed (/v1/feed)
2 The load balancer redistributes requests to web servers
3 Web servers call the news feed service to fetch news feeds
4 News feed service gets a list of post IDs from the news feed cache
5 The news feed service fetches the complete user and post objects from caches (userName, profile picture, post content, images, etc)
6 The fully hydrated news feed is returned in JSON format back to the client to render
Cache architecture
News Feed (news feed)
Content (hot cache, normal)
Social Graph (follower, following)
Action (liked, replied, others)
Counters (like, reply, others)
Scaling DB (Vertical scaling vs Horizontal scaling, SQL vs NoSQL, Master-slave replication, Read replicas, Consistency models, DB sharding)
Send => chat service (store and relay message) => receiver
Polling: client periodically asks the server if there are messages available (costly, inefficient)
Long polling: the client holds the connection open until new messages are actually available or the timeout threshold is reached. Once a client receives new messages, it immediately sends another request, restarting the process. (The sender and receiver may not connect to the same chat server in a stateless architecture, a server has no good way to tell if a client is disconnected, and it is inefficient if a user doesn't chat much)
WebSocket: the most common solution for sending asynchronous updates from server to client. (The connection is initiated by the client, bi-directional and persistent. It starts its life as an HTTP connection and can be "upgraded" via a well-defined handshake to a WebSocket connection)
Stateless: service discovery (Apache Zookeeper) / Authentication / Group management / User profile
Stateful: chat service (Web Socket)
Third party: push notification
Scalability
Chat servers: message sending / receiving
Presence servers: manage online / offline status
API servers: user login, sign up, change profile, etc
Notification servers: send push notifications
Key-value store: store chat history (easy horizontal scaling, low latency, handle long tail data, HBase / Cassandra)
Multi-language support: store Unicode in trie nodes (Unicode is an encoding standard that covers the characters of all the world's writing systems, modern and ancient)
Queries in one country are different from others: build tries for different countries, adopt CDNs
Support trending (real-time) search queries: reduce the working data set by sharding, change the ranking model to assign more weight to recent search queries, and handle data that arrives as streams (we don't have access to all the data at once: Hadoop MapReduce, Apache Spark Streaming / Storm / Kafka)
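The trie-based autocomplete described above, with Unicode stored per node, can be sketched as follows (class and method names are assumptions; a production system would pre-compute the top-k list at each node instead of running a DFS per query):

```python
# A minimal trie for search autocomplete. Children are keyed by Unicode
# character, so any writing system works without special handling.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.freq = 0  # query frequency, used to rank suggestions

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query, freq=1):
        node = self.root
        for ch in query:  # iterates per Unicode character
            node = node.children.setdefault(ch, TrieNode())
        node.freq += freq

    def top_k(self, prefix, k=5):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        def dfs(n, path):
            if n.freq:
                results.append((n.freq, prefix + path))
            for ch, child in n.children.items():
                dfs(child, path + ch)
        dfs(node, "")
        return [q for _, q in sorted(results, reverse=True)[:k]]

t = Trie()
t.insert("北京", 10)
t.insert("北海道", 3)
assert t.top_k("北") == ["北京", "北海道"]  # ranked by frequency
```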
Chapter 14 – Design YouTube
Requirements: upload video fast, smooth video streaming, change video quality, low infrastructure cost, high availability, scalability, reliability, clients supported (mobile apps, web browser, smart TV)
High level design
Client: computer, mobile phone, smart TV
CDN: store videos for fast streaming (original file is stored in blob file storage like Amazon S3)
API Servers: feed recommendation, generate video upload URL, update metadata DB and cache, signup
Video uploading flow
User: watch a video on devices such as computer, mobile phone, or smart TV
Load balancer: evenly distributes requests among API servers
API servers: all user requests go through API servers except video streaming
Metadata DB: store video metadata, sharded and replicated to meet performance and high availability
Metadata Cache: cache video metadata and user objects
Original storage: store original videos via blob storage (Binary Large Object) – a collection of binary data stored as a single entity in a database management system
Transcoding servers: video encoding, convert a video format to other formats (MPEG, HLS, etc) to provide best video streams for different devices and bandwidth capabilities
Transcoded storage: blob storage storing transcoded video files
CDN: cache videos for fast streaming
Completion queue: message queue storing info about video transcoding completion events
Completion handler: a list of workers that pull event data from the completion queue and update metadata cache and DB
Flow a: upload the actual video
1 videos are uploaded to the original storage
2 Transcoding servers fetch videos from the original storage and start transcoding
3 Once transcoding is complete, the following two steps are executed in parallel:
3a – Transcoded videos are sent to transcoded storage
3b – Transcoding completion events are queued in the completion queue
3a1 – Transcoded videos are distributed to CDN
3b1 – Completion handler contains a bunch of workers that continuously pull event data from the queue
3b1a and 3b1b – Completion handler updates the metadata DB and cache when video transcoding is done
4 API servers inform the client that the video is successfully uploaded and is ready for streaming
Flow b: update the metadata
While a file is being uploaded to the original storage, the client in parallel sends a request to update the video metadata. The request contains video metadata including file name, size, format, etc. API servers update the metadata cache and DB.
Video streaming flow
Popular streaming protocols:
MPEG-DASH: Moving Picture Experts Group, Dynamic Adaptive Streaming over HTTP
Apple HLS: HTTP Live Streaming
Microsoft Smooth Streaming
Adobe HTTP Dynamic Streaming (HDS)
Design deep dive
Video transcoding
Raw video consumes large amount of storage space
Many devices and browsers only support certain types of video formats
Ensure users watch high-quality video while maintaining smooth playback (based on bandwidth)
Network condition can change (especially on mobile devices) – switching video quality automatically or manually based on network condition
Encoding formats
Container: contain video file, audio and metadata (.avi, .mov or .mp4)
Codecs: compression and decompression algorithms to reduce size (H.264, VP9, HEVC)
Directed acyclic graph (DAG) model
Tasks: Inspection / Video encoding (360p/480p/720p/1080p/4k.mp4) / Thumbnail / Watermark
Video transcoding architecture
Preprocessor: video splitting (chunks), GOP alignments for old clients, DAG generation, cache data
DAG scheduler: split a DAG graph into stages of tasks and put them in the task queue
Task workers: run the tasks defined in DAG (watermark, encoder, thumbnail, merger)
Temporary storage: metadata in memory, video / audio in blob storage
Encoded video: final output of the encoding pipeline (funny_720p.mp4)
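The DAG scheduler above splits the graph into stages of tasks; a minimal sketch using Kahn's algorithm, where each stage contains the tasks whose dependencies are all complete (the task names mirror the example pipeline and are illustrative):

```python
def dag_stages(deps):
    """deps maps each task to the set of tasks it depends on.
    Returns a list of stages; tasks within a stage can run in parallel."""
    deps = {t: set(d) for t, d in deps.items()}
    stages = []
    while deps:
        ready = {t for t, d in deps.items() if not d}  # no unmet dependencies
        if not ready:
            raise ValueError("cycle detected in DAG")
        stages.append(sorted(ready))
        # Remove finished tasks from the graph and from remaining dependency sets.
        deps = {t: d - ready for t, d in deps.items() if t not in ready}
    return stages

pipeline = {
    "inspection": set(),
    "video_encoding": {"inspection"},
    "thumbnail": {"inspection"},
    "watermark": {"video_encoding"},
    "merge": {"watermark", "thumbnail"},
}
assert dag_stages(pipeline) == [
    ["inspection"],
    ["thumbnail", "video_encoding"],
    ["watermark"],
    ["merge"],
]
```

Each stage's tasks would be pushed to the task queue together, so task workers can pick them up in parallel.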
System optimizations
Speed optimization: parallelize video uploading (split video into smaller chunks by GOP alignment)
Speed optimization: place upload centers close to users
Speed optimization: parallelism everywhere
Safety optimization: pre-signed upload URL (access permission to the object identified in URL)
Safety optimization: protect your videos (Digital rights management, AES encryption, watermark)
Cost-saving optimization: only serve most popular videos from CDN (others from high capacity storage video servers), no need to store many encoded versions for less popular content, short videos can be encoded on-demand. Some videos are only popular in certain regions, no need to distribute to other areas. Build your own CDN like Netflix and partner with Internet Service Provider (ISP)
Error handling (recoverable vs non-recoverable – malformed video format)
Upload error: retry a few times
Split video error: if the client cannot split the video (e.g. by GOP alignment), the entire video is passed to the server
Transcoding error: retry
Preprocessor error: regenerate DAG diagram
DAG scheduler error: reschedule a task
Resource manager queue down: use a replica
Task workers down: retry the task on a new worker
API server down: direct requests to a different server
Metadata cache down: access other nodes to fetch data (bring up a new server to replace the dead one)
Metadata DB down: promote one slave to act as the new master if master is down, use another slave and bring up another to replace if slave down
Wrap up
Scale the API tier: keep stateless for API servers, easy to scale horizontally
Scale the DB: DB replication and sharding
Live streaming: different streaming protocols (lower latency needed), lower requirement for parallelism, different error handling
Video takedowns: videos that violate copyrights, contain pornography, or are otherwise illegal (report => remove)
Chapter 15 – Design a Google Drive
Requirements: add files, download files, sync files across multi devices, see file versions, share files, send notification (when a file is edited, deleted, shared)
Reliability / Fast sync speed / Bandwidth usage / Scalability / High availability
High-level design
A web server to upload and download files
A database to keep track of metadata (user data, login info, files info, etc)
A storage system to store files
1 upload a file to Google Drive (simple / resumable upload)
User: A user uses the application either through a browser or mobile app
Block servers: Block servers upload blocks to cloud storage. Block storage, referred to as block-level storage, is a technology to store data files on cloud-based environments. A file can be split into several blocks, each with a unique hash value, stored in our metadata database. Each block is treated as an independent object and stored in our storage system (S3). To reconstruct a file, blocks are joined in a particular order. As for the block size, we use Dropbox as a reference: it sets the maximal size of a block to 4MB.
Cloud storage: A file is split into smaller blocks and stored in cloud storage
Cold storage: a computer system designed for storing inactive data, i.e. files that have not been accessed for a long time
Load balancer: a load balancer evenly distributes requests among API servers.
API servers: responsible for almost everything other than the uploading flow. (authentication, managing user profile, updating file metadata, etc)
Metadata database: store metadata of users, files, blocks, versions, etc.
Metadata cache: some of the metadata are cached for fast retrieval
Notification service: a publisher / subscriber system that allows data to be transferred from notification service to clients as certain events happen
Offline backup queue: if a client is offline and can’t pull the latest file changes, the offline backup queue stores the info so changes will be synced when the client is online
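The block-server behaviour above can be sketched as follows: split a file into fixed-size blocks (4 MB in Dropbox's case; tiny blocks here for the demo) and hash each block, so unchanged blocks can be de-duplicated and only modified blocks re-uploaded.

```python
import hashlib

def split_into_blocks(data, block_size=4 * 1024 * 1024):
    """Split a byte string into fixed-size blocks with a hash per block.
    The hashes go to the metadata DB; the blocks go to cloud storage."""
    blocks = []
    for i in range(0, len(data), block_size):
        chunk = data[i:i + block_size]
        blocks.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return blocks

v1 = split_into_blocks(b"aaaa" + b"bbbb", block_size=4)
v2 = split_into_blocks(b"aaaa" + b"cccc", block_size=4)
# Only the second block changed between versions, so only it is re-uploaded.
assert v1[0][0] == v2[0][0]
assert v1[1][0] != v2[1][0]
```

This is also what makes the "save storage space" section below work: identical blocks across files or versions hash to the same value and are stored once.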
File version: id, file_id, device_id, version_number, last modified
Add file metadata
1 Client 1 sends a request to add the metadata of the new file
2 Store the new file metadata in a metadata DB and change the file upload status to “pending”
3 Notify the notification service that a new file is being added
4 The Notification service notifies relevant clients (client 2) that a file is being uploaded
Upload files to cloud storage
2.1 Client 1 uploads the content of file to block servers
2.2 Block servers chunk the files into blocks, compress, encrypt the blocks, and upload them to cloud storage
2.3 Once the file is uploaded, cloud storage triggers upload completion callback. The request is sent to API servers
2.4 File status changed to “uploaded” in Metadata DB
2.5 Notify the notification service that a file status is changed to “uploaded”
2.6 The notification service notifies relevant clients (client 2) that a file is fully uploaded
Download flow
1 Notification service informs client 2 that a file is changed somewhere else
2 Once client 2 knows that new updates are available, it sends requests to fetch metadata
3 API servers call metadata DB to fetch metadata of the changes
4 Metadata is returned to the API servers
5 Client 2 gets the metadata
6 Once the client receives the metadata, it sends requests to block servers to download blocks
7 Block servers first download blocks from cloud storage
8 Cloud storage returns blocks to the block servers
9 Client 2 downloads all the new blocks to reconstruct the file
Notification service
Dropbox uses long polling (WebSocket is suited for real-time bi-directional communication such as chat app)
Save storage space
De-duplicate data blocks / Adopt an intelligent data backup strategy (set a limit for number of versions to store, keep valuable versions only) / Moving infrequently used data to cold storage
Failure handling
Load balancer / Block server / Cloud storage / API server / Metadata cache & DB / Notification service / Offline backup queue failure
Wrap up
If files were uploaded directly to cloud storage from the client instead of going through block servers, there would be a few drawbacks:
1 the same chunking, compression and encryption logic must be implemented on different platforms (iOS, Android, Web). It’s error-prone and requires a lot of engineering effort
2 A client can easily be hacked or manipulated, so implementing encryption logic on the client side is not ideal
DISCOVERING SHORTCUT KEYS => Ctrl/Cmd + Shift + L to open a widget that shows all the shortcut keys.
CONTENT ASSIST => Ctrl + Space to see a list of suggested completions. Typing one or more characters before clicking Ctrl + Space will shorten the list.
PARAMETER HINT => Ctrl + Shift + Space to see a list of parameter hints.
Search File => Cmd/Ctrl + Shift + R
Run Server
Tomcat
3. Git
git reset (undo)
git checkout -b yli origin/yli (create branch yli to track on origin/yli)
I read a blog post that walks through writing a React-like framework from scratch. It is a great way to learn the core concepts in a nutshell, so I am sharing my notes below.
1. React uses JSX, but at its core it is transpiled into vanilla JavaScript; createElement() is used to achieve the same effect.