How to design GFS/BigTable/MapReduce
Foundations of the Big Data Age
Interviewer: Design a search engine.
How to read a paper? Find a solution that fits your scenario instead of reciting details.
What are the layers of a search engine system?
Layers of system
Application 1 / Application 2 / Application 3
Data model (BigTable)
File system (GFS)
Let’s design from the bottom up
Google File System
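How to save a file? (How do we read data from, or write data into, a file?)
What’s the core idea? (The core is whatever, if removed, makes the thing no longer itself. For a file system, if you remove the ability to read and write data, it is no longer a file system.)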
1 block = 1024 bytes
Why separate file info (metadata) from data blocks? The metadata is small, so it is faster to search.
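A minimal sketch of the single-machine idea above, assuming 1024-byte blocks: file contents are cut into fixed-size blocks, and a separate metadata table maps each file name to its block numbers. The class `TinyFS` and its methods are hypothetical, for illustration only.

```python
BLOCK_SIZE = 1024  # 1 block = 1024 bytes


class TinyFS:
    """Toy single-machine file system: metadata kept separate from data blocks (hypothetical)."""

    def __init__(self):
        self.blocks = []    # data area: list of byte blocks
        self.metadata = {}  # file name -> list of block indexes

    def write(self, name, data):
        indexes = []
        for start in range(0, len(data), BLOCK_SIZE):
            self.blocks.append(data[start:start + BLOCK_SIZE])
            indexes.append(len(self.blocks) - 1)
        self.metadata[name] = indexes

    def read(self, name):
        # Look up the small metadata first, then fetch only the blocks it lists.
        return b"".join(self.blocks[i] for i in self.metadata[name])


fs = TinyFS()
fs.write("a.txt", b"x" * 2500)
print(fs.metadata["a.txt"])   # [0, 1, 2]
print(len(fs.read("a.txt")))  # 2500
```

Finding a file only touches the small `metadata` dict; the bulky data blocks are read only once we know which ones we need.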
What’s the challenge? (Tons of data) => evolve the design
How to save a big file? (With 1 KB blocks, the block index will explode.)
1 chunk = 64 MB = 64 * 1024 KB = 65,536 blocks (use a larger unit)
Advantage: reduces the size of the metadata
Disadvantage: wastes space for small files
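The trade-off above can be checked with a little arithmetic. This sketch assumes a 1 GB file and a 10 KB file, and assumes every unit is fully allocated on disk (the worst case the notes describe); the helper names are hypothetical.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

BLOCK = 1 * KB
CHUNK = 64 * MB


def index_entries(file_size, unit):
    """Number of index entries (blocks or chunks) a file needs."""
    return -(-file_size // unit)  # ceiling division


def wasted_bytes(file_size, unit):
    """Unused space in the last unit, assuming each unit is fully allocated."""
    return index_entries(file_size, unit) * unit - file_size


print(CHUNK // BLOCK)                      # 65536 blocks per chunk
print(index_entries(1 * GB, BLOCK))        # 1048576 entries with 1 KB blocks
print(index_entries(1 * GB, CHUNK))        # 16 entries with 64 MB chunks
print(wasted_bytes(10 * KB, CHUNK) // KB)  # 65526 KB wasted for a 10 KB file
```

The metadata shrinks by a factor of 65,536, but a tiny file still occupies a whole chunk under this assumption.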
How to save an extra-large file?
A single machine cannot store the whole file.
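One common answer, and the one GFS takes, is to spread chunks over many machines: a master keeps only metadata (which chunk lives on which server), and chunk servers hold the data. The sketch below is a toy in-process model under that assumption; `Master`, `ChunkServer`, `client_write`, `client_read`, the 4-byte chunk size, and round-robin placement are all hypothetical simplifications (real GFS also replicates each chunk).

```python
CHUNK_SIZE = 4  # toy chunk size so the example is easy to follow (GFS uses 64 MB)


class ChunkServer:
    """Stores raw chunk data on one machine."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}  # chunk handle -> bytes


class Master:
    """Stores metadata only: (file name, chunk index) -> (chunk server, chunk handle)."""

    def __init__(self, servers):
        self.servers = servers
        self.chunk_map = {}
        self.next_handle = 0

    def allocate(self, name, index):
        # Round-robin placement: a hypothetical simplification.
        server = self.servers[self.next_handle % len(self.servers)]
        self.chunk_map[(name, index)] = (server, self.next_handle)
        self.next_handle += 1
        return self.chunk_map[(name, index)]

    def locate(self, name, index):
        return self.chunk_map.get((name, index))


def client_write(master, name, data):
    for index, start in enumerate(range(0, len(data), CHUNK_SIZE)):
        server, handle = master.allocate(name, index)           # ask the master where to put it
        server.chunks[handle] = data[start:start + CHUNK_SIZE]  # send the data to the chunk server


def client_read(master, name):
    parts, index = [], 0
    while (loc := master.locate(name, index)) is not None:  # ask the master where the chunk lives
        server, handle = loc
        parts.append(server.chunks[handle])                 # read the data from the chunk server
        index += 1
    return b"".join(parts)


servers = [ChunkServer("cs0"), ChunkServer("cs1"), ChunkServer("cs2")]
master = Master(servers)
client_write(master, "big.log", b"abcdefghij")
print([master.locate("big.log", i)[0].name for i in range(3)])  # ['cs0', 'cs1', 'cs2']
print(client_read(master, "big.log"))                          # b'abcdefghij'
```

The master never touches file contents, so it stays small even when the data spans many machines.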
Always ask why.
Read high-scalability blogs and work out the logic behind each design.
System design goal? How to design systematically
How to support lookups and range queries on a file?
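A standard answer (and the idea behind BigTable's sorted SSTable files) is to keep records sorted by key, so both point lookups and range queries become binary searches. This is a minimal in-memory sketch using Python's `bisect` module; `SortedFile` is a hypothetical name, and an on-disk version would binary-search an index of block offsets rather than a Python list.

```python
import bisect


class SortedFile:
    """Key-value records kept sorted by key, as in an immutable sorted file (hypothetical)."""

    def __init__(self, records):
        records = sorted(records)
        self.keys = [k for k, _ in records]
        self.values = [v for _, v in records]

    def get(self, key):
        """Point lookup by binary search."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def scan(self, lo, hi):
        """Range query: all (key, value) pairs with lo <= key < hi."""
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_left(self.keys, hi)
        return list(zip(self.keys[start:end], self.values[start:end]))


f = SortedFile([("cat", 3), ("apple", 1), ("dog", 4), ("banana", 2)])
print(f.get("dog"))       # 4
print(f.scan("b", "d"))   # [('banana', 2), ('cat', 3)]
```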
How to save a large table?
BigTable (memory-intensive) + GFS (disk-intensive)
MapReduce: Divide + Assemble
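The memory/disk split above can be sketched as a log-structured merge design, roughly what the BigTable paper describes: writes go into an in-memory table, which is flushed to an immutable sorted file (an SSTable, which BigTable stores on GFS) once it grows too large; reads check memory first, then files from newest to oldest. `TinyTable`, the list-based "files", and the tiny flush threshold are hypothetical simplifications; a real system also keeps a commit log for crash recovery and compacts files in the background.

```python
import bisect


class TinyTable:
    """Toy memtable + sorted-file store (hypothetical)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # in memory: most recent writes
        self.sstables = []   # "on disk": (sorted keys, values) pairs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as one sorted, immutable file, then clear it.
        items = sorted(self.memtable.items())
        self.sstables.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.sstables):  # newest file wins
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


t = TinyTable(memtable_limit=2)
t.put("row1", "a")
t.put("row2", "b")     # memtable full: flushed to the first file
t.put("row1", "c")     # newer version stays in memory
print(t.get("row1"))   # c
print(t.get("row2"))   # b
print(len(t.sstables)) # 1
```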
Input -> Split -> Map -> Shuffle -> Reduce -> Finalize
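The pipeline above, applied to the classic word-count example, can be sketched in a single process. The function names are hypothetical and everything runs locally; in a real MapReduce framework, splits are processed by many map workers in parallel and the shuffle moves data across the network to reduce workers.

```python
from collections import defaultdict


def split(text, n):
    """Input -> Split: cut the input into about n contiguous chunks of lines."""
    lines = text.splitlines()
    size = max(1, -(-len(lines) // n))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_phase(chunk):
    """Map: emit (word, 1) for every word in the chunk."""
    return [(word, 1) for line in chunk for word in line.split()]


def shuffle(mapped_outputs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(text, n_splits=2):
    chunks = split(text, n_splits)
    mapped = [map_phase(c) for c in chunks]  # would run in parallel on map workers
    groups = shuffle(mapped)
    results = [reduce_phase(k, v) for k, v in groups.items()]
    return dict(sorted(results))             # Finalize: collect the output sorted by key


text = "the cat sat\nthe dog sat\nthe end"
print(word_count(text))
# {'cat': 1, 'dog': 1, 'end': 1, 'sat': 2, 'the': 3}
```

Only `map_phase` and `reduce_phase` are specific to word count; the split, shuffle, and finalize steps are the reusable framework part.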
How does MapReduce compare with Spark? Is MapReduce less popular than Spark? Do I need to study Spark? How do I keep up with new tech?
MapReduce is more about the idea; Spark is a framework and implementation that is faster and more powerful, which is why more and more people use Spark.
Follow me @Yaoli0615 on Twitter for the latest tech updates.