How to design GFS/BigTable/MapReduce



Foundation of the Big Data Age
Interviewer: design a search engine
How to read a paper? Look for the solution that fits your scenario instead of reciting details
What are the layers of a search engine system?
Layers of system
Application 1 / Application 2 / Application 3
Algorithm (MapReduce)
Data model (BigTable)
File system (GFS)
Let’s design from the bottom up
Google File System
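How to save a file? (How do we read data from and write data into a file?)
What’s the core idea? (The core is whatever cannot be removed without the thing ceasing to be itself. A file system without the ability to read and write data is no longer a file system.)
Key point
1 block = 1024 bytes
Faster to search (this is why file info, i.e. metadata, is kept separate from the data blocks)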
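A minimal sketch of the key point above, assuming an in-memory toy rather than a real disk layout. The class name TinyFileSystem and its fields are hypothetical; the only value taken from the notes is the 1024-byte block size. Metadata (file name, size, block index) lives in one small structure, data blocks in another, so finding a file never touches the data itself.

```python
BLOCK_SIZE = 1024  # bytes per block, as in the notes


class TinyFileSystem:
    """Toy in-memory file system: metadata and data blocks are stored apart."""

    def __init__(self):
        self.blocks = []    # the "disk": a list of fixed-size byte blocks
        self.metadata = {}  # file name -> {"size": int, "index": [block ids]}

    def write(self, name, data):
        # Cut the data into blocks and remember where each block went.
        index = []
        for offset in range(0, len(data), BLOCK_SIZE):
            self.blocks.append(data[offset:offset + BLOCK_SIZE])
            index.append(len(self.blocks) - 1)
        self.metadata[name] = {"size": len(data), "index": index}

    def read(self, name):
        # The lookup only touches the small metadata map; blocks are
        # fetched afterwards by following the index.
        info = self.metadata[name]
        return b"".join(self.blocks[i] for i in info["index"])


fs = TinyFileSystem()
fs.write("a.txt", b"x" * 2500)
print(fs.metadata["a.txt"])
```

A 2500-byte file needs three blocks here, and listing or searching files only scans the metadata map.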
What’s the challenge? (tons of data) => evolve
How to save a big file? (with 1 KB blocks the block index will explode)
Key point
1 chunk = 64 MB = 64 * 1024 blocks = 65,536 blocks (a larger unit than a block)
Reduces the size of metadata
Reduces traffic
Wastes space for small files
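The trade-off above can be checked with a bit of arithmetic. This is a rough sketch: the helper names and the 10 GB and 4 KB example sizes are made up for illustration, and the waste figure assumes every chunk is fully allocated even when the file is smaller.

```python
BLOCK_SIZE = 1024              # 1 KB block
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk


def index_entries(file_size, unit_size):
    """Number of index entries needed to address a file in units of unit_size."""
    return (file_size + unit_size - 1) // unit_size  # integer ceiling


def wasted_bytes(file_size, unit_size):
    """Unused space in the last unit, assuming every unit is fully allocated."""
    return index_entries(file_size, unit_size) * unit_size - file_size


ten_gb = 10 * 1024 ** 3
print(index_entries(ten_gb, BLOCK_SIZE))    # index entries with 1 KB blocks
print(index_entries(ten_gb, CHUNK_SIZE))    # index entries with 64 MB chunks
print(wasted_bytes(4 * 1024, CHUNK_SIZE))   # a 4 KB file stored in one chunk
```

For a 10 GB file the index shrinks from 10,485,760 entries to 160, while a 4 KB file leaves almost the whole 64 MB chunk unused.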
How to save an extra-large file?
A single machine cannot hold the file, so its chunks must be spread across many machines
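The notes stop at the problem, so here is one possible sketch of the usual answer: spread the chunks over several machines and keep a small map of which machine holds which chunk. It follows the master/chunkserver split of GFS only loosely. The class names, the round-robin placement, and the fact that data passes through the master are all simplifying assumptions, not how GFS actually moves data.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as in the notes


class ChunkServer:
    """Hypothetical storage node: holds chunk bytes keyed by (file, chunk index)."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}


class Master:
    """Hypothetical metadata node: only knows which server holds which chunk."""

    def __init__(self, servers, chunk_size=CHUNK_SIZE):
        self.servers = servers
        self.chunk_size = chunk_size
        self.locations = {}  # (file name, chunk index) -> ChunkServer

    def write(self, name, data):
        for i, offset in enumerate(range(0, len(data), self.chunk_size)):
            server = self.servers[i % len(self.servers)]  # round-robin placement
            server.chunks[(name, i)] = data[offset:offset + self.chunk_size]
            self.locations[(name, i)] = server

    def read(self, name):
        # Walk the chunk indexes in order and fetch each chunk from its server.
        parts = []
        i = 0
        while (name, i) in self.locations:
            parts.append(self.locations[(name, i)].chunks[(name, i)])
            i += 1
        return b"".join(parts)


servers = [ChunkServer("cs0"), ChunkServer("cs1"), ChunkServer("cs2")]
master = Master(servers, chunk_size=4)  # tiny chunks so the demo stays small
master.write("big.log", b"abcdefghij")
print(master.read("big.log"))
```

No single server holds the whole file, yet the master's location map is enough to reassemble it in order.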
Always ask why
Read blogs on high scalability and work out the logic behind each design
System design goal? How to design systematically
BigTable
How to support lookup and range queries on a file?
How to save a large table?
BigTable (memory intensive) + GFS (disk intensive)
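One way to read the two questions above: keep keys sorted so lookup becomes a binary search and a range query becomes a contiguous slice, then buffer recent writes in memory and flush them as sorted files to the file system. The sketch below assumes that reading. SortedFile and TinyTable are hypothetical names, a Python list stands in for a file on GFS, and deletes and compaction are left out.

```python
import bisect


class SortedFile:
    """Immutable run of (key, value) pairs sorted by key (stands in for a file on GFS)."""

    def __init__(self, items):
        self.keys = [k for k, _ in items]
        self.values = [v for _, v in items]

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)  # binary search on sorted keys
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def scan(self, start, end):
        # Range query over [start, end): two binary searches, then a slice.
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, end)
        return list(zip(self.keys[lo:hi], self.values[lo:hi]))


class TinyTable:
    def __init__(self, memory_limit=2):
        self.memory = {}   # recent writes, held in memory
        self.files = []    # older writes, flushed as sorted files
        self.memory_limit = memory_limit

    def put(self, key, value):
        self.memory[key] = value
        if len(self.memory) >= self.memory_limit:
            self.files.append(SortedFile(sorted(self.memory.items())))
            self.memory = {}

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        for f in reversed(self.files):  # newest file first
            value = f.get(key)
            if value is not None:
                return value
        return None

    def scan(self, start, end):
        merged = {}
        for f in self.files:  # oldest first, so newer values overwrite older ones
            merged.update(f.scan(start, end))
        merged.update({k: v for k, v in self.memory.items() if start <= k < end})
        return sorted(merged.items())


table = TinyTable(memory_limit=2)
for key, value in [("b", 1), ("d", 2), ("a", 3), ("c", 4), ("b", 5)]:
    table.put(key, value)
print(table.get("b"), table.scan("a", "d"))
```

Reads check memory first and then the flushed files from newest to oldest, which is why the later write to "b" wins.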
MapReduce
Divide + Assemble
Input -> Split -> Map -> Shuffle -> Reduce -> Finalize
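The pipeline above can be walked through with a single-process word count. This is only a sketch of the idea: the function names are made up, everything runs in one Python process, and "Finalize" is taken to mean sorting the output, which is an assumption.

```python
from collections import defaultdict


def split(text, n_parts):
    """Split: hand out the input lines to n_parts workers."""
    lines = text.splitlines()
    return [lines[i::n_parts] for i in range(n_parts)]


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped_parts):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for part in mapped_parts:
        for key, value in part:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(text, n_parts=2):
    parts = split(text, n_parts)
    mapped = [map_phase(part) for part in parts]  # would run in parallel
    grouped = shuffle(mapped)
    return dict(sorted(reduce_phase(grouped).items()))  # Finalize: sorted output


print(word_count("a b a\nb c\na"))
```

Divide is the split and map steps, assemble is the shuffle and reduce steps. In a real cluster each map and reduce call would run on a different machine.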
How does MapReduce compare with Spark? Is MapReduce less popular than Spark? Do I need to study Spark? How do I keep up with new tech?
MapReduce is more of an idea (a programming model), while Spark is a framework and implementation that is faster and more powerful, which is why more and more people use Spark
About liyao13

Yao Li is a web and iOS developer and blogger with a passion for technology and business. In his blogs, he shares code snippets, tutorials, resources and notes to help people develop their skills. Follow him @Yaoli0615 on Twitter to get the latest tech updates.