Google OS

Description

Anatomy of Google Service Platform

Shared by: Piyush Bakshi
posted:
9/2/2008
Anatomy of Google Service Platform
March 2, 2007
Jaesun Han (jshan0000@gmail.com)
Contact: http://www.web2hub.com

Contents
• Overview of Web 2.0 Technologies
• Google Service Platform
  • Google File System (GFS)
  • Bigtable
  • MapReduce
  • Chubby
• Case Study: Google Services over Platform
  • Google Analytics
  • Google Earth
  • Personalized Search

Web 2.0 Technology Map
[Diagram: technologies mapped to layers]
• Layers: Client Layer, Front-End Layer, Data Processing Layer, Platform Layer
• Data flow labels: Raw Data, External Data Source, Distributed/Parallel Processing, Distributed Storage, Distributed File System, Processed Data, DB
• Client and Front-End technologies: XHTML, CSS, PHP, Python, RSS, Atom, Microformats, Ruby, RoR, OpenAPI, RIA, Dojo, DWR, REST, JSON, Ajax, Flex, Atlas, GWT, SOAP, XUL, XAML, Apache, Mashup, Gadget, MySQL
• Data Processing technologies: Recommendation (Collaborative Filtering), Ranking, Clustering, Data mining, Personalization, Social Network Analysis
• Platform technologies: Cluster Computing, Beowulf, Grid, Globus, Condor, P2P, DHT, MPI, Utility Computing, Virtualization, Autonomous Computing, Cluster Management

Google Service Platform: Platform Layer
[Diagram: the same technology map as on the previous slide]

Google Service Platform
• Services
• Service Library
• Google OS

Google Service Platform
• Service software technology: Search engine, Email server, IM server, Map database, Various Web sites …
• System software technology: Google Linux, Google File System, MapReduce Library, Chubby, BigTable, Intelligent System, Programming Model (River, TACC), Replication/Redundancy …
• Hardware technology (Google Cluster): Clusters, Geographic distribution, Automated Setup, Automated Backup, Standard components, Commodity drives, Flexible co-location, Easy-access design …
  • 450,000 or more servers (NYT)
  • All PC servers less than $1,000
  • 40 or more pizza box servers per rack
• Advantages
  • Easy Development
  • Scalability
  • Robustness

Google Service Platform
• MapReduce (OSDI 2004): distributed data processing library (computation)
• Bigtable (OSDI 2006): distributed storage system for structured data
• GFS (SOSP 2003): distributed file system (storage)
• Chubby (OSDI 2006): distributed lock manager

GFS

GFS: Overview
• Scalable distributed file system for large distributed data-intensive applications
  • Running on inexpensive commodity hardware
  • Delivering high aggregate performance to a large number of clients
• Features
  • user-level distributed file system
  • centralized architecture
    • metadata: client <-> a single master
    • data: client <-> chunkservers
  • 64MB fixed large chunk size
  • non-standard file system interface (not POSIX API)
    • create, delete, open, close, read, write, snapshot and record append
  • three replicas of a chunk
  • no client caching (but caching metadata like chunk location)

GFS: Architecture
• In-memory data structure (lookup table, map)
  • file and chunk namespaces
  • mapping from files to chunks
  • chunk locations
• Operation log
  • File creation/deletion
  • File renaming
  • Chunk addition/deletion
• Separation of control flow and data flow
  • File read/write

GFS: Write
• pipelined data delivery to fully utilize each machine's network bandwidth
• primary and replica locations
• primary lease (initial timeout = 60s)
• ordering write requests for the same chunk

GFS: Relaxed Consistency
[Diagram: two writes, A and B, start at offsets 2 and 4 and span chunk1 and chunk2 (offsets 0 to 7); the replicas of chunk2 are shown for two orderings]
• case1 (chunk2: B -> A): Undefined
• case2 (chunk2: A -> B): Consistent

GFS: Atomic Record Appends
• max record size: 1/4 of the max chunk size (padding)
[Diagram: ordinary writes at offsets 1 and 2 of chunk1 compared with record appends of A and B; with an error in writing B, the replicas of chunk1 are shown for case1 (B -> A) and case2 (A -> B), each record being written at the exact offset]

Bigtable

Bigtable: Overview
• Motivation
  • Lots of structured and semi-structured data: web crawl data, satellite images, user data, email, …
  • No commercial system big enough
• Bigtable
  • Distributed storage system for structured data
  • A sparse, distributed, persistent multi-dimensional sorted map
• Goals
  • wide applicability, scalability, high performance, and high availability
• Target workloads
  • from throughput-oriented batch-processing jobs to latency-sensitive serving of data to end users
• Applications
  • more than 60 Google products and projects (Google Analytics, Google Finance, Orkut, Personalized Search, Writely, and Google Earth)

Bigtable: Data Model
• Table
• Column Family: the basic unit of access control
• Column Key: (Column Family:Qualifier)
• Timestamp
• Row Key: atomic read/write for a single row key
• Tablets: the unit of distribution and load balancing
• Indexing: (row:string, column:string, time:int64) -> string
  • Example: (com.cnn.www, anchor:my.look.ca, t8) -> "CNN.com"
[Diagram labels: Tablet, com.web2hub.www]

Bigtable: API
• Writing to Bigtable

    // Open the table
    Table *T = OpenOrDie("/bigtable/web/webtable");

    // Write a new anchor and delete an old anchor
    RowMutation r1(T, "com.cnn.www");
    r1.Set("anchor:www.c-span.org", "CNN");
    r1.Delete("anchor:www.abc.com");
    Operation op;
    Apply(&op, &r1);

• Reading from Bigtable

    // Print every version of every anchor column in one row
    Scanner scanner(T);
    ScanStream *stream;
    stream = scanner.FetchColumnFamily("anchor");
    stream->SetReturnAllVersions();
    scanner.Lookup("com.cnn.www");
    for (; !stream->Done(); stream->Next()) {
      printf("%s %s %lld %s\n",
             scanner.RowName(),
             stream->ColumnName(),
             stream->MicroTimestamp(),
             stream->Value());
    }

• Metadata operations
  • Create/delete tables and column families, change metadata
• Several other features of API
  • single-row transactions: atomic read-modify-write sequences
  • execution of client-supplied scripts (written in Sawzall)

Bigtable: SSTable
• Used internally to store Bigtable data
• Immutable, sorted file of key-value pairs
• data blocks + an index
  • block size is 64KB, but configurable
  • an index is used to locate blocks
  • the index is loaded into memory when the SSTable is opened
[Diagram: key-value pairs stored in 64K blocks, followed by the index]

Bigtable: Tablet & Locality Group
• Locality Group1: contents
• Locality Group2: anchor
• Locality Group3: language, checksum
• Tablet: com.cnn.www (abc.html ~ help.html), 100~200MB
• SSTable1 (100MB), SSTable2 (50MB), SSTable3 (30MB)
• GFS chunks: 64MB each

Bigtable: Tablet Location
• Features
  • Three-level hierarchy (served on tablet servers, not the master)
  • client library's caching and prefetching of tablet locations
• METADATA row: row key (tablet's table id + its end row) -> tablet location
  • ex) webtable:com.cnn.www
• Addressing
  • 128MB = 2^17 x 1KB rows
  • 2^34 tablets

Bigtable: Tablet Assignment
[Diagram: Cluster Management System, Tablet Servers, Chubby, Bigtable Master; the new tablet server tab_svr10 holds an exclusive lock on /servers/tab_svr10]
• 1) start a server
• 2) to 4) create the lock, acquire the lock, monitor /servers
• 5) assign tablets
• 6) check lock status
• 7) failure or losing lock
• 8) acquire & delete the lock
• 9) reassign unassigned tablets

Bigtable: Master Failure
• Tablet Changes
  • Create, Delete, Merge: initiated by master
  • Split: initiated by tablet server
[Diagram: Master, Chubby (master lock, /servers/master), Tablet Servers]
• 0) start a master
• 1) acquire master lock
• 2) scan /servers & get live server list
• 3) check assigned tablets
• 4) scan METADATA tablets
• 5) reassign unassigned tablets

Bigtable: Read/Write
• a single commit log per tablet server
• memtable (sorted buffer); read on a merged view
• Example mutations
    t1: Set("anchor:www.c-span.org", "CNN")
    t2: Delete("anchor:www.abc.com")
    t3: Set("anchor:www.abc.com", "ABC")
• memtable entries shown: anchor:www.abc.com -> ABC, anchor:www.abc.com -> null, anchor:www.c-span.org -> CNN
• SSTables shown: anchor v1.0, v2.0, v3.0, v4.0
• Fast writing: a mutation is appended to the commit log and buffered in the in-memory memtable
• Efficient reading: a merged view of sorted data structures

Bigtable: Compactions
• minor compaction: memtable -> a new SSTable
• major compaction: memtable + all SSTables -> only one SSTable
[Diagram labels: SSTables v1.0 to v5.0, memtable, v6.0]

MapReduce

MapReduce: Overview
• Motivation
  • Input data is large
  • Lots of machines: hundreds of thousands of PC servers
• MapReduce
  • Programming model and implementation for parallel processing of large data sets
  • parallelization, fault-tolerance, data distribution, and load balancing in a MapReduce library
• map & reduce functions
    map (k1, v1) -> list (k2, v2)
    reduce (k2, list (v2)) -> list (v2)
• Usage Examples
  • Distributed Grep, Count of URL Access Frequency, Reverse Web-Link Graph, Term-Vector per-Host, Inverted Index, Distributed Sort

MapReduce: Data Processing Flow
[Diagram not recoverable]

MapReduce: Architecture
[Diagram labels: Other MapReduce Programs; (0) split input files; map: (k1,v1) -> list(k2,v2); partitioning (hash(key) mod R); notifying; reduce: (k2,list(v2)) -> list(v2); global writing; input and output over GFS]

MapReduce: Code Example

    // Mapper: emits (word, "1") for every whitespace-separated word
    class WordCounter : public Mapper {
     public:
      virtual void Map(const MapInput& input) {
        const string& text = input.value();
        const int n = text.size();
        for (int i = 0; i < n; ) {
          // Skip leading whitespace
          while ((i < n) && isspace(text[i])) i++;
          // Find the end of the word
          int start = i;
          while ((i < n) && !isspace(text[i])) i++;
          if (start < i)
            Emit(text.substr(start, i-start), "1");
        }
      }
    };
    REGISTER_MAPPER(WordCounter);

    // Reducer: sums the counts emitted for one word
    class Adder : public Reducer {
      virtual void Reduce(ReduceInput* input) {
        int64 value = 0;
        while (!input->done()) {
          value += StringToInt(input->value());
          input->NextValue();
        }
        Emit(IntToString(value));
      }
    };
    REGISTER_REDUCER(Adder);

    int main(int argc, char** argv) {
      ParseCommandLineFlags(argc, argv);
      MapReduceSpecification spec;

      // Every command-line argument is an input file pattern
      for (int i = 1; i < argc; i++) {
        MapReduceInput* input = spec.add_input();
        input->set_format("text");
        input->set_filepattern(argv[i]);
        input->set_mapper_class("WordCounter");
      }

      // Output: 100 reduce tasks writing under /gfs/test/freq
      MapReduceOutput* out = spec.output();
      out->set_filebase("/gfs/test/freq");
      out->set_num_tasks(100);
      out->set_format("text");
      out->set_reducer_class("Adder");
      out->set_combiner_class("Adder");

      // Tuning: machine count and memory per task
      spec.set_machines(2000);
      spec.set_map_megabytes(100);
      spec.set_reduce_megabytes(100);

      MapReduceResult result;
      if (!MapReduce(spec, &result)) abort();
    }

MapReduce: Fault-tolerance
• Worker Failure
  • Re-execution of workers (map or reduce task)
  • Completed map tasks (local disk): re-executed
  • Completed reduce tasks (GFS): no need of re-execution
• Master Failure
  • Periodic checkpointing of the master data structure
  • re-execution
• Semantics in the Presence of Failure
  • Guarantee atomic commits of map and reduce task outputs
  • map output: by the master's confirmation
  • reduce output: by the atomic rename operation of GFS

Chubby

Chubby: Overview
• Distributed lock service
• Target: loosely-coupled distributed system
  • moderately large number of small machines connected by a high-speed network
• Goals
  • reliability, availability, and easy-to-understand semantics
  • throughput and storage capacity are considered secondary
• Similar to a simple file system with whole-file read/write
  • augmented with advisory locks and with event notifications
• Usage in both GFS and Bigtable
  • for master election
  • for discovering servers and finding the master
  • as a well-known location to store a small amount of metadata
  • as the root of their distributed data structures

Chubby: System structure
[Diagram labels]
• Clients: Bigtable master, tablet servers, GFS master, chunkservers, …
• Chubby replicas, each with a simple database
• Replicas list: DNS server
• Distributed consensus protocol
  • master election
  • database update

Chubby: Interface
• Similar to a file system interface
• Example) /ls/datacenter000/servers/svr_10980
  • /ls: stands for lock service
  • /datacenter000: Chubby cell's name
    • /local: client's local Chubby cell
    • /global: global Chubby cell
  • /servers/svr_10980: interpreted within the named Chubby cell
• Node (file & directory) metadata
  • three ACL filenames (reading, writing and changing ACL names)
  • four monotonically increasing 64-bit numbers
    • an instance number, a content generation number, a lock generation number, an ACL generation number
• Handles
  • returned when clients open nodes
  • include check digits, a sequence number, mode information

Chubby: Global cell
• global cell: /ls/global
• local cell: /ls/cellname
• subtree /ls/global/master is mirrored to subtree /ls/cell/slave
  • Chubby's own ACLs
  • Advertisement of presence to monitoring services
  • Pointers to allow clients to locate large data sets such as Bigtable cells
  • many configuration files for other systems

Chubby: API
• APIs
  • Open(), Close(), Poison()
  • GetContentsAndStat(), GetStat(), ReadDir()
    • contents and metadata read atomically and in entirety
  • SetContents(), SetACL()
    • written atomically and in entirety
  • Delete()
  • Acquire(), TryAcquire(), Release()
  • GetSequencer(), SetSequencer(), CheckSequencer()
• Usage example: primary election
  • All potential primaries Open() and Acquire()
  • The primary SetContents(): write its identity
  • All replicas are notified of the event and GetContentsAndStat()

Chubby: Database & Backup
• Database Implementation
  • The first version: replicated version of Berkeley DB
  • Now: a simple database written to replace it
    • write-ahead logging, snapshotting and atomic operations
• Backup
  • Every few hours, the master writes a snapshot of its DB to a GFS file server in a different building
  • Usage
    • disaster recovery
    • initializing the DB of a newly replaced replica

Overall View

Bird's View Revisited
[Diagram labels: Batch Clients; Runtime Clients; C++, Sawzall, Java, Python; Cluster Management System; MapReduce; Workqueue (Scheduler); Bigtable Client Interface; Bigtable; Chubby; DB; GFS; Google OS (Linux File System, Multithreading)]

Server Process View
[Diagram labels]
• Cluster Management System: Global Scheduler, Chubby Cell
• A Single Server: Local Scheduler, worker pool
  • Computation (MapReduce): M, M, R workers
  • Database (Bigtable): Bigtable Master, Tablet Server, Tablet, SSTable (LG1), SSTable (LG2), SSTable (LG3)
  • Storage (GFS): GFS Master, Chunkserver, chunks

Google Services over Google Platform

Google Analytics
[Diagram labels]
• Embedded JavaScript -> Google Analytics
• raw click table (~200TB)
  • row: tuple (URL, time), e.g. com.abc.www:0001, com.abc.www:0027, com.abc.www:0050
  • column: session info
• summary table (~20TB)
  • row: website, e.g. com.abc.www
  • column: summary
• Tablets: com.abc.www (a.html~o.html) and com.abc.www (p.html~z.html), each stored as an SSTable (GFS file)
• Map (analyzing): value: each session info
• Reduce (aggregating): key: website's URL, value: analyzed info

Google Earth
[Diagram labels]
• preprocessing & consolidating
• imagery table (~70TB, CF:8, LG:3)
  • row: geographic segment (x1,y1),(x2,y2)
  • column: image sources
• index table (~500GB, CF:7, LG:2)
  • row: geographic segment (x1,y1),(x2,y2)
  • column: final images
• Tablet: (x0,y0),(x4,y4), stored as an SSTable (GFS file)
• Map (preprocessing): value: image source
• Reduce (consolidating & indexing): key: segment, value: final image
• GFS (raw images), GFS (final images)

Personalized Search
[Diagram labels]
• user histories (web queries, click URLs, search keywords, …) -> User Profile -> Personalized Search
• user table (~4TB, CF:93, LG:11)
  • row: userid, e.g. jaesun_han, jisoo1004, jk_tong
  • columns: web queries, search keywords, click URLs, user profile
• Tablet: ja ~ jn, stored as an SSTable (GFS file)
• Map (analyzing): value: web queries, value: click URLs
• Reduce (generating profile): key: userid, value: user history

Q&A
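The GFS slides state a fixed 64MB chunk size and a record append limit of 1/4 of the chunk size, with padding when a record does not fit. A minimal sketch of that arithmetic follows. The names `LocateByte` and `MustPadChunk` are hypothetical helpers invented for illustration and are not part of any GFS client library.

```cpp
#include <cassert>
#include <cstdint>

// Fixed chunk size from the slides: 64 MB.
constexpr std::uint64_t kChunkSize = 64ULL * 1024 * 1024;
// Record append limit from the slides: 1/4 of the chunk size.
constexpr std::uint64_t kMaxRecordSize = kChunkSize / 4;

// Position of one byte of a file, expressed in chunk terms.
struct ChunkPosition {
  std::uint64_t chunk_index;      // which chunk of the file
  std::uint64_t offset_in_chunk;  // byte offset inside that chunk
};

// Hypothetical helper: translate a file byte offset into a chunk position.
// With a fixed chunk size this needs no lookup, only division.
inline ChunkPosition LocateByte(std::uint64_t file_offset) {
  return ChunkPosition{file_offset / kChunkSize, file_offset % kChunkSize};
}

// Hypothetical helper: true when a record of record_size bytes does not fit
// in the space left after used_bytes, so the rest of the chunk would be
// padded and the append retried on a fresh chunk.
inline bool MustPadChunk(std::uint64_t used_bytes, std::uint64_t record_size) {
  return used_bytes + record_size > kChunkSize;
}
```

Under these assumptions, byte 64MB + 5 of a file lands at offset 5 of chunk 1, and capping records at 1/4 of a chunk bounds how much of a chunk padding can waste.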
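The addressing figures on the "Bigtable: Tablet Location" slide can be checked with a few constants. The sketch below assumes, as the slide does, 128MB METADATA tablets and about 1KB per METADATA row; the function `MetadataRowKey` is a hypothetical helper whose `:` separator is copied from the slide's example `webtable:com.cnn.www`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Figures taken from the slide.
constexpr std::uint64_t kMetadataTabletBytes = 128ULL << 20;  // 128 MB
constexpr std::uint64_t kMetadataRowBytes = 1ULL << 10;       // about 1 KB

// Rows (tablet locations) that fit in one METADATA tablet: 2^17.
constexpr std::uint64_t kRowsPerMetadataTablet =
    kMetadataTabletBytes / kMetadataRowBytes;

// The root tablet points at up to 2^17 METADATA tablets, and each of those
// points at up to 2^17 user tablets, so two indexed levels reach 2^34.
constexpr std::uint64_t kAddressableTablets =
    kRowsPerMetadataTablet * kRowsPerMetadataTablet;

static_assert(kRowsPerMetadataTablet == (1ULL << 17), "2^17 rows per tablet");
static_assert(kAddressableTablets == (1ULL << 34), "2^34 addressable tablets");

// Hypothetical helper: build a METADATA row key from the table id and the
// tablet's end row, following the slide's example format.
inline std::string MetadataRowKey(const std::string& table_id,
                                  const std::string& end_row) {
  return table_id + ":" + end_row;
}
```

Because tablets are keyed by their end row, a sorted lookup for the first key at or after the target row finds the tablet that covers it.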
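The "Bigtable: Read/Write" and "Bigtable: Compactions" slides describe reads over a merged view of the memtable and SSTables, plus minor and major compactions. The toy model below illustrates that idea in memory only. `ToyTablet` and `Cell` are invented for this sketch; it has no commit log, timestamps, or files, so it should not be read as how Bigtable is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One cell in a sorted run; deleted == true marks a deletion entry.
struct Cell {
  bool deleted;
  std::string value;
};

// Sorted key -> cell map standing in for both the memtable and an SSTable.
typedef std::map<std::string, Cell> SortedRun;

class ToyTablet {
 public:
  void Set(const std::string& key, const std::string& value) {
    Cell cell = {false, value};
    memtable_[key] = cell;
  }
  void Delete(const std::string& key) {
    Cell cell = {true, std::string()};
    memtable_[key] = cell;
  }

  // Merged view: consult the memtable first, then SSTables newest to oldest.
  bool Get(const std::string& key, std::string* value) const {
    auto it = memtable_.find(key);
    if (it != memtable_.end()) return Resolve(it->second, value);
    for (auto run = sstables_.rbegin(); run != sstables_.rend(); ++run) {
      it = run->find(key);
      if (it != run->end()) return Resolve(it->second, value);
    }
    return false;
  }

  // Minor compaction: the memtable becomes a new SSTable.
  void MinorCompaction() {
    if (memtable_.empty()) return;
    sstables_.push_back(SortedRun());
    sstables_.back().swap(memtable_);  // memtable_ is left empty
  }

  // Major compaction: memtable + all SSTables become exactly one SSTable,
  // and deletion entries are dropped.
  void MajorCompaction() {
    MinorCompaction();
    SortedRun merged;
    for (const SortedRun& run : sstables_)  // oldest first, newer overwrite
      for (const auto& entry : run) merged[entry.first] = entry.second;
    for (auto it = merged.begin(); it != merged.end();) {
      if (it->second.deleted) it = merged.erase(it); else ++it;
    }
    sstables_.assign(1, merged);
  }

  std::size_t sstable_count() const { return sstables_.size(); }

 private:
  static bool Resolve(const Cell& cell, std::string* value) {
    if (cell.deleted) return false;
    *value = cell.value;
    return true;
  }

  SortedRun memtable_;
  std::vector<SortedRun> sstables_;
};
```

Replaying the slide's t1 to t3 mutations against this model, a `Get` for `anchor:www.abc.com` sees the delete at t2 hide any older value and the set at t3 make `ABC` visible again.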
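The MapReduce slides give the signatures map (k1, v1) -> list (k2, v2) and reduce (k2, list (v2)) -> list (v2), with intermediate keys partitioned by hash(key) mod R. The sketch below runs the same word count sequentially in one process to show how the three steps fit together. It is an illustration under that assumption, not the distributed library; `MapWords`, `Partition`, `ReduceSum`, and `WordCount` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> KeyValue;

// map: emit (word, 1) for every whitespace-separated word of one split.
std::vector<KeyValue> MapWords(const std::string& text) {
  std::vector<KeyValue> out;
  const std::size_t n = text.size();
  for (std::size_t i = 0; i < n;) {
    while (i < n && std::isspace(static_cast<unsigned char>(text[i]))) ++i;
    const std::size_t start = i;
    while (i < n && !std::isspace(static_cast<unsigned char>(text[i]))) ++i;
    if (start < i) out.push_back(KeyValue(text.substr(start, i - start), 1));
  }
  return out;
}

// Partitioning: hash(key) mod R picks the reduce task for a key.
std::size_t Partition(const std::string& key, std::size_t num_reduce_tasks) {
  return std::hash<std::string>()(key) % num_reduce_tasks;
}

// reduce: sum all values collected for one key.
int ReduceSum(const std::vector<int>& values) {
  int total = 0;
  for (int v : values) total += v;
  return total;
}

// Sequential driver: map every split, group by key inside each of the R
// partitions, then reduce each group. num_reduce_tasks must be at least 1.
std::map<std::string, int> WordCount(const std::vector<std::string>& splits,
                                     std::size_t num_reduce_tasks) {
  std::vector<std::map<std::string, std::vector<int> > > buckets(
      num_reduce_tasks);
  for (const std::string& split : splits) {
    for (const KeyValue& kv : MapWords(split)) {
      buckets[Partition(kv.first, num_reduce_tasks)][kv.first].push_back(
          kv.second);
    }
  }
  std::map<std::string, int> result;
  for (const auto& bucket : buckets)
    for (const auto& group : bucket) result[group.first] = ReduceSum(group.second);
  return result;
}
```

Since every occurrence of a key hashes to the same partition, each reduce task sees the complete value list for its keys, which is why the final counts do not depend on R.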
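The primary election example on the "Chubby: API" slide (all candidates try to acquire a lock, the winner writes its identity, everyone else reads it) can be mimicked with a toy in-memory node. `ToyLockNode` and `ElectPrimary` are hypothetical stand-ins that only borrow the method names from the slide. They are not the real Chubby client API and leave out sessions, handles, sequencers, and event notification.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a single Chubby node used as a lock file.
class ToyLockNode {
 public:
  ToyLockNode() : held_(false) {}

  // Succeeds only if nobody holds the lock yet.
  bool TryAcquire() {
    if (held_) return false;
    held_ = true;
    return true;
  }
  void Release() { held_ = false; }

  // The lock is advisory, so writes are not checked against the holder.
  void SetContents(const std::string& contents) { contents_ = contents; }
  const std::string& GetContents() const { return contents_; }

 private:
  bool held_;
  std::string contents_;
};

// Every candidate runs the same steps: try to take the lock, write its own
// identity if it wins, then read back who the primary is.
std::string ElectPrimary(ToyLockNode& node, const std::string& self) {
  if (node.TryAcquire()) node.SetContents(self);
  return node.GetContents();
}
```

In this model the first caller of `ElectPrimary` becomes primary and later callers learn its identity from the node contents; a real deployment would also have to handle a primary that fails while holding the lock.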
