Introduction to Cloud Computing 彭波 北京大学信息科学技术学院 5/25/2009
大纲 What is Cloud Computing? Build a big cloud
云计算 (Cloud Computing)
What is Cloud Computing? 1. First write down your own opinion about “cloud computing”, whatever you thought about in your mind. 2. Question: What ? Who? Why? How? Pros and cons? 3. The most important question is: What is the relation with me?
Cloud Computing is… No software access everywhere by Internet power -- Large-scale data processing Appeal for startups Cost efficiency 实在是太方便了 Software as platform Cons Security Data lock-in SaaS PaaS Utility Computing SaaS PaaS Utility Computing
Software as a Service (SaaS) a model of software deployment whereby a provider licenses an application to customers for use as a service on demand.software deployment
Platform as a Service (PaaS) 对于开发 Web Application 和 Services , PaaS 提供了一 整套基于 Internet 的,从开发,测试,部署,运营到维护 的全方位的集成环境。特别它从一开始就具备了 Multi- tenant architecture ,用户不需要考虑多用户并发的问题, 而由 platform 来解决,包括并发管理,扩展性,失效恢复, 安全。
Utility Computing “pay-as-you-go” 好比让用户把电源插头插在墙上,你得 到的电压和 Microsoft 得到的一样,只是你用得少, pay less ; utility computing 的目标就是让计算资源也具有这 样的服务能力,用户可以使用 500 强公司所拥有的计算资 源,只是 use less pay less 。这是 cloud computing 的一 个重要方面
Cloud Computing is…
Key Characteristics illusion of infinite computing resources available on demand; elimination of an up-front commitment by Cloud users; 创业启动花费 ability to pay for use of computing resources on a short-term basis as needed 。 小时间片的 billing ,报告指 出 utility computing 在这一 点上的实践是失败的 very large datacenters large-scale software infrastructure operational expertise
Why now? very large-scale datacenter 的实践, 因为新的技术趋势和 Business 模式 pay-as-you-go computing
Key Players Amazon Web Services Google App Engine Microsoft Windows Azure
Key Applications Mobile Interactive applications, Tim O’Reilly 相信未来是 属于能够实时对用户提供信息的服务。 Mobile 必定是关键。 而后台在 datacenter 中运行是很自然的模式,特别是那些 mashup 融合类型的服务。 Parallel batch processing 。大规模数据处理使用 Cloud Computing 技术很自然, MapReduce , Hadoop 在这里起 到重要作用。这里,数据移入 / 移出 cloud 是很大的开销, Amazon 开始尝试 host large public datasets for free 。 The rise of analytics 。数据库应用中 transaction based 应 用还在增长,而 analytics 的应用增长迅速。数据挖掘,用 户行为分析等应用的巨大推动。 Extension of compute-intensive desktop application 。计 算密集型的任务,说 matlab, mathematica 都有了 cloud computing 的扩展, woo~
Cloud Computing = Silver Bullet? Google 文档在 3 月 7 日发生 了大批用户文件外泄事件。 美国隐私保护组织就此提 请政府对 Google 采取措施, 使其加强云计算产品的安 全性。 Problem of Data Lock-in
Challenges
Some other Voices It’s stupidity. It’s worse than stupidity: it’s a marketing hype campaign. Somebody is saying this is inevitable — and whenever you hear somebody saying that, it’s very likely to be a set of businesses campaigning to make it true. Richard Stallman, quoted in The Guardian, September 29, 2008 It’s stupidity. It’s worse than stupidity: it’s a marketing hype campaign. Somebody is saying this is inevitable — and whenever you hear somebody saying that, it’s very likely to be a set of businesses campaigning to make it true. Richard Stallman, quoted in The Guardian, September 29, 2008 The interesting thing about Cloud Computing is that we’ve redefined Cloud Computing to include everything that we already do.... I don’t understand what we would do differently in the light of Cloud Computing other than change the wording of some of our ads. Larry Ellison, quoted in the Wall Street Journal, September 26, 2008 The interesting thing about Cloud Computing is that we’ve redefined Cloud Computing to include everything that we already do.... I don’t understand what we would do differently in the light of Cloud Computing other than change the wording of some of our ads. Larry Ellison, quoted in the Wall Street Journal, September 26, 2008
What’s matter with ME?! What you want to do with 1000pcs, or even 100,000 pcs?
Cloud is coming…
Build a big “Cloud”
Example: Wikipedia Anthropology Experiment Download entire revision history of Wikipedia 4.7 M pages, 58 M revisions, 800 GB Analyze editing patterns & trends Computation Hadoop on 20-machine cluster Kittur, Suh, Pendleton (UCLA, PARC), “He Says, She Says: Conflict and Coordination in Wikipedia” CHI, 2007 Increasing fraction of edits are for work indirectly related to articles
Example: Scene Completion Image Database Grouped by Semantic Content 30 different Flickr.com groups 2.3 M images total (396 GB). Select Candidate Images Most Suitable for Filling Hole Classify images with gist scene detector [Torralba] Color similarity Local context matching Computation Index images offline 50 min. scene matching, 20 min. local matching, 4 min. compositing Reduces to 5 minutes total by using 5 machines Extension Flickr.com has over 500 million images … Hays, Efros (CMU), “Scene Completion Using Millions of Photographs” SIGGRAPH, 2007
Example: Web Page Analysis Experiment Use web crawler to gather 151M HTML pages weekly 11 times Generated 1.2 TB log information Analyze page statistics and change frequencies Systems Challenge “ Moreover, we experienced a catastrophic disk failure during the third crawl, causing us to lose a quarter of the logs of that crawl. ” Fetterly, Manasse, Najork, Wiener (Microsoft, HP), “A Large-Scale Study of the Evolution of Web Pages,” Software-Practice & Experience, 2004
Let’s build a big Computer… Given datacenter with tens of thousands of pcs, can you make all these tasks easier and run faster? Software infrastructure 的 关键部件是? Distributed storage system Distributed Computing Framework
Challenges 大规模数据处理面临的困难 大规模 PC 机群 scaling reliably is hard! On 1000s of nodes MTBF < 1 day With so many disks, nodes, switches something is always broken 并行 / 分布式程序开发,调试 is hard! 数据如何划分 任务如何调度 任务之间的通信 错误处理,容错 … Programming Model 一定的表达能力 很好的简单易用性 Programming Model 一定的表达能力 很好的简单易用性 Storage System & Computing Framework 良好可扩展性 良好的容错能力 Storage System & Computing Framework 良好可扩展性 良好的容错能力
Observation: When dealing with very large data collections, following a simple client-server approach is not going to work. Solution 1: For speeding up file accesses, apply striping techniques by which files can be fetched in parallel: (a) whole-file distribution, (b) file-striped system Cluster-Based Distributed File Systems
A natural DFS design File stripping as Chunks
Master of DFS 功能 元数据管理 inode: file -> 运行数据管理 Chunk server info 管理: map(chunk, chunkserver) Client info 管理 : locks, open files, etc. 问题 Performance bottleneck? Master failure? Master Recovery?
ChunkServer of DFS 功能 管理 chunk data: chunkid -> local file 问题 Performance bottleneck? Chunkserver failure -> data lost?
Review on DFS design Workload 大数据 顺序读和 append 操作为主 Goal Reliability, availability, scalability… Tolerance to hardware failures Managing numerous files of large size Optimizing commonly performed operations Strategies Chunk Replications (fault tolerance and performance) Large chunk size (MB) All metadata in memory on Master, with operation log
Master Client Chunkserver Data Replications in DFS /foo/bar.dat
Data Mutations Two kinds of data mutations are supported Random writes Record appends Leases used to maintain consistent mutation order A B A B A B Chunk Replica
Primary-based Consistency Protocol Master Chunkserver Client /foo/bar.dat Primary replica Secondary replica What if a mutation operation fail in the middle? What if a mutation operation fail in the middle?
Relaxed Consistency Model 修改操作后的文件区域状态 Consistent 不管从那个 replicas 读,所有 clients 看到相同数据 Defined consistent + 所有 clients 看到更新操作写入的全部数 据 Undefined consistent + 但是可能不能反映任意一个更新操作写 入的数据 Inconsistent Clients 不同时间看到不同的数据
Consistency Model (contd) 不提供完全严格的一致性 [3] 由应用程序处理这种放宽的一致性下出现的 inconsistent 数据区域问题 提供 atomic append ,保证 append at least once
Summary for DFS Architecture: master-worker File strip : large chunk size Scalability & Availability: Chunk replication Primary-based consistency protocol Relaxed consistency model
Distributed Computing 大规模机群 + 可靠存储( DFS )上怎样计算? 编程 运行 调试
Example: Web Page Analysis Experiment Use web crawler to gather 151M HTML pages weekly 11 times Generated 1.2 TB log information Analyze page statistics and change frequencies Systems Challenge “ Moreover, we experienced a catastrophic disk failure during the third crawl, causing us to lose a quarter of the logs of that crawl. ” Fetterly, Manasse, Najork, Wiener (Microsoft, HP), “A Large-Scale Study of the Evolution of Web Pages,” Software-Practice & Experience, 2004
A simple solution M :提取网页长度,按 domain 执行数据合并
A possible solution M: 提取网页长度, 按 domain 执行数据合并 R: 按 domain 执行数据合并
A More difficult Problem 统计文档集中每个 word 出现的次数 ?
Shuffle Implementation
Partition and Sort Group Partition function: hash(key)%reducer number Group function: sort by key
A Distributed Computing Framework Parallel/Distributed Computing Programming Model Input split shuffleoutput I’m the MapReduce Framework I’m the MapReduce Framework
Typical problem solved by MapReduce 读入数据 : key/value 对的记录格式数据 Map: 从每个记录里 extract something map (in_key, in_value) -> list(out_key, intermediate_value) 处理 input key/value pair 输出中间结果 key/value pairs Shuffle: 混排交换数据 把相同 key 的中间结果汇集到相同节点上 Reduce: aggregate, summarize, filter, etc. reduce (out_key, list(intermediate_value)) -> list(out_value) 归并某一个 key 的所有 values ,进行计算 输出合并的计算结果 (usually just one) 输出结果
Mapreduce Framework
Word Frequencies in Web pages 输入: one document per record 用户实现 map function ,输入为 key = document URL value = document contents map 输出 (potentially many) key/value pairs. 对 document 中每一个出现的词,输出一个记录
Example continued: MapReduce 运行系统 ( 库 ) 把所有相同 key 的记录收集到一 起 (shuffle/sort) 用户实现 reduce function 对一个 key 对应的 values 计算 求和 sum Reduce 输出
Example uses: distributed grep distributed sort web link-graph reversal term-vector / hostweb access log statsinverted index construction document clusteringmachine learningstatistical machine translation... Model is Widely Applicable MapReduce Programs In Google Source Tree
Algorithms Fit in MapReduce 文献中见到实现了的算法 K-Means, EM, SVM, PCA, Linear Regression, Naïve Bayes, Logistic Regression, Neural Network PageRank Word Co-occurrence Matrices , Pairwise Document Similarity Monte Carlo simulation ……
Capability of MapReduce MapReduce 难于有效实现的并行算法 [2] Dense/Sparse Linear Algebra N-Body Problems Dynamic Programming Graph Traversal Combinational Logic 。。。 MapReduce 是否可能成为 解决大部分并行计算需求的主要手段? MapReduce 是否可能成为 解决大部分并行计算需求的主要手段? "The landscape of parallel computing research: a view from Berkeley," 2006
Google MapReduce Architecture Single Master nodeMany worker bees
MapReduce Operation Initial data split into 64MB blocks Computed, results locally stored M sends data location to R workers Final output written Master informed of result locations
Fault Tolerance 通过 re-execution 实现 fault tolerance 周期性 heartbeats 检测 failure Re-execute 失效节点上已经完成 + 正在执行的 map tasks Why???? Re-execute 失效节点上正在执行的 reduce tasks Task completion committed through master Robust: lost 1600/1800 machines once finished ok Master Failure?
Refinement: Redundant Execution Slow workers significantly delay completion time Other jobs consuming resources on machine Bad disks w/ soft errors transfer data slowly Solution: Near end of phase, spawn backup tasks Whichever one finishes first "wins" Dramatically shortens job completion time
Refinement: Locality Optimization Master scheduling policy: Asks GFS for locations of replicas of input file blocks Map tasks typically split into 64MB (GFS block size) Map tasks scheduled so GFS input block replica are on same machine or same rack Effect Thousands of machines read input at local disk speed Without this, rack switches limit read rate
Refinement: Skipping Bad Records Map/Reduce functions sometimes fail for particular inputs Best solution is to debug & fix Not always possible ~ third-party source libraries On segmentation fault: Send UDP packet to master from signal handler Include sequence number of record being processed If master sees two failures for same record: Next worker is told to skip the record
Compression of intermediate data Combiner “ Combiner ” functions can run on same machine as a mapper Causes a mini-reduce phase to occur before the real reduce phase, to save bandwidth Local execution for debugging/testing User-defined counters Other Refinements
Summary CloudComputing brings Possible of using unlimited resources on-demand, and by anytime and anywhere Possible of construct and deploy applications automatically scale to tens of thousands computers Possible of construct and run programs dealing with prodigious volume of data … How to make it real? Distributed File System Distributed Computing Framework …………………………………
Q&A
参考文献 [1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the Clouds: A Berkeley View of Cloud Computing," EECS Department, University of California, Berkeley UCB/EECS , February [2] Ucb/Eecs, K. Asanovic, R. Bodik, B. Catanzaro, J. Gebis, P. Husbands, K. Keutzer, D. Patterson, W. Plishker, J. Shalf, S. Williams, and K. Yelick, "The landscape of parallel computing research: a view from Berkeley," [3] G. Sanjay, G. Howard, and L. Shun-Tak, "The Google file system," in Proceedings of the nineteenth ACM symposium on Operating systems principles. Bolton Landing, NY, USA: ACM Press, [4] J. D. a. S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," in Osdi, 2004, pp
Google App Engine App Engine handles HTTP(S) requests, nothing else Think RPC: request in, processing, response out Works well for the web and AJAX; also for other services App configuration is dead simple No performance tuning needed Everything is built to scale “infinite” number of apps, requests/sec, storage capacity APIs are simple, stupid
App Engine Architecture 63 Python VM process stdlib app memcache datastore mail images urlfech stateful APIs stateless APIsR/O FS req/resp
Microsoft Windows Azure
Amazon Web Services Amazon’s infrastructure (auto scaling, load balancing) Elastic Compute Cloud (EC2) – scalable virtual private server instances Simple Storage Service (S3) Simple Queue Service (SQS) – messaging SimpleDB - database Flexible Payments Service, Mechanical Turk, CloudFront, etc.
Amazon Web Services Very flexible, lower-level offering (closer to hardware) = more possibilities, higher performing Runs platform you provide (machine images) Supports all major web languages Industry-standard services (move off AWS easily) Require much more work, longer time-to-market Deployment scripts, configuring images, etc. Various libraries and GUI plug-ins make AWS do help
Price of Amazon EC2