Introduction to Cloud Computing. Bo Peng (彭波), School of Electronics Engineering and Computer Science, Peking University (北京大学信息科学技术学院), 5/25/2009.

Outline: What is Cloud Computing? Build a big cloud.

Cloud Computing (云计算)

What is Cloud Computing? 1. First write down your own opinion about “cloud computing”, whatever comes to mind. 2. Questions: What? Who? Why? How? Pros and cons? 3. The most important question is: how does it relate to me?

Cloud Computing is… No software; access everywhere by Internet; power (large-scale data processing); appeal for startups; cost efficiency; “it is really so convenient”; software as platform. Cons: security; data lock-in. Three aspects: SaaS, PaaS, Utility Computing.

Software as a Service (SaaS): a model of software deployment whereby a provider licenses an application to customers for use as a service on demand.

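Platform as a Service (PaaS): For developing web applications and services, PaaS provides a complete Internet-based integrated environment that covers development, testing, deployment, operation, and maintenance. In particular, it has a multi-tenant architecture from the outset, so users do not need to handle multi-user concurrency themselves. The platform takes care of it, including concurrency management, scalability, failure recovery, and security.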

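Utility Computing: “pay-as-you-go”. It is like plugging a power cord into the wall: the voltage you get is the same as Microsoft gets, you just use less and pay less. The goal of utility computing is to give computing resources the same kind of service capability: users can use the computing resources that a Fortune 500 company has, only use less, pay less. This is an important aspect of cloud computing.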

Cloud Computing is…

Key Characteristics: the illusion of infinite computing resources available on demand; the elimination of an up-front commitment by cloud users (startup launch costs); the ability to pay for use of computing resources on a short-term basis as needed (billing in small time slices; the report notes that earlier utility computing practice failed on this point). Also: very large datacenters; large-scale software infrastructure; operational expertise.

Why now? The practice of very large-scale datacenters, driven by new technology trends and business models; pay-as-you-go computing.

Key Players: Amazon Web Services, Google App Engine, Microsoft Windows Azure.

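Key Applications: (1) Mobile interactive applications. Tim O’Reilly believes the future belongs to services that can deliver information to users in real time. Mobile is bound to be key, and running the back end in a datacenter is a natural model, especially for mashup-style services. (2) Parallel batch processing. Using cloud computing for large-scale data processing is natural, and MapReduce and Hadoop play an important role here. Moving data into and out of the cloud is a large cost, so Amazon has begun to host large public datasets for free. (3) The rise of analytics. Among database applications, transaction-based applications are still growing, while analytics applications are growing rapidly, driven strongly by data mining, user behavior analysis, and similar uses. (4) Extension of compute-intensive desktop applications. For compute-intensive tasks, even MATLAB and Mathematica now have cloud computing extensions. Wow.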

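Cloud Computing = Silver Bullet? On March 7, Google Docs had an incident in which a large number of user files were leaked. A US privacy protection group then asked the government to take action against Google so that it would strengthen the security of its cloud computing products. There is also the problem of data lock-in.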

Challenges

Some other Voices: “It’s stupidity. It’s worse than stupidity: it’s a marketing hype campaign. Somebody is saying this is inevitable -- and whenever you hear somebody saying that, it’s very likely to be a set of businesses campaigning to make it true.” Richard Stallman, quoted in The Guardian, September 29, 2008. “The interesting thing about Cloud Computing is that we’ve redefined Cloud Computing to include everything that we already do.... I don’t understand what we would do differently in the light of Cloud Computing other than change the wording of some of our ads.” Larry Ellison, quoted in the Wall Street Journal, September 26, 2008.

What does it matter to ME?! What would you want to do with 1,000 PCs, or even 100,000 PCs?

Cloud is coming…

Build a big “Cloud”

Example: Wikipedia Anthropology. Experiment: download the entire revision history of Wikipedia (4.7 M pages, 58 M revisions, 800 GB) and analyze editing patterns and trends. Computation: Hadoop on a 20-machine cluster. Result: an increasing fraction of edits are for work indirectly related to articles. Kittur, Suh, Pendleton (UCLA, PARC), “He Says, She Says: Conflict and Coordination in Wikipedia,” CHI, 2007.

Example: Scene Completion. Image database grouped by semantic content: 30 different Flickr.com groups, 2.3 M images total (396 GB). Select the candidate images most suitable for filling the hole: classify images with the gist scene detector [Torralba], color similarity, local context matching. Computation: index images offline; 50 min. scene matching, 20 min. local matching, 4 min. compositing; reduces to 5 minutes total by using 5 machines. Extension: Flickr.com has over 500 million images … Hays, Efros (CMU), “Scene Completion Using Millions of Photographs,” SIGGRAPH, 2007.

Example: Web Page Analysis. Experiment: use a web crawler to gather 151M HTML pages weekly, 11 times; this generated 1.2 TB of log information; analyze page statistics and change frequencies. Systems challenge: “Moreover, we experienced a catastrophic disk failure during the third crawl, causing us to lose a quarter of the logs of that crawl.” Fetterly, Manasse, Najork, Wiener (Microsoft, HP), “A Large-Scale Study of the Evolution of Web Pages,” Software-Practice & Experience, 2004.

Let’s build a big Computer… Given a datacenter with tens of thousands of PCs, can you make all these tasks easier and make them run faster? What are the key components of the software infrastructure? A distributed storage system and a distributed computing framework.

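Challenges: the difficulties of large-scale data processing. (1) Scaling large PC clusters reliably is hard! On 1000s of nodes, MTBF < 1 day; with so many disks, nodes, and switches, something is always broken. (2) Developing and debugging parallel/distributed programs is hard! How to partition the data, how to schedule tasks, communication between tasks, error handling and fault tolerance, … What is needed. Programming model: sufficient expressive power, and good simplicity and ease of use. Storage system and computing framework: good scalability and good fault tolerance.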
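
The "MTBF < 1 day" figure follows from simple arithmetic. As a rough sketch, assuming independent failures and a per-node MTBF of three years (an illustrative assumption, not a number from the slides), the expected time between failures somewhere in the cluster is roughly the per-node MTBF divided by the number of nodes:

```python
# Rough estimate: with independent failures, cluster MTBF ~= node MTBF / node count.
# NODE_MTBF_DAYS is an illustrative assumption, not a figure from the slides.
NODE_MTBF_DAYS = 3 * 365


def cluster_mtbf_days(num_nodes: int, node_mtbf_days: float = NODE_MTBF_DAYS) -> float:
    """Expected days between failures anywhere in a cluster of num_nodes machines."""
    return node_mtbf_days / num_nodes


for n in (1, 100, 1_000, 10_000):
    print(f"{n:>6} nodes -> one failure every {cluster_mtbf_days(n):.2f} days")
```

Under this assumption a cluster of a few thousand machines already sees more than one failure per day, which is why fault tolerance has to be built into the software rather than treated as an exception.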

Cluster-Based Distributed File Systems. Observation: when dealing with very large data collections, following a simple client-server approach is not going to work. Solution 1: to speed up file accesses, apply striping techniques by which files can be fetched in parallel. (Figure: (a) whole-file distribution, (b) file-striped system.)

A natural DFS design: file striping as chunks.
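
File striping can be sketched in a few lines. The following is a minimal illustration, not the design of any particular file system: it assumes a 64 MB chunk size and a simple round-robin placement policy, and the function names and server names are hypothetical.

```python
from typing import List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # assumed chunk size of 64 MB


def split_into_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs covering a file of file_size bytes."""
    return [(off, min(chunk_size, file_size - off)) for off in range(0, file_size, chunk_size)]


def place_chunks(num_chunks: int, servers: List[str]) -> List[str]:
    """Round-robin placement (an assumption): chunk i goes to server i mod len(servers)."""
    return [servers[i % len(servers)] for i in range(num_chunks)]


def locate(offset: int, chunk_size: int = CHUNK_SIZE) -> Tuple[int, int]:
    """Translate a byte offset in the file into (chunk index, offset inside the chunk)."""
    return divmod(offset, chunk_size)


chunks = split_into_chunks(200 * 1024 * 1024)      # a 200 MB file
print(len(chunks), "chunks, last one holds", chunks[-1][1], "bytes")
print(place_chunks(len(chunks), ["cs1", "cs2", "cs3"]))
```

Because consecutive chunks land on different servers, a client can fetch them in parallel instead of streaming the whole file from one machine.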

Master of DFS. Functions: metadata management (inode: file -> ); runtime data management, namely chunk server info management: map(chunk, chunkserver), and client info management: locks, open files, etc. Questions: Performance bottleneck? Master failure? Master recovery?

ChunkServer of DFS. Function: manage chunk data: chunkid -> local file. Questions: Performance bottleneck? Chunkserver failure -> data lost?

Review of DFS design. Workload: large data; mostly sequential reads and append operations. Goals: reliability, availability, scalability…; tolerance to hardware failures; managing numerous files of large size; optimizing commonly performed operations. Strategies: chunk replication (fault tolerance and performance); large chunk size (MB); all metadata in memory on the master, with an operation log.
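
The strategy "all metadata in memory on the master, with an operation log" can be illustrated with a toy master. This is a simplified sketch under stated assumptions: the class name, the JSON-lines log format, and the single `add_chunk` operation are all hypothetical, and chunk locations are logged here for simplicity even though a real system might rebuild them from the chunkservers instead.

```python
import json
from typing import Dict, List


class Master:
    """Toy DFS master: metadata lives in memory, every change is logged first."""

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.file_chunks: Dict[str, List[str]] = {}      # file name -> chunk ids
        self.chunk_locations: Dict[str, List[str]] = {}  # chunk id -> chunkservers

    def _log(self, op: dict) -> None:
        # Append the operation to the log before applying it in memory.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(op) + "\n")

    def _apply(self, op: dict) -> None:
        if op["type"] == "add_chunk":
            self.file_chunks.setdefault(op["file"], []).append(op["chunk"])
            self.chunk_locations[op["chunk"]] = list(op["servers"])

    def add_chunk(self, file: str, chunk: str, servers: List[str]) -> None:
        op = {"type": "add_chunk", "file": file, "chunk": chunk, "servers": servers}
        self._log(op)
        self._apply(op)

    @classmethod
    def recover(cls, log_path: str) -> "Master":
        """Rebuild the in-memory state of a failed master by replaying the log."""
        master = cls(log_path)
        with open(log_path, encoding="utf-8") as f:
            for line in f:
                master._apply(json.loads(line))
        return master
```

Keeping the metadata in memory makes lookups fast, and replaying the log is one possible answer to the "master recovery" question: a replacement master calls `Master.recover(log_path)` and ends up with the same mappings.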

Data Replications in DFS. (Figure labels: Master, Client, Chunkserver; example file /foo/bar.dat.)

Data Mutations: two kinds of data mutations are supported, random writes and record appends. Leases are used to maintain a consistent mutation order. (Figure: chunk replicas, each showing mutations A and B.)

Primary-based Consistency Protocol. (Figure labels: Master, Chunkserver, Client; /foo/bar.dat; primary replica, secondary replica.) What if a mutation operation fails in the middle?
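
The idea of a primary replica imposing one mutation order on all replicas can be sketched as follows. This is a minimal single-process illustration, assuming the primary assigns consecutive serial numbers; the class names are hypothetical, and networking, leases, and retries are left out.

```python
from typing import List, Tuple


class Replica:
    """A chunk replica that applies mutations strictly in serial-number order."""

    def __init__(self, name: str):
        self.name = name
        self.applied: List[Tuple[int, str]] = []

    def apply(self, serial: int, mutation: str) -> None:
        expected = len(self.applied) + 1
        if serial != expected:
            raise ValueError(f"{self.name}: expected serial {expected}, got {serial}")
        self.applied.append((serial, mutation))


class Primary(Replica):
    """The replica chosen as primary: assigns serial numbers and forwards mutations."""

    def __init__(self, name: str, secondaries: List[Replica]):
        super().__init__(name)
        self.secondaries = secondaries

    def mutate(self, mutation: str) -> int:
        serial = len(self.applied) + 1
        self.apply(serial, mutation)
        for replica in self.secondaries:
            # If this call fails midway, some replicas have the mutation and
            # others do not: the region is left inconsistent until a retry.
            replica.apply(serial, mutation)
        return serial
```

The comment inside `mutate` marks the point the slide asks about: a failure in the middle of forwarding leaves the replicas temporarily different, which is what the relaxed consistency model has to account for.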

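Relaxed Consistency Model: the state of a file region after a mutation. Consistent: all clients see the same data, regardless of which replica they read from. Defined: consistent, and all clients see everything the mutation wrote. Undefined: consistent, but the data may not reflect what any single mutation wrote. Inconsistent: clients see different data at different times.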

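Consistency Model (contd): fully strict consistency is not provided [3]. Applications are responsible for handling the inconsistent data regions that can appear under this relaxed consistency. Atomic append is provided, with a guarantee of append at least once.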
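
"Append at least once" means a retried append can leave the same record in the file more than once, so readers have to cope with duplicates. One common way to do that, sketched here under the assumption that every writer attaches a unique record id (the `Record` layout and function name are hypothetical), is to filter on the reader side:

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, bytes]  # (record id chosen by the writer, payload)


def read_unique(records: Iterable[Record]) -> Iterator[Record]:
    """Yield each record once, dropping duplicates left behind by retried appends."""
    seen = set()
    for record_id, payload in records:
        if record_id in seen:
            continue
        seen.add(record_id)
        yield record_id, payload


stored = [("r1", b"alpha"), ("r2", b"beta"), ("r2", b"beta"), ("r3", b"gamma")]
print(list(read_unique(stored)))
```

Pushing this work to the application keeps the file system simple, which is the trade-off the relaxed consistency model makes.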

Summary for DFS. Architecture: master-worker. File striping: large chunk size. Scalability and availability: chunk replication. Primary-based consistency protocol. Relaxed consistency model.

Distributed Computing: how do we compute on a large cluster with reliable storage (DFS)? Programming, running, debugging.

Example: Web Page Analysis. Experiment: use a web crawler to gather 151M HTML pages weekly, 11 times; this generated 1.2 TB of log information; analyze page statistics and change frequencies. Systems challenge: “Moreover, we experienced a catastrophic disk failure during the third crawl, causing us to lose a quarter of the logs of that crawl.” Fetterly, Manasse, Najork, Wiener (Microsoft, HP), “A Large-Scale Study of the Evolution of Web Pages,” Software-Practice & Experience, 2004.

A simple solution. M: extract page lengths and merge the data by domain.

A possible solution. M: extract page lengths and merge the data by domain. R: merge the data by domain.
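
One way to read this two-step solution is as a map step and a reduce step. The sketch below is an illustration under that reading, run in a single process: the function names are hypothetical, and "merging" is assumed to mean computing the page count and mean page length per domain.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple
from urllib.parse import urlparse


def map_page(url: str, html: str) -> Iterator[Tuple[str, int]]:
    """M: emit (domain, page length) for one crawled page."""
    yield urlparse(url).netloc, len(html)


def reduce_domain(domain: str, lengths: List[int]) -> Tuple[str, int, float]:
    """R: merge all lengths of one domain into (domain, page count, mean length)."""
    return domain, len(lengths), sum(lengths) / len(lengths)


def run(pages: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, float]]:
    # Group the mapper output by domain, then reduce each group.
    groups: Dict[str, List[int]] = defaultdict(list)
    for url, html in pages:
        for domain, length in map_page(url, html):
            groups[domain].append(length)
    return [reduce_domain(d, groups[d]) for d in sorted(groups)]


pages = [("http://a.example/x", "aaaa"), ("http://a.example/y", "aa"), ("http://b.example/", "b")]
print(run(pages))
```

On a real cluster the grouping step would move data between machines; here a dictionary stands in for it.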

A more difficult problem: how do we count the number of occurrences of each word in a document collection?

Shuffle Implementation

Partition and Sort/Group. Partition function: hash(key) % number of reducers. Group function: sort by key.
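
The partition and group functions can be sketched directly. This is a single-process illustration with hypothetical function names. It uses CRC32 rather than Python's built-in `hash`, because `hash` on strings is randomized per process by default and all machines must agree on where a key goes.

```python
import zlib
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """hash(key) % number of reducers, with CRC32 as the deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs: Iterable[Tuple[str, int]], num_reducers: int) -> List[List[Tuple[str, List[int]]]]:
    """Route each pair to a reducer, then sort by key and group the values per key."""
    buckets: List[List[Tuple[str, int]]] = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    grouped = []
    for bucket in buckets:
        bucket.sort(key=itemgetter(0))  # stable sort keeps the value order per key
        grouped.append(
            [(k, [v for _, v in items]) for k, items in groupby(bucket, key=itemgetter(0))]
        )
    return grouped


print(shuffle([("b", 1), ("a", 2), ("b", 3)], num_reducers=2))
```

Every occurrence of a key ends up at the same reducer, and sorting inside each bucket is what lets the reducer see all values of a key as one contiguous group.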

A Distributed Computing Framework: a parallel/distributed computing programming model. Stages: input, split, shuffle, output. (Figure caption: “I’m the MapReduce Framework.”)

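Typical problem solved by MapReduce. Read the input data: records in key/value pair format. Map: extract something from each record. map (in_key, in_value) -> list(out_key, intermediate_value): processes an input key/value pair and outputs intermediate key/value pairs. Shuffle: exchange the data so that intermediate results with the same key are gathered on the same node. Reduce: aggregate, summarize, filter, etc. reduce (out_key, list(intermediate_value)) -> list(out_value): merges all values for one key, performs the computation, and outputs the combined result (usually just one). Write the output.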

MapReduce Framework

Word Frequencies in Web Pages. Input: one document per record. The user implements the map function, whose input is key = document URL, value = document contents. map outputs (potentially many) key/value pairs: for every word occurrence in the document, it outputs one record.

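Example continued: the MapReduce runtime system (library) gathers all records with the same key together (shuffle/sort). The user implements the reduce function, which computes over the values belonging to one key, here by summing them. Reduce then outputs the result.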
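
The word-frequency example can be written out as a small single-process sketch. The `map_fn` and `reduce_fn` signatures follow the description above; the `word_count` driver, the whitespace tokenization, and the lower-casing are simplifying assumptions, and a dictionary stands in for the distributed shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_fn(url: str, contents: str) -> Iterator[Tuple[str, int]]:
    """key = document URL, value = document contents; emit (word, 1) per occurrence."""
    for word in contents.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    """Sum all the values collected for one key."""
    return word, sum(counts)


def word_count(documents: Dict[str, str]) -> Dict[str, int]:
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for url, contents in documents.items():          # map phase
        for word, one in map_fn(url, contents):
            intermediate[word].append(one)           # stand-in for shuffle/sort
    return dict(reduce_fn(w, c) for w, c in sorted(intermediate.items()))  # reduce phase


docs = {"http://example.com/a": "to be or not to be", "http://example.com/b": "to do"}
print(word_count(docs))
```

The user only writes `map_fn` and `reduce_fn`; everything inside `word_count` is what the framework would do on the user's behalf across many machines.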

Model is Widely Applicable. Example uses: distributed grep, distributed sort, web link-graph reversal, term-vector / host, web access log stats, inverted index construction, document clustering, machine learning, statistical machine translation, ... (Figure: MapReduce programs in the Google source tree.)

Algorithms That Fit MapReduce. Algorithms with implementations reported in the literature: K-Means, EM, SVM, PCA, Linear Regression, Naïve Bayes, Logistic Regression, Neural Network; PageRank; Word Co-occurrence Matrices, Pairwise Document Similarity; Monte Carlo simulation; ……

Capability of MapReduce. Parallel algorithms that are hard to implement efficiently in MapReduce [2]: Dense/Sparse Linear Algebra, N-Body Problems, Dynamic Programming, Graph Traversal, Combinational Logic, … Can MapReduce become the main means of meeting most parallel computing needs? Source: "The landscape of parallel computing research: a view from Berkeley," 2006.

Google MapReduce Architecture: a single master node, many worker bees.

MapReduce Operation: the initial data is split into 64MB blocks; map results are computed and stored locally; the master is informed of the result locations; the master (M) sends the data locations to the reduce (R) workers; the final output is written.

Fault Tolerance: fault tolerance is achieved through re-execution. Periodic heartbeats detect failures. Re-execute both the completed and the in-progress map tasks of the failed node (why?). Re-execute only the in-progress reduce tasks of the failed node. Task completion is committed through the master. Robust: once lost 1600 of 1800 machines and still finished OK. Master failure?
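
The re-execution rule can be made concrete with a small sketch. The reasoning in the comments follows the MapReduce paper [4]: map output is kept on the worker's local disk, while finished reduce output is already in the distributed file system. The `Task` record and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    task_id: str
    kind: str    # "map" or "reduce"
    state: str   # "in_progress" or "completed"
    worker: str


def tasks_to_reexecute(tasks: List[Task], failed_worker: str) -> List[Task]:
    """Pick the tasks that must run again after failed_worker stops heartbeating."""
    redo = []
    for t in tasks:
        if t.worker != failed_worker:
            continue
        if t.kind == "map":
            # Map output sits on the failed machine's local disk, so even
            # completed map tasks are lost and must be redone.
            redo.append(t)
        elif t.state == "in_progress":
            # Completed reduce output is already stored in the distributed
            # file system, so only unfinished reduce tasks are redone.
            redo.append(t)
    return redo
```

This is also the answer to the "why?" on the slide: the difference between map and reduce tasks comes entirely from where their output is stored.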

Refinement: Redundant Execution. Slow workers significantly delay completion time: other jobs may be consuming resources on the machine, and bad disks with soft errors transfer data slowly. Solution: near the end of a phase, spawn backup tasks; whichever one finishes first "wins". This dramatically shortens job completion time.

Refinement: Locality Optimization. Master scheduling policy: ask GFS for the locations of the replicas of the input file blocks; map task inputs are typically split into 64MB pieces (the GFS block size); map tasks are scheduled so that a GFS input block replica is on the same machine or the same rack. Effect: thousands of machines read input at local disk speed. Without this, rack switches limit the read rate.
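
The scheduling preference described here (same machine, then same rack, then anywhere) can be sketched as a small selection function. The function name, the `rack_of` mapping, and the tie-breaking by list order are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def pick_worker(replica_hosts: List[str], idle_workers: List[str],
                rack_of: Dict[str, str]) -> Optional[str]:
    """Prefer a worker holding a replica, then one on the same rack, then any idle worker."""
    if not idle_workers:
        return None
    for w in idle_workers:
        if w in replica_hosts:          # input block is on the worker's own disk
            return w
    replica_racks = {rack_of[h] for h in replica_hosts if h in rack_of}
    for w in idle_workers:
        if w in rack_of and rack_of[w] in replica_racks:  # same rack switch
            return w
    return idle_workers[0]              # fall back to a remote read


racks = {"h1": "rackA", "h2": "rackA", "h3": "rackB", "h4": "rackC"}
print(pick_worker(["h1", "h3"], ["h4", "h2"], racks))
```

In the usage line no idle worker holds a replica, so the function settles for `h2`, which shares a rack with the replica on `h1`.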

Refinement: Skipping Bad Records. Map/Reduce functions sometimes fail for particular inputs. The best solution is to debug and fix, but that is not always possible (for example, with third-party source libraries). On a segmentation fault: send a UDP packet to the master from the signal handler, including the sequence number of the record being processed. If the master sees two failures for the same record, the next worker is told to skip the record.
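
The master's side of this mechanism is simple bookkeeping. The sketch below covers only that part, with a hypothetical class name; the signal handler and the UDP transport are omitted, and the threshold of two failures is taken from the slide.

```python
from collections import Counter
from typing import Set


class SkipTracker:
    """Master-side bookkeeping for records that crash the user's map/reduce code."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold
        self.failures: Counter = Counter()

    def report_failure(self, record_seq: int) -> None:
        """Called when a worker reports the record it was processing when it crashed."""
        self.failures[record_seq] += 1

    def records_to_skip(self) -> Set[int]:
        """Records seen failing at least `threshold` times; the next worker skips them."""
        return {seq for seq, n in self.failures.items() if n >= self.threshold}


tracker = SkipTracker()
for seq in (17, 42, 17):
    tracker.report_failure(seq)
print(tracker.records_to_skip())
```

Waiting for a second failure before skipping avoids dropping a record because of a one-off machine problem rather than a genuine bug triggered by that input.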

Other Refinements. Compression of intermediate data. Combiner: “combiner” functions can run on the same machine as a mapper, causing a mini-reduce phase to occur before the real reduce phase in order to save bandwidth. Local execution for debugging/testing. User-defined counters.
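
A combiner for a counting job can be sketched as a local pre-aggregation step. This illustration assumes the reduce operation is a sum, which is associative and commutative, so partial sums computed on the mapper's machine give the same final result; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def combine(map_output: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Mini-reduce on the mapper's machine: pre-sum counts per key before sending."""
    local: Counter = Counter()
    for key, value in map_output:
        local[key] += value
    return sorted(local.items())


map_output = [("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)]
combined = combine(map_output)
print(len(map_output), "pairs before,", len(combined), "pairs after:", combined)
```

For word counting on real text, where a few words occur very often, this kind of local merge can cut the amount of data crossing the network substantially.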

Summary. Cloud computing brings: the possibility of using unlimited resources on demand, anytime and anywhere; the possibility of constructing and deploying applications that automatically scale to tens of thousands of computers; the possibility of constructing and running programs that deal with prodigious volumes of data; … How to make it real? Distributed file system; distributed computing framework; …

Q&A

References
[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the Clouds: A Berkeley View of Cloud Computing," EECS Department, University of California, Berkeley, UCB/EECS, February.
[2] K. Asanovic, R. Bodik, B. Catanzaro, J. Gebis, P. Husbands, K. Keutzer, D. Patterson, W. Plishker, J. Shalf, S. Williams, and K. Yelick, "The landscape of parallel computing research: a view from Berkeley," UCB/EECS.
[3] S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google file system," in Proceedings of the nineteenth ACM symposium on Operating systems principles. Bolton Landing, NY, USA: ACM Press.
[4] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," in OSDI, 2004, pp

Google App Engine. App Engine handles HTTP(S) requests, nothing else. Think RPC: request in, processing, response out. Works well for the web and AJAX, and also for other services. App configuration is dead simple: no performance tuning needed. Everything is built to scale: an “infinite” number of apps, requests/sec, and storage capacity. APIs are simple, stupid.

App Engine Architecture. (Figure labels: Python VM process with stdlib and app; req/resp; read-only file system (R/O FS); APIs memcache, datastore, mail, images, urlfetch, labelled as stateful APIs and stateless APIs.)

Microsoft Windows Azure

Amazon Web Services. Amazon’s infrastructure (auto scaling, load balancing). Elastic Compute Cloud (EC2): scalable virtual private server instances. Simple Storage Service (S3). Simple Queue Service (SQS): messaging. SimpleDB: database. Flexible Payments Service, Mechanical Turk, CloudFront, etc.

Amazon Web Services. A very flexible, lower-level offering (closer to hardware), which means more possibilities and higher performance. Runs the platform you provide (machine images). Supports all major web languages. Industry-standard services (easy to move off AWS). Requires much more work and a longer time-to-market: deployment scripts, configuring images, etc. Various libraries and GUI plug-ins for AWS do help.

Price of Amazon EC2