1
YARN & MapReduce 2.0 Boyu Diao
2
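Outlines
Evolution of Hadoop Core
YARN: Why YARN, YARN Architecture, Other Topics
MapReduce: Why MapReduce, MapReduce Architecture, Examples
Last week we discussed HDFS, one part of the Hadoop core. This week we cover the other two parts of the core: YARN and MapReduce. The Hadoop core did not have all three components from the beginning; that only started with Hadoop 2.0. Let us first look at how the Hadoop core evolved.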
4
Evolution of Hadoop Core
Hadoop 1.0 had only HDFS and MapReduce. Version 2.0 added YARN as a scheduling layer, so Hadoop 2.0 can also support other data-processing frameworks.
5
Evolution of Hadoop Core
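Apache releases: no further updates after August 1 [year missing]. June 27 [year missing]. The 2.x line adds HDFS HA compared with 0.23. 2.6.4: around February [year missing].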
6
Evolution of Hadoop Core
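Why Hadoop 2.0
From the version history we can see that Hadoop 2.0 improves on Hadoop 1.0 in two respects: HDFS HA, and the addition of the YARN resource-scheduling layer. The reasons for both are easy to understand.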
7
Evolution of Hadoop Core
Why Hadoop 2.0?
Performance bottleneck: JobTracker / NameNode
Single point of failure: JobTracker / NameNode
Not flexible: MapReduce only
Cost of operation and maintenance
Data sharing
8
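Outlines
Evolution of Hadoop Core
YARN: Why YARN, YARN Architecture, Other Topics
MapReduce: Why MapReduce, MapReduce Architecture
Last week we briefly mentioned the Hadoop ecosystem; here we go through it again in more detail. We also said last week that the core of big-data technology is the theory of distributed systems. So, before talking about HDFS, we first discuss some distributed-systems concepts, and then the HDFS architecture, its read and write paths, and its shell and API.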
9
YARN: Yet Another Resource Negotiator
Where the name comes from.
10
YARN Architecture: Why YARN
Performance bottleneck: JobTracker
Single point of failure: JobTracker
Not flexible: MapReduce only
Cost of operation and maintenance
Data sharing
11
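YARN Architecture: YARN
YARN is a resource scheduler that schedules at two levels. Think of parcel delivery.
Suppose SF Express once had a single outlet in China, in Shanghai, handling deliveries for every city, district, county and town in the country. Imagine how large that distribution center would have to be. A more sensible design splits the country into a few large hubs, with subordinate units at the city and county level, so that parcels are dispatched level by level. You may ask why such a simple scheduling scheme was not thought of at the very beginning. The reason is simple: it was not in Google's paper. In fact, Google ...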
12
YARN Architecture :Terminologies
Resource Manager
  Application Manager
  Resource Scheduler
Node Manager
Application Master
Container
These are the terms YARN uses, that is, how it defines each of its components.
13
YARN Architecture :Resource Manager
Application Manager
Resource Scheduler
Handles client requests
Starts and monitors the App Master
Monitors the Node Managers
Lifecycle
14
YARN Architecture :Node Manager
Task management
Local resource scheduling
Handles App Master requests
Lifecycle
15
YARN Architecture :Application Master
Starts and monitors the application
Requests resources for tasks
Allocates resources to tasks
16
YARN Architecture :Container
Contains:
Task runtime environment: jars
Task resources: CPU / memory
Initial information: start command, parameters
Lifecycle
Similar to Docker
17
YARN Architecture :Terminologies
Resource Manager Node Manager Application Master Container
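The way these four components split the work can be sketched as a toy, single-process simulation. This is an illustration of two-level scheduling only, not the YARN API: the class names mirror the slide terms, while the fields, the first-fit placement rule, and the node sizes are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    node: str
    vcores: int
    mem_mb: int


@dataclass
class NodeManager:
    name: str
    free_vcores: int
    free_mem_mb: int

    def launch(self, vcores, mem_mb):
        # Reserve local resources for one container on this node.
        self.free_vcores -= vcores
        self.free_mem_mb -= mem_mb
        return Container(self.name, vcores, mem_mb)


@dataclass
class ResourceManager:
    nodes: list

    def allocate(self, vcores, mem_mb):
        # First level: cluster-wide placement (first fit, an assumption).
        for node in self.nodes:
            if node.free_vcores >= vcores and node.free_mem_mb >= mem_mb:
                return node.launch(vcores, mem_mb)
        return None  # no node can host the container right now


@dataclass
class ApplicationMaster:
    rm: ResourceManager
    containers: list = field(default_factory=list)

    def run_tasks(self, n_tasks, vcores, mem_mb):
        # Second level: the AM decides what its own tasks need and
        # asks the Resource Manager for one container per task.
        for _ in range(n_tasks):
            container = self.rm.allocate(vcores, mem_mb)
            if container is None:
                break
            self.containers.append(container)
        return len(self.containers)


rm = ResourceManager([NodeManager("nm1", 2, 4096), NodeManager("nm2", 2, 4096)])
am = ApplicationMaster(rm)
started = am.run_tasks(n_tasks=5, vcores=1, mem_mb=2048)
print(started, [c.node for c in am.containers])
```

Only four of the five requested tasks get a container, because the two hypothetical nodes together hold four containers of that size. The Resource Manager never needs to know what the tasks do; that knowledge stays in the Application Master.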
18
YARN Architecture : Anatomy
20
YARN: Fault-Tolerance
Resource Manager: HA through ZooKeeper
Node Manager: all tasks on the failed machine fail; the Resource Manager informs the App Master so that the failed tasks are restarted
Application Master: the Resource Manager restarts the AM; the Resource Manager keeps the context
21
YARN: Resources Scheduling
FIFO Scheduling
Capacity Scheduling
Fair Scheduling
Dominant Resource Fairness: Fair Allocation of Multiple Resource Types
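Dominant Resource Fairness is the least obvious of these policies, so a small sketch may help. The idea is to repeatedly give the next task to the user whose dominant share (their largest fraction of any single resource) is currently smallest. The function name, the dictionary layout, and the two-user workload below are assumptions chosen for illustration; this is not how any YARN scheduler is implemented, and it assumes every user demands a positive amount of at least one resource.

```python
from fractions import Fraction


def drf_allocate(capacity, demands):
    """Progressive filling: always serve the user with the smallest dominant share."""
    used = {r: 0 for r in capacity}
    alloc = {u: {r: 0 for r in capacity} for u in demands}
    tasks = {u: 0 for u in demands}
    active = set(demands)

    def dominant_share(user):
        # Largest fraction of any one resource held by this user.
        return max(Fraction(alloc[user][r], capacity[r]) for r in capacity)

    while active:
        # sorted() makes tie-breaking deterministic (alphabetical).
        user = min(sorted(active), key=dominant_share)
        demand = demands[user]
        if any(used[r] + demand[r] > capacity[r] for r in capacity):
            active.remove(user)  # this user's next task no longer fits
            continue
        for r in capacity:
            used[r] += demand[r]
            alloc[user][r] += demand[r]
        tasks[user] += 1
    return tasks


# Hypothetical cluster and per-task demands.
capacity = {"cpu": 9, "mem_gb": 18}
demands = {
    "A": {"cpu": 1, "mem_gb": 4},  # memory-heavy tasks
    "B": {"cpu": 3, "mem_gb": 1},  # CPU-heavy tasks
}
tasks = drf_allocate(capacity, demands)
print(tasks)
```

With these numbers user A ends up with 3 tasks and user B with 2, and both hold a dominant share of 2/3 (A of memory, B of CPU), which is the sense in which the allocation is fair across different resource types.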
22
YARN: X on YARN
23
YARN: Tez on YARN
25
MapReduce
"MapReduce: Simplified Data Processing on Large Clusters", OSDI 2004
MapReduce:
The theory or framework
A paradigm from functional languages
A piece of software at Google
Core of Hadoop 1.0
Core of Hadoop 2.0 (MapReduce on YARN)
26
MapReduce: Why Functional Programming Lisp
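The link to functional languages such as Lisp can be seen with Python's built-in `map` and `functools.reduce`, which mirror the two primitives the model is named after. This is only an illustration of the idea, not Hadoop code, and the input list is made up.

```python
from functools import reduce

# Hypothetical input.
words = ["yarn", "map", "reduce"]

# map: apply a pure function to every element independently.
# No element depends on another, which is why a map phase can be
# spread across many machines.
lengths = list(map(len, words))

# reduce: fold the mapped values into one result with a binary function.
total = reduce(lambda acc, n: acc + n, lengths, 0)

print(lengths, total)
```

Because the mapped function has no side effects, a failed piece of work can simply be run again, which is the property the fault-tolerance design relies on.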
27
MapReduce Architecture
28
MapReduce Architecture: 1.0 Terminologies
Job Tracker Task Trackers Map Task Reduce Task
29
MapReduce Architecture: Job Tracker
Master
Manages jobs
Schedules jobs to Task Trackers
Resource scheduling
30
MapReduce Architecture: Task Trackers
Slaves
Map Tasks
Reduce Tasks
Communicate with the Job Tracker
31
MapReduce Architecture: Map Task
Map engine
Input: <key1, v1>
Output: <key2, v2>
32
MapReduce Architecture: Reduce Task
Reduce engine
Input: <key1, list(value1)>
Output: <value2>
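The map and reduce signatures on these two slides can be tied together with a minimal in-memory sketch. It runs in one process with no cluster, no Job Tracker and no fault tolerance; `run_mapreduce` and the word-count functions are hypothetical names introduced for the example, not Hadoop APIs.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, shuffle and reduce in a single process."""
    groups = defaultdict(list)
    # Map phase: each input pair may emit any number of intermediate pairs.
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            # Shuffle: gather all values that share an intermediate key.
            groups[out_key].append(out_value)
    # Reduce phase: each <key, list(values)> is folded into one output value.
    return {k: reduce_fn(k, values) for k, values in sorted(groups.items())}


def word_count_map(_line_no, line):
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(_word, counts):
    return sum(counts)


lines = [(0, "YARN runs MapReduce"), (1, "MapReduce runs on YARN")]
result = run_mapreduce(lines, word_count_map, word_count_reduce)
print(result)
```

The shuffle step in the middle is what turns the map output pairs into the `<key, list(value)>` input that the reduce engine expects.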
33
MapReduce Architecture: 2.0 Terminologies
MR App Master
Master
Manages jobs
Schedules tasks into containers on Node Managers (2.0 has no Task Trackers)
Asks the Resource Manager for resources
34
MapReduce Architecture: MR App Master
Map Task Reduce Task
35
MapReduce : Anatomy
37
MapReduce : Fault-Tolerance
MRAppMaster failure: the Resource Manager restarts it (default: twice)
Map / Reduce task failure: the MRAppMaster requests resources and restarts the task (default: 4 times)
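The bounded-retry behaviour can be sketched as a plain loop. This is a simplified model under assumptions: `run_with_retries` and `flaky_task` are hypothetical helpers, failures are simulated with exceptions, and the limit of 4 is taken from the slide rather than read from any configuration.

```python
def run_with_retries(task, max_attempts):
    """Call task(attempt) until it succeeds or the attempts are used up."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt), attempt
        except RuntimeError:
            # In the real system the MRAppMaster would request a new
            # container here; the sketch just tries again.
            continue
    raise RuntimeError(f"task failed after {max_attempts} attempts")


def flaky_task(attempt):
    # Hypothetical task that fails on its first two attempts.
    if attempt < 3:
        raise RuntimeError("simulated task failure")
    return "done"


TASK_MAX_ATTEMPTS = 4  # default quoted on the slide
result, attempts_used = run_with_retries(flaky_task, TASK_MAX_ATTEMPTS)
print(result, attempts_used)
```

With a limit of 2 the same task would exhaust its attempts and the job would be reported as failed, which is why the limit is a trade-off between tolerating transient faults and wasting resources on a task that can never succeed.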
38
MapReduce :Backup Tasks
Stragglers: tasks that run unusually slowly
Start a copy of the same Map/Reduce task on a different machine.
39
MapReduce :Applications
Distributed Grep
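A minimal sketch of distributed grep as a map and a reduce function: the map emits a line only if it matches the pattern, and the reduce is an identity that passes the matches through. The pattern, the log lines and the small `run` driver are made up for the example and run in a single process.

```python
import re
from collections import defaultdict

PATTERN = re.compile(r"ERROR")  # hypothetical search pattern


def grep_map(file_name, line):
    # Emit the line only when it matches; the key records its source file.
    if PATTERN.search(line):
        yield file_name, line


def grep_reduce(_file_name, lines):
    # Identity reduce: copy the matched lines to the output unchanged.
    return list(lines)


def run(records, map_fn, reduce_fn):
    # Single-process stand-in for the map, shuffle and reduce phases.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}


logs = [
    ("a.log", "INFO start"),
    ("a.log", "ERROR disk full"),
    ("b.log", "ERROR timeout"),
    ("b.log", "INFO done"),
]
matches = run(logs, grep_map, grep_reduce)
print(matches)
```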
40
MapReduce :Applications
Count of URL Access Frequency:
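A sketch of the URL access count: the map emits `<URL, 1>` for every request in the log and the reduce adds up the ones for each URL. The log format (`<client> <url>`) and the sample lines are assumptions; real web server logs carry more fields. Sorting followed by `groupby` stands in for the shuffle phase here.

```python
from itertools import groupby
from operator import itemgetter


def url_map(log_line):
    # Assumed log format: "<client> <url>".
    _client, url = log_line.split()
    yield url, 1


def url_reduce(url, ones):
    # Sum the per-request ones into a total for this URL.
    return url, sum(ones)


log = ["10.0.0.1 /index", "10.0.0.2 /about", "10.0.0.3 /index"]

# Map phase.
pairs = [pair for line in log for pair in url_map(line)]
# Shuffle phase: sort by key so equal URLs become adjacent.
pairs.sort(key=itemgetter(0))
# Reduce phase: one call per distinct URL.
counts = dict(
    url_reduce(url, [one for _, one in group])
    for url, group in groupby(pairs, key=itemgetter(0))
)
print(counts)
```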
41
MapReduce :Applications
Inverted Index:
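A sketch of the inverted index: the map emits `<word, document ID>` for each word in a document, and the reduce collects and sorts the document IDs for each word. The document IDs and texts are hypothetical, and tokenization is a plain whitespace split.

```python
from collections import defaultdict


def index_map(doc_id, text):
    # Emit each distinct word once per document.
    for word in set(text.lower().split()):
        yield word, doc_id


def index_reduce(_word, doc_ids):
    # Produce the sorted posting list for this word.
    return sorted(set(doc_ids))


docs = {
    "d1": "yarn schedules containers",
    "d2": "mapreduce runs on yarn",
}

# Map and shuffle: group document IDs by word.
postings = defaultdict(list)
for doc_id, text in docs.items():
    for word, emitted_id in index_map(doc_id, text):
        postings[word].append(emitted_id)

# Reduce: one posting list per word.
index = {word: index_reduce(word, ids) for word, ids in postings.items()}
print(index["yarn"], index["mapreduce"])
```

Looking up a word in `index` then returns every document that contains it, which is the basic structure behind keyword search.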