YARN & MapReduce 2.0 Boyu Diao 2016.06.17.

1 YARN & MapReduce 2.0 Boyu Diao

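2 Outlines
Evolution of Hadoop Core; YARN: Why YARN, YARN Architecture, Other Topics; MapReduce: Why MapReduce, MapReduce Architecture, Examples
Last week we discussed HDFS, one part of the Hadoop core. This week we cover the other two parts, YARN and MapReduce. The Hadoop core did not start out with all three components; it has had them only since Hadoop 2.0. Let us first look at how the Hadoop core evolved.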

3 Outlines
Evolution of Hadoop Core; YARN: Why YARN, YARN Architecture, Other Topics; MapReduce: Why MapReduce, MapReduce Architecture, Examples

4 Evolution of Hadoop Core
Hadoop 1.0 had only HDFS and MapReduce. Version 2.0 added a scheduling layer, YARN. Hadoop 2.0 can also support other data processing frameworks.

5 Evolution of Hadoop Core
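Apache releases: no further updates after August 1 [year missing]; June 27 [year missing]; the 2.x line adds HDFS HA compared with the 0.23 line; 2.6.4 around February [year missing].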

6 Evolution of Hadoop Core
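Why Hadoop 2.0: From the version history we can see that Hadoop 2.0 improves on Hadoop 1.0 in two respects. One is HDFS high availability (HA); the other is the addition of YARN, a resource scheduling layer. Why these two changes were needed is easy to understand.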

7 Evolution of Hadoop Core
Why Hadoop 2.0? Performance bottleneck: JobTracker / NameNode. Single point of failure: JobTracker / NameNode. Not flexible: MapReduce only. Cost of operation and maintenance. Data sharing.

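8 Outlines
Evolution of Hadoop Core; YARN: Why YARN, YARN Architecture, Other Topics; MapReduce: Why MapReduce, MapReduce Architecture
Last week we mentioned the Hadoop ecosystem once; here we go over it again in more detail. We also said last week that the core of big data technology is the theory of distributed systems. So before talking about HDFS, we first discuss some distributed systems concepts, and then the HDFS architecture, its read and write paths, the shell, and the API.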

9 YARN: Yet Another Resource Negotiator
Origin of the name

10 YARN Architecture: Why YARN
Performance bottleneck: JobTracker. Single point of failure: JobTracker. Not flexible: MapReduce only. Cost of operation and maintenance. Data sharing.

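11 YARN Architecture: YARN
YARN is a resource scheduler that schedules at two levels. Think of parcel delivery. Suppose SF Express once had a single outlet in China, in Shanghai, handling deliveries for every city, district, county, and town in the country. Imagine how large that distribution center would have to be. A more sensible design is a few large national hubs, with subordinate units at the city and county level, dispatching level by level. You may ask why such a simple scheduling scheme was not thought of at the very beginning. The reason is simple: it was not in Google's paper. In fact, Google ...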

12 YARN Architecture :Terminologies
Resource Manager (Application Manager, Resource Scheduler); Node Manager; Application Master; Container. These are the terms used in YARN, that is, how it defines each component.

13 YARN Architecture :Resource Manager
Application Manager; Resource Scheduler; handles client requests; starts and monitors the App Master; monitors the Node Manager. Lifecycle.

14 YARN Architecture :Node Manager
Task management; local resource scheduling; App Master requests. Lifecycle.

15 YARN Architecture :Application Master
Start and monitor the application; apply for resources for tasks; allocate resources to tasks.

16 YARN Architecture :Container
Contains: task runtime environment (JARs); task resources (CPU/memory); initial information (start command, parameters). Lifecycle. Similar to Docker.

17 YARN Architecture :Terminologies
Resource Manager; Node Manager; Application Master; Container
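The two-level scheduling among these components can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class names mirror the slide's terms, but the fields, the first-fit placement rule, and the memory-only resource model are hypothetical and are not YARN's actual API or policy.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Tracks free memory on one worker node (hypothetical toy model)."""
    name: str
    free_mb: int


@dataclass
class ResourceManager:
    """First level: decides which node hosts each requested container."""
    nodes: list

    def allocate(self, mb):
        # First-fit: grant a container on the first node with enough memory.
        for node in self.nodes:
            if node.free_mb >= mb:
                node.free_mb -= mb
                return (node.name, mb)
        return None  # cluster is full; a real scheduler would queue the request


@dataclass
class ApplicationMaster:
    """Second level: decides which task runs in which granted container."""
    tasks: list
    placements: dict = field(default_factory=dict)

    def run(self, rm, mb_per_task):
        for task in self.tasks:
            container = rm.allocate(mb_per_task)
            if container is None:
                break  # no capacity left; remaining tasks stay unplaced
            self.placements[task] = container
        return self.placements


rm = ResourceManager([NodeManager("node1", 2048), NodeManager("node2", 1024)])
am = ApplicationMaster(["map-0", "map-1", "map-2", "reduce-0"])
placements = am.run(rm, mb_per_task=1024)
print(placements)
```

The Resource Manager only hands out containers; it never sees individual tasks. Mapping tasks onto containers is left to the per-application master, which is the split the slides describe.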

18 YARN Architecture : Anatomy

19 Outlines
Evolution of Hadoop Core; YARN: Why YARN, YARN Architecture, Other Topics; MapReduce: Why MapReduce, MapReduce Architecture

20 YARN: Fault-Tolerance
Resource Manager: ZooKeeper-based HA. Node Manager: all tasks on this machine fail; the Resource Manager informs the App Master so it can restart the failed tasks. Application Master: the Resource Manager restarts the AM; the Resource Manager keeps the context.

21 YARN: Resources Scheduling
FIFO Scheduling; Capacity Scheduling; Fair Scheduling; Dominant Resource Fairness: Fair Allocation of Multiple Resource Types
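Dominant Resource Fairness can be sketched as progressive filling: repeatedly give one more task to the user whose dominant share (the largest fraction of any single resource they hold) is currently smallest. The function below is a simplified illustration, not a YARN scheduler implementation. The function name, the dictionary layout, and the rule of dropping a user once their next task no longer fits are assumptions made for this sketch.

```python
def drf_allocate(capacity, demands):
    """Return how many tasks each user gets under a simple DRF loop.

    capacity: {resource: total amount}
    demands:  {user: {resource: amount needed per task}}
    """
    used = {r: 0 for r in capacity}
    tasks = {u: 0 for u in demands}
    active = set(demands)

    def dominant_share(user):
        # Largest fraction of any one resource currently held by this user.
        return max(tasks[user] * demands[user][r] / capacity[r] for r in capacity)

    while active:
        # Serve the user with the smallest dominant share (ties broken by name).
        user = min(sorted(active), key=dominant_share)
        need = demands[user]
        if all(used[r] + need[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need[r]
            tasks[user] += 1
        else:
            active.discard(user)  # simplification: stop serving this user
    return tasks


allocation = drf_allocate(
    {"cpu": 9, "mem_gb": 18},
    {"A": {"cpu": 1, "mem_gb": 4}, "B": {"cpu": 3, "mem_gb": 1}},
)
print(allocation)
```

With 9 CPUs and 18 GB in total, user A (memory-heavy tasks) ends up with 3 tasks and user B (CPU-heavy tasks) with 2, so both hold a dominant share of two thirds.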

22 YARN: X on YARN

23 YARN: Tez on YARN

24 Outlines
Evolution of Hadoop Core; YARN: Why YARN, YARN Architecture, Other Topics; MapReduce: Why MapReduce, MapReduce Architecture

25 MapReduce
MapReduce: Simplified Data Processing on Large Clusters, OSDI 2004. MapReduce is: a theory or framework; a paradigm from functional languages; a piece of software at Google; the core of Hadoop 1.0; the core of Hadoop 2.0 (MapReduce on YARN).

26 MapReduce: Why Functional Programming Lisp

27 MapReduce Architecture

28 MapReduce Architecture: 1.0 Terminologies
Job Tracker; Task Trackers; Map Task; Reduce Task

29 MapReduce Architecture: Job Tracker
Master. Manages jobs; schedules jobs to Task Trackers; handles resource scheduling.

30 MapReduce Architecture: Task Trackers
Slaves. Run Map tasks and Reduce tasks; communicate with the Job Tracker.

31 MapReduce Architecture: Map Task
Map engine. Input: <key1, v1>. Output: <key2, v2>.

32 MapReduce Architecture: Reduce Task
Reduce engine. Input: <key1, list(value1)>. Output: <value2>.
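The map and reduce signatures above can be made concrete with a small in-memory sketch. It is not Hadoop code: `run_mapreduce` is a hypothetical single-process helper that applies the mapper, groups intermediate values by key (the shuffle), and then applies the reducer. Word count is used as the example job.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for a MapReduce job (illustrative only)."""
    groups = defaultdict(list)
    # Map phase: each <key1, v1> record may emit any number of <key2, v2> pairs.
    for key, value in records:
        for k2, v2 in mapper(key, value):
            groups[k2].append(v2)  # shuffle: group values by intermediate key
    # Reduce phase: each <key, list(values)> group collapses to one result.
    return {k2: reducer(k2, values) for k2, values in sorted(groups.items())}


def wc_map(offset, line):
    # Emit <word, 1> for every word in the line.
    for word in line.split():
        yield word, 1


def wc_reduce(word, counts):
    # Sum the partial counts for one word.
    return sum(counts)


lines = [(0, "yarn runs mapreduce"), (1, "yarn runs tez")]
word_counts = run_mapreduce(lines, wc_map, wc_reduce)
print(word_counts)
```

In a real cluster the map calls run in parallel across many machines and the grouping step moves data over the network; the control flow, however, has the same three stages.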

33 MapReduce Architecture: 2.0 Terminologies
MR App Master: master; manages jobs; schedules jobs to Task Trackers; asks the Resource Manager for resources.

34 MapReduce Architecture: MR App Master
Map Task Reduce Task

35 MapReduce : Anatomy

36 Outlines
Evolution of Hadoop Core; YARN: Why YARN, YARN Architecture, Other Topics; MapReduce: Why MapReduce, MapReduce Architecture

37 MapReduce : Fault-Tolerance
MRAppMaster failure: the Resource Manager restarts it (default: twice). Map / Reduce task failure: the MRAppMaster requests resources and restarts the task (default: 4 times).
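The bounded-retry behaviour for failed tasks can be sketched as a loop with an attempt limit. The helper below is hypothetical and only models the retry count; it does not request containers or talk to a Resource Manager. The default of 4 mirrors the slide's figure for map and reduce tasks.

```python
def run_with_retries(task, max_attempts=4):
    """Call task(attempt) until it succeeds or max_attempts is used up."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt), attempt
        except Exception as exc:  # a failed attempt; try again if allowed
            last_error = exc
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def flaky_map_task(attempt):
    # Simulated task that fails on its first two attempts.
    if attempt < 3:
        raise IOError(f"simulated failure on attempt {attempt}")
    return "map output"


result, attempts_used = run_with_retries(flaky_map_task)
print(result, attempts_used)
```

Once the limit is exhausted the error propagates, which corresponds to the job being marked as failed instead of retrying forever.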

38 MapReduce :Backup Tasks
Unusually slow tasks (stragglers): start a copy of the same Map/Reduce task on a different machine.
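Why backup tasks help can be seen with simple arithmetic: a job finishes when its slowest task finishes, and a task with a backup finishes when either copy finishes. The sketch below is a deterministic model with made-up durations; the function names and numbers are assumptions for illustration and do not measure a real cluster.

```python
def task_finish_time(primary_secs, backup_start=None, backup_secs=None):
    """Finish time of one task, optionally with a backup copy.

    The task completes as soon as either the primary or the backup completes.
    """
    if backup_start is None:
        return primary_secs
    return min(primary_secs, backup_start + backup_secs)


def job_finish_time(task_finish_times):
    # A job is done only when its slowest task is done.
    return max(task_finish_times)


# Three tasks; the third runs on a slow machine (a straggler).
without_backup = job_finish_time([10.0, 11.0, 60.0])

# Launch a backup for the straggler at t=12 on a healthy machine (10 s run).
with_backup = job_finish_time(
    [10.0, 11.0, task_finish_time(60.0, backup_start=12.0, backup_secs=10.0)]
)
print(without_backup, with_backup)
```

In this made-up case the backup cuts the job time from 60 s to 22 s at the cost of running one task twice.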

39 MapReduce :Applications
Distributed Grep
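A minimal sketch of distributed grep, assuming a hypothetical in-memory `run_mapreduce` helper in place of a real cluster: the map function emits a line only if it matches the pattern, and the reduce function passes the matches through. Keying by line and collecting locations as values is a choice made for this sketch.

```python
import re
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for a MapReduce job (illustrative only)."""
    groups = defaultdict(list)
    for key, value in records:
        for k2, v2 in mapper(key, value):
            groups[k2].append(v2)
    return {k2: reducer(k2, values) for k2, values in sorted(groups.items())}


def make_grep_map(pattern):
    regex = re.compile(pattern)

    def grep_map(location, line):
        # Emit the line (with where it was found) only when it matches.
        if regex.search(line):
            yield line, location

    return grep_map


def grep_reduce(line, locations):
    # Near-identity reduce: just return the locations in a stable order.
    return sorted(locations)


log_lines = [
    ("log1:1", "ERROR disk full"),
    ("log1:2", "INFO ok"),
    ("log2:1", "ERROR disk full"),
    ("log2:2", "WARN slow"),
]
matches = run_mapreduce(log_lines, make_grep_map(r"ERROR"), grep_reduce)
print(matches)
```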

40 MapReduce :Applications
Count of URL Access Frequency:
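A minimal sketch of the URL access count, again using a hypothetical in-memory `run_mapreduce` helper: the map function emits <URL, 1> for each request and the reduce function sums the counts per URL. The log line format ("METHOD URL") is an assumption for this example.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for a MapReduce job (illustrative only)."""
    groups = defaultdict(list)
    for key, value in records:
        for k2, v2 in mapper(key, value):
            groups[k2].append(v2)
    return {k2: reducer(k2, values) for k2, values in sorted(groups.items())}


def url_map(offset, log_line):
    # Assumed log format: "<METHOD> <URL>", e.g. "GET /index.html".
    parts = log_line.split()
    if len(parts) >= 2:
        yield parts[1], 1


def url_reduce(url, counts):
    # Total number of requests seen for this URL.
    return sum(counts)


requests = [
    (0, "GET /index.html"),
    (1, "GET /about.html"),
    (2, "GET /index.html"),
]
url_counts = run_mapreduce(requests, url_map, url_reduce)
print(url_counts)
```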

41 MapReduce :Applications
Inverted Index:
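A minimal sketch of building an inverted index with a hypothetical in-memory `run_mapreduce` helper: the map function emits <word, document ID> for each word, and the reduce function collects the sorted, de-duplicated document IDs for that word. The document IDs and texts are made up for illustration.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for a MapReduce job (illustrative only)."""
    groups = defaultdict(list)
    for key, value in records:
        for k2, v2 in mapper(key, value):
            groups[k2].append(v2)
    return {k2: reducer(k2, values) for k2, values in sorted(groups.items())}


def index_map(doc_id, text):
    # Emit <word, doc_id> for every word in the document.
    for word in text.lower().split():
        yield word, doc_id


def index_reduce(word, doc_ids):
    # Posting list: each document once, in sorted order.
    return sorted(set(doc_ids))


docs = [
    ("doc1", "YARN schedules containers"),
    ("doc2", "MapReduce runs on YARN"),
]
inverted_index = run_mapreduce(docs, index_map, index_reduce)
print(inverted_index["yarn"])
```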

42 Outlines
Evolution of Hadoop Core; YARN: Why YARN, YARN Architecture, Other Topics; MapReduce: Why MapReduce, MapReduce Architecture
