YARN & MapReduce 2.0 Boyu Diao 2016.06.17
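Outline: Evolution of Hadoop Core; YARN: Why YARN, YARN Architecture, Other Topics; MapReduce: Why MapReduce, MapReduce Architecture, Examples. Last week we discussed HDFS, one part of the Hadoop core. This week we cover the other two parts of the core, YARN and MapReduce. The Hadoop core did not have all three from the start; that arrangement only appeared with Hadoop 2.0. Let us first look at how the Hadoop core evolved.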
Evolution of Hadoop Core: Hadoop 1.0 had only HDFS and MapReduce. Hadoop 2.0 added a resource-scheduling layer, YARN, so it can also support data-processing frameworks other than MapReduce.
Evolution of Hadoop Core: Apache releases. 1.2.1: no further updates after August 1, 2013. 0.23.11: released June 27, 2014. 2.x: compared with 0.23, adds HDFS HA; 2.6.4 and 2.7.2 were released around February 2016.
Evolution of Hadoop Core: Why Hadoop 2.0? The version history shows that Hadoop 2.0 improves on Hadoop 1.0 in two ways: HDFS HA, and the addition of YARN as a resource-scheduling layer. The reasons for both are easy to understand.
Evolution of Hadoop Core: Why Hadoop 2.0? Performance bottleneck: JobTracker / NameNode. Single point of failure: JobTracker / NameNode. Inflexible: MapReduce only. Cost of operation and maintenance. Data sharing.
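Outline: Evolution of Hadoop Core; YARN: Why YARN, YARN Architecture, Other Topics; MapReduce: Why MapReduce, MapReduce Architecture. Last week we briefly mentioned the Hadoop ecosystem; here we go over it again in more detail. As we also said last week, the core of big-data technology is the theory of distributed systems. So before covering HDFS, we first discuss some distributed-systems concepts, and then HDFS's architecture, its read and write paths, and its shell and API.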
YARN: Yet Another Resource Negotiator (the origin of the name)
YARN Architecture: Why YARN? Performance bottleneck: JobTracker. Single point of failure: JobTracker. Inflexible: MapReduce only. Cost of operation and maintenance. Data sharing.
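YARN Architecture: YARN. YARN is a resource scheduler with two-level scheduling. Think of parcel delivery. Suppose SF Express once had a single depot in China, in Shanghai, handling deliveries for every city, district, county, and town in the country. Imagine how big that distribution center would have to be. A more sensible design is a few large regional hubs nationwide, with subordinate units at the city and county level, so that scheduling happens level by level. You might ask why such a simple scheme was not used from the start. The reason is simple: Google's paper did not describe it. In fact, Google…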
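To make the two-level idea concrete, here is a toy Python model. It is not YARN's implementation: the class names `GlobalScheduler` and `AppMaster`, the node names, and the round-robin policy are all hypothetical. Level one (the global scheduler, in the Resource Manager's role) only decides which node each granted container goes to. Level two (each application master) decides which of its own tasks runs in each container it receives, much like a regional hub sorting its own parcels.

```python
from dataclasses import dataclass, field


@dataclass
class AppMaster:
    """Level two (hypothetical): places its own tasks into granted containers."""
    name: str
    pending_tasks: list
    placements: dict = field(default_factory=dict)  # task -> node

    def containers_wanted(self):
        return len(self.pending_tasks)

    def on_containers_granted(self, nodes):
        # The global scheduler never sees task names; the AM maps tasks to nodes.
        for node in nodes:
            task = self.pending_tasks.pop(0)
            self.placements[task] = node


class GlobalScheduler:
    """Level one (hypothetical): tracks free container slots per node."""

    def __init__(self, free_slots):
        self.free_slots = dict(free_slots)  # node -> free container count

    def allocate(self, apps):
        # Grant one container per app per round so no app monopolizes the cluster.
        progress = True
        while progress:
            progress = False
            for app in apps:
                if app.containers_wanted() == 0:
                    continue
                node = next((n for n, k in self.free_slots.items() if k > 0), None)
                if node is None:
                    return  # cluster is full
                self.free_slots[node] -= 1
                app.on_containers_granted([node])
                progress = True


scheduler = GlobalScheduler({"node1": 2, "node2": 1})
a = AppMaster("wordcount", ["m1", "m2"])
b = AppMaster("grep", ["m1"])
scheduler.allocate([a, b])
print(a.placements)  # {'m1': 'node1', 'm2': 'node2'}
print(b.placements)  # {'m1': 'node1'}
```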
YARN Architecture: Terminologies. Resource Manager (Application Manager, Resource Scheduler); Node Manager; Application Master; Container. These are YARN's terms, i.e., how YARN defines each of its components.
YARN Architecture: Resource Manager. Components: Application Manager, Resource Scheduler. Responsibilities: handle client requests; start and monitor App Masters; monitor Node Managers. Lifecycle.
YARN Architecture: Node Manager. Task management; local resource scheduling; handling App Master requests. Lifecycle.
YARN Architecture: Application Master. Starts and monitors the application; requests resources for its tasks; allocates those resources to tasks.
YARN Architecture: Container. Contains: the task runtime environment (JARs); task resources (CPU/memory); initial information (start command, parameters). Lifecycle. Similar to a Docker container.
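The container contents listed above map naturally onto a small data structure. The sketch below is a hypothetical Python model, not YARN's API; its fields loosely echo the local resources, environment, and commands that YARN's Java `ContainerLaunchContext` carries. The JAR name, class name `MyTask`, and command-line flags are made up for illustration.

```python
import shlex
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    vcores: int      # CPU share
    memory_mb: int   # memory limit


@dataclass
class LaunchContext:
    local_resources: list                     # files to localize, e.g. JARs
    command: list                             # start command and its parameters
    env: dict = field(default_factory=dict)   # environment variables


@dataclass
class Container:
    container_id: str
    node: str
    resource: Resource
    launch: LaunchContext

    def command_line(self):
        # Environment assignments followed by the shell-quoted start command.
        env = " ".join(f"{k}={shlex.quote(v)}" for k, v in sorted(self.launch.env.items()))
        cmd = " ".join(shlex.quote(part) for part in self.launch.command)
        return f"{env} {cmd}".strip()


c = Container(
    container_id="container_01",
    node="node1",
    resource=Resource(vcores=1, memory_mb=1024),
    launch=LaunchContext(
        local_resources=["job.jar"],
        command=["java", "-Xmx800m", "MyTask", "--split", "0"],  # hypothetical task
        env={"CLASSPATH": "job.jar"},
    ),
)
print(c.command_line())  # CLASSPATH=job.jar java -Xmx800m MyTask --split 0
```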
YARN Architecture: Terminologies. Resource Manager; Node Manager; Application Master; Container.
YARN Architecture: Anatomy
YARN: Fault Tolerance. Resource Manager: HA via ZooKeeper. Node Manager: when a node fails, all tasks on that machine fail; the Resource Manager informs the App Master, which restarts the failed tasks. Application Master: the Resource Manager restarts the AM and keeps the application's context.
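The Node Manager case can be sketched as heartbeat bookkeeping. This toy model is an assumption for illustration, not YARN code: `ToyResourceManager`, `ToyAppMaster`, and the timeout values are hypothetical. A node that misses heartbeats for longer than the timeout is declared lost, and the Resource Manager only informs each App Master which of its tasks died. Deciding to re-run them is left to the App Master.

```python
class ToyAppMaster:
    def __init__(self):
        self.to_restart = []

    def on_task_lost(self, task):
        # The AM, not the RM, decides to re-run the task in a new container.
        self.to_restart.append(task)


class ToyResourceManager:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heartbeat = {}  # node -> time of last heartbeat
        self.running = {}         # node -> list of (app_master, task)

    def heartbeat(self, node, now):
        self.last_heartbeat[node] = now
        self.running.setdefault(node, [])

    def launch(self, node, app, task):
        self.running[node].append((app, task))

    def check_nodes(self, now):
        """Mark silent nodes as lost and tell each AM which of its tasks failed."""
        lost = [n for n, t in self.last_heartbeat.items() if now - t > self.timeout]
        for node in lost:
            del self.last_heartbeat[node]
            for app, task in self.running.pop(node):
                app.on_task_lost(task)
        return lost


rm = ToyResourceManager(timeout=10)
am = ToyAppMaster()
rm.heartbeat("node1", now=0)
rm.heartbeat("node2", now=0)
rm.launch("node1", am, "map_0")
rm.launch("node2", am, "map_1")
rm.heartbeat("node2", now=12)  # node1 stays silent
print(rm.check_nodes(now=12))  # ['node1']
print(am.to_restart)           # ['map_0']
```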
YARN: Resource Scheduling. FIFO scheduling; Capacity scheduling; Fair scheduling. Paper: "Dominant Resource Fairness: Fair Allocation of Multiple Resource Types".
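Dominant Resource Fairness equalizes each user's dominant share, which is the largest fraction of any single resource type that the user holds. The sketch below uses progressive filling: repeatedly launch one more task for the user with the lowest dominant share. This variant drops a user once their next task no longer fits and keeps serving the others. The function name `drf` and the cluster numbers are illustrative assumptions: 9 CPUs and 18 GB, with user A needing <1 CPU, 4 GB> per task and user B needing <3 CPU, 1 GB>.

```python
from fractions import Fraction


def drf(capacity, demands):
    """Progressive filling under Dominant Resource Fairness.

    capacity: {resource: total amount}
    demands:  {user: {resource: amount needed per task}}
    Assumes every per-task demand is positive for at least one resource
    (otherwise the loop would never terminate).
    """
    resources = list(capacity)
    used = {r: 0 for r in resources}
    alloc = {u: {r: 0 for r in resources} for u in demands}
    tasks = {u: 0 for u in demands}

    def dominant_share(user):
        # Largest fraction of any single resource held by this user.
        return max(Fraction(alloc[user][r], capacity[r]) for r in resources)

    active = list(demands)
    while active:
        user = min(active, key=dominant_share)  # ties go to the earlier user
        need = demands[user]
        if all(used[r] + need[r] <= capacity[r] for r in resources):
            for r in resources:
                used[r] += need[r]
                alloc[user][r] += need[r]
            tasks[user] += 1
        else:
            active.remove(user)  # next task no longer fits; stop serving this user
    return tasks, {u: dominant_share(u) for u in demands}


capacity = {"cpu": 9, "mem_gb": 18}
demands = {"A": {"cpu": 1, "mem_gb": 4}, "B": {"cpu": 3, "mem_gb": 1}}
tasks, shares = drf(capacity, demands)
print(tasks)   # {'A': 3, 'B': 2}
print(shares)  # {'A': Fraction(2, 3), 'B': Fraction(2, 3)}
```

A is memory-heavy and B is CPU-heavy. In the final allocation, A holds 12/18 of the memory and B holds 6/9 of the CPUs, so their dominant shares are equal even though their raw allocations look very different.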
YARN: X on YARN
YARN: Tez on YARN
MapReduce. "MapReduce: Simplified Data Processing on Large Clusters", OSDI 2004. MapReduce as a theory or framework: a paradigm from functional languages; a software system at Google; the core of Hadoop 1.0; the core of Hadoop 2.0 (MapReduce on YARN).
MapReduce: Why MapReduce? Functional programming (Lisp)
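The name comes from the `map` and `reduce` primitives of Lisp and other functional languages. Python has both: `map` applies a function to each element independently, which is what makes the map step easy to parallelize. `functools.reduce` folds the results together with a binary operation. The word list is purely illustrative.

```python
from functools import reduce

words = ["map", "reduce", "lisp"]
lengths = list(map(len, words))                     # map: apply len to every element
total = reduce(lambda acc, n: acc + n, lengths, 0)  # reduce: fold with +
print(lengths, total)  # [3, 6, 4] 13
```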
MapReduce Architecture
MapReduce Architecture (1.0): Terminologies. Job Tracker; Task Trackers; Map Task; Reduce Task.
MapReduce Architecture: Job Tracker. The master: manages jobs; schedules jobs onto Task Trackers; performs resource scheduling.
MapReduce Architecture: Task Trackers. The slaves: run map tasks and reduce tasks; communicate with the Job Tracker.
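In Hadoop 1.0, each Task Tracker has a fixed number of map slots and reduce slots and reports its free slots to the Job Tracker in heartbeats. The Job Tracker replies with tasks to run. The toy model below shows why the Job Tracker becomes a bottleneck: one object does both job bookkeeping and resource scheduling for the whole cluster. `ToyJobTracker` and its "reduces only after all maps are assigned" rule are hypothetical simplifications, not Hadoop's actual policy.

```python
from collections import deque


class ToyJobTracker:
    """Hadoop 1.0 style master: job bookkeeping AND slot scheduling in one place."""

    def __init__(self, map_tasks, reduce_tasks):
        self.pending_maps = deque(map_tasks)
        self.pending_reduces = deque(reduce_tasks)
        self.assigned = []  # (tracker, task) pairs handed out so far

    def heartbeat(self, tracker, free_map_slots, free_reduce_slots):
        """A Task Tracker reports its free slots; reply with tasks to run."""
        new_tasks = []
        while free_map_slots > 0 and self.pending_maps:
            new_tasks.append(self.pending_maps.popleft())
            free_map_slots -= 1
        # Toy policy: hand out reduce tasks only once every map task is assigned.
        while not self.pending_maps and free_reduce_slots > 0 and self.pending_reduces:
            new_tasks.append(self.pending_reduces.popleft())
            free_reduce_slots -= 1
        self.assigned.extend((tracker, task) for task in new_tasks)
        return new_tasks


jt = ToyJobTracker(["m0", "m1", "m2"], ["r0"])
first = jt.heartbeat("tracker1", free_map_slots=2, free_reduce_slots=1)
second = jt.heartbeat("tracker2", free_map_slots=2, free_reduce_slots=1)
print(first, second)  # ['m0', 'm1'] ['m2', 'r0']
```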
MapReduce Architecture: Map Task. Map engine. Input: <key1, value1>. Output: list(<key2, value2>).
MapReduce Architecture: Reduce Task. Reduce engine. Input: <key2, list(value2)>. Output: list(value2).
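The two signatures fit together through a shuffle that groups every intermediate value by its key2. Below is a single-process sketch of that flow with word count as the example job. `run_mapreduce`, `wc_map`, and `wc_reduce` are hypothetical names. A real framework runs many map and reduce tasks in parallel across machines.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    groups = defaultdict(list)
    # Map phase: each <k1, v1> yields zero or more <k2, v2> pairs.
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)  # shuffle: group values by k2
    # Reduce phase: each <k2, list(v2)> yields a list of output values.
    return {k2: list(reducer(k2, vs)) for k2, vs in sorted(groups.items())}


def wc_map(doc_name, text):
    for word in text.split():
        yield word, 1


def wc_reduce(word, counts):
    yield sum(counts)


docs = [("d1", "the quick fox"), ("d2", "the lazy dog the end")]
result = run_mapreduce(docs, wc_map, wc_reduce)
print(result)
# {'dog': [1], 'end': [1], 'fox': [1], 'lazy': [1], 'quick': [1], 'the': [3]}
```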
MapReduce Architecture (2.0): Terminologies. MR App Master: the master for one job; manages the job; schedules its map and reduce tasks into containers (2.0 has no Task Trackers); asks the Resource Manager for resources.
MapReduce Architecture: MR App Master Map Task Reduce Task
MapReduce: Anatomy
MapReduce: Fault Tolerance. MRAppMaster failure: the Resource Manager restarts it (by default, at most 2 attempts). Map/Reduce task failure: the MRAppMaster requests resources and restarts the task (by default, at most 4 attempts).
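The per-task retry rule amounts to a bounded retry loop. In Hadoop the limits are configuration properties; as far as I know, `mapreduce.map.maxattempts` and `mapreduce.reduce.maxattempts` default to 4 and `mapreduce.am.max-attempts` defaults to 2. The sketch below is a hypothetical, in-process stand-in. In a real job, each retry would run in a fresh container, possibly on another node.

```python
def run_with_retries(task, max_attempts=4):
    """Run task(attempt) until it succeeds; give up after max_attempts tries."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt), attempt
        except Exception as exc:  # failed attempt; an AM would request a new container
            last_error = exc
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def flaky_map_task(attempt):
    # Hypothetical task: fails on its first two attempts, then succeeds.
    if attempt < 3:
        raise IOError(f"simulated failure on attempt {attempt}")
    return "map output"


print(run_with_retries(flaky_map_task))  # ('map output', 3)
```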
MapReduce: Backup Tasks. Problem: an unusually slow task (a straggler). Remedy: launch a copy of the same map/reduce task on a different machine; whichever copy finishes first is used.
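The "first copy wins" rule can be demonstrated with Python's `concurrent.futures`. Threads stand in for machines, and `time.sleep` stands in for work; the machine names and delays are hypothetical. This sketch only shows the race between primary and backup. It does not show how a real scheduler decides which tasks are stragglers. The slow copy's result is simply ignored here, and the executor still waits for it to finish on exit.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def map_task(machine, delay):
    # Hypothetical task body: the sleep stands in for real work.
    time.sleep(delay)
    return f"done on {machine}"


with ThreadPoolExecutor(max_workers=2) as pool:
    primary = pool.submit(map_task, "straggler-node", 0.5)  # unusually slow copy
    backup = pool.submit(map_task, "healthy-node", 0.05)    # backup on another machine
    done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
    winner = next(iter(done)).result()  # the first copy to finish is used

print(winner)  # done on healthy-node
```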
MapReduce: Applications. Distributed Grep
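In distributed grep, the map function emits a line if it matches a pattern, and the reduce function is the identity, copying matches to the output. The single-process sketch below keys each line by a hypothetical `(file, line number)` pair. The log contents, the `ERROR` pattern, and the helper names are illustrative assumptions.

```python
import re
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    groups = defaultdict(list)
    for k1, v1 in records:              # map phase
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)       # shuffle: group by intermediate key
    return {k2: list(reducer(k2, vs)) for k2, vs in sorted(groups.items())}


PATTERN = re.compile(r"ERROR")  # hypothetical search pattern


def grep_map(location, line):
    if PATTERN.search(line):
        yield location, line


def grep_reduce(location, lines):
    yield from lines  # identity reduce


log = [
    (("a.log", 1), "INFO start"),
    (("a.log", 2), "ERROR disk full"),
    (("b.log", 1), "ERROR timeout"),
]
matches = run_mapreduce(log, grep_map, grep_reduce)
print(matches)  # {('a.log', 2): ['ERROR disk full'], ('b.log', 1): ['ERROR timeout']}
```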
MapReduce: Applications. Count of URL Access Frequency
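To count URL access frequency, the map function reads request-log records and emits <URL, 1>. The reduce function adds all values for the same URL and emits the total. The sketch assumes a hypothetical log format of `METHOD URL STATUS` per line. The records and helper names are illustrative only.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    groups = defaultdict(list)
    for k1, v1 in records:              # map phase
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)       # shuffle: group by intermediate key
    return {k2: list(reducer(k2, vs)) for k2, vs in sorted(groups.items())}


def url_map(line_no, line):
    _, url, _ = line.split()  # hypothetical format: METHOD URL STATUS
    yield url, 1


def url_reduce(url, ones):
    yield sum(ones)


requests = list(enumerate([
    "GET /index.html 200",
    "GET /about.html 200",
    "GET /index.html 304",
]))
counts = run_mapreduce(requests, url_map, url_reduce)
print(counts)  # {'/about.html': [1], '/index.html': [2]}
```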
MapReduce: Applications. Inverted Index
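For an inverted index, the map function parses each document and emits <word, document ID> pairs. The reduce function receives all pairs for one word, sorts the document IDs, and emits <word, list(document ID)>. The documents and helper names below are hypothetical, and the tokenizer (lowercase, split on whitespace) is a deliberate simplification.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    groups = defaultdict(list)
    for k1, v1 in records:              # map phase
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)       # shuffle: group by intermediate key
    return {k2: list(reducer(k2, vs)) for k2, vs in sorted(groups.items())}


def index_map(doc_id, text):
    for word in set(text.lower().split()):  # emit each word once per document
        yield word, doc_id


def index_reduce(word, doc_ids):
    yield from sorted(set(doc_ids))  # sorted posting list for this word


docs = [("d1", "Hadoop YARN"), ("d2", "YARN MapReduce")]
index = run_mapreduce(docs, index_map, index_reduce)
print(index)  # {'hadoop': ['d1'], 'mapreduce': ['d2'], 'yarn': ['d1', 'd2']}
```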