Haduzilla - Building a Hadoop cluster with Debian preseed (黑肚龍: unattended, automated installation of a Hadoop cluster)

Presentation transcript:

Haduzilla - Building a Hadoop cluster with Debian preseed (黑肚龍: unattended, automated installation of a Hadoop cluster)
Jazz Wang (Yao-Tsung Wang)

2 WHO AM I? Who is this guy, Jazz? Speaker introduction:
– Jazz Yao-Tsung Wang (王耀聰), associate researcher at NCHC (National Center for High-performance Computing); master's degree from NCTU ECE (Electrical and Control Engineering), class of 89
– All the slides, references, and step-by-step instructions are available online
– FOSS end user: Debian/Ubuntu, Access Grid, Motion/VLC, Red5, Debian Router, DRBL/Clonezilla, Hadoop
– FOSS promoter: DRBL/Clonezilla, Partclone/Tuxboot, Hadoop Ecosystem
– FOSS developer (one with little follow-through): TRTC WSU, Hadoop4Win, Haduzilla, Ezilla

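3 Data Explosion!! The era of the "data explosion" began in 2007
Source: The Expanding Digital Universe, A Forecast of Worldwide Information Growth Through 2010, March 2007, An IDC White Paper - sponsored by EMC
IDC forecast that the digital universe in 2010 would be six times its 2006 size.
[Chart: size in EB by year, including forecast values]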

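4 Source: Extracting Value from Chaos, June 2011, An IDC White Paper - sponsored by EMC
Tracking the IDC figures over the years:
[Chart: size in EB by year, actual and forecast; surviving labels: 0.8 ZB, 1.2 ZB, 1.8 ZB]
The digital universe expanded about 1.6x each year!!
Is growth slowing because of the weak economy, or is it being held back by new technology?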
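The two growth figures on slides 3 and 4 can be cross-checked with a little arithmetic. The sketch below assumes smooth compound growth, which the slides do not state, and the function name is only for illustration.

```python
def annual_growth_factor(total_factor: float, years: int) -> float:
    """Constant yearly multiplier that compounds to total_factor over `years` years."""
    if total_factor <= 0 or years <= 0:
        raise ValueError("total_factor and years must be positive")
    return total_factor ** (1.0 / years)


# Sixfold growth between 2006 and 2010 (four years), as in the IDC forecast.
factor = annual_growth_factor(6.0, 4)
print(f"{factor:.2f}x per year")
```

Under that assumption, a sixfold increase over four years works out to roughly 1.57x per year, which is in line with the "about 1.6x each year" on slide 4.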

5 Now we all need to store and process BIG DATA!!

6

7 Features of Hadoop
Vast amounts of data: capable of STORING and PROCESSING vast amounts of data.
Cost efficiency: runs on large clusters built from commodity PC hardware.
Parallel performance: with the help of the distributed file system (HDFS), Hadoop delivers fast responses.
Robustness: when a node fails, backup copies of the data are retrieved and computing resources are redeployed automatically; computing and storage resources can be added or removed without shutting down the entire system.

8 Which companies are powered by Hadoop?
Yahoo is currently the key contributor.
IBM and Google teach Hadoop in universities …
The New York Times used 100 Amazon EC2 instances and a Hadoop application to process 4TB of raw image TIFF data (stored in S3) into 11 million finished PDFs in the space of 24 hours at a computation cost of about $240 (not including bandwidth).
Other users: Facebook, Twitter
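The New York Times figures quoted on this slide can be turned into unit costs. This is only a back-of-the-envelope sketch: it assumes all 100 instances ran for the full 24 hours, which the slide does not say, and the function name is made up for the example.

```python
def job_economics(instances, hours, total_cost_usd, outputs):
    """Derive per-instance-hour figures from the totals quoted on the slide."""
    instance_hours = instances * hours
    return {
        "instance_hours": instance_hours,
        "usd_per_instance_hour": total_cost_usd / instance_hours,
        "outputs_per_instance_hour": outputs / instance_hours,
    }


stats = job_economics(instances=100, hours=24, total_cost_usd=240, outputs=11_000_000)
for key, value in stats.items():
    print(f"{key}: {value:,.2f}")
```

Under that assumption the job used 2,400 instance-hours, at about $0.10 per instance-hour.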

9 Hadoop in production
February 19, 2008: Yahoo! Launches World's Largest Hadoop Production Application

10 You can store and process BIG DATA on a large cluster!!

Common method to deploy a cluster in labs
1. Set up one template machine
2. Clone it to multiple machines
3. Configure settings
4. Install a job scheduler
5. Run benchmarks

Challenges of the common method in labs
– How to upgrade software?
– How to add a new user account?
– How to keep configuration synchronized?
– How to share user data?
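The configuration synchronization problem can be made concrete with a small check. The sketch below is not from the talk: it assumes each node's configuration file has already been collected as text, and the node names and contents are invented for the example.

```python
import hashlib


def find_drifted_nodes(template: str, node_configs: dict) -> list:
    """Return the names of nodes whose config text differs from the template."""
    reference = hashlib.sha256(template.encode("utf-8")).hexdigest()
    return sorted(
        name
        for name, text in node_configs.items()
        if hashlib.sha256(text.encode("utf-8")).hexdigest() != reference
    )


# Hypothetical data: three cloned nodes, one of which was edited by hand later.
template = "replication=3\nheap=1000m\n"
collected = {
    "node01": "replication=3\nheap=1000m\n",
    "node02": "replication=2\nheap=1000m\n",
    "node03": "replication=3\nheap=1000m\n",
}
print(find_drifted_nodes(template, collected))
```

With cloned disks, every node keeps its own copy of each file, so this kind of drift has to be detected and repaired node by node.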

How to deploy Nodes ?!

Source: Deploying hadoop with smartfrog

Source: Deploying hadoop with smartfrog

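If you need to deploy in the cloud, try Puppet
To deploy Hadoop and similar software on Amazon EC2, consider Puppet. The operating system is already installed from the virtual machine template, so only a disk-based ("diskful") approach is possible!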

17 Can I install just ONE server to deploy a Hadoop cluster?

Yes, use DRBL to deploy Hadoop
New Debian packages need to be built:
– drbl-hadoop: mounts the local disk for HDFS and MapReduce (svn co)
– hadoop-register: for multiuser registration and an SSH client (svn co)
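Why a local disk matters: DRBL clients take their root file system from the server over NFS, so HDFS block data and MapReduce scratch space are better kept on each node's own disk. The sketch below shows what pointing Hadoop at such a disk might look like. It is not taken from drbl-hadoop: the mount point is hypothetical, and the property names `dfs.data.dir` and `mapred.local.dir` are the ones used by Hadoop 0.20/1.x, a version the slides do not name.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties: dict) -> str:
    """Render a Hadoop-style <configuration> document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


LOCAL_DISK = "/mnt/local-disk"  # hypothetical mount point of the node's own disk

hdfs_site = hadoop_site_xml({"dfs.data.dir": f"{LOCAL_DISK}/dfs/data"})
mapred_site = hadoop_site_xml({"mapred.local.dir": f"{LOCAL_DISK}/mapred/local"})
print(hdfs_site)
print(mapred_site)
```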

About hadoop.nchc.org.tw
– DRBL server x 1 node (hadoop)
– DRBL client x 20 nodes (hadoop101~hadoop120)
– Powered by Debian Squeeze 6.0.4
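The client naming scheme is regular enough to generate. As a small sketch, the 20 client names can be produced and written in the one-hostname-per-line layout that Hadoop 0.20/1.x reads from its `conf/slaves` file. Whether this cluster's worker list was produced this way is an assumption, and the function name is only for illustration.

```python
def worker_hostnames(prefix: str = "hadoop", first: int = 101, count: int = 20) -> list:
    """Build hostnames prefix<first> .. prefix<first + count - 1>."""
    return [f"{prefix}{number}" for number in range(first, first + count)]


workers = worker_hostnames()
slaves_file = "\n".join(workers) + "\n"  # one hostname per line
print(slaves_file, end="")
```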

User registration page: Hadoop-Register
Powered by Zterm

System status monitoring: Ganglia
The free software Ganglia is used to collect the load status of the cluster.

DRBL + Hadoop = Haduzilla
System architecture of Haduzilla (黑肚龍)

23 Can you help me to deploy my own multiuser hadoop cluster like hadoop.nchc.org.tw ?

In 2009, I released the DRBL-Hadoop Live CD
Old video:
Download:

25 But I want it installed to disks for production …. What should I do ?

On 11 Feb 2011, 4$ shared how to use preseed!
Source: thanks to 4$ for sharing how to automate the installation of Debian 6.0

1st, we install the base system of Debian GNU/Linux with the Debian Installer and preseed. According to i/squeeze/preseed.cfg, it will:
(1) install the base packages of Debian
(2) install DRBL, JVM, Hadoop, etc.
(3) run the late_command script
[Diagram: Debian netinst CD installing the Linux kernel, kernel modules, GNU libc, and boot loader]
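The three preseed steps can be sketched as a small generator for the relevant preseed lines. This is not the content of the preseed.cfg referenced on the slide. The keys `tasksel/first`, `pkgsel/include`, and `preseed/late_command` are standard Debian Installer preseed keys, but the package names and the script path below are placeholders chosen for the example.

```python
def preseed_fragment(packages: list, late_command: str) -> str:
    """Render the preseed lines that select packages and set the late_command."""
    lines = [
        # (1) base packages: install the standard system task
        "tasksel tasksel/first multiselect standard",
        # (2) extra packages to install on top of the base system
        "d-i pkgsel/include string " + " ".join(packages),
        # (3) command run at the end of the installation
        "d-i preseed/late_command string " + late_command,
    ]
    return "\n".join(lines) + "\n"


# Placeholder values, not the ones used by Haduzilla.
PACKAGES = ["openssh-server", "drbl", "hadoop"]
LATE_COMMAND = "in-target sh /root/first-boot-setup.sh"  # hypothetical script path

print(preseed_fragment(PACKAGES, LATE_COMMAND), end="")
```

`in-target` runs the command inside the freshly installed system rather than in the installer environment, which is why it is commonly used in late_command.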

After the reboot, the DRBL package is already installed, and the rc.local script configures the machine as a DRBL server. Many services are needed: SSHD, DHCPD, TFTPD, NFS server, NIS server, YP server...
The DRBL server is based on existing open source software, so keep hacking!
[Diagram: DRBL server (network booting: DHCPD, TFTPD, NFS; account management: YP, NIS; SSHD; Bash, Perl) and Hadoop server (JVM, Hadoop, Apache, Ganglia) on top of the Linux kernel, kernel modules, GNU libc, and boot loader]

The rc.local script will run "drblsrv" and "drblpush". Afterwards there will be pxelinux, vmlinuz-pxe, and initrd-pxe in TFTPROOT, and a different set of configuration files (e.g. hostname) for each DRBL client in NFSROOT.
[Diagram: server stack with DHCPD, TFTPD, NFS, YP, NIS, and SSHD; boot files under TFTPD; per-client config files under NFS]

3rd, we enable the PXE function in the BIOS configuration of each client.

While booting, PXE requests an IP address from DHCPD.

Each client receives its own address from DHCPD (IP 1, IP 2, IP 3, IP 4).
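For the DHCP step, the server has to hand each client an address and tell it where to fetch its boot loader. The sketch below generates a configuration fragment in ISC dhcpd syntax, which is assumed here rather than stated on the slides. The MAC addresses, IP addresses, and server address are invented, and a real dhcpd.conf also needs a subnet declaration.

```python
def dhcpd_pxe_fragment(server_ip: str, clients: dict) -> str:
    """Render next-server/filename plus one fixed-address host block per client."""
    lines = [
        f"next-server {server_ip};",  # TFTP server the PXE client should contact
        'filename "pxelinux.0";',     # boot loader to download over TFTP
        "",
    ]
    for name, (mac, ip) in clients.items():
        lines += [
            f"host {name} {{",
            f"  hardware ethernet {mac};",
            f"  fixed-address {ip};",
            "}",
        ]
    return "\n".join(lines) + "\n"


# Hypothetical clients: hostname -> (MAC address, IP address)
CLIENTS = {
    "hadoop101": ("00:11:22:33:44:01", "192.168.1.101"),
    "hadoop102": ("00:11:22:33:44:02", "192.168.1.102"),
}
print(dhcpd_pxe_fragment("192.168.1.254", CLIENTS), end="")
```

Binding each MAC address to a fixed IP keeps a client's address, and therefore its per-client configuration, stable across reboots.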

After PXE gets its IP address, it downloads the boot files (pxelinux, vmlinuz-pxe, initrd-pxe) from TFTPD.
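Once pxelinux itself has been downloaded, it looks for its configuration file on the TFTP server under a series of names derived from the client's MAC and IP address. The sketch below lists that search order for one client. The addresses are invented, the optional UUID-based name that PXELINUX tries first is left out, and whether Haduzilla relies on per-client files or only on `default` is not stated in the slides.

```python
import ipaddress


def pxelinux_config_candidates(mac: str, ip: str) -> list:
    """List the pxelinux.cfg/ file names tried for a client, in order."""
    # "01" is the ARP hardware type for Ethernet, followed by the MAC in lowercase.
    mac_name = "01-" + mac.lower().replace(":", "-")
    # The IPv4 address as 8 uppercase hex digits, then shortened one digit at a time.
    hex_ip = f"{int(ipaddress.IPv4Address(ip)):08X}"
    names = [mac_name]
    names += [hex_ip[:length] for length in range(8, 0, -1)]
    names.append("default")
    return [f"pxelinux.cfg/{name}" for name in names]


for path in pxelinux_config_candidates("00:11:22:33:44:01", "192.168.1.101"):
    print(path)
```

The shortened hex prefixes let one file cover a whole address range, so a single configuration can serve every client in a subnet.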

[Diagram: each of the four clients (IP 1 to IP 4) now holds its own copy of pxelinux, vmlinuz, and initrd, downloaded from TFTPD]

After the boot files are downloaded, scripts in initrd-pxe configure the NFSROOT for each compute node.

[Diagram: each client gets its own configuration files over NFS: Config. 1, Config. 2, Config. 3, Config. 4]

Applications and services (JVM, Hadoop, SSHD) are also deployed from the DRBL server to each compute node via NFS.

With the help of NIS and YP, you can log in to each compute node from an SSH client with the same ID and password stored on the DRBL server!

Demo
Jazz Wang (Yao-Tsung Wang)

[Diagram: demo network setup: WAN, Debian netinst CD, tap0, eth0:1, eth0, iptables]

These slides may be distributed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Taiwan license.

Questions?
Slides:
Jazz Wang (Yao-Tsung Wang)