When the Penguin Dragon (DRBL) Meets the Little Flying Elephant (Hadoop): DRBL-Hadoop. Jazz Wang (Yao-Tsung Wang), jazz@nchc.org.tw

1 When the Penguin Dragon (DRBL) Meets the Little Flying Elephant (Hadoop): DRBL-Hadoop Jazz Wang Yao-Tsung Wang

2 Programmer vs. System Admin.

3 Agenda
PART 1 : What is Cluster Computing ? How to deploy a PC cluster ?
PART 2 : What are DRBL and Clonezilla ? Can DRBL help to deploy Hadoop ?
PART 3 : Live Demo of DRBL Live and Clonezilla Live

4 PART 1 : PC Cluster 101 Jazz Wang Yao-Tsung Wang

5 At first, we have a "4 + 1" PC cluster
The number of compute nodes had better be 2n. Diagram labels: Manage, Scheduler

6 Then, we connect the 5 PCs with a Gigabit Ethernet switch
GbE switch, 10/100/1000 Mbps. Add 1 NIC for WAN.

7 The 4 compute nodes communicate via the LAN switch. Only the manage node has Internet access, for security!
Diagram labels: Compute Nodes, LAN Switch, Manage Node, WAN

8 Basic System Setup for Cluster
Software stack on the compute nodes:
Messaging: MPICH
Account Mgnt.: SSHD, NIS, YP
GCC, GNU Libc, Bash, Perl
Linux Kernel, Kernel Module, Boot Loader

9 On the manage node, we need to install a scheduler and a network file system for sharing files with the compute nodes
Software stack on the manage node (extra items first):
Job Mgnt.: OpenPBS (extra)
File Sharing: NFS (extra)
Messaging: MPICH
Account Mgnt.: SSHD, NIS, YP
GCC, GNU Libc, Bash, Perl
Linux Kernel, Kernel Module, Boot Loader

10 Research topics about PC Cluster
Cluster Computing: System Architecture, Network Architecture, Storage Architecture, Process Architecture, System-level Middleware, Application-level Middleware
Parallel Computing: Programming, Shared Memory Programming, Distributed Memory Programming, Parallel Algorithms and Applications
Ref: Cluster Computing in the Classroom: Topics, Guidelines, and Experiences

11 Challenges of Cluster Computing
Hardware: Ethernet speed / PC density; power / cooling / heat; network and storage architecture
Software: job scheduler (cluster level); account management; file sharing / package management
Limitation: shared memory; global memory management

12 Common Method to deploy Cluster
1. Set up one template machine
2. Clone it to multiple machines
3. Configure settings
4. Install the job scheduler
5. Run benchmarks

13 Challenges of Common Method
How to add a new user account? How to upgrade software? How to share user data? How to keep configurations synchronized?

14 How to deploy Nodes ????

15 Advanced Methods to deploy Cluster
SSI (Single System Image): multiple PCs act as a single computing resource
Image-based: homogeneous; e.g. SystemImager, OSCAR, Kadeploy
Package-based: heterogeneous; packages are easy to update and modify; e.g. FAI, DRBL
Other deploy tools: Rocks (RPM only); cfengine (configuration engine)

16 Comparison of Cluster Deploy Tools
Compared on: distribution support, diskless/systemless, type, node configuration tools, cluster management, database installation
SystemImager: ALL, Yes, Image, No
OSCAR: RPM-based
Kadeploy
FAI: Debian-based, Package

17 PART 2-1 : Hadoop Deployment Tool Jazz Wang Yao-Tsung Wang

18-26 Source: Deploying hadoop with smartfrog

27 PART 2-2 : DRBL (企鵝龍) and Clonezilla (再生龍): A Word from Our Sponsor Jazz Wang Yao-Tsung Wang

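28 What is DRBL (企鵝龍)? Diskless Remote Boot in Linux
The network is cheap; people's time is what is expensive.
Put simply, DRBL replaces the hard-disk ribbon cable with a network cable: every student's computer connects over the network to a single server.
Diagram: Diskfull PC = Diskless PC + Server (connected over the network)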

29 What is Clonezilla (再生龍)? Clone + zilla = Clonezilla. A bare-metal backup and restore tool
A free-software alternative to Norton Ghost
Modes: Disk to Disk, Disk to Image, Image to N Disks

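30 Reducing the management cost of IT education: we need a solution that turns complexity into simplicity!
High labor and time costs: one teacher maintains and manages many sets of equipment, and hands out or collects assignments while teaching.
High equipment maintenance costs: each machine must be handled and configured separately (about 40 per class), e.g. virus infections, environment settings, system operation problems, powering on and off, backup and restore.
Photo: a typical computer classroom in a Taiwanese elementary school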

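31 Balancing commercial software and knowledge education: children need to be able to take both knowledge and software with them!
High cost of commercial software licenses: what is learned at school also needs to be reviewed at home. Per school machine (average): 20,000; student home use (average): 40,000.
Learning knowledge and the rule of law: teach knowledge, and also teach respect, including respect for intellectual property rights.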

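32 Free software development at NCHC: a new choice for diversified IT teaching!
DRBL and Clonezilla were developed from our experience with PC clusters.
DRBL (Diskless Remote Boot in Linux): suited to converting an entire computer classroom into a pure free-software environment.
Clonezilla: suited to full system backup, bare-metal restore, and disaster recovery.
Free as in freedom, not free of charge: the freedom to distribute, modify, access, and use software. Zero cost is an added benefit.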

33 DRBL and Clonezilla
A powerful new tool for computer classroom management! ■ Estimates use 40 computers per class as the unit

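34 Estimated savings in software license costs: NT$ 98,595,000
Based on a license fee of NT$ 3,000 per machine for a certain proprietary commercial software product, with 35 computers per class (3000 * 35 * 939).
Benefits: large savings in software license fees; a lower piracy rate in Taiwan; a better image for Taiwan; lower management and maintenance costs; wider use of free software.
Roadmap: adoption of DRBL by educational institutions; high-speed computing research; data storage and backup; expansion to organizations nationwide.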
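The estimate above can be re-derived with a few lines of shell. This is only a sketch of the arithmetic: the function name `license_savings` is hypothetical, and the slide does not say what the factor 939 counts, so it is treated here as a plain multiplier.

```shell
#!/bin/sh
# Re-derive the estimate: fee per machine * machines per class * multiplier.
# Assumption: 939 is used exactly as written on the slide; its unit is not stated.
license_savings() {
    fee_per_machine=$1
    machines_per_class=$2
    multiplier=$3
    echo $((fee_per_machine * machines_per_class * multiplier))
}

license_savings 3000 35 939   # prints 98595000
```

Changing the second argument to 40 shows how the figure would move if the 40-machine class size used elsewhere in the deck were applied instead.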

35 PART 1-3 : How DRBL Boots Jazz Wang Yao-Tsung Wang

36 1st, we install the base system of GNU/Linux on the management node.
You can choose: Red Hat, Fedora, CentOS, Mandriva, Ubuntu, Debian, ...
Stack: Linux Kernel, Kernel Module, GNU Libc, Boot Loader

37 2nd, we install the DRBL package and configure the machine as a DRBL server.
Many services are needed: SSHD, DHCPD, TFTPD, NFS server, NIS server, YP server, ...
The DRBL server is built on existing open source software, and we keep hacking!
Stack: Network Booting (DHCPD, TFTPD, NFS); Account Mgnt. (NIS, YP); SSHD, Bash, Perl; Linux Kernel, Kernel Module, GNU Libc, Boot Loader

38 After running "drblsrv -i" and "drblpush -i", there will be pxelinux, vmlinuz-pxe, and initrd-pxe in TFTPROOT, and a different set of configuration files (e.g. hostname) for each compute node in NFSROOT.
Server stack: NFS, TFTPD, DHCPD, SSHD, NIS, YP; Linux Kernel, Kernel Module, GNU Libc, Boot Loader
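One way to confirm this step is to check that the three boot files are present. The following is a minimal sketch, not part of DRBL: `check_boot_files` is a hypothetical helper, the TFTP root path must be supplied by the caller, and because the file names on a real server may carry suffixes, matching is done by prefix. The demo uses a temporary directory with made-up file names rather than a real server.

```shell
#!/bin/sh
# Report which of the boot files named on the slide exist in a directory.
# Returns 0 when all three are found, 1 otherwise.
check_boot_files() {
    tftproot=$1
    missing=0
    for name in pxelinux vmlinuz-pxe initrd-pxe; do
        found=no
        # An unmatched glob expands to itself, so test each candidate explicitly.
        for candidate in "$tftproot"/"$name"*; do
            if [ -e "$candidate" ]; then
                found=yes
            fi
        done
        echo "$name: $found"
        if [ "$found" = no ]; then
            missing=1
        fi
    done
    return "$missing"
}

# Demo on a throwaway directory; the suffixed names are illustrative assumptions.
demo_dir=$(mktemp -d)
touch "$demo_dir/pxelinux.0" "$demo_dir/vmlinuz-pxe" "$demo_dir/initrd-pxe.img"
check_boot_files "$demo_dir"
rm -rf "$demo_dir"
```

On a real DRBL server the same function would be pointed at whatever directory TFTPD serves.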

39 3rd, we enable the PXE function in the BIOS configuration.
Diagram: each of the 4 compute nodes has BIOS with PXE; the server runs NFS, TFTPD, DHCPD, SSHD, NIS, YP and holds pxelinux, vmlinuz-pxe, initrd-pxe, and the per-node config files (e.g. hostname)

40 While booting, PXE will query DHCPD for an IP address.
Diagram: the 4 compute nodes (BIOS PXE) send requests to DHCPD on the server

41 While booting, PXE will query DHCPD for an IP address.
Diagram: each compute node receives its own address: IP 1, IP 2, IP 3, IP 4

42 After PXE gets its IP address, it will download the boot files from TFTPD.
Boot files on the server: pxelinux, vmlinuz-pxe, initrd-pxe

43 Diagram: the compute nodes (IP 1 to IP 4) download pxelinux, vmlinuz, and initrd from the server

44 After downloading the boot files, scripts in initrd-pxe will configure NFSROOT for each compute node.
Diagram: each compute node (IP 1 to IP 4) now holds pxelinux, vmlinuz, and initrd

45 Diagram: each compute node (IP 1 to IP 4) now has its own configuration (Config. 1 to Config. 4) on top of pxelinux, vmlinuz, and initrd

46 Applications and services will also be deployed to each compute node via NFS ...
Examples shown on the compute nodes: Bash, Perl, SSHD. DRBL server: NFS, TFTPD, DHCPD, SSHD, NIS, YP, Perl, Bash

47 With the help of NIS and YP, you can log in to each compute node with the same ID / password stored on the DRBL server!
Diagram: an SSH client connects to SSHD on a compute node. DRBL server: NFS, TFTPD, DHCPD, SSHD, NIS, YP

48 PART 2 -1: When the Penguin Dragon (DRBL) Meets the Little Flying Elephant (Hadoop) Jazz Wang Yao-Tsung Wang

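49 Deploying Hadoop with DRBL: still under development, packages still to be tidied up
drbl-hadoop: mounts the local hard disks for HDFS to use. svn co
hadoop-register: registration website and ssh applet. svn co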

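50 About hadoop.nchc.org.tw
DRBL server: 1 machine (hadoop), with enlarged /home and /tftpboot space.
DRBL clients: 19 machines (hadoop101~hadoop119)
Uses Cloudera's Debian packages
Uses the settings and init.d script from drbl-hadoop to assist deployment
Uses hadoop-register to provide user registration and an ssh applet interface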
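With sequential client names like these, routine checks are easy to script. The sketch below enumerates hadoop101 through hadoop119 and prints, as a dry run, one command per client; `client_hosts` is a hypothetical helper, and `ssh <host> hostname` is just one plausible check, assuming the accounts shared through NIS allow login.

```shell
#!/bin/sh
# Print the 19 DRBL client host names listed on the slide.
client_hosts() {
    i=101
    while [ "$i" -le 119 ]; do
        echo "hadoop$i"
        i=$((i + 1))
    done
}

# Dry run: show the command that would be used, without connecting anywhere.
for host in $(client_hosts); do
    echo "ssh $host hostname"
done
```

Dropping the `echo` in the loop would turn the dry run into a real connectivity check.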

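51 Lessons Learned
Benefit of the Cloudera packages: init.d scripts start and stop the name node, data node, job tracker, and task tracker
Creating a large number of accounts: can be done with the built-in DRBL command /opt/drbl/sbin/drbl-useradd
Default HDFS home directory for each user: loop over the users, switch to each one, and run hadoop fs -mkdir tmp
Setting each user's HDFS permissions: loop over the users, switch to each one, and run hadoop dfs -chown $(id -un) /user/$(id -un)
HDFS uses /var/lib/hadoop/cache/hadoop/dfs
MapReduce uses /var/lib/hadoop/cache/hadoop/mapred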
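The two per-user loops can be sketched as a dry run that prints the commands instead of executing them. The assumptions here: the user names are hypothetical placeholders, `su - <user> -c` stands in for "switch user", and HDFS home directories live under /user/<name>, which is Hadoop's default. Whether an ordinary user is allowed to run the chown depends on the HDFS permission setup, so the printed commands are a starting point rather than a tested procedure.

```shell
#!/bin/sh
# Dry run: print the per-user HDFS setup commands for the given user names.
hdfs_user_setup_cmds() {
    for user in "$@"; do
        # A relative path is resolved under the user's HDFS home directory.
        printf "su - %s -c 'hadoop fs -mkdir tmp'\n" "$user"
        # Hand ownership of the home directory to the user.
        printf "su - %s -c 'hadoop fs -chown %s /user/%s'\n" "$user" "$user" "$user"
    done
}

# Hypothetical user names, for illustration only.
hdfs_user_setup_cmds alice bob
```

Piping the output to `sh` on the DRBL server would run the commands for real, once the list has been reviewed.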

52 PART 2 -2: Live Demo Jazz Wang Yao-Tsung Wang

53 WAN DRBL-Live

54 Demo with DRBL-Live CD
1. Boot the server with the DRBL-Live CD
2. Download the DRBL-Hadoop script
3. Follow the steps

55 Questions? Jazz Wang Yao-Tsung Wang

