When the Penguin Dragon (DRBL) Meets the Little Flying Elephant (Hadoop): DRBL-Hadoop
Jazz Wang (Yao-Tsung Wang)
Programmer vs. System Admin. Source: content/uploads/2007/08/programmer.jpg
Agenda (PART 1, PART 2, PART 3)
● What is cluster computing?
● How to deploy a PC cluster?
● What are DRBL and Clonezilla?
● Can DRBL help to deploy Hadoop?
● Live demo of DRBL Live and Clonezilla Live
PART 1 : PC Cluster 101
At first, we have a “ ” PC cluster. The node count should preferably be 2^n. (Diagram labels: Manage, Scheduler)
Then, we connect 5 PCs with a Gigabit Ethernet switch (10/100/1000 Mbps) and add 1 NIC for the WAN. (Diagram labels: GiE Switch, WAN)
The 4 compute nodes communicate via the LAN switch. Only the manage node has Internet access, for security. (Diagram labels: LAN Switch, WAN, Compute Nodes, Manage Node)
Basic System Setup for Cluster (Compute Nodes): Linux Kernel, Kernel Module, GNU Libc, Boot Loader; Bash, Perl, GCC, SSHD; MPICH (Messaging); YP / NIS (Account Mgnt.)
On the manage node, we need to install a scheduler and a network file system for sharing files with the compute nodes. Extra components on top of the compute-node stack: OpenPBS (Job Mgnt.), NFS (File Sharing).
Research topics about PC Cluster
Ref: Cluster Computing in the Classroom: Topics, Guidelines, and Experiences
● System Architecture
● Process Architecture
● Network Architecture
● Storage Architecture
● Parallel Computing
● Parallel Algorithms and Applications
● Shared Memory Programming
● Distributed Memory Programming
● System-level Middleware
● Application-level Middleware
Challenges of Cluster Computing
● Hardware
– Ethernet Speed / PC Density
– Power / Cooling / Heat
– Network and Storage Architecture
● Software
– Job Scheduler (Cluster level)
– Account Management
– File Sharing / Package Management
● Limitation
– Shared Memory
– Global Memory Management
Common Method to Deploy a Cluster
1. Set up one template machine
2. Clone it to multiple machines
3. Configure settings
4. Install the job scheduler
5. Run benchmarks
Challenges of the Common Method
● How to upgrade software?
● How to add a new user account?
● How to keep configuration synchronized?
● How to share user data?
How to deploy Nodes ????
Advanced Methods to Deploy a Cluster
● SSI (Single System Image)
– Multiple PCs as a single computing resource
– Image-based
● homogeneous
● ex. SystemImager, OSCAR, Kadeploy
– Package-based
● heterogeneous
● easy to update and modify packages
● ex. FAI, DRBL
● Other deployment tools
– Rocks : RPM only
– cfengine : configuration engine
Comparison of Cluster Deploy Tools
PART 2-1 : Hadoop Deployment Tool
Source: Deploying hadoop with smartfrog
PART 2-2 : Commercial Break: DRBL and Clonezilla
What is DRBL?
● Diskless Remote Boot in Linux
● The network is cheap; people's time is what is expensive.
● Put simply, DRBL...
– replaces hard-disk cables with network cables
– connects every student PC over the network to a single server
(Diagram labels: Server, Diskless PC, Diskfull PC)
What is Clonezilla?
● Clone + zilla = Clonezilla
● A bare-metal backup and restore tool
● A free-software alternative to Norton Ghost
(Diagram labels: Disk to Disk, Disk to Image, Image to N Disks)
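Reducing the Management Cost of IT Education
A typical elementary-school computer classroom in Taiwan:
● Settings must be handled machine by machine (about 40 PCs per class), e.g. virus infections, environment settings, system operation problems, power on/off, backup and restore
● One teacher maintains and manages multiple sets of equipment
● Homework is assigned or collected while teaching
● High labor and time cost
● High equipment maintenance cost
A solution that turns the complex into the simple is needed!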
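Balancing Commercial Software and Knowledge Education
Both knowledge and software should be something children can "take with them"!
● Learn at school, review at home
– School PCs: 20,000 each (average)
– Students' home PCs: 40,000 each (average)
● Teach knowledge, and also teach respect
– Respect for intellectual property rights
– High cost of commercial software licenses
● Learning both knowledge and the rule of law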
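Free Software Development at NCHC
DRBL & Clonezilla, developed from PC cluster experience: a new choice for diversified IT teaching!
● DRBL (Diskless Remote Boot in Linux): suited to converting a whole computer classroom into a pure free-software environment
● Clonezilla: suited to full system backup, bare-metal restore, or disaster recovery
● Free as in freedom, not as in price: the freedom to distribute, modify, access, and use software. Being free of charge is an added value.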
DRBL and Clonezilla: a powerful new tool for computer classroom management!
■ Estimates use a class of 40 computers as the unit
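● Saves huge software license fees
● Lowers Taiwan's piracy rate
● Improves Taiwan's image
● Lowers management and maintenance cost
● Drives the adoption of free software
Estimated savings on software license cost: NT. 98,595,000, assuming a license fee of 3000 per machine for a certain proprietary commercial software and 35 computers per class (3000*35*939)
(Diagram labels: Educational institutions adopting DRBL, High-speed computing research, Data storage and backup, Expanding to institutions nationwide)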
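The estimate above is a plain product of three factors, which can be re-checked with a few lines of Python. This is only a sketch: the function and parameter names are made up for illustration, and the slide does not say what the factor 939 counts, so it is passed through as an unnamed multiplier.

```python
def license_savings(fee_per_machine, machines_per_class, multiplier):
    """Multiply out the slide's estimate: fee * machines per class * multiplier."""
    return fee_per_machine * machines_per_class * multiplier


if __name__ == "__main__":
    # Figures taken from the slide: 3000 * 35 * 939
    total = license_savings(3000, 35, 939)
    print(f"NT. {total:,}")
```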
PART 1-3 : How DRBL Boots
1st, we install the base system of GNU/Linux on the management node. You can choose: Red Hat, Fedora, CentOS, Mandriva, Ubuntu, Debian, ... (Diagram labels: Linux Kernel, Kernel Module, GNU Libc, Boot Loader)
2nd, we install the DRBL package and configure the node as a DRBL server. Many services are needed: SSHD, DHCPD, TFTPD, NFS server, NIS server, YP server, ... The DRBL server is built on existing open source software, and the hacking continues! (Diagram labels: DHCPD and TFTPD for Network Booting, NFS, YP / NIS for Account Mgnt., SSHD, Bash, Perl)
After running “drblsrv -i” and “drblpush -i”, there will be pxelinux, vmlinuz-pxe, and initrd-pxe in TFTPROOT, and a separate set of configuration files (e.g. hostname) for each compute node in NFSROOT.
3rd, we enable the PXE function in the BIOS configuration of each compute node.
While booting, PXE requests an IP address from DHCPD. Each of the four compute nodes receives its own address (IP 1 to IP 4).
After PXE gets its IP address, it downloads the boot files (pxelinux, vmlinuz-pxe, initrd-pxe) from TFTPD. Each compute node then holds its own copy of pxelinux, vmlinuz, and initrd.
After downloading the boot files, scripts in initrd-pxe configure NFSROOT for each compute node. Each node gets its own configuration files (Config. 1 to Config. 4) via NFS.
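The boot sequence on these slides (DHCPD hands out an address, TFTPD serves the boot files, NFS provides a per-node configuration) can be modelled as a toy simulation. This is a rough sketch, not DRBL code: the class, the MAC strings, the address pool, and the `NFSROOT/<ip>` path layout are all hypothetical; only the three boot file names come from the slides.

```python
from dataclasses import dataclass, field

# Boot file names as shown on the slides.
BOOT_FILES = ("pxelinux", "vmlinuz-pxe", "initrd-pxe")


@dataclass
class DrblServer:
    ip_pool: list                               # addresses DHCPD may hand out (hypothetical)
    leases: dict = field(default_factory=dict)  # MAC address -> assigned IP

    def dhcp_offer(self, mac):
        """DHCPD step: give each MAC a stable address from the pool."""
        if mac not in self.leases:
            if len(self.leases) >= len(self.ip_pool):
                raise RuntimeError("address pool exhausted")
            self.leases[mac] = self.ip_pool[len(self.leases)]
        return self.leases[mac]

    def tftp_fetch(self):
        """TFTPD step: every client downloads the same boot files."""
        return list(BOOT_FILES)

    def nfs_root_for(self, ip):
        """NFS step: each client gets its own configuration directory (made-up layout)."""
        return f"NFSROOT/{ip}"


def pxe_boot(server, mac):
    """Walk one compute node through the DHCP -> TFTP -> NFS sequence."""
    ip = server.dhcp_offer(mac)
    files = server.tftp_fetch()
    return {"ip": ip, "boot_files": files, "nfs_root": server.nfs_root_for(ip)}


if __name__ == "__main__":
    server = DrblServer(ip_pool=[f"192.168.1.{n}" for n in range(1, 5)])
    for mac in ("node-a", "node-b", "node-c", "node-d"):
        print(pxe_boot(server, mac))
```

The point of the model is that the boot files are shared by all nodes while the address and the NFS configuration differ per node, which is what lets one server image drive many diskless clients.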
Applications and services (e.g. Bash, Perl, SSHD) are also deployed to each compute node via NFS.
With the help of NIS and YP, you can log in to each compute node with the same ID / password stored on the DRBL server! (Diagram labels: SSH Client, SSHD on each node)
PART 2-1 : When the Penguin Dragon (DRBL) Meets the Little Flying Elephant (Hadoop)
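Deploying Hadoop with DRBL
● Still under development; packages are yet to be put together
● drbl-hadoop
– mounts the local hard disks for HDFS to use
– svn co
● hadoop-register
– registration website and ssh applet
– svn co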
About hadoop.nchc.org.tw
● DRBL Server: 1 machine (hadoop), with enlarged /home and /tftpboot space
● DRBL Clients: 19 machines (hadoop101~hadoop119)
● Uses Cloudera's Debian packages
● Uses the configuration and init.d scripts from drbl-hadoop to assist deployment
● Uses hadoop-register to provide user registration and an ssh applet interface
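Because the 19 client names follow a fixed pattern, a script that needs to visit every client can generate the list instead of hard-coding it. A minimal sketch, with a made-up helper name; only the `hadoop101` to `hadoop119` range comes from the slide.

```python
def client_hostnames(prefix="hadoop", first=101, last=119):
    """Return the DRBL client host names, e.g. hadoop101 ... hadoop119."""
    return [f"{prefix}{n}" for n in range(first, last + 1)]


if __name__ == "__main__":
    for host in client_hostnames():
        print(host)
```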
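Lessons Learned
● Advantage of the Cloudera packages: init.d scripts start and stop the services
– name node, data node, job tracker, task tracker
● Creating a large number of accounts
– can be done with a built-in DRBL command: /opt/drbl/sbin/drbl-useradd
● Default HDFS home directory for each user
– loop over the users, switch to each one, and run hadoop fs -mkdir tmp
● Setting each user's HDFS permissions
– loop over the users, switch to each one, and run hadoop dfs -chown $(id -un) /user/$(id -un)
● HDFS uses /var/lib/hadoop/cache/hadoop/dfs
● MapReduce uses /var/lib/hadoop/cache/hadoop/mapred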
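The two per-user loops above can be sketched as a dry run that only prints the commands it would execute. Several things here are assumptions rather than the author's script: the helper name, the example user names, the use of `sudo -u` to switch users, and the `/user/<name>` layout for HDFS home directories. The `hadoop fs -mkdir` and `hadoop fs -chown` subcommands themselves are standard Hadoop shell commands.

```python
import shlex


def hdfs_setup_commands(users, hdfs_home_root="/user"):
    """Return the command lines that would prepare HDFS for each user (dry run)."""
    commands = []
    for user in users:
        home = f"{hdfs_home_root}/{user}"
        # Run as the user, so the relative path "tmp" resolves
        # inside that user's HDFS home directory.
        commands.append(["sudo", "-u", user, "hadoop", "fs", "-mkdir", "tmp"])
        # Hand ownership of the home directory to the user
        # (this step is expected to need HDFS superuser rights).
        commands.append(["hadoop", "fs", "-chown", user, home])
    # Quote every argument so the printed lines are safe to paste into a shell.
    return [" ".join(shlex.quote(arg) for arg in argv) for argv in commands]


if __name__ == "__main__":
    # "alice" and "bob" are placeholder account names.
    for line in hdfs_setup_commands(["alice", "bob"]):
        print(line)
```

Printing the commands first makes it easy to review them before piping the output to a shell on the DRBL server.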
PART 2-2 : Live Demo
DRBL-Live (Diagram label: WAN)
Demo with DRBL-Live CD
1. Boot the server with the DRBL-Live CD
2. Download the DRBL-Hadoop script and follow the steps
Questions?