Building a Multi-user Hadoop Cluster using DRBL & Clonezilla
  Operational experience from hadoop.nchc.org.tw
  Jazz Wang (Yao-Tsung Wang), jazz@nchc.org.tw
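WHO AM I? Who is this guy? JAZZ?
  Speaker: Yao-Tsung Wang, Associate Researcher, NCHC (National Center for High-performance Computing); M.S. in Electrical and Control Engineering, National Chiao Tung University
  jazz@nchc.org.tw
  All slides, references and step-by-step instructions are available online. Cloud information changes quickly, so please be kind to the planet and avoid unnecessary printing.
  FLOSS user: Debian/Ubuntu, Access Grid, Motion/VLC, Red5, Debian Router, DRBL/Clonezilla, Hadoop
  Developer (short on follow-through): DRBL/Clonezilla, Hadoop Ecosystem
  Advocate: DRBL/Clonezilla, Partclone/Tuxboot, Hadoop Ecosystem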
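Building a Multi-user Hadoop Cluster with DRBL
  PART 1: Introduction to cluster deployment tools: DRBL and SmartFrog
  PART 2: Experience deploying a data mining platform with DRBL - PaaS: Data Processing (DRBL-Hadoop)
  PART 3: Migrating from a small disk to a larger disk with Clonezilla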
PART 1: Introduction to SSI and CMT: DRBL & SmartFrog
  An introduction to cluster deployment tools: DRBL and SmartFrog
  Jazz Wang (Yao-Tsung Wang), jazz@nchc.org.tw
Programmer vs. System Admin
  Source: http://www.funnyjunksite.com/wp-content/uploads/2007/08/programmer.jpg
  Source: http://www.sysadminday.com/images/people/136-3697.JPG
The traditional way to deploy a computer cluster in a lab
  1. Set up one template machine
  2. Clone it to multiple machines
  3. Configure settings
  4. Install the job scheduler
  5. Run the benchmark
Cluster management problems commonly faced with the traditional approach
  Add a new user account?
  Upgrade software?
  How to share user data?
  Configuration synchronization
What if you need to deploy a cluster of more than 4,000 machines?
Advanced cluster deployment tools
  SSI (Single System Image): multiple PCs act as a single computing resource
  Image-based: suited to homogeneous nodes; e.g. SystemImager, OSCAR, Kadeploy
  Package-based: suited to heterogeneous nodes; packages are easy to update and modify; e.g. FAI, DRBL
  Other deployment tools:
    Rocks: RPM only
    cfengine: configuration engine
Comparison of cluster deployment tools
  Columns: Distribution support | Diskless/Systemless | Type | Node configuration tools | Cluster management | Database installation
  SystemImager: ALL | Yes | Image | No
  OSCAR: RPM-based
  Kadeploy:
  DRBL: Package (type)
  FAI: Debian-based
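Introduction to NCHC's DRBL (Diskless Remote Boot in Linux)
  Networks are cheap; people's time is what is expensive.
  Put simply, DRBL replaces the hard disk cable with a network cable.
  Every student PC connects over the network to a single server.
  Diagram: diskfull PCs compared with one server plus diskless PCs
  Source: http://www.mren.com.tw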
SmartFrog from HP Labs
  Source: Deploying hadoop with smartfrog, http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf
Installation and Booting Procedure of DRBL
  How DRBL boots
  Jazz Wang (Yao-Tsung Wang), jazz@nchc.org.tw
1st, we install the base system of GNU/Linux on the management node.
  You can choose: Red Hat, Fedora, CentOS, Mandriva, Ubuntu, Debian, ...
  Base system: Linux kernel, kernel modules, GNU libc, boot loader
2nd, we install the DRBL package and configure the node as a DRBL server.
  Many services are needed: SSHD, DHCPD, TFTPD, NFS server, NIS server, YP server, ...
  Network booting: DHCPD, TFTPD, NFS
  Account management: NIS, YP
  Also: SSHD, Bash, Perl
  The DRBL server is built on existing open source software. Keep hacking!
After running “drblsrv -i” and “drblpush -i”, TFTPROOT contains pxelinux, vmlinuz-pxe and initrd-pxe, and NFSROOT contains a separate set of configuration files (e.g. hostname) for each compute node.
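As a quick sanity check after the two commands above, one could confirm that the boot files ended up in the TFTP root. The following is a minimal sketch, not part of DRBL: the function name `missing_boot_files` is hypothetical, the file names are taken from the slide as written (the names on a real server may differ, for example by suffix), and `/tftpboot` is only an assumed location for TFTPROOT.

```python
from pathlib import Path

# File names as they appear on the slide; real installs may name them differently.
EXPECTED_BOOT_FILES = ("pxelinux", "vmlinuz-pxe", "initrd-pxe")


def missing_boot_files(tftproot):
    """Return the expected boot files that are absent from tftproot."""
    root = Path(tftproot)
    return [name for name in EXPECTED_BOOT_FILES if not (root / name).exists()]


if __name__ == "__main__":
    # "/tftpboot" is an assumed TFTPROOT path, used only for illustration.
    missing = missing_boot_files("/tftpboot")
    if missing:
        print("missing boot files:", ", ".join(missing))
    else:
        print("all boot files present")
```

An empty result means every file named on the slide was found; anything else suggests the DRBL setup step did not complete.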
3rd, we enable the PXE function in the BIOS configuration of each compute node.
While booting, PXE on each compute node requests an IP address from DHCPD, and each node is assigned its own address (IP 1 to IP 4 in the diagram).
After PXE gets its IP address, it downloads the boot files (pxelinux, vmlinuz-pxe, initrd-pxe) from TFTPD.
After downloading the boot files, scripts in initrd-pxe configure the NFSROOT for each compute node, so every node mounts its own configuration (Config. 1 to Config. 4 in the diagram).
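The boot sequence in the preceding slides (DHCP lease, TFTP download, per-node NFSROOT) can be summarized with a toy model. This is only a sketch of the order of events, not real DHCP, TFTP or NFS code: the class `DrblServer`, the function `boot`, the `192.168.1.` address prefix and the `/tftpboot/nodes/<ip>` directory layout are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Boot files the slides list under TFTPROOT.
BOOT_FILES = ("pxelinux", "vmlinuz-pxe", "initrd-pxe")


@dataclass
class DrblServer:
    """Toy stand-in for the DRBL server's DHCPD, TFTPD and NFS roles."""
    subnet: str = "192.168.1."          # hypothetical address prefix
    next_host: int = 1
    leases: dict = field(default_factory=dict)

    def dhcp_offer(self, mac):
        # Step 1: one IP address per client, stable across reboots.
        if mac not in self.leases:
            self.leases[mac] = f"{self.subnet}{self.next_host}"
            self.next_host += 1
        return self.leases[mac]

    def tftp_fetch(self):
        # Step 2: every client downloads the same boot files.
        return BOOT_FILES

    def nfsroot_for(self, ip):
        # Step 3: each client gets its own configuration directory
        # (the path layout here is an assumption).
        return f"/tftpboot/nodes/{ip}"


def boot(server, mac):
    """Walk one compute node through the three boot steps."""
    ip = server.dhcp_offer(mac)
    files = server.tftp_fetch()
    return {"ip": ip, "files": files, "nfsroot": server.nfsroot_for(ip)}


if __name__ == "__main__":
    server = DrblServer()
    for mac in ("00:00:00:00:00:01", "00:00:00:00:00:02"):
        print(boot(server, mac))
```

The point of the model is that the boot files are shared by all nodes while the NFSROOT is per node, which is what lets identical diskless machines come up with different hostnames.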
Applications and services (e.g. Bash, Perl, SSHD) are also deployed from the DRBL server to each compute node via NFS.
With the help of NIS and YP, you can log in to each compute node over SSH with the same ID and password stored on the DRBL server.
PART 2: Building a Multi-user Hadoop Cluster using DRBL
  Experience deploying a data mining platform with DRBL
  Jazz Wang (Yao-Tsung Wang), jazz@nchc.org.tw
About hadoop.nchc.org.tw
  DRBL server: 1 machine (hadoop), with enlarged /home and /tftpboot space
  DRBL clients: 20 machines (hadoop101~hadoop120)
  Uses Cloudera's Debian packages
  Uses the drbl-hadoop configuration and init.d scripts to assist deployment
  Uses hadoop-register to provide user registration and an SSH applet interface
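The client naming scheme above (hadoop101 through hadoop120) is regular enough to generate, for instance when preparing a host list for the cluster. A minimal sketch follows; the function name `client_hostnames` and its parameters are hypothetical, and only the prefix and numbering come from the slide.

```python
def client_hostnames(prefix="hadoop", first=101, count=20):
    """Build the list of DRBL client hostnames in numeric order."""
    return [f"{prefix}{n}" for n in range(first, first + count)]


if __name__ == "__main__":
    names = client_hostnames()
    print(names[0], names[-1], len(names))  # prints: hadoop101 hadoop120 20
```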
DRBL + Hadoop = Haduzilla: system architecture
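Deploying Hadoop with DRBL
  Still under development; the packages have yet to be tidied up.
  drbl-hadoop: mounts the local disks for HDFS to use
    svn co http://trac.nchc.org.tw/pub/grid/drbl-hadoop-0.1/
  hadoop-register: registration website and SSH applet
    svn co http://trac.nchc.org.tw/pub/cloud/hadoop-register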
User registration page: Hadoop-Register
  Powered by ZTerm, http://zhouer.org/ZTerm/
System status monitoring: Ganglia
  The free software Ganglia is used to collect the load status of the cluster.
  http://ganglia.sourceforge.net/
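Lessons Learned
  Benefit of the Cloudera packages: init.d scripts start and stop the name node, data node, job tracker and task tracker.
  Creating accounts in bulk: can be done with the built-in DRBL command /opt/drbl/sbin/drbl-useradd
  Default HDFS home directory for each user: loop over the users, switch to each one, and run hadoop fs -mkdir tmp
  Setting each user's HDFS permissions: loop over the users, switch to each one, and run hadoop dfs -chown $(id -un) /user/$(id -un)
  HDFS uses /var/lib/hadoop/cache/hadoop/dfs
  MapReduce uses /var/lib/hadoop/cache/hadoop/mapred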
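The per-user loop described above can be sketched as a small command generator. This is an illustration under stated assumptions, not the script used on hadoop.nchc.org.tw: the function name `hdfs_setup_commands` is hypothetical, `sudo -u` is an assumed way of switching users, HDFS home directories are assumed to live under /user/<name>, and the chown is assumed to be issued by an account with HDFS superuser rights. The code only builds the command lines; it does not run them.

```python
import shlex


def hdfs_setup_commands(users, run_as=("sudo", "-u")):
    """Return the argv lists that prepare an HDFS home for each user."""
    commands = []
    for user in users:
        # Creating a relative path as the user also creates that user's home.
        commands.append([*run_as, user, "hadoop", "fs", "-mkdir", "tmp"])
        # Assumption: run by the HDFS superuser, home is /user/<name>.
        commands.append(["hadoop", "dfs", "-chown", user, f"/user/{user}"])
    return commands


if __name__ == "__main__":
    # The account names below are placeholders.
    for argv in hdfs_setup_commands(["user01", "user02"]):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before passing each argv list to something like `subprocess.run` on the DRBL server.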
Prototype boot CD: DRBL-Hadoop Live CD
  Old video: http://www.youtube.com/watch?hl=en&v=Ix4WigGvE_A
  Download: http://drbl-hadoop.sf.net
PART 3: Hadoop Cluster Disk Migration using Clonezilla
  Migrating from a small disk to a larger disk with Clonezilla
  Jazz Wang (Yao-Tsung Wang), jazz@nchc.org.tw
What is Clonezilla?
  Clone + zilla = Clonezilla
  A bare-metal backup and restore tool
  A free software alternative to Norton Ghost
  http://clonezilla.nchc.org.tw , http://clonezilla.org
  Modes: Disk to Disk, Disk to Image, Image to N Disks
A Clonezilla feature you can use too!
  How do I clone a smaller disk onto a larger disk?
  http://drbl.nchc.org.tw/fine-print.php?path=./faq/1_DRBL_common/34_resize.faq#34_resize.faq
These slides may be distributed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Taiwan license.
  http://creativecommons.org/licenses/by-nc-sa/3.0/tw/
Questions?
  Slides: http://trac.nchc.org.tw/cloud
  Jazz Wang (Yao-Tsung Wang), jazz@nchc.org.tw