Building Multi-user Hadoop Cluster using DRBL & Clonezilla


1 Building Multi-user Hadoop Cluster using DRBL & Clonezilla
Sharing operational experience from hadoop.nchc.org.tw. Jazz Wang (Yao-Tsung Wang)

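2 WHO AM I? Who is this guy? JAZZ?
Speaker: Yao-Tsung Wang, Associate Researcher at NCHC (National Center for High-performance Computing); M.S. in Electrical and Control Engineering, NCTU.
All slides, references, and step-by-step instructions are available online. Cloud information changes quickly, so please be kind to the earth and avoid unnecessary printing.
Developer (short on execution): DRBL/Clonezilla, Hadoop Ecosystem.
FLOSS user: Debian/Ubuntu, Access Grid, Motion/VLC, Red5, Debian Router, DRBL/Clonezilla, Hadoop.
Promoter: DRBL/Clonezilla, Partclone/Tuxboot, Hadoop Ecosystem.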

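3 Building a multi-user Hadoop cluster with DRBL
PART 1: Introduction to cluster deployment tools: DRBL and SmartFrog
PART 2: Experience deploying a data mining platform with DRBL - PaaS: Data Processing (DRBL-Hadoop)
PART 3: Migrating from a small disk to a larger disk with Clonezilla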

4 Introduction to SSI and CMT: DRBL & SmartFrog
Introduction to cluster deployment tools: DRBL and SmartFrog. Jazz Wang (Yao-Tsung Wang)

5 Programmer v.s. System Admin.

6 The traditional lab approach to deploying a computer cluster
1. Set up one template machine
2. Clone it to multiple machines
3. Configure settings and the job scheduler
4. Install the benchmark
5. Run the benchmark

7 Cluster management problems commonly faced with the traditional approach
Add a new user account? Upgrade software? How to share user data? Configuration synchronization?

8 What if you had to deploy a cluster of more than 4,000 machines?

9 Advanced cluster deployment tools: SSI (Single System Image)
SSI: multiple PCs presented as a single computing resource.
Image-based: homogeneous nodes. Ex. SystemImager, OSCAR, Kadeploy.
Package-based: heterogeneous nodes; packages are easy to update and modify. Ex. FAI, DRBL.
Other deployment tools: Rocks (RPM only), cfengine (configuration engine).

10 Comparison of cluster deployment tools
Columns: Distribution Support, Diskless/Systemless, Type, Node configuration tools, Cluster management, Database installation.
Rows: System Imager (ALL, Yes, Image, No); OSCAR (RPM-based); Kadeploy; DRBL (Package); FAI (Debian-Based).

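11 Introduction to NCHC's DRBL (企鵝龍)
DRBL stands for Diskless Remote Boot in Linux. Networks are cheap; people's time is what is expensive. Put simply, DRBL replaces the hard-disk ribbon cable with a network cable: every student computer connects over the network to a single server. Diagram labels: Diskfull PC, Diskless PC, Server.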

12 HP Labs' SmartFrog (聰明蛙)
Source: Deploying hadoop with smartfrog

13 Source: Deploying hadoop with smartfrog

14 Installation and Booting Procedure of DRBL
How DRBL boots. Jazz Wang (Yao-Tsung Wang)

15 1st, we install a base GNU/Linux system on the management node.
You can choose Red Hat, Fedora, CentOS, Mandriva, Ubuntu, Debian, ... Diagram labels: Linux Kernel, Kernel Module, GNU Libc, Boot Loader.

16 2nd, we install the DRBL package and configure the machine as a DRBL server.
Many services are needed: SSHD, DHCPD, TFTPD, NFS server, NIS server, YP server, ... The DRBL server is built on existing open source software, and we keep hacking! Diagram labels: DHCPD, TFTPD, NFS, Bash, Perl, Network Booting, YP, NIS, Account Mgmt., SSHD, Linux Kernel, Kernel Module, GNU Libc, Boot Loader.

17 After running "drblsrv -i" and "drblpush -i"
TFTPROOT will contain pxelinux, vmlinuz-pxe, and initrd-pxe, and NFSROOT will contain a separate set of configuration files (for example, the hostname) for each compute node. Diagram labels: NFS, TFTPD, DHCPD, SSHD, NIS, YP, pxelinux, vmlinuz-pxe, initrd-pxe, Config. Files, Linux Kernel, Kernel Module, GNU Libc, Boot Loader.

18 3rd, we enable the PXE function in each compute node's BIOS configuration.
Diagram labels: four compute nodes (BIOS PXE); server with NFS, TFTPD, DHCPD, SSHD, NIS, YP, pxelinux, vmlinuz-pxe, initrd-pxe, Config. Files.

19 While booting, PXE requests an IP address from DHCPD.
Diagram labels: four compute nodes (BIOS PXE); server with NFS, TFTPD, DHCPD, SSHD, NIS, YP, pxelinux, vmlinuz-pxe, initrd-pxe, Config. Files.

20 While booting, PXE requests an IP address from DHCPD.
Diagram labels: the four compute nodes now hold IP 1, IP 2, IP 3, and IP 4; server with NFS, TFTPD, DHCPD, SSHD, NIS, YP, pxelinux, vmlinuz-pxe, initrd-pxe, Config. Files.
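
The address assignment on this slide can be illustrated with a small generator for ISC dhcpd `host` stanzas. This is only a sketch of the kind of configuration involved, not what DRBL itself writes: the node names, subnet, MAC addresses, and server address below are all hypothetical.

```python
from ipaddress import IPv4Address


def dhcp_host_stanzas(macs, first_ip="192.168.1.101", prefix="node",
                      next_server="192.168.1.254", bootfile="pxelinux.0"):
    """Return ISC dhcpd 'host' stanzas that pin each MAC to a fixed IP.

    All defaults are made-up example values, not DRBL's real settings.
    """
    base = IPv4Address(first_ip)
    stanzas = []
    for i, mac in enumerate(macs):
        ip = base + i  # consecutive addresses, one per compute node
        stanzas.append(
            f"host {prefix}{i + 1} {{\n"
            f"  hardware ethernet {mac.lower()};\n"
            f"  fixed-address {ip};\n"
            f"  next-server {next_server};\n"   # TFTP server to boot from
            f'  filename "{bootfile}";\n'       # boot loader fetched over TFTP
            f"}}"
        )
    return "\n".join(stanzas)


# Example: two hypothetical compute nodes
print(dhcp_host_stanzas(["AA:BB:CC:00:00:01", "AA:BB:CC:00:00:02"]))
```

Pinning each MAC address to a fixed IP is one way to keep a node's identity stable across reboots, which matters when each node has its own configuration files on the server.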

21 After PXE gets its IP address, it downloads the boot files from TFTPD.
Diagram labels: boot files pxelinux, vmlinuz-pxe, initrd-pxe; server with NFS, TFTPD, DHCPD, SSHD, NIS, YP, Config. Files.
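
Once pxelinux is downloaded, it looks for its own configuration file on the TFTP server. The sketch below lists the file names PXELINUX tries under `pxelinux.cfg/`, following the lookup order described in the Syslinux documentation (the optional client-UUID lookup is omitted). The MAC and IP values are hypothetical, and how DRBL lays out these files is not stated in the slides.

```python
from ipaddress import IPv4Address


def pxelinux_config_candidates(mac, ip):
    """List the config file names PXELINUX tries, in order, under pxelinux.cfg/."""
    # 1. ARP hardware type 01 (Ethernet) plus the MAC, lowercase, dash-separated
    mac_name = "01-" + mac.lower().replace(":", "-")
    # 2. The client IPv4 address as 8 uppercase hex digits,
    #    then the same string with one more trailing digit removed each time
    hex_ip = f"{int(IPv4Address(ip)):08X}"
    by_ip = [hex_ip[:n] for n in range(8, 0, -1)]
    # 3. Finally the shared fallback file
    return [mac_name] + by_ip + ["default"]


# Example with a hypothetical client
for name in pxelinux_config_candidates("88:99:AA:BB:CC:DD", "192.168.1.101"):
    print(name)
```

Because the lookup falls back from the most specific name to `default`, a server can hold one shared boot configuration and override it only for the nodes that need something different.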

22 Diagram labels: pxelinux, vmlinuz, and initrd being transferred to the nodes at IP 1, IP 2, IP 3, and IP 4; server with NFS, TFTPD, DHCPD, SSHD, NIS, YP, Config. Files, pxelinux, vmlinuz-pxe, initrd-pxe.

23 After downloading the boot files, scripts in initrd-pxe configure NFSROOT for each compute node.
Diagram labels: each of the four nodes (IP 1 to IP 4) now holds pxelinux, vmlinuz, and initrd; server with NFS, TFTPD, DHCPD, SSHD, NIS, YP, Config. Files, pxelinux, vmlinuz-pxe, initrd-pxe.

24 Diagram labels: each of the four nodes (IP 1 to IP 4) holds pxelinux, vmlinuz, initrd, and its own configuration (Config. 1 to Config. 4); server with NFS, TFTPD, DHCPD, SSHD, NIS, YP, Config. Files, pxelinux, vmlinuz-pxe, initrd-pxe.

25 Applications and services are also deployed to each compute node via NFS.
Diagram labels: Bash, Perl, and SSHD on the compute nodes; DRBL server with NFS, TFTPD, DHCPD, SSHD, NIS, YP, Perl, Bash.

26 With the help of NIS and YP, you can log in to each compute node
with the same ID and password stored on the DRBL server. Diagram labels: SSH client, SSHD on the compute node; DRBL server with NFS, TFTPD, DHCPD, SSHD, NIS, YP.

27 Building Multi-user Hadoop Cluster using DRBL
Experience deploying a data mining platform with DRBL. Jazz Wang (Yao-Tsung Wang)

28 About hadoop.nchc.org.tw
DRBL server: 1 machine (hadoop), with enlarged /home and /tftpboot space.
DRBL clients: 20 machines (hadoop101~hadoop120).
Uses Cloudera's Debian packages.
Uses the drbl-hadoop configuration and init.d script to help with deployment.
Uses hadoop-register to provide user registration and an ssh applet interface.

29 DRBL + Hadoop = Haduzilla (黑肚龍): system architecture

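30 Deploying Hadoop with DRBL
Still under development; the packages have yet to be tidied up.
drbl-hadoop: mounts local disks for HDFS to use. svn co
hadoop-register: registration website and ssh applet. svn co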

31 User registration page: Hadoop-Register
Powered by Zterm

32 System status monitoring: Ganglia
The free software Ganglia is used to collect the load status of the cluster.

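33 Lessons learned
Benefit of the Cloudera packages: init.d scripts start and stop the name node, data node, job tracker, and task tracker.
Creating many accounts: can be done with DRBL's built-in command /opt/drbl/sbin/drbl-useradd
Default HDFS home directory for each user: loop over the users, switch to each one, and run hadoop fs -mkdir tmp
Setting each user's HDFS permissions: loop over the users, switch to each one, and run hadoop dfs -chown $(id -un) /user/$(id -un)
HDFS uses /var/lib/hadoop/cache/hadoop/dfs
MapReduce uses /var/lib/hadoop/cache/hadoop/mapred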
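
The two per-user loops described on this slide can be sketched as a dry-run command generator. This is a minimal illustration, not the script used on hadoop.nchc.org.tw: it assumes that `sudo -u` is how the user switch is done, that HDFS home directories live under `/user/<username>` (the HDFS default), and the user names are hypothetical. It only builds the command strings and does not execute anything.

```python
import shlex


def hdfs_setup_commands(users, home_root="/user"):
    """Build (but do not run) the per-user HDFS setup commands."""
    cmds = []
    for user in users:
        # Step 1: as the user, create "tmp"; a relative HDFS path
        # resolves under that user's HDFS home directory.
        cmds.append(shlex.join(
            ["sudo", "-u", user, "hadoop", "fs", "-mkdir", "tmp"]))
        # Step 2: set ownership of the user's HDFS home directory.
        cmds.append(shlex.join(
            ["sudo", "-u", user, "hadoop", "dfs", "-chown",
             user, f"{home_root}/{user}"]))
    return cmds


# Example with two hypothetical accounts
for line in hdfs_setup_commands(["alice", "bob"]):
    print(line)
```

Printing the commands first makes it easy to review them before piping the output to a shell on the server.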

34 Prototype boot CD: DRBL-Hadoop Live CD
Old video: Download:

35 Hadoop Cluster disk migration using Clonezilla
Migrating from a small disk to a larger disk with Clonezilla. Jazz Wang (Yao-Tsung Wang)

36 What is Clonezilla (再生龍)?
Clone + zilla = Clonezilla. A bare-metal backup and restore tool, and a free-software alternative to Norton Ghost. Modes: Disk to Disk, Disk to Image, Image to N Disks.

37 A Clonezilla feature you can use too! How do I copy a smaller disk onto a larger one?

38 Attribution-Noncommercial-Share Alike 3.0 Taiwan
These slides may be distributed under this Creative Commons license.

39 Questions?
Slides - http://trac.nchc.org.tw/cloud Jazz Wang (Yao-Tsung Wang)

