Haduzilla - Building a Hadoop cluster with Debian preseed (黑肚龍: unattended automatic installation of a Hadoop cluster)


1 Haduzilla - Building a Hadoop cluster with Debian preseed (黑肚龍: unattended automatic installation of a Hadoop cluster). Jazz Wang (Yao-Tsung Wang), jazz@nchc.org.tw

2 WHO AM I? (Who is this guy?) JAZZ?
About the speaker:
– Jazz Yao-Tsung Wang @ NCHC / NCTU ECE Master
– Associate Researcher at NCHC / M.S., NCTU Electrical and Control Engineering (class of ROC year 89)
– jazz@nchc.org.tw
All slides, references, and step-by-step instructions are online: http://trac.nchc.org.tw/cloud
FOSS End User: Debian/Ubuntu, Access Grid, Motion/VLC, Red5, Debian Router, DRBL/Clonezilla, Hadoop
FOSS Promoter: DRBL/Clonezilla, Partclone/Tuxboot, Hadoop Ecosystem
FOSS Developer (admittedly short on follow-through): TRTC WSU / Hadoop4Win / Haduzilla / Ezilla

3 Data Explosion!! The "data explosion" era that began in 2007
Source: The Expanding Digital Universe, A Forecast of Worldwide Information Growth Through 2010, March 2007, An IDC White Paper - sponsored by EMC, http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
In 2007, IDC forecast that the digital universe would grow sixfold by 2010 (compared with 2006)!
– 2006: 161 EB
– 2010: 988 EB (forecast)

4 Source: Extracting Value from Chaos, June 2011, An IDC White Paper - sponsored by EMC, http://www.emc.com/collateral/about/news/idc-emc-digital-universe-2011-infographic.pdf
Tracking IDC's figures over the years:
– 2006: 161 EB
– 2007: 281 EB
– 2008: 487 EB
– 2009: 800 EB (0.8 ZB)
– 2010: 988 EB (forecast); 1200 EB (1.2 ZB)
– 2011: 1773 EB (forecast); 1800 EB (1.8 ZB)
The Digital Universe expanded about 1.6x each year!! Is growth slowing because of the weak economy, or is it being held back by new technologies?
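To check the "about 1.6x each year" claim, the short Python sketch below computes each year's growth factor and the compound (geometric-mean) annual growth from 2006 to 2011. It uses only the numbers quoted on the two slides above and assumes the values not marked as forecasts are IDC's reported figures.

```python
# IDC figures quoted on the slides, in exabytes (EB). Values not marked as
# forecasts are assumed to be IDC's reported figures.
REPORTED_EB = {2006: 161, 2007: 281, 2008: 487, 2009: 800, 2010: 1200, 2011: 1800}
FORECAST_2010_EB = 988  # the 2007 white paper's forecast for 2010


def yearly_ratios(series):
    """Return {year: size[year] / size[previous year]} for consecutive years."""
    years = sorted(series)
    return {year: series[year] / series[prev] for prev, year in zip(years, years[1:])}


def compound_growth(series):
    """Average annual growth factor (geometric mean) from first to last year."""
    first, last = min(series), max(series)
    return (series[last] / series[first]) ** (1 / (last - first))


if __name__ == "__main__":
    for year, ratio in yearly_ratios(REPORTED_EB).items():
        print(f"{year}: x{ratio:.2f}")
    print(f"2006-2011 average: x{compound_growth(REPORTED_EB):.2f} per year")
    print(f"2007 forecast, 2010 vs 2006: x{FORECAST_2010_EB / REPORTED_EB[2006]:.1f}")
```

With these figures the 2006 to 2011 average works out to roughly 1.62x per year, consistent with the slide, and the 2007 forecast for 2010 corresponds to about a 6.1x increase over 2006.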

5 Now we all need to store and process BIG DATA!!


7 Features of Hadoop...
– Vast Amounts of Data: the capability to STORE and PROCESS vast amounts of data.
– Cost Efficiency: runs on large clusters built from commodity hardware, i.e. ordinary PCs.
– Parallel Performance: with the help of its distributed file system, HDFS, Hadoop responds quickly.
– Robustness: when a node fails, backup copies of its data are used and computing resources are redeployed automatically and immediately; computing and storage resources can also be added or removed without shutting down the entire system.

8 Which companies are powered by Hadoop??
Yahoo is currently the key contributor. IBM and Google teach Hadoop in universities … http://www.google.com/intl/en/press/pressrel/20071008_ibm_univ.html
The New York Times used 100 Amazon EC2 instances and a Hadoop application to process 4 TB of raw TIFF image data (stored in S3) into 11 million finished PDFs in the space of 24 hours, at a computation cost of about $240 (not including bandwidth) – from http://en.wikipedia.org/wiki/Hadoop
http://wiki.apache.org/hadoop/AmazonEC2
http://wiki.apache.org/hadoop/PoweredBy
Facebook, Twitter
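The New York Times figures above can be turned into per-unit numbers with simple arithmetic. This sketch assumes that all 100 instances ran for the full 24 hours and that 1 TB = 1,000,000 MB; neither assumption is stated on the slide.

```python
# Figures quoted on the slide for the New York Times TIFF-to-PDF conversion.
INSTANCES = 100
HOURS = 24               # assumed: every instance ran for the whole 24 hours
COST_USD = 240           # computation only, bandwidth excluded
PDFS = 11_000_000
RAW_TB = 4
MB_PER_TB = 1_000_000    # assumed decimal units

instance_hours = INSTANCES * HOURS
cost_per_instance_hour = COST_USD / instance_hours
pdfs_per_instance_hour = PDFS / instance_hours
raw_mb_per_pdf = RAW_TB * MB_PER_TB / PDFS

if __name__ == "__main__":
    print(f"instance-hours:         {instance_hours}")
    print(f"cost per instance-hour: ${cost_per_instance_hour:.2f}")
    print(f"PDFs per instance-hour: {pdfs_per_instance_hour:.0f}")
    print(f"raw TIFF per PDF:       {raw_mb_per_pdf:.2f} MB")
```

Under those assumptions the job used 2,400 instance-hours, which works out to about $0.10 and roughly 4,580 PDFs per instance-hour, with about 0.36 MB of raw TIFF data per finished PDF.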

9 Hadoop in production... (Hadoop applications in commercial operation)
February 19, 2008: Yahoo! Launches World's Largest Hadoop Production Application, http://developer.yahoo.net/blogs/hadoop/2008/02/yahoo-worlds-largest-production-hadoop.html

10 You can store and process BIG DATA with a large cluster!!

11 Common method to deploy a cluster in labs
1. Set up one template machine
2. Clone it to multiple machines
3. Configure settings
4. Install a job scheduler
5. Run benchmarks

12 Challenges of the common method in labs
– Upgrading software?
– Adding new user accounts?
– Configuration synchronization?
– How to share user data?

13 How to deploy 4000+ nodes?!

14 Source: Deploying hadoop with smartfrog http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf

15 Source: Deploying hadoop with smartfrog http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf

16 If you need to deploy in the cloud, try Puppet
To deploy Hadoop and similar software on Amazon EC2, consider Puppet: since the operating system is already installed from a virtual machine template, only a "diskful" (installed-to-disk) approach is possible!
https://github.com/hstack/puppet
http://hstack.org/hstack-automated-deployment-using-puppet/
http://www.cioinsight.com/images/stories/slideshows/SS_142511_CIO_TechSkills/

17 Can I install just ONE server to deploy a Hadoop cluster?

18 Yes, use DRBL to deploy Hadoop. We need to build new Debian packages:
– drbl-hadoop: mounts the local disk for HDFS and MapReduce
  svn co http://trac.nchc.org.tw/pub/grid/drbl-hadoop-0.1/
– hadoop-register: for multi-user registration and the SSH client
  svn co http://trac.nchc.org.tw/pub/cloud/hadoop-register
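Once a node's local disk is mounted, Hadoop still has to be told to keep HDFS blocks and MapReduce scratch data there. The sketch below generates the relevant settings, assuming Hadoop 0.20/1.x property names (dfs.data.dir in hdfs-site.xml, mapred.local.dir in mapred-site.xml) and a hypothetical mount point /hadoop. It is not taken from the drbl-hadoop package and does not show how that package actually configures Hadoop.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties):
    """Build a Hadoop *-site.xml document from a {name: value} mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


# Hypothetical mount point of each compute node's local disk.
LOCAL_DISK = "/hadoop"

# Hadoop 0.20/1.x property names: HDFS block storage and MapReduce scratch space.
hdfs_site = hadoop_site_xml({"dfs.data.dir": f"{LOCAL_DISK}/dfs/data"})
mapred_site = hadoop_site_xml({"mapred.local.dir": f"{LOCAL_DISK}/mapred/local"})

if __name__ == "__main__":
    print(hdfs_site)
    print(mapred_site)
```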

19 About hadoop.nchc.org.tw
– DRBL Server x 1 node (hadoop)
– DRBL Client x 20 nodes (hadoop101~hadoop120)
– Powered by Debian Squeeze 6.0.4
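With clients named hadoop101 to hadoop120, the worker list that the Hadoop 0.20/1.x start scripts read from conf/slaves can be generated rather than typed by hand. A minimal sketch; treating all 20 DRBL clients as Hadoop workers, with the server "hadoop" running the master daemons, is an assumption rather than something the slide states.

```python
def client_hostnames(prefix="hadoop", first=101, count=20):
    """Hostnames of the DRBL clients: hadoop101 ... hadoop120 by default."""
    return [f"{prefix}{n}" for n in range(first, first + count)]


def slaves_file(hosts):
    """Contents of a Hadoop 0.20/1.x conf/slaves file: one worker per line."""
    return "".join(f"{host}\n" for host in hosts)


if __name__ == "__main__":
    # Assumption: the DRBL server "hadoop" runs the master daemons and every
    # DRBL client runs a DataNode and a TaskTracker.
    print(slaves_file(client_hostnames()), end="")
```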

20 User registration page: Hadoop-Register, powered by ZTerm http://zhouer.org/ZTerm/

21 System status monitoring: Ganglia. The free software Ganglia collects the load status of the computer cluster. http://ganglia.sourceforge.net/

22 DRBL + Hadoop = Haduzilla: Haduzilla (黑肚龍) system architecture

23 Can you help me deploy my own multi-user Hadoop cluster like hadoop.nchc.org.tw?

24 In 2009, I released the DRBL-Hadoop Live CD.
Old video: http://www.youtube.com/watch?hl=en&v=Ix4WigGvE_A
Download: http://drbl-hadoop.sf.net

25 But I want it installed to disk for production... What should I do?

26 On 11 Feb 2011, 4$ shared how to use preseed! Source: http://fourdollars.blogspot.tw/2011/02/4-debian-60.html
Many thanks to 4$ for sharing how to automate the installation of Debian 6.0.

27 1st, we install the base system of Debian GNU/Linux with the Debian Installer and preseed... According to http://example.com/d-i/squeeze/preseed.cfg, it will (1) install the base packages of Debian 6.0.4, (2) install DRBL, JVM, Hadoop, etc., and (3) run the late_command script.
(Diagram: the Debian Netinst CD installs the Linux kernel, kernel modules, GNU libc and boot loader.)
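A preseed file is a list of debconf answers, one per line, in the form "owner question type value"; the installer reads it (for example via the url= boot parameter) instead of asking questions interactively. The parser below is a hypothetical helper for inspecting such a file. The sample's package list and the /root/late.sh hook are made up for illustration and are not the contents of the author's preseed.cfg.

```python
def parse_preseed(text):
    """Parse preseed lines of the form '<owner> <question> <type> <value>'.

    Blank lines and '#' comments are skipped; a trailing backslash joins a
    line with the next one. Returns a list of (owner, question, type, value).
    """
    entries = []
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            # Continuation: keep collecting until a line without a backslash.
            pending += line[:-1].rstrip() + " "
            continue
        line = pending + line
        pending = ""
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 3)
        if len(parts) < 3:
            raise ValueError(f"malformed preseed line: {line!r}")
        owner, question, qtype = parts[:3]
        value = parts[3] if len(parts) == 4 else ""
        entries.append((owner, question, qtype, value))
    return entries


# Hypothetical sample: the package list and /root/late.sh are illustrative only.
SAMPLE = """\
# Base system locale
d-i debian-installer/locale string en_US
# Extra packages to install with the base system
d-i pkgsel/include string openssh-server \\
    openjdk-6-jdk
# Command run at the end of installation, inside the installed system
d-i preseed/late_command string in-target /bin/sh /root/late.sh
"""

if __name__ == "__main__":
    for owner, question, qtype, value in parse_preseed(SAMPLE):
        print(f"{owner:4} {question:32} {qtype:7} {value}")
```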

28 After reboot, the DRBL package is installed, and the rc.local script configures the machine as a DRBL server. Lots of services are needed: SSHD, DHCPD, TFTPD, NFS server, NIS server, YP server...
DRBL Server: built on existing open source, and keep hacking!
(Diagram: DRBL Server layer: DHCPD, TFTPD, NFS (network booting); YP, NIS (account management); Bash, Perl, SSHD. Hadoop Server layer: JVM, Hadoop, Apache, Ganglia. Both run on the Linux kernel, kernel modules, GNU libc and boot loader.)

29 The rc.local script runs “drblsrv” and “drblpush”. Afterwards, TFTPROOT holds pxelinux, vmlinuz-pxe and initrd-pxe, and NFSROOT holds different configuration files (e.g. hostname) for each DRBL client.
(Diagram: DRBL server with DHCPD, TFTPD, NFS, YP, NIS and SSHD; TFTPROOT holds pxelinux, vmlinuz-pxe, initrd-pxe; NFSROOT holds per-client config files, e.g. hostname.)
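The per-client configuration files in NFSROOT can be pictured as one small directory per client IP. This sketch writes a hostname file for each client under <nfsroot>/nodes/<IP>/etc/. That layout is loosely modelled on DRBL's per-client directories and is an assumption, not DRBL's exact structure; the IPs and hostnames in the example are hypothetical.

```python
import ipaddress
import tempfile
from pathlib import Path


def write_client_configs(nfsroot, clients):
    """Write <nfsroot>/nodes/<IP>/etc/hostname for every client.

    `clients` maps a client IP address (str) to its hostname. The directory
    layout is a hypothetical stand-in for DRBL's per-client directories.
    Returns the written paths, ordered by IP address.
    """
    written = []
    for ip, hostname in sorted(clients.items(), key=lambda kv: ipaddress.ip_address(kv[0])):
        etc_dir = Path(nfsroot) / "nodes" / ip / "etc"
        etc_dir.mkdir(parents=True, exist_ok=True)
        hostname_file = etc_dir / "hostname"
        hostname_file.write_text(hostname + "\n")
        written.append(hostname_file)
    return written


if __name__ == "__main__":
    # Hypothetical addresses for four compute nodes.
    clients = {f"192.168.100.{n}": f"hadoop{100 + n}" for n in range(1, 5)}
    with tempfile.TemporaryDirectory() as root:
        for path in write_client_configs(root, clients):
            print(path.relative_to(root), "->", path.read_text().strip())
```

Sorting by ipaddress.ip_address rather than by string keeps 10.0.0.2 ahead of 10.0.0.10.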

30 3rd, we enable the PXE function in the BIOS configuration.
(Diagram: BIOS PXE on the compute node, next to the DRBL server with its TFTPROOT and NFSROOT files.)

31 While booting, PXE queries DHCPD for an IP address.
(Diagram: BIOS PXE contacting DHCPD on the DRBL server.)

32 While booting, PXE queries DHCPD for an IP address.
(Diagram: DHCPD hands out IP 1, IP 2, IP 3 and IP 4 to four compute nodes.)
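Conceptually, DHCPD hands each booting client an address from a configured pool and keeps giving the same client the same address. The toy allocator below models only that MAC-to-IP mapping; it is not DHCPD and ignores lease times and the DISCOVER/OFFER/REQUEST/ACK exchange. The subnet, pool size and MAC addresses are hypothetical.

```python
import ipaddress


class LeasePool:
    """Toy model of DHCP address assignment: each MAC gets a stable IP.

    Only the MAC -> IP mapping is modelled; lease times, renewals and the
    DISCOVER/OFFER/REQUEST/ACK exchange of a real DHCP server are ignored.
    """

    def __init__(self, first_ip, count):
        start = ipaddress.IPv4Address(first_ip)
        self._free = [start + i for i in range(count)]
        self._leases = {}

    def request(self, mac):
        """Return the IP for `mac`, assigning the next free one on first request."""
        mac = mac.lower()
        if mac not in self._leases:
            if not self._free:
                raise RuntimeError("address pool exhausted")
            self._leases[mac] = self._free.pop(0)
        return str(self._leases[mac])


if __name__ == "__main__":
    pool = LeasePool("192.168.100.1", 20)  # hypothetical subnet and pool size
    for i in range(1, 5):                   # four booting compute nodes
        mac = f"00:11:22:33:44:{i:02x}"     # hypothetical MAC addresses
        print(mac, "->", pool.request(mac))
```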

33 After PXE gets its IP address, it downloads the boot files (pxelinux, vmlinuz-pxe, initrd-pxe) from TFTPD.
(Diagram: nodes IP 1 to IP 4 fetching the boot files from TFTPD on the DRBL server.)
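After pxelinux itself has been fetched over TFTP, it looks for its configuration file under pxelinux.cfg/ on the TFTP server, trying names derived from the client's MAC and IP address before falling back to "default". This sketch reproduces that generic PXELINUX search order as described in the SYSLINUX documentation (newer versions try a UUID-based name first, which is omitted here); how DRBL names its own files is not shown on the slide.

```python
import ipaddress


def pxelinux_config_candidates(mac, ip):
    """Config file names PXELINUX tries under pxelinux.cfg/, in order.

    1. "01-" (ARP type for Ethernet) + the MAC in lowercase, dash-separated;
    2. the client IPv4 address as 8 uppercase hex digits, then that string
       with one trailing digit removed at a time;
    3. "default".
    The UUID-based name that newer PXELINUX versions try first is omitted.
    """
    names = ["01-" + mac.lower().replace(":", "-")]
    hex_ip = f"{int(ipaddress.IPv4Address(ip)):08X}"
    names += [hex_ip[:length] for length in range(8, 0, -1)]
    names.append("default")
    return ["pxelinux.cfg/" + name for name in names]


if __name__ == "__main__":
    for name in pxelinux_config_candidates("88:99:AA:BB:CC:DD", "192.168.2.91"):
        print(name)
```

The hex form lets one file such as pxelinux.cfg/C0A802 cover a whole range of client addresses.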

34 (Diagram: each of the four nodes, IP 1 to IP 4, receives pxelinux, vmlinuz and initrd from TFTPD.)

35 After downloading the boot files, scripts in initrd-pxe configure the NFSROOT for each compute node.
(Diagram: nodes IP 1 to IP 4, each booted with pxelinux, vmlinuz and initrd, mounting from NFS on the DRBL server.)

36 (Diagram: NFS serves Config. 1, Config. 2, Config. 3 and Config. 4, e.g. hostname, to nodes IP 1 to IP 4 respectively.)

37 Applications and services are also deployed to each compute node via NFS...
(Diagram: the DRBL server runs YP, NIS, DHCPD, TFTPD, NFS, Hadoop, JVM and SSHD; each of the four compute nodes runs JVM, Hadoop and SSHD.)

38 With the help of NIS and YP, you can log in to each compute node with the same ID/password stored on the DRBL server!
(Diagram: an SSH client connecting to SSHD on each compute node; the DRBL server runs DHCPD, TFTPD, NFS, SSHD, YP and NIS.)

39 Demo. Jazz Wang (Yao-Tsung Wang), jazz@nchc.org.tw

40 (Demo network diagram: WAN, Debian netinst CD, tap0, eth0:1, eth0, iptables.)

41 Attribution-Noncommercial-Share Alike 3.0 Taiwan http://creativecommons.org/licenses/by-nc-sa/3.0/tw/ These slides may be distributed under this Creative Commons license.

42 Questions? Slides: http://trac.nchc.org.tw/cloud
Jazz Wang (Yao-Tsung Wang), jazz@nchc.org.tw

