Unit 06: Cloud Distributed Hadoop Lab II
M. S. Jian
Department of Computer Science and Information Engineering
National Formosa University
Yunlin, Taiwan, ROC

Cloud Operating System - Unit 06: Cloud Distributed Hadoop Lab II
11/22/2018

Install Java first
Java packages are not always shipped as free software in the Ubuntu repositories, so the package source list has to be updated before Java can be installed.

Update the package sources
Open the package source list:
    sudo gedit /etc/apt/sources.list    (or use vi)
In every package path, replace the default mirror host with ubuntu.stu.edu.tw. The original slide writes the default host as tw.achieve.com; on a stock Taiwan install it is most likely tw.archive.ubuntu.com.
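
The mirror replacement can also be scripted instead of edited by hand. The following is a minimal sketch, not part of the original lab: the function name switch_mirror is hypothetical, and the old host tw.archive.ubuntu.com is an assumption about what a default install contains. It only rewrites active deb and deb-src lines and leaves comments untouched.

```python
def switch_mirror(sources_text, old_host, new_host):
    """Return sources.list text with old_host replaced by new_host.

    Only active 'deb' / 'deb-src' lines are rewritten; comment lines
    and blank lines are passed through unchanged.
    """
    rewritten = []
    for line in sources_text.splitlines():
        if line.lstrip().startswith(("deb ", "deb-src ")):
            line = line.replace(old_host, new_host)
        rewritten.append(line)
    return "\n".join(rewritten)


if __name__ == "__main__":
    # Hypothetical sample content; a real file lives at /etc/apt/sources.list.
    sample = (
        "# default mirror: tw.archive.ubuntu.com\n"
        "deb http://tw.archive.ubuntu.com/ubuntu/ lucid main restricted\n"
        "deb-src http://tw.archive.ubuntu.com/ubuntu/ lucid main restricted"
    )
    print(switch_mirror(sample, "tw.archive.ubuntu.com", "ubuntu.stu.edu.tw"))
```

Writing the result back to /etc/apt/sources.list requires root privileges, so it is safer to review the printed text first and keep a backup copy of the original file.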

Install Java
    sudo apt-get purge java-gcj-compat
    sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
    sudo apt-get update
The messages printed by these commands differ between Ubuntu releases.

Install Java (2)
Sun Java:
    sudo apt-get install sun-java6-jdk sun-java6-plugin
    sudo update-java-alternatives -s java-6-sun
OpenJDK (alternative to Sun Java):
    sudo apt-get install openjdk-6-jdk

Check the Java version
After installation, confirm the Java version:
    java -version
OpenJDK must be version 6 or later.
Sun Java must be version 1.6 or later.
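
The version requirement can be checked programmatically. The sketch below is an illustration only; the function names are hypothetical. It assumes the version banner contains a quoted string such as "1.6.0_26" (the legacy 1.x scheme) or "11.0.2" (the newer scheme), and treats 1.6 and 6 as the same major version.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output.

    Legacy banners report 1.6 for Java 6, so when the first number
    is 1 the second number is taken as the major version.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found in output")
    first = int(match.group(1))
    second = int(match.group(2) or 0)
    return second if first == 1 else first


def meets_requirement(version_output, minimum=6):
    """True when the reported Java version is at least `minimum`."""
    return java_major_version(version_output) >= minimum


if __name__ == "__main__":
    # `java -version` writes its banner to stderr, not stdout.
    import subprocess
    banner = subprocess.run(["java", "-version"],
                            capture_output=True, text=True).stderr
    print(java_major_version(banner), meets_requirement(banner))
```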

Install the communication tools
Next, install ssh and rsync:
    sudo apt-get install ssh rsync
Hadoop uses these to communicate when it runs across different machines.

If you are already the superuser, sudo can be omitted.
Change to the /opt directory:
    cd /opt
Download the Hadoop package:
    sudo wget http://apache.stu.edu.tw//hadoop/common/hadoop-0.20.203.0/hadoop-0.20.203.0rc1.tar.gz
Extract it:
    sudo tar zxvf hadoop-0.20.203.0rc1.tar.gz
Rename the folder:
    sudo mv hadoop-0.20.203.0/ hadoop

Set up the group and user
If you are already the superuser, sudo can be omitted.
Create the group:
    sudo addgroup hadoop
Create a user named hadoop in the hadoop group:
    sudo adduser --ingroup hadoop hadoop
Change the ownership of the Hadoop folder (run from /opt):
    sudo chown -R hadoop:hadoop hadoop
or
    chown -R hadoop /opt/hadoop

If you are already the superuser, sudo can be omitted.
Create the data folder:
    sudo mkdir /var/hadoop
Change its ownership:
    sudo chown -R hadoop:hadoop /var/hadoop

Log in as the hadoop user:
    su - hadoop
Change to /opt (su - starts in the user's home directory) and edit the environment settings:
    sudo gedit hadoop/conf/hadoop-env.sh    (or use vi)
Add these settings:
    export JAVA_HOME=/usr/lib/jvm/java-6-sun
    export HADOOP_HOME=/opt/hadoop
    export HADOOP_CONF_DIR=/opt/hadoop/conf
If you use OpenJDK, change the JAVA_HOME path to /usr/lib/jvm/java-6-openjdk.

Edit core-site.xml
    sudo gedit hadoop/conf/core-site.xml    (or use vi)

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoop/hadoop-${user.name}</value>
      </property>
    </configuration>
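
A name/value pair that is not wrapped in its own <property> element is an easy mistake to make in these files, and such a pair would not be read as a property. The following is a minimal sketch of a structural check, assuming only the file layout shown in this lab; the function name read_site_properties is hypothetical and not part of Hadoop.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict.

    Raises ValueError when the root is not <configuration>, when a
    child of the root is not <property>, or when a <property> lacks
    a <name> or <value> element.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("root element must be <configuration>")
    properties = {}
    for child in root:
        if child.tag != "property":
            raise ValueError(f"<{child.tag}> found outside a <property> element")
        name = child.findtext("name")
        value = child.findtext("value")
        if name is None or value is None:
            raise ValueError("every <property> needs both <name> and <value>")
        properties[name.strip()] = value.strip()
    return properties


CORE_SITE = """
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/hadoop-${user.name}</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    print(read_site_properties(CORE_SITE))
```

The same function can be pointed at hdfs-site.xml and mapred-site.xml, since all three files share the configuration/property/name/value layout.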

Edit hdfs-site.xml
    sudo gedit hadoop/conf/hdfs-site.xml    (or use vi)

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

Edit mapred-site.xml
    sudo gedit hadoop/conf/mapred-site.xml    (or use vi)

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    </configuration>
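
Because core-site.xml, hdfs-site.xml and mapred-site.xml share one layout, they can be generated from plain key/value tables. This is a sketch under that assumption, using the values given in this lab; the names render_site_xml and SINGLE_NODE are hypothetical helpers, and the output is written on a single line without indentation.

```python
import xml.etree.ElementTree as ET

# Values taken from the three configuration slides of this lab.
SINGLE_NODE = {
    "core-site.xml": {
        "fs.default.name": "hdfs://localhost:9000",
        "hadoop.tmp.dir": "/var/hadoop/hadoop-${user.name}",
    },
    "hdfs-site.xml": {"dfs.replication": "1"},
    "mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
}


def render_site_xml(properties):
    """Build a <configuration> document with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for filename, properties in SINGLE_NODE.items():
        print(filename)
        print(render_site_xml(properties))
```

Generating the files this way keeps every name/value pair inside its own property element, which is the part most often broken by manual editing.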

Start Hadoop
As the hadoop user, run the following from /opt/hadoop.
Format the namenode:
    bin/hadoop namenode -format
Start the Hadoop services:
    bin/start-all.sh
Check which services started:
    jps
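
The jps listing can be compared against the daemons a single-node Hadoop 0.20 setup is expected to run after start-all.sh: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. The sketch below is an illustration only; the function name missing_daemons is hypothetical, and the process IDs in the sample are made up.

```python
EXPECTED_DAEMONS = frozenset(
    {"NameNode", "DataNode", "SecondaryNameNode", "JobTracker", "TaskTracker"}
)


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the sorted list of expected daemons absent from jps output.

    Each jps line has the form '<pid> <class name>'; lines without a
    class name are ignored.
    """
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            running.add(parts[1])
    return sorted(set(expected) - running)


if __name__ == "__main__":
    # Hypothetical jps output in which the DataNode failed to start.
    sample = "2101 NameNode\n2398 SecondaryNameNode\n2470 JobTracker\n2690 TaskTracker\n2755 Jps"
    print(missing_daemons(sample))
```

If a daemon is missing, the log files under the Hadoop logs directory are the usual place to look for the cause.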

Hadoop and MapReduce example
Download the example jar:
    wget http://trac.nchc.org.tw/cloud/raw-attachment/wiki/Hadoop_Lab4/nchc-example.jar
Test it:
    bin/hadoop jar nchc-example.jar
    bin/hadoop jar nchc-example.jar hello

Hands-on practice with Hadoop and MapReduce
Word Count (basic)
    mkdir lab4_input
    echo "I like NCHC Cloud Course." > lab4_input/input1
    echo "I like nchc Cloud Course, and we enjoy this course." > lab4_input/input2
    bin/hadoop fs -put lab4_input lab4_input
    bin/hadoop fs -ls lab4_input

Run the Word Count example
The relative paths below assume the current directory is /opt/hadoop, as in the previous steps.
    wget http://trac.nchc.org.tw/cloud/raw-attachment/wiki/Hadoop_Lab4/WordCount.java
    mkdir MyJava
    javac -classpath hadoop-core-0.20.203.0.jar -d MyJava WordCount.java
    jar -cvf wordcount.jar -C MyJava .
    bin/hadoop jar wordcount.jar WordCount lab4_input/ lab4_out1/
    bin/hadoop fs -cat lab4_out1/part-r-00000
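
To see what a word count job computes, the map and reduce steps can be simulated locally on the two input lines of this lab. This is a sketch, not the NCHC WordCount.java: it assumes tokens are split on whitespace only, as in the standard Hadoop WordCount example, so "Course." and "Course," count as different words and case is preserved. All function names are hypothetical.

```python
from collections import defaultdict

INPUT_LINES = [
    "I like NCHC Cloud Course.",
    "I like nchc Cloud Course, and we enjoy this course.",
]


def map_phase(line):
    """Map step: emit a (word, 1) pair for each whitespace-separated token."""
    return [(token, 1) for token in line.split()]


def reduce_phase(pairs):
    """Reduce step: sum the counts per word, returned in sorted key order."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(sorted(totals.items()))


def word_count(lines):
    """Run the map step on every line, then reduce all emitted pairs."""
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line))
    return reduce_phase(pairs)


if __name__ == "__main__":
    for word, count in word_count(INPUT_LINES).items():
        print(f"{word}\t{count}")
```

In a real job the map calls run in parallel on different input splits and the framework groups the pairs by key before the reduce step; the local version only mirrors the data flow.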

Web interfaces
    http://localhost:50030/ - Hadoop administration interface
    http://localhost:50060/ - Hadoop Task Tracker status
    http://localhost:50070/ - Hadoop DFS status
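
Whether the three web interfaces are up can be checked by testing if their ports accept TCP connections. The sketch below assumes the services run on localhost with the port numbers listed in this lab; the names WEB_UIS, ui_url and is_listening are hypothetical helpers.

```python
import socket

# Port numbers and descriptions as listed in this lab.
WEB_UIS = {
    50030: "Hadoop administration interface",
    50060: "Hadoop Task Tracker status",
    50070: "Hadoop DFS status",
}


def ui_url(port, host="localhost"):
    """Build the browser URL for a web interface port."""
    return f"http://{host}:{port}/"


def is_listening(port, host="localhost", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for port, description in WEB_UIS.items():
        state = "up" if is_listening(port) else "down"
        print(f"{ui_url(port)}  {description}: {state}")
```

An open port only shows that something is listening; opening the URL in a browser confirms that it is the expected Hadoop page.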

Example