HPC for Bioinformatics: Applications of High Performance Computing in Bioinformatics. Jazz Wang (Yao-Tsung Wang)
Outline. PART 1 (60%): HPC = High Performance Computing. What is HPC? Types of HPC? Can I solve my problem with HPC? PART 2 (30%): HPC & Bioinformatics Application. PART 3 (10%): Open Source for Bioinformatics.
PART 1: HPC 101
What is HPC? & Why HPC?
Types of HPC ?
Back to the 1960s...
Brief History of Computing (1/5): Mainframe / Super Computer. 1960: PDP; 1st Unix.
Evolution of Computing Architecture (1/5): Mainframe / Super Computer. Single Super Computer: Multiple Users, Single CPU, Shared Memory, One Admin.
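What users are secretly thinking (1/5): "Damn, the program crashed again; now I have to queue up all over again." "Waiting to run a program means standing in a very long queue." "Supercomputers are toys only the rich can afford." "I really wish I had a computer of my own to run things on!!"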
Back to the 1970s: Apple II. 1981: IBM's 1st PC, the 5150.
Back to the 1980s: TCP/IP; 1983 GNU; 1991 Linux.
Brief History of Computing (2/5): Mainframe / Super Computer; PC / Linux Cluster (Parallel).
Evolution of Computing Architecture (2/5): Mainframe / Super Computer; PC / Linux Cluster (Parallel). Multiple PCs in One Location: Multiple Users, Separate CPU, Separate Memory, One Admin.
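What users are secretly thinking (2/5): "Strange, why won't my program run?" "Damn, not enough memory; the program crashed again." "Boss admin, could you install LiBT for me?" "I really wish I had a cluster of my own to run on!!"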
Back to the 1990s: 1990 World Wide Web by CERN …… 1993 Web Browser Mosaic by NCSA. 1991 CORBA... Java RMI, Microsoft DCOM... Distributed Objects.
Brief History of Computing (3/5): Mainframe / Super Computer; PC / Linux Cluster (Parallel); Internet Distributed Computing.
Evolution of Computing Architecture (3/5): PC / Linux Cluster (Parallel); Internet Distributed Computing. Single Powerful Servers (each: Single CPU, Shared Memory, Multiple Users, One Admin.) connected over the Network through One Single Broker.
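What users are secretly thinking (3/5): "Argh! The network is down; nothing works!" "Why are distributed objects so abstract? XD" "Give me online games; nothing else matters!" "Everyone, please donate your idle computers!!"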
Back to the 2000s: 2002 Berkeley BOINC (Volunteer Computing); Globus Toolkit; EGEE gLite.
Brief History of Computing (4/5): Mainframe / Super Computer; PC / Linux Cluster (Parallel); Internet Distributed Computing; Virtual Org. / Grid Computing.
Evolution of Computing Architecture (4/5): Internet Distributed Computing; Virtual Org. / Grid Computing. Multiple PCs in one location and multiple PCs in other locations (each: Multiple Users, One Admin.) are joined over the Network by Grid Middleware into a Virtual Organization: Heterogeneous Cyber Infrastructure.
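What users are secretly thinking (4/5): "What? The available resources are in the US? Have fun slowly moving the files over!" "I have already been given credentials, so why can't I get any resources?" "Boss, please go negotiate a resource-sharing policy for us!" "How come Google is so good at computing?!"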
Back to Year: Autonomic Computing (IBM); 2005 Utility Computing (Amazon EC2 / S3); 2006 Apache Hadoop; 2007 Cloud Computing (Google + IBM).
Brief History of Computing (5/5): Mainframe / Super Computer; PC / Linux Cluster (Parallel); Internet Distributed Computing; Virtual Org. / Grid Computing; Data Explode / Cloud Computing.
Evolution of Computing Architecture (5/5): Virtual Org. / Grid Computing; Data Explode / Cloud Computing. Physical World: Multiple PCs in different locations, Multiple Admin. Virtual World: Each User = Virtual Admin. Access anytime, anywhere with a mobile device. What is NEXT?! Mobile Computing?!
What users are secretly thinking (5/5): "Paying by usage time: is it really cheaper?" "Is cloud computing right for me?" "Can we build a cloud computing environment of our own?" "Is Google actually peeking at my mail?!"
Falling to the Ground...
Which Type of HPC is the Right ONE to Solve My Problem?
A no-guarantees analysis
PART 2: HPC & Bioinformatics Application
BLAST (Basic Local Alignment Search Tool), National Center for Biotechnology Information
BLAST is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
Use: checking whether an unknown gene from another species (e.g. mouse) also exists in the human genome.
Advantage: uses heuristic search to find related sequences, about 50 times faster than dynamic programming.
Disadvantage: cannot guarantee how closely the sequences it finds are related to the sequence being searched for.
Technical problem: a huge sequence database has to be compared against; how can that be computed quickly?
Source: http://zh.wikipedia.org/w/index.php?title=BLAST_(生物資訊學)&variant=zh-tw
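To make the "heuristic search instead of dynamic programming" point concrete, here is a minimal seed-and-extend sketch. It is not NCBI BLAST's implementation: the function names, the word size `k`, the +1/-1 scoring, and the `drop` cutoff are all illustrative assumptions, and the extension is ungapped and runs to the right only.

```python
from collections import defaultdict


def build_word_index(subject, k):
    """Map every length-k word of the subject sequence to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def extend_seed(query, subject, q_pos, s_pos, k, match=1, mismatch=-1, drop=2):
    """Extend an exact k-word hit to the right, without gaps.

    Extension stops once the running score has fallen `drop` below the
    best score seen so far; the best score and its length are returned.
    """
    score = best = k * match
    best_len = k
    i, j = q_pos + k, s_pos + k
    while i < len(query) and j < len(subject) and best - score < drop:
        score += match if query[i] == subject[j] else mismatch
        i += 1
        j += 1
        if score > best:
            best, best_len = score, i - q_pos
    return best, best_len


def seed_and_extend(query, subject, k=3):
    """Return (score, query_start, subject_start, length) of the best hit found."""
    index = build_word_index(subject, k)
    best_hit = (0, -1, -1, 0)
    for q_pos in range(len(query) - k + 1):
        # Only positions sharing an exact k-word with the query are examined.
        for s_pos in index.get(query[q_pos:q_pos + k], []):
            score, length = extend_seed(query, subject, q_pos, s_pos, k)
            if score > best_hit[0]:
                best_hit = (score, q_pos, s_pos, length)
    return best_hit
```

Because only positions that share an exact word with the query are ever extended, most of the subject is skipped. That is where the speed comes from, and also why a related sequence with no exact shared word can be missed.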
PART 2.1: Cluster 101 & mpiBLAST
At first, we have a “ ” PC cluster. It'd better be 2^n. Labels: Manage, Scheduler.
Then, we connect 5 PCs with a Gigabit Ethernet switch (GiE Switch, 10/100/1000 Mbps) and add 1 NIC for the WAN.
Compute Nodes / Manage Node; LAN Switch; WAN. The 4 compute nodes communicate via the LAN switch. Only the manage node has Internet access, for security!
Basic System Setup for Cluster (Compute Nodes): Linux Kernel, Kernel Module, GNU Libc, Boot Loader; Bash, Perl, GCC, SSHD; Messaging: MPICH; Account Mgnt.: YP / NIS.
On the manage node, we need to install a scheduler and a Network File System for sharing files with the compute nodes. Base stack: Linux Kernel, Kernel Module, GNU Libc, Boot Loader; Bash, Perl, GCC, SSHD; Messaging: MPICH; Account Mgnt.: YP / NIS. Extra: Job Mgnt.: OpenPBS; File Sharing: NFS.
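As a rough picture of what the scheduler on the manage node does, the toy below hands jobs, in submission order, to whichever compute node becomes free first. It is only a sketch: OpenPBS supports queues, priorities, and resource requests that are not modeled here, and the function name, node names, and job tuples are hypothetical.

```python
import heapq


def schedule_fifo(jobs, nodes):
    """Assign (name, runtime) jobs in submission order to the earliest-free node.

    Returns a list of (job_name, node, start_time, end_time) tuples.
    """
    # Heap of (time the node becomes free, node name); all nodes start idle.
    free_at = [(0, node) for node in nodes]
    heapq.heapify(free_at)
    plan = []
    for name, runtime in jobs:
        start, node = heapq.heappop(free_at)
        plan.append((name, node, start, start + runtime))
        heapq.heappush(free_at, (start + runtime, node))
    return plan
```

For example, `schedule_fifo([("a", 3), ("b", 1), ("c", 2)], ["node1", "node2"])` places job `c` on `node2`, because `node2` finishes job `b` before `node1` finishes job `a`.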
mpiBLAST: an open-source, parallel implementation of NCBI BLAST
Features: – Database fragmentation – Query segmentation – Parallel input/output
Design: – The Design, Implementation, and Evaluation of mpiBLAST
Similar tools: – TurboWorx TurboBLAST – Parallel BLAST by Caltech
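The database fragmentation idea listed above can be sketched as follows: split the database into fragments, search each fragment independently, then merge the partial hit lists. This is an illustration under stated assumptions, not mpiBLAST's code. Threads stand in for MPI worker processes on separate nodes, `shared_words` is a toy similarity score rather than a BLAST score, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fragment_database(records, n_fragments):
    """Deal (seq_id, sequence) records round-robin into n_fragments lists."""
    fragments = [[] for _ in range(n_fragments)]
    for i, record in enumerate(records):
        fragments[i % n_fragments].append(record)
    return fragments


def kmer_set(sequence, k):
    """Return the set of distinct length-k words in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_words(a, b, k=3):
    """Toy similarity: number of distinct length-k words both sequences contain."""
    return len(kmer_set(a, k) & kmer_set(b, k))


def search_fragment(query, fragment):
    """Score the query against every record in one database fragment."""
    return [(shared_words(query, seq), seq_id) for seq_id, seq in fragment]


def parallel_search(query, records, n_workers=4, top=3):
    """Search all fragments concurrently and merge the partial results."""
    fragments = fragment_database(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(lambda frag: search_fragment(query, frag), fragments))
    merged = [hit for hits in partial for hit in hits]
    # Highest score first; ties are broken by sequence id for a stable order.
    merged.sort(key=lambda hit: (-hit[0], hit[1]))
    return merged[:top]
```

Each worker only needs its own fragment in memory, which is the point of fragmenting the database; the merge step is cheap because it only handles hit lists, not sequences.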
Figure labels: mpiBLAST, BLAST, GenBank
PART 2.2: Grid 101 & mpiBLAST-G2
Grid =~ Cluster of Cluster
mpiBLAST-G2
mpiBLAST-G2 is an enhanced parallel program of LANL's mpiBLAST. It is based on Globus Toolkit 2.x and MPICH-g2.
By the Bioinformatics Technology and Service (BITS) team of Academia Sinica Computing Centre (ASCC), Taiwan
References: – The MPIBLAST-g2 Introduction – MPIBLAST-g2 Example – mpiBlast-G2 with GT4
PART 2.3: Cloud 101 & CloudBLAST
Cloud =~ Virtualization + Cluster
RunBLAST: mpiBLAST in Amazon EC2 (video)
Map/Reduce. Ref.: MapReduce: Simplified Data Processing on Large Clusters, Google
CloudBLAST: “CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications”, eScience 2008. Feature: uses the MapReduce algorithm to run BLAST computations.
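The sketch below shows the general shape of a map/reduce formulation of a sequence search: the map step emits one scored pair per (query, database entry), a shuffle groups them by query, and the reduce step keeps the best hit. It is a single-process illustration, not CloudBLAST's or Hadoop's code. The slide does not say how CloudBLAST divides the work, so the partitioning, the `identity` score, and all function names here are assumptions.

```python
from collections import defaultdict


def identity(a, b):
    """Toy score: number of positions at which the two sequences agree."""
    return sum(x == y for x, y in zip(a, b))


def map_phase(queries, database, score=identity):
    """Map: emit (query_id, (score, subject_id)) for every query/subject pair."""
    for query_id, query in queries:
        for subject_id, subject in database:
            yield query_id, (score(query, subject), subject_id)


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: keep the highest-scoring (score, subject_id) for each query."""
    return {key: max(values) for key, values in groups.items()}


def best_hits(queries, database):
    """Run the three phases in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(queries, database)))
```

Since each emitted pair depends on only one query and one database entry, the map calls are independent, so a framework can spread them across many machines without changing the result.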
PART 3: Open Source for Bioinformatics
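Free Software: Stand on the Shoulders of Giants. Standing on the shoulders of giants is the philosophy behind free software development. Because it is flexible and can be freely copied and shared, free software can effectively address the management costs of IT education and the heavy cost burden of commercial software.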
Open Source is your Friend!!
Open Bioinformatics Foundation – BioPerl – BioPython – BioPHP – BioJava
C++ Bio Sequence Library – a sequence analysis library written in C++
Bio-SPICE
BioEra – closely tied to brain science; its main function is signal processing
NCBI Viewer
Conclusion: HOW BIG CAN YOU THINK?? Find a good problem. There are plenty of high-speed computing tools; the hard part is finding a good problem to work on!!
Questions? Jazz Wang (Yao-Tsung Wang)
Research topics about PC Cluster. Ref: Cluster Computing in the Classroom: Topics, Guidelines, and Experiences. Topics: System Architecture; Parallel Computing; Parallel Algorithms and Applications. Subtopics: Process Architecture; Network Architecture; Storage Architecture; System-level Middleware; Shared Memory Programming; Distributed Memory Programming; Application-level Middleware Programming.