Machine Learning and Bioinformatics
What is machine learning?
K-Nearest Neighbors (KNN)
A very simple machine learning tool: the predicted class of a query sample is decided by a vote among its k nearest neighbors.
[Figure: training samples of classes O and X scattered in a plane, with an unlabeled query sample marked ?]
When k = 3
[Figure: the same samples; most of the query's 3 nearest neighbors are O, so the query is labeled O]
When k = 5
[Figure: the same samples; most of the query's 5 nearest neighbors are X, so the query is labeled X]
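A minimal sketch of the KNN vote, using only the Python standard library. Assumptions not stated on the slides: Euclidean distance, a plain majority vote, and a hypothetical toy data set (not the points in the figure) chosen so that, as on the slides, k = 3 and k = 5 give different answers for the same query.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Predict the class of `query` by majority vote among its k nearest
    training samples, using Euclidean distance."""
    # Sort the training samples by their distance to the query point.
    by_distance = sorted(train, key=lambda sample: math.dist(sample[0], query))
    # The labels of the k closest samples vote; the most common label wins.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: ((x, y), label) pairs, not the points in the figure.
train = [
    ((1.0, 0.0), "O"),
    ((0.0, 1.2), "O"),
    ((-1.4, 0.0), "X"),
    ((0.0, -1.6), "X"),
    ((1.3, 1.3), "X"),
    ((3.0, 3.0), "O"),
    ((-3.0, 2.5), "O"),
]
query = (0.0, 0.0)

print(knn_predict(train, query, k=3))  # O: two O's vs. one X
print(knn_predict(train, query, k=5))  # X: three X's vs. two O's
```

As on the slides, the same query can change class as k grows, so k is a parameter worth tuning rather than a fixed constant.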
Although KNN is very simple, it can be useful
Example: in vitro fertilization
Given: embryos described by 60 features
Problem: selecting the embryos that will survive
Data: historical records of embryos and their outcomes
Given a set of known instances, predict the outcome for new instances.
So KNN learns something related to “the definition of embryo goodness”.
Can machines really learn?
Notice that here we call KNN a machine.
Dictionary definitions of “learning”:
To get knowledge of by study, experience, or being taught; to become aware by information or from observation (difficult to measure)
To commit to memory; to be informed of, ascertain; to receive instruction (trivial for computers)
Operational definition: things learn when they change their behavior in a way that makes them perform better in the future.
Does a slipper learn?
In short, machine learning is:
[Diagram: training data (a set of known instances) provides knowledge/information to a machine (e.g., KNN); given testing data (a query instance), the machine produces an outcome (the class of the query instance)]
Furthermore, learning means:
[Diagram: training data (a set of known instances) provides knowledge/information to a machine (e.g., KNN), which produces an outcome (the class of a query instance)]
When the training data increase, the machine delivers a better outcome (e.g., higher accuracy).
Usually, we don’t reinvent the wheel
[Diagram: training data (a set of known instances) provides knowledge/information to a machine (e.g., KNN), which produces an outcome (the class of a query instance)]
We reuse existing machines such as KNN; the hard part is converting the data (e.g., embryos) into vectors, which is not trivial.
Machine learning-based approaches
Transform your problem into a machine learning problem, usually a classification problem.
Classification is so-called “supervised learning”; a typical unsupervised learning problem is clustering.
The most critical step is to encode a sample as a vector (i.e., to extract appropriate features).
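One common way to encode a biological sequence as a fixed-length vector is k-mer counting. The sketch below illustrates the encoding step only; the choice of DNA sequences, k = 2, and the function name `kmer_vector` are assumptions, not something the slides prescribe (an embryo record would need entirely different features).

```python
from itertools import product


def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Encode a DNA sequence as a vector of k-mer frequencies.

    Every possible k-mer over `alphabet` gets one fixed position, so
    sequences of different lengths map to vectors of the same length.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    # Slide a window of width k along the sequence and count each k-mer.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows containing other symbols, e.g. 'N'
            counts[index[kmer]] += 1
    total = sum(counts)
    # Normalize to frequencies so long and short sequences are comparable.
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


vec = kmer_vector("ACGTAC")
print(len(vec))  # 16 = 4**2 possible dinucleotides
print(vec[1])    # frequency of "AC": 2 of the 5 windows
```

The resulting vectors can be fed directly to a distance-based machine such as KNN.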
About Machine Learning
Finance
Could you think of any data (features) for stock prediction?
How do you tell a man from a woman?
[Image: http://www.sagennext.com/wp-content/uploads/2010/02/Business-Man-and-Woman1.jpg]
Stock
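Market value of a stock
A stock is a certificate showing that a shareholder (investor) has contributed capital to, and holds part of the ownership of, a company. Shareholders therefore share in the company's operating results through dividends, and take part in the distribution of any remaining assets if the company is dissolved.
Earnings per share (EPS) = (net income after tax - preferred stock dividends) / shares outstanding
Book value per share = company net worth / shares issued
Stock value = f(company profitability, growth prospects, …)
The stock price is determined by buyers and sellers in the market.
Because a company's profitability cannot stay constant forever, the stock price changes with it. Investors buy stocks not only for expected future dividend income but also in the hope of earning the price difference over the holding period (capital gains).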
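The two per-share formulas on this slide, written as a small sketch. The company figures are made up for illustration; the third line, stock value = f(…), has no fixed formula, so it is not coded.

```python
def earnings_per_share(net_income_after_tax, preferred_dividends, shares_outstanding):
    """EPS = (net income after tax - preferred stock dividends) / shares outstanding."""
    return (net_income_after_tax - preferred_dividends) / shares_outstanding


def book_value_per_share(net_worth, shares_issued):
    """Book value per share = company net worth / shares issued."""
    return net_worth / shares_issued


# Hypothetical company figures in NT$, for illustration only.
eps = earnings_per_share(net_income_after_tax=1_200_000_000,
                         preferred_dividends=200_000_000,
                         shares_outstanding=500_000_000)
bvps = book_value_per_share(net_worth=8_000_000_000, shares_issued=500_000_000)
print(eps, bvps)  # 2.0 16.0
```

Both numbers are candidate input features for a prediction model; neither fixes the market price, which buyers and sellers set.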
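Systematic risk
Also called market risk or non-diversifiable risk: the risk that a common factor drives down all stocks in the market.
The cause lies outside the company, which cannot control it.
Its impact is broad, and it cannot be eliminated by diversification.
Sources: economic (interest rates, exchange rates, inflation, energy crises, business cycles), political (changes of government, wars and conflicts), and social (institutional reforms, changes in ownership systems).
Some stocks are more sensitive to it, e.g., those in basic industries and raw materials.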
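Nonsystematic risk
Also called non-market risk or diversifiable risk: risk that affects only a particular industry or an individual company.
Examples: a strike by the company's workers, failure of a new product, loss of an important sales contract, losing a lawsuit, or announcing the discovery of a new mineral deposit.
It can be eliminated by diversification.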
What I thought of: stock properties
Historical prices (technical analysis): TWSE (Taiwan Stock Exchange), Yahoo Finance, Google Finance; RSI (Relative Strength Index), MACD (Moving Average Convergence Divergence)
Company data (fundamental analysis): individual income statements, balance sheets (PChome), company profiles (PChome); Warren E. Buffett's business-based investment principles; Benjamin Graham's net-current-asset-value investing
Nonsystematic risk: ex-rights/ex-dividend, news
Systematic risk: other markets (US market news (PChome), international stock indices (Wantgoo), gold prices, bonds (GreTai Securities Market)); institutional investors; exchange rates between currencies
Institutional investors
Do you know what institutional investors are?
Let’s start from the beginning
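Financial products
Bank time deposits, rotating savings and credit associations (bidding clubs), short-term bills, foreign currencies, government bonds, corporate bonds, convertible bonds, insurance contracts, stocks, futures, options, call warrants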
How many of these do you know?
A story…
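Bank time deposits: different banks offer different interest rates.
Rotating savings and credit associations: what is commonly called biao hui (a bidding club).
Short-term bills: instruments such as Treasury bills (TB), commercial paper (CP), bankers' acceptances (BA), and negotiable certificates of deposit (CD).
Foreign currencies: the risk comes from international events and macroeconomic changes.
Government bonds: bonds issued by the government.
Corporate bonds: bonds issued by companies.
Convertible bonds: when a company issues the bonds, it agrees that creditors may convert them into stock after a specified date.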
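Insurance contracts: savings-type insurance contracts cannot be transferred and can be viewed as forced saving.
Stocks: buying and selling depend on many factors; profit and risk depend on market conditions and on how the investor trades.
Futures: derivative contracts tied to the spot market; a margin deposit is required at the start; used to lock in a purchase or sale price.
Options: after paying a premium, the holder has the right, but not the obligation, to buy the underlying asset at the agreed strike price (this describes a call option; a put option gives the right to sell).
Call warrants: similar to options, issued by the company or a third party; after the agreed date, the holder may buy the company's stock at the agreed price.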
Back to this: stock properties
Historical prices (technical analysis): TWSE (Taiwan Stock Exchange), Yahoo Finance, Google Finance; RSI (Relative Strength Index), MACD (Moving Average Convergence Divergence)
Company data (fundamental analysis): individual income statements, balance sheets (PChome), company profiles (PChome); Warren E. Buffett's business-based investment principles; Benjamin Graham's net-current-asset-value investing
Nonsystematic risk: ex-rights/ex-dividend, news
Systematic risk: other markets (US market news (PChome), international stock indices (Wantgoo), gold prices, bonds (GreTai Securities Market)); institutional investors; exchange rates between currencies
Some technical analysis indices (source: Technical Analysis Academy / 技術分析學院)
Head-and-shoulders pattern; RSI (Relative Strength Index); MACD (Moving Average Convergence Divergence)*; PSY (Psychological Line); Williams %R; OBV (On-Balance Volume)*; VR (Volume Ratio)*; the wave principle (Elliott wave theory); candlestick (K-line) charts*; resistance and support*
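A sketch of two of the listed indices, RSI and OBV, computed from daily closing prices and volumes. Assumptions: RSI uses simple averages of gains and losses over the window (Cutler's variant; Wilder's original smooths the averages), the default window of 14 is a common choice rather than a course requirement, OBV starts from 0, and the prices are hypothetical. Charting sites may use other conventions, so their values can differ.

```python
def rsi(closes, n=14):
    """Relative Strength Index over the last n price changes.

    Uses simple averages of gains and losses (Cutler's variant); Wilder's
    original RSI smooths the averages over time instead.
    """
    if len(closes) < n + 1:
        raise ValueError("need at least n + 1 closing prices")
    # Day-over-day changes for the last n days.
    changes = [b - a for a, b in zip(closes[-n - 1:-1], closes[-n:])]
    avg_gain = sum(c for c in changes if c > 0) / n
    avg_loss = sum(-c for c in changes if c < 0) / n
    if avg_loss == 0:
        return 100.0  # no down days in the window (convention used here)
    rs = avg_gain / avg_loss  # RS: relative strength
    return 100 - 100 / (1 + rs)


def obv(closes, volumes):
    """On-Balance Volume series, starting from 0 (an assumed starting point).

    Add the day's volume on an up close, subtract it on a down close,
    and carry OBV forward unchanged on a flat close.
    """
    values = [0]
    for i in range(1, len(closes)):
        if closes[i] > closes[i - 1]:
            values.append(values[-1] + volumes[i])
        elif closes[i] < closes[i - 1]:
            values.append(values[-1] - volumes[i])
        else:
            values.append(values[-1])
    return values


# Hypothetical prices and volumes, for illustration only.
closes = [10, 11, 10.5, 10.5, 12]
volumes = [100, 200, 150, 80, 300]
print(round(rsi(closes, n=4), 2))  # 83.33
print(obv(closes, volumes))        # [0, 200, 50, 50, 350]
```

Either series can serve as a feature for a classifier that predicts, say, whether tomorrow's close is higher.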
Fundamental analysis indices
Investment practice: value investing
Warren E. Buffett: business-based investment principles
Benjamin Graham: net-current-asset-value investing
John Neff: low price-to-earnings (P/E) investing
Peter Lynch: grassroots-research stock picking
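Ex-rights/ex-dividend
Ex-rights is the downward adjustment of the share price for a stock dividend or a cash capital increase; ex-dividend is the downward adjustment for a cash dividend.
Ex-dividend reference price = previous trading day's closing price - cash dividend
Ex-rights reference price = previous trading day's closing price / (1 + stock dividend ratio)
Ex-rights and ex-dividend reference price = (previous trading day's closing price - cash dividend) / (1 + stock dividend ratio)
The cost of acquiring the company's stock should be the same whether an investor buys before or after the record date; otherwise everyone would already know how to profit and would either rush to take part in the ex-rights/ex-dividend or all avoid it.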
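Why the price falls after ex-dividend
The cash that participating shareholders receive comes directly out of the company's net worth. Imagine a box of ten equal-sized chocolates that costs NT$100: if you buy it and immediately eat one, the box can now sell for only NT$90.
Likewise, if you pay NT$30,000 for one lot (1,000 shares) of a stock priced at NT$30 including a NT$2 dividend, then after you receive NT$2,000 in dividends (eating one chocolate), the price should drop to NT$28.
Why the price falls after ex-rights
Ex-rights merely converts part of the retained earnings in the company's net worth into share capital.
Suppose a company is a pizza owned equally by eight shareholders, so each owns one eighth of it. If someone proposes cutting the pizza into sixteen slices, each person still owns one eighth of the pizza, now as two smaller slices. Each slice is half its original size, so naturally it is worth half as much.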
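The three reference-price formulas, written as functions and checked against the chocolate example (a NT$30 share carrying a NT$2 cash dividend should open at NT$28). Here the stock dividend ratio is taken to mean new shares per existing share (e.g., 0.1 for one new share per ten held), so the pizza example, where every slice is cut in two, is a ratio of 1.0. The function names and the 10% case are hypothetical.

```python
def ex_dividend_price(prev_close, cash_dividend):
    """Ex-dividend reference price = previous close - cash dividend."""
    return prev_close - cash_dividend


def ex_rights_price(prev_close, stock_dividend_ratio):
    """Ex-rights reference price = previous close / (1 + stock dividend ratio)."""
    return prev_close / (1 + stock_dividend_ratio)


def ex_rights_and_dividend_price(prev_close, cash_dividend, stock_dividend_ratio):
    """Reference price when both apply:
    (previous close - cash dividend) / (1 + stock dividend ratio)."""
    return (prev_close - cash_dividend) / (1 + stock_dividend_ratio)


# Chocolate example: a NT$30 share carrying a NT$2 cash dividend.
print(ex_dividend_price(30, 2))  # 28
# Pizza example: each share becomes two (ratio 1.0), so each is worth half.
print(ex_rights_price(30, 1.0))  # 15.0
# Both at once, with a hypothetical 10% stock dividend.
print(round(ex_rights_and_dividend_price(30, 2, 0.1), 2))  # 25.45

# One lot of 1,000 shares: stock value plus cash received is unchanged.
print(1000 * ex_dividend_price(30, 2) + 1000 * 2)  # 30000
```

The last line restates the fairness argument: a shareholder's total value is the same before and after the adjustment.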
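Market news example
US stocks slump; Taiwan stocks open at 7,779.44, down 53.21 points
The US Federal Reserve (Fed) released the minutes of its meeting, hitting US stocks on Wednesday.
The Dow Jones Industrial Average fell 105.44 points (0.7%) to close at 14,897.55.
The NASDAQ fell 13.8 points (0.38%) to close at 3,599.79.
The S&P 500 fell 9.55 points (0.58%) to close at 1,642.8.
The Philadelphia Semiconductor Index fell 3.3 points (0.71%) to close at 458.53.
The sharp US decline dragged Taiwan stocks to a lower open, with heavyweight and high-priced stocks falling together; Largan Precision opened slightly lower at NT$1,050.
In early trading, electronics and financial stocks both slid; among traditional-industry stocks, cement, construction, and auto stocks fell more than 1%. The bright spot of the session was Apple-supply-chain (“Apple concept”) stocks.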
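A quick check of the percentages in the news item: a change of d points to a close of C is measured against the previous close, C - d. The sketch recomputes each percentage from the quoted point change and close, rounding to two decimals (so the Dow's 0.7% appears as 0.70%).

```python
def pct_change_from_close(points_change, close):
    """Percentage change, given the day's point change and the closing level.

    The previous close is close - points_change, and the percentage
    is measured against that previous close.
    """
    prev_close = close - points_change
    return 100 * points_change / prev_close


# (point change, close) as quoted in the news item above.
indices = {
    "Dow Jones Industrial Average": (-105.44, 14897.55),
    "NASDAQ": (-13.8, 3599.79),
    "S&P 500": (-9.55, 1642.8),
    "Philadelphia Semiconductor Index": (-3.3, 458.53),
}
for name, (change, close) in indices.items():
    print(f"{name}: {pct_change_from_close(change, close):.2f}%")
```

The same conversion turns raw point moves of foreign indices into comparable, scale-free features for stock prediction.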
Bonds
Fixed income securities: a claim on a series of payments over a specified period.
Par value: usually NT$100,000 in Taiwan and US$1,000 in the United States.
Issue date: the date the bond is issued.
Maturity date: the date on which the issuer repays the par value to the holder.
Coupon interest rate: the coupon rate of an ordinary bond is usually fixed.
Yield to maturity (YTM): can be viewed as the rate of return the creditor earns by holding the bond until maturity.
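YTM is the discount rate at which the present value of a bond's remaining payments equals its price. The sketch below finds it by bisection. Assumptions: one coupon per year, a whole number of years to maturity, and a yield between 0% and 100%; real bonds may pay semiannually and settle between coupon dates, which this sketch ignores.

```python
def bond_price(ytm, par, coupon_rate, years):
    """Present value of a bond with annual coupons, discounted at `ytm`."""
    coupon = par * coupon_rate
    # Discount each annual coupon, then the par value repaid at maturity.
    pv_coupons = sum(coupon / (1 + ytm) ** t for t in range(1, years + 1))
    pv_par = par / (1 + ytm) ** years
    return pv_coupons + pv_par


def yield_to_maturity(price, par, coupon_rate, years, lo=0.0, hi=1.0, tol=1e-10):
    """Find the yield y with bond_price(y) == price, by bisection.

    The price falls as the yield rises, so the root stays bracketed in
    [lo, hi] as long as bond_price(lo) >= price >= bond_price(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bond_price(mid, par, coupon_rate, years) > price:
            lo = mid  # computed price too high -> the yield must be higher
        else:
            hi = mid  # computed price too low (or equal) -> the yield is lower
    return (lo + hi) / 2


# A US$1,000 par bond with a 5% annual coupon and 10 years to maturity.
print(round(yield_to_maturity(1000, 1000, 0.05, 10), 4))  # 0.05: priced at par
print(round(yield_to_maturity(950, 1000, 0.05, 10), 4))   # above 0.05: bought at a discount
```

A bond bought at par yields exactly its coupon rate; buying below par raises the yield because the holder also gains the difference at maturity.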
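Institutional investors
The three major institutional investors:
Foreign institutional investors (foreign investors)
Investment trusts (domestic institutions; securities investment trust funds)
Securities dealers (proprietary trading)
Net buy: purchases exceed sales; net sell: sales exceed purchases.
Sources: three major institutional investors (PChome Stock), institutional buying and selling (WantGoo), daily net buy/sell report of the three major institutional investors (TWSE)
Retail investors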
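A natural person is a legal term, the counterpart of a legal person: an individual who can enjoy rights and bear obligations. The concept is broader than “citizen”; for example, the natural persons of our country include not only citizens who enjoy civil rights under our law but also foreigners and stateless persons residing within our territory.
A legal person (juristic person) is also a legal term, the counterpart of a natural person: an entity that has the capacity for civil rights and the capacity for civil conduct, and that independently enjoys civil rights and bears civil obligations under law, such as enterprises, companies, and schools. In general, a legal person must meet four conditions: (1) it is established in accordance with the law; (2) it has the necessary funds; (3) it has its own name, organization, and premises; (4) it can independently bear civil liability. An entity meeting these conditions is called a legal person.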
About investment
Today’s exercise
Grab raw data
Download historical stock prices from TWSE and store them in an organized format.
Send TA Lin a report before 23:59 on 10/16 (Wed).
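A minimal sketch of the “organized format” step, under loud assumptions: the rows are taken to be already downloaded from TWSE (this code does no network access); each row is assumed to hold an ROC-calendar date such as `102/10/16`, then shares traded, turnover, open, high, low, and close; and numbers are assumed to carry thousands separators. Check the real TWSE file before relying on any column index. The sample row is made up, and the function names are hypothetical.

```python
import csv
import io
from datetime import date


def roc_to_date(roc):
    """Convert an ROC-calendar date such as '102/10/16' to a datetime.date.

    ROC year + 1911 = Gregorian year (ROC 102 is 2013).
    """
    year, month, day = (int(part) for part in roc.split("/"))
    return date(year + 1911, month, day)


def to_number(text):
    """Parse a number written with thousands separators, e.g. '12,345,678'."""
    return float(text.replace(",", ""))


def organize(rows):
    """Turn raw rows into dicts with typed fields.

    ASSUMED column order (hypothetical; check against the real download):
    date, shares traded, turnover, open, high, low, close.
    """
    records = []
    for row in rows:
        records.append({
            "date": roc_to_date(row[0]),
            "volume": int(to_number(row[1])),
            "open": to_number(row[3]),
            "high": to_number(row[4]),
            "low": to_number(row[5]),
            "close": to_number(row[6]),
        })
    return records


def write_csv(records):
    """Serialize the records as CSV text, with ISO dates and a header row."""
    out = io.StringIO()
    fields = ["date", "volume", "open", "high", "low", "close"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for r in records:
        writer.writerow({**r, "date": r["date"].isoformat()})
    return out.getvalue()


# Made-up sample row in the assumed layout (not real market data).
raw = [["102/10/16", "12,345,678", "1,234,567,890", "100.50", "101.00", "99.50", "100.00"]]
records = organize(raw)
print(records[0]["date"], records[0]["close"])  # 2013-10-16 100.0
print(write_csv(records))
```

Converting ROC dates to ISO dates and stripping separators early keeps later steps, such as computing RSI or joining with other markets' data, free of string handling.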