商業智慧實務 Practices of Business Intelligence


Tamkang University
Data Mining for Business Intelligence (商業智慧的資料探勘)
1022BI06, MI4, Wed 9,10 (16:10-18:00), B113
Min-Yuh Day (戴敏育), Assistant Professor
Dept. of Information Management, Tamkang University (淡江大學 資訊管理學系)
http://mail.tku.edu.tw/myday/
2014-03-26

Syllabus (課程大綱)
Week  Date       Subject/Topics
1     103/02/19  Introduction to Business Intelligence
2     103/02/26  Management Decision Support System and Business Intelligence
3     103/03/05  Business Performance Management
4     103/03/12  Data Warehousing
5     103/03/19  Data Mining for Business Intelligence
6     103/03/26  Data Mining for Business Intelligence
7     103/04/02  Off-campus study
8     103/04/09  Data Science and Big Data Analytics
9     103/04/16  Midterm Project Presentation
10    103/04/23  Midterm Exam
11    103/04/30  Text and Web Mining
12    103/05/07  Opinion Mining and Sentiment Analysis
13    103/05/14  Social Network Analysis
14    103/05/21  Final Project Presentation
15    103/05/28  Final Exam

A Taxonomy for Data Mining Tasks Source: Turban et al. (2011), Decision Support and Business Intelligence Systems

Market Basket Analysis Source: Han & Kamber (2006)

Association Rule Mining Apriori Algorithm Source: Turban et al. (2011), Decision Support and Business Intelligence Systems

Basic Concepts: Frequent Patterns and Association Rules
Itemset X = {x1, …, xk}
Find all the rules X ⇒ Y with minimum support and confidence:
  support, s: probability that a transaction contains X ∪ Y
  confidence, c: conditional probability that a transaction having X also contains Y
Transaction-id  Items bought
10              A, B, D
20              A, C, D
30              A, D, E
40              B, E, F
50              B, C, D, E, F
[Figure: customer buys diaper, customer buys beer, customer buys both]
Let sup_min = 50%, conf_min = 50%
Freq. Pat.: {A:3, B:3, D:4, E:3, AD:3}
Association rules:
  A ⇒ D (support = 3/5 = 60%, confidence = 3/3 = 100%)
  D ⇒ A (support = 3/5 = 60%, confidence = 3/4 = 75%)
Source: Han & Kamber (2006)
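The support and confidence figures on this slide can be checked with a few lines of Python. This is a minimal sketch, not code from the course: the function names `support` and `confidence` are illustrative, and the five transactions are copied from the slide.

```python
# Transactions copied from the slide (transaction id -> items bought).
transactions = {
    10: {"A", "B", "D"},
    20: {"A", "C", "D"},
    30: {"A", "D", "E"},
    40: {"B", "E", "F"},
    50: {"B", "C", "D", "E", "F"},
}


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and B) / support(A)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


for lhs, rhs in [("A", "D"), ("D", "A")]:
    print(f"{lhs} => {rhs}: support={support(lhs + rhs):.0%}, "
          f"confidence={confidence(lhs, rhs):.0%}")
```

Both rules clear the 50% thresholds, which is why the slide lists them as association rules.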

Market basket analysis
Example: Which groups or sets of items are customers likely to purchase on a given trip to the store?
Association rule: computer ⇒ antivirus_software [support = 2%; confidence = 60%]
A support of 2% means that 2% of all the transactions under analysis show that computer and antivirus software are purchased together.
A confidence of 60% means that 60% of the customers who purchased a computer also bought the software.
Source: Han & Kamber (2006)

Association rules Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold. Source: Han & Kamber (2006)

Frequent Itemsets, Closed Itemsets, and Association Rules
Support(A ⇒ B) = P(A ∪ B)
Confidence(A ⇒ B) = P(B|A)
Source: Han & Kamber (2006)

Support(A ⇒ B) = P(A ∪ B)
Confidence(A ⇒ B) = P(B|A)
The notation P(A ∪ B) indicates the probability that a transaction contains the union of set A and set B (i.e., it contains every item in A and in B). This should not be confused with P(A or B), which indicates the probability that a transaction contains either A or B.
Source: Han & Kamber (2006)

Itemset: A set of items is referred to as an itemset.
k-itemset: An itemset that contains k items is a k-itemset.
Example: The set {computer, antivirus software} is a 2-itemset.
Source: Han & Kamber (2006)

Frequent itemset
If the relative support of an itemset I satisfies a prespecified minimum support threshold (i.e., the absolute support of I satisfies the corresponding minimum support count threshold), then I is a frequent itemset.
The set of frequent k-itemsets is commonly denoted by Lk.
Source: Han & Kamber (2006)

The confidence of rule A ⇒ B can be easily derived from the support counts of A and A ∪ B.
Once the support counts of A, B, and A ∪ B are found, it is straightforward to derive the corresponding association rules A ⇒ B and B ⇒ A and check whether they are strong.
Thus the problem of mining association rules can be reduced to that of mining frequent itemsets.
Source: Han & Kamber (2006)

Transactional data for an AllElectronics branch Source: Han & Kamber (2006)

Example: Apriori Let’s look at a concrete example, based on the AllElectronics transaction database, D. There are nine transactions in this database, that is, |D| = 9. Apriori algorithm for finding frequent itemsets in D Source: Han & Kamber (2006)

Example: Apriori Algorithm Generation of candidate itemsets and frequent itemsets, where the minimum support count is 2. Source: Han & Kamber (2006)

Example: Apriori Algorithm C1 → L1 Source: Han & Kamber (2006)

Example: Apriori Algorithm C2 → L2 Source: Han & Kamber (2006)

Example: Apriori Algorithm C3 → L3 Source: Han & Kamber (2006)

The Apriori algorithm for discovering frequent itemsets for mining Boolean association rules. Source: Han & Kamber (2006)
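The level-wise candidate generation (Ck) and filtering (Lk) that Apriori performs can be sketched in Python as below. This is an illustrative sketch, not the textbook's pseudocode: the function name `apriori` is an assumption, and the join step uses a simple pairwise union instead of the textbook's prefix-based join. The nine transactions are those of the AllElectronics example in Han & Kamber (2006), reproduced from the textbook because the slide's table image is not part of this transcript.

```python
from itertools import combinations

# Nine AllElectronics transactions (T100..T900) as tabulated in Han & Kamber (2006).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""

    def count(candidates):
        # Scan the database once and keep candidates that meet min_count.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    items = {item for t in transactions for item in t}
    level = count({frozenset([item]) for item in items})  # C1 -> L1
    frequent = dict(level)
    k = 2
    while level:
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = count(candidates)  # Ck -> Lk
        frequent.update(level)
        k += 1
    return frequent


frequent = apriori(D, min_count=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With a minimum support count of 2, the sketch stops after the third level, because the only 4-item candidate has an infrequent 3-item subset and is pruned.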

Generating Association Rules from Frequent Itemsets Source: Han & Kamber (2006)

Example: Generating association rules
Frequent itemset l = {I1, I2, I5}
If the minimum confidence threshold is, say, 70%, then only the second, third, and last rules above are output, because these are the only ones generated that are strong.
Source: Han & Kamber (2006)
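Rule generation from one frequent itemset can be sketched as follows: every non-empty proper subset becomes an antecedent, and a rule is kept when its confidence reaches the threshold. The helper name `rules_from` is illustrative. The support counts are those of the AllElectronics example in Han & Kamber (2006), taken from the textbook because the slide's table is not part of this transcript.

```python
from itertools import combinations

# Support counts for {I1, I2, I5} and its subsets (Han & Kamber, 2006).
support_count = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2, frozenset({"I1", "I2", "I5"}): 2,
}


def rules_from(itemset, min_conf):
    """Yield (antecedent, consequent, confidence) for every strong rule."""
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(sorted(itemset), size)):
            # confidence(lhs => rest) = count(itemset) / count(lhs)
            conf = support_count[itemset] / support_count[lhs]
            if conf >= min_conf:
                yield sorted(lhs), sorted(itemset - lhs), conf


strong = list(rules_from({"I1", "I2", "I5"}, min_conf=0.70))
for lhs, rhs, conf in strong:
    print(f"{lhs} => {rhs}  confidence = {conf:.0%}")
```

Three of the six possible rules pass the 70% threshold, each with 100% confidence. The order in which the sketch enumerates rules is its own and need not match the order on the slide.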

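Probability measures for association analysis: Support & Confidence
[Table: sample transactions over items A to E, with the support and confidence of the rules A ⇒ D, C ⇒ A, A ⇒ C, and B & C ⇒ D. Values surviving extraction: support 2/5, 1/5; confidence 2/3, 2/4, 1/3.]
Source: SAS Enterprise Miner Course Notes, 2014, SAS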

Is a rule with high support and confidence necessarily a useful rule?
                       Checking Account
                       No       Yes      Total
Saving Account   No    500      3,500    4,000
                 Yes   1,000    5,000    6,000
                 Total                   10,000
Support(SVG ⇒ CK) = 5,000/10,000 = 50%
Confidence(SVG ⇒ CK) = 5,000/6,000 = 83%
Expected Confidence(SVG ⇒ CK) = 8,500/10,000 = 85%
Lift(SVG ⇒ CK) = Confidence/Expected Confidence = 0.83/0.85 < 1
Source: SAS Enterprise Miner Course Notes, 2014, SAS

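Probability measures for association analysis: Lift
Is the rule with the highest confidence the best rule?
The rule "if Saving account then Checking account" holds with a lower probability than Checking account considered on its own.
Lift (增益值): how much better a rule predicts the outcome than chance would.
Lift(SVG ⇒ CK) = Confidence/Expected Confidence = 0.83/0.85 < 1
Source: SAS Enterprise Miner Course Notes, 2014, SAS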

Support(A ⇒ B)
Confidence(A ⇒ B)
Expected Confidence(A ⇒ B)
Lift(A ⇒ B)

Support(A ⇒ B) = P(A ∪ B)
  = (number of transactions containing both A and B) / (total number of transactions)
  = Count(A&B) / Count(Total)
Confidence(A ⇒ B) = P(B|A) = Supp(A ∪ B) / Supp(A)
  = (number of transactions containing both A and B) / (number of transactions containing A)
  = Count(A&B) / Count(A)
Expected Confidence(A ⇒ B) = Support(B) = Count(B) / Count(Total)
Lift (Correlation)
Lift(A ⇒ B) = Confidence(A ⇒ B) / Expected Confidence(A ⇒ B)
  = Confidence(A ⇒ B) / Support(B)
  = Supp(A ∪ B) / (Supp(A) × Supp(B))

Lift(A ⇒ B)
  = Confidence(A ⇒ B) / Expected Confidence(A ⇒ B)
  = Confidence(A ⇒ B) / Support(B)
  = (Supp(A&B) / Supp(A)) / Supp(B)
  = Supp(A&B) / (Supp(A) × Supp(B))
Lift (增益值, 提升值): Lift(A ⇒ B) = 2 means that the rule A ⇒ B has a lift of 2, i.e., the probability of buying B given that A was bought is 2 times the probability of buying B unconditionally.
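The four measures can be computed directly from raw counts. The sketch below is illustrative (the function name `rule_metrics` and its parameter names are assumptions); the counts are the saving/checking account figures from the SAS course-notes example: 10,000 customers, 6,000 with a saving account, 8,500 with a checking account, and 5,000 with both.

```python
def rule_metrics(n_total, n_a, n_b, n_ab):
    """Support, confidence, expected confidence and lift for A => B from raw counts."""
    support = n_ab / n_total
    confidence = n_ab / n_a
    expected_confidence = n_b / n_total  # equals Support(B)
    lift = confidence / expected_confidence
    return support, confidence, expected_confidence, lift


# A = saving account (SVG), B = checking account (CK).
s, c, e, lift = rule_metrics(n_total=10_000, n_a=6_000, n_b=8_500, n_ab=5_000)
print(f"support={s:.0%} confidence={c:.1%} expected={e:.0%} lift={lift:.3f}")
```

The lift comes out at about 0.98, slightly below 1: knowing that a customer has a saving account makes a checking account marginally less likely than the base rate, even though support and confidence both look high.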

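"Customers who buy a Barbie doll also buy candy." What is your marketing strategy?
• Place the two products next to each other
• Deliberately place the two products far apart
• Bundle candy and Barbie dolls and sell them together
• Bundle candy + Barbie doll + a poorly selling product
• Pricing strategy: raise the price of one item and lower the price of the other
• Advertising strategy: Barbie dolls and candy do not need to be advertised at the same time
• Product design: design candy in the shape of a Barbie doll
• Offer Barbie doll accessories to boost sales
Source: SAS Enterprise Miner Course Notes, 2014, SAS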

Is my data suitable for market basket analysis?
[Figure labels: D, A, B]
Source: SAS Enterprise Miner Course Notes, 2014, SAS

Case Study 2 (個案分析與實作二): Association Analysis using SAS EM
Web Site Usage Associations (網站使用行為關聯分析)

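Case scenario
To serve more listeners, the ABC music radio station set up a station web site so that online listeners can keep up with information about the station's programs at any time. The site provides pages for pop music trends (music streams), music downloads (podcasts), news (news streams), online listening (live Web), and listening to past programs (archives).
Analysts want to use association analysis to better understand how online listeners use the site, as a basis for updating the site's services.
The analysis sample consists of about 1.5 million customer transaction records drawn from the most recent two months.
Source: SAS Enterprise Miner Course Notes, 2014, SAS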

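Data fields
Data set name: webstation.sas7bdat
ARCHIVE      Broadcast program archive
EXTREF       Links to recommended sites
LIVESTREAM   Popular program listening
MUSICSTREAM  Pop music section
NEWS         Latest news
PODCAST      Music downloads
SIMULCAST    Simulcast listening
WEBSITE      Home page
Source: SAS Enterprise Miner Course Notes, 2014, SAS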

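Hands-on exercise: web site usage association analysis
Objective: from users' web site transaction data, use an association analysis algorithm to generate association rules for web site usage behavior.
Key points:
• Generate the association analysis data set
• Run the association analysis
• Interpret the association analysis results
Source: SAS Enterprise Miner Course Notes, 2014, SAS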
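Outside SAS EM, the three key points of the exercise can be sketched in Python as below. Everything here is an assumption made for illustration: the slides do not show the record layout of webstation.sas7bdat, so the (visitor id, page) records are hypothetical, and only the page names are borrowed from the data field list.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical (visitor id, page) records; not taken from webstation.sas7bdat.
records = [
    (1, "WEBSITE"), (1, "NEWS"), (1, "PODCAST"),
    (2, "WEBSITE"), (2, "LIVESTREAM"),
    (3, "WEBSITE"), (3, "NEWS"),
    (4, "MUSICSTREAM"), (4, "PODCAST"),
]

# 1. Generate the association data set: one basket of pages per visitor.
baskets = defaultdict(set)
for visitor, page in records:
    baskets[visitor].add(page)


def count(*pages):
    """Number of baskets that contain all of the given pages."""
    return sum(1 for basket in baskets.values() if set(pages) <= basket)


# 2. Run the analysis: score every ordered pair of pages as a rule A => B.
n = len(baskets)
all_pages = sorted({page for basket in baskets.values() for page in basket})
rules = []
for a, b in permutations(all_pages, 2):
    both = count(a, b)
    if both == 0:
        continue  # the two pages never occur together
    conf = both / count(a)
    rules.append((a, b, both / n, conf, conf / (count(b) / n)))

# 3. Interpret the results: list rules with the highest lift first.
for a, b, supp, conf, lift in sorted(rules, key=lambda r: -r[4]):
    print(f"{a} => {b}: support={supp:.0%} confidence={conf:.0%} lift={lift:.2f}")
```

A real run would also apply a minimum support threshold before scoring, since enumerating every page pair does not scale to millions of records.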

SAS Enterprise Miner (SAS EM) Case Study
Step 1. 新增專案 (New Project)
Step 2. 新增資料館 (New / Library)
Step 3. 建立資料來源 (Create Data Source)
Step 4. 建立流程圖 (Create Diagram)
SAS EM SEMMA modeling process

Download EM_Data.zip (SAS EM Datasets) http://mail.tku.edu.tw/myday/teaching/1022/DM/Data/EM_Data.zip http://mail.tku.edu.tw/myday/teaching.htm

Unzip EM_Data.zip to C:\DATA\EM_Data


VMware Horizon View Client softcloud.tku.edu.tw SAS Enterprise Miner

SAS Enterprise Guide (SAS EG)

SAS EG New Project

SAS EG Open Data

SAS EG Open webstation.sas7bdat

webstation.sas7bdat


SAS Enterprise Miner 12.1 (SAS EM)

Four steps for importing data into SAS EM
Step 1. 新增專案 (New Project)
Step 2. 新增資料館 (New / Library)
Step 3. 建立資料來源 (Create Data Source)
Step 4. 建立流程圖 (Create Diagram)

Step 1. 新增專案 (New Project)


SAS Enterprise Miner (EM_Project2)

Step 2. 新增資料館 (New / Library)


Step 3. 建立資料來源 (Create Data Source)


Step 3. 建立資料來源 (Create Data Source)
DatabaseName.TableName
LibraryName.TableName
EM_LIB.WEBSTATION


Step 3. 建立資料來源 (Create Data Source) Data Source Attribute Role: Transaction


Step 4. 建立流程圖 (Create Diagram)



Model process flow for the case scenario

樣本資料匯入 (Sample)

EM_Lib.Webstation

樣本資料匯入 (Sample) Edit Variable

樣本資料匯入 (Sample) Edit Variable - Explore …


Explore - Association

關聯分析 (Association Analysis)


關聯分析 (Association Analysis) Support : 1% (Minimum Support = 1%)


關聯分析 (Association Analysis) View / Rules / Rules Table (檢視/規則/規則表格)

關聯分析 (Association Analysis) Association Rules - 規則表格 (Rules Table)


關聯分析 (Association Analysis) View / Rules / Link Graph (檢視/規則/連結圖形)

關聯分析 (Association Analysis) 連結圖形 (Link Graph)

關聯分析 (Association Analysis) Maximum Number of Items: 3000000


關聯分析 (Association Analysis) Association Rules - 規則表格 (Rules Table)

關聯分析 (Association Analysis) 連結圖形 (Link Graph)

References
Efraim Turban, Ramesh Sharda, Dursun Delen, Decision Support and Business Intelligence Systems, Ninth Edition, 2011, Pearson.
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, 2006, Elsevier.
Jim Georges, Jeff Thompson and Chip Wells, Applied Analytics Using SAS Enterprise Miner, SAS, 2010.
SAS Enterprise Miner Course Notes, 2014, SAS.
SAS Enterprise Miner Training Course, 2014, SAS.
SAS Enterprise Guide Training Course, 2014, SAS.