Database Systems Laboratory, Advisor: Prof. 張玉盈



Relational Database

Relation SNOOPYFAMILY:

ID  NAME              SEX
1   SNOOPY            Male
2   CHARLIE BROWN     Male
3   SALLY BROWN       Female
4   LUCY VAN PELT     Female
5   LINUS VAN PELT    Male
6   PEPPERMINT PATTY  Female
7   MARCIE            Female
8   SCHROEDER         Male
9   WOODSTOCK         -

Primary key: ID. Attributes: ID, NAME, SEX (degree = 3). Tuples: the rows (cardinality = 9). Domain of SEX: Male, Female.

Query with SQL:

Select NAME From SNOOPYFAMILY Where SEX = 'Male';

Result (the matching tuples):

ID  NAME              SEX
1   SNOOPY            Male
2   CHARLIE BROWN     Male
5   LINUS VAN PELT    Male
8   SCHROEDER         Male
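
The query on this slide can be tried out with a small sketch using Python's built-in sqlite3 module. The column types, the in-memory database, and the ORDER BY clause are assumptions added for illustration. The SEX values for rows where the slide text shows none are also assumptions: Male for the rows that appear in the slide's result, Female for the others.

```python
import sqlite3

# Rows of SNOOPYFAMILY as (ID, NAME, SEX). SEX values not printed on the
# slide are assumed (see the note above).
ROWS = [
    (1, "SNOOPY", "Male"),
    (2, "CHARLIE BROWN", "Male"),
    (3, "SALLY BROWN", "Female"),
    (4, "LUCY VAN PELT", "Female"),
    (5, "LINUS VAN PELT", "Male"),
    (6, "PEPPERMINT PATTY", "Female"),
    (7, "MARCIE", "Female"),
    (8, "SCHROEDER", "Male"),
    (9, "WOODSTOCK", "-"),
]


def query_male_names():
    """Build the relation in memory and run the slide's query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SNOOPYFAMILY (ID INTEGER PRIMARY KEY, NAME TEXT, SEX TEXT)"
    )
    conn.executemany("INSERT INTO SNOOPYFAMILY VALUES (?, ?, ?)", ROWS)
    # ORDER BY is added only to make the output order deterministic.
    cur = conn.execute(
        "SELECT NAME FROM SNOOPYFAMILY WHERE SEX = 'Male' ORDER BY ID"
    )
    names = [row[0] for row in cur.fetchall()]
    conn.close()
    return names


male_names = query_male_names()
print(male_names)
```

Because the query selects only NAME, each result row holds a single value.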

Introduction
Data mining is widely used to extract previously unknown, hidden, and potentially useful information from large databases. Frequent pattern mining is a basic research topic in this field. It finds all frequent patterns, that is, all patterns whose support (frequency) is no smaller than a given minimum support threshold.

Frequent Pattern Mining
This technique has two limitations. First, it treats each item as binary: an item either exists in a transaction or does not. Second, it assumes that all items have the same value and the same importance.

Data Mining
Data mining is the process of finding hidden and useful knowledge from large databases. However, items have different importance in the real world. For example, an iPhone (cellphone) is expensive and a telephone is cheap. Therefore, we have to consider the importance and the count of the items at the same time.

Mining Weighted Maximal Frequent Patterns
The user wants to know which patterns make money and contain the most items.

Example

TID  Transaction
1    A, C
2    B, C
3    A, B, C
4    A, B
5    B, C

Item  Weight
A     0.6
B     0.8
C     0.4

Frequent Itemsets
We can use the Apriori algorithm to find frequent patterns (Min_Sup: 2).

$L_1 = \{A, B, C\}$
$L_1 \Rightarrow C_2$, $C_2 = \{AB, AC, BC\}$
$L_2 = \{AB, AC, BC\}$
$L_2 \Rightarrow C_3$, $C_3 = \{ABC\}$
$L_3 = \emptyset$

{item set}: count
{A}: 3
{B}: 4
{C}: 4
{A,B}: 2
{A,C}: 2
{B,C}: 3
{A,B,C}: 1
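
The level-wise search above can be sketched in a few lines of Python. This is a minimal illustration of the Apriori idea (count, filter by Min_Sup, join, prune), not an optimized implementation. Transaction 5 is assumed to be {B, C}, which is the itemset that makes the counts on this slide come out; the function and variable names are hypothetical.

```python
from itertools import combinations

# Transactions from the example; transaction 5 = {B, C} is an assumption
# chosen to match the counts listed on the slide.
TRANSACTIONS = [
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"B", "C"},
]
MIN_SUP = 2


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_sup):
    """Return {frequent itemset: support count}, built level by level."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]  # candidates of size 1
    frequent = {}
    k = 1
    while level:
        counted = {c: support(c, transactions) for c in level}
        survivors = {c: n for c, n in counted.items() if n >= min_sup}  # L_k
        frequent.update(survivors)
        k += 1
        # Join step: unions of two frequent itemsets that have size k.
        candidates = {a | b for a in survivors for b in survivors if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in survivors for s in combinations(c, k - 1))
        ]
    return frequent


frequent = apriori(TRANSACTIONS, MIN_SUP)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With Min_Sup = 2 the candidate {A, B, C} is generated but not kept, since its count is 1.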

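Weighted Frequent Pattern

Item  Weight
A     0.6
B     0.8
C     0.4

$$WSup(PS) = sup(PS) \times \frac{\sum_{i=1}^{length(PS)} W(P_i)}{length(PS)}$$

where $W(P_i)$ denotes the weight of the $i$-th item $P_i$ of pattern $PS$.

{item set}: WSup (Min_Sup: 1.8)
{A}: 1.8
{B}: 3.2
{C}: 1.6
{A,B}: 1.4
{A,C}: 1.0
{B,C}: 1.8
{A,B,C}: 0.6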
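
As a check on the listed values, the weighted support can be computed as the support count times the average item weight of the pattern. The sketch below takes the support counts from the Frequent Itemsets slide; the names `wsup` and `wsups` are hypothetical, and results are rounded to two decimals because floating-point products such as 3 × 0.6 are not exact.

```python
# Item weights and support counts as given on the slides.
WEIGHTS = {"A": 0.6, "B": 0.8, "C": 0.4}
SUPPORT = {"A": 3, "B": 4, "C": 4, "AB": 2, "AC": 2, "BC": 3, "ABC": 1}


def wsup(pattern):
    """Weighted support: support count times the average weight of the items."""
    avg_weight = sum(WEIGHTS[item] for item in pattern) / len(pattern)
    return SUPPORT[pattern] * avg_weight


# Round to two decimals so the values can be compared with the slide.
wsups = {pattern: round(wsup(pattern), 2) for pattern in SUPPORT}
for pattern, value in wsups.items():
    print(pattern, value)
```

For example, {A, B} has support 2 and average weight (0.6 + 0.8) / 2 = 0.7, which gives 1.4.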

In this case, the weighted frequent patterns are {A}, {B}, and {B,C}. A weighted maximal frequent pattern is a weighted frequent pattern that has no weighted frequent super pattern. So, the weighted maximal frequent patterns in this case are {A} and {B,C}.
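
The selection of maximal patterns can be sketched as a two-step filter: keep the patterns whose weighted support reaches the threshold, then drop every pattern that has a proper superset among them. The weighted supports below are copied from the previous slide; the variable names are hypothetical.

```python
# Weighted supports as listed on the Weighted Frequent Pattern slide.
WSUP = {
    frozenset("A"): 1.8,
    frozenset("B"): 3.2,
    frozenset("C"): 1.6,
    frozenset("AB"): 1.4,
    frozenset("AC"): 1.0,
    frozenset("BC"): 1.8,
    frozenset("ABC"): 0.6,
}
MIN_WSUP = 1.8

# Step 1: weighted frequent patterns.
weighted_frequent = {p for p, w in WSUP.items() if w >= MIN_WSUP}

# Step 2: keep a pattern only if no weighted frequent pattern strictly contains it.
weighted_maximal = {
    p for p in weighted_frequent
    if not any(p < q for q in weighted_frequent)
}

print(sorted(sorted(p) for p in weighted_frequent))
print(sorted(sorted(p) for p in weighted_maximal))
```

{B} is dropped in the second step because {B, C} is also weighted frequent.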

In the real world, items differ in profit, and a consumer may purchase more than one unit of an item. In utility mining, each item has an internal utility value, which represents the quantity of the item in each transaction, and an external utility value, such as profit or price.

Mining High Utility Patterns
Which itemsets contribute the most profit over all the transactions?

In recent years, many applications have generated stream data, such as transactions of retail markets. These data are continuous and unbounded, and they usually arrive at high speed.

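Traditional vs. Stream Data
Traditional Databases: data stored in finite, persistent data sets.
Stream Data (big data in the cloud): data as ordered, continuous, rapid, huge-volume, time-varying data streams. (In-Memory Databases)
Differences between traditional databases and data streams: in a traditional database, data are stored in finite, permanently kept data sets; they are static and do not keep changing over time. Data in a stream are ordered, continuous, rapid, voluminous, and time-varying. Data keep flowing into the database over time, the data volume cannot be predicted, and the data must be processed within a limited time.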

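Sliding Window Model
Figure 2. Sliding Window (a time axis t0, t1, t2, …, ti, tj, tj+1, tj+2, with windows W1, W2, W3 moving along it).
Unlike the previous two models, the sliding window model always considers only the most recent data. A fixed window size is set, and the window moves as time advances. Besides adding new data, the model also deletes data: old data that fall outside the window are removed and are not included in the mining.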

In the sliding window model, only recent data in a fixed-size window are used to discover meaningful patterns over data streams. This model is widely used for stream mining because it emphasizes recent data and requires only bounded memory.
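
A fixed-size window over a stream of transactions can be sketched with a double-ended queue: each arriving transaction is appended, and once the window is full the oldest transaction is removed and its items are subtracted from the running counts. The class name, the count-based window (measured in transactions rather than time), and the sample stream are assumptions made for illustration.

```python
from collections import Counter, deque


class SlidingWindow:
    """Keeps the most recent `size` transactions and their item counts."""

    def __init__(self, size):
        self.size = size
        self.window = deque()    # transactions currently inside the window
        self.counts = Counter()  # item -> number of window transactions containing it

    def add(self, transaction):
        """Insert a new transaction and expire the oldest one if needed."""
        self.window.append(transaction)
        self.counts.update(transaction)
        if len(self.window) > self.size:
            expired = self.window.popleft()
            for item in expired:
                self.counts[item] -= 1
                if self.counts[item] == 0:
                    del self.counts[item]  # keep memory bounded


# Hypothetical stream of four transactions with a window of three.
stream = [{"A", "C"}, {"B", "C"}, {"A", "B", "C"}, {"A", "B"}]
sw = SlidingWindow(size=3)
for transaction in stream:
    sw.add(transaction)

print(list(sw.window))
print(dict(sw.counts))
```

After the fourth transaction arrives, the first one has left the window, so the counts describe only the last three transactions.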

Mining High Utility Patterns: Problem Statement
Given a data stream and a user-specified minimum utility threshold, mining high utility patterns in a window over the data stream is equivalent to discovering the set of patterns in this window whose utilities are no smaller than the minimum utility threshold.

Simple Example

TID  Transaction           TU
T1   (A, 2) (B, 3) (C, 1)  1550
T2   (A, 1) (B, 2)         300
T3   (A, 2)                400

Item  Profit
A     200
B     50
C     1000

uT(AB, T1) = uT(A, T1) + uT(B, T1) = 200 × 2 + 50 × 3 = 550
u(AB) = uT(AB, T1) + uT(AB, T2) = 550 + 200 × 1 + 50 × 2 = 850
TU(T1) = 200 × 2 + 50 × 3 + 1000 × 1 = 1550
TWU(AB) = TU(T1) + TU(T2) = 1550 + 300 = 1850
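
The four quantities in this example can be reproduced with a short sketch. The function names `u_t`, `u`, `tu`, and `twu` are hypothetical and simply mirror the slide's notation; an itemset is written as a string of item letters.

```python
# Unit profit per item (external utility) and purchased quantities per
# transaction (internal utility), as given in the example.
PROFIT = {"A": 200, "B": 50, "C": 1000}
TRANSACTIONS = {
    "T1": {"A": 2, "B": 3, "C": 1},
    "T2": {"A": 1, "B": 2},
    "T3": {"A": 2},
}


def contains(itemset, tid):
    """True if the transaction holds every item of the itemset."""
    return all(item in TRANSACTIONS[tid] for item in itemset)


def u_t(itemset, tid):
    """Utility of an itemset inside one transaction."""
    quantities = TRANSACTIONS[tid]
    return sum(PROFIT[item] * quantities[item] for item in itemset)


def u(itemset):
    """Utility of an itemset over all transactions that contain it."""
    return sum(u_t(itemset, tid) for tid in TRANSACTIONS if contains(itemset, tid))


def tu(tid):
    """Transaction utility: total utility of all items in the transaction."""
    return sum(PROFIT[item] * qty for item, qty in TRANSACTIONS[tid].items())


def twu(itemset):
    """Transaction-weighted utility: sum of TU over transactions containing the itemset."""
    return sum(tu(tid) for tid in TRANSACTIONS if contains(itemset, tid))


print(u_t("AB", "T1"), u("AB"), tu("T1"), twu("AB"))
```

Since TWU(AB) adds up whole transactions, it can never be smaller than u(AB), which is why it can serve as an upper bound on the utility.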

Periodicity Mining in Time Series Databases
Three types of periodic patterns:
Symbol periodicity: T = abd acb aba abc. Symbol a, p = 3, stPos = 0.
Sequence periodicity (partial periodic patterns): T = bbaa abbd abca abbc abcd. Sequence ab, p = 4, stPos = 4.
Segment periodicity (full-cycle periodicity): T = abcab abcab abcab. Segment abcab, p = 5, stPos = 0.
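
The three examples can be checked with one small function that tests whether a pattern recurs every p positions starting at stPos. This sketch demands a perfect match at every period, which is an assumption made for simplicity (a mining algorithm would normally also search for p and stPos instead of taking them as input). The function name is hypothetical.

```python
def is_periodic(series, pattern, period, start):
    """True if `pattern` occurs at start, start + period, start + 2 * period, ...

    Positions are 0-based; only positions where the whole pattern still fits
    inside the series are checked.
    """
    positions = range(start, len(series) - len(pattern) + 1, period)
    return len(positions) > 0 and all(
        series[i:i + len(pattern)] == pattern for i in positions
    )


# The three series from the slide, written without the spaces between groups.
symbol_ok = is_periodic("abdacbabaabc", "a", 3, 0)
sequence_ok = is_periodic("bbaaabbdabcaabbcabcd", "ab", 4, 4)
segment_ok = is_periodic("abcababcababcab", "abcab", 5, 0)
# Counterexample: symbol b is not periodic with p = 3 from position 1.
symbol_b = is_periodic("abdacbabaabc", "b", 3, 1)

print(symbol_ok, sequence_ok, segment_ok, symbol_b)
```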

Mining Frequent Periodic Patterns
The user wants to know whether a pattern is periodic in the time-series database. How to earn money? Find frequent periodic patterns and predict the future trend of the time-series database. Use a computer to analyze the time-series database.

Mining Time-Interval Sequential Patterns
Customers buy items; the database stores each item and the time interval. Time-interval patterns reveal not only the order of items but also the time intervals between successive items. Use a computer to analyze the database.

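Figure. Research areas of database systems
Knowledge representation: database models, data structures, maintenance of the data as a whole
Processing: query languages, ease of use
Efficiency analysis: query processing, simplicity, response time, space requirements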