Download presentation
Presentation is loading. Please wait.
1
資料庫系統實驗室 指導教授:張玉盈
2
Relational Database SNOOPYFAMILY 利用SQL做查詢: Select NAME
Male Female Primary Key Domains 利用SQL做查詢: ID NAME SEX 1 SNOOPY Male 2 CHARLIE BROWN 3 SALLY BROWN Female 4 LUCY VAN PELT 5 LINUS VAN PELT 6 PEPPERMINT PATTY 7 MARCIE 8 SCHROEDER 9 WOODSTOCK - Select NAME From SNOOPYFAMILY Where SEX = ‘Male’; Cardinality 結果: Tuples ID NAME SEX 1 SNOOPY Male 2 CHARLIE BROWN 5 LINUS VAN PELT 8 SCHROEDER Attributes Degree
3
S H T M Image Databases (a) An image picture
(b) The corresponding symblic representation 2D String : x : M<H<T=S y : H=T<M<S
4
Image Database 應用層面:辦公室自動化、電腦輔助設計、醫學影像擷取…等等。 影像資料庫中的查詢(Queries):
Spatial Reasoning(空間推理) : 在一張影像中推論兩兩物件之間的空間關係。 Pictorial Query(圖像查詢) : 允許使用者給予特定的空間關係以查詢相對應的影像。 Similarity Retrieval(圖形相似擷取) : 藉由使用者所提供的資訊在影像資料庫中找尋出最相似的圖形。 Marc Lucy Pe Fr Linu Char (a) An image picture (b) Symbolic Picture
5
Uids of 13 spatial operators
B < 1 <* 2 | 3 |* 4 / 5 /* 6 ] 7 [ 8 % 9 = 10 ]* 11 [* 12 %* 13
6
Another View of 169 relations
7
5 Category Relationships(CAB)
Disjoin : A B Contain : A B Meet : A B B A Inside : B A Partly Overlap :
8
Decision tree of the CATEGORY function
oid , oid > 4 x y T F 7 ≦ oid , oid ≦ 10 oid , oid > 2 x y x y T F T F 10 ≦ oid , oid ≦ 13 Contain x y Join Disjoin T F Belong Part_Overlap
9
UID Matrix representation(cont.)
b d c f1 a b c d a b c d
10
Similarity Retrieval based on the UID Matrix(1)
Definition1 Picture f’ is a type-i unit picture of f, if (1) f’ is a picture containing the two objects A and B, represented as x: A rx’A,B B, y: A ry’A,B B. (2) A and B are also contained in f. (3) the relationships between A and B in f are represented as x: A rxA,B B, and y: A ryA,B B. Then, (Type-0): Category(rx’A,B , ry’A,B) (Type-1): (Type-0) and (rx’A,B = rxA,B or ry’A,B =ryA,B) (Type-2): rx’A,B = rxA,B and ry’A,B =ryA,B
11
3 type-i similarities A B B A B A A B f(A/B, A/*B) type-1 (A/B, A[*B)
12
Finding Frequent Patterns in Image Databases
Image Mining: Finding Frequent Patterns in Image Databases Setting the minimum support to ½. Charlie Brown often appears to the right of Snoopy.
13
Video : Image + Time + + + + + 範例: 一幕幕的Snoopy影像,編織成一部精彩的Snoopy影片 ……
Image N 範例: 一幕幕的Snoopy影像,編織成一部精彩的Snoopy影片 + + + + +
14
Multimedia Database - Pictures - Pictures with the depicted texts
-Voice -Video - Flow Chart
15
Spatial Database : Nearest Neighbor Query
Where is the nearest restaurant to our location ?
16
Query Types 精確比對查詢: 哪一個城市位在北緯43度與西經88度? 部分比對查詢: 哪些城市的緯度屬於北緯39度43分?
給定範圍查詢: 哪些城市的經緯度介於北緯39度43分 至43度與西經53度至58度之間? 近似比對查詢: 最靠近東勢鎮的城市是?
17
Difficulty No total ordering of spatial data objects that preserves the spatial proximity. c c d d 但空間資料的存取的困難點在於距離遠或近的空間資料,很難使用一個順序將這些空間資料儲存於一維的磁碟空間中, 並且保證存於一維磁碟空間中相鄰的空間資料就是二維平面以上空間中相鄰的空間資料。 例如:我們可能以這左邊圖示來循序存取 a, b, c, d,也可能以又邊圖示來循序存取 a, c, b, d , 在二維平面上距離 d 相近的兩點 b, c,但在在一維磁碟空間中,未必能 d 相鄰兩側存取得到。 But there is no total ordering among spatial objects that preserves spatial proximity. It will affect the management of the spatial data and the searching for the spatial query. a a b b a b c d ? / a c b d ?
18
Space Decomposition and DZ expression
19
The Bucket-Numbering Scheme
Bigger Smaller (c) (a) the uptrend of the bucket numbers of an object N-order Peano Curve
20
Example O(l,u) = (12,26) Max_bucket = 63
The total number of buckets depends on the expected number of data objects. maximum bucket number: Max_bucket = 63
21
Example the data (b) the corresponding NA-tree structure (bucket_capacity = 2)
22
The basic structure of the revised version of the NA-tree
23
NN (Nearest Neighbor) NN problem is to find the nearest neighbor of q (query point). q Query point Nearest neighbor of q Managed by a Peer
24
Where are the 2 nearest points with keywords B and C?
Spatial Databases: KNN Keyword Query Where are the 2 nearest points with keywords B and C?
25
Road Network Databases: K Nearest Neighbor Query
Where are the 3 nearest restaurants?
26
Top-k Spatial Keyword Query
Spatial Databases: Top-k Spatial Keyword Query Where are the top-1 ‘Snoopy hotel’ near Kaohsiung?
27
Reverse nearest neighbor of q
RNN (Reversed NN) The q is the nearest neighbor of the blue points. RNN is a complement of NN problem. Query point Reverse nearest neighbor of q q Managed by a Peer
28
Application : Business strategy
Reverse Nearest Neighbor(RNN) Query means : to obtain the objects which treat the query as their nearest neighbor. Application : Business strategy Location B Location A Five residents treat Location B as their NN. Three residents treat Location A as their NN. Location B is a better place to run the store. Query q Residents
29
Reverse Nearest Neighbor(RNN) Query means : to obtain the objects which treat the query as their nearest neighbor. Application : Traffic police B A Traffic smooth Traffic jam Five cars treat Location A as their NN. Three cars treat Location B as their NN. Location A is a better place to the police for patrol. Query q A Query move Cars
30
Spatial Database : Continuous Nearest Neighbor Query
Find the nearest gas stations from the starting point to the ending point.
31
Spatio-temporal Database
What is the traffic condition ahead of me during the next 30 minutes? Where is the available gas station around my location after 20 minutes?
32
I have it and let’s share it.
P2P System I want to eat a pumpkin. Who has it? I have it and let’s share it.
33
Client-server vs. Peer-to-Peer network
Example : How to find an object in the network Client-server approach Use a big server store objects and provide a directory for look up. Peer-to-Peer approach Data are fully distributed. Each peer acts as both a client and a server. By asking. 方式的不同: 1.用一個超大的伺服器來儲存所有的object並且提供一個directory來做查詢 2.資料是完全分散在整個網路上的,依靠”詢問”的方式來找尋資料。
34
Data Grids I want File-A. I want File-X. 34
35
Find the patterns from the protein database.
判斷蛋白質所屬家族 判斷蛋白質功能
36
Data Mining 大家排隊來結帳 PC 收銀台 顧客通常在買麵包時也會買牛奶 利用資料挖礦的技術 對大家購買的紀錄作分析
Peanuts Supermarket 大家排隊來結帳 PC 顧客通常在買麵包時也會買牛奶 利用資料挖礦的技術 對大家購買的紀錄作分析
38
Data Clustering 形成三個較為單純群集再做分析較為容易 一組非常雜亂的資料,分析困難 Girl Animal Boy
產生三個相似的群集 找到資料間彼此相似的特性
39
Example object cluster age income
40
Classification 從目前的 資料中學習 GIRLS 對新的資料 做準確的 預測分類
41
Classification of Uroflowmetry Curves
Uroflow patterns: (a) Bell-shaped; (b) Tower-shaped; (c) Staccato-shaped; (d) Interrupted-shaped; (e) Plateau-shaped; (f) Obstructive-shaped.
42
Sample Training Data No Attributes Class Location Age Marriage status
Gender Low 1 Urban Below 21 Married Female 2 Male 3 Suburban High 4 Rural Between 21 and 30 5 Above 30 Single 6 7 8 9 10 11 12 13 14
43
A Complex Decision Tree
Predictive power low
44
A Compact Decision Tree
Its predictive power is often higher than that of a complex decision tree.
45
Subspace Clustering Subspace Cluster :
以圖例來說明,左邊這張圖是將microarray以曲線圖來表示,每條線代表的是基因,X軸上的是condition, 我們可以看到右邊這兩張圖,當我們挑出某些基因和某些不連續的condition,我們可以發現這些基因在這些condition上有一致性的表現,也就是他們有類似的波動,即使這筆基因和這兩筆基因的距離很遠,但是因為他們有一致性的表現,所以我們將他分在一群, 底下的圖示另外一個subspace clustering,我們可以看到它包含的基因或condition可能會overlap Subspace Cluster : {gene1, gene2, gene3} x {b, c, h, j, e} {gene1, gene3} x {c, e, g, b} 45
46
Web DB Profile index Web Pages Matching process Profile Interests
Filtered result Recommend the page which introduces “basketball” to those people whose interest is “ basketball ” .
47
Web Mining
48
從封包的Stream Data中找出DOS攻擊的IP
Data Stream Mining 從封包的Stream Data中找出DOS攻擊的IP
49
Traditional vs. Stream Data
Traditional Databases Data stored in finite, persistent data sets. Stream Data (Big data in cloud) Data as ordered, continuous, rapid, huge amount, time-varying data streams. (In-Memory Databases) 傳統的資料庫與資料串流的差異 傳統資料庫資料儲存於有限且永久保存的資料集合中,屬於靜態資料並不會隨著時間不斷的變化。 資料串流的資料是有序、連續、快速、大量且隨著時間變化的資料型態,隨著時間資料不斷的湧入資料庫中,無法去預測資料量且必須在有效的時間內將資料處理完畢。 49
50
Landmark Window Model t0 t1 t2 t i tj tj+1 tj+2 W1 W2 W3 time
… t2 t i tj tj+1 tj+2 W1 W2 W3 time Figure 1. Landmark Window Landmark Window Model探勘模式主要是以設立一個探測的起始時間點開始蒐集至設定的window size做一次探勘,接著隨著時間移動持續累積資料並做探勘,這種方式可能會導致記憶體不足以容納所有資料的問題。 50
51
Titlted-Time Window Model
31 days … 24 hours 4 qtrs time Figure 3. Tilted-Time Window Tilted-Time Window Model是Landmark Window的一種變型,其方法為根據距離現在的時間點最近的資料以較小的測量刻度來記錄資料,隨著時間移動越舊的資料會已較大的刻度來記錄資料的資訊,此種方法可以將資料壓縮並考慮資料的熱門潮流訊息。 51
52
Sliding Window Model t0 t1 t2 ti tj tj+1 tj+2 W1 W2 W3 time
… t2 ti tj tj+1 tj+2 W1 W2 W3 time Figure 2. Sliding Window Sliding Window Model較異於前兩者,其探測方式永遠考慮最近的資料,設定一個固定的window size隨著時間移動window也跟著移動,其動作除了資料的新增還包含刪除的動作,將超過window size的舊資料移除,不將舊的資料加入探勘的動作中。 52
53
False-Positive answer
Exactly Real Answer False-Positive Answer
54
False-Negative answer
Exactly Real Answer
55
Periodicity Mining in Time Series Databases
Three types of periodic patterns: Symbol periodicity T = abd acb aba abc Symbol a , p = 3, stPos = 0 Sequence periodicity (partial periodic patterns) T = bbaa abbd abca abbc abcd Sequence ab, p = 4, stPos = 4 Segment periodicity (full-cycle periodicity) T = abcab abcab abcab Segement abcab, p = 5, stPos = 0
56
Mining Frequent Periodic Patterns
User wants to know whether the pattern periodic or not in the time-series database. How to earn money? Find frequent periodic patterns and predict the future tend of the time-series database. Use computer analyzes time-series database.
57
Customers buy something, storage item and time-interval.
Mining Time-Interval Sequential Patterns Customers buy something, storage item and time-interval. Find Time-interval patterns not only reveals the order of items but also the time intervals between successive items. Use computer analyzes database.
58
User wants to know which pattern can make money and the most items.
Mining Weight Maximal Frequent Patterns User wants to know which pattern can make money and the most items.
59
Mining High Utility Patterns
Which itemset can contribute the most profit value of all the transactions?
60
Monomg Repeating Patterns in Music Databases
舉例來說,這是歌曲小蜜蜂的譜,藍框和紅框所框住的長度為3且出現頻率是3的兩個RP,但這兩個RP卻被一個更長綠框的RP所包含,且頻率也是3,因此這兩個小的RP並不是non-trivial repeating pattern,綠框才是,藉此來減少表示一首歌RP的數量。 60
61
Co-Location Patterns
62
Mining Spatial Co-Location Patterns
Ex {A,C} ───────── {(3,1),(4,1)} {(2,3),(1,2)} {(2,3),(3,3)}
63
Where is good location for retailers to open an after-market ?
Co-Location Patterns A = Auto dealers R = auto Repair shops D = Department stores G = Gift stores H = Hotels Co-location patterns: {A, R}, {D, G} Where is good location for retailers to open an after-market ?
64
知識的表達 處理 效率分析 圖例. 資料庫系統的研究領域 資料庫模型、資料結構、資料整體的維護 查詢語言、使用方便性 查詢處理、簡單性、回應
時間、空間需求 查詢語言、使用方便性 圖例. 資料庫系統的研究領域
Similar presentations