Presentation is loading. Please wait.

Presentation is loading. Please wait.

Advanced Topics in Data Mining: Sequential Patterns

Similar presentations


Presentation on theme: "Advanced Topics in Data Mining: Sequential Patterns"— Presentation transcript:

1 Advanced Topics in Data Mining: Sequential Patterns

2 Sequential Pattern Analysis

3 Sequential Pattern Mining
Progress in bar-code technology has made it possible for retail organizations to collect and store massive amounts of sales data, referred to as the basket data A record in such data typically consists of the transaction date and the items bought in the transaction Very often, data records also contain customer-id, particularly when the purchase has been made using a credit card or a frequent-buyer card Catalog companies also collect such data using the orders they receive

4 Sequential Pattern Mining
An example of such a pattern is that customers typically rent “Star Wars (星際大戰)”, then “Empire Strikes Back (帝國大反擊)”, and then “Return of the Jedi (絕地大反攻)” These rentals need not be consecutive Customers who rent some other videos in between also support this sequential pattern Elements of a sequential pattern need not be simple items “Computer Science and Programming Language”, followed by “Data Structure”, followed by “System Programs and Operating Systems” is an example of a sequential pattern in which the elements are sets of items

5 Sequential Pattern Mining
Given Transaction Time, Customer Id, Items Bought Original Database Answer Set

6 Definition The length of a sequence is the number of itemsets in the sequence A sequence of length k is called a k-sequence The support for an itemset i is defined as the fraction of customers who bought the items in i in a single transaction The itemset i and the 1-sequence <i> have the same support An itemset with minimum support is called a large (frequent) itemset or litemset

7 AprioriAll Algorithm Each itemset in a large sequence must have minimum support Any large sequence must be a list of litemsets Finding all sequential patterns in five phases Sort Phase Litemset Phase Transformation Phase Sequence Phase Maximal Phase

8 AprioriAll Algorithm: Sort Phase
Customer-Sequence Version of the Database

9 AprioriAll Algorithm: Litemset Phase
Apriori/DHP FP Growth min_sup_count=2

10 AprioriAll Algorithm: Transformation Phase

11 AprioriAll Algorithm: Sequence Phase
Large 2-Sequences Customer Sequences Large 1-Sequences Large 4-Sequences Maximal Large Sequences Large 3-Sequences

12 Sequence Phase: Candidate Generation

13 AprioriAll Algorithm: Maximal Phase
The sequence <(3) (4 5) (8)> is contained in <(7) (3 8) (9) (4 5 6) (8)>, since (3)  (3 8), (4 5)  (4 5 6) and (8)  (8) The sequence <(3) (5)> is not contained in <(3 5)> (and vice versa) The former represents items 3 and 5 being bought one after the other The latter represents items 3 and 5 being bought together. In a set of sequences, a sequence s is maximal if s is not contained in any other sequence.

14 AprioriAll Algorithm Answer Set With minimum support set to 25%, i.e., a minimum support of 2 customers < (30) (90)> and <(30) (40 70)> are maximal <(10 20) (30)> which is only supported by customer 2 does not have minimum support <(30)>, <(40)>, <(70)>, <(90)>, <(30) (40)>, <(30) (70)> and <(40 70)>, though having minimum support, are not in the answer because they are not maximal.

15 Summary

16 Discussions AprioriAll algorithm will generate a huge set of candidate sequences If there are 1000 frequent sequences of length-1, the algorithm will generate 1000 × (1000 × 999) / 2 = 1,499,500 candidate sequences Many scans of databases in mining Difficulties at mining long sequential patterns

17 Methods to Improve AprioriAll’s Efficiency
PrefixSpan Without Candidate Generation Reduce Database Scan (Scan Database Twice) & Database Size The general idea of the method is to use projected sequence databases to confine the search and the growth of subsequence fragments

18 PrefixSpan PrefixSpan-1 PrefixSpan-2 PrefixSpan use Pseudo-Projection
Single-Level Projection PrefixSpan-2 Bi-Level Projection S-Matrix PrefixSpan use Pseudo-Projection

19 <(ef)(ab)(df)cb> <(ad)c(bc)(ae)> <a(abc)(ac)d(cf)>
Definition A sequence : < (ef) (ab) (df) c b > Elements items within an element are listed alphabetically <a(bc)dc> is a subsequence of <a(abc)(ac)d(cf)> Let min_sup = 2, <(ab)c> is a sequential pattern A Sequence Database <eg(af)cbc> 40 <(ef)(ab)(df)cb> 30 <(ad)c(bc)(ae)> 20 <a(abc)(ac)d(cf)> 10 Sequence SID

20 <(abc)(ac)d(cf)> <(_bc)(ac)d(cf)> <(_c)(ac)d(cf)>
Definition Prefix and Postfix (Projection) <a>, <aa>, <a(ab)>, <a(abc)>, … are prefixes of sequence <a(abc)(ac)d(cf)> Given Sequence <a(abc)(ac)d(cf)> Prefix Postfix /Projection <a> <(abc)(ac)d(cf)> <aa> <(_bc)(ac)d(cf)> <ab> <(_c)(ac)d(cf)>

21 PrefixSpan-1 Find Length-1 (L1) Sequential Patterns
Construct Projected Database According to L1 Mining Each Projected DB Recursively

22 PrefixSpan-1: An Example
Sequence_ID Sequence 10 < a ( abc ) ( ac ) d ( cf ) > 20 < ( ad ) c ( bc ) ( ae ) > 30 < ( ef ) ( ab ) ( df ) cb > 40 < eg ( af ) cbc > Min_Support_Count = 2 L1: <a>: 4, <b>: 4, <c>: 4 <d>: 3, <e>: 3, <f>: 3

23 PrefixSpan-1: An Example
10 <a(abc)(ac)d(cf)> 20 <(ad)c(bc)(ae)> 30 <(ef)(ab)(df)cb> 40 <eg(af)cbc> PrefixSpan-1: An Example Prefix Projected (Postfix) Database < a > <(abc)(ac)d(cf)>, <(_d)c(bc)(ae)> <(_b)(df)cb>, <(_f)cbc> < b > <(_c)(ac)d(cf)>, <(_c)(ae)>, <(df)cb>, <c> < c > <(ac)d(cf)>, <(bc)(ae)>, <b>, <bc> < d > <(cf)>, <c(bc)(ae)>, <(_f)cb> < e > <(_f)(ab)(df)cb>, <(af)cbc> < f > <(ab)(df)cb>, <cbc>

24 PrefixSpan-1: An Example
<(abc)(ac)d(cf)>, <(_d)c(bc)(ae)> <(_b)(df)cb>, <(_f)cbc> Scanning <a>-Projected database once: a:2, b:4, c:4, d:2, e:1, f:2 (_b):2, (_c):1, (_d):1, (_e):1, (_f):1 L2: <aa>: 2 , <ab>: 4 , <(ab)>: 2 <ac>: 4 , <ad>: 2 , <af>: 2

25 PrefixSpan-1: An Example
Projected (Postfix) Database < aa > <(_bc)(ac)d(cf)> < ab > <(_c)(ac)d(cf)>, <(_c)a>, <c> < (ab) > <(_c)(ac)d(cf)>, <(df)cb> < ac > <(ac)d(cf)>, <(bc)a>, <b>, <bc> < ad > <(cf)>, <(_f)cb> < af > <cb>

26 PrefixSpan-1: An Example
< ab > <(_c)(ac)d(cf)>, <(_c)a>, <c> Scanning <ab>-Projected database once: a:2 , c:2 , d:1 , f:1 , (_c):2 L3: <a(bc)>: 2, <aba>: 2, <abc>: 2

27 PrefixSpan-1: An Example
Projected (Postfix) Database < a(bc) > <(ac)d(cf)> , <a> < aba > <(_c)d(cf)> < abc > <d(cf)> Scanning <a(bc)>-Projected database once: a:2 , c:1 , d:1 , f:1 L4: <a(bc)a>: 2

28 PrefixSpan-1: An Example
Sequential Patterns < a > <a>, <aa>, <ab>, <a(bc)>, <a(bc)a>, <aba>, <abc>, <(ab)>, <(ab)c>, <(ab)d>, <(ab)f>, <(ab)dc>, <ac>, <aca>, <acb>, <acc>, <ad>, <adc>, <af> < b > <b>, <ba>, <bc>, <(bc)>, <(bc)a>, <bd>, <bdc>, <bf> < c > <c>, <ca>, <cb>, <cc> < d > <d>, <db>, <dc>, <dcb> < e > <e>, <ea>, <eab>, <eac>, <eacb>, <eb>, <ebc>, <ec>, <ecb>, <ef>, <efb>, <efc>, <efcb> < f > <f>, <fb>, <fbc>, <fc>, <fcb>

29 Completeness of PrefixSpan-1
<eg(af)cbc> 40 <(ef)(ab)(df)cb> 30 <(ad)c(bc)(ae)> 20 <a(abc)(ac)d(cf)> 10 sequence SID SDB Length-1 sequential patterns <a>, <b>, <c>, <d>, <e>, <f> <a>-projected database <(abc)(ac)d(cf)> <(_d)c(bc)(ae)> <(_b)(df)cb> <(_f)cbc> Length-2 sequential patterns <aa>, <ab>, <(ab)>, <ac>, <ad>, <af> Having prefix <a> Having prefix <aa> <aa>-proj. db <af>-proj. db Having prefix <af> Having prefix <c>, …, <f> <b>-projected database Having prefix <b> … …

30 Analysis No candidate sequence needs to be generated by PrefixSpan
Projected databases keep shrinking The major cost of PrefixSpan is the construction of projected databases

31 PrefixSpan-2 Find Length-1 Sequential Patterns
Construct Triangular Matrix M (S-Matrix) By scanning DB second time, the S-matrix can be filled up Construct Projected Database For each length-2 sequential pattern, construct its projected DB Mining each projected DB recursively

32 PrefixSpan-2: An Example
Sequence_ID Sequence 10 < a ( abc ) ( ac ) d ( cf ) > 20 < ( ad ) c ( bc ) ( ae ) > 30 < ( ef ) ( ab ) ( df ) cb > 40 < eg ( af ) cbc > Min_Support = 2 L1: <a>: 4, <b>: 4 , <c>: 4 <d>: 3, <e>: 3, <f>: 3

33 PrefixSpan-2: An Example
<ab> happens 4 times 10 <a(abc)(ac)d(cf)> 20 <(ad)c(bc)(ae)> 30 <(ef)(ab)(df)cb> 40 <eg(af)cbc> <bb> happens 1 times (2,1,1) (2,2,0) (1,2,1) (1,1,1) (2,0,1) (1,2,0) (1,1,0) (1,3,0) (4,2,1) (3,3,2) (4,2,2) 1 3 2 a b c d e f <dc> happens 3 times <(ef)> happens 1 times S-Matrix

34 PrefixSpan-2: An Example
(2,1,1) (2,2,0) (1,2,1) (1,1,1) (2,0,1) (1,2,0) (1,1,0) (1,3,0) (4,2,1) (3,3,2) (4,2,2) 1 3 2 a b c d e f 10 <a(abc)(ac)d(cf)> 20 <(ad)c(bc)(ae)> 30 <(ef)(ab)(df)cb> 40 <eg(af)cbc> No hope to form (_cc),So no need to count it <ab>-projected database <(_c)(ac)d(cf)> <(_c)a> <c> Local length-1 sequential patterns: <a>, <c>, <(_c)> Lead to pattern <a(bc)a> (,2, ) (,1, ) (1,0,1) 1 a c (_c)

35 Benefits of Bi-Level Projection
More patterns are found in each shoot Much Less Projections In this example, there are 53 patterns 53 Level-by-Level Projections 22 Bi-Level Projections

36 Speed-Up by Pseudo-Projection
Major Cost of PrefixSpan: Projection Postfixes of sequences often appear repeatedly in recursive projected databases When (projected) database can be held in main memory, use pointers to form projections Pointer to the sequence Offset of the postfix s=<a(abc)(ac)d(cf)> <(abc)(ac)d(cf)> <(_c)(ac)d(cf)> <a> <ab> s|<a>: ( , 2) s|<ab>: ( , 4)

37 Mining Time-Gap Sequential Patterns (TGSP)
A  B  C Time Gap Sequential Pattern A  B  C (3-5) (5-7)

38 交易時間序列資料庫 顧客交易時間序列 交易資料庫 交易時間序列資料庫 交易 編號 顧客 項目集 時間 1 { a , c } 11 2 3
{ a , d } 13 { a } 4 15 5 16 6 { b } 17 7 { c , d } 8 { c } 18 9 { d } 20 10 22 顧客 編號 顧客交易時間序列 1 a(11) , c(11) , a(16) , c(16) 2 a(13) , c(17) , d(17) , d(20) 3 a(13) , d(13) , c(18) 4 a(15) , b(17) , c(22)

39 交易時間序列 K-交易時間序列 < I1(T1), I2(T2), …, Ik(Tk)> 顧客交易時間序列
顧客1存在3-交易時間序列 <c(11), a(16), c(16)> 顧客 編號 顧客交易時間序列 1 a(11) , c(11) , a(16) , c(16) 3 a(13) , c(17) , d(17) 2 a(13) , d(13) , c(18) 4 a(15) , b(17) , c(22)

40 交易時間間隔&項目序列 K-交易時間間隔序列 K-項目序列
表示成<I1, (t1), I2, (t2), …, (tk-1), Ik> 其中Ii為單一項目,ti為Ii與Ii+1購買時間間隔 3-交易時間序列<A(10), B(15), D(30)>的交易時間間隔序列為<A, (5), B, (15), D> K-項目序列 表示成<I1, I2, …, Ik>,為多個項目依照購買時間先後排列而成的,若其相同時間購買之項目,則以編號較小之項目排在前面 3-交易時間序列< A(10), B(15), D(30)>所對應的3-項目序列為<A, B, D>

41 時間間隔序列 & 包含 K-時間間隔序列 4-時間間隔序列
表示成<I1,R1,I2,R2,…,Rk-1,Ik>,其中Ii為一個單一項目,Ri = li ~ ui,為一段時間範圍,表示項目Ii與Ii+1的購買時間間隔範圍介於li和 ui中間 4-時間間隔序列 <A, (5~8), B, (3~6), C, (5~8), D> 交易時間間隔序列<A, (7), B, (4), C, (5), D>包含於時間間隔序列<A, (5~8), B, (3~6), C, (5~8), D>

42 支持 顧客交易時間序列 C =<A(15), B(22), C(26), D(31), E(39)> 存在一個4-交易時間序列 <A(15), B(22), C(26), D(31)> 此交易時間序列的交易時間間隔序列為 <A, (7), B, (4), C, (5), D> 包含於時間間隔序列 S =<A, (5~8), B, (3~6), C, (5~8), D> 所以顧客交易時間序列C支持時間間隔序列S ,且此顧客交易時間序列C支持項目序列 < A B C D >

43 支持度 K-時間間隔序列的支持度為支持此時間間隔序列的顧客數與資料庫中所有顧客數的比值
若K-時間間隔序列的支持度大於或等於使用者所訂定的最小支持度的話,我們將其稱為K-頻繁時間間隔序列 K-項目序列的支持度為支持此項目序列的顧客數與資料庫中所有顧客數的比值 若K-項目序列的支持度大於或等於最小支持度,則我們稱之為K-頻繁項目序列

44 挖掘時間間隔序列型樣 找出1-頻繁項目序列 找出2-頻繁項目序列 產生2-項目序列資料庫 找出2-頻繁時間間隔序列
產生K-項目序列資料庫(K≧3) 找出K-頻繁時間間隔序列(K≧3) 找出時間間隔序列型樣

45 找出1-頻繁項目序列 假設最小支持度為1/2 各項目支持度為
ID 顧客交易時間序列 1 A(5) B(10) C(19) D(27) E(32) 2 A(8) B(13)F(13) C(23) D(31) 3 A(9) B(14) C(23) D(31) 4 A(13) B(19) C(29) D(37) 5 A(15) B(21) F(21) D(28) A(36) 6 C(16) A(21) B(26) F(26) D(31) 7 E(18) C(27) A(34) B(40) F(40) 8 A(18) B(24) F(24) C(27) E(33) 假設最小支持度為1/2 各項目支持度為 A=8/8=1, B=8/8=1, C=7/8, D=6/8, E=3/8, F=5/8,則 項目A,B,C,D,F為1-頻繁項目序列

46 找出2-頻繁項目序列 產生2-候選項目序列 產生2-頻繁項目序列
由 1-頻繁項目序列A,B,C,D,F 配對後可以產生<AA>, <AB>, <AC>, <AD>, <AF>, <BA>, <BB>, <BC>, <BD>, <BF>, <CA>, <CB>, <CC>, <CD>, <CF>, <FA>, <FB>, <FC>, <FD>, <FF>的2-候選項目序列 產生2-頻繁項目序列 掃描資料庫,計算各2-候選項目序列的支持度 <AA>=1/8, <AB>=1, <AC>=5/8, <AD>=6/8, <AF>=5/8, <BA>=1/8, <BB>=0, <BC>=5/8, <BD>=6/8, <BF>=5/8, <CA>=2/8, <CB>=2/8, <CC>=0, <CD>=5/8, <CF>=2/8, <FA>=1/8, <FB>=0, <FC>=2/8, <FD>=3/8, <FF>=0 產生2-頻繁項目序列(1/2) <AB>, <AC>, <AD>, <AF>, <BC>, <BD>, <BF>, <CD> 為2-頻繁項目序列

47 產生2-項目序列資料庫 ID 顧客交易時間序列 1 A(5) B(10) C(19) D(27) E(32) 2
A(8) B(13)F(13) C(23) D(31) 3 A(9) B(14) C(23) D(31) 4 A(13) B(19) C(29) D(37) 5 A(15) B(21) F(21) D(28) A(36) 6 C(16) A(21) B(26) F(26) D(31) 7 E(18) C(27) A(34) B(40) F(40) 8 A(18) B(24) F(24) C(27) E(33) 2-頻繁項目序列 <AB>, <AC>, <AD>, <AF>, <BC>, <BD>, <BF>, <CD>

48 產生2-項目序列資料庫 ID 顧客交易時間序列 1 A(5) C(10) B(13) A(15) C(20) 2 A(8) B(13) C(23) D(31) 在產生2-項目序列資料庫時,顧客1會拆解出{ A(5) C(10) }、 { A(5) B(13) }、 { A(5) A(15) }、 { C(10) B(13) }、 { C(10) A(15) }、 { C(10) C(20) }、 { B(13) A(15) } 、{ B(13) C(20) }、 { A(15) C(20) } { A(5) C(20) } 則不產生。

49 2-項目序列資料庫 AB 1 5,10 5 2 8,13 3 9,14 4 13,19 6 15,21 21,26 7 34,40 8 18,24 AC 1 5,19 14 2 8,23 15 3 9,23 4 13,29 16 8 18,27 9 AD 1 5,27 22 2 8,31 23 3 9,31 4 13,37 24 5 15,28 13 6 21,23 10 AF 2 8,13 5 15,21 6 21,26 7 34,40 8 18,24 BC 1 10,19 9 2 13,23 10 3 14,23 4 19,29 8 24,27 BD 1 10,27 17 2 13,31 18 3 14,31 4 19,37 5 21,28 7 6 26,31 CD 1 19,27 8 2 23,31 3 4 29,37 6 16,31 15 BF 2 13,13 5 21,21 6 26,26 7 40,40 8 24,24

50 找出2-頻繁時間間隔序列 最小密度:5 (個) 最小支持度:15 (個) 單元長度:1 輸出頻繁時間間隔序列A  B [1,3]
1.若項目序列AB資料列表沒有產生任何頻繁時間間隔序列, 則刪除項目序列AB的資料列表 2.刪減項目序列AB資料列表中投影點不在u1的資料

51 找出2-頻繁時間間隔序列 最小密度:2 (個) 最小支持度:4 (個) 單元長度:2 BD B  D [17,18] 1 4 BD 1
10,27 17 2 13,31 18 3 14,31 4 19,37 5 21,28 7 6 26,31 最小密度:2 (個) 最小支持度:4 (個) 單元長度:2 BD 1 4 B  D [17,18]

52 2-頻繁時間間隔序列 A  B A  C A  D A  F B  C B  D B  F C  D [5,6]
[13,16] [21,24] [5,6] [9,10] B  D B  F C  D [17,18] [ 0,0 ] [7,8]

53 2-項目序列資料庫 AB 1 5,10 5 2 8,13 3 9,14 4 13,19 6 15,21 21,26 7 34,40 8 18,24 AC 1 5,19 14 2 8,23 15 3 9,23 4 13,29 16 8 18,27 9 AD 1 5,27 22 2 8,31 23 3 9,31 4 13,37 24 5 15,28 13 6 21,23 10 AF 2 8,13 5 15,21 6 21,26 7 34,40 8 18,24 BC 1 10,19 9 2 13,23 10 3 14,23 4 19,29 8 24,27 BD 1 10,27 17 2 13,31 18 3 14,31 4 19,37 5 21,28 7 6 26,31 CD 1 19,27 8 2 23,31 3 4 29,37 6 16,31 15 BF 2 13,13 5 21,21 6 26,26 7 40,40 8 24,24

54 刪除後的2-項目序列資料庫 AB 1 5,10 5 2 8,13 3 9,14 4 13,19 6 15,21 21,26 7 34,40 8 18,24 AC 1 5,19 14 2 8,23 15 3 9,23 4 13,29 16 AD 1 5,27 22 2 8,31 23 3 9,31 4 13,37 24 AF 2 8,13 5 15,21 6 21,26 7 34,40 8 18,24 BC 1 10,19 9 2 13,23 10 3 14,23 4 19,29 BD 1 10,27 17 2 13,31 18 3 14,31 4 19,37 BF 2 13,13 5 21,21 6 26,26 7 40,40 8 24,24 CD 1 19,27 8 2 23,31 3 4 29,37

55 產生K-項目序列資料庫(K≧3) ABC 1 5,10,19 (5,9) 2 8,13,23 (5,10) 3 9,14,23 4
13,19,29 (6,10) ABCD 1 5,10,19,27 (5,9,8) 2 8,13,23,31 (5,10,8) 3 9,14,23,31 4 13,19,29,37 (6,10,8) BCD 1 10,19,27 (9,8) 2 13,23,31 (10,8) 3 14,23,31 4 19,29,37

56 利用2-項目序列資料庫所產生的3-項目序列資料庫
ABC 1 5,10,19 (5,9) 2 8,13,23 (5,10) 3 9,14,23 4 13,19,29 (6,10) ABD 1 5,10,27 (5,17) 2 8,13,31 (5,18) 3 9,14,31 4 13,19,37 (6,18) ABF 2 8,13,13 (5,0) 5 15,21,21 (6,0) 6 21,26,26 7 34,40,40 8 18,24,24 ACD 1 5,19,27 (14,8) 2 8,23,31 (15,8) 3 9,23,31 4 13,29,37 (16,8) BCD 1 10,19,27 (9,8) 2 13,23,31 (10,8) 3 14,23,31 4 19,29,37

57 找出3-頻繁時間間隔序列 BC ABC 1 10,15,30 (5,15) * (5, 15) 15 5 AB

58 找出3-頻繁時間間隔序列 BC AB ABC 1 10,15,30 (5,15) *

59 找出3-頻繁時間間隔序列 BC 輸出頻繁時間間隔序列 A  B  C [15,39] [25,34] 輸出頻繁時間間隔序列
35 25 [15,39] [25,34] A  B  C 輸出頻繁時間間隔序列 40 20 [20,34] [20,39] A  B  C 輸出頻繁時間間隔序列 刪減項目序列ABC 資料列表中的資料

60 找出3-頻繁時間間隔序列 最小密度:5 (個) 最小支持度:50(個) r1=99 r2 =114 r3 =40 r4 =128
A  B  C r1 = [1,5] [4,6] r2 = [1,4] [4,7] r4 = [2,5] [2,6] r3 = [2,8] [2,2]

61 找出3-頻繁時間間隔序列 r1 r2 r4 A  B  C r1 = [1,5] [4,6] r2 = [1,4] [4,7]

62 找出3-頻繁時間間隔序列 刪除r1 所輸出的頻繁時間間隔序列 刪除項目序列ABC資料列表中所有不在r2 和 r4 範圍的顧客交易時間序列

63 3-頻繁時間間隔序列 A  B  C A  B  D A  B  F A  C  D B  C  D
[5,6] [9,10] A  B  D [5,6] [17,18] A  B  F [5,6] [0,0] A  C  D B  C  D [9,10] [7,8] [13,16] [7,8]

64 刪除後的3-項目序列資料庫 ABC 1 5,10,19 (5,9) 2 8,13,23 (5,10) 3 9,14,23 4 13,19,29 (6,10) ABD 1 5,10,27 (5,17) 2 8,13,31 (5,18) 3 9,14,31 4 13,19,37 (6,18) ABF 2 8,13,13 (5,0) 5 15,21,21 (6,0) 6 21,26,26 7 34,40,40 8 18,24,24 ACD 1 5,19,27 (14,8) 2 8,23,31 (15,8) 3 9,23,31 4 13,29,37 (16,8) BCD 1 10,19,27 (9,8) 2 13,23,31 (10,8) 3 14,23,31 4 19,29,37

65 利用3-項目序列資料庫所產生的4-項目序列資料庫
ABCD 1 5,10,19,27 (5,9,8) 2 8,13,23,31 (5,10,8) 3 9,14,23,31 4 13,19,29,37 (6,10,8)

66 找出4-頻繁時間間隔序列 最小密度:2 (個) 最小支持度:4 (個) 單元長度:2 CD BC A  B  C  D AB
1 5,10,19,27 (5,9,8) 2 8,13,23,31 (5,10,8) 3 9,14,23,31 4 13,19,29,37 (6,10,8) BC 8 7 9 10 A  B  C  D 5 6 [5,6] [9,10] [7,8] AB

67 所有的頻繁時間間隔序列 A  B A  C A  D A  F B  C B  D B  F C  D A  B  C
[5,6] A  C [13,16] A  D A  F B  C [21,24] [5,6] [9,10] B  D B  F C  D [17,18] [0,0] [7,8] A  B  C [5,6] [9,10] A  B  D [5,6] [17,18] A  B  F [5,6] [0,0] A  C  D [13,16] [7,8] B  C  D [9,10] [7,8] A  B  C  D [5,6] [9,10] [7,8]

68 產生時間間隔序列型樣 頻繁時間間隔序列 時間間隔序列型樣 A  B  C A  B  C A  B B  C
[5,6] [9,10] A  B  C [5,6] [9,10] A  B [3,6] B  C [7,12]

69 時間間隔序列型樣 A  B  D A  B  F A  C  D A  B  C  D [5,6] [17,18]
[5,6] [17,18] A  B  F [5,6] [0,0] A  C  D [13,16] [7,8] A  B  C  D [5,6] [9,10] [7,8]

70 Important Issues Discovering Episodes
Collection of ordered events within an interval Web page C is accessed 2 min after A & B Sliding Window Concept


Download ppt "Advanced Topics in Data Mining: Sequential Patterns"

Similar presentations


Ads by Google