Download presentation
Presentation is loading. Please wait.
Published byFabia Ambra Bernardini Modified 5年之前
1
A Data Mining Algorithm for Generalized Web Prefetching
author: Alexandros Nanopoulos、 Dimitrios Katsaros、 Yannis Manolopoulos source: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 15, NO. 5 year: OCTOBER. 2003 presented by C. W. Hsu
2
Outline Introduction BACKGROUND GENERALIZED ALGORITHM
Experiment Result Conclusion
3
Introduction The Web has become the primary means for information dissemination. It is being used for commercial, entertainment, or educational purposes, and, thus, its popularity resulted in heavy traffic in the Internet
4
Web Data Traffic
5
The cache of web document
6
Cache problems
7
Transparent Informed Prefetching
8
Proposed architecture of a prediction-enabled Web server
9
Motivation Predictive Web prefetching Server - Cache Client - Cache
一般伺服器會將文件快取到主記憶中,但是目前也有很多人將快取資料直接傳送到客戶端。
10
Method Transparent Informed Prefetching DG:Dependency Graph
PPM:Prediction by Partial Match WM:Web log Mining WMo:Web log Mining - Ordering
11
Dependency Graph First Markov Model ABCACBD and CCABCBCA
12
Prediction by Partial Match
All-m-Order Markov model(Second Markov Model) ABCACBD and CCABCBCA
13
Pruning Criteria support-pruning confidence-pruning error-pruning
14
WMo The use of Web log mining methods for the discovery of association rules among user accesses. Association rule discovery algorithms represent candidates as sets of documents, which do not consider this ordering.
15
WMo C1 = {A, B} T = {B,C,A,D} S1 = {A}, S2={B} WMo={A,B}、{B、A}
S1 = {A,B,C} S2 = {A,B,D} WMo = {A,B,C,D}、{A,B,D,C} 當C1 = {A,B}時,T={B,C,A,D}將無法有效的找到
16
Candidate generation procedure
17
PERFORMANCE RESULTS Usefulness Accuracy Network traffic
18
The Parameters for the Generator
Mean value of the noise = 噪音的平均值 Variance of the noise = 噪音的變化 corProb = 最後一個節點時的移動到另一個節點的可能性 Noise = 分成兩種一種是開始移動的節點與隨機插入一個節點至交易中。
19
Experimental Evaluation
Mean value of the noise = 噪音的平均值 Variance of the noise = 噪音的變化
20
Experimental Evaluation
Mean value of the noise = 噪音的平均值 Variance of the noise = 噪音的變化
21
Experimental Evaluation
Mean value of the noise = 噪音的平均值 Variance of the noise = 噪音的變化
22
Cache hits as a function of the cache size
Mean value of the noise = 噪音的平均值 Variance of the noise = 噪音的變化
23
PERFORMANCE RESULTS http://ita.ee.lbl.gov/html/traces.html T = Traffic
A = Accuracy U = Usefulness W = 5 => 統計的交易最高長度(代表一次鎖定的長度)
24
PERFORMANCE RESULTS WMo/wp = 沒有使用任何Pruning的技術
25
Discussion WMo prefetches more documents correctly than PPM and DG.
Algorithm WM presents the worst performance because it does not consider
26
Discussion It is respect to the two factor: Order -> DG
Noise -> PPM 實驗上指出PPM容易被Noise影響,反之DG和WM不容易造成影響。 PPM會因為Noise增加而影響正確性的部份。 DG容易因為order的因素而造成影嚮,但是PPM和WMo則不會。
27
END
Similar presentations