Approximate Frequent Itemset Mining for Streaming Data on FPGA

Slides:



Advertisements
Similar presentations
組織氣候與工作投入關係之研究 - 以某醫學中心暨委託經營管理醫院為例 中文摘要 本研究主要目的在探討某醫學中心暨二家委託經營管理醫院之組織氣候及員工工作投入之程度,及 比較不同個人屬性與醫院屬性之組織氣候與工作投入之差異,採橫斷式調查法、用多階段隨機抽樣 方式,以某醫學中心暨委託經營管理的二家醫院員工為研究對象,進行結構式問卷調查,收集時間.
Advertisements

Course 1 演算法: 效率、分析與量級 Algorithms: Efficiency, Analysis, and Order
2013届高考复习方案(第一轮) 专题课件.
全国一级建造师执业资格考试 《建设工程法规及相关知识》 高 唱
CHAPTER 9 虛擬記憶體管理 9.2 分頁需求 9.3 寫入時複製 9.4 分頁替換 9.5 欄的配置法則 9.6 輾轉現象
Foundations of Computer Science
Some Knowledge of Machine Learning(1)
加快数据中心运转速度 — 加速业务发展 约翰•福勒 甲骨文公司系统事业部执行副总裁. 加快数据中心运转速度 — 加速业务发展 约翰•福勒 甲骨文公司系统事业部执行副总裁.
频繁模式与关联规则挖掘 林琛 博士、副教授.
关联.
云实践引导产业升级 沈寓实 博士 教授 MBA 中国云体系产业创新战略联盟秘书长 微软云计算中国区总监 WinHEC 2015
Leftmost Longest Regular Expression Matching in Reconfigurable Logic
Operating System CPU Scheduing - 3 Monday, August 11, 2008.
Minimum Spanning Trees
Author: Shigeki Takeuchi,Hiroyuki Koga, Katsuyoshi Iida,
Euler’s method of construction of the Exponential function
An Adaptive Cross-Layer Multi-Path Routing Protocol for Urban VANET
Feng Lin, Chen Song, Yan Zhuang, Wenyao Xu, Changzhi Li, Kui Ren
Thinking of Instrumentation Survivability Under Severe Accident
Population proportion and sample proportion
Manifold Learning Kai Yang
異質計算教學課程內容 「異質計算」種子教師研習營 洪士灝 國立台灣大學資訊工程學系
計算方法設計與分析 Design and Analysis of Algorithms 唐傳義
Knowledge Engineering & Artificial Intelligence Lab (知識工程與人工智慧)
第十章 基于立体视觉的深度估计.
Logistics 物流 昭安國際物流園區 總經理 曾玉勤.
Sampling Theory and Some Important Sampling Distributions
What makes your Dream fly further
HLA - Time Management 陳昱豪.
微程序控制器 刘鹏 Dept. ISEE Zhejiang University
第8章 關聯分析 王海.
数据挖掘: 概念和技术 — Chapter 6 — ©张晓辉 复旦大学 (国际)数据库研究中心
第三章 基本觀念 電腦繪圖與動畫 (Computer Graphics & Animation) Object Data Image
Interval Estimation區間估計
Online job scheduling in Distributed Machine Learning Clusters
基于类关联规则的分类 Classification Based on Class-Association Rules
第三章 基本觀念 電腦繪圖與動畫 (Computer Graphics & Animation) Object Data Image
A high payload data hiding scheme based on modified AMBTC technique
Version Control System Based DSNs
研究技巧與論文撰寫方法 中央大學資管系 陳彥良.
高性能计算与天文技术联合实验室 智能与计算学部 天津大学
2019/4/8 A Load Balancing Mechanism for multiple SDN Controllers based on Load Informing Strategy Miultiple controller 的 load balancing 機制,使用一個叫 Load informing.
Maintaining Frequent Itemsets over High-Speed Data Streams
Guide to a successful PowerPoint design – simple is best
中国科学技术大学计算机系 陈香兰 2013Fall 第七讲 存储器管理 中国科学技术大学计算机系 陈香兰 2013Fall.
虚 拟 仪 器 virtual instrument
中国科学技术大学计算机系 陈香兰 Fall 2013 第三讲 线程 中国科学技术大学计算机系 陈香兰 Fall 2013.
從 ER 到 Logical Schema ──兼談Schema Integration
会计基础 第二章 会计要素与会计等式 刘颖
软件测试技术-白盒测试 张志强 2006.
Inter-band calibration for atmosphere
A Data Mining Algorithm for Generalized Web Prefetching
Lightweight Data-flow Analysis for Execution-driven Constraint Solving
資料庫系統實驗室 指導教授:張玉盈.
An Efficient MSB Prediction-based Method for High-capacity Reversible Data Hiding in Encrypted Images 基于有效MSB预测的加密图像大容量可逆数据隐藏方法。 本文目的: 做到既有较高的藏量(1bpp),
第10章 存储器接口 罗文坚 中国科大 计算机学院
BiCuts: A fast packet classification algorithm using bit-level cutting
高精度延时发生器在 Xilinx 7 Series FPGA 中的实现
演算法分析 (Analyzing Algorithms)
CHAPTER 6 Concurrency:deadlock And Starvation
Efficient Query Relaxation for Complex Relationship Search on Graph Data 李舒馨
Resources Planning for Applied Research
Reversible Data Hiding in Color Image with Grayscale Invariance
Race Conditions and Semaphore
商業智慧實務 Practices of Business Intelligence
OrientX暑期工作总结及计划 XML Group
Experimental Analysis of Distributed Graph Systems
A Trie-based Approach to Fast Flow Recognition for OpenFlow
Gaussian Process Ruohua Shi Meeting
OPTIMA Optical Technology(Shenzhen) Co., Ltd 奥蒂玛光学科技(深圳)有限公司
Presentation transcript:

Approximate Frequent Itemset Mining for Streaming Data on FPGA Yubin Li1, Yuliang Sun1, Guohao Dai1, Qiang Xu2, Yu Wang1, Huazhong Yang1 1Dept. of E.E., Tsinghua University, Beijing, China 2Dept. of C.S., The Chinese University of Hong Kong, Hong Kong, China

Introduction to FIM FIM: Frequent Itemset Mining is designed to find frequently occurring itemsets among a series of transactions. It is a fundamental problem of mining association rules. FIM-DS: Frequent Itemset Mining from a Data Stream (real time) Challenges: Exponential candidate space an L-length transaction generates 2L subsets Complexity in data itself itemsets have different number of items (input with different width) Real-time requirements storing the infinite data into memory is infeasible 介绍FIM概念:找到输入集合中同时出现频率高的项组成的集合; 说明主要难点: 1)候选项集非常多,指数级; 2)输入数据不一致,每个集合可能含有不定数量的项; 3) 实时性的要求

Related Work Multi-scan approaches (Exact Method) Algorithms: Aprior[1], FP-growth[2], Eclat[3] Require to scan original data more than one time (real-time violation) Approximate approaches Sample algorithms: take parts of the new candidates into consideration when the candidate table is full (Sticky Sampling[4], Chernoff-based algorithm[5]) Delete algorithms: count all candidates but delete lower-support candidates from current memory (Lossy Counting[4], StreamMining algorithm[6]) 介绍现有工作,精确算法需要多次扫面原始数据,难以满足实时性的需求; 介绍现有近似算法,主要分为两类,一类是采样算法,在候选项存储buffer满了之后,从输入集合生成的所有子集中选择部分进行统计 ; 另一类是删除算法, 将输入集合生成的所有子集进行频度统计,在候选项存储buffer满了之后,删除掉buffer中频度低的候选集; 上述两类近似算法主要解决的指数级的候选集和有限的存储空间的矛盾,都需要将输入的每个集合产生它的所有或大部分子集,然后一一对每个子集进行处理 ; 我们工作是通过避免指数级子集的生成与比较过程,以每个输入的集合作为一个处理对象,加速FIM;(Motivation) Exponential candidates are generated from each received transaction. Then they treat each candidate as an element and compare it with candidates in the candidate table. [1] R. Agrawal, et al., “Fast algorithms for mining association rules,” VLDB1994. [2] J. Han, et al, “Frequent pattern mining: current status and future directions,” 2007 [3] Y. Zhang et al, An fpga-based accelerator for frequent item-set mining, TRETS2013. [4] G. S. Manku et al, Approximate frequency counts over data streams, VLDB2002. [5] R.C.-W. et al, Mining top-k frequent itemsets from data streams, 2006. [6] R. Jin et al, An algorithm for in-core frequent itemset mining on streaming data, 2005

Assume a new input {A,C,D,E} Motivation Candidate table {A,B}:12 {A,C}:10 {B,D}:9 {A,B,D}:9 {A,D}:9 {A,E}:7 {B,E}:4 {A,B,E}:3 {A,C}:11 {A,D}:10 {A,D} {A,C} {A,C} {A,D} {A,D} {A,D} {A,D} Assume a new input {A,C,D,E} Subsets: {A,C} {A,D} {A,E} {C,D} {C,E} {D,E} {A,C,D} {A,C,E} {C,D,E} {A,C,D,E} Weaks: Exponential subsets generation and comparisons The width of input is variable because of the different number of items Itemset comparisons may need to compare one item each cycle and consumes different cycles for different input We try to: Regard one input as one unit and avoid exponential subsets generation Adopt special data representation to fix the data width and decrease the bandwidth requirement Use simple operation to replace multiple item comparisons Accelerate it with high parallelism of FPGA 我们简单示例上述算法。 假设一个候选项集的buffer内容如上表,每个地址的内容是候选项集和它的频度。 有一个新的输入{A,C,D,E},首先产生它的子集,然后把每个子集作为独立个体与表中的候选项集一一比较,直到相等,表中该集合的频度加1; 图中示例了子集{A,C} {A,D}的操作; 上述算法会对所有子集或者部分子集完成上述过程,会消耗大量的时间和计算。 然后总结现有方案的缺点,尤其是在硬件实现方面:1)指数级的子集产生与比较; 2)输入位宽不等,限制了数据传输速度,增大了集合比较的复杂度,消耗更多的时间; 我们想要达到的效果: 1)输入集合作为单位数据进行处理,避免指数级子集的操作;2)使用特殊的数据格式表示输入,固定位宽,增加硬件中数据传输速度;3)使用更简单的“位与”操作替换集合中多个数据大小的比较; 4)利用FPGA的高并行度加速

Our Work Propose the Space-Saving based FIM-DS algorithm EHBR data representation: adopt the Equivalent Horizontal Bitvector Representation to represent every transaction (itemset). Transaction independent (real time), while EVBR (Eclat algorithm) depend on all the input transaction Avoids exponential candidates generation Take “Bitwise-AND” operation between bitvectors to find their complex set relationships Avoids exponential candidates comparisons Bitwise-AND operation: bitvector a represent one input transaction bitvector b represent one frequent candidate if (a & b == b) b is subset of a, and increase its support 开始介绍我们的算法,避免了指数级子集的生成与比较过程: 1. 采用等效横向比特向量表示一个输入集合, 好处: 1)每个输入被处理成相同位宽,便于硬件系统的数据传输; 2) 可以通过简单的“位与”操作获得输入集合与候选项集buffer中候选集之间的包含关系,避免了依次比较集合每一项的过程; 2. Ppt中的四个表:a)输入集合示例 (横向表示) b)输入集合的纵向表示 c)输入集合的等效横向比特向量表示 d)输入集合的等效纵向比特向量表示 (a) Example input transactions (b) Corresponding vertical representation (c) EVBR data representation (d) EHBR data representation

Our Work Space-Saving based FIM-DS algorithm Initialization Phase Initialize the candidate table with interested itemsets or subsets of the first few input transactions. Frequency Counting Phase (support update) Take “bitwise-and” operation between input and candidates in table, and update their supports. Replacement Phase (candidate update) Replace small support candidates in table with some subsets frequently occurring in recent period Frequency counting phase and replacement phase runs alternately. The number of operations in either phase can be adjusted. 说明上一页比特向量输入后的算法流程: 初始化候选项集buffer, 可以使用感兴趣的集合 或者 最开始的输入的子集; 候选项集频度统计过程:将输入的每一个集合的横向比特向量与候选项集buffer中的每个候选项集“位与”操作,并更新buffer中候选项集的频度;该过程不会更改buffer中的候选项集,只更新频度; 候选项集更新过程: 当候选项集频度统计过程统计了一定量的输入之后,使用最近频度较高的项组成的集合输入,替换掉buffer中频度低的候选项集;

Our Work Hardware Accelerator Translators : translate input transactions to bitvectors, and vice versa. Counter: count the number of input transactions processed in one frequent counting phase. When it reaches the user-defined threshold, the system steps into replacement phase. PEs-pipeline accelerator: PEs are arranged in a ring-pipeline. It implements the frequency counting phase and replacement phase alternately. Encoder/Decoder: compress the bitvector (binary sequece) to decrease the bandwidth requirement (applied when item database is very large). 说明硬件系统: 1) translators: 负责将输入集合转换成横向比特向量, 并将输出比特向量转换成输出集合; 2)counter: 统计每个循环周期中 候选项集频度统计过程 处理了多少输入, 当达到一定量之后 进入 候选项集更新过程; 同时 统计 输入中每一项的频度, 用来产生可能的频繁项集 在候选项集更新过程中 替换 buffer中的频度较低的候选项集; 3)PE-ring: 将处理单元连接成环状,尽可能增加硬件系统的流水并行度,提高比特向量处理的并行度; 4)压缩和解压缩:当比特向量位宽很大但比较稀疏时,可以使用针对二进制序列的压缩和解压算法对原始比特向量进行处理,减少硬件系统的数据传输带宽需求; hardware system overview PEs pipeline accelerator processing element (PE)

Evaluation Experimental Setup Software: Hardware: Datasets: Intel(R) Core(TM) i7-4790 CPU (@3.60GHz) Hardware: VC707 board with an Virtex7 485t chip working at 150MHz Datasets: Dataset Num.Items Num.Trans. Average Length Size (MB) chess 75 3196 37 0.342 connect 129 67557 43 9.300 T40I10D0 3N500K 299 500k 40 214.000 T10.I4.1000K 10k 1000k 10 121.000 简单说明我们的FIM的测试环境: 软硬件 测试集:项的数量和输入集合的数量变化很大,我们都有覆盖;

Evaluation Resource Utilization Performance Available Acc.128 Acc.256 Acc.512 Used Utilization LUTs 303600 60903 20.06% 121957 40.17% 270508 89.10% REGs 607200 48698 8.02% 104500 17.21% 231647 38.15% Performance Dataset Time(s) [work] Our SW.512 Our SW.1024 Our SW.1024x10 Our FPGA.512 Speedup chess.dat >3.3[1] 0.072 45.8 0.375 8.8 4.398 0.75 1.9x10-4 1.7x104 connect.dat >121.5[1] 1.152 105.5 1.863 65.2 14.482 8.4 2.4x10-3 5.0x104 T40I10D0 3N500K 12.05[2] 8.592 1.4 17.048 0.7 - 0.21 57 T10.I4.1000K 17.0[3] 209.920 0.1 405.409 22.6 4.0[4] 5.3 硬件资源使用情况,在系统上最大实现了512PE的加速电路,buffer大小为1024,每个地址可以存储128bit; 速度比较:1)软件实现,对项数较小的测试集取得了更好的加速效果,但对于项数非常大的集合,没有获得较高的处理速度; 2)硬件实现,对项数较小的测试集取得了四个数量级的加速比,在项数非常大的集合,也取得了优于现有方案的处理速度; 数据集T10.I4.1000K的事务平均长度为10,但是它的项数据库大小为10k。这 意味着我们需要用一个10000-bit位宽的数值描述一个事务,而一个10项的事务 本身只需要10个32-bit(或16-bit)就可以。在软件方案的实现中,每次只处理一 个32-bit,所以在数据集T10.I4.1000K上,软件方案速度有限,大于6分钟。另一方 面,因为一个事务的平均长度为10,即使生成全部的子集,也可以较快的实现, 因此,性能相对较好。数据集T40I10D0 3N500K具有一个相对小一些的项数据库, 平均一个事务含有更多的项。我们软件方案的性能相对就好一些。 1-2 FPGA 3-4 CPU Our proposed algorithm is efficient when item database is small, and its performance decreases as the item database grows; Our hardware accelerator achieves better performance on both small item database datasets and large item database datasets. [1] S. Sun, et al, Design and analysis of a reconfigurable platform for frequent pattern mining, Parallel and Distributed Systems 2011 [2] Y. Zhang et al, An fpga-based accelerator for frequent item-set mining, TRETS2013. [3] G. S. Manku et al, Approximate frequency counts over data streams, VLDB2002. [4] R. Jin et al, An algorithm for in-core frequent itemset mining on streaming data, 2005

To Do… Further Investigate the relationship between accuracy rate and different parameters in the proposed algorithm: threshold_trans : the number of transactions to process in one frequency counting phase; threshold_item : item whose support is not less than the threshold can be one element of the input subset in replacement phase; threshold_replacement : the maximal number of replacement can occurs in one replacement phase; … 后续工作主要探讨算法中的几个参量设置与算法准确度之间的关系,并体现在后续硬件加速系统中;

Thanks for your listening!