Approximate Frequent Itemset Mining for Streaming Data on FPGA

Approximate Frequent Itemset Mining for Streaming Data on FPGA
Yubin Li1, Yuliang Sun1, Guohao Dai1, Qiang Xu2, Yu Wang1, Huazhong Yang1 1Dept. of E.E., Tsinghua University, Beijing, China 2Dept. of C.S., The Chinese University of Hong Kong, Hong Kong, China

Introduction to FIM FIM: Frequent Itemset Mining is designed to find frequently occurring itemsets among a series of transactions. It is a fundamental problem of mining association rules. FIM-DS: Frequent Itemset Mining from a Data Stream (real time) Challenges: Exponential candidate space an L-length transaction generates 2L subsets Complexity in data itself itemsets have different number of items (input with different width) Real-time requirements storing the infinite data into memory is infeasible 介绍FIM概念：找到输入集合中同时出现频率高的项组成的集合；说明主要难点： 1）候选项集非常多，指数级； 2）输入数据不一致，每个集合可能含有不定数量的项； 3）实时性的要求

Related Work Multi-scan approaches (Exact Method)
Algorithms: Aprior[1], FP-growth[2], Eclat[3] Require to scan original data more than one time (real-time violation) Approximate approaches Sample algorithms: take parts of the new candidates into consideration when the candidate table is full (Sticky Sampling[4], Chernoff-based algorithm[5]) Delete algorithms: count all candidates but delete lower-support candidates from current memory (Lossy Counting[4], StreamMining algorithm[6]) 介绍现有工作，精确算法需要多次扫面原始数据，难以满足实时性的需求；介绍现有近似算法，主要分为两类，一类是采样算法，在候选项存储buffer满了之后，从输入集合生成的所有子集中选择部分进行统计；另一类是删除算法，将输入集合生成的所有子集进行频度统计，在候选项存储buffer满了之后，删除掉buffer中频度低的候选集；上述两类近似算法主要解决的指数级的候选集和有限的存储空间的矛盾，都需要将输入的每个集合产生它的所有或大部分子集，然后一一对每个子集进行处理；我们工作是通过避免指数级子集的生成与比较过程，以每个输入的集合作为一个处理对象，加速FIM；（Motivation） Exponential candidates are generated from each received transaction. Then they treat each candidate as an element and compare it with candidates in the candidate table. [1] R. Agrawal, et al., “Fast algorithms for mining association rules,” VLDB [2] J. Han, et al, “Frequent pattern mining: current status and future directions,” 2007 [3] Y. Zhang et al, An fpga-based accelerator for frequent item-set mining, TRETS2013. [4] G. S. Manku et al, Approximate frequency counts over data streams, VLDB2002. [5] R.C.-W. et al, Mining top-k frequent itemsets from data streams, [6] R. Jin et al, An algorithm for in-core frequent itemset mining on streaming data, 2005

Assume a new input {A,C,D,E}
Motivation Candidate table {A,B}:12 {A,C}:10 {B,D}:9 {A,B,D}:9 {A,D}:9 {A,E}:7 {B,E}:4 {A,B,E}:3 {A,C}:11 {A,D}:10 {A,D} {A,C} {A,C} {A,D} {A,D} {A,D} {A,D} Assume a new input {A,C,D,E} Subsets: {A,C} {A,D} {A,E} {C,D} {C,E} {D,E} {A,C,D} {A,C,E} {C,D,E} {A,C,D,E} Weaks: Exponential subsets generation and comparisons The width of input is variable because of the different number of items Itemset comparisons may need to compare one item each cycle and consumes different cycles for different input We try to: Regard one input as one unit and avoid exponential subsets generation Adopt special data representation to fix the data width and decrease the bandwidth requirement Use simple operation to replace multiple item comparisons Accelerate it with high parallelism of FPGA 我们简单示例上述算法。假设一个候选项集的buffer内容如上表,每个地址的内容是候选项集和它的频度。有一个新的输入{A,C,D,E}，首先产生它的子集，然后把每个子集作为独立个体与表中的候选项集一一比较，直到相等，表中该集合的频度加1；图中示例了子集{A,C} {A,D}的操作；上述算法会对所有子集或者部分子集完成上述过程，会消耗大量的时间和计算。然后总结现有方案的缺点，尤其是在硬件实现方面：1）指数级的子集产生与比较； 2）输入位宽不等，限制了数据传输速度，增大了集合比较的复杂度，消耗更多的时间；我们想要达到的效果： 1）输入集合作为单位数据进行处理，避免指数级子集的操作；2）使用特殊的数据格式表示输入，固定位宽，增加硬件中数据传输速度；3）使用更简单的“位与”操作替换集合中多个数据大小的比较； 4）利用FPGA的高并行度加速

Our Work Propose the Space-Saving based FIM-DS algorithm
EHBR data representation: adopt the Equivalent Horizontal Bitvector Representation to represent every transaction (itemset). Transaction independent (real time), while EVBR (Eclat algorithm) depend on all the input transaction Avoids exponential candidates generation Take “Bitwise-AND” operation between bitvectors to find their complex set relationships Avoids exponential candidates comparisons Bitwise-AND operation: bitvector a represent one input transaction bitvector b represent one frequent candidate if (a & b == b) b is subset of a, and increase its support 开始介绍我们的算法，避免了指数级子集的生成与比较过程： 1. 采用等效横向比特向量表示一个输入集合，好处： 1）每个输入被处理成相同位宽，便于硬件系统的数据传输； 2）可以通过简单的“位与”操作获得输入集合与候选项集buffer中候选集之间的包含关系，避免了依次比较集合每一项的过程； 2. Ppt中的四个表：a）输入集合示例（横向表示） b）输入集合的纵向表示 c）输入集合的等效横向比特向量表示 d）输入集合的等效纵向比特向量表示 (a) Example input transactions (b) Corresponding vertical representation (c) EVBR data representation (d) EHBR data representation

Our Work Space-Saving based FIM-DS algorithm Initialization Phase
Initialize the candidate table with interested itemsets or subsets of the first few input transactions. Frequency Counting Phase (support update) Take “bitwise-and” operation between input and candidates in table, and update their supports. Replacement Phase (candidate update) Replace small support candidates in table with some subsets frequently occurring in recent period Frequency counting phase and replacement phase runs alternately. The number of operations in either phase can be adjusted. 说明上一页比特向量输入后的算法流程：初始化候选项集buffer，可以使用感兴趣的集合或者最开始的输入的子集；候选项集频度统计过程：将输入的每一个集合的横向比特向量与候选项集buffer中的每个候选项集“位与”操作，并更新buffer中候选项集的频度；该过程不会更改buffer中的候选项集，只更新频度；候选项集更新过程：当候选项集频度统计过程统计了一定量的输入之后，使用最近频度较高的项组成的集合输入，替换掉buffer中频度低的候选项集；

Our Work Hardware Accelerator
Translators : translate input transactions to bitvectors, and vice versa. Counter: count the number of input transactions processed in one frequent counting phase. When it reaches the user-defined threshold, the system steps into replacement phase. PEs-pipeline accelerator: PEs are arranged in a ring-pipeline. It implements the frequency counting phase and replacement phase alternately. Encoder/Decoder: compress the bitvector (binary sequece) to decrease the bandwidth requirement (applied when item database is very large). 说明硬件系统： 1） translators：负责将输入集合转换成横向比特向量，并将输出比特向量转换成输出集合； 2）counter: 统计每个循环周期中候选项集频度统计过程处理了多少输入，当达到一定量之后进入候选项集更新过程；同时统计输入中每一项的频度，用来产生可能的频繁项集在候选项集更新过程中替换 buffer中的频度较低的候选项集； 3）PE-ring：将处理单元连接成环状，尽可能增加硬件系统的流水并行度，提高比特向量处理的并行度； 4）压缩和解压缩：当比特向量位宽很大但比较稀疏时，可以使用针对二进制序列的压缩和解压算法对原始比特向量进行处理，减少硬件系统的数据传输带宽需求； hardware system overview PEs pipeline accelerator processing element (PE)

Evaluation Experimental Setup Software: Hardware: Datasets:
Intel(R) Core(TM) i CPU Hardware: VC707 board with an Virtex7 485t chip working at 150MHz Datasets: Dataset Num.Items Num.Trans. Average Length Size (MB) chess 75 3196 37 0.342 connect 129 67557 43 9.300 T40I10D0 3N500K 299 500k 40 T10.I4.1000K 10k 1000k 10 简单说明我们的FIM的测试环境：软硬件测试集：项的数量和输入集合的数量变化很大，我们都有覆盖；

Evaluation Resource Utilization Performance
Available Acc.128 Acc.256 Acc.512 Used Utilization LUTs 303600 60903 20.06% 121957 40.17% 270508 89.10% REGs 607200 48698 8.02% 104500 17.21% 231647 38.15% Performance Dataset Time(s) [work] Our SW.512 Our SW.1024 Our SW.1024x10 Our FPGA.512 Speedup chess.dat >3.3[1] 0.072 45.8 0.375 8.8 4.398 0.75 1.9x10-4 1.7x104 connect.dat >121.5[1] 1.152 105.5 1.863 65.2 14.482 8.4 2.4x10-3 5.0x104 T40I10D0 3N500K 12.05[2] 8.592 1.4 17.048 0.7 - 0.21 57 T10.I4.1000K 17.0[3] 0.1 22.6 4.0[4] 5.3 硬件资源使用情况，在系统上最大实现了512PE的加速电路，buffer大小为1024，每个地址可以存储128bit；速度比较：1）软件实现，对项数较小的测试集取得了更好的加速效果，但对于项数非常大的集合，没有获得较高的处理速度； 2）硬件实现，对项数较小的测试集取得了四个数量级的加速比，在项数非常大的集合，也取得了优于现有方案的处理速度；数据集T10.I4.1000K的事务平均长度为10,但是它的项数据库大小为10k。这意味着我们需要用一个10000-bit位宽的数值描述一个事务,而一个10项的事务本身只需要10个32-bit(或16-bit)就可以。在软件方案的实现中,每次只处理一个32-bit,所以在数据集T10.I4.1000K上,软件方案速度有限,大于6分钟。另一方面,因为一个事务的平均长度为10,即使生成全部的子集,也可以较快的实现, 因此,性能相对较好。数据集T40I10D0 3N500K具有一个相对小一些的项数据库, 平均一个事务含有更多的项。我们软件方案的性能相对就好一些。 1-2 FPGA 3-4 CPU Our proposed algorithm is efficient when item database is small, and its performance decreases as the item database grows; Our hardware accelerator achieves better performance on both small item database datasets and large item database datasets. [1] S. Sun, et al, Design and analysis of a reconfigurable platform for frequent pattern mining, Parallel and Distributed Systems 2011 [2] Y. Zhang et al, An fpga-based accelerator for frequent item-set mining, TRETS [3] G. S. Manku et al, Approximate frequency counts over data streams, VLDB2002. [4] R. Jin et al, An algorithm for in-core frequent itemset mining on streaming data, 2005

To Do… Further Investigate the relationship between accuracy rate and
different parameters in the proposed algorithm: threshold_trans : the number of transactions to process in one frequency counting phase; threshold_item : item whose support is not less than the threshold can be one element of the input subset in replacement phase; threshold_replacement : the maximal number of replacement can occurs in one replacement phase; … 后续工作主要探讨算法中的几个参量设置与算法准确度之间的关系，并体现在后续硬件加速系统中；

Thanks for your listening!

Approximate Frequent Itemset Mining for Streaming Data on FPGA

Similar presentations

Presentation on theme: "Approximate Frequent Itemset Mining for Streaming Data on FPGA"— Presentation transcript:

Similar presentations

About project

反馈

请登录

Auth with social network:

Approximate Frequent Itemset Mining for Streaming Data on FPGA

Similar presentations

Presentation on theme: "Approximate Frequent Itemset Mining for Streaming Data on FPGA"— Presentation transcript:

Similar presentations

About project

反馈