POWER-EFFICIENT RANGE-MATCH-BASED PACKET CLASSIFICATION ON FPGA

Slides:



Advertisements
Similar presentations
Exercise 1 EECS, Peking University Exercise in Query Processing.
Advertisements

Chapter 2 Combinatorial Analysis 主講人 : 虞台文. Content Basic Procedure for Probability Calculation Counting – Ordered Samples with Replacement – Ordered.
國立成功大學工程科學系 Department of Engineering Science -National Cheng Kung University 控制與訊號處理實驗室 Control & Signal Processing Lab MATLAB/Simulink 教學.
MMN Lab 未來教室與雲端化學習 Yueh-Min Huang Department of Engineering Science, National Cheng Kung University, Tainan, Taiwan
Course 1 演算法: 效率、分析與量級 Algorithms: Efficiency, Analysis, and Order
專利申請實務 王淑靜
統合分析臨床試驗實之文獻品質評分:以針灸療法之統合分析為例
Performance Evaluation
資料庫設計 Database Design.
多菌株乳酸菌組合在飼料添加物及保健食品之應用-
-Artificial Neural Network- Hopfield Neural Network(HNN) 朝陽科技大學 資訊管理系 李麗華 教授.
Chapter 8 Liner Regression and Correlation 第八章 直线回归和相关
Chaoping Li, Zhejiang University
Leftmost Longest Regular Expression Matching in Reconfigurable Logic
Operating System CPU Scheduing - 3 Monday, August 11, 2008.
Operators and Expressions
An Adaptive Cross-Layer Multi-Path Routing Protocol for Urban VANET
Rate and Distortion Optimization for Reversible Data Hiding Using Multiple Histogram Shifting Source: IEEE Transactions On Cybernetics, Vol. 47, No. 2,February.
Improving classification models with taxonomy information
Thinking of Instrumentation Survivability Under Severe Accident
指導教授:許子衡 教授 報告學生:翁偉傑 Qiangyuan Yu , Geert Heijenk
Population proportion and sample proportion
資料庫結構與組織.
計算方法設計與分析 Design and Analysis of Algorithms 唐傳義
第五讲 数据的分组、合并与转换.
Journal Citation Reports® 期刊引文分析報告的使用和檢索
Logistics 物流 昭安國際物流園區 總經理 曾玉勤.
Sampling Theory and Some Important Sampling Distributions
Decision Support System (靜宜資管楊子青)
HLA - Time Management 陳昱豪.
Responsibility Accounting
组合逻辑3 Combinational Logic
971研究方法課程第九次上課 認識、理解及選擇一項適當的研究策略
Jia Zhao Simon Fraser University BC, Canada
Dynamic Traffic Diversion in SDN: Testbed vs Mininet
Programmable Logic Architecture Verilog HDL FPGA Design
Interval Estimation區間估計
2019/1/2 Experimental Analysis on Performance Anomaly for Download Data Transfer at IEEE n Wireless LAN 在IEEE n無線LAN上下載數據傳輸的性能異常的實驗分析 Author:
Formal Pivot to both Language and Intelligence in Science
消費者偏好與效用概念.
基于类关联规则的分类 Classification Based on Class-Association Rules
Decision Support System (靜宜資管楊子青)
Towards Emotional Awareness in Software Development Teams
COMPUTEX 2014 A note given in BCC class on June 4, 2014
Microsoft SQL Server 2008 報表服務_設計
Version Control System Based DSNs
成品检查报告 Inspection Report
高性能计算与天文技术联合实验室 智能与计算学部 天津大学
2019/4/8 A Load Balancing Mechanism for multiple SDN Controllers based on Load Informing Strategy Miultiple controller 的 load balancing 機制,使用一個叫 Load informing.
Guide to a successful PowerPoint design – simple is best
在Microsoft Access 下 建立資料庫
相關統計觀念復習 Review II.
中国科学技术大学计算机系 陈香兰 2013Fall 第七讲 存储器管理 中国科学技术大学计算机系 陈香兰 2013Fall.
虚 拟 仪 器 virtual instrument
從 ER 到 Logical Schema ──兼談Schema Integration
Deep Learning with Limited Numerical Precision
An Efficient MSB Prediction-based Method for High-capacity Reversible Data Hiding in Encrypted Images 基于有效MSB预测的加密图像大容量可逆数据隐藏方法。 本文目的: 做到既有较高的藏量(1bpp),
BiCuts: A fast packet classification algorithm using bit-level cutting
Efficient Query Relaxation for Complex Relationship Search on Graph Data 李舒馨
钱炘祺 一种面向实体浏览中属性融合的人机交互的设计与实现 Designing Human-Computer Interaction of Property Consolidation for Entity Browsing 钱炘祺
Speaker : YI-CHENG HUNG
5. Combinational Logic Analysis
何正斌 博士 國立屏東科技大學工業管理研究所 教授
Arguments to the main Function and Final Project
Class imbalance in Classification
Introduction to Computer Security and Cryptography
Firsthand Learning Field Trip to CCI Site.
Hybrid fractal zerotree wavelet image coding
Presentation transcript:

POWER-EFFICIENT RANGE-MATCH-BASED PACKET CLASSIFICATION ON FPGA 這篇主要提出的power efficient,是我們pipeline架構常見的一個現象,前面已經不符合某個條件下,那後面的比對也不用再進行 所以他們用的方法是關掉後面不用進行比對的memory module,後面有一些數據是我們可以參考的 Authors: Yun R. Qu, Viktor K. Prasanna Publisher: 2015 25th International Conference on Field Programmable Logic and Applications (FPL) Presenter: Chun-Yu, Li Date: 105/10/5 Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.

Computer & Internet Architecture Lab INTRODUCTION We construct a modular PE to match a stride of the packet header against a stride of a range boundary (A naïve approach to match ranges is to deploy a W-bit comparator for each range boundary). We concatenate multiple PEs into a systolic array to sustain high clock rates for large rule sets. We employ a self-enabled power gating technique on our architecture. The modular PEs are selectively enabled to save the memory access power. 每一個PE就是一個s-bit的comparator,如果是upper bound,就用小於等於的comparator,如果是lower bound,就用大於等於的comparator, 當然也可以直接做全部bit的比對,但是實作上就會出現電路比較慢的情形。 最後所有的PE會串成systolic array來進行比對(這個架構後面會看到) 這是剛剛提到的,後面不用進行比對的memory module,他們會用clock gating的方式去關掉memory省電。 Computer & Internet Architecture Lab CSIE NCKU

MODULAR PROCESSING ELEMENT s: s-bit stride. c: number of range boundaries compared by a modular PE. PE consist of c data memory module, c comparators. c = 2 1.解釋一個PE處理s bit, c boundaries 2.解釋小於等於的comparator 3.解釋紅圈 ≤ comparator eql0_in less0_in x & y eql0 less0 en0 * 1 en0_in x=y x<y x>y Computer & Internet Architecture Lab CSIE NCKU

Computer & Internet Architecture Lab SYSTOLIC ARRAY The strides from the input packet headers are propagated vertically. Each column of PEs compares an s-bit input stride with the corresponding strides of all the N rules. The number of rows required for N rule is 𝑁 𝑐 , while the total number of columns required for ( 𝑊 𝑚 )–bit packet is 2∙ 𝑊 𝑚 𝑠 . 所以我們可以看到所有的PE會串成這種SYSTOLIC ARRAY的架構,這個架構是Decomposition based,比對完某一維度,再比對下一個維度。 每個維度有各自的upperbound、lowerbound比對。 所以有幾個row就取決於一個PE可以處理多少rule,總共需要幾個row做處理。從column來看的話,就是 所有要進行比對的bit數除與stride數,乘與2是因為需要upperbound跟lowerbound的比對 Computer & Internet Architecture Lab CSIE NCKU

Computer & Internet Architecture Lab SYSTOLIC ARRAY (Type I) for the same field, both comparing upperbounds/lowerbounds 那接線之間會有三種type,如果是同一種 (Type III) for different fields (Type II) for the same field, one comparing upperbound but the other comparing lowerbound Computer & Internet Architecture Lab CSIE NCKU

SELF-ENABLED POWER GATING Chaining: If one data memory module is deactivated, a chain of data memory modules in the following PEs will be deactivated. To make the self-enabled power gating more effective, for a given packet header and a given rule, we need to report the mismatch, if any, as early as possible Computer & Internet Architecture Lab CSIE NCKU

ENTROPY-BASED SCHEDULING Number of bits in field 𝑚 as 𝑊 𝑚 , where 𝑚=0,1,…,𝑀−1 A range in field m may cover multiple values in [0, 2 𝑊 𝑚 ) Number of occurrences for value 𝑘 (𝑚) in field 𝑚 as 𝑓 𝑘 (𝑚) The probability that an input value 𝑘 (𝑚) matches some rules in field 𝑚 is 𝑝 𝑘 (𝑚) = 𝑓 𝑘 (𝑚) 𝑘 (𝑚) 𝑓 𝑘 (𝑚) The entropy of field 𝑚 is given: 𝐻 𝑚 =− 𝑘 (𝑚) [ 𝑝 𝑘 𝑚 ∙log 𝑝 𝑘 𝑚 ] Schedule all the fields in descending order of 𝐻(𝑚), a field with higher entropy reveals more information; thus, such a field should be matched early to identify mismatches. Computer & Internet Architecture Lab CSIE NCKU

PERFORMANCE EVALUATION We conducted experiments on the state-of-the-art Xilinx Virtex 7 FPGA (XC7VX1140T). We evaluated the performance using Xilinx Vivado 2014.2. We set a conservative clock constraint of 250MHz for all of our designs. We reported resource utilization and clock rate using post-place-and-route reports. For power estimation; we used Switching Activity Interchange Format (SAIF) files as inputs to Vivado power analysis tool. We built synthetic rule sets where the entropy of each field could be adjusted separately. Computer & Internet Architecture Lab CSIE NCKU

PERFORMANCE EVALUATION Fix the packet header length 𝑊 𝑚 =64 and the number of rule N = 128. Vary 𝑠=2, 4, 8, 16, and 𝑐=2, 4, 8, 16, 32, 64. The performance of 𝑠=4 and 𝑐=64 is best. Computer & Internet Architecture Lab CSIE NCKU

PERFORMANCE EVALUATION Our designs on FPGA sustains > 250 MPPS throughput even for 4K 512-bit rules. Computer & Internet Architecture Lab CSIE NCKU

PERFORMANCE EVALUATION We show the power performance with respect to N, for the classic packet classification (M = 5) and the OpenFlow table lookup (M = 15) under three scenarios: Without any power optimization technique. With self-enabled power gating, but randomly scheduling of packet header fields. With self-enabled power gating along with entropy-based scheduling of packet header fields. To see the effect of random scheduling, we run 100 tests for a fixed pair of 𝑀 and 𝑁. For each test, the power consumption is averaged on 1K random packets. Computer & Internet Architecture Lab CSIE NCKU

PERFORMANCE EVALUATION Self-enabled power gating reduces the average power consumption by 60%. Exploiting both optimization reduces the average power consumption further by 20%. red “o”: no optimization pink “x”: self-enabled power gating blue “+”: both optimization Computer & Internet Architecture Lab CSIE NCKU