POWER-EFFICIENT RANGE-MATCH-BASED PACKET CLASSIFICATION ON FPGA

Slides:

Advertisements

Similar presentations

Exercise 1 EECS, Peking University Exercise in Query Processing.

Advertisements

Chapter 2 Combinatorial Analysis 主講人 : 虞台文. Content Basic Procedure for Probability Calculation Counting – Ordered Samples with Replacement – Ordered.

國立成功大學工程科學系 Department of Engineering Science -National Cheng Kung University 控制與訊號處理實驗室 Control & Signal Processing Lab MATLAB/Simulink 教學.

MMN Lab 未來教室與雲端化學習 Yueh-Min Huang Department of Engineering Science, National Cheng Kung University, Tainan, Taiwan

Course 1 演算法: 效率、分析與量級 Algorithms: Efficiency, Analysis, and Order

專利申請實務王淑靜

統合分析臨床試驗實之文獻品質評分：以針灸療法之統合分析為例

Performance Evaluation

資料庫設計 Database Design.

多菌株乳酸菌組合在飼料添加物及保健食品之應用-

-Artificial Neural Network- Hopfield Neural Network(HNN) 朝陽科技大學資訊管理系李麗華教授.

Chapter 8 Liner Regression and Correlation 第八章直线回归和相关

Chaoping Li, Zhejiang University

Leftmost Longest Regular Expression Matching in Reconfigurable Logic

Operating System CPU Scheduing - 3 Monday, August 11, 2008.

Operators and Expressions

An Adaptive Cross-Layer Multi-Path Routing Protocol for Urban VANET

Rate and Distortion Optimization for Reversible Data Hiding Using Multiple Histogram Shifting Source: IEEE Transactions On Cybernetics, Vol. 47, No. 2,February.

Improving classiﬁcation models with taxonomy information

Thinking of Instrumentation Survivability Under Severe Accident

指導教授：許子衡教授報告學生：翁偉傑 Qiangyuan Yu , Geert Heijenk

Population proportion and sample proportion

資料庫結構與組織.

計算方法設計與分析 Design and Analysis of Algorithms 唐傳義

第五讲数据的分组、合并与转换.

Journal Citation Reports® 期刊引文分析報告的使用和檢索

Logistics 物流昭安國際物流園區總經理曾玉勤.

Sampling Theory and Some Important Sampling Distributions

Decision Support System (靜宜資管楊子青)

HLA - Time Management 陳昱豪.

Responsibility Accounting

组合逻辑3 Combinational Logic

971研究方法課程第九次上課認識、理解及選擇一項適當的研究策略

Jia Zhao Simon Fraser University BC, Canada

Dynamic Traffic Diversion in SDN: Testbed vs Mininet

Programmable Logic Architecture Verilog HDL FPGA Design

Interval Estimation區間估計

2019/1/2 Experimental Analysis on Performance Anomaly for Download Data Transfer at IEEE n Wireless LAN 在IEEE n無線LAN上下載數據傳輸的性能異常的實驗分析 Author:

Formal Pivot to both Language and Intelligence in Science

消費者偏好與效用概念.

基于类关联规则的分类 Classification Based on Class-Association Rules

Decision Support System (靜宜資管楊子青)

Towards Emotional Awareness in Software Development Teams

COMPUTEX 2014 A note given in BCC class on June 4, 2014

Microsoft SQL Server 2008 報表服務_設計

Version Control System Based DSNs

成品检查报告 Inspection Report

高性能计算与天文技术联合实验室智能与计算学部天津大学

2019/4/8 A Load Balancing Mechanism for multiple SDN Controllers based on Load Informing Strategy Miultiple controller 的 load balancing 機制，使用一個叫 Load informing.

Guide to a successful PowerPoint design – simple is best

在Microsoft Access 下建立資料庫

相關統計觀念復習 Review II.

中国科学技术大学计算机系陈香兰 2013Fall 第七讲存储器管理中国科学技术大学计算机系陈香兰 2013Fall.

虚拟仪器 virtual instrument

從 ER 到 Logical Schema ──兼談Schema Integration

Deep Learning with Limited Numerical Precision

An Efficient MSB Prediction-based Method for High-capacity Reversible Data Hiding in Encrypted Images 基于有效MSB预测的加密图像大容量可逆数据隐藏方法。本文目的：做到既有较高的藏量（1bpp),

BiCuts: A fast packet classification algorithm using bit-level cutting

Efficient Query Relaxation for Complex Relationship Search on Graph Data 李舒馨

钱炘祺一种面向实体浏览中属性融合的人机交互的设计与实现 Designing Human-Computer Interaction of Property Consolidation for Entity Browsing 钱炘祺

Speaker : YI-CHENG HUNG

5. Combinational Logic Analysis

何正斌博士國立屏東科技大學工業管理研究所教授

Arguments to the main Function and Final Project

Class imbalance in Classification

Introduction to Computer Security and Cryptography

Firsthand Learning Field Trip to CCI Site.

Hybrid fractal zerotree wavelet image coding

Presentation transcript:

POWER-EFFICIENT RANGE-MATCH-BASED PACKET CLASSIFICATION ON FPGA 這篇主要提出的power efficient，是我們pipeline架構常見的一個現象，前面已經不符合某個條件下，那後面的比對也不用再進行所以他們用的方法是關掉後面不用進行比對的memory module，後面有一些數據是我們可以參考的 Authors: Yun R. Qu, Viktor K. Prasanna Publisher: 2015 25th International Conference on Field Programmable Logic and Applications (FPL) Presenter: Chun-Yu, Li Date: 105/10/5 Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.

Computer & Internet Architecture Lab INTRODUCTION We construct a modular PE to match a stride of the packet header against a stride of a range boundary (A naïve approach to match ranges is to deploy a W-bit comparator for each range boundary). We concatenate multiple PEs into a systolic array to sustain high clock rates for large rule sets. We employ a self-enabled power gating technique on our architecture. The modular PEs are selectively enabled to save the memory access power. 每一個PE就是一個s-bit的comparator，如果是upper bound，就用小於等於的comparator，如果是lower bound，就用大於等於的comparator，當然也可以直接做全部bit的比對，但是實作上就會出現電路比較慢的情形。最後所有的PE會串成systolic array來進行比對(這個架構後面會看到) 這是剛剛提到的，後面不用進行比對的memory module，他們會用clock gating的方式去關掉memory省電。 Computer & Internet Architecture Lab CSIE NCKU

MODULAR PROCESSING ELEMENT s: s-bit stride. c: number of range boundaries compared by a modular PE. PE consist of c data memory module, c comparators. c = 2 1.解釋一個PE處理s bit, c boundaries 2.解釋小於等於的comparator 3.解釋紅圈 ≤ comparator eql0_in less0_in x & y eql0 less0 en0 * 1 en0_in x=y x<y x>y Computer & Internet Architecture Lab CSIE NCKU

Computer & Internet Architecture Lab SYSTOLIC ARRAY The strides from the input packet headers are propagated vertically. Each column of PEs compares an s-bit input stride with the corresponding strides of all the N rules. The number of rows required for N rule is 𝑁 𝑐 , while the total number of columns required for ( 𝑊 𝑚 )–bit packet is 2∙ 𝑊 𝑚 𝑠 . 所以我們可以看到所有的PE會串成這種SYSTOLIC ARRAY的架構，這個架構是Decomposition based，比對完某一維度，再比對下一個維度。每個維度有各自的upperbound、lowerbound比對。所以有幾個row就取決於一個PE可以處理多少rule，總共需要幾個row做處理。從column來看的話，就是所有要進行比對的bit數除與stride數，乘與2是因為需要upperbound跟lowerbound的比對 Computer & Internet Architecture Lab CSIE NCKU

Computer & Internet Architecture Lab SYSTOLIC ARRAY (Type I) for the same field, both comparing upperbounds/lowerbounds 那接線之間會有三種type，如果是同一種 (Type III) for different fields (Type II) for the same field, one comparing upperbound but the other comparing lowerbound Computer & Internet Architecture Lab CSIE NCKU

SELF-ENABLED POWER GATING Chaining: If one data memory module is deactivated, a chain of data memory modules in the following PEs will be deactivated. To make the self-enabled power gating more effective, for a given packet header and a given rule, we need to report the mismatch, if any, as early as possible Computer & Internet Architecture Lab CSIE NCKU

ENTROPY-BASED SCHEDULING Number of bits in field 𝑚 as 𝑊 𝑚 , where 𝑚=0,1,…,𝑀−1 A range in field m may cover multiple values in [0, 2 𝑊 𝑚 ) Number of occurrences for value 𝑘 (𝑚) in field 𝑚 as 𝑓 𝑘 (𝑚) The probability that an input value 𝑘 (𝑚) matches some rules in field 𝑚 is 𝑝 𝑘 (𝑚) = 𝑓 𝑘 (𝑚) 𝑘 (𝑚) 𝑓 𝑘 (𝑚) The entropy of field 𝑚 is given: 𝐻 𝑚 =− 𝑘 (𝑚) [ 𝑝 𝑘 𝑚 ∙log 𝑝 𝑘 𝑚 ] Schedule all the fields in descending order of 𝐻(𝑚), a field with higher entropy reveals more information; thus, such a field should be matched early to identify mismatches. Computer & Internet Architecture Lab CSIE NCKU

PERFORMANCE EVALUATION We conducted experiments on the state-of-the-art Xilinx Virtex 7 FPGA (XC7VX1140T). We evaluated the performance using Xilinx Vivado 2014.2. We set a conservative clock constraint of 250MHz for all of our designs. We reported resource utilization and clock rate using post-place-and-route reports. For power estimation; we used Switching Activity Interchange Format (SAIF) files as inputs to Vivado power analysis tool. We built synthetic rule sets where the entropy of each field could be adjusted separately. Computer & Internet Architecture Lab CSIE NCKU

PERFORMANCE EVALUATION Fix the packet header length 𝑊 𝑚 =64 and the number of rule N = 128. Vary 𝑠=2, 4, 8, 16, and 𝑐=2, 4, 8, 16, 32, 64. The performance of 𝑠=4 and 𝑐=64 is best. Computer & Internet Architecture Lab CSIE NCKU

PERFORMANCE EVALUATION Our designs on FPGA sustains > 250 MPPS throughput even for 4K 512-bit rules. Computer & Internet Architecture Lab CSIE NCKU

PERFORMANCE EVALUATION We show the power performance with respect to N, for the classic packet classification (M = 5) and the OpenFlow table lookup (M = 15) under three scenarios: Without any power optimization technique. With self-enabled power gating, but randomly scheduling of packet header fields. With self-enabled power gating along with entropy-based scheduling of packet header fields. To see the effect of random scheduling, we run 100 tests for a fixed pair of 𝑀 and 𝑁. For each test, the power consumption is averaged on 1K random packets. Computer & Internet Architecture Lab CSIE NCKU

PERFORMANCE EVALUATION Self-enabled power gating reduces the average power consumption by 60%. Exploiting both optimization reduces the average power consumption further by 20%. red “o”: no optimization pink “x”: self-enabled power gating blue “+”: both optimization Computer & Internet Architecture Lab CSIE NCKU