POWER-EFFICIENT RANGE-MATCH-BASED PACKET CLASSIFICATION ON FPGA

POWER-EFFICIENT RANGE-MATCH-BASED PACKET CLASSIFICATION ON FPGA
這篇主要提出的power efficient，是我們pipeline架構常見的一個現象，前面已經不符合某個條件下，那後面的比對也不用再進行所以他們用的方法是關掉後面不用進行比對的memory module，後面有一些數據是我們可以參考的 Authors: Yun R. Qu, Viktor K. Prasanna Publisher: th International Conference on Field Programmable Logic and Applications (FPL) Presenter: Chun-Yu, Li Date: 105/10/5 Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.

Computer & Internet Architecture Lab
INTRODUCTION We construct a modular PE to match a stride of the packet header against a stride of a range boundary (A naïve approach to match ranges is to deploy a W-bit comparator for each range boundary). We concatenate multiple PEs into a systolic array to sustain high clock rates for large rule sets. We employ a self-enabled power gating technique on our architecture. The modular PEs are selectively enabled to save the memory access power. 每一個PE就是一個s-bit的comparator，如果是upper bound，就用小於等於的comparator，如果是lower bound，就用大於等於的comparator，當然也可以直接做全部bit的比對，但是實作上就會出現電路比較慢的情形。最後所有的PE會串成systolic array來進行比對(這個架構後面會看到) 這是剛剛提到的，後面不用進行比對的memory module，他們會用clock gating的方式去關掉memory省電。 Computer & Internet Architecture Lab CSIE NCKU

MODULAR PROCESSING ELEMENT
s: s-bit stride. c: number of range boundaries compared by a modular PE. PE consist of c data memory module, c comparators. c = 2 1.解釋一個PE處理s bit, c boundaries 2.解釋小於等於的comparator 3.解釋紅圈 ≤ comparator eql0_in less0_in x & y eql0 less0 en0 * 1 en0_in x=y x<y x>y Computer & Internet Architecture Lab CSIE NCKU

SYSTOLIC ARRAY The strides from the input packet headers are propagated vertically. Each column of PEs compares an s-bit input stride with the corresponding strides of all the N rules. The number of rows required for N rule is 𝑁 𝑐 , while the total number of columns required for ( 𝑊 𝑚 )–bit packet is 2∙ 𝑊 𝑚 𝑠 . 所以我們可以看到所有的PE會串成這種SYSTOLIC ARRAY的架構，這個架構是Decomposition based，比對完某一維度，再比對下一個維度。每個維度有各自的upperbound、lowerbound比對。所以有幾個row就取決於一個PE可以處理多少rule，總共需要幾個row做處理。從column來看的話，就是所有要進行比對的bit數除與stride數，乘與2是因為需要upperbound跟lowerbound的比對 Computer & Internet Architecture Lab CSIE NCKU

SYSTOLIC ARRAY (Type I) for the same field, both comparing upperbounds/lowerbounds 那接線之間會有三種type，如果是同一種 (Type III) for different fields (Type II) for the same field, one comparing upperbound but the other comparing lowerbound Computer & Internet Architecture Lab CSIE NCKU

SELF-ENABLED POWER GATING
Chaining: If one data memory module is deactivated, a chain of data memory modules in the following PEs will be deactivated. To make the self-enabled power gating more effective, for a given packet header and a given rule, we need to report the mismatch, if any, as early as possible Computer & Internet Architecture Lab CSIE NCKU

ENTROPY-BASED SCHEDULING
Number of bits in field 𝑚 as 𝑊 𝑚 , where 𝑚=0,1,…,𝑀−1 A range in field m may cover multiple values in [0, 2 𝑊 𝑚 ) Number of occurrences for value 𝑘 (𝑚) in field 𝑚 as 𝑓 𝑘 (𝑚) The probability that an input value 𝑘 (𝑚) matches some rules in field 𝑚 is 𝑝 𝑘 (𝑚) = 𝑓 𝑘 (𝑚) 𝑘 (𝑚) 𝑓 𝑘 (𝑚) The entropy of field 𝑚 is given: 𝐻 𝑚 =− 𝑘 (𝑚) [ 𝑝 𝑘 𝑚 ∙log 𝑝 𝑘 𝑚 ] Schedule all the fields in descending order of 𝐻(𝑚), a field with higher entropy reveals more information; thus, such a field should be matched early to identify mismatches. Computer & Internet Architecture Lab CSIE NCKU

PERFORMANCE EVALUATION
We conducted experiments on the state-of-the-art Xilinx Virtex 7 FPGA (XC7VX1140T). We evaluated the performance using Xilinx Vivado We set a conservative clock constraint of 250MHz for all of our designs. We reported resource utilization and clock rate using post-place-and-route reports. For power estimation; we used Switching Activity Interchange Format (SAIF) files as inputs to Vivado power analysis tool. We built synthetic rule sets where the entropy of each field could be adjusted separately. Computer & Internet Architecture Lab CSIE NCKU

Fix the packet header length 𝑊 𝑚 =64 and the number of rule N = 128. Vary 𝑠=2, 4, 8, 16, and 𝑐=2, 4, 8, 16, 32, 64. The performance of 𝑠=4 and 𝑐=64 is best. Computer & Internet Architecture Lab CSIE NCKU

Our designs on FPGA sustains > 250 MPPS throughput even for 4K 512-bit rules. Computer & Internet Architecture Lab CSIE NCKU

We show the power performance with respect to N, for the classic packet classification (M = 5) and the OpenFlow table lookup (M = 15) under three scenarios: Without any power optimization technique. With self-enabled power gating, but randomly scheduling of packet header fields. With self-enabled power gating along with entropy-based scheduling of packet header fields. To see the effect of random scheduling, we run 100 tests for a fixed pair of 𝑀 and 𝑁. For each test, the power consumption is averaged on 1K random packets. Computer & Internet Architecture Lab CSIE NCKU

Self-enabled power gating reduces the average power consumption by 60%. Exploiting both optimization reduces the average power consumption further by 20%. red “o”: no optimization pink “x”: self-enabled power gating blue “+”: both optimization Computer & Internet Architecture Lab CSIE NCKU

POWER-EFFICIENT RANGE-MATCH-BASED PACKET CLASSIFICATION ON FPGA

Similar presentations

Presentation on theme: "POWER-EFFICIENT RANGE-MATCH-BASED PACKET CLASSIFICATION ON FPGA"— Presentation transcript:

Similar presentations

About project

反馈

请登录

Auth with social network:

POWER-EFFICIENT RANGE-MATCH-BASED PACKET CLASSIFICATION ON FPGA

Similar presentations

Presentation on theme: "POWER-EFFICIENT RANGE-MATCH-BASED PACKET CLASSIFICATION ON FPGA"— Presentation transcript:

Similar presentations

About project

反馈