Chapter 6: Internet Switching Architecture. Internet Technology, Hu Yueming.


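Agenda: 6.1 Switching router system architecture; 6.2 Introduction to IXA and IXP network processors; 6.3 Composition of network processor application systems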

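6.1 Switching router system architecture. Components of a router: system hardware such as an embedded CPU and network interfaces; an embedded operating system and various protocol software; network management system software; the router's user interface.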

Router performance metrics. Aggregate data rate: the total rate at which data can arrive at or leave a network system, i.e. the sum of the data rates on all interfaces, measured in bits per second (bps). Aggregate packet rate: measured in packets per second (pps), with packet sizes from 64 octets to 1518 octets. The packet rate matters because many processing operations require a fixed amount of time per packet, regardless of its size.

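Packet processing functions of a router (basic processing and deep processing): routing table lookup; packet classification; error detection and correction; packet buffer management and queue management; fragmentation and reassembly; transmit queue scheduling and policing; traffic monitoring and shaping; multicast processing; tunnel processing; security processing.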

Early router architecture

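Early router architecture. Line cards: the connection points for physical network links; they mainly perform network-layer and data-link-layer functions. Problems: the CPU must process every packet, every packet must cross the bus twice, and both the bus and the CPU are bottlenecks. Solution: add a processor to each line card that performs route lookup and forwards most IP packets, so that a packet crosses the shared bus at most once. Drawbacks: route lookup is still limited by CPU speed, and the shared bus limits throughput.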

Switching router: non-blocking transfer of packets from input ports to output ports through a switch fabric.

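Switching router. Line card: packet processing; sending packets to and receiving packets from the switch fabric. Switch fabric: the physical connection within a switch between the input and output ports, e.g. a Banyan network, a crossbar, or shared memory with parallel access.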
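
Banyan-type fabrics are self-routing: each 2×2 switching element looks at one bit of the destination port number. The sketch below models an Omega network, one member of the Banyan family (our choice; the slide does not say which topology is meant), with N = 2**n_bits ports. It traces the link a packet occupies after each stage and reports links claimed by two packets at once, which shows why a Banyan fabric can block internally even when the packets go to different outputs. Function names are hypothetical.

```python
def omega_path(src: int, dst: int, n_bits: int) -> list[int]:
    """Link a packet occupies after each stage of an Omega network (n_bits >= 1)."""
    mask = (1 << n_bits) - 1
    pos, path = src, []
    for stage in range(n_bits):
        # Perfect shuffle between stages: rotate the n-bit link number left by one.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # Self-routing 2x2 switch: the next destination bit (MSB first)
        # selects the upper (0) or lower (1) output.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


def internal_conflicts(pairs: list[tuple[int, int]], n_bits: int) -> dict:
    """Return {(stage, link): [(src, dst), ...]} for links claimed by several packets."""
    claims: dict[tuple[int, int], list[tuple[int, int]]] = {}
    for src, dst in pairs:
        for stage, link in enumerate(omega_path(src, dst, n_bits)):
            claims.setdefault((stage, link), []).append((src, dst))
    return {key: who for key, who in claims.items() if len(who) > 1}


if __name__ == "__main__":
    # In an 8x8 network every path ends at its destination...
    assert all(omega_path(s, d, 3)[-1] == d for s in range(8) for d in range(8))
    # ...but 0->0 and 4->1 both need link 0 in the first two stages (internal blocking).
    print(internal_conflicts([(0, 0), (4, 1)], 3))
```

A crossbar avoids this kind of internal blocking because every input-output pair has its own crosspoint; the cost is N² crosspoints instead of (N/2)·log2 N switching elements.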

Router architecture with parallel access to shared memory: the data path is separated from the control-information path.

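Port contention: packets from several inputs need to be forwarded to the same output. Queues are set up so that the packets wait their turn.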

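Queueing schemes: input queueing (IQ); output queueing (OQ); virtual output queueing (VOQ); combined input and crossbar queueing (CICQ); combined input and output queueing (CIOQ).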

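Input queueing: each input port contains a packet buffer. Input queues suffer from head-of-line (HOL) blocking: a packet at the head of a queue that is waiting for a busy output blocks the packets behind it, even those destined for idle outputs. Under uniform random traffic, an IQ switch with FIFO input queues has only 58.6% throughput (2 - √2 as the number of ports grows large) due to HOL blocking. M. Karol, M. Hluchyj, and S. Morgan, "Input versus Output Queueing on a Space Division Packet Switch," IEEE Transactions on Communications, Vol. 35, No. 12, pp. 1347-1356, December 1987.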
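
The 58.6% figure can be reproduced with a small Monte Carlo sketch. It assumes the conditions of the saturation analysis: every input always has a cell waiting, destinations are independent and uniformly random, queues are FIFO, and each output picks one contending input at random per time slot. The function name is hypothetical.

```python
import random


def hol_saturation_throughput(n_ports: int, slots: int, seed: int = 1) -> float:
    """Throughput of an N x N FIFO input-queued switch under saturation."""
    rng = random.Random(seed)
    # Destination of the head-of-line (HOL) cell at each input; inputs never run dry.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(slots):
        contenders: dict[int, list[int]] = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)            # each output accepts one cell per slot
            delivered += 1
            hol[winner] = rng.randrange(n_ports)   # the next queued cell becomes HOL
        # Losing inputs keep their HOL cell, blocking the cells behind it even if
        # those cells are headed for idle outputs.
    return delivered / (n_ports * slots)


if __name__ == "__main__":
    for n in (2, 8, 32):
        print(n, round(hol_saturation_throughput(n, 5000), 3))
```

For two ports the result is close to 0.75, and it falls toward the 0.586 limit as the port count grows.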

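Output queueing: each output port has a packet buffer, so packets arriving at different inputs at the same time can be sent to their output queues simultaneously. However, the buffer memory must operate at N times the link speed, which is difficult for gigabit networks. Virtual output queueing (VOQ): organize the input buffer in each input port into N parallel VOQs, one per output.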

VOQ

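VOQ: an input-queue structure sorted by output, which can achieve the effect of output queueing. With M input interfaces and N output interfaces there are M×N packet queues in total at the inputs. A selection (scheduling) algorithm is needed to choose among these input queues: it schedules the queues at different input interfaces that hold packets for the same output interface, which resolves contention among packets destined for the same output. Because each queue holds packets for a single output, a busy output no longer holds up packets bound for other outputs.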
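
The slide does not name the selection algorithm. One widely used choice for VOQ switches is iSLIP (McKeown), a round-robin request-grant-accept matcher. The sketch below runs a single iteration for an N×N switch, a simplification of full iSLIP, which repeats the grant and accept steps for ports left unmatched; the function name and the list-of-lists VOQ representation are hypothetical.

```python
def islip_iteration(voq: list[list[int]], grant_ptr: list[int],
                    accept_ptr: list[int]) -> dict[int, int]:
    """One request-grant-accept round. voq[i][j] = cells queued at input i for output j.
    Returns {input: output} matches and dequeues one cell per match."""
    n = len(voq)
    # Request: each input requests every output for which its VOQ is non-empty.
    requests = {j: [i for i in range(n) if voq[i][j] > 0] for j in range(n)}
    # Grant: each output grants the requesting input at or after its grant pointer.
    grants: dict[int, list[int]] = {}
    for j, inputs in requests.items():
        if inputs:
            i = min(inputs, key=lambda k: (k - grant_ptr[j]) % n)
            grants.setdefault(i, []).append(j)
    # Accept: each input accepts the granting output at or after its accept pointer.
    matches: dict[int, int] = {}
    for i, outputs in grants.items():
        j = min(outputs, key=lambda k: (k - accept_ptr[i]) % n)
        matches[i] = j
        # Pointers move one past the match, and only when a grant is accepted;
        # this desynchronizes the outputs and keeps the matcher fair.
        grant_ptr[j] = (i + 1) % n
        accept_ptr[i] = (j + 1) % n
        voq[i][j] -= 1
    return matches


if __name__ == "__main__":
    voq = [[1, 1], [1, 0]]      # input 0 holds cells for outputs 0 and 1; input 1 for output 0
    g, a = [0, 0], [0, 0]
    print(islip_iteration(voq, g, a))   # first slot
    print(islip_iteration(voq, g, a))   # second slot: both inputs are matched
```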

Performance comparison

Combined input and crossbar queueing (CICQ): input buffers plus a crossbar switch with output buffers, which makes output scheduling easier. http://www.csee.usf.edu/~christen/career/lit1.html

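CICQ requires advanced integrated circuit technology, since the crossbar combines switching circuitry with buffer circuitry. Each input of the crossbar has multiple queues, and the packets in each input queue correspond to different outputs. The output interface can implement scheduling algorithms such as strict priority, weighted round robin (WRR), DRR, and hierarchical scheduling.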
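
Weighted round robin, one of the output scheduling algorithms listed above, can be sketched in a few lines. This sketch assumes integer per-queue weights counted in packets per round; the function name is hypothetical.

```python
from collections import deque


def wrr_schedule(queues: list[deque], weights: list[int], rounds: int) -> list:
    """Weighted round robin: in each round, queue i may send up to weights[i] packets."""
    sent = []
    for _ in range(rounds):
        for q, w in zip(queues, weights):
            for _ in range(w):
                if not q:
                    break           # an empty queue gives up the rest of its turn
                sent.append(q.popleft())
    return sent


if __name__ == "__main__":
    q = [deque(["a1", "a2", "a3"]), deque(["b1", "b2", "b3"])]
    print(wrr_schedule(q, [2, 1], 2))   # queue 0 gets twice the turns of queue 1
```

Because WRR counts packets, a queue of large packets receives more bandwidth than its weight suggests. DRR addresses this by counting bytes.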

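Combined input and output queueing (CIOQ): suitable where CICQ is not available. Each input and each output can have multiple queues, located on the line cards.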

CIOQ router (figure): ingress line card with classification/marking, table lookup, and scheduling; switch fabric; egress line card with scheduling.

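CIOQ and QoS. Functions of the input queues: classification, placing packets into the queues for different outputs according to the forwarding table; queue management, metering queue lengths and policing or marking packets that exceed the traffic specification; scheduling. Functions of the output queues: scheduling, implementing packet queues of different priorities; shaping the outgoing traffic.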
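
Policing or marking packets that exceed a traffic specification is commonly done with a token bucket. The sketch below is a minimal single-rate token bucket, assuming the specification is given as a rate (bytes per second) and a burst size (bytes). The slide does not specify the meter, so this is one common choice, and the class name is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    rate: float                         # token fill rate, bytes per second
    burst: float                        # bucket depth, bytes
    tokens: float = field(init=False)   # current token count, bytes
    last: float = 0.0                   # time of the previous packet, seconds

    def __post_init__(self) -> None:
        self.tokens = self.burst        # start with a full bucket

    def police(self, now: float, size: int) -> str:
        """Return 'conform' or 'exceed'; tokens are spent only on conforming packets."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return "conform"
        return "exceed"   # a policer drops this packet; a marker would re-mark it instead


if __name__ == "__main__":
    tb = TokenBucket(rate=1000.0, burst=1500.0)
    for t in (0.0, 0.25, 1.0):
        print(t, tb.police(t, 1000))
```

Shaping uses the same bucket but delays a non-conforming packet until enough tokens accumulate, instead of dropping or marking it.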

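Division of a router's packet processing functions. Input processing: error detection and correction; classification and demultiplexing; traffic metering and policing; address lookup; security authentication; packet header modification (e.g. decrement TTL); queue management. Output processing: adding error-detection information; fragmentation; traffic shaping; output security processing. Packet modification on the input side adds information for the switch fabric so that the packet is delivered to the corresponding output line card.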
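
Header modification such as "decrement TTL" also forces an update of the IPv4 header checksum. Rather than re-summing the whole header, a forwarding engine can patch the checksum incrementally with the RFC 1624 formula HC' = ~(~HC + ~m + m'), where m and m' are the old and new 16-bit words. Below is a sketch over a raw 20-byte header held in a bytearray; the function names are hypothetical.

```python
def ipv4_checksum(header: bytes) -> int:
    """Internet checksum: one's complement of the one's-complement sum of 16-bit words.
    Over a header whose checksum field is valid, the result is 0."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                        # fold carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def decrement_ttl(header: bytearray) -> None:
    """Decrement TTL in place and patch the checksum incrementally (RFC 1624, Eqn. 3)."""
    if header[8] <= 1:
        raise ValueError("TTL expires here: hand the packet to the slow path")
    old_word = (header[8] << 8) | header[9]   # TTL (byte 8) shares a word with protocol (byte 9)
    header[8] -= 1
    new_word = (header[8] << 8) | header[9]
    old_csum = (header[10] << 8) | header[11]
    total = (~old_csum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    new_csum = ~total & 0xFFFF
    header[10], header[11] = new_csum >> 8, new_csum & 0xFF


if __name__ == "__main__":
    hdr = bytearray([0x45, 0, 0, 20, 0x1C, 0x46, 0x40, 0, 64, 6, 0, 0,
                     172, 16, 10, 99, 172, 16, 10, 12])
    csum = ipv4_checksum(bytes(hdr))          # checksum field is zero at this point
    hdr[10], hdr[11] = csum >> 8, csum & 0xFF
    decrement_ttl(hdr)
    print(hdr[8], ipv4_checksum(bytes(hdr)))  # new TTL, and 0 if the checksum is still valid
```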

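Scheduling of virtual output queues: DiffServ scheduling scheme 1. Each input port has 16 virtual output queues, distinguishing 16 QoS classes. WRR scheduling is used between input ports, and DRR scheduling within an input port.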
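
DRR, used within an input port in this scheme, fixes the weakness of packet-count round robin when packet sizes vary. Each backlogged queue earns a byte quantum per round and sends packets while its deficit counter covers the head packet. Below is a minimal sketch that assumes queues hold only packet sizes in bytes and that all queues share a single quantum; the function name is hypothetical.

```python
from collections import deque


def drr(queues: list[deque], quantum: int, rounds: int) -> list[tuple[int, int]]:
    """Deficit round robin over queues of packet sizes (bytes).
    Returns (queue_index, size) pairs in transmit order."""
    deficit = [0] * len(queues)
    sent = []
    for _ in range(rounds):
        for i, q in enumerate(queues):
            if not q:
                deficit[i] = 0            # idle queues do not bank credit
                continue
            deficit[i] += quantum
            while q and q[0] <= deficit[i]:
                size = q.popleft()
                deficit[i] -= size
                sent.append((i, size))
            if not q:
                deficit[i] = 0
    return sent


if __name__ == "__main__":
    big = deque([1500, 1500])
    small = deque([300] * 6)
    print(drr([big, small], quantum=500, rounds=3))
```

After three rounds each queue has sent 1500 bytes, even though one sent a single large packet and the other sent five small ones.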

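Scheduling of virtual output queues: DiffServ scheduling scheme 2. A third level of scheduling, using strict priority (SP), is added: WRR between the 16 ports; SP between the two queue groups of a port; DRR among the 8 queues within a queue group.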
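
The strict-priority level between the two queue groups can be sketched as follows. To keep the example short, it serves the queues inside a group with plain per-packet round robin instead of the DRR that the slide specifies. The class and function names are hypothetical.

```python
from collections import deque


class QueueGroup:
    """A group of queues served round-robin (a simplification of the per-group DRR)."""

    def __init__(self, n_queues: int) -> None:
        self.queues = [deque() for _ in range(n_queues)]
        self.next = 0                      # round-robin pointer

    def has_packets(self) -> bool:
        return any(self.queues)

    def dequeue(self):
        n = len(self.queues)
        for k in range(n):
            i = (self.next + k) % n
            if self.queues[i]:
                self.next = (i + 1) % n
                return self.queues[i].popleft()
        return None


def strict_priority(groups: list[QueueGroup]):
    """Serve the first non-empty group; groups[0] has the highest priority."""
    for g in groups:
        if g.has_packets():
            return g.dequeue()
    return None


if __name__ == "__main__":
    high, low = QueueGroup(8), QueueGroup(8)
    low.queues[0].extend(["L1", "L2"])
    high.queues[0].append("H1")
    high.queues[3].append("H2")
    print([strict_priority([high, low]) for _ in range(5)])
```

Strict priority can starve the low-priority group entirely. This is why the scheme keeps WRR at the port level, so that every port still gets a share of the output.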

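Processors on the line card. Function: process most packets; only control packets and exception packets need to be forwarded to the main CPU. Implementation of the forwarding engine: a general-purpose embedded CPU; an ASIC; or a network processor (built from microengines, channel processors, or picoprocessors).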
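
The split between the line-card fast path and the main CPU can be sketched as a predicate over a few header fields. The punt criteria below (packets addressed to the router itself, a TTL about to expire, IP options present) are illustrative assumptions, not a list taken from the slides, and `PacketSummary` is a hypothetical type.

```python
from dataclasses import dataclass


@dataclass
class PacketSummary:
    dst: str    # destination IPv4 address
    ttl: int
    ihl: int    # IPv4 header length in 32-bit words; > 5 means options are present


def needs_slow_path(pkt: PacketSummary, router_addrs: set[str]) -> bool:
    """True if the line card should punt the packet to the main CPU."""
    if pkt.dst in router_addrs:   # control traffic addressed to the router itself
        return True
    if pkt.ttl <= 1:              # expires here; the CPU generates ICMP Time Exceeded
        return True
    if pkt.ihl > 5:               # IP options are left to the general-purpose path
        return True
    return False                  # ordinary transit packet: stays on the fast path


if __name__ == "__main__":
    mine = {"10.0.0.1"}
    print(needs_slow_path(PacketSummary("8.8.8.8", 64, 5), mine))   # fast path
    print(needs_slow_path(PacketSummary("10.0.0.1", 64, 5), mine))  # punted
```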

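Processors on the line card. General-purpose CPU: a CPU that processes and forwards IP packets; relatively low performance. ASIC: high speed and fixed function; long development and manufacturing cycles; limited flexibility, which makes complex functions such as NAT hard to implement. Network processor (NP): a processor type dedicated to network switching equipment; good performance and flexibility; a parallel processing architecture; a steep learning curve.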

Why NP? Network data rates are increasing, which leaves less time between packets. Protocols are becoming more dynamic and sophisticated (e.g. L4 to L7 switching). Protocols are being introduced more rapidly (e.g. multicast, RTP for VoIP, IPTV).

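Data rate example (small packets are 64-byte frames, large packets 1518-byte frames; preamble and inter-frame gap are not counted):
Technology | 10Base-T | 100Base-T | 1000Base-T
Data rate | 10 Mbps | 100 Mbps | 1000 Mbps
Packet rate, small packets | 19.5 Kpps | 195.3 Kpps | 1953 Kpps
Packet rate, large packets | 0.8 Kpps | 8.2 Kpps | 82.3 Kpps
Time per small packet | 51.2 μs | 5.12 μs | 0.51 μs
Time per large packet | 1214.4 μs | 121.44 μs | 12.14 μs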
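
The table's numbers follow from dividing the data rate by the frame size in bits. The helpers below reproduce them. The optional `overhead_bytes` parameter (a hypothetical name) adds the 8-byte preamble/SFD and 12-byte inter-frame gap, which the table leaves out. With that overhead included, 64-byte frames on 10 Mbps Ethernet come to about 14,880 packets per second rather than 19,531.

```python
def packet_rate_pps(bit_rate: float, frame_bytes: int, overhead_bytes: int = 0) -> float:
    """Back-to-back packets per second on a link. overhead_bytes adds per-frame
    bytes such as preamble/SFD (8) and inter-frame gap (12); 0 matches the table."""
    return bit_rate / ((frame_bytes + overhead_bytes) * 8)


def time_per_packet_us(bit_rate: float, frame_bytes: int, overhead_bytes: int = 0) -> float:
    """Processing time budget per packet at line rate, in microseconds."""
    return 1e6 / packet_rate_pps(bit_rate, frame_bytes, overhead_bytes)


if __name__ == "__main__":
    for name, rate in (("10Base-T", 10e6), ("100Base-T", 100e6), ("1000Base-T", 1e9)):
        small, large = packet_rate_pps(rate, 64), packet_rate_pps(rate, 1518)
        print(f"{name}: {small / 1e3:.1f} Kpps, {time_per_packet_us(rate, 64):.2f} us (64 B); "
              f"{large / 1e3:.1f} Kpps, {time_per_packet_us(rate, 1518):.2f} us (1518 B)")
```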

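Network processor: a processor oriented to network applications; a single-chip multiprocessor system. Dedicated to packet processing: forwards packets at line rate. Programmable: contains several high-speed intelligent processors for packet processing. Standard functions are implemented in hardware, such as encryption/decryption and hashing. Standardization: Network Processing Forum (NPF).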

Product life cycles (figure): sophisticated algorithms mean longer (more costly) development and less payback. Revenue-opportunity windows are shown for L2 switching, 802.1p & Q, IP forwarding, advanced quality of service, and firewalls, each split into design time and selling time.

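Functional positioning of network processors by layer. Application layer: functions keep growing and are computationally heavy, which is the strength of general-purpose processors. Transport layer: functions are relatively stable and are handled by a general-purpose processor or the network processor's core. Network layer: functions are partly fixed and are handled jointly by the microengines and the core. Link layer: functions are fixed and simple and are handled by the microengines or peripheral chips. Physical layer: functions are fixed and lack generality, and are mainly handled by peripheral chips.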

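Functional positioning of network processors by plane. Control plane: routing protocols; exception handling. Data plane: packet header processing; packet receive and transmit; packet classification and queueing; data encapsulation; transmit scheduling. Management plane: device configuration; line card management; interface management; traffic management.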

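Main functions of a network processor: packet header processing; packet receive and transmit; packet classification and queueing; transmit scheduling. (Figure: a processor hierarchy of CPU, embedded processors, and I/O processors; the lower levels of the processor hierarchy need the greatest increase in performance.)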

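Functions of a network processor. Pattern matching: matching fields in a packet and finding features in the packet (expressions it satisfies). Lookup: finding table entries based on key fields. Computation: encryption, decryption, authentication, hashing, CRC checking. Data manipulation: packet fragmentation, TTL decrement, tagging. Queue management: packet buffering; QoS-related traffic shaping and traffic engineering policies. Control processing: handling exception packets, updating tables, collecting statistics.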
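
Lookup in an IP router usually means longest-prefix match on the destination address. Below is a minimal binary-trie sketch with one level per address bit, using Python's `ipaddress` module for parsing. Real network processors use multibit tries, hashing, or TCAMs to cut the number of memory accesses, so this only illustrates the matching rule. `BinaryTrie` is a hypothetical name.

```python
import ipaddress


class BinaryTrie:
    """Longest-prefix match over IPv4 prefixes, one trie level per address bit."""

    def __init__(self) -> None:
        self.root: dict = {}

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        addr = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            node = node.setdefault(bit, {})
        node["next_hop"] = next_hop

    def lookup(self, address: str):
        addr = int(ipaddress.IPv4Address(address))
        node = self.root
        best = node.get("next_hop")               # default route, if any
        for i in range(32):
            bit = (addr >> (31 - i)) & 1
            if bit not in node:
                break
            node = node[bit]
            best = node.get("next_hop", best)     # remember the longest match so far
        return best


if __name__ == "__main__":
    t = BinaryTrie()
    t.insert("0.0.0.0/0", "default")
    t.insert("10.0.0.0/8", "A")
    t.insert("10.1.0.0/16", "B")
    print(t.lookup("10.1.2.3"), t.lookup("10.2.3.4"), t.lookup("192.168.1.1"))
```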

Techniques used in network processors: multiple processing engines per chip (special-purpose or general-purpose), with high-speed interconnections; multithreading, with hardware thread scheduling and hardware signals, mutexes, and synchronization; pipelining of processing engines or of threads; content-addressable memory; special acceleration hardware (for encryption, hashing, etc.); a hierarchical memory structure (shared internal RAM, large register sets).
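
The benefit of multithreading can be shown with an idealized model (an assumption, not taken from the slides). Each thread computes for C cycles, then waits L cycles for memory, and switching threads costs nothing. One thread keeps the engine busy C/(C+L) of the time, so about ⌈(C+L)/C⌉ threads are enough to hide the latency completely. Function names are hypothetical.

```python
import math


def engine_utilization(threads: int, compute_cycles: int, latency_cycles: int) -> float:
    """Idealized busy fraction of one processing engine with zero-cost thread switching."""
    single = compute_cycles / (compute_cycles + latency_cycles)
    return min(1.0, threads * single)


def threads_to_hide_latency(compute_cycles: int, latency_cycles: int) -> int:
    """Smallest thread count that keeps the engine fully busy in this model."""
    return math.ceil((compute_cycles + latency_cycles) / compute_cycles)


if __name__ == "__main__":
    for t in (1, 2, 4, 8):
        print(t, round(engine_utilization(t, 20, 100), 3))
    print(threads_to_hide_latency(20, 100))
```

This is the reasoning behind eight hardware threads per microengine. Packet data rarely hits in a cache, so the engine hides memory latency by switching threads rather than by caching.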

Why not use a general-purpose processor? I/O speed: less I/O capacity. Computing speed: less parallelism; general-purpose processors are not as fast as network processors at data-plane processing. Memory access speed: received data are rarely spatially or temporally correlated, whereas general-purpose processors achieve their performance by using on-chip caches to hide memory latency.

6.2 Introduction to IXA and IXP network processors: 6.2.1 Introduction to IXA; 6.2.2 The IXP2400 network processor; 6.2.3 The IXP2800 network processor; 6.2.4 Link-layer devices

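Introduction to IXA. IXA (Internet eXchange Architecture) is a network system architecture proposed by Intel. It is highly programmable and can be used to build a variety of network devices. It supports NPF standards and CSIX (Common Switch Interface Consortium) and provides implementation specifications for various software modules.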

Original IXA

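Features of Intel network processors. A general-purpose embedded core processor: handles more complex tasks in both the control plane and the forwarding plane, such as determining routes and load balancing between lines. Multiple microengines: an instruction set optimized for packet processing; they handle simpler tasks at line rate, such as packet receive, classification, route lookup, queue management, and transmit. Dedicated acceleration hardware: supports stack operations, hashing, and encryption/decryption. On-chip memory hierarchy: registers, microengine local memory, scratchpad memory, on-chip SRAM. Scalable: supports connecting multiple network processors and expanding the number of microengines.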

Internal architecture (figure): StrongARM core (166 MHz) with 16 KB instruction cache, 8 KB data cache, and 512 B mini-data cache; microengines 0 to 5; hash unit; scratchpad; SRAM controller (32-bit); SDRAM controller (64-bit); PCI; IX Bus interface.

Block diagram (figure): Intel StrongARM SA-1 core with 16 KB Icache, 8 KB Dcache, 512 B mini Dcache, write buffer, and read buffer; UART, GPIO, 4 timers, RTC, JTAG; PCI unit; SDRAM unit; 32-bit SRAM unit; microengines 1 to 6; scratchpad memory (4 KB); hash unit; 64-bit IX Bus interface.

Intel network processor product families. High end, for core routers: IXP2800 (IXP2850). Mid range, for edge routers: IXP2400. Low end, for networking equipment: IXP400 (IXP42x, IXP46x).

6.2.2 The IXP2400 network processor. XScale core. Eight microengines: dedicated RISC processors with 8 threads per microengine, a 4K-word control store, 640-word local memory, more registers, a CRC unit, and a CAM. Scratchpad memory: shared on-chip memory. Shared hash unit. Two QDR SRAM channels for up to 20 Gbps, with support for external classification engines and a non-overlapped address space. Up to 1 GB of DDR SDRAM. 64-bit/66 MHz PCI host CPU interface. MSF interface supporting Utopia 1/2/3, SPI-3 (POS), and CSIX interfaces at OC-48 data rates. Configurable RBUF and TBUF sizes (64, 128, 256 B).

IXP2400 (figure). CAP: CSR access proxy.

StrongARM characteristics: Reduced Instruction Set Computer (RISC); 32-bit arithmetic; vector floating point provided via a coprocessor; byte-addressable memory; virtual memory support; built-in serial port; facilities for a kernelized operating system.

StrongARM characteristics (continued): 5-stage pipeline; single-cycle instruction execution; 16 KB 32-way I-cache; 16 KB 32-way write-back D-cache; coprocessor support; JTAG support.

StrongARM core pipeline organization

Summary of ARM architectures
Core | Architecture
ARM1 | v1
ARM2 | v2
ARM2as, ARM3 | v2a
ARM6, ARM600, ARM610 | v3
ARM7, ARM700, ARM710 | v3
ARM7TDMI, ARM710T, ARM720T, ARM740T | v4T
StrongARM, ARM8, ARM810 | v4
ARM9TDMI, ARM920T, ARM940T | v4T
ARM9ES | v5TE
ARM10TDMI, ARM1020E | v5TE

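XScale core: a 32-bit RISC microprocessor using superpipelining, with a 7- to 8-stage instruction pipeline. It uses the ARM V5 integer instruction set (ARM V5 adds floating-point instructions to V4) and is compatible with StrongARM for user-mode application programs. It supports the ARM Thumb instruction set (ARM V5T) and the ARM DSP extensions (ARM V5TE). 32 KB instruction cache and 32 KB data cache.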

Role of microengines. Ingress: packet receive from physical-layer hardware; checksum verification; header processing and classification; packet buffering in memory; table lookup and forwarding; header modification. Egress: checksum computation; queue management; transmit scheduling; packet transmit to physical-layer hardware.

Microengine execution pipeline. The microengines have a five-stage execution pipeline: P0 = fetch instruction; P1 = decode instruction; P2 = read operands; P3 = perform ALU/shift operation; P4 = write results. In the Developer's Workbench, cursors show what is happening in each pipeline stage (stage 0 to stage 4), and the colors of the arrows indicate: instruction executing, microengine idle, microengine stalled, instruction aborted.

Microengine enhancements: 4/8 threads per microengine; multiplier unit; next-neighbor registers; 640×32 local memory; pseudo-random number generator; CRC calculator; four 32-bit timers and timer signaling; 16-entry CAM; time-stamping unit.
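
The slide does not give the polynomial or configuration of the microengine CRC calculator. Assuming the common Ethernet CRC-32 (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF), a bit-serial software version looks like the sketch below. This is the per-bit work that a hardware CRC unit does in parallel. The code is cross-checked against Python's `zlib.crc32`, and the function name is hypothetical.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32 (IEEE 802.3, reflected form)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift one bit out; if it was 1, XOR in the reflected polynomial.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    msg = b"123456789"
    assert crc32_bitwise(msg) == zlib.crc32(msg)
    print(hex(crc32_bitwise(msg)))   # prints 0xcbf43926, the standard CRC-32 check value
```

Eight inner iterations per byte is exactly the cost that a dedicated CRC unit removes from the microengine's per-packet cycle budget.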

Microengine enhancements (continued): support for generalized thread signaling; a queue manipulation mechanism that eliminates the need for mutual exclusion; ATM segmentation and reassembly hardware; byte alignment facilities; two microengine clusters with independent buses; 4K-word instruction store; 256 GPRs and 512 transfer registers; 32-bit multiplication unit.

SRAM unit features: read/write of a long word or a block of long words; bit test/set/clear; 8-entry read-lock CAM; 8-entry push/pop queue.

SRAM unit internal structure (figure): runs at 1/2 core clock; SRAM up to 8 MB; Flash ROM from 512 KB (nominal) to 8 MB (max); slow port for peripherals; service arbiter; AMBA translation unit and data FIFO, with an AMBA R/W address queue, for StrongARM core SRAM references; microengine command reference FIFOs feeding the microengine queues (16-entry read queue, 16-entry order/write queue, 8-entry priority queue, 24-entry read-lock fail queue); 8-entry CAM; SRAM transfer registers; 32-bit data path.

SDRAM unit features: read/write of a quad word or a block of quad words. Read-modify-write: uses the indirect_ref optional token and can modify individual bytes of the quad word. Chained reference: uses the chained_ref optional token; the SDRAM unit services the same thread until the chain is completed; used to access non-contiguous blocks of memory in order.

SDRAM unit internal structure (figure): SDRAM up to 256 MB; 83 MHz (1/2 core clock); 64-bit data path with a byte aligner; service arbiter; request logic for StrongARM core SDRAM memory references and for PCI memory references; microengine command reference FIFOs feeding the microengine queues (16-entry ODD queue, 16-entry EVEN queue, 16-entry ORDER queue, 16-entry PRIORITY queue).

Media or Switch Fabric (MSF) interface, configurable as: Utopia 1, 2, or 3; CSIX-L1 fabric interface; System Packet Interface Level 3 or 4 (SPI-3 or SPI-4); SerDes Framer Interface (SFI). Note: the Optical Internetworking Forum (OIF) controls the SPI and SFI standards.

SPI (System Packet Interface). SPI-3: OC-48 system interface for physical-layer and link-layer devices; a 32-bit interface at 4 Gbps; supports a 133 MHz clock; can be split into four 8-bit channels. SPI-4: OC-192 system interface for physical-layer and link-layer devices; supports a 400 MHz clock and 10 Gbps. SPI-5: OC-768 system interface for physical-layer and link-layer devices.
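
As a quick check of the SPI-3 figures, here is the arithmetic under the simplifying assumption that one 32-bit word moves per clock cycle:

```latex
\begin{aligned}
32\,\text{bit} \times 133\,\text{MHz} &\approx 4.26\,\text{Gbit/s} \\
8\,\text{bit} \times 133\,\text{MHz} &\approx 1.06\,\text{Gbit/s per 8-bit channel}
\end{aligned}
```

The full-width figure leaves headroom over the OC-48 line rate of 2.488 Gbit/s, which SPI-3 is meant to carry.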

CSIX (Common Switch Interface Consortium): a consortium of member vendors that drew up specifications for network processors: the CSIX L1 fabric interface, the look-aside interface, and the stream interface.

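CSIX L1 fabric interface. C-frame format: header, optional extension header, optional payload, optional padding, vertical parity trailer. 32/64/96/128-bit parallel connections supporting a maximum rate of 32 Gbps; supports board-level connections. C-frame types: unicast frames, multicast frames, broadcast frames, and flow-control frames.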

C-frame fields. Header: 2 bytes long; contains the payload length (8 bits), the frame type (4 bits), and ready bits for link-level flow control. Extension header: type-specific, determined by the frame type, 0 to 4 bytes, e.g. the destination fabric port for unicast frames. Payload: maximum allowable length is 256 bytes. Padding bits: ensure that the C-frame has an appropriate length. Vertical parity: a 16-bit field whose use is optional.

IXP2800: XScale core at 700 MHz; 16 version-2 microengines at 1.4 GHz; microengine operation at 20+ GOPS. Media/switch fabric interface configured as CSIX-L1 or SPI-4; 10 Gbps full-duplex media interface; 50 Gbps packet memory bandwidth; 30 million packets per second of L4 forwarding; 60 million enqueue/dequeue operations per second; about 14 W; 1357-ball BGA package.

IXP2800

IXP2800 features (cont.). PCI interface: 64-bit/66 MHz interface for control. QDR interface (with parity): four 36-bit SRAM channels (QDR or co-processor), following the Network Processing Forum's proposed co-processor standard interface. RDR interface (with ECC): three independent Direct Rambus DRAM interfaces; supports 4i banks or 16 interleaved banks; supports 16/32-byte bursts; tuned for PC800 or PC1066 RDR.

IXP2850: a version of the IXP2800 with on-board encryption processors. Symmetric-key ciphers: Advanced Encryption Standard (AES) and Triple Data Encryption Standard (3DES). One-way hash function: Secure Hash Algorithm (SHA-1). Keyed message digest: Hash-based Message Authentication Code (HMAC), which combines secret key data with the message data before computing the one-way hash. A checksum accumulator.
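
The one-line description of HMAC above compresses RFC 2104, which hashes twice. The inner hash covers the key XORed with an inner pad, followed by the message. The outer hash covers the key XORed with an outer pad, followed by the inner digest. Below is a software sketch for SHA-1 (64-byte blocks), cross-checked against Python's standard `hmac` module. It shows what the IXP2850's hash hardware computes, not how that hardware is programmed; the function name is hypothetical.

```python
import hashlib
import hmac

BLOCK_SIZE = 64   # SHA-1 processes 64-byte blocks


def hmac_sha1(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA1 per RFC 2104: H((K ^ opad) || H((K ^ ipad) || message))."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha1(key).digest()      # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")      # then zero-padded to the block size
    inner = hashlib.sha1(bytes(b ^ 0x36 for b in key) + message).digest()
    return hashlib.sha1(bytes(b ^ 0x5C for b in key) + inner).digest()


if __name__ == "__main__":
    key, msg = b"Jefe", b"what do ya want for nothing?"
    # Cross-check against the standard-library implementation.
    assert hmac_sha1(key, msg) == hmac.new(key, msg, hashlib.sha1).digest()
    print(hmac_sha1(key, msg).hex())
```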

IXP23xx: Intel's first 90 nm network processor. Microengines at 300, 600, or 900 MHz; Intel XScale core at 600, 900, or 1200 MHz; two DDR DRAM controllers; QDR SRAM controller; 128 KB internal SRAM; 512 KB Layer 2 push cache; integrated I/O from T1/E1 through Gigabit Ethernet; integrated encryption engines.

Intel® IXP2350 block diagram (figure): four MEv2 microengines; Intel XScale® core for control-plane processing, with 512 KB L2 cache; network processing engines 0 and 1; crypto unit; hash unit (64/48/128); 128 KB message SRAM; 16 KB scratch; QDR SRAM controller; DDR DRAM controller; Gigabit Ethernet 0 and 1 (GMII, TBI); 10/100 Ethernet 0 and 1 (MII); SPI-3 or Utopia interface with Rbuf and Tbuf; 16 T1/E1/J1 links with a 256-channel HDLC controller and (4) H/MVIP; PCI v2.2 bridge; expansion bus controller.

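IXP400 network processor family. XScale core at 266 MHz, 400 MHz, 533 MHz, 667 MHz, etc. Network processor engines (NPEs): offload typical L2 network functions such as Ethernet filtering, ATM SAR, and HDLC. PCI interface: 32-bit PCI v2.2. MII/RMII interfaces (802.3): Ethernet interfaces integrated on the NP chip. UTOPIA-2 interface: 8-bit, 33 MHz; supports single or multiple PHY configurations. USB interface: includes a USB 2.0 host controller and a USB v1.1 device controller.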

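IXP400 network processor family (continued). High-speed serial (HSS) interface: used to connect T1/E1 or SLIC/codec devices; 6 wires, supports 8.192 MHz, 8 HDLC channels. A SLIC (Subscriber Line Interface Circuit) is the interface for traditional analog telephone lines; a codec performs functions such as A/D and D/A conversion, modulation/demodulation, and compression/decompression. SDRAM interface: supports 32 MB to 1 GB of memory, with optional ECC. Expansion bus interface: up to 25 address bits; can connect various other devices, such as flash memory cards or other boot ROM. Encryption/authentication module: DES, 3DES, AES (128-bit and 256-bit), SHA, etc. DSP support: supports TI DSPs.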

IXP400 family devices (feature comparison table): features compared are UTOPIA, HSS, MII 0, MII 1, AES/DES, HDLC, and SHA-1/MD-5; devices are IXP425, IXP422, IXP421, IXP420, and IXC1100.

IXP425 network processor family

IXP465 network processor family

Tolapai: a single die integrates an IA CPU at 600, 1066, and 1200 MHz; a DDR2 memory controller (MCH); PCI Express; standard IA PC peripherals (ICH); three Gigabit Ethernet MACs; three TDM high-speed serial interfaces for 12 T1/E1 or SLIC/codec connections; and the Intel® QuickAssist integrated accelerator, for high-performance security and IP telephony applications.

Tolapai

Tolapai: 148 million transistors; 1,088-ball FCBGA with 1.092 mm pitch; 37.5 mm × 37.5 mm package. Intel's first integrated IA processor, chipset, and memory controller since 1994's 80386EX.

6.2.4 Link-layer devices: 10-port Gigabit Ethernet MAC device IXF1010; SPI-3 framer device IXF6048; 4-port Gigabit Ethernet MAC device IXF1104.

4-port Gigabit Ethernet MAC device IXF1104

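Features of Intel network processors: a main general-purpose embedded processor core that handles exception packets and network protocols; multiple multithreaded microengines; highly programmable, which makes it easy to implement new packet processing functions.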

6.3 Composition of network processor application systems: 6.3.1 Hardware composition; 6.3.2 Software composition; 6.3.3 Application system examples

IXP2400 full-duplex OC-48 system implementation (figure): an IXF6048 framer (1× OC-48 or 4× OC-12); an IXP2400 ingress processor and an IXP2400 egress processor, connected through an OC-48 switch fabric gasket; for each processor, DDR SDRAM packet memory, QDR SRAM for queues and tables, and a TCAM classification accelerator; a host CPU (IOP or IA).

Typical network edge architectures: 127-port DSLAM line card (figure). Intel® IXP2350 network processor with integrated Gigabit Ethernet MACs, Intel® XScale core, 128 KB integrated SRAM, integrated 10/100 MACs, and a 128-port Utopia L2 interface; 12-port ADSL PHYs and an FPGA; dual Gigabit MAC and dual Gigabit PHY to the Gigabit Ethernet backplane; a control-plane processor with dual 10/100 MAC, dual 10/100 Ethernet PHY, and a 10/100 console; boot flash, DDR DRAM, QDR SRAM.

Typical network edge architectures: Node B transport card (figure). Intel® IXP2350 network processor with integrated encryption engine, integrated Gigabit Ethernet MACs, Intel® XScale core, 128 KB on-chip SRAM, 256-channel HDLC controller, and integrated 10/100 MACs; octal T1/E1/J1 framers/LIUs and a 16 T1/E1/J1 HDLC controller with IMA; an encryption coprocessor; dual Gigabit MAC and dual Gigabit PHY to the Gigabit Ethernet backplane; a control-plane processor with dual 10/100 MAC, dual 10/100 Ethernet PHY, and a 10/100 console; boot flash, DDR DRAM, QDR SRAM.

10 Gbps SONET line card (figure). Line side: 10GbE or OC-192c (PPP/ATM/OTN/SONET/SDH) through CDR/DEMUX, connected to the processors over a 10 Gbps SPI interface. IXP2800 ingress processor: SAR, classification, metering, policing, initial congestion management. IXP2800 egress processor: traffic shaping, with flexible choices such as DiffServ or TM 4.1. Fabric side: a fabric interface chip (FIC) connected over a 15 Gbps CSIX interface, with fabric flow control. Each processor has RDR packet memory and QDR SRAM for queues and tables; a control-plane processor is attached over 64-bit/66 MHz PCI.

6.3.2 Software composition: the network processor software architecture defined by the NPF

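Network processor software architecture defined by the NPF. Forwarding plane: processes network traffic at line rate and makes packet processing decisions according to the traffic flows, such as forwarding, classification, and filtering. Control plane: controls and configures the forwarding plane; runs various signaling and routing protocols; provides the forwarding plane with routing information, forwarding information, port configuration information, and QoS configuration information. The control plane is divided into an application layer, a service layer, and a functional layer.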

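Control-plane software. Application layer: the highest level of abstraction, covering routing protocols, the Border Gateway Protocol, and the management of routing information. Service layer: implements system-related abstractions. Functional layer: implements hardware-unit-related abstractions.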

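Control-plane software: two sets of APIs, which provide standardized interfaces for the control-plane software on the core, plus a platform development kit. NPF application APIs: run on top of the operating system and target specific protocol types, e.g. the IPv4 unicast forwarding API, the MPLS API, and the Differentiated Services API; each API is implemented by one component. NPF management APIs: system configuration and management, data forwarding plane plug-in management, namespace management. Control Plane Platform Development Kit (CP_PDK): provides a standardized interface for the control-plane software on the core and is built on the core's components.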

6.3.3 Application system examples: 1. Internet switching router (see the textbook); 2. Edge aggregation router; 3. Multi-service platform.

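Appendix: other network processors. 2.5 Gbps class: IBM PowerNP 4GS3; Vitesse IQ2000 and IQ2200; Motorola C-5 DCP; Cisco Toaster 2. 10 Gbps class: Bay Microsystems' BrecismsP5000; Xstream Logic (later renamed Clearwater Networks), with a dynamic multithreading (DMS) processor core and an intelligent packet management unit (PMU), using a MIPS-like architecture; EZchip NP-1; Lexra NetVortex.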

PowerNP high-level architecture (source: UnderstandingNPs)

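PowerNP. Embedded processor complex (EPC): the computing resources. Ingress EDS: packet receive, transmit, and scheduling for the network interfaces. Egress EDS: packet receive, transmit, and scheduling for the switch fabric interface. Ingress SWI and egress SWI: implement the internal packet loop. Ingress PPM and egress PPM: connect to physical-layer devices.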

PowerNP components. Embedded processor complex (EPC): contains one embedded PowerPC and 8 protocol processor units, each with 2 CLPs, for a total of 16 picoprocessors and 32 packet processing threads; acceleration hardware implements frame forwarding, filtering, and CRC computation; accepts data for processing from both the ingress and egress DFs; 4 KB shared memory pool per protocol processor unit (1 KB per thread). Data flow (DF): the data path for receiving and transmitting packets. Coprocessors: provide hardware-assist functions such as table search, packet alteration, classification, and pattern search.

CLP (core language processor): a 32-bit picoprocessor; a scaled-down RISC processor with 1-cycle ALU operations; 16 32-bit or 32 16-bit GPRs; supports 2 threads (32 threads in all); runs at 133 MHz.

PowerNP functional block diagram

Vitesse IQ2200: four 200 MHz scalar RISC processor cores, with co-processors for lookup, classification, packet order management, and multicast support; instructions optimized for network operations; a QoS engine for packet priorities and transfer. (Source: Vsc2202-pb-r10-vppd-00306)

Motorola C-Port C-5: 16 channel processors (CPs), 5 co-processors, and 1 general-purpose processor. Each CP is a RISC core with 2 serial data processors (SDPs): the RISC core handles classification and traffic scheduling, while the SDPs talk to other CPs and handle field parsing, CRC validation/calculation, and framing header validation, extraction, insertion, and deletion. The 5 co-processors: executive processor (coordination with external host processors); fabric processor (for using multiple C-5s in a fabric); table lookup unit (table lookup and update); queue management unit (managing packet queues); buffer management unit (memory management). (Source: UnderstandingNPs)

Cisco Toaster 2: consists of 2 PXF chips (Parallel eXpress Forwarding architecture) and contains 16 eXpress microcontrollers (XMCs); single-thread execution model; used in the Cisco 10000 Edge Services Router (ESR).

Cisco Toaster 2 eXpress microcontroller (XMC): based on a vanilla 2-way-issue RISC VLIW; arranged in 4 pipelines, resulting in a 4×8 systolic array.

Lexra NetVortex: uses 16 MIPS R3000 32-bit RISC cores (LX8000); supports single-cycle context switching among 8 contexts; adds special instructions to speed up packet processing, such as one's-complement add and inserting and extracting bit fields; the LX480 is used as the control-plane processor.

Clearwater CNP810: SMT core with 10 functional units (FUs); uses simultaneous multithreading; peak throughput of 225Gbps; supports SPI-3 and SPI-4; 150 nm process; 300 MHz; 12 W. (Source: UnderstandingNPs)

EZchip: uses specialized processors for different tasks, called task-optimized processors (TOPs): TOPparse, TOPsearch, TOPresolve, TOPmodify. Manufactured by IBM. www.ezchip.com (Source: UnderstandingNPs)

Questions: can we make network processors faster? Easier to use? More powerful? More general? Cheaper? All of the above?