FPGA Design Timing Closure
Tianjin Polytechnic University-Xilinx Joint Laboratory for Signal Transmission and Processing

Wang Wei (王巍), Tianjin Polytechnic University-Xilinx Joint Laboratory for Signal Transmission and Processing

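Outline
  Concept of timing constraints
  Timing closure flow
  Timing closure flow: coding style
  Timing closure flow: synthesis techniques
  Timing closure flow: pin constraints
  Timing closure flow: timing constraints
  Timing closure flow: static timing analysis
  Timing closure flow: implementation techniques
  Timing closure flow: Floorplanner and PACE
2018/9/19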

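Basic purposes of constraints
Raise the design's operating frequency: constraints steer synthesis, mapping, placement, and routing so as to reduce logic and routing delay, which raises the achievable operating frequency.
Obtain correct timing analysis reports: FPGA design environments include a static timing analysis tool that produces timing reports after mapping or after place and route, so the design's performance can be evaluated. The static timing analyzer uses the constraints as the criterion for deciding whether timing meets the design requirements.
Specify FPGA pin locations and electrical standards: because the FPGA is programmable, board design and fabrication can proceed in parallel with the FPGA design without waiting for the FPGA pin locations to be fully settled, which shortens system development time. Constraints can also specify the interface standard and other electrical characteristics of each I/O pin.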

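Period constraints
A PERIOD constraint covers the paths between synchronous elements clocked by the reference net, including flip-flops, latches, and synchronous RAMs.
A PERIOD constraint does not optimize the following paths:
  paths from input pins to output pins (purely combinational logic)
  paths from input pins to synchronous elements
  paths from synchronous elements to output pins
(Figure: paths covered by the PERIOD constraint)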

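Period constraints
The PERIOD constraint is a basic timing and synthesis constraint attached to a clock net. The timing analysis tool uses it to check whether the delay of every path connected to synchronous timing ports (ports with setup and hold requirements) meets the requirement; paths from PADs to registers are not included. The period is the simplest and also the most important timing concept. Many other timing concepts differ slightly between tool vendors, but the period is the most universal and is the foundation of FPGA/ASIC timing definitions. The other timing constraints discussed later all build on the PERIOD constraint, and many other timing formulas can be derived from the period formula. Before applying a PERIOD constraint, first estimate the clock period the circuit can achieve; do not constrain blindly. A constraint that is too loose leaves the performance target unmet, while one that is too tight greatly increases place-and-route time and can even make the result worse.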

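Period constraints: calculating the period
The maximum operating frequency of the design's internal logic depends on the setup and hold times of the synchronous elements themselves and on the logic and routing delays between them.
The minimum clock period is:
  Tperiod = Tcko + Tlogic + Tnet + Tsetup - Tclk_skew
  Tclk_skew = Tcd1 - Tcd2
where Tcko is the clock-to-output time, Tlogic is the combinational logic delay between synchronous elements, Tnet is the net delay, Tsetup is the setup time of the synchronous element, and Tclk_skew is the clock skew.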
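As a rough illustration of the period formula above, the following Python sketch evaluates Tperiod and the corresponding maximum frequency. The function names and the delay values are hypothetical, chosen only to show the arithmetic; real values would come from the device data sheet and the timing report.

```python
def min_clock_period(t_cko, t_logic, t_net, t_setup, t_clk_skew=0.0):
    """Minimum clock period in ns: Tcko + Tlogic + Tnet + Tsetup - Tclk_skew."""
    return t_cko + t_logic + t_net + t_setup - t_clk_skew


def max_frequency_mhz(period_ns):
    """Convert a period in ns to a frequency in MHz."""
    return 1000.0 / period_ns


# Hypothetical delays in ns, for illustration only.
period = min_clock_period(t_cko=1.0, t_logic=4.0, t_net=3.0, t_setup=1.0, t_clk_skew=1.0)
print(period, max_frequency_mhz(period))
```

A PERIOD constraint tighter than the value computed this way cannot be met without changing the logic or its placement.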

Period constraints
An example of a PERIOD constraint:
  NET SYS_CLK PERIOD=10ns HIGH 4ns
The PERIOD constraint automatically handles inversion at register clock inputs: if adjacent synchronous elements are clocked on opposite phases, the delay between them is by default limited to half the PERIOD value.
(Figure: example of a PERIOD constraint with an inverted clock)
Recall that the PERIOD constraint allows you to specify the clock duty cycle. The implementation tools automatically reduce the length of the constraint when some flip-flops are triggered off the negative edge of the same clock. If your HDL code contains some processes that are triggered on a rising edge and other processes that are triggered on a falling edge, your synthesis tool will create a circuit like the one shown. If you manually create an inverted clock and use that clock in your HDL code, your synthesis tool can create logic different from what is shown. This can prevent the Xilinx software from correctly constraining these paths.
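The effect of opposite clock edges on the available path delay can be sketched as follows. This is only an illustration of the edge-to-edge arithmetic, assuming the clock period begins with its high phase; the function name is hypothetical, and the way the Xilinx tools actually derive the reduced constraint is not modeled.

```python
def edge_budget_ns(period_ns, high_ns, launch_edge, capture_edge):
    """Time between a launch edge and the next capture edge of the same clock.

    Assumes the period begins with the high phase, so a rising edge is
    followed by a falling edge after high_ns.
    """
    edges = ("rising", "falling")
    if launch_edge not in edges or capture_edge not in edges:
        raise ValueError("edges must be 'rising' or 'falling'")
    if launch_edge == capture_edge:
        return period_ns              # same edge: the full period is available
    if launch_edge == "rising":
        return high_ns                # rising to falling: the high time
    return period_ns - high_ns        # falling to rising: the low time


print(edge_budget_ns(10, 5, "rising", "falling"))   # 50% duty cycle
print(edge_budget_ns(10, 4, "rising", "falling"))   # PERIOD=10ns HIGH 4ns
print(edge_budget_ns(10, 4, "falling", "rising"))
```

With a 50% duty cycle the opposite-edge budget is half the period, which matches the default behavior described above; with HIGH 4ns the two directions get different budgets.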

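Offset constraints
An OFFSET constraint relates data to a clock: it specifies the timing relationship between an external clock and the data input and output pins. It applies only to signals connected to PADs and cannot be used on internal signals.
(Figure: offset constraints)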

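Offset constraints
OFFSET constraints optimize the following delay paths:
  from input pins to synchronous elements (OFFSET IN)
  from synchronous elements to output pins (OFFSET OUT)
To ensure that the chip samples data reliably and exchanges data correctly with the downstream chip, the timing relationship between the external clock and the data input and output pins must be constrained. An OFFSET constraint tells the synthesis and routing tools when input data arrives or when output data becomes stable, which secures the timing relationship with the next-stage circuit.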

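Offset constraints
OFFSET_IN_BEFORE states how long before the active clock edge the input data is ready. The combinational logic delay between the input pin and the chip's internal logic therefore must not exceed this time (an upper bound); otherwise sampling errors occur.
OFFSET_IN_AFTER states how long after the active clock edge the input data arrives at the chip's input pin; the upper bound on the chip's internal delay can likewise be derived from it.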

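Offset constraints: input arrival time (timing description)
OFFSET_IN_AFTER means that the input data arrives at time Tarrival after the active clock edge:
  Tarrival = Tcko + Toutput + Tlogic
The synthesis and implementation tools try to make the input-side delay Tinput satisfy:
  Tarrival + Tinput + Tsetup < Tperiod
where Tinput is the sum of the combinational logic, net, and PAD delays on the input side, Tsetup is the setup time of the input synchronous element, and Tcko is the clock-to-output time of the synchronous element.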

Offset constraints
Example: given Tperiod = 20 ns, Tcko = 1 ns, Toutput = 3 ns, and Tlogic = 8 ns, write the offset constraint.
Tarrival = Tcko + Toutput + Tlogic = 12 ns. Using OFFSET_IN_AFTER, the constraint is:
  NET DATA_IN OFFSET=IN 12ns AFTER CLK
OFFSET_IN_BEFORE can be used instead; the two are equivalent:
  NET DATA_IN OFFSET=IN 8ns BEFORE CLK
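The input-offset arithmetic in this example can be checked with a short Python sketch. The function names are hypothetical, and the 1 ns setup time used for the input budget is an assumed value that does not appear in the example.

```python
def arrival_time(t_cko, t_output, t_logic):
    """Tarrival = Tcko + Toutput + Tlogic: the value for OFFSET = IN ... AFTER."""
    return t_cko + t_output + t_logic


def offset_in_before(t_period, t_arrival):
    """Equivalent OFFSET = IN ... BEFORE value: time left before the next clock edge."""
    return t_period - t_arrival


def max_input_delay(t_period, t_arrival, t_setup):
    """Upper bound on Tinput from Tarrival + Tinput + Tsetup < Tperiod."""
    return t_period - t_arrival - t_setup


t_arrival = arrival_time(t_cko=1.0, t_output=3.0, t_logic=8.0)
print(t_arrival)                                  # value for the AFTER form
print(offset_in_before(20.0, t_arrival))          # value for the BEFORE form
print(max_input_delay(20.0, t_arrival, t_setup=1.0))  # assumed Tsetup = 1 ns
```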

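Offset constraints
OFFSET_OUT_BEFORE states how long before the active clock edge the next-stage chip's input data must be ready. From the delay at the next stage's input, one can work out when the current design's output data must be stable, and the output-side logic and routing are constrained accordingly so that the next stage's setup time is met and it samples stable data.
OFFSET_OUT_AFTER specifies how long after the active clock edge (an upper bound) the output data becomes stable; the chip's internal output delay must be smaller than this value.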

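Offset constraints: required output stable time
Definition: Tstable = Tlogic + Tinput + Tsetup
The implementation tools try to make the output-side delay satisfy:
  Tcko + Toutput + Tstable < Tperiod
This is the basic timing relationship that Tstable must satisfy: it states how the output of this stage must remain stable so that the downstream chip samples reliably.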

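Offset constraints
Example: the clock period is 20 ns, the next stage's input logic delay Tinput is 4 ns, its setup time Tsetup is 1 ns, and the intermediate logic delay Tlogic is 8 ns. Write the output offset constraints for the design.
Answer: Tstable = Tlogic + Tinput + Tsetup = 13 ns, so the OFFSET_OUT_BEFORE constraint is:
  NET DATA_OUT OFFSET=OUT 13ns BEFORE CLK
and the corresponding OFFSET_OUT_AFTER constraint is:
  NET DATA_OUT OFFSET=OUT 7ns AFTER CLK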
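The output-offset example works the same way. The sketch below, with hypothetical function names, reproduces both constraint values from the numbers given in the example.

```python
def stable_time(t_logic, t_input, t_setup):
    """Tstable = Tlogic + Tinput + Tsetup: the value for OFFSET = OUT ... BEFORE."""
    return t_logic + t_input + t_setup


def offset_out_after(t_period, t_stable):
    """Corresponding OFFSET = OUT ... AFTER value: the budget for Tcko + Toutput."""
    return t_period - t_stable


t_stable = stable_time(t_logic=8.0, t_input=4.0, t_setup=1.0)
print(t_stable)                          # BEFORE value in ns
print(offset_out_after(20.0, t_stable))  # AFTER value in ns
```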

Offset constraints
Given the system diagram below, what values would you put in the Constraints Editor so that the system will run at 100 MHz? (Assume no clock skew between devices.)
(Figure: upstream device and downstream device, with delays of 4 ns and 5 ns marked)

Path-Specific Timing Constraints
Using global timing constraints (PERIOD, OFFSET, and PAD-TO-PAD) will constrain your entire design.
Using only global constraints often leads to over-constrained designs:
  Constraints are too tight
  Compile time increases, and timing objectives may not be met
  Review the performance estimates provided by your synthesis tool or the Post-Map Static Timing Report
Path-specific constraints override the global constraints on specified paths. This allows you to loosen the timing requirements on specific paths.
The key to effective constraining is applying only the constraints that are required to communicate your performance objectives. If you specify unrealistic expectations that you do not really need to meet, your compile time will increase, and you may have difficulty getting your design to complete the Place & Route phase of implementation.
Path-specific constraints provide an accurate method of communicating design performance objectives. Global constraints are very powerful and can constrain every delay path in your design. Path-specific constraints allow you to define critical timing paths that require further optimization, multicycle paths that are not required to be constrained as tightly, and false paths that are not required to be constrained at all. Path-specific timing constraints give the implementation tools the greatest flexibility to meet your system timing objectives and are a critical part of high-performance design.

Areas of your design that can benefit from path-specific constraints:
  Multi-cycle paths
  Paths that cross between clock domains
  Bidirectional buses
  I/O timing
Path-specific timing constraints should be used to define your performance objectives and should not be indiscriminately placed.
Implementing path-specific constraints on designs that contain multicycle paths or bidirectional buses is very important. Constraints placed on these designs often loosen or remove a large number of constrained paths, which gives the implementation tools a great deal of flexibility in meeting your system timing objectives.

While global constraints are powerful because of their wide scope, path-specific constraints are also powerful because of their precision. By loosening or tightening the constraints on specific paths, you provide the implementation tools more flexibility and a greater chance of meeting all of your timing goals.

defining the constraint length. Groups of path end points can contain flip-flops, RAMs, latches, or pads. The most commonly used path-specific timing constraints are Slow/Fast Path Exceptions and Multicycle Paths.
Demo: to open a project and launch the Constraints Editor:
1. Open the ISE software.
2. Select File → Open Project.
3. Browse to the Review lab.
4. Select tc_review_lab.npl and click Open.
5. In the Source window, select the correlate_and_accumulate.ucf file.
6. In the Process window, double-click Create Timing Constraints.
7. Click the Advanced tab to view the Grouping and Timing Constraints options, as shown in the figure on this page.
(DEMO continues on a later page)

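Prescaled counter
Suppose a 32-bit high-speed counter is needed. A counter's speed is limited by the carry delay from the least significant bit to the most significant bit, so a prescaled counter structure is used to raise the speed: the counter is split into a small counter and a large counter, as shown in the figure. The small counter is 2 bits wide and the large counter is 30 bits wide, and both are driven by the same clock. The enable input EN of the large counter is driven by the carry of the small counter. The small counter produces a carry once every 4 CLK cycles, asserting EN for one CLK period; when the active clock edge arrives during that period, the large counter increments by 1.
The registers of the small counter may toggle on every CLK, so data from a lower-order register must reach the input of a higher-order register within 1 CLK; the maximum register-to-register delay is 1 CLK. The registers inside the large counter can toggle at most once every 4 clock cycles, so data from a lower-order register only needs to reach the higher-order register's input within 4 CLKs; the maximum register-to-register delay is 4 CLKs. This relaxes the counter's timing requirement and makes a large high-speed counter feasible.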
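The behavior of the prescaled counter can be checked with a small cycle-by-cycle simulation. This is a behavioral sketch in Python, not HDL; the function name and the way the enable is modeled are assumptions made for illustration.

```python
def simulate_prescaled_counter(n_clocks, small_bits=2, large_bits=30):
    """Simulate a small counter whose carry enables a large counter.

    Returns the combined count and the clock indices at which the large
    counter changed value.
    """
    small = 0
    large = 0
    large_updates = []
    small_max = (1 << small_bits) - 1
    large_mask = (1 << large_bits) - 1
    for clk in range(n_clocks):
        en = (small == small_max)      # carry of the small counter, valid for one clock
        if en:                         # both counters update on the same clock edge
            large = (large + 1) & large_mask
            large_updates.append(clk)
        small = (small + 1) & small_max
    return (large << small_bits) | small, large_updates


value, updates = simulate_prescaled_counter(10)
print(value, updates)
```

The combined value equals the number of clocks, as for a plain 32-bit counter, while the large counter changes only on every fourth clock. That is why the paths inside the large counter can be treated as multicycle paths with a budget of four clock periods.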

Constraint file

Path-pin offset Timing Constraints
Use the Pad to Setup and Clock to Pad columns to specify OFFSETs for all I/O paths on each clock domain.
  Easiest way to constrain most I/O paths
  However, this can lead to an over-constrained design
Use the Pad to Setup and Clock to Pad columns to specify OFFSETs for each I/O pin.
  Use this type of constraint when only a few I/O pins need different timing
If you have large numbers of I/O pins with similar timing requirements, set the global OFFSET constraints to that requirement. Then use path-specific OFFSET constraints to override the global constraints for I/O pins that have different requirements.
Pin-specific OFFSET In/Out constraints can be entered on the Ports tab of the Constraints Editor. You can select a large number of I/O paths by holding down the Shift key and clicking each I/O pin under the appropriate column heading. After selecting the pads, right-click and select Clock to Pad or Pad to Setup.
Creating a large number of pin-specific constraints usually requires the implementation tools to take more time during Place & Route. To reduce compile time, creating group OFFSET In/Out constraints (next few pages) is recommended.
Demo: click the Ports tab to view where you can enter OFFSET constraints for specific inputs and outputs.

False paths Constraints
If a PERIOD constraint were placed on this design, what delay paths would be constrained?
If the goal is to optimize the input and output times without constraining the paths between registers, what constraints are needed? Assume that a global PERIOD constraint is already defined.
False paths are useful when your design has paths that are not required to be constrained. Most commonly, these paths are bidirectional paths that are not exercised during normal operation; however, any path that you know will meet your timing objectives can be defined as a false path.

Timing Constraint Priority
Constraint priority, from highest to lowest:
  False paths: must be allowed to override any timing constraint
  FROM THRU TO
  FROM TO
  Pin-specific OFFSETs
  Group OFFSETs: groups of pads or registers
  Global PERIOD and OFFSETs: lowest priority constraints
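The priority ordering can be pictured as a simple lookup: when several constraints cover the same path, the one highest in the list takes effect. The sketch below is an illustration only; the label strings and the function name are hypothetical and do not correspond to any tool API.

```python
# Constraint kinds from highest to lowest priority (labels are illustrative).
PRIORITY = [
    "false path",
    "FROM THRU TO",
    "FROM TO",
    "pin-specific OFFSET",
    "group OFFSET",
    "global PERIOD/OFFSET",
]


def effective_constraint(applicable):
    """Return the highest-priority constraint kind among those covering a path."""
    if not applicable:
        raise ValueError("path is unconstrained")
    return min(applicable, key=PRIORITY.index)


print(effective_constraint(["global PERIOD/OFFSET", "FROM TO"]))
print(effective_constraint(["group OFFSET", "false path", "global PERIOD/OFFSET"]))
```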

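Timing closure flow
Once the design is finished, how do you judge whether it is successful?
  Does the design meet the area requirement, that is, does it fit in the selected device?
  Does the design meet the performance requirement, that is, can it reach the required operating frequency?
  Do the pin definitions meet the requirements: signal names, locations, I/O standards, data flow direction, and so on?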

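Timing closure flow
How do you judge whether the design fits the selected device?
  Does the selected device have enough resources to hold more logic? If so, how much?
  If the design fits the selected device, can it be routed completely?
Method: check the Map Report or the Place & Route Report.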

Timing closure flow
Project Navigator produces two timing reports:
  Post-Map Static Timing Report
  Post-Place & Route Static Timing Report
The timing reports contain detailed descriptions of the paths that failed their timing requirements, which are used to analyze why the requirements were not met.
Timing Analyzer is used to create and read timing reports.

Timing closure flow: the basis for reasonable performance constraints
The Post-Map Static Timing Report includes the actual logic delays (block delays) and net delays of 0.1 ns.
Rule for reasonable timing constraints: the 60/40 rule
  If less than 60 percent of the timing budget is used for logic delays, the Place & Route tools should be able to meet the constraint easily.
  Between 60 and 80 percent, the software run time will increase.
  Greater than 80 percent, the tools may have trouble meeting your goals.
The Synthesis Report is the first place where performance estimates are given. The estimate is not very accurate this early in the implementation process, but it can be an indicator of whether synthesis results are good enough to proceed to the next step.
The Post-Map Static Timing Report is useful because it is based on the Xilinx timing constraints, and this report shows detailed descriptions of the longest paths covered by each constraint. The routing delays are not accurate, but performance can be estimated by using the logic delays and the 60/40 rule (covered on the next page).
* If MAP is run with the timing-driven packing option, routing delays will be estimated based on logic placement and fanout.
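The 60/40 rule amounts to a simple threshold check on the logic delay reported after MAP. The following Python sketch, with a hypothetical function name and example delays, applies the three ranges quoted above.

```python
def assess_logic_budget(logic_delay_ns, constraint_ns):
    """Classify a path by the share of its timing budget used by logic delay."""
    percent = 100.0 * logic_delay_ns / constraint_ns
    if percent < 60:
        return "Place & Route should meet the constraint easily"
    if percent <= 80:
        return "software run time will increase"
    return "tools may have trouble meeting the goal"


# Hypothetical post-MAP logic delays against a 10 ns PERIOD constraint.
for logic_delay in (5.0, 7.0, 9.0):
    print(logic_delay, assess_logic_budget(logic_delay, 10.0))
```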

Timing closure flow
Applying full and correct constraints refers to applying constraints for all clocks in the design. Additionally, false paths and multicycle paths should be correctly constrained, as should the I/O. The timing closure flow chart was created to help achieve breakthrough performance.

Timing closure flow
Three steps to breakthrough performance:
1. Make full use of embedded (dedicated) resources
  DSP48, PowerPC processor, EMAC, MGT, FIFO, block RAM, ISERDES, OSERDES, and so on
2. Use good coding style
  Use synchronous design methodology
  Ensure the code is written optimally for critical paths
  Pipeline (Xilinx FPGAs have abundant registers)
3. Make full use of the synthesis and Place & Route tool options
  Try different optimization techniques
  Add critical timing constraints in synthesis
  Preserve hierarchy
  Apply full and correct constraints
  Use High effort

Timing closure flow: Use embedded blocks

Timing closure flow: Simple Coding Steps Yield 3x Performance
  Use pipeline stages: more bandwidth
  Use synchronous reset: better system control
  Use Finite State Machine optimizations
  Use inferable resources
    Multiplexer
    Shift Register LUT (SRL)
    Block RAM, LUT RAM
    Cascade DSP
  Avoid high-level constructs (loops, for example) in code
    Many synthesis tools produce slow implementations
These are just the most obvious suggestions. For every design, there may be more tricks or other clever things that can improve performance. Pipelining is the one thing that helps the most, and for most systems today, pipelining is always an option because bandwidth is what defines the system, not the latency. Latency can be important, but if it is, it is usually latency of a different order of magnitude than the latency caused by pipelining. FPGAs have lots of registers, so re-timing and clever use of arithmetic functions can yield tremendous performance. If designers need to balance the latency between different paths in the system, the SRLs can be used to compensate efficiently for delay differences.

Timing closure flow: Synthesis guidelines
Use timing constraints
  Define tight but realistic individual clock constraints
  Put unrelated clocks into different clock groups
Use proper options and attributes
  Turn off resource sharing
  Move flip-flops from IOBs closer to logic
  Turn on FSM optimization
  Use the retiming option

Timing closure flow: Impact of Constraints
This example shows what happens when constraints are used properly. If there is no performance requirement, the tools generate a design that is as small as possible. The same is true when there are several solutions that all meet the requirements; the smallest implementation will be used.

Timing closure flow: Place & Route Guidelines
Timing constraints
  Use tight, realistic constraints
Recommended options
  High-effort Place & Route (by default, effort is set to Standard)
  Timing-driven MAP
  Multi-Pass Place & Route (MPPR)
Tools to help meet timing
  Floorplanning (use the PACE and PlanAhead software tools)
  Physical synthesis tools
Other available options
  Incremental design
  Modular design flows

Timing closure flow: Impact of Constraints in Tools

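Coding style
  Use synchronous design techniques
  Use Xilinx-specific code
  Use Xilinx-provided cores
  Use hierarchical design
  Use the static timing analysis reports produced by ISE to find the timing-critical paths and optimize them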

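Synthesis techniques
Using the options provided by the synthesis tool, especially constraint-driven techniques, can optimize the design netlist and improve system performance.
When critical paths are specified to the synthesis tool, it can raise its effort level and apply more thorough algorithms to reduce critical-path delay.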

Synthesis techniques
Synthesis tools offer many optimization options for reaching the desired system performance and area:
  Register Duplication
  Timing-Driven Synthesis
  Timing Constraint Editor
  FSM Extraction
  Retiming
  Hierarchy Management
  Schematic Viewer
  Error Navigation
  Cross-Probing
  Physical Optimization
For details, see the F1 help or the XST User Guide.

Synthesis techniques: Duplicating Flip-Flops
High-fanout nets can be slow and hard to route.
Duplicating flip-flops can fix both problems:
  Reduced fanout shortens net delays
  Each flip-flop can fan out to a different physical region of the chip to reduce routing congestion
Design trade-offs:
  Gain routability and performance
  Increase design area
  Increase fanout of other nets
(Figure: flip-flop driving the high-fanout net fn1)

Synthesis techniques: Timing-Driven Synthesis
Synplify, Precision, and XST software
Timing-driven synthesis uses performance objectives to drive the optimization of the design.
  Based on your performance objectives, the tools will try several algorithms to attempt to meet performance while keeping the amount of resources in mind
  Performance objectives are provided to the synthesis tool via timing constraints
Synplify software: communicate constraints via SCOPE.
Precision software: communicate constraints by entering them in a constraint file (SDC file) or by entering them individually for each clock from the hierarchy window.
XST software: communicate constraints via the XCF.
For more information, see the XST User Guide in the online software documents (toolbox.xilinx.com/docsan/xilinx8/books/manuals.pdf).

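Synthesis techniques: Timing-Driven Synthesis
  Apply period constraints and input/output constraints (.xcf file).
  It is common to over-constrain by 1.5X to 2X relative to the desired performance target; the synthesis tool then raises its effort level, which makes the timing target easier to meet in implementation.
  Remember: if you over-constrain, do not pass these constraints on to the implementation tools.
  Use multi-cycle and false path constraints.
  Use critical path constraints to optimize the critical paths.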
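The over-constraint guideline can be turned into numbers as follows. This is a minimal sketch, assuming the 1.5X to 2X factor is applied to the target frequency (equivalently, the target period is divided by it); the function name and the 10 ns target are hypothetical.

```python
def overconstrained_period_ns(target_period_ns, factor):
    """Period to give the synthesis tool only, for a factor in the 1.5X to 2X range."""
    if not 1.5 <= factor <= 2.0:
        raise ValueError("factor outside the 1.5X to 2X range mentioned on the slide")
    return target_period_ns / factor


target = 10.0  # hypothetical 100 MHz target for implementation
for factor in (1.5, 2.0):
    print(factor, overconstrained_period_ns(target, factor))
```

The tightened value would go into the synthesis constraint file only; the implementation tools keep the real 10 ns target, as the slide warns.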

Synthesis techniques: Retiming
Synplify, Precision, and XST software
Retiming: the synthesis tool automatically tries to move register stages to balance the combinatorial delay on each side of the registers.
(Figure: register placement before retiming and after retiming)
To access retiming:
  Synplify software: enable under Implementation Options or the Retiming option in the Run window in the Synplify Pro software (Synplify → Options → Configure VHDL or Verilog Compiler).
  Precision software: check the box in the Setup Design dialog box.
  XST: enable under the Properties dialog box for Synthesize → Xilinx Specific Options → Register balancing.
Retiming results will be design dependent. In some situations, retiming may not provide any benefit (highly pipelined designs); however, it may improve performance for some designs.
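Why balancing helps can be shown with a toy calculation: the clock period of a pipeline is set by its slowest stage, so evening out the combinatorial delay lowers the period. The sketch below is not a retiming algorithm; it assumes the logic can be divided perfectly evenly and ignores register overhead (Tcko and Tsetup), and all names and delays are hypothetical.

```python
def min_period_ns(stage_delays_ns):
    """The slowest stage sets the clock period (register overhead ignored)."""
    return max(stage_delays_ns)


def ideally_balanced(stage_delays_ns):
    """Best case for retiming: the same total delay spread evenly over the stages."""
    average = sum(stage_delays_ns) / len(stage_delays_ns)
    return [average] * len(stage_delays_ns)


before = [8.0, 2.0]            # hypothetical combinatorial delay on each side of a register
after = ideally_balanced(before)
print(min_period_ns(before), min_period_ns(after))
```

A pipeline whose stages are already balanced gains nothing from this, which is consistent with the remark that highly pipelined designs may see no benefit.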

Synthesis techniques: Hierarchy Management
Synplify, Precision, and XST software
The basic settings are:
  Flatten the design: allows total combinatorial optimization across all boundaries
  Maintain hierarchy: preserves hierarchy without allowing optimization of combinatorial logic across boundaries
If you have followed the synchronous design guidelines, use the setting -maintain hierarchy.
If you have not followed the synchronous design guidelines, use the setting -flatten the design.
Your synthesis tool may have additional settings; refer to your synthesis documentation for details on these settings.
To access hierarchy control:
  Synplify software: SCOPE Constraints Editor. Synplify also has an additional setting: Maintain hierarchy but allow optimization. This setting allows combinatorial logic to be optimized while maintaining hierarchy in the netlist (the setting in Synplify is "firm").
  Precision software: after compiling the design, right-click Modules in the Design Hierarchy window and select Preserve Hierarchy or Flatten Hierarchy.
  XST: turn on the Advanced Property Display level in the Edit → Preferences dialog box. Then look under Properties for the Synthesize process → Synthesis Options tab → Keep Hierarchy.

Synthesis techniques: Hierarchy Preservation Benefits
  Easily locate problems in the code based on the hierarchical instance names contained within static timing analysis reports
  Enables floorplanning and the incremental design flow
  The primary advantage of flattening is to optimize combinatorial logic across hierarchical boundaries
    If the outputs of leaf-level blocks are registered, there is no need to flatten
Registering the outputs of each leaf-level block is part of the synchronous design techniques methodology. Registering the output boundaries helps because you know the delays from one block to the next. That is, the delays are not variable based on combinatorial outputs. Logic cannot be optimized across a registered boundary. Therefore, if you do register outputs, you know the delay is minimized from one hierarchical or functional block to the next, and you also know that no logic optimization can occur across hierarchical domains.
In addition to the benefits listed above, preserving hierarchy has the added benefit of limiting name changes to registers; thus, the element names used in a UCF will generally not change. However, if you flatten the design, the register and element names and the hierarchical paths and references can change from one iteration to the next. In this case, maintaining the UCF can be quite a burden. On the other hand, preserving hierarchy can prevent register balancing (retiming) and register duplication.
The benefits of preserving hierarchy generally outweigh the benefits of flattening except when you have combinatorial outputs. In general, preserve hierarchy for large designs. For smaller designs, preserve the hierarchy if you registered leaf-level outputs; otherwise, you might consider flattening the design. If you flatten the design, remember the extra burdens of name changes (UCF and static timing analysis) from one iteration to the next and the limits on floorplanning.

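Pin constraints
Pin constraints are usually fixed early in the design so that board design can proceed in parallel.
  For high-speed designs, complex designs, and designs with many I/O pins, Xilinx recommends assigning pins manually.
  The implementation tools can place logic and pins automatically, but the result is generally not optimal.
  Pin constraints can guide the internal data flow; a poor pin layout can easily degrade system performance.
  A good pin layout requires detailed knowledge of the system being designed and of the Xilinx device architecture, for example I/O banks and I/O electrical standards.
  Clocks (single-ended or differential) must be constrained to dedicated clock pins. Note: the number of clock resources is limited.
  Use dual-purpose pins (such as configuration and DCI pins) last.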

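Pin constraints: let the data flow guide pin assignment
  Place I/Os for control signals at the top or bottom of the device; control signals run vertically.
  Place I/Os for data buses on the left and right sides of the device; data flows horizontally.
This arrangement takes full advantage of how Xilinx devices are organized:
  resource layout
  carry chain orientation
  block RAM and multiplier locations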

Pin constraints: using PACE for pin constraints

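Timing constraints
If the performance targets are met after implementation, the design is complete.
Otherwise, apply path-specific timing constraints: add multi-cycle, false path, and critical path constraints. The implementation tools give priority to these path-specific constraints.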

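Static timing analysis
  Post-map: after MAP, use the post-map timing report to determine the logic delay of the critical paths.
  Post-PAR: after PAR, use the post-PAR static timing report to determine whether the timing constraints are met.
  Logic delay vs. routing delay: the 60%/40% rule.
  Timing Analyzer can read timing reports and locate critical paths, and works together with the Floorplanner to resolve timing problems.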

Static timing analysis: Report Example

Static timing analysis: Analyzing Post-Place & Route Timing
There are many factors that contribute to timing errors, including:
  Neglecting synchronous design rules or using incorrect HDL coding style
  Poor synthesis results (too many logic levels in the path)
  Inaccurate or incomplete timing constraints
  Poor logic mapping or placement
Each root cause has a different solution:
  Rewrite HDL code
  Add timing constraints
  Resynthesize or re-implement with different software options
Correct interpretation of timing reports can reveal the most likely cause and, therefore, the most likely solution.

Static timing analysis: Case 1
What is the primary cause of the timing failure?
  The net_2 signal has a long delay and low fanout
  Most likely cause is poor placement

Static timing analysis: Poor Placement: Solutions
  Increase the Placement effort level (or Overall effort level)
  Timing-driven packing, if the poor placement is caused by packing unrelated logic together
    Cross-probe to the Floorplanner to see what has been packed together
    This option is covered in the "Advanced Implementation Options" module
  PAR extra effort or MPPR options
    Covered in the "Advanced Implementation Options" module
  Floorplanning or Relative Location Constraints (RLOCs), if you have the skill

Static timing analysis: Case 2
What is the primary cause of the timing failure?
  The signal net_2 has a long delay, but the fanout is not low
  Most likely cause is high fanout

Static timing analysis: High Fanout: Solutions
Most likely solution is to duplicate the source of the high-fanout net.
  If the net is the output of a flip-flop, the solution is to duplicate the flip-flop
    Use manual duplication (recommended) or synthesis options
  If the net is driven by combinatorial logic, locating the source of the net in the HDL code may be more difficult
    Use synthesis options to duplicate the source

Static timing analysis: Case 3
What is the primary cause of the timing failure?
  There are no really long delays, but there are a lot of logic levels (7)

Static timing analysis: Too Many Logic Levels: Solutions
The implementation tools cannot do much to improve performance.
The netlist must be altered to reduce the amount of logic between flip-flops.
Possible solutions:
  Check whether the path is a multicycle path; if yes, add a multicycle path constraint
  Use the retiming option during synthesis to distribute logic more evenly between flip-flops
  Confirm that good coding techniques were used to build this logic (no nested if or case statements)
  Add a pipeline stage
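The three cases can be summarized as a small decision procedure over the figures read from a failing path in the timing report. The Python sketch below is a rough illustration; the function name and all three thresholds are assumptions made for the example, since the slides give no numeric cut-offs other than the 7 logic levels of Case 3.

```python
def diagnose_path(worst_net_delay_ns, worst_net_fanout, logic_levels,
                  long_delay_ns=2.0, high_fanout=20, many_levels=6):
    """Suggest the most likely cause of a timing failure (thresholds are assumed)."""
    if worst_net_delay_ns >= long_delay_ns:
        if worst_net_fanout >= high_fanout:
            return "high fanout: duplicate the source of the net"
        return "poor placement: raise effort level, try timing-driven packing or MPPR"
    if logic_levels > many_levels:
        return "too many logic levels: retime, pipeline, or add a multicycle constraint"
    return "no dominant cause: review the constraints and coding style"


print(diagnose_path(3.5, 2, 3))    # resembles Case 1
print(diagnose_path(3.5, 60, 3))   # resembles Case 2
print(diagnose_path(0.8, 4, 7))    # resembles Case 3
```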

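Implementation techniques: Place & Route option Effort Level
Xilinx recommends using global timing constraints and the default implementation options for the first implementation run. If the timing requirements are not met:
  Try modifying the code, for example by using an appropriate coding style or adding pipeline stages.
  Change synthesis options such as Optimization Effort, Use Synthesis Constraints File, Keep Hierarchy, Register Duplication, and Register Balancing.
  Increase the PAR Effort Level.
  Apply path-specific timing constraints for synthesis and implementation.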

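Implementation techniques
As with PAR, the timing-driven MAP options can be used to constrain critical paths. For example, the option "Timing-Driven Packing and Placement" gives critical paths priority in meeting their timing constraints. User constraints are passed into the design from the User Constraints File (UCF) by the Translate process.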

Implementation techniques: Timing-Driven Packing
Timing constraints are used to optimize which pieces of logic are packed into each slice:
  Normal (standard) packing is performed
  PAR is run through the placement phase
  Timing analysis analyzes the amount of slack in constrained paths
  If necessary, packing changes are made to allow better placement
The output of MAP contains both mapping and placement information.
  The Post-Map Static Timing Report contains more realistic net delays
  Place & Route runtime is reduced because some placement is already done

Implementation techniques: Example
Originally, the flip-flops were packed together into a slice.
After placement and timing analysis, the flip-flops are packed into different slices to allow independent movement.
In this simple example, two flip-flops were originally packed into one slice. They may share common inputs, or the packing may be necessary to fit the design into the target device. During placement, it becomes clear that FF1 should move to the top of the die and FF2 should move to the bottom (in order to meet timing constraints). If timing-driven packing is enabled, the design goes back into the MAP process with this knowledge. The flip-flops will be packed into two separate slices to allow independent movement.

Implementation techniques: Trade-Offs
Typical performance improvement: five to eight percent
  Density improvements are also seen
Has the greatest effect on high-density designs when unrelated packing has occurred
  Look in the Map Report, Design Summary section: Number of slices containing unrelated logic
  If no unrelated packing has occurred, performance improvement will be minimal
Runtime for the MAP process always increases
  Up to 200 percent
  But you recover some of this increased runtime by saving runtime during Place & Route
Unrelated packing occurs when the software puts unrelated logic into the same slice to fit the design into the target device. This can affect performance because the pieces of logic in the slice may need to be placed in different locations to meet timing. Timing-driven packing can fix this situation. Timing analysis will show that the unrelated logic needs to be separated to meet timing.
If no unrelated packing has occurred, the only change that timing-driven packing can make to the design is to merge flip-flops into IOBs to meet OFFSET constraints.

Implementation techniques: MPPR and PAR Extra Effort
MPPR runs PAR on the same design multiple times, trying to find the result most likely to meet the design requirements, and keeps it as the design result.
When the highest PAR Effort Level is selected, PAR Extra Effort offers three choices: None, Normal, and Continue on Impossible.
  Typically this improves performance by about 4%.
  PAR usually takes more time (an increase of 200% or more).
MPPR uses the same software program as a normal Place & Route, but each iteration starts with a different random placement of logic. There are 100 cost tables to choose from (1 to 100). By default, the software uses cost table 1 every time you implement your design.
The Xilinx tools assign two scores to each result based on how close together related logic was placed and how close the design is to meeting timing. MPPR automatically compares the scores and saves only the best results. This saves disk space, and you do not have to compare the results manually.
Starting Placer Cost Table: Start with Cost Table 1, even though the tools have already used it. This ensures a comparison of all results to your initial attempts that used cost table 1.
Number of PAR Iterations: Look at a Place & Route report and find the runtime stamp at the end of the file. Use this as a basis to compute how many cost tables you can run in a given amount of time. If you run too many iterations, you can abort MPPR and save the current results. If you select zero iterations, MPPR tries every cost table until finding one that meets your performance goals. Consider using High for the Placer Effort Level option and Standard for the Router Effort Level option. This will ensure faster run times in finding the best placement. Use re-entrant routing on the best MPPR result to achieve the best routing.
Number of Results to Save: Save at least two or three results. If you select zero, MPPR saves all results.
Save Results in Directory: Each saved result will have its own subdirectory. The best results are automatically copied back into the main project directory of the ISE software.
Nodelist File: This file contains names of Unix machines to run different PAR implementations (each using a different cost table) in parallel.
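The advice on choosing the number of PAR iterations is a simple division of the available time by the runtime of one Place & Route run. A minimal Python sketch, with a hypothetical function name and example times:

```python
def cost_tables_that_fit(time_budget_min, single_run_min, max_cost_tables=100):
    """Number of PAR runs that fit in the time budget, capped at the 100 cost tables."""
    if single_run_min <= 0:
        raise ValueError("single_run_min must be positive")
    return min(max_cost_tables, int(time_budget_min // single_run_min))


# Hypothetical numbers: one PAR run took 45 minutes, and 10 hours are available.
iterations = cost_tables_that_fit(time_budget_min=600, single_run_min=45)
print(iterations)
```

A result of zero here only means that no run fits in the budget; it should not be passed to MPPR as the iteration count, because zero iterations has the special meaning described above.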

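Floorplanning and PACE
  Using Floorplanning and PACE to guide logic placement may make performance worse!
  If timing improves but still does not meet the requirement, use MPPR.
  Timing-driven MAP (Map-timing) and Floorplanning do not work well together.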

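Floorplanning and PACE
Prefer the timing closure flow described earlier over these tools, unless you:
  know the design very well
  know the Xilinx device architecture very well
  know the Xilinx software tools very well
Benefits of using the Floorplanner (if you have sufficient skill):
  In large designs, the Floorplanner can give the implementation tools placement guidance for the design.
  It helps reduce implementation run time and improve system performance.
  The incremental design and modular design flows require the Floorplanner.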

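Floorplanning and PACE: Area Constraints
  Area constraints are the easiest and most effective use of the Floorplanner.
  The Floorplanner is the placement tool of choice for large designs.
  During synthesis, select the "Keep Hierarchy" option to prevent individual component names from being changed.
  Each part of the design can be constrained to a particular region.
  A more advanced design tool is PlanAhead.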

Thank you, everyone!
