3. FPGAs vs. ASICs
FPGAs
- Off-the-shelf product
- Short time to market
- Low development cost and low risk
- Reconfigurable
4. FPGAs vs. ASICs
ASICs
- High performance
- Low power
- Low cost (but only in high volumes)
FPGAs
- Off-the-shelf product
- Low development cost and low risk
- Reconfigurable
- Short time to market
5. Area comparison
FPGA (Virtex II 6000): 19.80 mm x 19.68 mm
Area of the Xilinx Virtex II 6000 FPGA (estimate by R.J. Lim Fong, MS Thesis, VPI, 2004)
ASIC (130 nm process): 2.7 mm x 2.82 mm
Area of an ASIC with equivalent functionality
The FPGA is about 51x larger than the ASIC.
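As a quick check of the 51x figure, the sketch below recomputes the area ratio from the dimensions on this slide. It assumes that 19.80 mm x 19.68 mm is the FPGA die and 2.7 mm x 2.82 mm is the ASIC; the helper name `die_area_mm2` is illustrative and does not come from the slides.

```python
def die_area_mm2(width_mm: float, height_mm: float) -> float:
    """Area of a rectangular die in square millimetres."""
    return width_mm * height_mm


# Dimensions read from the slide; the FPGA/ASIC assignment is an assumption.
fpga_area = die_area_mm2(19.80, 19.68)  # Xilinx Virtex II 6000
asic_area = die_area_mm2(2.7, 2.82)     # 130 nm ASIC, equivalent functionality
ratio = fpga_area / asic_area

print(f"FPGA area: {fpga_area:.2f} mm^2")
print(f"ASIC area: {asic_area:.2f} mm^2")
print(f"FPGA/ASIC area ratio: {ratio:.1f}x")
```

Under that assumption the ratio comes out at roughly 51, which matches the number quoted on the slide.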
15. What is logic synthesis
The main goals of logic synthesis:
Domain translation
- Translate RTL into a netlist
Optimization
- Perform timing and area optimization
16. Balancing area and speed (1/2)
[Figure: speed vs. area trade-off]
Optimization constraints
- Timing optimization constraint:
  set_max_delay 25 -from a -to z
- Area optimization constraint:
  set_max_area 0
Design rules
- set_max_transition
- set_max_fanout
- etc.
19. What is the Tcl scripting language
Created by John Ousterhout of UC Berkeley
Scripting language
- Makes it very simple to automate routine tasks.
Extension language
- Used to customize tools with user- or company-specific applications.
Nearly all modern EDA tools have a Tcl interface.
Very simple to learn and use.
29. Set the analysis conditions
“Single” (typical - typical) for a normal design
or “Min/max case” (slow - fast) for a conservative design
Select one of the wire load models
30. 5. Set up constraints
31. Set optimization constraints
Design Constraints ...
Max area: 0, to get the minimum area
32. Timing constraints ...
Combinational circuit:
Select the input/output pins you want to constrain
Attributes -> Optimization Constraints -> Timing Constraints
Specify your target and then press “Ok”
34. 6. Compile the design
Choose Design > Compile Design.
Map effort : medium
Area effort : medium
Click “Ok” to begin compiling.
35. 7. Save the design
File -> Save Info -> Design Setup (your_design.dc)
Saves all your design constraints; you can reload them with File -> Execute Script ...
File -> Save Info -> Design Timing (your_design.sdf)
Saves the delay timing information in Standard Delay Format v1.0. It is referenced during gate-level simulation.
36. File -> Save As -> select Format (Verilog)
Save the synthesis result ... (your_design.v) or (your_design.gv).
You should include your_design.gv and your_design.sdf in your test file:
`include "MLMS1.gv"   // synthesized gate-level netlist
`include "umc18.v"    // cell library simulation models
module test_MLMS1_gv();
  parameter ADC_bits = 4, W1_bits = 16,
            M = 40, step_size = -8;
  reg clk, rst;
  reg [ADC_bits*M-1:0] x;
  reg [ADC_bits-1:0] mem [40000:0];   // stimulus samples
  reg [ADC_bits-1:0] temp;
  reg signed [W1_bits-1:0] temp2;
  integer fid;                        // result file handle (declaration was missing)
  wire [M-1:0] y;
  wire [W1_bits*M-1:0] e;
  MLMS1 u1(clk, rst, x, y, e);        // design under test
  always #8 clk = ~clk;               // toggle every 8 time units
  initial begin
    $readmemh("test_MLMS1.dat", mem);
    fid = $fopen("MLMS1_result_gv.dat");
    $fsdbDumpfile("MLMS1_gv.fsdb");   // name the dump file before enabling dumping
    $fsdbDumpvars;
    $sdf_annotate("MLMS1.sdf", u1);   // back-annotate SDF delays onto u1
    ...
37. 8. Generate reports
Timing
1. Select Top in the logical hierarchy view.
2. Choose Timing > Report Timing Paths
3. Click “Ok”
Note:
Slack is the difference between the timing goal for a path and its actual timing. Paths that meet the design's timing goals have zero or positive slack; negative slack means a violation.
Worst Slack Timing Report
...
u81/WW1[14] 0.00 14.15 f
W1_reg[14]/D (SDFFTRX4) 0.00 14.15 f
data arrival time 14.15
clock CLK_0 (rise edge) 10.00 10.00
clock network delay (ideal) 0.00 10.00
W1_reg[14]/CK (SDFFTRX4) 0.00 10.00 r
library setup time -0.39 9.61
data required time 9.61
----------------------------------------------------------------------------------
data required time 9.61
data arrival time -14.15
-----------------------------------------------------------------------------------
slack (VIOLATED) -4.54
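The last lines of the report can be reproduced by hand. The sketch below copies the numbers from the report (in ns) and recomputes the data required time and the slack; the variable names are illustrative, not tool output.

```python
# Values copied from the worst-slack timing report above (all in ns).
clock_edge = 10.00           # clock CLK_0 (rise edge)
clock_network_delay = 0.00   # ideal clock network
library_setup_time = 0.39    # shown as -0.39 in the report
data_arrival_time = 14.15

# Required time: capture edge plus clock delay, minus the setup time.
data_required_time = clock_edge + clock_network_delay - library_setup_time

# Setup slack: required time minus arrival time.
slack = data_required_time - data_arrival_time
status = "MET" if slack >= 0 else "VIOLATED"

print(f"data required time {data_required_time:.2f}")
print(f"slack ({status}) {slack:.2f}")
```

The data arrives 4.54 ns after it is required, so the path fails its setup check at this clock period.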
38. Area (equivalent gate count)
1. Select Top in the logical hierarchy view.
2. Choose Design > Report Area ...
3. Click “Ok”
Power
1. Select Top in the logical hierarchy view.
2. Choose Design > Report Power ...
3. Click “Ok”
39. Another important part of front-end design: Static Timing Analysis (STA)
40. Types of paths
41. Basic STA concepts: Timing Paths
[Figure: two flip-flops, FF1 and FF2 (D/Q), between the INPUT and OUTPUT ports and driven by CLOCK, with the timing points marked]
Each path has a startpoint and an endpoint
Timing path Startpoints
- Input ports,
- Clock pins of flip-flops
Timing path Endpoints
- Output ports,
- all input pins of flip-flops except clock pins
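To make the startpoint/endpoint rule concrete, the sketch below enumerates the timing paths of a small, hypothetical two-flip-flop circuit. The node names (`logic_a`, `FF1/CK`, and so on) and the connectivity are assumptions for illustration; they are not taken from the slide figure.

```python
from typing import Dict, List, Set

# Hypothetical connectivity: each key drives the listed nodes.
# FFx/CK stands for the launch side of a flip-flop, FFx/D for its data input.
EDGES: Dict[str, List[str]] = {
    "IN": ["logic_a", "logic_d"],
    "logic_a": ["FF1/D"],
    "logic_d": ["OUT"],
    "FF1/CK": ["logic_b"],
    "logic_b": ["FF2/D"],
    "FF2/CK": ["logic_c"],
    "logic_c": ["OUT"],
}
STARTPOINTS: List[str] = ["IN", "FF1/CK", "FF2/CK"]  # input ports, FF clock pins
ENDPOINTS: Set[str] = {"FF1/D", "FF2/D", "OUT"}      # FF data pins, output ports


def timing_paths(edges: Dict[str, List[str]],
                 startpoints: List[str],
                 endpoints: Set[str]) -> List[List[str]]:
    """List every path from a startpoint to the first endpoint it reaches.

    Assumes the combinational logic has no loops.
    """
    paths: List[List[str]] = []

    def walk(node: str, trail: List[str]) -> None:
        trail = trail + [node]
        if node in endpoints:
            paths.append(trail)  # a timing path ends at an endpoint
            return
        for nxt in edges.get(node, []):
            walk(nxt, trail)

    for start in startpoints:
        walk(start, [])
    return paths


paths = timing_paths(EDGES, STARTPOINTS, ENDPOINTS)
for p in paths:
    print(" -> ".join(p))
```

With this example netlist the four usual path types appear: input to register, input to output, register to register, and register to output.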
42. Setup and Hold time
Setup time: the time for which the data must be stable before the clock edge
Hold time: the time for which the data must remain stable after the clock edge
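43. Setup and Hold time
[Figure: FF1 (D1, Q1) driving FF2 (D2 = Q1, Q2) from a common CLK; timing diagram showing the launch edge, the capture edge, the clock-to-Q (CQ) delay, the setup time and the hold time]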
44. Setup time Check
[Figure: FF1 driving FF2 through combinational logic on a common CLK; timing diagram showing the launch edge, the capture edge, the setup time and a setup violation, with delays annotated in ns]
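45. Hold time Check
[Figure: FF1 clocked by CLK1 driving FF2 (D2 = Q1) clocked by CLK2; timing diagram showing the launch edge, the capture edge, the clock-to-Q (CQ) delay, the hold time and a hold violation, with delays annotated in ns]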
46. Setup Check Example
[Figure: FF1 (D, Q) driving FF2 (D, Q)]
CALCULATION:
Arrival time (max) = clock delay FF1 (max) + clock-to-Q delay FF1 (max) + comb. delay (max)
Required time = clock adjust + clock delay FF2 (min) - setup time FF2
Slack = Required time - Arrival time (since we want data to arrive before it is required)
clock adjust = clock period (since setup is analyzed at the next clock edge)
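The setup calculation above can be sketched as a small function. The function name `setup_slack` and the numbers in the example call are hypothetical and chosen only to illustrate the formula; all values are in ns.

```python
def setup_slack(clock_period: float,
                launch_clock_delay_max: float,
                clk_to_q_max: float,
                comb_delay_max: float,
                capture_clock_delay_min: float,
                setup_time: float) -> float:
    """Setup slack = required time - arrival time (worst case uses max data delays)."""
    arrival = launch_clock_delay_max + clk_to_q_max + comb_delay_max
    clock_adjust = clock_period  # setup is checked at the next clock edge
    required = clock_adjust + capture_clock_delay_min - setup_time
    return required - arrival


# Hypothetical example values (ns), not taken from the slides.
slack = setup_slack(clock_period=10.0,
                    launch_clock_delay_max=0.5,
                    clk_to_q_max=0.4,
                    comb_delay_max=8.0,
                    capture_clock_delay_min=0.3,
                    setup_time=0.4)
print(f"setup slack = {slack:.2f} ns ({'MET' if slack >= 0 else 'VIOLATED'})")
```

Here the arrival time is 8.9 ns and the required time is 9.9 ns, so the path meets setup with about 1 ns to spare.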
47. Hold Check Example
[Figure: FF1 (D, Q) driving FF2 (D, Q)]
CALCULATION:
Arrival time (min) = clock delay FF1 (min) + clock-to-Q delay FF1 (min) + comb. delay (min)
Required time = clock adjust + clock delay FF2 (max) + hold time FF2
Slack = Arrival time - Required time (since we want data to arrive after it is required)
clock adjust = 0 (since hold is analyzed at the same clock edge)
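The hold calculation follows the same pattern with the fastest data path and the slowest capture clock. The function name `hold_slack` and the example numbers are hypothetical; all values are in ns.

```python
def hold_slack(launch_clock_delay_min: float,
               clk_to_q_min: float,
               comb_delay_min: float,
               capture_clock_delay_max: float,
               hold_time: float) -> float:
    """Hold slack = arrival time - required time (worst case uses min data delays)."""
    arrival = launch_clock_delay_min + clk_to_q_min + comb_delay_min
    clock_adjust = 0.0  # hold is checked at the same clock edge
    required = clock_adjust + capture_clock_delay_max + hold_time
    return arrival - required


# Hypothetical example values (ns): a short data path plus late capture clock.
slack = hold_slack(launch_clock_delay_min=0.3,
                   clk_to_q_min=0.2,
                   comb_delay_min=0.1,
                   capture_clock_delay_max=0.5,
                   hold_time=0.2)
print(f"hold slack = {slack:.2f} ns ({'MET' if slack >= 0 else 'VIOLATED'})")
```

In this example the data arrives at 0.6 ns but must stay away until 0.7 ns, so the hold check fails by about 0.1 ns. Note that the clock period does not appear in the hold check, so slowing the clock does not fix a hold violation.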
48. Sections of a timing report
49. Designing synthesizable VHDL code
50. Recommended rules for Synthesis
Do not split combinational paths across hierarchy boundaries
Register all outputs
Do not implement glue logic between blocks; partition them well
Separate designs along functional boundaries
Keep blocks to a reasonable size
51. Avoid hierarchical combinational blocks
The path between reg1 and reg2 is divided among three different blocks
Because of the hierarchical boundaries, the combinational logic cannot be optimized as a whole
Synthesis tools (Synopsys) preserve the I/O ports of each block, so combinational logic cannot be optimized across blocks (unless “grouping” is used).
52. Recommended way to handle Combinational Paths
All the combinational circuitry is grouped in the same block, whose output is connected to the destination flip-flop
This allows optimal minimization of the combinational logic during synthesis
It also simplifies the description of the timing interface
53. Register all outputs
Simplifies the synthesis design environment: inputs to each block arrive with the same relative delay (caused by wire delays)
There is no real need to specify output requirements, since paths start at flip-flop outputs.
Take care with fanout: as a rule of thumb, keep the fanout at or below 16 (this depends on the technology and on the components driven by the output)
54. NO GLUE LOGIC between blocks
Under time pressure, you may find a bug that could be fixed simply by adding some glue logic. RESIST THE TEMPTATION!!!
Glue logic added at this level of the hierarchy cannot be absorbed into any lower-level block.
55. Separate designs with different goals
reg1 may be driven by a time-critical function, and hence will have different optimization constraints
reg3 may be driven by slow logic, so there is no need to constrain it for speed
56. Optimization based on design requirements
Use different entities to partition design blocks
This allows different constraints during synthesis, to optimize for area, speed, or both.
57. Separate FSM from random logic
Separating the FSM from the random logic allows you to use FSM-optimized synthesis
58. Maintain a reasonable block size
Partition your design so that each block has between 1,000 and 10,000 gates (this depends strictly on the tools and the technology)
The larger the blocks, the longer the run time, so quick iterations are not possible.