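• 1. Chapter 11: ASIC Implementation of VHDL (1). University of Electronic Science and Technology of China, School of Microelectronics and Solid-State Electronics
    • 2. Chapter focus: understand the basic digital ASIC design flow; master the basic concepts and methods of logic synthesis (principles and steps of logic synthesis, the role of the DB library); master the basic concepts and methods of fully automatic place and route: floorplanning, placement, routing, and verification (DRC and LVS)
    • 3. FPGAs vs. ASICs. FPGAs: off-the-shelf products; short time to market; low development cost and low risk; reconfigurability. ASICs: ???
    • 4. FPGAs vs. ASICs. ASICs: high performance; low power; low cost (but only in high volumes). FPGAs: off-the-shelf products; low development cost and low risk; reconfigurability; short time to market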
    • 5. Area comparison. FPGA (Virtex II 6000): 19.80 mm x 19.68 mm. ASIC (130 nm process) with equivalent functionality: 2.7 mm x 2.82 mm. The FPGA is about 51x larger. Area of the Xilinx Virtex II 6000 FPGA estimated by R.J. Lim Fong, MS Thesis, VPI, 2004
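The 51x figure can be checked from the die dimensions quoted on the slide. This is a minimal sketch, assuming both dice are plain rectangles with the listed edge lengths; the variable names are illustrative only.

```python
# Die dimensions in millimetres, as quoted on the slide.
fpga_w, fpga_h = 19.80, 19.68   # Xilinx Virtex II 6000 (estimated)
asic_w, asic_h = 2.7, 2.82      # equivalent ASIC in a 130 nm process

fpga_area = fpga_w * fpga_h     # area in mm^2
asic_area = asic_w * asic_h     # area in mm^2
ratio = fpga_area / asic_area   # how many times larger the FPGA die is

print(f"FPGA area: {fpga_area:.2f} mm^2")
print(f"ASIC area: {asic_area:.2f} mm^2")
print(f"Ratio:     {ratio:.1f}x")
```

The ratio rounds to 51, which agrees with the label on the slide.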
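    • 6. ASIC tape-out cost
    • 7. How should an ASIC be designed?
    • 8. ASIC design philosophy: speed, area, power, testability (trade-off). ASIC design is a process of trade-offs
    • 9. ASIC design methods. Full-custom design: suited to small-scale circuits and to circuits with special performance requirements (for example memories and critical CPU blocks). Semi-custom design: the standard-cell-based design method. Advantages: 1. flexible layout style; 2. standard cells are stored in a cell library, which greatly improves design efficiency; 3. high routing completion rate, reaching 100%; 4. makes it easier to focus on circuit optimization and performance at a higher level; 5. high degree of automation, short design cycle, high design efficiency, well suited to EDA tools
    • 10. Major EDA tool vendors
    • 11. Stages of ASIC design: ASIC specification definition; specification to RTL (Spec to RTL); RTL to netlist (RTL to Netlist); netlist to layout (Netlist to GDSII); foundry. Front-end design (Front End) and back-end design (Back End)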
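    • 12. Top-down ASIC design flow: system function description; algorithm or model development; behavioral description; behavioral-level simulation; RTL description; RTL simulation; logic synthesis and optimization (with the component model library); gate-level netlist generation; gate-level simulation and timing analysis
    • 13. ASIC implementation: floorplanning; placement and routing; design rule check (DRC); layout parameter extraction (LPE); electrical rule check (ERC); layout versus schematic check (LVS); post-layout simulation; layout generation. FPGA implementation: PLD mapping, placement and routing; timing check; bitstream file output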
    • 14. Key topic of this chapter: logic synthesis
    • 15. What is logic synthesis? Main purposes of logic synthesis: domain translation (translate RTL into a netlist) and optimization (timing and area optimization)
    • 16. Balancing area and speed (1/2). Optimization constraints: timing optimization constraint, e.g. set_max_delay 25 -from a -to z; area optimization constraint, e.g. set_max_area 0. Design rules: set_max_transition, set_max_fanout, etc.
    • 17. Balancing area and speed (2/2)
    • 18. EDA software for logic synthesis: Synopsys Design Compiler, usually abbreviated DC. Use the graphical interface while learning; once proficient, scripts (Tcl scripts) are generally used
    • 19. What is the Tcl scripting language? Created by John Ousterhout of UC Berkeley. Scripting language: makes it very simple to automate routine tasks. Extension language: used to customize tools with user- or company-specific applications. Nearly all modern EDA tools have a Tcl interface. Very simple to learn and use.
    • 20. DC inputs and outputs
    • 21. What is a DB file? A logic cell library file with the suffix .db. It contains standard-cell information: timing information (gate delay, input/output delay, wire delay, transition time), power, area, drive strength, and load
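    • 22. What is a wire load model? Wire loads: estimate the wire length; based on statistical analysis of previous chips; derive the wire capacitance and drive. Wire load lookup table: net fanout 1, 2, 3, 4 -> net load 0.015, 0.030, 0.045, 0.060; net fanout 1, 2, 3, 4 -> net resistance 0.012, 0.016, 0.020, 0.024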
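The fanout-indexed lookup that a wire load model performs before layout can be sketched as follows. This is only an illustration: the table values are placeholders, and the function name `estimate_net` and the linear extrapolation beyond the last row are assumptions of this sketch, not the behaviour of any particular library or tool.

```python
# Hypothetical wire load table: fanout -> (net load, net resistance).
# The numbers are illustrative placeholders, not read from a real .db file.
WIRE_LOAD_TABLE = {
    1: (0.015, 0.012),
    2: (0.030, 0.016),
    3: (0.045, 0.020),
    4: (0.060, 0.024),
}

def estimate_net(fanout):
    """Return the (load, resistance) estimate for a net with this fanout."""
    if not isinstance(fanout, int) or fanout < 1:
        raise ValueError("fanout must be an integer >= 1")
    if fanout in WIRE_LOAD_TABLE:
        return WIRE_LOAD_TABLE[fanout]
    # Beyond the last entry, extrapolate linearly using the slope of the
    # last two rows (an assumption made for this sketch).
    hi = max(WIRE_LOAD_TABLE)
    c_hi, r_hi = WIRE_LOAD_TABLE[hi]
    c_lo, r_lo = WIRE_LOAD_TABLE[hi - 1]
    extra = fanout - hi
    return (c_hi + extra * (c_hi - c_lo), r_hi + extra * (r_hi - r_lo))

print(estimate_net(2))
print(estimate_net(6))
```

Because the estimate depends only on fanout, it can be evaluated before any placement exists, which is why synthesis can use it ahead of place and route.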
    • 23. 1. Check the setup. Link library: DesignWare information. Target library: wire model
    • 24. 2. Read in the design files. File -> Analyze -> Add. Analyze all your design files
    • 25. File -> Elaborate -> (select list of Design). Elaborate the file “TOP.v”
    • 26. Hierarchy -> Uniquify -> Hierarchy. Give each submodule a different name
    • 27. 3. Select the current design: top. Create symbol
    • 28. 4. Specify the working environment. Attributes -> Operating Environment -> Operating Conditions ... and Wire Load ...
    • 29. Set Analysis Condition: “Single” (typical - typical) for a normal design, or “Min/max case” (slow - fast) for a conservative design. Select one of the wire load models
    • 30. 5. Set up constraints
    • 31. Set optimization constraints. Design Constraints ... Max area: 0 to get the minimum area
    • 32. Timing constraints ... Combinational circuit: select the in/out pin that you want to constrain. Attributes -> Optimization Constraints -> Timing Constraints. Specify your target and then press “Ok”
    • 33. Sequential Circuit : Select clock pin “clk” Attributes -> Specify Clock ... Period : 12 , Rising : 0, Falling : 6 Press “Ok”
    • 34. 6. Compile the design. Choose Design > Compile Design. Map effort: medium. Area effort: medium. Click “Ok” to begin compiling.
    • 35. 7. Save the design. File -> Save info -> Design Setup (your_design.dc): saves all your design constraints; you may load them later with File -> Execute Script ... File -> Save info -> Design Timing (your_design.sdf): saves delay timing information in Standard Delay Format v1.0. It will be referenced during gate-level simulation.
    • 36. File -> Save as -> select Format (Verilog). Save the synthesis result as (your_design.v) or (your_design.gv). You should include your_design.gv and your_design.sdf in your test file:
        `include "MLMS1.gv"   // gate-level netlist saved from DC
        `include "umc18.v"
        module test_MLMS1_gv();
          parameter ADC_bits = 4, W1_bits = 16, M = 40, step_size = -8;
          reg clk, rst;
          reg [ADC_bits*M-1:0] x;
          reg [ADC_bits-1:0] mem [40000:0];
          reg [ADC_bits-1:0] temp;
          reg signed [W1_bits-1:0] temp2;
          wire [M-1:0] y;
          wire [W1_bits*M-1:0] e;
          integer fid;   // file handle; this declaration is missing on the slide
          MLMS1 u1(clk, rst, x, y, e);
          always #8 clk = ~clk;
          initial begin
            $readmemh("test_MLMS1.dat", mem);
            fid = $fopen("MLMS1_result_gv.dat");
            $fsdbDumpvars;
            $fsdbDumpfile("MLMS1_gv.fsdb");
            $sdf_annotate("MLMS1.sdf", u1);   // back-annotate the SDF delays onto u1
            ...
    • 37. 8. Generate reports. Timing: 1. Select Top in the logical hierarchy view. 2. Choose Timing > Report Timing Paths. 3. Click “Ok”. Note: slack is defined as the time difference between the timing goal for a path and its actual timing. Paths that meet the design’s timing goals have positive slack values. Worst-slack timing report:
        ...
        u81/WW1[14]                      0.00    14.15 f
        W1_reg[14]/D (SDFFTRX4)          0.00    14.15 f
        data arrival time                        14.15
        clock CLK_0 (rise edge)         10.00    10.00
        clock network delay (ideal)      0.00    10.00
        W1_reg[14]/CK (SDFFTRX4)         0.00    10.00 r
        library setup time              -0.39     9.61
        data required time                        9.61
        ----------------------------------------------
        data required time                        9.61
        data arrival time                       -14.15
        ----------------------------------------------
        slack (VIOLATED)                         -4.54
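The last lines of the timing report can be reproduced by hand from the values it lists. A minimal sketch, with variable names chosen here for illustration:

```python
# Values read from the worst-slack timing report (all in ns).
clock_edge = 10.00           # clock CLK_0 (rise edge)
clock_network_delay = 0.00   # ideal clock network
setup_time = 0.39            # library setup time of W1_reg[14]
data_arrival = 14.15         # data arrival time at W1_reg[14]/D

# The data must be stable one setup time before the capturing clock edge.
data_required = clock_edge + clock_network_delay - setup_time
slack = data_required - data_arrival
status = "MET" if slack >= 0 else "VIOLATED"

print(f"data required time {data_required:.2f}")
print(f"slack ({status}) {slack:.2f}")
```

This gives a required time of 9.61 and a slack of -4.54, matching the report: the data arrives 4.54 ns too late for the 10 ns clock.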
    • 38. Area (equivalent gate count): 1. Select Top in the logical hierarchy view. 2. Choose Design > Report Area ... 3. Click “Ok”. Power: 1. Select Top in the logical hierarchy view. 2. Choose Design > Report Power ... 3. Click “Ok”
    • 39. Another important part of front-end design: Static Timing Analysis (STA)
    • 40. Types of paths
    • 41. Basic STA concepts: timing paths. (Figure: INPUT, FF1, FF2, OUTPUT and CLOCK, with the timing points marked.) Each path has a startpoint and an endpoint. Timing path startpoints: input ports and clock pins of flip-flops. Timing path endpoints: output ports and all input pins of flip-flops except clock pins
    • 42. Setup and hold time. Setup time: the time required for the data to be stable before the clock edge. Hold time: the time required for the data to remain stable after the clock edge
    • 43. Setup and hold time. (Figure: FF1 and FF2 on a common CLK with D2 = Q1; waveform marking the launch edge, the capture edge, the clock-to-Q delay (CQ), the setup time and the hold time.)
    • 44. Setup time check. (Figure: FF1 drives FF2 through combinational logic on a common CLK; waveform marking the launch edge and the capture edge, with labels 0, 0.3, 0.4 ns, 4.5 ns, 4.7, 4.9 and 5, the setup time, and a setup violation.)
    • 45. Hold time check. (Figure: FF1 clocked by CLK1 and FF2 clocked by CLK2, with D2 = Q1; waveform marking the launch edge, the capture edge and the clock-to-Q delay (CQ), with labels 0.2, 0.3 ns, 0.4 and 0.4 ns, the hold time, and a hold violation.)
    • 46. Setup check example (FF1 launches, FF2 captures). Calculation: Arrival time (max) = clock delay FF1 (max) + clock-to-Q delay FF1 (max) + comb. delay (max). Required time = clock adjust + clock delay FF2 (min) - setup time FF2. Slack = Required time - Arrival time (since we want data to arrive before it is required). Clock adjust = clock period (since setup is analyzed at the next edge)
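The setup calculation above can be written as a small function. This is a sketch under stated assumptions: the function name, parameter names, and the numbers in the example call are hypothetical and chosen only to illustrate the formula.

```python
def setup_slack(clock_period, clk_delay_ff1_max, clk_to_q_ff1_max,
                comb_delay_max, clk_delay_ff2_min, setup_time_ff2):
    """Setup slack for an FF1 -> FF2 path; negative means a violation."""
    # Latest possible arrival of data at the D pin of FF2.
    arrival = clk_delay_ff1_max + clk_to_q_ff1_max + comb_delay_max
    # Setup is analyzed at the next clock edge, so clock adjust = period.
    clock_adjust = clock_period
    required = clock_adjust + clk_delay_ff2_min - setup_time_ff2
    return required - arrival

# Hypothetical numbers (ns): 5 ns clock, ideal clock network,
# 0.4 clock-to-Q, 4.5 of combinational logic, 0.3 setup time.
slack = setup_slack(5.0, 0.0, 0.4, 4.5, 0.0, 0.3)
print(f"setup slack = {slack:.2f} ns")
```

With these assumed values the data arrives at 4.9 ns but is required by 4.7 ns, so the slack is -0.2 ns and the path fails setup. Using the maximum delays on the launch side and the minimum clock delay on the capture side makes the check pessimistic.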
    • 47. Hold check example (FF1 launches, FF2 captures). Calculation: Arrival time = clock delay FF1 (min) + clock-to-Q delay FF1 (min) + comb. delay (min). Required time = clock adjust + clock delay FF2 (max) + hold time FF2. Slack = Arrival time - Required time (since we want data to arrive after it is required). Clock adjust = 0 (since hold is analyzed at the same edge)
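The hold calculation can be sketched the same way. The function name, parameter names, and the example numbers are hypothetical and serve only to illustrate the formula.

```python
def hold_slack(clk_delay_ff1_min, clk_to_q_ff1_min, comb_delay_min,
               clk_delay_ff2_max, hold_time_ff2):
    """Hold slack for an FF1 -> FF2 path; negative means a violation."""
    # Earliest possible arrival of new data at the D pin of FF2.
    arrival = clk_delay_ff1_min + clk_to_q_ff1_min + comb_delay_min
    # Hold is analyzed at the same clock edge, so clock adjust = 0.
    clock_adjust = 0.0
    required = clock_adjust + clk_delay_ff2_max + hold_time_ff2
    return arrival - required

# Hypothetical numbers (ns): 0.3 clock-to-Q, no logic between the flops,
# the capture clock arrives 0.4 later than the launch clock, 0.2 hold time.
slack = hold_slack(0.0, 0.3, 0.0, 0.4, 0.2)
print(f"hold slack = {slack:.2f} ns")
```

With these assumed values the new data arrives at 0.3 ns but must not change before 0.6 ns, so the slack is -0.3 ns. The clock period does not appear in the formula, which is why a hold violation cannot be fixed by slowing the clock.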
    • 48. Sections of a timing report
    • 49. Designing synthesizable VHDL code
    • 50. Recommended rules for synthesis: do not split combinational paths across hierarchy; register all outputs; do not implement glue logic between blocks, partition them well; separate designs on functional boundaries; keep blocks to a reasonable size
    • 51. Avoid hierarchical combinational blocks. The path between reg1 and reg2 is divided among three different blocks. Because of the hierarchical boundaries, the combinational logic cannot be optimized across them. Synthesis tools (Synopsys) maintain the integrity of the I/O ports, so combinational optimization cannot be achieved between blocks (unless “grouping” is used).
    • 52. Recommended way to handle combinational paths. All the combinational circuitry is grouped in the same block, whose output is connected to the destination flip-flop. This allows optimal minimization of the combinational logic during synthesis and a simplified description of the timing interface
    • 53. Register all outputs. This simplifies the synthesis design environment: inputs to the individual blocks arrive within the same relative delay (caused by wire delays). Output requirements do not really need to be specified, since paths start at flip-flop outputs. Take care of fanout: as a rule of thumb, keep the fanout to 16 (dependent on the technology and on the components being driven by the output)
    • 54. NO GLUE LOGIC between blocks. Under time pressure, a newly found bug may look simple to fix by adding some glue logic. RESIST THE TEMPTATION!!! At this level in the hierarchy, this implementation will not allow the glue logic to be absorbed within any lower-level block.
    • 55. Separate designs with different goals. reg1 may be driven by a time-critical function and hence will have different optimization constraints. reg3 may be driven by slow logic, so there is no need to constrain it for speed
    • 56. Optimization based on design requirements. Use different entities to partition design blocks. This allows different constraints during synthesis, to optimize for area, for speed, or for both.
    • 57. Separate the FSM from random logic. Separating the FSM and the random logic allows you to use FSM-optimized synthesis
    • 58. Maintain a reasonable block size. Partition your design so that each block is between 1000 and 10000 gates (this is strictly tool and technology dependent). The larger the blocks, the longer the run time, so quick iterations cannot be done.