System Verilog - Semicon IC Design

1. Introduction to ASIC - SoC Design

An application-specific integrated circuit (ASIC) is an integrated circuit (IC) customized for a particular use, rather than intended for general-purpose use. For example, a chip designed solely to run a cell phone is an ASIC.

As feature sizes have shrunk and design tools improved over the years, the maximum complexity (and hence functionality) possible in an ASIC has grown from 5,000 gates to over 100 million. Modern ASICs often include entire 32-bit processors, memory blocks including ROM, RAM, EEPROM, Flash and other large building blocks. Such an ASIC is often termed a SoC (system-on-a-chip). Designers of digital ASICs use a hardware description language (HDL), such as Verilog or VHDL, to describe the functionality of ASICs.

A SoC is a collection of components and/or subsystems (designed as IPs or ASICs) interconnected to perform specified functions. The entire system is built on a single piece of silicon. A SoC typically includes a processor, memory, DSP cores, I/O devices, interfaces to external circuitry, and custom IPs delivered as Verilog or VHDL modules.

A SoC packs a very large number of transistors onto a single IC. Mixed technologies and mixed design styles on the same chip are common: digital, analog, FPGA, full custom, semi-custom, etc. A SoC usually contains one or more processors and co-processors, internal memory and memory controllers for external memory, buses/interconnect architectures such as AMBA, peripherals (timer, interrupt controller, etc.), I/O channels and many more.

Thus a SoC is essentially a mixture of ASIC styles, including full-custom and semi-custom (standard cell) logic, and reusable Intellectual Property (IP) blocks (also called macros, hard macros or cores), as shown in the diagram below.

The IP core based design approach is mainly intended to reduce design complexity and time to market. Different vendors supply IP cores in different technologies and to different specifications. Customizable soft cores provide an essential set of pre-verified parameters that can be configured according to customer requirements. Interface logic generally supports standard buses to ease integration. Bus-based architectures such as IBM CoreConnect, Motorola IP-bus and ARM's Advanced Microcontroller Bus Architecture (AMBA) facilitate core-based SoC design.

1.1. IP Core Types

1.1.1. Soft Cores (RTL format)

Technology-independent, synthesizable RTL descriptions written in any HDL. They can be modified and functionally extended.

1.1.2. Firm Cores (netlist format)

These IP cores are gate-level, technology-dependent netlists. The internal implementation of the core cannot be modified. The user can partially modify the netlist by parameterizing I/O and removing unwanted functionality.

1.1.3. Hard Cores (GDSII format)

Technology-dependent layout and timing information provided in GDSII format. Integration is simple and can result in highly predictable performance.

1.2. SoC Design Issues

· Architectural complexity has grown manifold due to significant advances in technology.

· Deep submicron and nanometer fabrication processes are very sensitive to variations in process, voltage and temperature.

· System verification and integration challenges are increasing as analog/RF blocks, digital hardware and non-electrical parts such as sensors and actuators become part of the SoC.

· The normal ASIC design methodology is itself too costly for a SoC. The cost of development is very high.

2. IC Design Methodologies

There are basically 2 types of methodology:

· Full Custom Design
· Semi-Custom Design

2.1. Full Custom IC Design

The full custom design flow is depicted in the figure below.

Starting from the schematic, the circuit is drawn at transistor level. The W/L ratio of each transistor is decided, the layout is drawn, and a complete mask set is required to manufacture the transistors. In a full custom IC, the logic cells and all mask layers are customized. Full custom IC design and manufacturing take a long lead time. Typically the full custom design flow includes CMOS sizing and manual layout. Even today the full custom flow is used for building analog blocks and analog ICs. There are no efficient HDLs available yet to model analog blocks, owing to the inherent characteristics of analog circuitry.

Thus in the full custom flow every transistor is designed and drawn by hand. To date this is the most successful (indeed the only) way to design the analog portions of ASICs. The full custom design flow can give the highest performance but takes the longest design time.

2.2. Semi-custom IC Design

Almost all digital ICs are designed and manufactured using semi-custom methodologies. Typically this refers to HDL-based IC design with automatic synthesis and place-and-route EDA tools.

Semi-custom IC design methodology has 3 major categories.

· Standard cell based ASIC design
· Gate arrays
· FPGA

2.2.1. Standard cell based ASIC design

Initially an RTL (Register Transfer Level) description of the design is written. It is then synthesized using technology-specific standard cell libraries. Standard cells are custom designed and then inserted into a library. These cells are used in the design by being placed in rows and wired together using ‘place and route’ CAD tools. Some standard cells, such as RAM and ROM cells, and some datapath cells (e.g. a multiplier) are tiled together to create macro cells.

2.2.2. Gate arrays

In a gate array, the transistor-level masks are fully defined and the designer cannot change them. The designer instead programs the wiring and vias to implement the desired function. For example, the interconnections are made in the metal layers from metal 2 upwards. In gate array (GA) design we start from a die on which all gates are placed but not connected. Only the interconnect (metal 2) layer is left open, so only the interconnect layout is given to the fabrication house.

Gate array designs are slower than cell-based designs, but the implementation time is shorter because only the interconnect has to be designed and fabricated. RTL-based methods and synthesis, together with other CAD tools, are often used for gate arrays. Efficiency is lower with gate arrays.

2.2.3. FPGA

2.3. Standard Cell ASIC vs Gate Array vs FPGA

2.4. FPGA vs. ASIC

The choice between ASICs and FPGAs mainly depends on cost, tool availability, performance and design flexibility. Each has its own pros and cons, and it is the designer's responsibility to weigh the advantages of each and choose either FPGA or ASIC for the product. However, recent developments in the FPGA domain are narrowing the advantages of ASICs.

2.4.1. FPGA Design Advantages

Faster time-to-market: No layout, masks or other manufacturing steps are needed for an FPGA design. A ready-made FPGA is available: just program it with the bitstream generated from your HDL code. Done!

No NRE (Non-Recurring Engineering) cost: This cost is typically associated with an ASIC design; for an FPGA it does not exist. FPGA tools are cheap (sometimes free! You only need to buy the FPGA). For an ASIC you pay a huge NRE and the tools are expensive. I would say "very expensive"... it runs into crores!

Simpler design cycle: The software handles much of the placement, routing and timing, so less manual intervention is needed. The FPGA design flow removes much of the complex and time-consuming manual work in floorplanning, place and route, and timing analysis.

More predictable project cycle: The FPGA design flow removes the risk of re-spins, wafer capacity constraints, etc. from the project, since the design logic is already synthesized and verified on the FPGA device.

Field reprogrammability: A new bitstream (i.e. your program) can be uploaded remotely, instantly. An FPGA can be reprogrammed in a snap, while an ASIC can take $50,000 and more than 4-6 weeks to make the same changes. FPGA costs range from a couple of dollars to several hundred or more, depending on the hardware features.

Reusability: Reusability is the main advantage of an FPGA. A prototype of the design can be implemented on an FPGA and verified with nearly accurate results before it is implemented on an ASIC. If the design has faults, change the HDL code, generate the bitstream, program the FPGA and test again. Modern FPGAs are reconfigurable both partially and dynamically.

FPGAs are good for prototyping and limited production. If you are going to make 100-200 boards, it isn't worth making an ASIC.

Generally FPGAs are used for lower-speed, lower-complexity and lower-volume designs. But today's FPGAs can run at 500 MHz with superior performance. With unprecedented logic density increases and a host of other features, such as embedded processors, DSP blocks, clocking, and high-speed serial I/O at ever lower prices, FPGAs are suitable for almost any type of design.

Unlike ASICs, FPGAs have special built-in hardware such as block RAM, DCM modules, MACs, memories, high-speed I/O and embedded CPUs, which can be used to get better performance. Modern FPGAs are packed with features. Advanced FPGAs usually come with phase-locked loops, low-voltage differential signaling, clock data recovery, more internal routing, high speed, hardware multipliers for DSP, memory, programmable I/O, IP cores and microprocessor cores. Remember PowerPC (hard core) and MicroBlaze (soft core) in Xilinx, and ARM (hard core) and Nios (soft core) in Altera. There are FPGAs available now with a built-in ADC! Using all these features designers can build a system on a chip. Now, do you really need an ASIC?

FPGA synthesis is much easier than ASIC synthesis.

In an FPGA you need not do floorplanning; the tool can do it efficiently. In an ASIC you have to do it.

2.4.2. FPGA Design Disadvantages

Power consumption is higher in an FPGA. You don't have any control over power optimization. This is where the ASIC wins the race!

You have to use the resources available in the FPGA. Thus the FPGA limits the design size.

Good for low-quantity production. As quantity increases, the cost per unit becomes higher than that of an ASIC implementation.

2.4.3. ASIC Design Advantages

Cost... cost... cost... Lower unit costs: For very high-volume designs the unit cost comes out very low. At larger volumes an ASIC design proves cheaper than implementing the design on an FPGA.

Speed... speed... speed... ASICs are faster than FPGAs: An ASIC gives design flexibility, which opens up enormous opportunity for speed optimization.

Low power... low power... low power: An ASIC can be optimized for the required low power. Several low-power techniques, such as power gating, clock gating, multi-Vt cell libraries and pipelining, are available to achieve the power target. This is where the FPGA fails badly! Can you imagine a cell phone that has to be charged for every call? Never. Low-power ASICs help the battery last longer!

In an ASIC you can implement analog circuits and mixed-signal designs. This is generally not possible in an FPGA.

In an ASIC, DFT (Design For Test) logic is inserted. In an FPGA, DFT is not carried out (rather, an FPGA design has no need of DFT!).

2.4.4. ASIC Design Disadvantages

Time-to-market: Some large ASICs can take a year or more to design. A good way to shorten development time is to make prototypes using FPGAs and then switch to an ASIC.

Design issues: In an ASIC you must take care of DFM issues, signal integrity issues and many more. In an FPGA you don't have all these, because the ASIC designers who built the FPGA have already taken care of them. (Don't forget that an FPGA is an IC, designed by ASIC design engineers!)

Expensive tools: ASIC design tools are very expensive. You spend a huge amount on NRE.

3. Overview of Complete ASIC Design Flow

4. Major EDA Companies and their tools

5. RTL Coding for Logic Synthesis

5.1. Synthesizable and Non-Synthesizable Verilog constructs

Category	Synthesizable	Non-Synthesizable
Basic	Identifiers, escaped identifiers, unsized constants ('b, 'o, 'd, 'h), sized constants (2'b11, 3'o7, 32'd123, 8'hff), signed constants (s) 3'bs101, module, endmodule, macromodule, ANSI-style module, task, and function port lists	system tasks, real constants
Data types	wire, wand, wor, tri, triand, trior, supply0, supply1, trireg (treated as wire), reg, integer, parameter, input, output, inout, memory (reg [7:0] x [3:0];), N-dimensional arrays	real, time, event, tri0, tri1
Module instances	Connect port by name, order, Override parameter by order, Override parameter by name, Constants connected to ports, Unconnected ports, Expressions connected to ports	Delay on built-in gates
Generate statements	if, case, for generate, concurrent begin end blocks, genvar	
Primitives	and, or, nand, nor, xor, xnor, not, notif0, notif1, buf, bufif0, bufif1, tran	User defined primitives (UDPs), table, pullup, pulldown, pmos, nmos, cmos, rpmos, rnmos, rcmos, tranif0, tranif1, rtran, rtranif0, rtranif1
Operators and expressions	+, - (binary and unary)	
Bitwise operations	&, |, ^, ~^, ^~	
Reduction operations	&, |, ^, ~&, ~|, ~^, ^~	
Logical, relational, shift and other operators	!, &&, ||, ==, !=, <, <=, >, >=, <<, >>, <<<, >>>, {}, {n{}}, ?:, function call	===, !==
Event control	event or, @ (partial), event or using comma syntax, posedge, negedge (partial)	Event trigger (->), delay and wait (#)
Bit and part selects	Bit select, Bit select of array element, Constant part select, Variable part select (+:, -:), Variable bit-select on left side of an assignment	
Continuous assignments	net and wire declaration, assign	Using delay
Procedural blocks	always (exactly one @ required)	initial
Procedural statements	;, begin-end, if-else, repeat, case, casex, casez, default, for, while, forever, disable (partial)	fork, join
Procedural assignments	blocking (=), non-blocking (<=)	force, release
Functions and tasks	Functions, tasks	
Compiler directives	`define, `undef, `resetall, `ifndef, `elsif, `line, `ifdef, `else, `endif, `include	

5.2. How is hardware inferred?

5.2.1. Register inference

Whenever an always block is sensitive to a ‘posedge’ or ‘negedge’ event, the synthesis tool infers a flip-flop.

always @(posedge clk)
    output_reg <= data;

The above code infers a D flip-flop.

Asynchronous reset:

module async_rst(clk, rst, data, out);
    input clk, rst, data;
    output out;
    reg out;

    // rst is active low and appears in the sensitivity list,
    // so it takes effect without waiting for a clock edge
    always @(posedge clk or negedge rst)
    begin
        if (!rst)
            out <= 1'b0;
        else
            out <= data;
    end
endmodule

In the above case the sensitivity list includes both the clock and rst, and hence an asynchronous-reset flip-flop is inferred. rst appears with negedge in the sensitivity list, so the code must test the matching active-low condition (!rst).

Synchronous Reset:

module sync_rst(clk, rst, data, out);
    input clk, rst, data;
    output out;
    reg out;

    // rst is sampled only on the rising clock edge
    always @(posedge clk)
    begin
        if (!rst)
            out <= 1'b0;
        else
            out <= data;
    end
endmodule

In the above case the sensitivity list doesn’t include ‘rst’, and hence a synchronous-reset flip-flop is inferred.

5.2.2. Mux Inference

An “if-else” statement infers a mux.

E.g.:

if (sel) z = a; else z = b;

A general case statement infers a mux. If the case items overlap, a priority encoder is inferred. Case items should compare only against the definite values 0 and 1.
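As a minimal sketch of the case-based mux described above, the module below selects one of four inputs. The module and port names (mux4, sel, a to d, z) are illustrative assumptions, not taken from any particular design; the case items are mutually exclusive, so no priority logic is implied.

```verilog
// Illustrative 4:1 mux; all names are hypothetical.
module mux4 (
    input  wire [1:0] sel,
    input  wire       a, b, c, d,
    output reg        z
);
    always @* begin
        // Non-overlapping case items: a plain multiplexer
        case (sel)
            2'b00:   z = a;
            2'b01:   z = b;
            2'b10:   z = c;
            default: z = d;   // covers 2'b11 and avoids a latch
        endcase
    end
endmodule
```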

5.2.3. Priority Encoder Inference

Multiple if statements with multiple branches result in a priority encoder structure.

“if-else if” infers a priority encoder.
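The following sketch shows how an “if-else if” chain maps to a priority encoder. The names (prio_enc4, req, grant, valid) are assumptions made for illustration; req[3] is checked first and therefore has the highest priority.

```verilog
// Illustrative 4-input priority encoder; all names are hypothetical.
module prio_enc4 (
    input  wire [3:0] req,
    output reg  [1:0] grant,
    output reg        valid
);
    always @* begin
        valid = 1'b1;
        // Earlier branches win over later ones
        if      (req[3]) grant = 2'd3;
        else if (req[2]) grant = 2'd2;
        else if (req[1]) grant = 2'd1;
        else if (req[0]) grant = 2'd0;
        else begin
            grant = 2'd0;
            valid = 1'b0;   // no request active
        end
    end
endmodule
```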

5.2.4. Combinational Logic

If high-impedance ‘z’ is assigned, it is realized as a tristate buffer, and an assigned unknown ‘x’ is treated by synthesis as a don’t-care. So avoid using ‘x’ and ‘z’ unless that is the intent; careless use may mislead synthesis.

E.g.:

assign tri_out = en ? tri_in : 1'bz;

5.2.5. if vs case

A multiplexer is a faster circuit. Therefore, if a priority encoding structure is not required, use ‘case’ statements instead of ‘if-else’ statements.

Place a late-arriving signal early in an ‘if-else’ chain, so that late-arriving signals with critical timing stay closest to the output of the logic block.
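A small sketch of the late-arriving-signal guideline, assuming a hypothetical select signal late_sel that is known to arrive last. Testing it first in the chain makes it control the final mux stage, so it passes through the least logic before reaching the output.

```verilog
// Illustrative only; late_sel is assumed to be the late-arriving signal.
module late_first (
    input  wire late_sel,
    input  wire sel_b,
    input  wire a, b, c,
    output reg  z
);
    always @* begin
        // Equivalent to: z = late_sel ? a : (sel_b ? b : c)
        if (late_sel)
            z = a;
        else if (sel_b)
            z = b;
        else
            z = c;
    end
endmodule
```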

5.2.6. Proper partitioning for synthesis

Partition the top-level design properly, based on functionality. Keep related combinational logic in the same module. Adding glue logic at the top level of the design is not recommended. Hierarchical designs are good, but unnecessary hierarchy may limit optimization across hierarchy boundaries. In practice, deep hierarchies are observed to make boundary optimization fail badly, because of the increased number of buffers inserted for setup or hold fixing. In such cases the tool's ungroup or flatten command can be used to remove the unwanted hierarchy before compiling the design, to achieve better results.

5.2.7. FSM synthesis guidelines

If you code a state machine, take care to separate it from other logic. This helps synthesis tools synthesize and optimize the FSM logic much better. Use “parameter” in Verilog to describe state names. A single “always” block should hold all the combinational logic for computing the next state.
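The guidelines above can be sketched as a small three-state machine. The state names, the start/done inputs and the busy output are invented for illustration; the point is the structure: parameters for state names, one sequential block for the state register, and one combinational block for the next-state logic.

```verilog
// Illustrative FSM skeleton; states and signals are hypothetical.
module fsm_example (
    input  wire clk,
    input  wire rst_n,   // active-low asynchronous reset
    input  wire start,
    input  wire done,
    output reg  busy
);
    // State names described with parameters
    parameter IDLE = 2'b00,
              RUN  = 2'b01,
              STOP = 2'b10;

    reg [1:0] state, next_state;

    // Sequential block: state register only
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            state <= IDLE;
        else
            state <= next_state;
    end

    // Combinational block: all next-state logic in one place
    always @* begin
        next_state = state;              // default: hold state, no latch
        case (state)
            IDLE:    if (start) next_state = RUN;
            RUN:     if (done)  next_state = STOP;
            STOP:               next_state = IDLE;
            default:            next_state = IDLE;
        endcase
    end

    // Output decoded from the current state
    always @* begin
        busy = (state == RUN);
    end
endmodule
```

Here busy is decoded combinationally to keep the sketch short; a design that requires registered outputs would add a flip-flop on it.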

5.2.8. Blocking vs non-blocking: race conditions

Never mix combinational (blocking) and sequential (nonblocking) constructs in one description.

Blocking: combinational -> racing

Since the final outputs depend on the order in which the assignments are evaluated, blocking assignments within a sequential block may cause race conditions.

Nonblocking: sequential -> no race condition

Nonblocking assignments closely resemble hardware, as they are order independent.

Most logic that transfers data between registers within a module should be written using nonblocking assignment statements.
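A two-stage shift register is a common way to see the difference. In the sketch below (module and signal names are assumptions), the nonblocking version samples the old value of q1 on every clock edge, so the order of the two statements does not matter; the blocking variant shown in the comment would collapse into a single register.

```verilog
// Illustrative two-stage shift register; names are hypothetical.
module shift2 (
    input  wire clk,
    input  wire d,
    output reg  q1,
    output reg  q2
);
    // Nonblocking: right-hand sides are sampled before any update,
    // so q2 receives the previous value of q1 (two flip-flops).
    always @(posedge clk) begin
        q1 <= d;
        q2 <= q1;
    end

    // Blocking counter-example (do not use in sequential blocks):
    //   q1 = d;
    //   q2 = q1;   // q2 gets the NEW q1, i.e. d, so one stage is lost
endmodule
```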

5.2.9. Technology independent RTL coding

Write HDL code in a technology-independent fashion. This helps reuse of the HDL code at any technology node. Do not hard-code logic gates from the technology library unless it is necessary to meet critical timing.

5.2.10. Pads separate from core logic

Pads are instantiated like any other module. If the design has a large number of I/O pads, it is recommended to keep the pad instantiations in a separate file. Note that pads are technology dependent, hence the above recommendation!

5.2.11. Clock logic guidelines

In case of multiple clocks in the design, make sure that the clock generation and reset logic are written in one module for better handling in synthesis. If a clock is used in different modules at different levels of hierarchy, keep the clock name common across all the modules. This makes constraining that clock easier and also simplifies the synthesis scripts.

· Don’t use mixed clock edges

Mixing edge-sensitive and level-sensitive signals in one sensitivity list is not allowed. The code below is wrong:

always @(posedge clk or rst)

· Avoid clock buffers or any other logic in the clock path

If a signal crosses between clock domains with different clock frequencies, it must be properly synchronised with synchronizer logic. Synthesis tools can’t optimize timing paths between asynchronous clock domains.
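One common way to synchronise a single-bit signal into another clock domain is a two-flip-flop synchronizer. The sketch below is a minimal version under that assumption; the names (sync_2ff, dst_clk, async_in, sync_out) are illustrative, and multi-bit buses generally need a different scheme such as a handshake or FIFO.

```verilog
// Illustrative two-flop synchronizer for a single-bit level signal.
// All names are hypothetical.
module sync_2ff (
    input  wire dst_clk,    // clock of the receiving domain
    input  wire async_in,   // signal launched from another clock domain
    output wire sync_out
);
    reg meta;     // first stage: may go metastable
    reg stable;   // second stage: gives the first stage time to settle

    always @(posedge dst_clk) begin
        meta   <= async_in;
        stable <= meta;
    end

    assign sync_out = stable;
endmodule
```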

5.2.12. Reset logic guidelines

Synchronous Reset:

Advantages:

· Easy to synthesize; the reset is just another synchronous input to the design.

Disadvantages:

· Requires a free-running clock. At power-up a clock is a must for the reset to take effect.

Asynchronous Reset:

Advantages:

· Doesn’t require a free-running clock.

· Uses a separate input on the flip-flop, so it doesn’t affect flop data timing.

Disadvantages:

· Harder to implement. The reset is treated as a high-fanout net.

· STA, simulation and DFT become more difficult.

5.2.13. Registered outputs

All outputs should be registered and combinational logic should be either at the input section or in between two registered stages of a module.
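A minimal sketch of a module with a registered output, assuming a hypothetical 8-bit adder: the combinational logic sits between the inputs and the output register, so the output port is driven directly by a flip-flop.

```verilog
// Illustrative adder with a registered output; names are hypothetical.
module reg_out (
    input  wire       clk,
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg  [8:0] sum
);
    // Combinational logic at the input side
    wire [8:0] sum_c = a + b;

    // Output register: the port is driven by a flip-flop
    always @(posedge clk)
        sum <= sum_c;
endmodule
```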

5.2.14. Incomplete sensitivity list

The sensitivity list should contain all inputs of a combinational block. If an input is missing from the sensitivity list, the simulator will not respond to changes on that input. The synthesized logic is in most cases still correct for blocks with an incomplete sensitivity list, but this causes simulation mismatches between the source RTL and the synthesized netlist. Synthesis tools generally issue a warning for an “always” block with an incomplete sensitivity list. Signals declared as reg can also appear in the sensitivity list.
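The sketch below contrasts an incomplete sensitivity list with a complete one. The names are illustrative. In simulation y_bad does not update when only b changes, while synthesis would typically build the same mux for both outputs; the Verilog-2001 `always @*` form includes every signal read in the block automatically.

```verilog
// Illustrative comparison; all names are hypothetical.
module sens_list (
    input  wire a,
    input  wire b,
    input  wire sel,
    output reg  y_bad,
    output reg  y_good
);
    // Incomplete: 'b' is read but missing from the list, so the
    // simulator ignores changes on 'b' (RTL vs netlist mismatch).
    always @(a or sel)
        y_bad = sel ? a : b;

    // Complete: @* is sensitive to a, b and sel.
    always @*
        y_good = sel ? a : b;
endmodule
```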

5.2.15. Avoid latch inference

· An “if-else” statement in a combinational block must end with an ‘else’ branch. Otherwise unintentional latches will be inferred (at the output) because of the missing final ‘else’.

· The same is true for the ‘case’ statement: a ‘default’ branch must be added.

Workaround:

Either cover all possible combinations of inputs, or assign a default value before the if/case statement.

E.g.:

if (z) a = b;

The above code infers a latch: if z=1 the value of ‘a’ is defined, but if z=0 the value of ‘a’ is not specified. It is therefore assumed that the previous value must be retained, and hence a latch is inferred.
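A minimal sketch of the second workaround, assigning a default before the conditional. The module name and the choice of 1'b0 as the default are assumptions for illustration; because ‘a’ now gets a value on every path through the block, no storage is implied.

```verilog
// Illustrative latch-free version of "if (z) a = b;".
// Module name and default value are hypothetical choices.
module no_latch (
    input  wire z,
    input  wire b,
    output reg  a
);
    always @* begin
        a = 1'b0;      // default value assigned first
        if (z)
            a = b;     // overrides the default when z is 1
    end
endmodule
```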

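E.g.:

module latch_inf_test(a, x, y, t, out);
    input [2:0] a;
    input x, y, t;
    output out;
    reg out;

    always @(a or x or y or t)
    begin
        // Only 3 of the 8 values of 'a' are covered and there is
        // no default, so 'out' must hold its value: a latch is inferred
        case (a)
            3'b001: out = x;
            3'b010: out = y;
            3'b100: out = t;
        endcase
    end
endmodule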

E.g.:

module case_latch(dout, sel, a, b, c);
    input [1:0] sel;
    input a, b, c;
    output dout;
    reg dout;

    always @(a or b or c or sel)
    begin
        // sel == 2'b11 is not covered, so a latch is inferred on dout
        case (sel)
            2'b00 : dout = a;
            2'b01 : dout = b;
            2'b10 : dout = c;
        endcase
    end
endmodule

Preventing a Latch by Assigning a Default Value

module case_default(dout, sel, a, b, c);
    input [1:0] sel;
    input a, b, c;
    output dout;
    reg dout;

    always @(a or b or c or sel)
    begin
        case (sel)
            2'b00 : dout = a;
            2'b01 : dout = b;
            2'b10 : dout = c;
            // default covers sel == 2'b11, so no latch is inferred
            default : dout = 1'b0;
        endcase
    end
endmodule

5.2.16. Use Constants

Use named constants instead of hard-coded numeric values.

The coding style below is not recommended:

wire [15:0] input_bus;
reg [15:0] output_bus;

Recommended coding style:

`define INPUT_BUS_WIDTH 16
`define OUTPUT_BUS_WIDTH 16

wire [`INPUT_BUS_WIDTH-1:0] input_bus;
reg [`OUTPUT_BUS_WIDTH-1:0] output_bus;

Keep constant and parameter definitions in separate files, with a naming convention such as design_name.constants.v and design_name.parameters.v
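Besides `define macros, a module parameter is another way to avoid hard-coded widths. The sketch below is an illustration with assumed names (bus_reg, BUS_WIDTH); the width is set once and can be overridden by name at instantiation.

```verilog
// Illustrative parameterized register; names are hypothetical.
module bus_reg #(
    parameter BUS_WIDTH = 16
) (
    input  wire                 clk,
    input  wire [BUS_WIDTH-1:0] input_bus,
    output reg  [BUS_WIDTH-1:0] output_bus
);
    always @(posedge clk)
        output_bus <= input_bus;
endmodule

// Usage sketch (parameter overridden by name):
//   bus_reg #(.BUS_WIDTH(32)) u_bus_reg (
//       .clk(clk), .input_bus(in32), .output_bus(out32));
```

A parameter is scoped to its module, whereas a `define macro is global to the compilation, which is one reason to prefer parameters for per-instance widths.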

5.2.17. General Coding guidelines for ASIC synthesis

· “Inference” of logic should be given higher priority than instantiation of logic.

· The file name and module name should be the same.

· A file should have only one module.

· Use lowercase letters for port, variable and signal names.

· Use uppercase letters for constants and user-defined types.

6. ASIC Synthesis

6.1. Synthesis definition, goals

Synthesis is the process of transforming your HDL design into a gate-level netlist, given all the specified constraints and optimization settings.

Logic synthesis is the process of translating and mapping RTL code written in an HDL (such as Verilog or VHDL) into a technology-specific gate-level representation.

There are 3 steps in synthesis:

· Translation: The RTL code is translated to a technology-independent representation. The converted logic is available in Boolean equation form.

· Optimization: The Boolean equations are optimized using SoP or PoS optimization methods.

· Technology mapping: The technology-independent Boolean equations are mapped to technology-dependent library gates, based on the design constraints and the library of available technology gates. This produces an optimized gate-level representation, which is generally written out in Verilog.

The generated gate-level circuit is then logically optimized to meet the targets or goals set by the user constraints. The clock frequency target is the number one goal that the synthesis operation has to meet.
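As a rough illustration of what these steps produce, the sketch below pairs a small RTL description with one possible hand-written gate-level equivalent. The cell names AND2_X1 and OR3_X1 are hypothetical stand-ins for library cells (modelled here behaviourally so the example is self-contained); a real tool would pick cells from the target library according to the constraints, and its result may look quite different.

```verilog
// RTL view: 3-input majority function in sum-of-products form
module majority_rtl (input wire a, b, c, output wire y);
    assign y = (a & b) | (b & c) | (a & c);
endmodule

// Hypothetical library cells (behavioural stand-ins, not a real library)
module AND2_X1 (input wire A, B, output wire Y);
    assign Y = A & B;
endmodule

module OR3_X1 (input wire A, B, C, output wire Y);
    assign Y = A | B | C;
endmodule

// Gate-level view: the same function as a netlist of cell instances
module majority_gate (input wire a, b, c, output wire y);
    wire n1, n2, n3;
    AND2_X1 u1 (.A(a),  .B(b),  .Y(n1));
    AND2_X1 u2 (.A(b),  .B(c),  .Y(n2));
    AND2_X1 u3 (.A(a),  .B(c),  .Y(n3));
    OR3_X1  u4 (.A(n1), .B(n2), .C(n3), .Y(y));
endmodule
```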

6.2. Inputs and outputs of the ASIC synthesis flow

The outcome of synthesis is a gate-level netlist, again in standard Verilog format. Netlists can be simulated as well, which is called gate-level simulation.

6.2.1. Register Transfer Level (RTL) Representation

RTL is the functional specification of the design given to logic synthesis, represented in an HDL.

· Register: Storage elements such as flip-flops and latches.

· Transfer: Transfer of data between inputs, outputs and registers using combinational logic.

· Level: The level of abstraction modeled using the HDL.

6.2.2. Constraints

The major objective of the logic synthesis is to meet the optimization constraints specified by the designer. Timing, area and power targets are the optimization constraints.

· Timing constraints: The synthesis tool tries to meet the setup and hold timing constraints on the sequential logic in the design.

· Area constraints: Area constraints specify the maximum area for a design.

· Power constraints: Power constraints specify the maximum power consumption for the design.

Source: asic-soc
