1. Introduction to ASIC - SoC Design
An application-specific integrated circuit (ASIC) is an integrated circuit (IC) customized for a particular use, rather than intended for general-purpose use. For example, a chip designed solely to run a cell phone is an ASIC.
As feature sizes have shrunk and design tools improved over the years, the maximum complexity (and hence functionality) possible in an ASIC has grown from 5,000 gates to over 100 million. Modern ASICs often include entire 32-bit processors, memory blocks including ROM, RAM, EEPROM, Flash and other large building blocks. Such an ASIC is often termed a SoC (system-on-a-chip). Designers of digital ASICs use a hardware description language (HDL), such as Verilog or VHDL, to describe the functionality of ASICs.
A SoC is a collection of components and/or subsystems (designed as IPs or ASICs) interconnected to perform the specified functions. The entire system is built on a single piece of silicon. A SoC includes a processor, memory, DSP cores, I/O devices, interfaces to external circuitry, and custom IPs as Verilog or VHDL modules.
A SoC packs a very large transistor count onto a single IC. Mixed technologies and mixed design styles on the same chip are common: digital, analog, FPGA, full custom, semi-custom, etc. A SoC usually contains one or more processors and co-processors, internal memory and memory controllers for external memory, buses/interconnect architectures such as AMBA, peripherals (timer, interrupt controller, etc.), I/O channels and many more.
Thus a SoC is essentially a mixture of ASIC logic, both full-custom and semi-custom (standard cells), and reusable Intellectual Property (IP) blocks (also called macros, hard macros or cores), as shown in the diagram below.
The IP core based design approach is mainly intended to reduce design complexity and time to market. Different IP cores are supplied by different vendors, in different technologies and to different specifications. Customizable soft cores provide an essential set of pre-verified parameters to configure according to the customer's requirements. Interface logic generally supports standard buses to ease integration. Bus-based architectures such as IBM CoreConnect, Motorola IP-bus and ARM's Advanced Microcontroller Bus Architecture (AMBA) facilitate core-based SoC design.
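As a rough illustration of what "parameters to configure" can mean for a soft core, the sketch below shows a small synthesizable register block whose data width and reset value are exposed as Verilog parameters. The module names `soft_reg` and `soft_reg_user` and the parameters `WIDTH` and `RESET_VALUE` are made up for this example and do not come from any vendor's IP.

```verilog
// Hypothetical soft core: a configurable register with enable.
module soft_reg #(
    parameter WIDTH       = 8,   // data width chosen by the integrator
    parameter RESET_VALUE = 0    // value loaded while reset is asserted
) (
    input  wire             clk,
    input  wire             rst_n,   // active-low asynchronous reset
    input  wire             en,
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= RESET_VALUE;
        else if (en)
            q <= d;
    end
endmodule

// The integrator configures the core when instantiating it.
module soft_reg_user(
    input  wire        clk,
    input  wire        rst_n,
    input  wire        en,
    input  wire [15:0] d,
    output wire [15:0] q
);
    soft_reg #(.WIDTH(16), .RESET_VALUE(16'hABCD)) u_reg (
        .clk(clk), .rst_n(rst_n), .en(en), .d(d), .q(q)
    );
endmodule
```

The same RTL source serves both an 8-bit and a 16-bit use; only the parameter overrides change.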
1.1. IP Core Types
1.1.1. Soft Cores (RTL format)
A technology-independent, synthesizable RTL description written in any HDL. Soft cores can be modified and functionally extended.
1.1.2. Firm Cores (netlist format)
These IP cores are gate-level, technology-dependent netlists. The internal implementation of the core cannot be modified, but the user can partially modify the netlist by parameterizing I/O and removing unwanted functionality.
1.1.3. Hard Cores (GDSII format)
Technology-dependent layout and timing information provided in GDSII format. Integration is simple and can result in highly predictable performance.
1.2. SoC Design Issues
· Architectural complexity has increased manifold due to significant advances in technology.
· Deep submicron and nanometer fabrication processes are very sensitive to variations in process, voltage and temperature.
· System verification and integration challenges are increasing as analog/RF blocks, digital hardware and non-electrical parts such as sensors and actuators become part of the SoC.
· The normal ASIC design methodology is itself too costly for a SoC. The cost to develop is very high.
2. IC Design Methodologies
There are basically 2 types of methodology:
· Full Custom Design
· Semi-Custom Design
2.1 Full Custom IC Design
The full custom design flow is depicted in the figure below.
Based on the schematic, a transistor-level circuit is drawn and the W/L ratio of each transistor is decided. The layout then defines the mask set required for manufacturing the transistors. In a full custom IC, all logic cells and all mask layers are customized. Full custom IC design and manufacturing take a long lead time. Typically the full custom design flow includes CMOS transistor sizing and manual layout. Even today the full custom design flow is used for building analog blocks and analog ICs. There are no efficient HDLs available yet to model analog blocks, owing to the inherent characteristics of analog circuitry.
Thus in the full custom flow every transistor is designed and drawn by hand. To date this is the most successful (indeed the only) way to design the analog portions of ASICs. The full custom design flow can give the highest performance but has the longest design time.
2.2. Semi-custom IC Design
Almost all digital ICs are designed and manufactured using semi-custom methodologies. Typically this refers to HDL-based IC design with EDA tools for automatic synthesis and place and route.
Semi-custom IC design methodology has 3 major categories.
· Standard cell based ASIC design
· Gate arrays
· FPGA
2.2.1 Standard cell based ASIC design
Initially an RTL (Register Transfer Level) description of the design is written. Then it is synthesized using technology specific standard cell libraries. Standard cells are custom designed and then inserted into a library. These cells are then used in the design by being placed in rows and wired together using ‘place and route’ CAD tools. Some standard cells, such as RAM and ROM cells, and some datapath cells (e.g. a multiplier) are tiled together to create macro cells.
2.2.2. Gate arrays
In a gate array, the transistor-level masks are fully defined and the designer cannot change them. The designer instead programs the wiring and vias to implement the desired function. In a gate array design we take a die on which all the gates are already placed but not connected. For example, the interconnections may be done in the metal layers from metal 2 upward, so only the layout of the interconnect is given to the fabrication house.
Gate array designs are slower than cell-based designs, but the implementation time is shorter because only the interconnect has to be designed and fabricated. RTL-based methods and synthesis, together with other CAD tools, are often used for gate arrays. Efficiency decreases with gate arrays, since some of the prefabricated gates are left unused.
2.2.3. FPGA
2.3. Standard Cell ASIC Vs Gate Array Vs FPGA
2.4. FPGA vs. ASIC
The differences between ASICs and FPGAs mainly come down to cost, tool availability, performance and design flexibility. Each has its own pros and cons, and it is the designer's responsibility to weigh the advantages of each and choose either FPGA or ASIC for the product. However, recent developments in the FPGA domain are narrowing the benefits of ASICs.
2.4.1 FPGA Design Advantages
- Faster time-to-market: No layout, masks or other manufacturing steps are needed for an FPGA design. A ready-made FPGA is available; burn your HDL code into the FPGA and you are done!
- No NRE (non-recurring engineering) cost: This cost is typically associated with an ASIC design; for an FPGA it does not apply. FPGA tools are cheap (sometimes free; you only need to buy the FPGA). For an ASIC you pay a huge NRE and the tools are very expensive, running into crores.
- Simpler design cycle: The software handles much of the placement, routing and timing, so less manual intervention is needed. The FPGA design flow largely automates the complex and time-consuming floorplanning, place and route, and timing analysis steps.
- More predictable project cycle: The FPGA design flow removes the risk of re-spins, dependence on wafer capacity, etc. from the project, since the design logic is synthesized and verified directly in the FPGA device.
- Field reprogrammability: A new bitstream (i.e. your program) can be uploaded remotely, almost instantly. An FPGA can be reprogrammed in a snap, while an ASIC can take $50,000 and more than 4-6 weeks to make the same change. FPGA prices range from a couple of dollars to several hundred or more, depending on the hardware features.
- Reusability: Reusability is the main advantage of an FPGA. A prototype of the design can be implemented on an FPGA and verified with nearly accurate results before it is implemented as an ASIC. If the design has faults, change the HDL code, generate the bitstream, program the FPGA and test again. Modern FPGAs are reconfigurable both partially and dynamically.
- FPGAs are good for prototyping and limited production. If you are going to make 100-200 boards, it isn't worth making an ASIC.
- Generally FPGAs are used for lower speed, lower complexity and lower volume designs. But today's FPGAs can run at 500 MHz with superior performance. With unprecedented logic density increases and a host of other features, such as embedded processors, DSP blocks, clocking, and high-speed serial at ever lower prices, FPGAs are suitable for almost any type of design.
- Unlike ASICs, FPGAs have special built-in hardware such as block RAM, DCM modules, MACs, memories, high-speed I/O and embedded CPUs, which can be used to get better performance. Modern FPGAs are packed with features. Advanced FPGAs usually come with phase-locked loops, low-voltage differential signalling, clock data recovery, more internal routing, high speed, hardware multipliers for DSP, memory, programmable I/O, IP cores and microprocessor cores. Remember PowerPC (hard core) and MicroBlaze (soft core) in Xilinx, and ARM (hard core) and Nios (soft core) in Altera. There are FPGAs available now with a built-in ADC! Using all these features designers can build a system on a chip. Now, do you really need an ASIC?
- FPGA synthesis is much easier than ASIC synthesis.
- In an FPGA you need not do floorplanning; the tool can do it efficiently. In an ASIC you have to do it.
2.4.2. FPGA Design Disadvantages
- Power consumption is higher in an FPGA, and you have little control over power optimization. This is where the ASIC wins the race!
- You have to use the resources available in the FPGA. Thus the FPGA limits the design size.
- Good for low quantity production. As quantity increases, the cost per product becomes higher than for an ASIC implementation.
2.4.3. ASIC Design Advantages
- Cost...cost...cost! Lower unit costs: For very high volume designs the unit cost comes out very low. At larger volumes an ASIC proves cheaper than implementing the design in an FPGA.
- Speed...speed...speed! ASICs are faster than FPGAs: An ASIC gives design flexibility, which gives enormous opportunity for speed optimizations.
- Low power...low power...low power! An ASIC can be optimized for the required low power. Several low power techniques, such as power gating, clock gating, multi-Vt cell libraries and pipelining, are available to achieve the power target. This is where the FPGA fails badly! Can you imagine a cell phone that has to be charged for every call? Never. Low power ASICs help the battery last longer!
- In an ASIC you can implement analog circuits and mixed signal designs. This is generally not possible in an FPGA.
- In an ASIC, DFT (Design For Test) logic is inserted. In an FPGA, DFT is not carried out (rather, for an FPGA there is no need for DFT!).
2.4.4. ASIC Design Disadvantages
- Time-to-market: Some large ASICs can take a year or more to design. A good way to shorten development time is to make prototypes using FPGAs and then switch to an ASIC.
- Design issues: In an ASIC you have to take care of DFM issues, signal integrity issues and many more. In an FPGA you don't have these, because the ASIC designers who built the FPGA have already taken care of them. (Don't forget that an FPGA is an IC designed by ASIC design engineers!)
- Expensive tools: ASIC design tools are very expensive. You spend a huge amount on NRE.
3. Overview of Complete ASIC Design Flow
4. Major EDA Companies and their tools
5. RTL Coding for Logic Synthesis
5.1. Synthesizable and Non-Synthesizable Verilog constructs
| Category | Synthesizable | Non-Synthesizable |
|---|---|---|
| Basic | Identifiers, escaped identifiers, sized constants (b, o, d, h; e.g. 2'b11, 3'o7, 32'd123, 8'hff), unsized constants, signed constants (s; e.g. 3'bs101), module, endmodule, macromodule, ANSI-style module, task, and function port lists | System tasks, real constants |
| Data types | wire, wand, wor, tri, triand, trior, supply0, supply1, trireg (treated as wire), reg, integer, parameter, input, output, inout, memory (reg [7:0] x [3:0];), N-dimensional arrays | real, time, event, tri0, tri1 |
| Module instances | Connect port by name or by order, override parameter by order, override parameter by name, constants connected to ports, unconnected ports, expressions connected to ports | Delay on built-in gates |
| Generate statements | if, case, for generate, concurrent begin-end blocks, genvar | |
| Primitives | and, or, nand, nor, xor, xnor, not, notif0, notif1, buf, bufif0, bufif1, tran | User defined primitives (UDPs), table, pullup, pulldown, pmos, nmos, cmos, rpmos, rnmos, rcmos, tranif0, tranif1, rtran, rtranif0, rtranif1 |
| Operators and expressions | `+`, `-` (binary and unary) | |
| Bitwise operations | `&`, `\|`, `^`, `~^`, `^~` | |
| Reduction and other operations | `&`, `\|`, `^`, `~&`, `~\|`, `~^`, `^~`, `!`, `&&`, `\|\|`, `==`, `!=`, `<`, `<=`, `>`, `>=`, `<<`, `>>`, `<<<`, `>>>`, `{}`, `{n{}}`, `?:`, function call | `===`, `!==` |
| Event control | event or, @ (partial), event or using comma syntax, posedge, negedge (partial) | Event trigger (->), delay and wait (#) |
| Bit and part selects | Bit select, bit select of array element, constant part select, variable part select (+:, -:), variable bit-select on left side of an assignment | |
| Continuous assignments | net and wire declaration, assign | Using delay |
| Procedural blocks | always (exactly one @ required) | initial |
| Procedural statements | ;, begin-end, if-else, repeat, case, casex, casez, default, for, while, forever, disable (partial) | fork, join |
| Procedural assignments | blocking (=), non-blocking (<=) | force, release |
| Functions and tasks | Functions, tasks | |
| Compiler directives | `` `define ``, `` `undef ``, `` `resetall ``, `` `ifndef ``, `` `elsif ``, `` `line ``, `` `ifdef ``, `` `else ``, `` `endif ``, `` `include `` | |
5.2. How is hardware inferred?
5.2.1 Register inference
Whenever there is a ‘posedge’ or ‘negedge’ construct, the synthesis tool infers a flip-flop.
    always @(posedge clk)
        output_reg <= data;
The above code infers a D flip-flop.
Asynchronous reset:
    module async_rst(clk, rst, data, out);
        input clk, rst, data;
        output out;
        reg out;
        always @(posedge clk or negedge rst)
        begin
            if(!rst)
                out <= 1'b0;
            else
                out <= data;
        end
    endmodule
In the above case the sensitivity list includes both the clock and rst, and hence it infers an asynchronous reset flip-flop. rst has negedge in the sensitivity list, and hence the same active-low polarity should be checked in the code, as if(!rst) does.
Synchronous reset:
    module sync_rst(clk, rst, data, out);
        input clk, rst, data;
        output out;
        reg out;
        always @(posedge clk)
        begin
            if(!rst)
                out <= 1'b0;
            else
                out <= data;
        end
    endmodule
In the above case the sensitivity list doesn't include ‘rst’, and hence it infers a synchronous reset flip-flop.
5.2.2 Mux Inference
An “if-else” statement infers a mux.
Eg.:
    if(sel) z=a; else z=b;
A general case statement infers a mux. If the case statement has overlapping items, a priority encoder is inferred. For synthesis, case items are matched only against the true values 0 and 1.
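The two mux styles described above can be sketched as one small module. This is only an illustration: the module name `mux_examples` and all port names are assumptions, and the exact gates produced depend on the synthesis tool and library.

```verilog
// 2:1 mux from if-else, 4:1 mux from a full, non-overlapping case.
module mux_examples(
    input  wire       a, b, c, d,
    input  wire       sel,
    input  wire [1:0] sel4,
    output reg        z2,
    output reg        z4
);
    // if-else with both branches assigned: 2:1 mux
    always @(*) begin
        if (sel) z2 = a;
        else     z2 = b;
    end

    // Mutually exclusive case items with a default: 4:1 mux
    always @(*) begin
        case (sel4)
            2'b00:   z4 = a;
            2'b01:   z4 = b;
            2'b10:   z4 = c;
            default: z4 = d;
        endcase
    end
endmodule
```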
5.2.3. Priority Encoder Inference
Multiple if statements with multiple branches result in the creation of a priority encoder structure. An “if - else if” chain infers a priority encoder.
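A minimal sketch of such an “if - else if” chain is given here, assuming a 4-bit request vector in which bit 3 has the highest priority. The module name `prio_enc` and its ports are hypothetical.

```verilog
// 4-input priority encoder: req[3] has the highest priority.
module prio_enc(
    input  wire [3:0] req,
    output reg  [1:0] idx,     // index of the highest-priority active request
    output reg        valid    // low when no request is active
);
    always @(*) begin
        valid = 1'b1;
        if      (req[3]) idx = 2'd3;
        else if (req[2]) idx = 2'd2;
        else if (req[1]) idx = 2'd1;
        else if (req[0]) idx = 2'd0;
        else begin
            idx   = 2'd0;
            valid = 1'b0;
        end
    end
endmodule
```

Because each condition is tested only when all earlier ones are false, a lower request bit can win only if every higher bit is 0, which is exactly the priority behaviour.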
5.2.4. Combinational Logic
If high impedance ‘z’ is assigned, it will be realized as a tristate buffer, and an assigned unknown ‘x’ is treated by synthesis as a don't-care. So avoid using ‘x’ and ‘z’ unless that is what you intend; their use may mislead synthesis.
Eg.:
    assign tri_out = en ? tri_in : 1'bz;
5.2.5. if vs case
A multiplexer is a faster circuit. Therefore, if a priority encoding structure is not required, use ‘case’ statements instead of ‘if-else’ statements.
Use a late arriving signal early in an ‘if-else’ chain, to keep late arriving signals with critical timing closest to the output of the logic block.
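The ordering advice can be sketched as follows, assuming a hypothetical select `late_sel` that is known to arrive later than the other selects. All names are illustrative, and the final structure still depends on how the synthesis tool restructures the logic.

```verilog
// The late signal is tested first, so it controls the last mux stage before z.
module late_first(
    input  wire late_sel,                 // assumed to arrive last
    input  wire early_sel0, early_sel1,
    input  wire a, b, c, d,
    output reg  z
);
    always @(*) begin
        if (late_sel)        z = a;       // closest to the output
        else if (early_sel0) z = b;
        else if (early_sel1) z = c;
        else                 z = d;       // deepest in the chain
    end
endmodule
```

The chain is equivalent to z = late_sel ? a : (early_sel0 ? b : (early_sel1 ? c : d)), so the logic selected by the early signals can settle before `late_sel` arrives.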
5.2.6. Proper partitioning for synthesis
Properly partition the top level design based on functionality. Keep related combinational logic in the same module. It is not recommended to add glue logic at the top level of the design. Hierarchical designs are good, but unnecessary hierarchy may limit optimization across hierarchy boundaries. In practice, deep hierarchies are observed to make boundary optimization fail badly, with more buffers inserted to fix setup or hold violations. In such cases an ungroup or flatten-hierarchy command can be used to flatten the unwanted hierarchies before compiling the design to achieve better results.
5.2.7. FSM synthesis guidelines
If you are coding a state machine, take care to separate it from other logic. This helps synthesis tools to synthesize and optimize the FSM logic much better. Use “parameter” in Verilog to describe state names. A single “always” block should hold all the combinational logic for computing the next state.
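A minimal sketch of these guidelines follows, assuming a made-up three-state machine (IDLE, RUN, DONE). The module name `simple_fsm`, the ports and the binary state encoding are all illustrative choices, not a recommended design.

```verilog
// Hypothetical 3-state FSM: IDLE -> RUN -> DONE -> IDLE.
module simple_fsm(
    input  wire clk,
    input  wire rst_n,     // active-low asynchronous reset
    input  wire start,
    input  wire done_i,
    output wire busy
);
    // State names described with "parameter" (encoding chosen arbitrarily).
    parameter IDLE = 2'b00,
              RUN  = 2'b01,
              DONE = 2'b10;

    reg [1:0] state, next_state;

    // Sequential block: the state register only.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) state <= IDLE;
        else        state <= next_state;
    end

    // One combinational block holds all of the next-state logic.
    always @(*) begin
        next_state = state;                   // hold by default, avoids latches
        case (state)
            IDLE:    if (start)  next_state = RUN;
            RUN:     if (done_i) next_state = DONE;
            DONE:                next_state = IDLE;
            default:             next_state = IDLE;
        endcase
    end

    assign busy = (state == RUN);
endmodule
```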
5.2.8. Blocking vs non-blocking: race conditions
- Never mix combinational (blocking) constructs with sequential (non-blocking) constructs in the same description.
- Blocking: combinational -> race-prone. Since the final outputs depend on the order in which the assignments are evaluated, blocking assignments within a sequential block may cause a race condition.
- Non-blocking: sequential -> no race condition.
- Non-blocking assignments closely resemble hardware, as they are order independent.
- Most applications that require data transfer within a module should be written using non-blocking assignment statements.
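The order independence of non-blocking assignments can be seen in a two-stage shift register. This is a small sketch; the module name `shift2_nb` and its ports are assumptions.

```verilog
// Two-stage shift register written with non-blocking assignments.
module shift2_nb(
    input  wire clk,
    input  wire d,
    output reg  q1,
    output reg  q2
);
    always @(posedge clk) begin
        q1 <= d;    // all right-hand sides are sampled first,
        q2 <= q1;   // so q2 receives the OLD q1 whatever the statement order
    end
    // With blocking assignments (q1 = d; q2 = q1;) q2 would take the new
    // value of q1 in the same clock edge and the two stages would collapse
    // into one, so the result would depend on the order of the statements.
endmodule
```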
5.2.9. Technology independent RTL coding
Write HDL code in a technology independent fashion. This helps reuse of the HDL code for any technology node. Do not hard code logic gates from the technology library unless it is necessary to meet critical timing.
5.2.10. Pads separate from core logic
Pads are instantiated like any other module. If the design has a large number of I/O pads, it is recommended to keep the pad instantiations in a separate file. Pads are technology dependent, hence this recommendation.
5.2.11. Clock logic guidelines
In case of multiple clocks in the design, make sure that the clock generation and reset logic are written in one module for better handling in synthesis. If a clock is used in different modules at different levels of hierarchy, keep the clock name common across all the modules. This makes constraining that clock easier and also supports better handling of synthesis scripts.
· Don't use mixed clock edges
Mixing edge sensitive and level sensitive signals in one sensitivity list is not allowed. The code below is wrong:
    always @(posedge clk or rst)
· Avoid clock buffers or any other logic on the clock path
If a signal crosses between clock domains with different clock frequencies, it must be properly synchronised with synchronizing logic. Synthesis tools can't optimize timing paths between asynchronous clock domains.
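One common way to synchronise a single-bit signal into another clock domain is a two-flop synchronizer, sketched below. The text above does not name a specific scheme, so this is a swapped-in example of a widely used technique. The module name `sync_2ff` and its ports are assumptions, and the circuit is only suitable for single-bit, slowly changing signals; multi-bit buses need other approaches.

```verilog
// Two-flop synchronizer for a single-bit signal entering the clk_dst domain.
module sync_2ff(
    input  wire clk_dst,    // destination-domain clock
    input  wire rst_n,      // active-low asynchronous reset
    input  wire async_in,   // signal launched from another clock domain
    output wire sync_out
);
    reg meta, stable;

    always @(posedge clk_dst or negedge rst_n) begin
        if (!rst_n) begin
            meta   <= 1'b0;
            stable <= 1'b0;
        end else begin
            meta   <= async_in;   // first stage may go metastable
            stable <= meta;       // second stage gives it a cycle to settle
        end
    end

    assign sync_out = stable;
endmodule
```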
5.2.12. Reset logic guidelines
Synchronous Reset:
Advantages:
· Easy to synthesize; it is just another synchronous input to the design.
Disadvantages:
· Requires a free-running clock. A clock must be present at power-up for the reset to take effect.
Asynchronous Reset:
Advantages:
· Doesn't require a free-running clock.
· Uses a separate input on the flip-flop, so it doesn't affect flop data timing.
Disadvantages:
· Harder to implement; the reset is treated as a high fan-out net.
· STA, simulation and DFT become more difficult.
5.2.13. Registered outputs
All outputs should be registered, and combinational logic should be either at the input section or in between two registered stages of a module.
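As a small sketch of this guideline, the hypothetical module below keeps its adder in front of the output register, so the output port is driven directly by flip-flops. The module name `reg_out_adder` and the port widths are assumptions.

```verilog
// Adder with a registered output.
module reg_out_adder(
    input  wire       clk,
    input  wire [7:0] a, b,
    output reg  [8:0] sum
);
    // Combinational logic sits ahead of the output register.
    wire [8:0] sum_comb = a + b;

    // The output port is driven directly by a flip-flop.
    always @(posedge clk)
        sum <= sum_comb;
endmodule
```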
5.2.14. Incomplete sensitivity list
The sensitivity list should contain all inputs. If inputs are missing from the sensitivity list, changes on those inputs will not be recognized by the simulator. The synthesized logic will in most cases be correct for blocks with an incomplete sensitivity list, but this may cause simulation mismatches between the source RTL and the synthesized netlist. Generally synthesis tools issue a warning for an “always” block with an incomplete sensitivity list. Signals declared as reg can also appear in the sensitivity list.
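The simulation mismatch can be reproduced with two blocks that compute the same function, one with an incomplete list and one using the Verilog-2001 wildcard `@(*)`. The module and signal names here are made up for the illustration.

```verilog
module sens_list_demo(
    input  wire a, b,
    output reg  y_incomplete,
    output reg  y_complete
);
    // Incomplete list: b is read but not listed, so the simulator
    // does not re-evaluate this block when only b changes.
    always @(a)
        y_incomplete = a & b;

    // Wildcard list: every signal read in the block is included.
    always @(*)
        y_complete = a & b;
endmodule
```

In simulation, y_incomplete keeps its stale value when only b toggles, while a synthesized AND gate would follow b. That difference is the RTL versus netlist mismatch described above.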
5.2.15. Avoid latch inference
· “if-else” statements must end with an ‘else’ branch. Otherwise ‘unintentional latches’ will be realized (at the output) because of the missing ‘else’.
· The same is true for ‘case’ statements: a ‘default’ item must be added.
Work around:
Either cover all possible combinations of inputs or assign a default value to the output before the ‘if’ or ‘case’ statement.
Eg.:
    if(z) a=b;
The above code will infer a latch, because if z=1 the value of ‘a’ is defined, but if z=0 the value of ‘a’ is not specified. It is therefore assumed that the previous value has to be retained, and hence a latch is inferred.
Eg.:
    module latch_inf_test(a, x, y, t, out);
        input [2:0] a;
        input x, y, t;
        output out;
        reg out;
        always @(a or x or y or t)
        begin
            case(a)
                3'b001: out = x;
                3'b010: out = y;
                3'b100: out = t;
            endcase
        end
    endmodule
Eg.:
    module case_latch(dout, sel, a, b, c);
        input [1:0] sel;
        input a, b, c;
        output dout;
        reg dout;
        always @(a or b or c or sel)
        begin
            case (sel)
                2'b00 : dout = a;
                2'b01 : dout = b;
                2'b10 : dout = c;
            endcase
        end
    endmodule
Preventing a Latch by Assigning a Default Value
    module case_default(dout, sel, a, b, c);
        input [1:0] sel;
        input a, b, c;
        output dout;
        reg dout;
        always @(a or b or c or sel)
        begin
            case (sel)
                2'b00 : dout = a;
                2'b01 : dout = b;
                2'b10 : dout = c;
                default : dout = 1'b0;
            endcase
        end
    endmodule
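The other workaround mentioned in this section, giving the output a value before the ‘case’ statement, might look like the sketch below. It reuses the ports of the examples above; the module name `case_preassign` is made up.

```verilog
module case_preassign(dout, sel, a, b, c);
    input  [1:0] sel;
    input        a, b, c;
    output       dout;
    reg          dout;

    always @(a or b or c or sel)
    begin
        dout = 1'b0;              // assigned up front, covers sel == 2'b11
        case (sel)
            2'b00 : dout = a;
            2'b01 : dout = b;
            2'b10 : dout = c;
        endcase
    end
endmodule
```

Since dout is assigned on every pass through the block, no previous value has to be remembered and no latch is needed.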
5.2.16. Use Constants
Use constants instead of hard coded numeric values.
The coding style below is not recommended:
    wire [15:0] input_bus;
    reg [15:0] output_bus;
Recommended coding style:
    `define INPUT_BUS_WIDTH 16
    `define OUTPUT_BUS_WIDTH 16
    wire [`INPUT_BUS_WIDTH-1:0] input_bus;
    reg [`OUTPUT_BUS_WIDTH-1:0] output_bus;
Keep constant and parameter definitions in separate files, with a naming convention such as design_name.constants.v and design_name.parameters.v
5.2.17. General Coding guidelines for ASIC synthesis
· “Inference” of the logic should be given higher priority than instantiation of the logic.
· The file name and the module name should be the same.
· A file should have only one module.
· Use lowercase letters for ports, variables and signal names.
· Use uppercase for constants and user defined types.
6. ASIC Synthesis
6.1. Synthesis definition, goals
Synthesis is the process of transforming your HDL design into a gate-level netlist, given all the specified constraints and optimization settings.
Logic synthesis is the process of translating and mapping RTL code written in an HDL (such as Verilog or VHDL) into a technology specific gate level representation.
There are 3 steps in synthesis:
· Translation: RTL code is translated to a technology independent representation. The converted logic is available in boolean equation form.
· Optimization: The boolean equations are optimized using SoP or PoS optimization methods.
· Technology mapping: The technology independent boolean equations are mapped to technology dependent library logic gates, based on the design constraints and the library of available technology gates. This produces an optimized gate level representation, which is generally written out in Verilog.
The generated gate level circuit is then logically optimized to meet the targets or goals set by the user constraints. The clock frequency target is the number one goal that has to be met by the synthesis operation.
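To make the RTL to gate-level transformation concrete, here is a hand-written sketch of a 2:1 mux in both forms. A real netlist would instantiate cells from the target technology library. Verilog's built-in gate primitives are used here only as a technology-independent stand-in, and the module and instance names are assumptions.

```verilog
// RTL view: behaviour only, no gates named.
module mux2_rtl(input wire a, b, sel, output wire z);
    assign z = sel ? a : b;
endmodule

// Gate-level view of the same function in SoP form:
//   z = (sel & a) | (~sel & b)
// Built-in primitives stand in for technology library cells.
module mux2_gates(input wire a, b, sel, output wire z);
    wire sel_n, t0, t1;
    not u_inv  (sel_n, sel);
    and u_and0 (t0, sel, a);
    and u_and1 (t1, sel_n, b);
    or  u_or   (z, t0, t1);
endmodule
```

Simulating both modules with the same stimulus is a miniature version of comparing RTL simulation against gate level simulation.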
6.2. Inputs and outputs of the ASIC synthesis flow
The outcome of synthesis is a gate level netlist, which is again in standard Verilog format. Netlists can be simulated as well, which is called Gate Level Simulation.
6.2.1. Register Transfer Level (RTL) Representation
RTL is the functional specification of the design given to logic synthesis, represented in an HDL.
· Register: Storage elements such as flip-flops and latches.
· Transfer: Transfer of data between input, output and registers using combinational logic.
· Level: Level of abstraction modeled using the HDL.
6.2.2. Constraints
The major objective of logic synthesis is to meet the optimization constraints specified by the designer. Timing, area and power targets are the optimization constraints.
· Timing constraints: The synthesis tool tries to meet the setup and hold timing constraints on the sequential logic in the design.
· Area constraints: Area constraints specify the maximum area for a design.
· Power constraints: Power constraints specify the maximum power consumption for the design.
Source: asic-soc