This is only a preview of the August 2022 issue of Silicon Chip. You can view 41 of the 104 pages in the full issue, including the advertisements. For full access, purchase the issue for $10.00 or subscribe for access to the latest issues.
IC Fabrication
AMD EPYC 7702 ES photographed by Fritzchens Fritz: www.flickr.com/photos/130561288@N04/49139472562/
from inception to cutting-edge technology
Over the last two issues, we’ve described the history of integrated
circuits (ICs), the manufacturing process, process nodes, wafer sizes
and EUV lithography. Along with EUV, another technology that is just
maturing and has fundamentally changed the way high-end ICs are
made is multi-chip modules. We shall now investigate that and other
cutting-edge chip technologies.
Part 3 – finFETs, GAAFETs, chip stacking & multi-chip modules – By Dr David Maddison
14
Silicon Chip
Australia's electronics magazine
siliconchip.com.au
IC technology is approaching the
physical limits of feature size –
ie, it is becoming almost impossible to make transistors smaller or
increase density. In an attempt to overcome this, finFETs were developed and
now gate-all-around (GAA) FETs are
coming into use. After covering those,
we will look at 3D ICs and chiplets.
As density is not improving as
quickly as it used to, multi-chip modules (MCMs) containing ‘chiplets’ are
becoming much more widespread.
Designs are no longer limited to what
can fit onto a single, reasonably-sized
silicon die.
FinFETs and beyond
A fin field-effect transistor (finFET)
is a 3D Mosfet in which the gate is
enhanced vertically to make a ‘fin’,
forming three surfaces where the gate
interacts with the channel, rather than
just one (see Fig.57).
This is helpful because the planar
device can be made no smaller due
to the scalability restrictions of a 2D
plane. Also, the fins have a larger surface area. FinFETs are smaller than and
have superior performance to planar
CMOS devices. They were first commercialised in the early 2010s and are
the dominant devices in the 14nm,
10nm and 7nm process nodes.
At the 5nm process node, undesired
variations in channel width in finFETs
can cause variability in behaviour and
the loss of carrier mobility. 3nm is
considered the limit of their usability.
Therefore, the industry is now moving
to a “gate-all-around” (GAA) technology in which the gate interacts with
the channel on all four sides.
GAAFETs and Moore’s Law
Fig.58: a cross-sectional image of an actual 2nm gate-all-around (GAA) device using nanosheets produced by IBM. This technology results in 333 million transistors per square millimetre. The cell height is 75nm, width is 40nm, individual nanosheets are 5nm high and separated by 5nm. The gate pitch is 44nm and gate length is 12nm. Source: IBM
Since about 2010, the rate of increase of transistors in a chip has fallen below the original prediction by Moore. Instead of doubling every two years, the count now doubles about every two
and a half years.
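To put that slowdown in perspective, the short sketch below compares the growth factor over a fixed period for the two doubling times mentioned above. The ten-year span and the function name are illustrative assumptions, not figures from the article.

```python
def growth_factor(years, doubling_period_years):
    """Factor by which the transistor count grows over `years`,
    assuming it doubles once every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)

# Illustrative ten-year span (an assumption chosen for round numbers)
span = 10
classic = growth_factor(span, 2.0)   # doubling every two years
current = growth_factor(span, 2.5)   # doubling every two and a half years

print(f"Doubling every 2.0 years: {classic:.0f}x in {span} years")
print(f"Doubling every 2.5 years: {current:.0f}x in {span} years")
```

Under these assumptions, the slower cadence gives a 16-fold increase over a decade rather than a 32-fold one, so the seemingly small change in doubling time halves the result after ten years.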
The physical limits of transistor
size with current technology are being
approached due to increasing source-to-drain leakage, limitations due to
the metals used in gates and limited
options for channel materials. The
channel is where charge carriers such
as electrons or holes flow between the
source and the drain.
In silicon, the smallest possible
gate size for a Mosfet was thought to
be about 7nm, although finFETs and
GAAs have somewhat lowered this
limit, perhaps to as low as 2nm for
the IBM GAA (Fig.58). Any smaller
and electrons can move between adjacent transistors by a process known
as quantum tunnelling. If that happened, a transistor could unexpectedly change its state.
By comparison, the diameter of a
silicon atom is around 0.2nm, so we
are discussing a structure of only about
10 to 35 atoms across. Presently, there
is no point in making transistors any
smaller in silicon than this.
The processor in an iPhone XS uses
7nm technology, and it was stated that
the active channel of a transistor gate
in it would be 7nm long, 7nm deep
and 20nm wide. Based on there being
5×10²² atoms/cm³ in a silicon crystal,
there would be about 49,000 atoms in
such a structure.
The apparent discrepancy between
the atomic radius of silicon and the
volumetric density is due to the way
the atoms are arranged in a crystal.
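The atom count above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (channel dimensions, atom density and the approximate 0.2nm atom diameter). The second function is our own naive comparison, not something from the article: it pretends each atom fills a little cube one diameter wide, which shows how large the apparent discrepancy is.

```python
ATOMS_PER_CM3 = 5e22      # silicon atom density quoted in the text
NM_PER_CM = 1e7           # 1 cm = 10,000,000 nm
ATOM_DIAMETER_NM = 0.2    # approximate silicon atom diameter quoted in the text

def atoms_in_block(length_nm, depth_nm, width_nm):
    """Atoms in a rectangular block of crystalline silicon."""
    volume_cm3 = (length_nm * depth_nm * width_nm) / NM_PER_CM ** 3
    return volume_cm3 * ATOMS_PER_CM3

def atoms_if_packed_as_cubes(length_nm, depth_nm, width_nm):
    """Naive count if every atom occupied a cube one diameter wide
    (an illustrative assumption, not how a real crystal is arranged)."""
    return ((length_nm / ATOM_DIAMETER_NM)
            * (depth_nm / ATOM_DIAMETER_NM)
            * (width_nm / ATOM_DIAMETER_NM))

channel = (7, 7, 20)  # length, depth, width in nm, as quoted in the text
crystal_count = atoms_in_block(*channel)
naive_count = atoms_if_packed_as_cubes(*channel)

print(f"From crystal density: about {crystal_count:,.0f} atoms")
print(f"Naive cube packing:   about {naive_count:,.0f} atoms")
```

The density-based figure matches the 49,000 atoms quoted, while the naive packing gives roughly 122,500. The gap reflects the fact that silicon's crystal lattice is relatively open, so the atoms sit further apart than a simple stack of touching spheres would suggest.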
According to Wikipedia (https://w.wiki/583Z), for the 5nm process node,
there are typically over 130 million
transistors per square millimetre.
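Those density figures can be turned into an average area per transistor, which gives a feel for how node names relate to real dimensions. The sketch below is a rough calculation using the two densities quoted in this article; the "equivalent square" is just the square root of the average area and includes wiring and spacing, so it is not a physical transistor dimension.

```python
import math

NM2_PER_MM2 = 1e12  # 1 mm = 1,000,000 nm, so 1 mm^2 = 1e12 nm^2

def footprint_nm2(transistors_per_mm2):
    """Average silicon area available per transistor, in nm^2."""
    return NM2_PER_MM2 / transistors_per_mm2

# Densities quoted in the article
densities = {
    "5nm node (Wikipedia figure)": 130e6,
    "IBM 2nm GAA (Fig.58)": 333e6,
}

for label, density in densities.items():
    area = footprint_nm2(density)
    side = math.sqrt(area)  # side of a square with the same area
    print(f"{label}: about {area:,.0f} nm^2 each, "
          f"or a square roughly {side:.0f}nm on a side")
```

On these numbers, each transistor at the "5nm" node occupies an average area equivalent to a square nearly 90nm across, far larger than the node name implies.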
Options for the future include “spintronics”, which exploits the spin state
of electrons, “tunnelling junctions”,
which use the quantum mechanical
process of quantum tunnelling, and
the use of nano-scale wires in the
channels.
One advantage of spintronic devices, according to Professor Ian Appelbaum (then at the University of Delaware), is that “silicon can now be used to perform many spin manipulations both within the space of thousands of devices and within the time of thousands of logic operations, paving the way for silicon-based spintronics circuits”. See Fig.59 and the video at https://vimeo.com/32338065
Another approach is to use different materials. Carbon nanotubes, molybdenum disulfide and zirconium oxide were used to make a transistor with a 1nm gate size in 2016. Ali Javey did that at the US Department of Energy’s Lawrence Berkeley National Laboratory – see Fig.60. By comparison, human hair is 50,000 nanometres thick.
More recently, Tian-Ling Ren at Tsinghua University in Beijing made a transistor with a gate length of 0.34 nanometres. The materials used were a titanium-palladium alloy for the metal contacts, molybdenum disulfide and hafnium oxide.
Fig.57: a comparison of planar, FinFET and gate-all-around FET devices. The gate operates at the interface shown in green. In the gate-all-around (GAA) structure, the channel may be constructed from either ‘nanowires’ or ‘nanosheets’ (shown here). STI stands for shallow trench isolation.
Fig.59: this early spintronics chip developed in 2007 contains 16 spintronics devices. It was built by Professor Ian Appelbaum and doctoral student Biqin Huang at the University of Delaware and Douwe Monsma of Cambridge NanoTech.
Fig.60: a possible transistor of the future by Lawrence Berkeley National Laboratory with a 1nm gate size. It is fabricated from a carbon nanotube, zirconium oxide and molybdenum disulfide.
Three-dimensional ICs
ICs can be 3D either by having many layers in a monolithic IC or by 3D packaging. In the latter, multiple dies are connected on top of one another using through-silicon vias (TSVs) or with solder bumps – see Figs.61 & 62.
For example, V-Cache is a technology from AMD that allows a cache memory die to be stacked directly on top of the CPU core die. This triples the CPU cache memory without altering the size of the die or shrinking the feature size.
This technology is related to chiplets, which we will discuss shortly.
Fig.61: schemes for 3D packaging as envisaged by AMD. TSV Pitch refers to the distance between the through-silicon vias used for vertical connectivity. IP refers to ‘intellectual property’ cores which are designs with a specific function produced by a third-party vendor. Uncore refers to parts of the CPU that are not part of the cores, such as cache memory and the memory controller. Source: Advanced Micro Devices (AMD)
Managing chip defects
Not all silicon chips are made equal.
When IC dies are tested, several things
can happen. The worst scenario is that
the die is unusable and must be discarded. Alternatively, a chip may not
work reliably at the maximum design
speed, but could work perfectly well at
a lower clock rate. Such chips are usually marked and sold at lower prices,
with a lower default clock.
Some people try to run such a chip above its rated clock speed to find a higher speed at which it will still operate reliably (“overclocking”), as manufacturer speed ratings are very conservative and chosen for maximum reliability.
In the case of memory chips, if parts of the memory are defective, they can be permanently locked out, and the chip is sold as having less memory. Similarly, in CPUs or GPUs with multiple computing cores, faulty cores can be permanently locked out, and they are sold as lower performance devices with fewer cores.
Figs.63(a) & (b): an example of how chip defects and differences in performance between different sections of a die are managed. Each of the twelve CPU cores spread across two dies has its own characteristics, such as maximum stable operating frequency and power consumption. Clocks are controlled and tasks are allocated based on a profile made for each chip section after manufacture.
In other words, most chips come
off the same production line. They
are then “binned” and sold according to the speed, power consumption
and other characteristics determined
during testing (usually before packaging, as there’s no point in packaging a
defective chip).
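The binning idea can be sketched as a simple sorting rule. The example below is a minimal illustration only: the product names, core counts and frequency thresholds in `HYPOTHETICAL_SKUS` are invented for the example and are not any manufacturer's actual criteria, which also take power consumption and other measurements into account.

```python
def bin_die(core_max_ghz, skus):
    """Return the first (best) product grade this die qualifies for.

    core_max_ghz: maximum stable clock per core in GHz (0.0 = dead core)
    skus: (name, cores_needed, min_ghz) tuples, ordered best to worst
    """
    for name, cores_needed, min_ghz in skus:
        usable = sum(1 for ghz in core_max_ghz if ghz >= min_ghz)
        if usable >= cores_needed:
            return name
    return "discard"

# Invented product grades for illustration only
HYPOTHETICAL_SKUS = [
    ("16-core", 16, 4.4),
    ("12-core", 12, 4.4),
    ("8-core", 8, 4.0),
]

# A made-up 16-core part with three dead cores and one slow core
tested = [4.9, 4.8, 0.0, 4.7, 4.6, 4.5, 0.0, 4.9,
          4.8, 4.7, 4.1, 4.6, 4.5, 0.0, 4.8, 4.7]

print(bin_die(tested, HYPOTHETICAL_SKUS))
```

Here, only 12 of the 16 cores meet the 4.4GHz threshold, so the part would be sold in the 12-core grade with the remaining cores locked out.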
Fig.63 shows some statistics we
gathered from a computer CPU built
with 16 cores but sold as a 12-core
device. Presumably, those four cores
were disabled because they either
didn’t work or weren’t up to spec.
The second-from-right column in
each image shows the maximum readings seen during testing. Core 0 has
run at a maximum of 5.15GHz, Core 3
at 5.10GHz, while Cores 4 and 9 only
ran up to 4.475GHz. After manufacturing and testing, these limits are programmed into the chip based on the
maximum speed that each core can
reliably operate at.
Also, note how Core 3 consumed up
to 7.5W while Core 8 has never drawn
more than 1.71W, even though it ran
up to 4.525GHz (88.7% as fast as Core
3). Mobile chips are binned for power
efficiency, whereas desktop chips like
this one are mainly chosen based on
their peak performance. Still, better
efficiency does let the CPU run cooler
under load.
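The comparisons above are easy to reproduce from the quoted readings. The sketch below uses only the peak values mentioned in the text (not the full Fig.63 table), and note that a core's peak power and peak clock were not necessarily recorded at the same moment, so the ratios are indicative only.

```python
# Peak readings quoted in the text (only the cores that are mentioned)
peak_ghz = {0: 5.15, 3: 5.10, 4: 4.475, 8: 4.525, 9: 4.475}
peak_watts = {3: 7.5, 8: 1.71}

def relative_speed(core, reference):
    """Peak clock of `core` as a fraction of the `reference` core's peak."""
    return peak_ghz[core] / peak_ghz[reference]

speed_ratio = relative_speed(8, 3)
power_ratio = peak_watts[8] / peak_watts[3]
spread_ghz = max(peak_ghz.values()) - min(peak_ghz.values())

print(f"Core 8 peak clock vs Core 3: {speed_ratio:.1%}")
print(f"Core 8 peak power vs Core 3: {power_ratio:.1%}")
print(f"Fastest-to-slowest spread:   {spread_ghz:.3f}GHz")
```

In other words, Core 8 reached about 88.7% of Core 3's peak clock while drawing under a quarter of its peak power, and the spread between the fastest and slowest cores mentioned is 0.675GHz.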
Core-to-core peak temperature variation is also high, with Core 3 recording a peak of 59.1°C, while the coolest core was Core 8, which only ever reached 44.8°C (it's also the one that uses the least power).
All of these variations are despite the fact that the masks for each core are identical, and they were made in the same manufacturing process at the same time.
Fig.62: how through-silicon vias (TSVs) in DRAM dies (top right) and solder bumps create a 3D package for a graphics processing unit. The whole assembly is mounted directly on a PCB. Source: Wikimedia user ScotXW (CC BY-SA 4.0)
Multi-chip modules (MCMs)
So far, we have mainly described
monolithic ICs that comprise only one
chip or die in a package.
“Multi-chip module” is a generic term. Wikipedia defines MCMs as electronic assemblies that come in various forms and involve multiple components, such as IC dies (chips) and discrete components, all held together on a substrate and contained within a package.
Fig.64: AMD’s EPYC SoC (system on a chip). Depending upon the model, there can be up to eight CCDs (core chiplet dies) plus one I/O chiplet. Each CCD comprises one or two CCXs (core complexes), depending on the generation. A CCX is a quad-core or octa-core CPU with a shared L3 cache. This can give a total of up to 64 cores. Source: AMD
Fig.65: a hybrid integrated circuit in the form of an operational amplifier, containing both discrete IC/transistor dies and thick-film resistors. According to the Wikipedia definition, it is a form of MCM, but we would refer to it as a hybrid IC. Source: Wikimedia user Mister rf (CC BY-SA 4.0)
Substrates may be of various forms,
such as printed circuit boards, ceramic
substrates or IC base plates with other
devices mounted on top. The entire
package assembly can be treated and
used as a component in the same manner as an IC.
Other terms for these MCM packages include “heterogeneous integration” and “hybrid integrated circuits”
(Fig.65). They are used to save space
and avoid designing customised ICs
because the desired functions can be
produced using separate off-the-shelf
components at a lower cost.
But there is no strict definition of
what an MCM is. We think it would
be clearer to reserve the term MCM for
assemblies containing monolithic ICs
and no other components and refer to
the other devices as hybrid circuits. So
that is the terminology we will use in
this article.
Earlier examples of MCMs include
IBM bubble memory (1970s), the IBM
3081 thermal conduction module
(1980s), superconducting multi-chip
modules (1990s) and the Intel Pentium
Pro (1995) – see Fig.66.
Fig.66: the Pentium Pro processor could be regarded as the first example of a consumer-level ceramic multi-chip module (MCM). It contains both a CPU die and a separate cache memory die.
Chiplets
A chiplet (called a “tile” by Intel) is an IC with defined functionality that is designed to be combined with and connected to other chiplets in a single MCM. Chiplets can come from a standard “library” of such devices, and can thus be combined in a modular fashion to produce the desired functionality. Even chiplets from different manufacturers can be used.
The use of chiplets in MCM devices is a way to dramatically reduce the cost of the design of large ICs. With a large enough library or catalog of chiplets, it would be possible to combine them to rapidly develop many custom applications, resulting in major cost savings.
One estimate is that using chiplets leads to a 70% reduction in design and development costs and time to produce a given device.
There are several advantages to using chiplets.
One is that smaller dies with fewer components generally have better yields (a higher percentage of functional devices after fabrication) than single larger dies with more components. It may thus be more economical to use two or more individual dies tied together than one larger one with the same overall functionality and number of components.
As chips get larger and larger, the yield drops naturally, as there is more likelihood of defects in larger devices. Sometimes it gets to the point that it becomes uneconomical to produce them. Chiplets are the most obvious way to overcome that.
Also, dies can be “mixed and matched” with different technology nodes, production processes, materials (eg, some chiplets of silicon and some of another semiconductor such as gallium arsenide) and manufacturers.
More advanced MCMs
This use of chiplets to make MCMs is a developing idea in the IC industry. An important aspect of using chiplets is how they are connected together in the package, either horizontally or vertically (ie, when chiplets are stacked on top of each other).
Individual chiplets are controlled and unified by input-output and communication controllers that coordinate the entire device as a single unified IC. See the section below on chiplet interconnect standards.
Another advantage of MCMs is that chiplets in the same device can be fabricated with various process nodes. An example would be using a mixture of 7nm and 10nm process nodes depending on performance and component density requirements, plus factors such as cost.
For example, a chip for integrated
connectivity such as USB, Wi-Fi, Ethernet or PCIe does not need the latest
technology, but a GPU core will. The
CPU tested to produce Fig.63 uses this
approach, with two 7nm chiplets each
with eight compute cores (two disabled in each, for a total of 12) plus an
12nm I/O chiplet that interfaces those
cores to the outside world.
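The way core counts follow from the chiplet arrangement can be captured in a tiny model. The class and field names below are our own invention for illustration. The first example uses the figures given for the CPU behind Fig.63; the second assumes eight cores in each of eight core chiplets, which is consistent with the 64-core maximum in the Fig.64 caption.

```python
from dataclasses import dataclass

@dataclass
class ChipletCPU:
    """Hypothetical description of a CPU built from identical core chiplets."""
    name: str
    core_chiplets: int
    cores_per_chiplet: int
    disabled_per_chiplet: int = 0

    @property
    def physical_cores(self):
        # Cores that exist in silicon, whether enabled or not
        return self.core_chiplets * self.cores_per_chiplet

    @property
    def enabled_cores(self):
        # Cores the buyer can actually use
        per_chiplet = self.cores_per_chiplet - self.disabled_per_chiplet
        return self.core_chiplets * per_chiplet

tested_cpu = ChipletCPU("CPU from Fig.63", core_chiplets=2,
                        cores_per_chiplet=8, disabled_per_chiplet=2)
largest_epyc = ChipletCPU("Eight-chiplet server part", core_chiplets=8,
                          cores_per_chiplet=8)

for cpu in (tested_cpu, largest_epyc):
    print(f"{cpu.name}: {cpu.enabled_cores} of "
          f"{cpu.physical_cores} cores enabled")
```

The same core chiplet design therefore covers both ends of the range; only the number of chiplets and the number of enabled cores change.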
A manufacturer can easily customise an MCM for different applications,
such as having more graphics processing chiplets for more graphics capability and fewer memory chiplets for one
application, or the opposite for another
application – see Fig.70.
Examples of MCMs that use chiplets
on the market include AMD’s Ryzen,
Ryzen Threadripper and EPYC CPUs
(see Fig.64) and soon, Intel’s Ponte
Vecchio (described in detail below).
One clever aspect is that AMD produces consumer (Ryzen), workstation (Threadripper) and server (EPYC)
CPUs using essentially identical core
dies (CCDs). Ryzen chips have one or
two CCDs totalling 6-16 cores, Threadripper chips have up to four CCDs for
up to 32 cores (later versions up to 64),
while EPYC chips have up to eight
CCDs for 8-64 cores.
Reusing the same chiplets saves a lot
of R&D time and money and makes the
end product more affordable.
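The yield advantage of small dies mentioned earlier can be illustrated with the simple Poisson yield model, in which the chance of a die having no defects falls exponentially with its area. This is a rough sketch under stated assumptions: the defect density and die sizes are invented for the example, chiplets are assumed to be tested before assembly so that only good ones are packaged, and packaging and interconnect costs are ignored.

```python
import math

def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of dies expected to be defect-free (Poisson yield model)."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)

def silicon_per_good_product(die_area_mm2, dies_per_product, defects_per_cm2):
    """Average wafer area (mm^2) consumed for each working product,
    assuming dies are tested first and only good ones are assembled."""
    good_fraction = poisson_yield(die_area_mm2, defects_per_cm2)
    return dies_per_product * die_area_mm2 / good_fraction

D0 = 0.1  # hypothetical defect density in defects per cm^2

monolithic = silicon_per_good_product(600, 1, D0)  # one 600mm^2 die
chiplets = silicon_per_good_product(150, 4, D0)    # four 150mm^2 chiplets

print(f"Yield of one 600mm^2 die:      {poisson_yield(600, D0):.1%}")
print(f"Yield of each 150mm^2 chiplet: {poisson_yield(150, D0):.1%}")
print(f"Silicon per good product: {monolithic:.0f}mm^2 (monolithic) "
      f"vs {chiplets:.0f}mm^2 (chiplets)")
```

With these assumed numbers, about 55% of the large dies work compared with about 86% of the small ones, so the chiplet version uses roughly 700mm² of wafer per working product instead of nearly 1100mm².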
Layout of MCM integrated circuits with chiplets
There are several possible physical
configurations in which chiplets can
be incorporated into a module. Some
are shown in Fig.67, in increasing levels of advancement.
(A) Shows four chiplets laid out side-by-side on an organic substrate such as a high-density PCB.
(B) Shows chiplets laid out side-by-side on a passive silicon interposer (see the description of interposer below). 2.5D refers to side-by-side chiplets with high interconnect densities to neighbouring chiplets.
(C) Shows chiplets mounted on an electrically active interposer. The active interposer may contain parts of the system, such as a platform controller hub (PCH).
(D) Shows chiplets connected via an active silicon bridge embedded in the package substrate. The bridge acts much like an interposer, but because it is embedded in the package substrate, the chip can be much smaller as it is level with the rest of the substrate material.
(E) Shows chiplets mounted directly on an active silicon base using a bumpless bonding system developed by TSMC. This is distinct from (C), in which the attachment is via wafer bumps.
Fig.68: details of an interposer showing internal connections in yellow on the lower diagram. TSV stands for through-silicon via; these are vertical interconnects fabricated into the silicon. The micro bumps and C4 (controlled collapse chip connection) bumps are connection pads.
An interposer (Fig.68) acts as an
interconnection between chiplets and
connects them to the external input/
output lines. An interposer can have a
higher wiring density than an organic
substrate.
Bumps are a type of connection
used on integrated circuits to eliminate wire bonding. In “wafer bumping” technology, solder spheres are
attached to the chip’s input/output
pads instead of wires.
Advantages include better electrical performance, lower inductance, greater current capacity, lower cost and a smaller footprint.
Fig.67: several manners in which chiplets can be laid out in a package, with a cross-sectional view at the bottom and plan view at the top. Original source: Jawad Nasrullah, Palo Alto Electron Inc (http://ieee-edps.com/archives/2021/c/1100nasrullah.pdf).
Fig.69: Intel’s Ponte Vecchio GPU package with multiple individual chiplets/tiles. HBM is high-bandwidth memory; ESF is enhanced SuperFin; EMIB is Intel’s embedded multi-die interconnect bridge; Tile is Intel’s name for a chiplet. Source: Intel
Diagram labels (Figs.69 & 70): HBM2, Compute Dies, Rambo Caches 10 ESF, Passive Die Stiffeners, Xe Link IO Tile, Foveros 3D Packaging, EMIB under passive die & HBM2; Graphics, Compute, I/O, AI, In-Package Memory, Media, DRAM.
Intel Ponte Vecchio
The Intel Ponte Vecchio (see Fig.69)
is an example of an advanced MCM
device in the form of a GPU (graphics
processing unit). It will initially be used in the USA’s Argonne National Laboratory's new ‘exascale’ supercomputer, Aurora, and for artificial intelligence, machine learning and graphics applications.
"Exascale" refers to a computing system capable of executing at least 10¹⁸ floating-point operations per second (≥1 exaFLOP).
Ponte Vecchio uses 63 ‘tiles’ (Intel’s
name for chiplets) in total; 47 active
tiles for computing functions and 16
for thermal management, with a total
of 100 billion transistors in a 77.5 ×
62.5mm package.
The device is partly fabricated using
“Intel 7”, which is their name for an
enhanced 10nm SuperFin fabrication
process. Some tiles use Intel 7, while
others are fabricated by TSMC using
their 7nm (N7) and 5nm (N5) nodes,
plus some others. For more information on Intel’s SuperFin technology, see the video at https://youtu.be/Y04yHqLKs4w
Note that Intel’s 10nm chips are
comparable to 7nm devices from
TSMC or Samsung because, as we
pointed out earlier, those figures no
longer correspond directly to physical feature size.
As mentioned earlier, mixing chiplets/tiles from different process nodes
and manufacturers is one of the advantages of MCMs.
The eight GPU tiles used in the device are manufactured by TSMC using their 5nm process, and each of those tiles contains 128 Intel Xe GPU cores or “compute units” for a total of 1024 vector units, 1024 matrix engines and 128 ray tracing units per device. Each device also has 64MB of L1 cache memory and 408MB of L2 cache.
Fig.70: Intel envisions a package made of standardised tile (chiplet) components with the combination adjusted to suit the needs of different users. Source: Intel
The GPU tiles, memory and other
tiles (eg, for I/O) are all mounted
on the “base tile”. The base tile is a
646mm² die with 17 layers. It includes
a “RAMBO” memory controller, voltage regulators, a PCIe 5.0 interface
and a CXL (Compute Express Link)
interface.
RAMBO (random access memory,
bandwidth optimised) uses Foveros
interconnection technology. RAMBO
uses novel SRAM (static random
access memory) and has four banks of
3.75MB memory groups for a total of
15MB per tile with eight tiles.
There is also up to 128GB of HBM2e memory (according to Hardware Times) or 64GB (according to Tom's Hardware). Possibly there will be different versions of the chip with different memory sizes – not all specifications of the device have yet been confirmed. The memory is contained in eight HBM2e (high bandwidth memory 2e) ‘stacks’, each eight dies high.
Chip development costs
According to Handel Jones, CEO of International Business Strategies Inc (Los Gatos, CA, USA), it costs US$40 million to design a 28nm chip, US$217 million to design a 7nm chip, US$416 million for a 5nm device and a future 3nm design is expected to cost US$590 million.
Chiplets in multi-chip modules (MCMs) are one way to reduce costs. The use of chiplets is expected to reduce the cost of new device elements because they can be produced as standard functional elements. Then, making a device means assembling standard chiplets together, perhaps with some custom fabrication work too.
Physically, chiplets are much like any other chip, but they are designed to interface with other chiplets. Essentially, they are modular elements or building blocks, selected from a library or catalogue of such devices.
Apart from chiplets, existing packaging solutions can integrate existing dies into existing packaging types. This includes 2.5D layouts (multiple dies inside the same package arranged in a planar or stacked configuration) or fan-out (dies placed on “redistribution layers” similar to circuit boards inside the package).
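The design-cost figures quoted in the “Chip development costs” panel are easier to compare as multiples. The sketch below does that, and then naively applies the 70% saving estimate mentioned in the chiplets section to the 5nm figure. That last step is purely illustrative: the estimate was not stated for any particular node, so the result should not be read as a real-world cost.

```python
# Design costs in millions of US dollars, as quoted in the panel
design_cost_musd = {"28nm": 40, "7nm": 217, "5nm": 416, "3nm": 590}

def cost_multiple(from_node, to_node):
    """How many times more a design costs at `to_node` than at `from_node`."""
    return design_cost_musd[to_node] / design_cost_musd[from_node]

nodes = list(design_cost_musd)
for older, newer in zip(nodes, nodes[1:]):
    print(f"{older} -> {newer}: {cost_multiple(older, newer):.2f}x")
print(f"28nm -> 3nm overall: {cost_multiple('28nm', '3nm'):.2f}x")

# Illustrative only: the article's 70% estimate applied to the 5nm figure
CHIPLET_SAVING = 0.70
with_chiplets = design_cost_musd["5nm"] * (1 - CHIPLET_SAVING)
print(f"5nm design with a 70% saving: about US${with_chiplets:.0f} million")
```

On these figures, a 3nm design costs nearly 15 times as much as a 28nm one, which is why reusing already-designed chiplets is so attractive.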
Ponte Vecchio's heat dissipation
is 600W with water cooling or 450W
with air cooling.
The entire surface area of all 47
active tiles in the Ponte Vecchio is
2330mm², or 3100mm² including the
thermal tiles. When fully packaged,
the area is 4844mm². The package has
a staggering 4468 pins.
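Several of the Ponte Vecchio figures quoted in this section can be cross-checked against each other. The sketch below uses only numbers from the text. The power density is a rough guide at best, since it simply divides the total heat dissipation by the active tile area and ignores the fact that tiles are stacked and that heat is not spread evenly.

```python
ACTIVE_TILES = 47
THERMAL_TILES = 16
PACKAGE_MM = (77.5, 62.5)        # package dimensions in mm
ACTIVE_TILE_AREA_MM2 = 2330      # combined area of the active tiles
POWER_W = {"water cooling": 600, "air cooling": 450}
HBM_STACKS = 8
DIES_PER_STACK = 8

total_tiles = ACTIVE_TILES + THERMAL_TILES
package_area_mm2 = PACKAGE_MM[0] * PACKAGE_MM[1]
mean_active_tile_mm2 = ACTIVE_TILE_AREA_MM2 / ACTIVE_TILES

def gb_per_hbm_die(total_gb):
    """Memory per die if the total is spread evenly over all HBM2e dies."""
    return total_gb / (HBM_STACKS * DIES_PER_STACK)

print(f"Total tiles: {total_tiles}")
print(f"Package area: {package_area_mm2:.2f}mm^2")
print(f"Average active tile: {mean_active_tile_mm2:.1f}mm^2")
for cooling, watts in POWER_W.items():
    density = watts / (ACTIVE_TILE_AREA_MM2 / 100)  # W per cm^2
    print(f"{cooling}: roughly {density:.1f}W/cm^2 of active silicon")
for total_gb in (128, 64):
    print(f"{total_gb}GB total: {gb_per_hbm_die(total_gb):.0f}GB per die")
```

The tile counts add up to the quoted 63, and the package dimensions give 4843.75mm², which rounds to the quoted 4844mm². The two reported memory sizes correspond to 2GB or 1GB per HBM2e die.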
Intel has devised two technologies
to allow the tiles to communicate with
each other. The first is their embedded multi-die interconnect bridge (EMIB),
and the second is Foveros die stacking packaging.
EMIB is a method to connect adjacent dies via a small embedded bridge
rather than the conventional, more
complicated method of connecting dies via a silicon interposer and
through-silicon vias (TSVs). For more on this, see the video titled “Intel EMIB Technology Explained” at https://youtu.be/mRQFJFmYMak
Foveros 3D die stacking packaging is an interconnection technology for vertical chip-to-chip bonding via microbumps. There is a video about this titled “Intel Foveros Technology Explained” at https://youtu.be/eMmCYqN6KSs
The Ponte Vecchio package is housed in a module, as shown in Fig.71.
Fig.71: Intel’s Ponte Vecchio GPU mounted on a PCB with the heat spreader removed. A large number of these modules would be used to construct an exascale supercomputer. Source: Intel
Chiplet interconnect standards
For chiplets to come into common
use, with the mixing of chiplets from
different manufacturers and fabrication processes, they will need to use
common connection standards.
In March 2022, Advanced Semiconductor Engineering, Inc (ASE),
AMD, Arm, Google Cloud, Intel Corporation, Meta (formerly Facebook),
Microsoft Corporation, Qualcomm
Incorporated, Samsung and TSMC
announced a standard for chiplet
interconnects called Universal Chiplet Interconnect Express or UCIe –
www.uciexpress.org
The objective is to have a single set
of standards (initially, UCIe 1.0), similar to that for PCIe expansion cards
(see Fig.72).
Predating UCIe, the Open Domain-Specific Architecture (ODSA) from the Open Compute Project Foundation was released in 2019 (see siliconchip.au/link/abef).
The objective was to “define an open
interface and architecture that enables
the mixing and matching of available
silicon die from different suppliers
onto a single SoC for data centre applications. The goal is to define a process
to integrate best-of-breed chiplets onto
a SoC”.
SoC stands for ‘system on a chip’. It
is unclear how or if this project relates
to UCIe, as no specific public information is available.
Fig.72: example packaging options from the UCIe 1.0 standard for chiplets.
Conclusion
MCM technology is very important at the moment. For example, it is
a key reason that AMD’s laptop and
desktop chips have been competitive
with Intel’s products over the last few
years. Intel is now using it too, as are
Apple (with the M1 Ultra) and Nvidia
(with the Hopper AI engine).
MCM technology is now entrenched
in the CPU market. It also appears that
AMD’s new line of high-end graphics
processors (RDNA3) will be based on
MCMs, and Nvidia may follow suit.
It probably won’t be long before all
but the most basic computer chips are
using MCM technology.