Intel Corporation, based in Santa Clara, California, is the world's largest manufacturer of semiconductor devices (microprocessors, memory devices, telecommunications support circuits, and computer applications). It was founded in 1968 by Gordon Moore and Robert Noyce. Here is the story of its CPUs.
The first microprocessor sold by Intel was the 4-bit 4004, in 1971. It was designed to work with three other chips: the 4001 ROM, the 4002 RAM, and the 4003 shift register. The 4004 performed the calculations, while the other chips were critical to making the system work. The 4004 was mainly used in calculators and similar devices, and was not intended to end up inside computers. Its maximum frequency was 740 kHz. The 4004 was followed by a similar processor, the 4040, an improved variant with extended performance and additional instructions.
The 4004 allowed Intel to make a name for itself in the microprocessor industry. To capitalize on the situation, Intel introduced a new line of 8-bit processors: the 8008 came in 1972, followed by the 8080 in 1974 and the 8085 in 1975. Although the 8008 was the first 8-bit processor produced by Intel, it was not as important as its predecessor or its successors. It was faster than the 4004 thanks to its ability to process 8-bit data, but its rather conservative frequency of between 200 and 800 kHz meant its performance was not entirely convincing. The 8008 was produced on a 10-micrometre process. The 8080 was more successful: it was essentially an 8008 moved to a 6-micrometre process, with new instructions added. This allowed the company to more than double the frequency: the best 8080 chips, released in 1974, ran at 2 MHz. The 8080 was integrated into many devices, which led many software developers, such as the then-young Microsoft, to focus on Intel processors.
The 8086 was the first x86 processor. This 16-bit chip could address 1 MB of memory using a 20-bit external address bus. The clock frequency was 4.77 MHz, fairly low considering that by the end of its career this processor reached 10 MHz. Thanks to its 16-bit design, it could handle two 8-bit instructions simultaneously. The 8086 introduced the first revision of the x86 ISA, which AMD and Intel still use today.
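The memory limits quoted throughout this story follow directly from address-bus width: an n-bit address bus can select 2^n byte addresses. A quick Python sketch, just to check the arithmetic against the figures for three early x86 CPUs:

```python
# Addressable memory grows as 2**(address-bus width).

def addressable_bytes(bus_width_bits: int) -> int:
    """Number of bytes reachable with a given address-bus width."""
    return 2 ** bus_width_bits

MB = 2 ** 20
GB = 2 ** 30

print(addressable_bytes(20) // MB)  # 8086, 20-bit bus  -> 1 (MB)
print(addressable_bytes(24) // MB)  # 80286, 24-bit bus -> 16 (MB)
print(addressable_bytes(32) // GB)  # 80386, 32-bit bus -> 4 (GB)
```

The same rule explains the later PAE figure: extending the address bus to 36 bits yields 2^36 bytes, or 64 GB.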
The 8086 was followed by several other processors based on the same 16-bit architecture. The first was the 80186. Intel moved various hardware components usually located on the motherboard into the CPU, such as the clock generator, the interrupt controller, and the timer. By integrating these components, the 80186 proved several times faster than the 8086. Intel also increased the frequency to gain more performance. The 80188 was a less expensive variant with a halved external data bus.
Introduced in the same year as the 80186, the 80286 was three times faster than the 8086 at the same frequency. It could handle 16 MB of memory through a 24-bit address bus. It was the first x86 with a memory management unit (MMU), which allowed it to manage virtual memory. Like the 8086, it had no floating-point unit (FPU): it relied on an x87 co-processor (the 80287). The maximum frequency reached 12.5 MHz.
The iAPX 432 was Intel's first attempt to move away from x86. From this design, the company expected performance several times higher than its x86 solutions. The processor, however, had no luck, due to large architectural problems. The x86 processors were already relatively complex, but the iAPX 432 brought CISC to a new level of complexity. Intel had to spread the design across two separate dies. The processor was also rather data-hungry and did not offer good performance without extremely high bandwidth. The iAPX 432 managed to exceed the 8080 and 8086 in some respects, but it took little for the x86 to come back on top.
Intel created its first RISC processor, the i960, in 1984. It was not designed as a competitor to the x86 processors; rather, it was a microcontroller for embedded applications. Internally, it was a 32-bit superscalar architecture that drew on the concepts of the Berkeley RISC project. The first i960 processors had a relatively low frequency, with the slowest model running at 10 MHz, but over the years the design improved and moved to more advanced processes that allowed it to approach 100 MHz. It supported 4 GB of protected memory. The i960 was used in military systems as well as in business systems.
The Intel 80386 was the first x86 processor with a 32-bit architecture. It debuted in 1985. One of the main advantages of this design was its 32-bit address bus, which allowed it to support up to 4 GB of memory. Though that was far more memory than systems used at the time, it should not be forgotten that limited RAM had often braked the performance of previous x86 processors.
Unlike with modern CPUs, at that time more RAM almost always translated into more performance. Intel also implemented several architectural enhancements that allowed the 80386 to exceed the 80286's performance, even with the same amount of RAM. The chip also supported virtual mode processing, which improved multitasking support.
Several versions came on the market, such as the well-known 80386SL, which had a 16-bit data bus. It still supported 4 GB of RAM, but the reduced bandwidth impacted performance.
In 1989, Intel tried once more to abandon the x86. It created a new RISC CPU named the i860. Unlike the i960, this project was designed as a high-performance solution to compete in the desktop market. Since Intel is still making x86 chips today, you already know how it went. The biggest problem was that the processor relied entirely on the compiler for its performance: the compiler was responsible for ordering instructions, at the time the program was written, in the sequence in which they would execute. This let Intel keep die size and complexity down, but it was almost impossible to order every instruction properly, from start to finish, at compile time. As a result, the CPU constantly stalled while trying to work around the problem.
The 80486 was, for many, the entrance door to the computer world. The key to its success was its high level of component integration. The 80486 was the first x86 CPU to offer an L1 cache. The first 80486 models had 8 KB and were made on a 1,000-nanometer process. Intel then moved to 600 nanometers, and the L1 cache doubled to 16 KB.
Intel also integrated the FPU into the CPU; until then it had been a separate calculation unit. By moving this component into the processor, latency was reduced considerably. The 80486 also used a faster FSB interface to increase bandwidth, and the core was improved to increase IPC. These changes unlocked high performance: high-end models were several times faster than the old 80386.
The first 80486 operated at 50 MHz, while the subsequent 600-nanometer models arrived at 100 MHz. Intel also presented a low-cost 80486 called the 80486SX, which had the FPU disabled.
The first Pentium came in 1993. It did not follow the previous naming scheme. Internally, the Pentium used an architecture known as P5, Intel's first superscalar x86 design. Although the Pentium was generally faster than the 80486 in every respect, its most important feature was the significantly improved FPU. The Pentium's FPU was more than 10 times faster than the 80486's. It was a big step forward, and this kind of capability gained further prominence in later years with the arrival of the Pentium MMX. That processor had an architecture identical to the first Pentium's, but it supported the new MMX SIMD instruction set, which dramatically increased performance.
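To give a sense of what SIMD instructions like MMX's do: a single instruction applies the same operation to several narrow values packed into one wide register. The sketch below is a conceptual pure-Python model, not real MMX code; the lane-wise wraparound add mirrors what MMX's packed byte add (PADDB) does in hardware in a single instruction.

```python
# Conceptual model of MMX-style packed arithmetic: eight 8-bit lanes
# live inside one 64-bit word and are operated on together.

def pack(lanes):
    """Pack eight 8-bit values into a single 64-bit integer."""
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & 0xFF) << (8 * i)
    return word

def unpack(word):
    """Split a 64-bit integer back into its eight 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]

def paddb(a, b):
    """Lane-wise 8-bit add with wraparound, modeled on MMX's PADDB."""
    return pack([(x + y) & 0xFF for x, y in zip(unpack(a), unpack(b))])

a = pack([1, 2, 3, 4, 5, 6, 7, 250])
b = pack([10, 10, 10, 10, 10, 10, 10, 10])
print(unpack(paddb(a, b)))  # [11, 12, 13, 14, 15, 16, 17, 4]
```

Note the last lane: 250 + 10 wraps around to 4, because each lane is isolated from its neighbors; doing this without SIMD hardware takes a loop over every element.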
Compared to the 80486, Intel increased the size of the L1 cache. The first Pentium contained 16 KB, while the Pentium MMX moved to 32 KB. Of course, these processors also operated at higher frequencies. The first Pentiums were built with 800-nanometer transistors and could reach 60 MHz, but later revisions moved to 250 nanometers and reached up to 300 MHz.
The Pentium was followed by the Pentium Pro, based on the P6 architecture. The new processor was considerably faster than the Pentium in 32-bit operations thanks to its out-of-order (OoO) design. The internal architecture was deeply revised to decode instructions into micro-operations, which were then executed on general-purpose execution units. It also used an extended 14-stage pipeline with additional decoding hardware.
As the first Pentium Pro processors were aimed at the server market, Intel extended the address bus to 36 bits and added PAE technology, which could support up to 64 GB of RAM. It was much more than the average user needed, but supporting more RAM was important for the server market.
The cache was also revised. The L1 cache was limited to two segmented 8 KB caches, one for instructions and one for data. To make up for the 16 KB deficit compared to the Pentium MMX, Intel placed between 256 KB and 1 MB of L2 cache on a separate chip attached to the CPU package. It was connected to the CPU via a back side bus (BSB).
Initially, Intel wanted to push the Pentium Pro into the consumer market, but then decided to limit it to the server world. It had several revolutionary features, but in performance terms it struggled against the Pentium and Pentium MMX. The older Pentiums were significantly faster with 16-bit instructions, and 16-bit software was still the most popular. The processor also did not support MMX, which let the Pentium MMX overtake the Pentium Pro in software optimized for that instruction set.
The Pentium Pro might have had a chance in the consumer market, but in addition to the issues mentioned, it was quite expensive due to the separate L2 cache chip. The fastest Pentium Pro ran at 200 MHz and was made on processes between 500 and 350 nanometers.
In 1997 it was the turn of the Pentium II, a processor born to remedy just about all the negative aspects of the Pentium Pro. The basic architecture was similar to the Pentium Pro's, and it continued to use a 14-stage pipeline, with several improvements to increase IPC. The L1 cache consisted of 16 KB for data and 16 KB for instructions.
Intel switched to cheaper cache chips, mounted on a larger package, to reduce production costs. It was an effective way to cut the price, but these memory modules were not able to operate at full CPU speed. The L2 cache ran at half the CPU frequency, but it was still enough to increase performance. Intel also added support for the MMX instruction set.
The Pentium II, code-named "Klamath" or "Deschutes" depending on the core, also found space in the server industry as the Xeon and the Pentium II Overdrive. The best models integrated 512 KB of L2 cache and ran at 450 MHz.
Intel was already developing the Netburst architecture, but it was not yet ready, so it introduced the Pentium III. The first processors, code-named Katmai, were pretty similar to the Pentium II because they used a slower L2 cache that operated at half the CPU frequency. The architecture did offer other major changes: several parts of the 14-stage pipeline were fused together, reducing it to 10 stages. Thanks to the updated pipeline and the increased frequency, the first Pentium III models exceeded their Pentium II counterparts by a small margin.
Katmai was produced with 250-nanometer transistors. By switching to 180 nanometers, Intel was able to greatly increase the Pentium III's performance. The new version, code-named Coppermine, moved the L2 cache inside the processor and halved its capacity (256 KB). As it worked at the processor's full frequency, performance increased.
Coppermine competed with AMD's Athlon in the gigahertz race. Intel attempted to produce a 1.13 GHz model but was forced to recall it after Tom's Hardware founder Dr. Thomas Pabst found it unstable. The last Pentium III core was called Tualatin. Moving to 130 nanometers pushed the frequency up to 1.4 GHz. It also doubled the L2 cache (512 KB).
P5 and P6, Celeron and Xeon
The Pentium years also saw the arrival of other product lines called Celeron and Xeon. These solutions used the same cores as the Pentium II or Pentium III, but with varying amounts of cache. The first Celerons were based on the Pentium II and had no L2 cache at all, which undermined their performance. Pentium III-based models had half the L2 cache disabled compared to their Pentium III counterparts.
This led to Celeron processors with the Coppermine core and only 128 KB of L2 cache; the subsequent models based on the Tualatin core had 256 KB. These half-cache derivatives were also known as Coppermine-128 and Tualatin-256. They ran at frequencies similar to the Pentium III in order to compete with AMD's Duron. Microsoft used a 733 MHz Celeron Coppermine-128 in the original Xbox. The first Xeons, meanwhile, offered more L2 cache: Pentium II-based Xeon chips contained up to 2 MB.
Before talking about the Pentium 4 and its Netburst architecture, it is important to dwell on the concept of a pipeline: the path along which instructions move through the core. Pipeline stages often carry out multiple operations, but sometimes they are dedicated to a single one.
By adding new hardware or splitting a stage into multiple segments, you can lengthen the pipeline. The pipeline can also be shortened by removing hardware or combining multiple stages into one. The length, or depth, of the pipeline has a direct impact on an architecture's latency, IPC, frequency, and throughput. Longer pipelines usually require greater bandwidth, but as long as the pipeline is kept fed, every stage remains busy. Processors with longer pipelines are usually able to run at higher frequencies.
The trade-off is much higher internal latency, as the data passing through must stop at each stage for a given number of clock cycles. Processors with a long pipeline therefore tend to have a lower IPC, which is why they rely on higher frequencies to increase performance. Over the years there have been successful processors based on both philosophies; neither approach is fundamentally wrong.
Netburst, P4 Willamette and Northwood
In 2000 the Netburst architecture was finally ready and arrived on the market in the form of the Pentium 4. The first version was called Willamette and lasted for about two years. The chip struggled to overcome the Pentium III. Netburst reached much higher frequencies (Willamette touched 2 GHz), but the 1.4 GHz Pentium III was faster in some operations, and AMD's Athlon processors enjoyed a good lead over the chip. The problem was that with Willamette Intel had lengthened the pipeline to 20 stages, believing it could push well beyond 2 GHz; power consumption and temperatures prevented it from achieving that goal. The situation improved with the 130-nanometer die shrink, known as Northwood, which climbed up to 3.2 GHz and doubled the L2 cache from 256 to 512 KB. Netburst's power consumption and temperature problems persisted, but Northwood managed to behave considerably better and compete more effectively with AMD's solutions.
On high-end models, Intel introduced Hyper-Threading to improve resource utilization in multitasking environments. Compared with today's implementations, however, the benefits were significantly smaller. Willamette and Northwood were also used to create Celeron and Xeon CPUs; as with the previous Celeron and Xeon generations, Intel reduced or increased the size of the L2 cache.
After Northwood it was Prescott's turn, a new design with many improvements. Produced at 90 nanometers, it allowed Intel to bring the L2 cache to 1 MB. Intel also introduced the new LGA 775 interface, with DDR2 support and an improved quad-pumped FSB. These changes guaranteed much higher bandwidth than Northwood, vital to feeding Netburst's performance. Prescott was also the first 64-bit Intel x86 processor.
Unfortunately it was a failure. Intel extended the pipeline,
bringing it to 31 stages. The company hoped to raise enough frequency to
cope with the longer pipeline, but was able to reach only 3.8 GHz.
Prescott became too hot and consumed excessively. Intel hoped that the
90-nanometer shift would alleviate the problem, but the greater density
made cooling more difficult.
Summing up the improvements and the cache, Prescott was, at best, on par with Northwood. At the same time, AMD's K8 processors moved to a smaller production process and reached higher frequencies, dominating the market.
The Netburst architecture was not designed for the mobile market. So in 2003 Intel created its first architecture for notebooks: the Pentium M. The chips were based on the P6 architecture, but they had a longer pipeline (12/14 stages). It was also the first time Intel adopted a variable-length pipeline. This meant that instructions could complete after 12 stages, but only if the information required by the instruction was already in the cache. If it was not, the instruction passed through two additional stages.
The Pentium M was built with 130-nanometer transistors and contained 1 MB of L2 cache. It reached up to 1.8 GHz while consuming just 24.5 watts. A subsequent revision, Dothan, moved to 90 nanometers. This allowed Intel to increase the L2 cache to 2 MB and improve IPC thanks to some core tweaks. The frequency increased to 2.27 GHz, with a slight increase in consumption (27 watts).
In 2005 everyone was focused on delivering the first dual-core processor for the consumer market. AMD announced the dual-core Athlon 64, but Intel got to market first by using a multi-chip module (MCM) containing two Prescott dies. The company named its first dual-core the Pentium D, codenamed Smithfield. The Pentium D was not very well received: it faced the same problems as Prescott. The heat and power consumption of the two dies limited the frequency to 3.2 GHz, and since the architecture was starved for bandwidth, Smithfield's IPC suffered because the throughput was split between the two cores. AMD's first dual-core CPU, by contrast, was built on a single die.
Smithfield was followed by Presler, built at 65 nanometers. It contained two Cedar Mill dies on an MCM. The smaller process allowed the processor to reduce temperature and consumption, and the frequency went up to 3.8 GHz. The Pentium D Presler had a 125-watt TDP, which later fell to 95 watts thanks to a new stepping. It also doubled the L2 cache, with 2 MB per die. Some high-end models offered Hyper-Threading, allowing the CPU to handle four threads simultaneously. All Pentium D processors supported 64-bit software and could use more than 4 GB of RAM.
Intel then decided to abandon the Netburst architecture and create a new one based on what had been done with the P6 and Pentium M projects. Thus was born the Core project. Like the Pentium M, it had a 12/14-stage pipeline, much shorter than Prescott's 31 stages.
Core proved highly scalable, covering mobile systems with TDPs as low as 5 watts and high-end 130-watt servers. Intel led the market with the Core 2 Duo and Core 2 Quad, but Core was also at the heart of Core Solo, Celeron, Pentium, and Xeon models. Each die integrated two cores, and quad-core designs used two dual-core dies on an MCM. Single-core versions had one core disabled. The L2 cache size ranged from 512 KB to 2 MB. Thanks to the new design, Intel was able to compete with AMD again. The Core architecture put Intel on the right path, and it is thanks to this breakthrough that the Atom processors were born.
The Bonnell architecture of the first Atom was not designed from scratch, but had its roots in the P5 architecture. The first die, Silverthorne, had a 3-watt TDP that allowed Intel to enter sectors that had previously been closed to it. The performance was not high, but the experience was good enough and, above all, cheap. Silverthorne was succeeded by Diamondville, with a lower frequency but 64-bit support. In the following years, Pineview, Cedarview, Silvermont, and Airmont followed in turn.
Nehalem, the first Core i7
With the Intel processor market in full ferment, the company went back to work on the Core architecture, creating Nehalem, a design with numerous improvements. The cache controller was redesigned and the L2 cache went down to 256 KB per core. That decision did not lower performance, as Intel added between 4 and 12 MB of L3 cache shared across all cores. Nehalem-based CPUs had between one and four cores and were made at 45 nanometers.
Intel also redefined the connections between the CPU and the rest of the system. The old FSB had been in use since the 1980s, so Intel replaced it with QuickPath Interconnect (QPI) on high-end systems and DMI elsewhere. This allowed Intel to move the memory controller (updated to support DDR3) and the PCIe controller into the processor. These changes sharply increased bandwidth while latency collapsed.
Once again, Intel extended the processor's pipeline, this time to 20-24 stages. The frequency did not rise, however, and Nehalem worked at frequencies similar to Core's. Nehalem was also the first Intel processor to implement Turbo Boost. Although the base clock of the best Nehalem chip was 3.33 GHz, the processor could accelerate to 3.6 GHz for short periods of time thanks to the new technology.
Nehalem also marked the return of Hyper-Threading. Thanks to this and many other changes, the design achieved performance up to two times higher than Core 2's under heavy loads that exploited many threads.
After Nehalem came Westmere, a die shrink to 32 nm. The basic architecture changed little, but Intel took advantage of the smaller die to add more components to the CPU. Instead of just four x86 cores, Westmere could hold up to 10 (Westmere-EX) and up to 30 MB of shared L3 cache.
Arrandale and Clarkdale, the mainstream mobile and desktop solutions, adopted a GPU integrated on the CPU package and produced at 45 nanometers. The HD Graphics core was similar to the GMA 4500, except for two additional EUs.
From the glorious Sandy Bridge to our day
With Sandy Bridge, Intel made such a clear step forward that more than a few enthusiasts still fondly remember the Core 2000 series. The pipeline was reduced to 14-19 stages, and a micro-op cache was implemented that could hold up to 1,500 decoded micro-ops, enabling instructions to bypass five stages when the requested micro-op was already cached. If it was not, the instruction had to pass through all 19 stages.
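A back-of-the-envelope model shows why the micro-op cache mattered. The hit rates below are invented for illustration; the stage counts are the ones quoted above.

```python
# Effective pipeline depth seen by the instruction stream: a hit in
# the micro-op cache skips the five decode stages.

FULL_DEPTH = 19   # stages when the micro-op must be decoded
SHORT_DEPTH = 14  # stages when it is served from the micro-op cache

def effective_depth(hit_rate: float) -> float:
    """Average stage count, weighted by micro-op cache hit rate."""
    return hit_rate * SHORT_DEPTH + (1 - hit_rate) * FULL_DEPTH

for hit_rate in (0.0, 0.5, 0.8):
    print(f"hit rate {hit_rate:.0%}: {effective_depth(hit_rate):.1f} stages")
```

The higher the hit rate, the closer the chip behaves to a 14-stage design, which also saves the power the decoders would have spent.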
The processor also brought other improvements, such as support for higher-performance DDR3 memory. The GPU was also integrated into the CPU die. The various subsystems were internally connected by a ring bus that allowed data exchange with very high bandwidth.
Intel also upgraded the integrated graphics. Instead of a single HD Graphics version across all CPU models, it created three versions. The HD Graphics 3000, the best of them, had 12 EUs running at up to 1.35 GHz, and also contained the Quick Sync transcoding engine. The HD Graphics 2000 was identical but had only six EUs. The low-end HD Graphics also had six EUs but lacked some value-added features.
Sandy Bridge was followed by Ivy Bridge, a "Tick+" in Intel's then-current "Tick-Tock" evolutionary cadence. Ivy Bridge's IPC was only slightly better than Sandy Bridge's, but there were other major enhancements that improved the platform.
The architecture adopted 22-nanometer three-dimensional transistors, which reduced power consumption significantly. The mainstream Sandy Bridge Core i7 had a 95-watt TDP, while the comparable Ivy Bridge went down to 77 watts. This improvement was of particular importance in the mobile world: Intel managed to create a quad-core mobile Ivy Bridge with a 35-watt TDP, whereas previously all Intel quad-core mobile CPUs had a TDP of at least 45 watts.
Intel used the reduced die size to expand the integrated GPU. The HD Graphics 4000 integrated 16 EUs, and a new graphics architecture improved the performance of every EU. Thanks to these changes, the HD Graphics 4000 provided performance up to 200% higher than its predecessor.
A year after Ivy Bridge came Haswell. It was, again, an evolution rather than a revolution. The AMD processors facing Sandy and Ivy Bridge were not fast enough to trouble the high end, so Intel did not have much interest in boosting performance. Haswell was about 10% faster than Ivy Bridge.
Even with Haswell, the company focused on energy efficiency and the integrated GPU. Haswell blended a voltage regulator into the processor, which allowed the CPU to manage power consumption better. The voltage regulator caused the CPU itself to produce more heat, but the Haswell platform as a whole was more efficient. To fight AMD's APUs, Intel put up to 40 EUs in the Haswell iGPU. The company also increased the bandwidth available to the fastest graphics chips by providing 128 MB of eDRAM (L4 cache), which dramatically improved their performance.
The architecture that followed Haswell took the name Broadwell. Designed for mobile systems, it debuted at the end of 2014 and used 14-nanometer transistors. The first Broadwell processor was called Core M, a dual-core with Hyper-Threading and a TDP of 3/6 watts.
Other Broadwell mobile processors arrived later, but on the desktop front Broadwell counted only two CPUs. The Core i7 had the fastest integrated graphics ever seen on an Intel desktop processor. It contained six subslices with 8 EUs each, for a total of 48 EUs. The GPU also had access to 128 MB of eDRAM (L4 cache), which solved the bandwidth problem that integrated chips usually ran into. In gaming tests, the GPU surpassed AMD's best APU and delivered good performance in modern titles.
In 2015, shortly after Broadwell, came Skylake. The CPU was faster, but the platform changes were more important. Skylake was the first consumer CPU compatible with DDR4, which is more efficient and offers greater bandwidth than DDR3. The Skylake platform brought several other enhancements, such as a new DMI interface, an updated PCIe controller, and broader I/O support.
Of course, Skylake also had a better integrated GPU. The most prominent model, the Iris Pro Graphics 580, was used in the R-series models and had 72 EUs and 128 MB of eDRAM (L4 cache). Most of the CPUs had a 24-EU GPU based on a graphics architecture similar to Broadwell's.
With Skylake and Kaby Lake, Intel ended the Tick-Tock development cadence in favor of a new "tick-tock-tock" model, referred to as Process-Architecture-Optimization. Intel will spend more time on a single production process before developing a new one, and as a result will also make larger architectural changes between processes.
Kaby Lake, therefore, can be seen as an optimized variant of Skylake. Intel used a new process called 14nm+, with various modifications to improve energy efficiency and performance. The architecture changed very little, but it supports DDR4 at 2400 MHz. Kaby Lake integrates an improved GPU, the HD Graphics 630, with support for the latest encoding and decoding codecs as well as 4K video playback.