
Department of Computer Science &


We feel greatly honoured to present this paper at Techno-Expert, conducted at the College of
Engineering, Bandera. We especially thank the IEEE for organizing this national-level
symposium. This paper presentation competition has helped us gain knowledge and has
given us a deep insight into the computer science field. Preparing this paper has led us to
take an interest in all recent developments in the computer science field.
Title: Advance research in

Topic: Core 2 Duo Processor Technology

Name of authors

Sampada V. Bawane
Phone: 9423100360
Postal ID: Jijaoo Girls Hostel, Government College of Engineering, Amravati.

Dipali R. Chawre
Phone: 9766575230
Postal ID: Jijaoo Girls Hostel, Government College of Engineering, Amravati.
• Index

1. Abstract
2. Introduction
3. Core Details
4. Development
5. Advantages
6. Disadvantages
7. Multi-Chip Module
8. Features
9. The 64-bit Advantage

• Abstract:-
The Core 2 brand refers to a range of Intel's consumer 64-bit dual-core and 2x2 MCM quad-core
CPUs with the x86-64 instruction set, based on the Intel Core microarchitecture, derived
from the 32-bit dual-core Yonah laptop processor. The 2x2 MCM dual-die quad-core CPU had two
separate dual-core dies (CPUs), next to each other, in one quad-core MCM package. The Core 2
relegated the Pentium brand to the mid-range market and reunified the laptop and desktop CPU lines.

The Core microarchitecture returned to lower clock speeds and improved processors' usage of
both available clock cycles and power compared with the preceding NetBurst microarchitecture
of the Pentium 4/D-branded CPUs. The Core microarchitecture provides more efficient decoding
stages, execution units, caches, and buses, reducing the power consumption of Core 2-branded
CPUs while increasing their processing capacity. Intel's CPUs have varied widely in power
consumption according to clock speed, architecture and semiconductor process, as shown in the
CPU power dissipation tables.

The Core 2 brand was introduced on July 27, 2006, comprising the Solo (single-core), Duo
(dual-core), Quad (quad-core), and Extreme (dual- or quad-core CPUs for enthusiasts)
branches. As of 2007, Intel Core 2 processors with vPro technology (designed for businesses)
include the dual-core and quad-core branches.
• Introduction:-

Diagram of a generic dual-core processor, with CPU-local Level 1 caches and a shared, on-die
Level 2 cache.

A multi-core CPU (or chip-level multiprocessor, CMP) combines two or more independent
cores into a single package composed of a single integrated circuit (IC), called a die, or of
several dies packaged together. A dual-core processor contains two cores, and a quad-core
processor contains four cores. A multi-core microprocessor implements multiprocessing in a
single physical package. A processor with all cores on a single die is called a monolithic
processor. Cores in a multi-core device may share a single coherent cache at the highest
on-device cache level or may have separate caches. The cores also share the same interconnect
to the rest of the system. Each "core" independently implements optimizations such as
superscalar execution, pipelining, and multithreading. A system with n cores is effective when
it is presented with n or more threads concurrently. The most commercially significant
multi-core processors are those used in personal computers and game consoles. In this context,
"multi" typically means a relatively small number of cores. However, the technology is widely
used in other technology areas, especially those of embedded processors, such as network
processors and digital signal processors.

The amount of performance gained by the use of a multi-core processor depends on the problem
being solved and the algorithms used, as well as their implementation in software. For so-called
"embarrassingly parallel" problems, a dual-core processor with two cores at 2 GHz may
perform very nearly as fast as a single core at 4 GHz. Other problems may not yield as
much speedup. All of this assumes that the software has been designed to take advantage
of the available parallelism. If it has not, there will be no speedup at all. However, the
processor will still multitask better, since it can run two programs at once, one on each core.
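
The dependence of speedup on how much of a program can run in parallel is often modelled with Amdahl's law. The paper does not name this law, so the sketch below is an illustration under that assumption rather than a result from it. The function name `amdahl_speedup` and the sample parallel fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when `parallel_fraction` of the
    single-core run time can be spread evenly across them (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads, from fully parallel to fully serial.
    for fraction in (1.0, 0.9, 0.5, 0.0):
        speedup = amdahl_speedup(fraction, cores=2)
        print(f"parallel fraction {fraction:.1f}: {speedup:.2f}x on two cores")
```

A fully parallel workload doubles in speed on two cores, matching the "embarrassingly parallel" case, while software with no parallelism gains nothing.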

• Core Details:-
Core is a pipelined architecture, where instructions move through a number of internal
stages between entering and leaving the processor. As an instruction exits a stage another can
enter, minimising the idle time for each internal component. Core has around fourteen stages
in its pipeline: as with most modern architectures, there are a number of complications,
such as early completion and out of order execution, which make it hard to define exactly how
many stages there are.
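
The benefit of pipelining can be seen with a simple cycle count. The sketch below assumes an idealised pipeline in which every stage takes one clock cycle and there are no stalls, which real processors do not achieve. The function names and the instruction count of 1000 are hypothetical. The stage count of fourteen is the approximate figure given above.

```python
def cycles_unpipelined(instructions: int, stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def cycles_pipelined(instructions: int, stages: int) -> int:
    """The first instruction needs `stages` cycles to fill the pipeline;
    after that, one instruction completes per cycle (ideal case, no stalls)."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


if __name__ == "__main__":
    n, k = 1000, 14  # hypothetical instruction count, roughly Core's stage count
    slow = cycles_unpipelined(n, k)
    fast = cycles_pipelined(n, k)
    print(f"unpipelined: {slow} cycles, pipelined: {fast} cycles")
    print(f"ideal speedup: {slow / fast:.2f}x")
```

For long instruction streams the ideal speedup approaches the number of stages, which is why keeping every stage busy matters.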

Intel's Core micro-architecture.

The front end of the machine fetches instructions and does preliminary analysis and
reconstruction work on them. Core is a four-wide machine, with some portions five or six wide,
meaning it can execute at least four instructions at once. That's wider than any previous x86
architecture. Internally, Core has its own microcode, and the first stage in dealing with x86
instructions is translating them to micro-ops in that microcode while working out which
instructions can be safely combined into single operations -- 'macrofusion'.

As with all chip designers, Intel spends a lot of time analysing software, looking for common
combinations of instructions -- for example, a mathematical comparison followed by a switch
to a different section of code depending on the result of that comparison. By fusing those two
x86 operations into a single micro-op, the chip can complete them much faster.
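
The idea of macrofusion can be sketched as a toy decoder. It assumes every instruction becomes exactly one micro-op unless a compare is immediately followed by a conditional jump. Real hardware applies further restrictions on which pairs may fuse, and these are not modelled here. The names `decode`, `COMPARE_MNEMONICS` and the sample program are hypothetical.

```python
from typing import List, Optional, Tuple

Instruction = Tuple[str, ...]  # mnemonic followed by its operands

COMPARE_MNEMONICS = {"cmp", "test"}


def is_conditional_jump(mnemonic: str) -> bool:
    """True for conditional jumps such as 'je' or 'jne', but not plain 'jmp'."""
    return mnemonic.startswith("j") and mnemonic != "jmp"


def decode(program: List[Instruction], fuse: bool = True) -> List[str]:
    """Turn a list of instructions into a list of micro-op labels.

    Simplification: one micro-op per instruction, except that a compare
    directly followed by a conditional jump is emitted as one fused micro-op.
    """
    micro_ops: List[str] = []
    i = 0
    while i < len(program):
        mnemonic = program[i][0]
        following: Optional[str] = program[i + 1][0] if i + 1 < len(program) else None
        if (fuse and mnemonic in COMPARE_MNEMONICS
                and following is not None and is_conditional_jump(following)):
            micro_ops.append(f"{mnemonic}+{following}")
            i += 2  # both instructions are consumed by one micro-op
        else:
            micro_ops.append(mnemonic)
            i += 1
    return micro_ops


if __name__ == "__main__":
    program: List[Instruction] = [
        ("mov", "eax", "1"),
        ("cmp", "eax", "ebx"),
        ("jne", "loop"),
        ("add", "ecx", "eax"),
    ]
    print(decode(program, fuse=False))
    print(decode(program, fuse=True))
```

With fusion enabled, the four sample instructions decode into three micro-ops, so there is one operation fewer to schedule and execute.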

Core also does 'microfusion', where it does something similar but for those occasions when a
single x86 instruction translates into multiple micro-ops. Where possible, the processor binds
two of those micro-ops together and treats them as one; again, this can reduce the number of
processing steps by around ten percent in some cases.

Once we've got streams of micro-ops rattling through the pipelines, considerable performance
gains can be achieved by spotting those instructions that'll take some time to complete and
starting them as early as possible.

Typically, these involve reads or writes to memory: if you know that ten steps down the
pipeline you'll need to load some information in, it's best to send the request out to the
relatively slow memory system as early in the pipeline as possible. Unfortunately, instructions
already in the pipeline may change the data at the memory location that you've preloaded,
making your version out of date by the time it comes into play.

Core copes with this by using prediction hardware that allows a read from memory to happen
even if there's a write already in progress, provided the predictor thinks that the write is
unlikely to cause a problem. Checking afterwards catches the times that this prediction is
wrong, when there's a relatively slow process of recovering the right information; however, on
balance the gains from guessing right outweigh the losses when it gets it wrong.

By the time instructions reach the end of the pipeline, they will have been operated on in any
order that the chip deems most efficient. It has a single unified scheduler that decides what
happens when, and that controls every execution unit on the chip.

• Development:-
While manufacturing technology continues to improve, reducing the size of single gates,
physical limits of semiconductor-based microelectronics have become a major design concern.
Some effects of these physical limitations can cause significant heat dissipation and data
synchronization problems. The demand for more capable microprocessors causes CPU
designers to use various methods of increasing performance. Some instruction-level parallelism
(ILP) methods like superscalar pipelining are suitable for many applications, but are inefficient
for others that tend to contain difficult-to-predict code.

Many applications are better suited to thread level parallelism (TLP) methods, and using
multiple independent CPUs is one common way to increase a system's overall TLP. A
combination of increased available space due to refined manufacturing processes and the
demand for increased TLP is the logic behind the creation of multi-core CPUs.

• Advantages:-
1. The proximity of multiple CPU cores on the same die allows the cache coherency
circuitry to operate at a much higher clock rate than is possible if the signals have to
travel off-chip. Assuming that the die can physically fit into the package, multi-core
CPU designs require much less Printed Circuit Board (PCB) space than multi-chip SMP
designs.
2. Also, a dual-core processor uses slightly less power than two coupled single-core
processors, principally because of the decreased power required to drive signals
external to the chip and because the smaller silicon process geometry allows the cores
to operate at lower voltages; such reduction reduces latency.
3. Furthermore, the cores share some circuitry, like the L2 cache and the interface to the
front side bus (FSB). In terms of competing technologies for the available silicon die
area, multi-core design can make use of proven CPU core library designs and produce a
product with lower risk of design error than devising a new wider core design. Also,
adding more cache suffers from diminishing returns.

• Disadvantages:-
1. In addition to operating system (OS) support, adjustments to existing software are
required to maximize utilization of the computing resources provided by multi-core
processors. Also, the ability of multi-core processors to increase application
performance depends on the use of multiple threads within applications.
2. Integration of a multi-core chip drives production yields down, and such chips are more
difficult to manage thermally than lower-density single-chip designs. Intel has partially
countered the first problem by building its quad-core designs from two dual-core dies,
each with a unified cache, in a single package; hence any two working dual-core dies can
be used, as opposed to producing four cores on a single die and requiring all four to
work to produce a quad-core.
3. From an architectural point of view, ultimately, single CPU designs may make better
use of the silicon surface area than multiprocessing cores, so a development
commitment to this architecture may carry the risk of obsolescence.
4. Finally, raw processing power is not the only constraint on system performance. Two
processing cores sharing the same system bus and memory bandwidth limit the real-world
performance advantage.
5. If a single core is close to being memory bandwidth limited, going to dual-core might
only give 30% to 70% improvement. If memory bandwidth is not a problem, 90%
improvement can be expected. It would be possible for an application that used two
CPUs to end up running faster on one dual-core processor, if communication between the
CPUs was the limiting factor.
• Multi-Chip Module (MCM):-

A Multi-Chip Module is a specialized electronic package where multiple integrated circuits
(ICs), semiconductor dies or other modules are packaged in such a way as to facilitate their
use as a single IC. The MCM itself will often be referred to as a "chip" in designs, thus
illustrating its integrated nature.

Multi-Chip Modules come in a variety of forms depending on the complexity and development
philosophies of their designers. These can range from using pre-packaged ICs on a small
printed circuit board (PCB) meant to mimic the package footprint of an existing chip package
to fully custom chip packages integrating many chip dies on a High Density Interconnection
(HDI) substrate.

Multi-Chip Module packaging is an important facet of modern electronic miniaturization and
micro-electronic systems. MCMs are classified according to the technology used to create the
HDI (High Density Interconnection) substrate.

• MCM-L - laminated MCM. The substrate is a multi-layer laminated PCB (Printed
circuit board).
• MCM-D - deposited MCM. The modules are deposited on the base substrate using
thin film technology.
• MCM-C - ceramic substrate MCMs, such as LTCC.

POWER5 MCM with four processors

• Features:-

Features and Benefits

Feature: Dual-core processing
Benefits: Two independent processor cores in one physical package run at the same frequency,
and share up to 6 MB of L2 cache as well as up to a 1333 MHz Front Side Bus, for
truly parallel computing.

Feature: Intel® Wide Dynamic Execution
Benefits: Improves execution speed and efficiency, delivering more instructions per clock
cycle. Each core can complete up to four full instructions simultaneously.

Feature: Intel® Smart Memory Access
Benefits: Optimizes the use of the data bandwidth from the memory subsystem to accelerate
out-of-order execution. A newly designed prediction mechanism reduces the time in-flight
instructions have to wait for data. New pre-fetch algorithms move data from system memory
into fast L2 cache in advance of execution. These functions keep the pipeline full, improving
instruction throughput and performance.

Feature: Intel® Advanced Smart Cache
Benefits: The shared L2 cache is dynamically allocated to each processor core based on
workload. This efficient, dual-core optimized implementation increases the
probability that each core can access data from fast L2 cache.

Feature: Intel® 64 Architecture
Benefits: Enables the processor to access larger amounts of memory. With appropriate 64-bit
supporting hardware and software, platforms based on an Intel processor supporting
Intel 64 architecture can allow the use of extended virtual and physical memory.

Feature: Execute Disable Bit
Benefits: Provides enhanced virus protection when deployed with a supported operating system.
The Execute Disable Bit allows memory to be marked as executable or non-executable,
allowing the processor to raise an error to the operating system if malicious code
attempts to run in non-executable memory, thereby preventing the code from infecting
the system.

• The 64-bit advantage:-

The Core 2 Duo, like the Pentium D before it, is based on Intel's Extended Memory 64
Technology (EM64T), called AMD64 by AMD and known generically as x86-64.
Basically it is the old x86 architecture (called "general purpose instructions" and commonly
referred to as the IA32 instruction set architecture (ISA)) with the following additions:

• Increased number of general purpose registers
• 64-bit addressing
• 128-bit (SSE, SSE2, SSE3) media instructions
• Improved physical and virtual memory management

The EM64T ISA includes twice as many general purpose registers as the old x86 design, and
all of them are twice as wide to support 64-bit addressing. The instruction pointer also
widens from 32 to 64 bits. Having more and wider general purpose registers means that memory
can be used much more efficiently and memory traffic can be minimized, which in turn allows
compilers to produce programs that run much faster on your machine.

64-bit addressing means that the physical memory limitation rises to 1 TB (that's 1024 GB)
from the 32-bit limit of 4 GB. The processor can also work with longer instructions. To really
notice this advantage, you have to stress the system to a degree that most desktop users don't
with current software, but as desktop applications demand more from processing hardware, this
advantage will become much more important.
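
The memory limits quoted above follow directly from the number of address bits. The short sketch below works in binary units. The 40-bit width is an inference from the 1 TB figure rather than something stated in the paper, and the function name `addressable_bytes` is hypothetical.

```python
GIB = 2 ** 30  # bytes in one binary gigabyte


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given width."""
    return 2 ** address_bits


if __name__ == "__main__":
    for bits in (32, 40):
        size_gib = addressable_bytes(bits) // GIB
        print(f"{bits}-bit addresses reach {size_gib} GB")
```

Every extra address bit doubles the reachable memory, so eight more bits multiply the 4 GB limit by 256.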

128-bit media instructions refer specifically to Intel's SSE, SSE2, and SSE3 (Streaming SIMD
-- Single Instruction Multiple-Data -- Extensions) technologies. These instructions are very
useful for working with large blocks of data, which benefits anyone who deals with a lot of
scientific data or high-performance media or anything that uses floating-point math.
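
The data layout behind a 128-bit media instruction can be modelled in a few lines. The sketch below packs four 32-bit floating-point values into 16 bytes and adds two such blocks lane by lane. It only illustrates the layout: Python processes the lanes one at a time, whereas an SSE instruction such as ADDPS handles all four in a single operation. The function names are hypothetical.

```python
import struct
from typing import Sequence

LANE_FORMAT = "<4f"  # four little-endian 32-bit floats = 128 bits


def pack_lanes(values: Sequence[float]) -> bytes:
    """Pack exactly four floats into one 16-byte (128-bit) block."""
    if len(values) != 4:
        raise ValueError("a 128-bit block holds exactly four 32-bit floats")
    return struct.pack(LANE_FORMAT, *values)


def add_packed(a: bytes, b: bytes) -> bytes:
    """Add two 128-bit blocks lane by lane, as a packed SIMD add would."""
    lanes_a = struct.unpack(LANE_FORMAT, a)
    lanes_b = struct.unpack(LANE_FORMAT, b)
    return struct.pack(LANE_FORMAT, *(x + y for x, y in zip(lanes_a, lanes_b)))


if __name__ == "__main__":
    left = pack_lanes([1.0, 2.0, 3.0, 4.0])
    right = pack_lanes([10.0, 20.0, 30.0, 40.0])
    total = add_packed(left, right)
    print(len(total) * 8, "bits:", struct.unpack(LANE_FORMAT, total))
```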

EM64T deals with both physical and virtual memory in a much more sensible manner than
x86, treating the entire virtual memory space as one unsegmented block and eliminating a lot of
translation layers from the process of addressing physical memory. Previously x86 would
segment virtual memory into small blocks for use with different programs and functions, but
this ended up being inefficient and was rarely used by software. EM64T eliminates that
inefficiency by letting the software choose how it will handle virtual memory.

• Conclusion :-
In conclusion, combining equivalent CPUs on a single die significantly improves
the performance of cache snoop operations. Put simply, this means that signals between
different CPUs travel shorter distances, and therefore those signals degrade less. These higher
quality signals allow more data to be sent in a given time period, since individual signals can
be shorter and do not need to be repeated as often. It is "a frequency limited processor with
additional support for ratio overrides higher than the maximum Intel-tested bus-to-core ratio."

• References:-