Pentium Architecture

The Pentium family of processors originated from the 80486 microprocessor. The term
"Pentium processor" refers to a family of microprocessors that share a common
architecture and instruction set. The first Pentium processors were introduced in 1993.
They ran at a clock frequency of either 60 or 66 MHz and had 3.1 million transistors. Some
of the features of the Pentium architecture are:
- Complex Instruction Set Computer (CISC) architecture with Reduced Instruction Set
  Computer (RISC) performance.
- 64-bit data bus.
- Upward code compatibility.
- The Pentium processor uses a superscalar architecture and hence can issue multiple
  instructions per cycle.
- Multiple Instruction Issue (MII) capability.
- The Pentium processor executes instructions in five stages. This staging, or pipelining,
  allows the processor to overlap multiple instructions so that it takes less time to
  execute two instructions in a row.
- The Pentium processor fetches the branch target instruction before it executes the
  branch instruction.
- The Pentium processor has two separate 8-kilobyte (KB) caches on chip, one for
  instructions and one for data. This allows the Pentium processor to fetch data and
  instructions from the cache simultaneously.
- When data is modified, only the data in the cache is changed. Memory data is
  changed only when the Pentium processor replaces the modified data in the cache
  with a different set of data.
- The Pentium processor has been optimized to run critical instructions in fewer clock
  cycles than the 80486 processor.
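The write-back behaviour of the data cache described in the list above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name WriteBackCache, the two-line capacity and the oldest-first replacement are hypothetical choices for illustration and do not describe the Pentium's real cache organization.

```python
class WriteBackCache:
    """Toy write-back cache: writes touch only the cache until a line is replaced."""

    def __init__(self, memory, capacity=2):
        self.memory = memory        # backing store: dict of address -> value
        self.capacity = capacity    # number of cache lines (assumption: tiny)
        self.lines = {}             # address -> [value, dirty flag]

    def _make_room(self, addr):
        # Nothing to do if the address is cached already or a line is free.
        if addr in self.lines or len(self.lines) < self.capacity:
            return
        # Replace the oldest line (assumed policy, for illustration only).
        victim = next(iter(self.lines))
        value, dirty = self.lines.pop(victim)
        if dirty:
            # Memory is updated only now, when modified data is replaced.
            self.memory[victim] = value

    def write(self, addr, value):
        self._make_room(addr)
        self.lines[addr] = [value, True]   # mark the line as modified

    def read(self, addr):
        if addr not in self.lines:
            self._make_room(addr)
            self.lines[addr] = [self.memory[addr], False]
        return self.lines[addr][0]


memory = {0: 10, 1: 11, 2: 12}
cache = WriteBackCache(memory, capacity=2)
cache.write(0, 99)
print(memory[0])   # memory still holds the old value
cache.read(1)
cache.read(2)      # forces replacement of the modified line for address 0
print(memory[0])   # memory now holds the modified value
```

A write-through cache would update memory on every write instead; the write-back scheme trades that bus traffic for the bookkeeping of a modified flag per line.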
Fig 35.1 Superscalar Architecture of Pentium

The Pentium processor has two primary operating modes:
1. Protected Mode - In this mode all instructions and architectural features are
available, providing the highest performance and capability. This is the recommended
mode that all new applications and operating systems should target.
2. Real-Address Mode - This mode provides the programming environment of the Intel
8086 processor, with a few extensions. Reset initialization places the processor in
real mode where, with a single instruction, it can switch to protected mode.
The Pentium's basic integer pipeline is five stages long, with the stages broken down as
follows:
1. Pre-fetch/Fetch : Instructions are fetched from the instruction cache and aligned in
pre-fetch buffers for decoding.
2. Decode1 : Instructions are decoded into the Pentium's internal instruction format.
Branch prediction also takes place at this stage.
3. Decode2 : Same as above, and microcode ROM kicks in here, if necessary. Also,
address computations take place at this stage.
4. Execute : The integer hardware executes the instruction.
5. Write-back : The results of the computation are written back to the register file.
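The benefit of overlapping the five stages listed above can be checked with a little arithmetic. The sketch below assumes an ideal single pipeline with no stalls, in which a new instruction enters every cycle; the function names and the two-letter stage labels are hypothetical and chosen only for this illustration.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]  # the five integer pipeline stages


def pipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed when one instruction enters the pipeline per cycle."""
    if num_instructions <= 0:
        return 0
    return num_stages + (num_instructions - 1)


def unpipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed when each instruction finishes before the next starts."""
    return num_instructions * num_stages


def timeline(num_instructions):
    """One text row per instruction, one column per clock cycle."""
    total = pipelined_cycles(num_instructions)
    rows = []
    for i in range(num_instructions):
        row = ["--"] * total
        for s, name in enumerate(STAGES):
            row[i + s] = name   # instruction i is in stage s during cycle i + s
        rows.append(" ".join(row))
    return rows


print(pipelined_cycles(2), unpipelined_cycles(2))
for row in timeline(3):
    print(row)
```

With two instructions in a row, the ideal pipeline needs 6 cycles instead of 10. Real programs fall short of this ideal whenever a stage has to wait, for example on a mispredicted branch.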
Fig 35.2 Pentium pipeline stages

Floating Point Unit:
There are 8 general-purpose 80-bit floating point registers. The floating point unit (FPU)
has 8 stages of pipelining, of which the first five are similar to those of the integer unit.
Since the possibility of error is greater in the FPU than in the integer unit, the FPU has an
additional error checking stage. The floating point unit is shown below.
Fig 35.3 Floating Point Unit
FRD - Floating Point Rounding
FDD - Floating Point Division
FADD - Floating Point Addition
FEXP - Floating Point Exponent
FAND - Floating Point And
FMUL - Floating Point Multiply
Architecture of Intel 80286
Key Features
- 16-bit data bus
- 24-bit non-multiplexed address bus
- Packaged in a 68-pin ceramic pack
- 2^24 = 16 MByte of physical memory accessibility
Intel 80386 - A 32-bit Microprocessor with Memory Paging Facility

Intel 80386 is a logical extension of the 80286 microprocessor. The basic architecture of
80386 is given here.
Fig. 33.1 Basic architecture of 80386 microprocessor

Features of 80386:
- More highly pipelined than the 80286
- Instruction fetching, instruction decoding, instruction execution and memory
  management are all carried out in parallel
- 32-bit data bus
- 32-bit non-multiplexed address bus
- 2^32 = 4 Gigabytes of physical memory
- 2^46 = 64 Terabytes of virtual memory
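The memory sizes quoted in these notes all follow from the number of address bits: n bits address 2^n bytes. The helper below is a small sketch for checking such figures; the function name address_space and the unit labels are hypothetical.

```python
UNITS = ["Byte", "KByte", "MByte", "GByte", "TByte", "PByte"]


def address_space(address_bits):
    """Return the size addressed by the given number of bits as a readable string."""
    size = 1 << address_bits            # 2 ** address_bits bytes
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024                   # step up to the next binary unit
        unit += 1
    return f"{size} {UNITS[unit]}"


for bits in (20, 24, 30, 32, 46):
    print(bits, "bits ->", address_space(bits))
```

For example, 24 bits give the 16 MByte physical space of the 80286, and 32 and 46 bits give the 4 GByte physical and 64 TByte virtual spaces of the 80386.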
Instruction set compatibility :
Instruction sets of Intel microprocessors have upward compatibility (for example, a program
Fig 32.1 Basic Architecture of 80286

Memory Bank
The memory of the 80286 is set up as an odd bank and an even bank, just as it is for the 8086.
The even bank is enabled when A0 is low and the odd bank is enabled when BHE(bar) is low.
To access an aligned word, both A0 and BHE(bar) will be low.
Fig 32.2 Memory banks in 80286
Memory Addressing in 80286

1. Real Addressing Mode - This works just as in the 8086. The address is 20 bits wide and is
formed from a 16-bit segment and a 16-bit offset. When the 80286 is hardware reset, it
automatically enters real address mode.
2. Protected Virtual Addressing Mode (PVAM) - This mode provides 1 GByte of virtual memory
and 16 MByte of physical memory. The address is 24 bits wide. To enter PVAM, the PE bit of
the Machine Status Word (MSW, labelled Processor Status Word in Fig 32.3) is set with the
LMSW instruction.
Fig 32.3 Load Processor Status Word
PE - Protection Enable
MP - Monitor Processor Extension
EM - Emulate Processor Extension
TS - Task Switch
Hardware reset is the only way to leave protected mode.
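As a rough illustration of the two points above, the sketch below forms a real-mode address from a segment and an offset and decodes the four status bits listed for Fig 32.3. The function names are hypothetical, the bit positions (PE in bit 0 up to TS in bit 3) follow the standard 80286 layout rather than anything stated in these notes, and behaviour for sums that overflow 20 bits is not modelled.

```python
# Bit positions of the status-word flags (assumed standard 80286 layout).
MSW_BITS = {"PE": 0, "MP": 1, "EM": 2, "TS": 3}


def real_mode_address(segment, offset):
    """Shift the 16-bit segment left by 4 bits and add the 16-bit offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def decode_msw(msw):
    """Return each flag of the status word as True (set) or False (clear)."""
    return {name: bool((msw >> bit) & 1) for name, bit in MSW_BITS.items()}


print(hex(real_mode_address(0x1234, 0x0010)))
print(decode_msw(0b0001))   # only PE set: protected mode enabled
```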
80286 Memory Management Scheme
Memory is organized into logical segments. Segment size can be anywhere between 1 Byte and
64 KByte. All 24 address pins are active and 16 MByte of physical memory is available.
Descriptor
A descriptor is an 8-byte quantity. Each segment has a descriptor. There are two main types of descriptor:
- Segment Descriptor
- System Control Descriptor
Format of a Descriptor
Fig 32.4 Descriptor Format

Access Rights Byte Definition
Bit | Name | Meaning
7 | Present (P) | 1 - Yes, 0 - No
6-5 | Descriptor Privilege Level (DPL) | 0 to 3
4 | Segment Descriptor (S) | 1 - Segment, 0 - Control
For a segment descriptor, i.e. for S = 1, bits 3-0 have the following meaning:
Bit | Name | Meaning
3 | Executable (E) | 0 - Data, 1 - Code
2 | Expansion/Conforming (ED/C) | If code, Conforming: 1 - Yes, 0 - No. If data, Expand down: 1 - Yes, 0 - No (normal case)
1 | R/W | If code, Readable: 1 - Yes, 0 - No. If data, Writable: 1 - Yes, 0 - No
0 | Accessed (A) | 1 - Accessed, 0 - Not accessed
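The access rights byte can be unpacked with a few shifts and masks. The following is a minimal sketch, assuming the standard 80286 bit positions (P in bit 7, DPL in bits 6-5, S in bit 4, and the type bits in bits 3-0); the function name and dictionary keys are hypothetical.

```python
def decode_access_rights(byte):
    """Split an 80286 access rights byte into its named fields."""
    fields = {
        "present": bool(byte & 0x80),     # bit 7
        "dpl": (byte >> 5) & 0x3,         # bits 6-5
        "segment": bool(byte & 0x10),     # bit 4: 1 = segment, 0 = control
    }
    if fields["segment"]:
        is_code = bool(byte & 0x08)       # bit 3: 1 = code, 0 = data
        fields["kind"] = "code" if is_code else "data"
        if is_code:
            fields["conforming"] = bool(byte & 0x04)   # bit 2
            fields["readable"] = bool(byte & 0x02)     # bit 1
        else:
            fields["expand_down"] = bool(byte & 0x04)  # bit 2
            fields["writable"] = bool(byte & 0x02)     # bit 1
        fields["accessed"] = bool(byte & 0x01)         # bit 0
    return fields


print(decode_access_rights(0x9A))   # present, DPL 0, readable code segment
print(decode_access_rights(0xF2))   # present, DPL 3, writable data segment
```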
Descriptors are contained in a descriptor table. There are two categories of descriptor table:
global and local. A system has only one global descriptor table or GDT. A local descriptor table
or LDT is set up in the system for each task or closely related group of tasks. Each task can have
its own descriptor table and memory area defined by the descriptors in it.
Accessing Segments
The 80286 microprocessor keeps the base addresses and limits of the descriptor tables currently in
use in internal registers. These registers are the local descriptor table register (LDTR) and the
global descriptor table register (GDTR). A descriptor in memory is addressed by adding the index
field of the segment selector, multiplied by 8 (the descriptor size), to the table base held in one of
these registers. The descriptor contains the base address of the segment, which, when added to
the offset in the virtual address, points to the required memory location.
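The lookup described above can be sketched in a few lines. This is a simplified model, not the processor's actual mechanism: the descriptor tables are plain Python lists indexed by the selector's index field, privilege checks are omitted, and the names Descriptor, split_selector and translate are hypothetical. The selector layout used (index in bits 15-3, table indicator in bit 2, requested privilege level in bits 1-0) is the standard 80286 format.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int    # 24-bit segment base address
    limit: int   # highest valid offset within the segment


def split_selector(selector):
    """Return (index, table indicator, requested privilege level)."""
    return selector >> 3, (selector >> 2) & 1, selector & 3


def translate(selector, offset, gdt, ldt):
    """Map a virtual address (selector:offset) to a 24-bit physical address."""
    index, table_indicator, _rpl = split_selector(selector)
    table = ldt if table_indicator else gdt   # TI = 1 selects the LDT
    descriptor = table[index]                 # in memory: table base + index * 8
    if offset > descriptor.limit:
        raise ValueError("offset exceeds segment limit")
    return (descriptor.base + offset) & 0xFFFFFF


gdt = [Descriptor(0, 0), Descriptor(0x100000, 0xFFFF)]
ldt = [Descriptor(0x020000, 0x0FFF)]
print(hex(translate(0x0008, 0x1234, gdt, ldt)))   # GDT entry 1
print(hex(translate(0x0004, 0x0010, gdt, ldt)))   # LDT entry 0
```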
Accessing a Segment of Higher Privilege Level
Tasks normally operate at the lowest privilege level. Usually, segments at a lower privilege level
are not allowed to access segments at a higher privilege level directly. However, a lower level
segment can access a higher level segment indirectly through a Gate Descriptor. The details of a
gate descriptor are given below.
Fig 32.5 Privilege Level

Gate Descriptor Format
Fig 32.6 Gate Descriptor Format
Name | Value | Description
Type | 4 | Call gate
Type | 5 | Task gate
Type | 6 | Interrupt gate
Type | 7 | Trap gate
P | 0 | Descriptor contents are NOT valid
P | 1 | Descriptor contents are valid
DPL | 0-3 | Descriptor privilege level
Word Count | 0-31 | Number of words to copy from the caller's stack to the called procedure's stack. Only used with call gates.
Destination Selector | 16-bit Selector | Selector to target code segment (call, interrupt, trap gates), or selector to target task state segment (task gate)
Destination Offset | 16-bit Offset | Entry point within the target code segment
Task Switching and Task gates

Each task in a PVAM system has a 22-word task state segment (TSS) associated with it. A TSS
holds copies of all registers and flags, the selector for the task's LDT, and a link to the TSS of the
previously executing task.
Descriptors for each task state segment are kept in the global descriptor table. A task register
(TR) in the 80286 holds the selector and the task state segment descriptor for the currently
executing task. The load task register (LTR) instruction can be used to initialize the task register
to the task state segment for a particular task. During a task switch the task register is
automatically loaded with the selector and descriptor for the new task.
Methods of Task Switching
1. A long jump or call instruction contains a selector that points to a task state
segment descriptor.
2. An IRET instruction is executed with the nested task (NT) flag set.
3. The selector in a long jump or call instruction points to a task gate.
4. An interrupt occurs and its vector points to a task gate descriptor.
80286 Interrupt Handling
Real addressing mode has 256 interrupts, with types 0-255. Each interrupt vector takes 4 bytes, so
1 KByte of memory has to be reserved for the interrupt vector table.
PVAM also has 256 interrupts, but they are not assigned a fixed memory area. The interrupt
descriptor table can be anywhere in physical memory. Its base address is stored in the interrupt
descriptor table register (IDTR). The descriptor for a particular interrupt is located at:
(Interrupt Type * 8) + IDTR base address
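The two addressing rules above differ only in the entry size and the table base. A small sketch, with hypothetical function names and an arbitrary example base address:

```python
def real_mode_vector_address(interrupt_type):
    """Real mode: 4-byte vectors in a table that starts at address 0."""
    if not 0 <= interrupt_type <= 255:
        raise ValueError("interrupt type must be 0-255")
    return interrupt_type * 4


def pvam_descriptor_address(idtr_base, interrupt_type):
    """PVAM: 8-byte descriptors in a table located at the IDTR base address."""
    if not 0 <= interrupt_type <= 255:
        raise ValueError("interrupt type must be 0-255")
    return idtr_base + interrupt_type * 8


print(hex(real_mode_vector_address(0x21)))
print(hex(pvam_descriptor_address(0x020000, 0x21)))   # example base, chosen arbitrarily
print(256 * 4)   # size of the real-mode table in bytes (1 KByte)
```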
Use of Translation Look-aside Buffer (TLB) in 80386

It is cumbersome and time consuming to calculate the physical address from the linear address
for every memory access. A Translation Look-aside Buffer (TLB) simplifies the process.
The TLB is a page table cache that stores the 32 most recently accessed page table entries.
The paging unit receives a 32-bit linear address from the segmentation unit. The upper 20
bits of the linear address are compared with all 32 entries in the translation look-aside buffer
(TLB) to check for a match. If an entry matches, the 32-bit physical address is calculated
from the matching TLB entry and placed on the address bus.
Fig. 34.1 TLB organization in 80386

Structure of TLB:
The TLB has 4 sets of eight entries each. Each entry consists of a TAG and a DATA part. Tags are
24 bits wide. They contain the upper 20 bits of the linear address, a valid bit and three attribute
bits. The DATA portion of each entry contains the upper 20 bits of the physical address.
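The lookup can be sketched as follows. This is a deliberately simplified model, not the 80386 hardware: it treats the 32 entries as one fully associative table instead of 4 sets of eight, replaces the oldest entry when full (an assumed policy), and ignores the valid and attribute bits. The class name TinyTLB is hypothetical.

```python
from collections import OrderedDict


class TinyTLB:
    """Maps the upper 20 bits of a linear address to the upper 20 bits of a physical address."""

    def __init__(self, capacity=32):
        self.capacity = capacity
        self.entries = OrderedDict()   # tag (linear page) -> data (physical frame)

    def insert(self, linear_page, physical_frame):
        if linear_page not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # drop the oldest entry (assumption)
        self.entries[linear_page] = physical_frame

    def translate(self, linear_address):
        page = linear_address >> 12        # upper 20 bits: compared with the tags
        offset = linear_address & 0xFFF    # lower 12 bits: passed through unchanged
        frame = self.entries.get(page)
        if frame is None:
            return None                    # miss: the page tables must be walked
        return (frame << 12) | offset


tlb = TinyTLB()
tlb.insert(0x12345, 0x00ABC)
print(hex(tlb.translate(0x12345678)))   # hit
print(tlb.translate(0x54321000))        # miss
```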
Fig. 34.2 Structure of TLB

Introduction to Intel 80486:
The 80486 DX from Intel is the first 32-bit microprocessor to have a built-in floating point
unit. It retained the complex instruction set of the 80386 but introduced more pipelining for
speed enhancement. The 80486 has five stages of pipelining. Two of the five stages are used
for decoding the complex instructions of the 80486 architecture. The 80486 is also the first
of the x86 processors to have an on-chip cache. This 8 KByte cache is a
unified data and code cache and acts on physical addresses.
Note: The 80486 SX does not have a floating point unit.
32-bit address lines: (A2 - A31, BE0 - BE3)
32-bit data lines: (D0 - D31)
In February 1990, IBM introduced the RS/6000 family, based on the POWER architecture and running the UNIX operating system.
PowerPC was the second generation of the POWER architecture. It has a Reduced Instruction Set Computer (RISC) architecture. A RISC
architecture tries to keep the processor as busy as possible. Salient features of RISC architecture are:
- Fixed-length instructions (4-byte instructions), which allow a single decoding mechanism
- Mostly single-cycle instruction execution
- Fewer instructions
PowerPC was created in 1991 by the Apple-IBM-Motorola alliance. Originally intended for personal computers, PowerPC CPUs have
since become popular embedded and high-performance processors as well. PowerPC is largely based on, and compatible with, the
POWER architecture. Design features of PowerPC are as follows:
- Broad range of implementations
- Simple processor design
- Superscalar architecture
- Multiprocessor features
- 64-bit architecture
- Support for operation in both big-endian and little-endian mode. PowerPC can switch from one mode to the other at run
  time.
- Separate set of floating point instructions for
- Separate set of Floating Point Registers (FPRs) for floating-point instructions
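The difference between the two byte orders mentioned in the list above is easy to see by laying out one 32-bit value both ways. The sketch below uses only Python's built-in int.to_bytes and int.from_bytes; the function name byte_layout is hypothetical, and the code shows the byte ordering itself, not how a PowerPC processor switches modes.

```python
def byte_layout(value, width=4):
    """Return the bytes of value in (big-endian, little-endian) order as hex strings."""
    big = value.to_bytes(width, byteorder="big")         # most significant byte first
    little = value.to_bytes(width, byteorder="little")   # least significant byte first
    return big.hex(" "), little.hex(" ")


big, little = byte_layout(0x12345678)
print("big-endian:   ", big)
print("little-endian:", little)

# Reading the same bytes back with the wrong byte order gives a different number.
raw = (0x12345678).to_bytes(4, byteorder="big")
print(hex(int.from_bytes(raw, byteorder="little")))
```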
The Motorola PowerPC 601 was the first PowerPC. A few of its features were:
1. 64-bit microprocessor
2. 32-bit address lines
3. Can handle integer data of 8, 16 and 32 bits
4. RISC architecture with 4-byte instruction length
5. The PowerPC 601 has virtual memory addressing of 4 petabytes.
Apart from the changes to the instruction set, the most significant changes in PowerPC were in the memory model and the memory
management definition. In the POWER Architecture, the processor did not maintain data memory consistent with either I/O
accesses or instruction fetches. Software had to manage memory consistency for both these areas. Before copying an area of
memory to disk, software had to ensure that any modified copies of the memory area that were in the data cache had been written
to main memory. Before starting a read from disk, software had to ensure that the data cache did not contain a copy of any part of
the memory area, and software had to invalidate any copy of the memory area in the instruction cache before restarting the program
that requested the operation. POWER processors always accessed main memory through the caches.
PowerPC memory model, however, provides greater flexibility. It implements processor-enforced data memory consistency, relieving
software of the responsibility for the consistency of memory with respect to I/O operations. The model allows speculative access to
any page unless it has an attribute indicating that it contains I/O or it exhibits other volatile characteristics. It also makes it possible
to map I/O into the main memory space.
As in the POWER memory model, the PowerPC memory model requires software to maintain instruction memory consistent with
data memory. Programs that modify or generate instructions must ensure that cached copies of a memory area containing the new
instructions are consistent with the main memory before attempting to execute those instructions.
Fig 36.1 Branch Processing Unit of PowerPC

The Branch Processing Unit (BPU) looks at the lower four instructions in the instruction queue to pick out a branch instruction in advance.
The branch instruction is analyzed, and the next instruction is fetched and executed up to the write-back stage. With this, the branch takes
a single cycle. A branch instruction has a Jump Prediction Bit associated with it, which tells whether the jump is likely to be taken or not. If
a jump is predicted, new instructions may be brought in for the entire instruction queue. Later, if the prediction turns out to be
true, execution continues normally and there is a considerable performance gain. However, if the branch prediction
turns out to be false, then we have something called Branch Folding. In branch folding, all instructions executed after the prediction
are discarded and execution resumes just after the branch instruction. Instruction cycles are lost in this case.
The PowerPC Architecture permits a range of implementations from low-cost controllers through high-performance processors. It
allows the implementation of processors targeted for desktop and notebook systems, yet it contains features to support the efficient
implementation of processors for use in a range of multiprocessor systems.
Core 2 Duo was the first family of desktop-class microprocessors based on Core microarchitecture.
While the first Core 2 Duo processors had much lower core frequency and approximately the same
FSB frequency and level 2 cache size as Pentium D microprocessors, they had better performance
than the fastest Pentium D 960 due to much more efficient microarchitecture. The only exception to
this were the slowest (less than 2 GHz) Core 2 Duo CPUs, that could perform slightly worse in some
benchmarks. Newer dual-core CPUs have such improvements as higher core and FSB frequency, larger
level 2 cache size, and lower power consumption. All Core 2 Duo processors use the same socket 775
package as many Pentium 4 and all Pentium D microprocessors, and can work in a number of Pentium
4 and Pentium D motherboards.
Core 2 Quad microprocessors are essentially two Core 2 Duo CPUs in one package - two cores are
located on one die, two other cores are on another die, and both dies are packaged together. This
explains why the level 2 cache on these processors is shared only between two cores. Obviously, these
CPUs have higher (about 50% higher) Thermal Design Power than dual-core microprocessors running
at the same frequency. The quad-core CPUs have the same performance as the Core 2 Duo processors
in single-threaded applications, and are faster or considerably faster in multi-threaded applications.
Performance difference in games between quad- and dual-core microprocessors is highly dependent
on the game, and varies from no difference at all to 20% performance advantage for quad-core CPUs.
The quad-core processors are packaged in socket 775 package, and work in the same motherboards
as the Core 2 Duo CPUs.
Core 2 Extreme is a brand name for the best-performing desktop Core 2 microprocessors. These
processors were always faster than other Core 2 Duo and Core 2 Quad CPUs released at the same
time. Not only did Extreme processors have a higher core frequency, they also had an unlocked clock
multiplier, which allowed their owners to increase their frequency above nominal (overclock them). A few
Extreme processors had other features that increased their performance even further: higher bus
frequency, twice as many cores, and/or a larger level 2 cache. Being faster than any other Core 2 Duo
and Core 2 Quad on the market, these CPUs were almost twice as expensive as the most
expensive Core 2 Duo / Quad microprocessor. The Core 2 Extreme processors were packaged in a 775-land
package and worked in the same motherboards as Core 2 Duo and Core 2 Quad CPUs.
Core 2 Solo is a family of low-power microprocessors based on Core microarchitecture. As the name
suggests, these processors have only one core. Like other mobile Core 2 families, the Core 2 Solo
CPUs have additional low-power modes along with Dynamic Acceleration technology (which can
temporarily boost core frequency above the nominal frequency). Solo processors have much lower
Thermal Design Power than Core 2 Duo mobile microprocessors - 5.5 Watt versus 25 or 35 Watt. All
Core 2 Solo CPUs are packaged into Ball Grid Array package - they are always soldered on the
motherboard, and can be removed or replaced only with the help of special equipment.
Comparison between 8085 and Z80 Microprocessors

This tutorial gives a brief comparison of classic microprocessor families such as the 8085, 8086,
80186, Zilog Z80 and Motorola 6800. It was prepared in response to requests from
our students in different countries.
S.No. | 8085 Microprocessor | Z80 Microprocessor
1 | Data lines are multiplexed | It has no multiplexed lines
2 | 74 instructions | 158 instructions
3 | Operates at 3 to 5 MHz | Operates at 4 to 20 MHz
4 | It has 5 interrupts | It has two interrupts
5 | No on-board logic to refresh dynamic memory | It has on-board logic to refresh dynamic memory
6 | It contains no index register | It has two index registers
7 | It contains SIM & RIM | It contains no SIM & RIM
Comparison between 8085 and MC6800 Microprocessors

S.No. | 8085 Microprocessor | MC6800 Microprocessor
1 | It operates at a clock frequency of 3 to 5 MHz. | It operates at 1 MHz frequency.
2 | 8085 has no index register. | It has one index register.
3 | 8085 has an on-board clock logic circuit. | No clock logic circuit.
4 | 8085 has one accumulator register. | MC6800 has two accumulator registers.
5 | 8085 has five interrupts. | MC6800 has two interrupts.
6 | It has a total of 74 instructions. | MC6800 has a total of 72 instructions.
Comparison between 8086 and 80386 Microprocessors

S.No. | 8086 Microprocessor | 80386 Microprocessor
1 | It is a 16-bit microprocessor, the first 16-bit microprocessor after the 8085 (8-bit). | It is a 32-bit microprocessor and a logical extension of the 80286.
2 | It has a pipelined architecture (not highly) and a high-speed bus interface on a single chip. | It has a highly pipelined architecture and a much faster bus than the 8086.
3 | It is upward compatible with the 80386, meaning all 8086 instructions are supported by the 80386. | The 80386 supports the 8086 programming model and can also directly run programs written for the 8086 in virtual mode if VM=1 (in protected mode).
4 | It is housed in a 40-pin DIP package. | The 80386 chip has 132 pins.
5 | It is built on HMOS technology. | The 80386 uses high-speed CHMOS III technology.
6 | No special hardware is provided for task switching. | It has special hardware for task switching.
7 | The 8086 operates on a 5 MHz clock. | The 80386 operates at a maximum clock frequency of 33 MHz.
8 | The address bus and data bus are multiplexed. | It has separate address and data buses to save time.
9 | It has a transistor count of 29,500 transistors. | Transistor count and complexity increase further, to 275,000.
10 | It has a total of 117 instructions. | It has a total of 129 instructions.
11 | It has no protection or paging mechanism. | The 80386 has protection and paging mechanisms, with instructions to support them.
12 | It operates in one mode only. | It operates in three modes: a) Real b) Virtual c) Protected
13 | It has only an instruction queue. | It has an instruction queue as well as a pre-fetch queue.
14 | In the 8086, not all operations are necessarily carried out in parallel. | In the 80386, the functional units operate in parallel.
15 | The 8086 has nine flags. | It contains all nine flags of the 8086 plus the additional flags IOPL, NT, RF and VM.
Comparison between 8086 and 8088 Microprocessors

S.No. | 8086 Microprocessor | 8088 Microprocessor
1 | The instruction queue is 6 bytes long. | The instruction queue is 4 bytes long.
2 | In the 8086, memory (up to 1,048,576 bytes) is divided into two banks. | The memory in the 8088 is not divided into two banks as in the 8086.
3 | The data bus of the 8086 is 16 bits wide. | The data bus of the 8088 is 8 bits wide.
4 | It has the BHE(bar) signal on pin no. 34 and there is no SSO(bar) signal. | It does not have the BHE(bar) signal on pin no. 34 and has only the SSO(bar) signal. It has no S7 pin.
5 | The M/IO(bar) output signal selects memory or I/O: if it is low (logic 0) it selects I/O devices, and if it is high (logic 1) it selects memory. | The IO/M(bar) output signal selects memory or I/O: if it is low (logic 0) it selects memory devices, and if it is high (logic 1) it selects I/O.
6 | It needs one machine cycle to read or write a word if it is at an even location; otherwise it needs two. | It always needs two machine cycles to read or write a word, because its data bus is only 8 bits wide.
7 | In the 8086, all address and data buses are multiplexed. | In the 8088, only the AD7-AD0 lines are multiplexed address/data lines.
