I The Pentium Pro was introduced in November 1995 as Intel's 6th generation x86
design, code-named the P6. The P6 has a three-way superscalar architecture (3 instructions
per clock cycle). The address bus (AB) has been extended to 36 bits (address space
2^36 = 68,719,476,736 bytes, i.e. 64 GB). In addition to the L1 caches provided by the
Pentium, the Pentium Pro has a 256 KB L2 cache in the same package as the CPU.
I Powerful, but expensive.
Figure: Intel Pentium Pro: (a) package, (b) CPU and L2 cache die.
Other Pentiums . . .
I The Pentium II processor was introduced in May 1997 and added
multimedia (MMX) instructions to the Pentium Pro architecture. The L1 data (L1D)
and L1 program (L1P) caches were extended to 16 KB each. It also added more
comprehensive power management features, including Sleep and Deep Sleep modes, to
conserve power during idle times. The Pentium II abandoned the socket approach to
microprocessors and introduced the slot concept. It contained 7.5 million
transistors (the first P6-generation core of the Pentium Pro contained 5.5 million
transistors). However, its L2 cache subsystem was a downgrade compared to the
Pentium Pro's: it ran at half the core clock instead of at full speed.
Figure: (a) Intel Pentium II Deschutes; CPU Core in the middle, cache on the right, (b) mobile
version of Pentium II Tonga.
Other Pentiums . . .
I The Pentium III processor (Feb 1999) introduced the Streaming SIMD Extensions
(SSE): a single-instruction multiple-data (SIMD) architecture for concurrent
execution of multiple floating-point operations, together with cache prefetch
instructions and memory fences. The Pentium 4 enhanced these features further.
I Code names:
I Katmai, 250 nm, Feb 1999
I Coppermine, 180 nm, Mar 2000 (Note: the Pentium III Coppermine was the first
commercial x86 processor from Intel to attain a clock speed of 1 GHz)
I Coppermine T, 180 nm, Aug 2000
I Tualatin, 130 nm, Apr 2001
Figure: Intel Pentium III: (a) standard logo, (b) code name Coppermine.
A 64-bit processor was born
I Intel's 64-bit Itanium processor (released in 2001; its architecture was formerly
called IA-64) is targeted at server applications and high-performance computing
systems. The Itanium uses a 64-bit address bus (AB) to provide a substantially larger
address space. Its data bus (DB) is 128 bits wide. In a major departure, Intel moved
from the CISC designs used in its 32-bit processors to a RISC orientation for its
64-bit Itanium processors.
I Each 128-bit instruction word contains three instructions, and the fetch mechanism can
read up to two instruction words per clock from the L1 cache into the pipeline.
I When the compiler can take maximum advantage of this, the processor can execute six
instructions per clock cycle.
I The processor has thirty functional execution units (6 general-purpose ALUs, 2 integer
units, 1 shift unit, 4 data cache units, 6 multimedia units, 2 parallel shift units,
1 parallel multiply, 1 population count, 2 82-bit floating-point multiply-accumulate
units, 2 SIMD floating-point multiply-accumulate units (two 32-bit operations each),
3 branch units) in eleven groups.
I Each unit can execute a particular subset of the instruction set, and each unit executes
at a rate of one instruction per cycle unless execution stalls waiting for data. While not
all units in a group execute identical subsets of the instruction set, common instructions
can be executed in multiple units.
Figure: Intel Itanium: (a) modified logo from 2009, (b) Itanium 2 McKinley.
Intel IA-32 architecture
I The IA-32 architecture provides ten 32-bit and six 16-bit registers. These
registers are grouped into general, control, and segment registers.
I The general registers are further divided into data, pointer, and index registers.
I There are four 32-bit data registers for arithmetic, logical, and
other operations. They can be accessed as:
I Four 32-bit registers (EAX: accumulator, EBX: base, ECX: counter, EDX: data); or
I Four 16-bit registers (AX, BX, CX, DX); or
I Eight 8-bit registers (AH, AL, BH, BL, CH, CL, DH, DL).
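The 32-, 16-, and 8-bit names refer to overlapping parts of the same physical register. A minimal sketch of this aliasing follows; the file name alias.asm, the 32-bit Linux target, and the use of the exit status to expose the result are assumptions made for illustration, not part of the original slides.

```nasm
; alias.asm (hypothetical file name); assumes 32-bit Linux and the int 80h ABI
section .text
global _start

_start:
    mov eax, 11223344h   ; EAX = 11223344h
    mov ax, 5566h        ; writes bits 0..15 only:  EAX = 11225566h
    mov al, 77h          ; writes bits 0..7 only:   EAX = 11225577h
    mov ah, 88h          ; writes bits 8..15 only:  EAX = 11228877h

    ; Return AH as the exit status so it can be checked without a debugger.
    movzx ebx, ah        ; EBX = 00000088h (136 decimal)
    mov eax, 1           ; sys_exit (assumption: 32-bit Linux syscall number)
    int 80h
```

One possible way to try it: nasm -f elf alias.asm, then ld -m elf_i386 -o alias alias.o (the -m option is only needed on a 64-bit host), then ./alias followed by echo $? should report 136. Single-stepping in a debugger shows the intermediate EAX values.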
Data, pointer, and index registers
Figure: IA-32 general registers: (a) data registers, (b) pointer and index registers.
I Some registers have special functions when executing specific instructions. For
example, when performing a multiplication operation, one of the two operands
should be in the EAX, AX, or AL register depending on the operand size.
Similarly, the ECX or CX register is assumed to contain the loop count value for
iterative instructions.
I The two index registers (ESI, EDI) play a special role in the string processing
instructions, but can be used as general-purpose data registers as well.
I The pointer registers (ESP, EBP) can also be used as general-purpose data
registers, but in practice they are used almost exclusively for maintaining the
stack.
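The special roles described above can be sketched in one short program. This is only an illustration under stated assumptions: the file name roles.asm, the labels src, dst, and len, the 32-bit Linux target, and the exit-status trick are all hypothetical choices, not taken from the slides.

```nasm
; roles.asm (hypothetical file name); assumes 32-bit Linux and the int 80h ABI
section .data
src     db 'ABCD'            ; 4 source bytes
len     equ $ - src          ; len = 4

section .bss
dst     resb len             ; destination buffer

section .text
global _start

_start:
    ; MUL: one operand is implicitly EAX; the product goes to EDX:EAX.
    mov eax, 6
    mov ebx, 7
    mul ebx                  ; EDX:EAX = 6 * 7, so EAX = 42 and EDX = 0

    ; LOOP: decrements ECX and jumps while ECX is not zero.
    mov ecx, 5               ; loop count
    xor edx, edx             ; running sum
.sum:
    add edx, ecx             ; adds 5, 4, 3, 2, 1
    loop .sum                ; EDX = 15 when the loop ends

    ; REP MOVSB: copies ECX bytes from [ESI] to [EDI].
    cld                      ; clear direction flag: addresses increase
    mov esi, src             ; source index
    mov edi, dst             ; destination index
    mov ecx, len             ; byte count
    rep movsb                ; dst now holds 'ABCD'

    ; PUSH/POP: ESP is adjusted implicitly.
    push eax                 ; ESP decreases by 4, [ESP] = 42
    pop ebx                  ; EBX = 42, ESP is restored

    add ebx, edx             ; 42 + 15 = 57, used as the exit status
    mov eax, 1               ; sys_exit (assumption: 32-bit Linux syscall number)
    int 80h
```

Built the same way as the sandbox program, the exit status (echo $? after running) should be 57. Inspecting dst, ESI, EDI, and ESP while stepping through in a debugger is the more instructive way to use it.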
Move operation examples
Figure: The six segment registers support the segmented memory architecture.
    section .data               ; FILENAME: sandbox.asm
    section .text

    global _start

    _start:
        nop
    ; Put your experiments between the two nops...
        mov edx, 'WXYZ'         ; 32-bit move, EDX = 5A595857h ('Z' = 5Ah ... 'W' = 57h)
        mov ax, 067FEh          ; 16-bit move
        mov bx, ax              ; 16-bit move
        mov cl, bh              ; 8-bit move
        mov ch, bl              ; 8-bit move
        xchg cl, ch             ; exchange values cl <-> ch
    ; Put your experiments between the two nops...
        nop

    section .bss
I Makefile example:
    sandbox: sandbox.o
    	ld -o sandbox sandbox.o
    sandbox.o: sandbox.asm
    	nasm -f elf -g -F stabs sandbox.asm -l sandbox.lst
nasm invokes the assembler
-f elf specifies that the .o file will be generated in the elf format
-g specifies that debug information is to be included in the .o file
-F stabs specifies that debug information is to be generated in the stabs format
-l sandbox.lst specifies that a listing file (sandbox.lst) will be generated
Debugging tools: KDbg, gdb, . . .
Figure: Debugging example application in KDbg: (a) main window, (b) register values.