
UNIT - IV

THE ARM RISC ARCHITECTURE

The RISC revolution

The idea of RISC was introduced in the early 1980s. RISC stands for
Reduced Instruction Set Computer.

RISC processors have faster clock rates. The clock rates range from 20 to
120MHz.

Most RISC processors use hardwired control and 32 bit instructions.

These processors use a limited number of addressing modes.

A large register file and separate instruction and data caches are used. The
large register file eliminates unnecessary storage of intermediate results in memory.

Problems in CISC processors:
1. Instructions were of varying length, from 1 byte to 8 bytes. This causes
problems with the pre-fetching and pipelining of instructions.
2. ALU instructions could have operands that were memory locations.
Because the number of cycles it takes to access memory varies, the execution
time of such instructions varies too, which also complicates pipelining.
3. Most ALU instructions had only 2 operands, where one of the operands is
also the destination. This means this operand is destroyed during the operation,
or it must first be saved somewhere.

To overcome these problems, the idea of RISC was introduced.

A RISC instruction set is composed of instructions that all have exactly the same size, usually 32 bits.

Thus they can be pre-fetched and pipelined successfully.

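In RISC, ALU instructions operate only on registers, and memory is accessed
only through load and store instructions.

Thus A=B+C will be assembled as,

Load R1,B
Load R2,C
Add R3,R1,R2
Store A,R3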

Although it takes 4 instructions we can reuse the value in the registers.
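
The statement A=B+C on a load/store machine can be traced with a minimal sketch of a register machine. The `run` function, the tuple encoding of instructions and the dictionary used as memory are assumptions made for illustration only; they do not model any real RISC instruction set.

```python
def run(program, memory):
    """Execute a tiny load/store program and return the register file."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":        # LOAD Rd, addr : register <- memory
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "ADD":       # ADD Rd, Rs1, Rs2 : operands come from registers only
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "STORE":     # STORE addr, Rs : memory <- register
            addr, rs = args
            memory[addr] = regs[rs]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs


# Hypothetical initial memory contents.
memory = {"A": 0, "B": 5, "C": 7}
program = [
    ("LOAD", "R1", "B"),
    ("LOAD", "R2", "C"),
    ("ADD", "R3", "R1", "R2"),
    ("STORE", "A", "R3"),
]
regs = run(program, memory)
```

After the run, R1 and R2 still hold B and C, so a later instruction can reuse them without another memory access.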

The RISC architecture

RISC architecture uses separate instruction and data caches. Their
access paths are different.

Hardwired control is found in most RISC processors.

The CISC architecture

In a CISC processor, there is a unified cache for holding the data
and instructions.

Therefore they have to share the same path for data and instructions.

The CISC processors use microprogrammed control. Thus a
control memory is needed in CISC processors.

This will slow down the instruction execution.

     RISC                                         CISC
1.   Clock rate is 50-150MHz.                     Clock rate is 33-50MHz.
2.   Simple instructions taking one cycle.        Complex instruction set taking multiple cycles.
3.   Very few instructions refer to memory.       Most of the instructions may refer to memory.
4.   Fixed format instructions.                   Variable format instructions.
5.   Few addressing modes.                        Many addressing modes.
6.   Multiple register sets.                      Single register set.
7.   Highly pipelined.                            Less pipelined.
8.   Complexity is in the compiler.               Complexity is in the microprogram.

RISC properties

The following are the properties of RISC architecture.
1. Register to register operations.
2. One instruction per cycle.
3. Hardwired instructions.
4. Reduced number of instructions.
5. Simple addressing modes.
6. Simple instruction format.
7. Instruction pipelining.

1. Register to register operations

The most important characteristic of a RISC processor is that frequently
accessed operands remain in high speed storage.

To implement register to register operations, a RISC processor provides
multiple sets of registers.

These register sets are organized into overlapped windows and act as a
small, fast buffer for holding the subset of all variables that are most likely to
be used.

Figure: overlapped register windows. The called procedure and the current
procedure each have parameter registers, local registers and temporary registers.

The window is divided into three fixed size areas.
Parameter registers :
They hold parameters passed down from the procedure that called the
current procedure and results to be passed back up.
Local registers :
They are used for local variables, as assigned by the compiler.
Temporary registers :
They are used to exchange parameters and results with the procedure
called by the current procedure.
The temporary registers of the current procedure are physically the same
as the parameter registers of the called procedure.
This overlap permits parameters to be passed without actual
movement of data.
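
The overlap between the temporary registers of a caller and the parameter registers of its callee can be sketched as index arithmetic over one physical register array. The class `RegisterWindows`, its area sizes and the absence of any window overflow handling are assumptions chosen for illustration, not a description of a particular processor.

```python
class RegisterWindows:
    """Overlapping register windows mapped onto one physical register array."""

    def __init__(self, n_windows=4, n_param=2, n_local=2):
        self.n_param = n_param
        # Each window contributes n_param + n_local new physical registers.
        # Its temporary area is the parameter area of the next window.
        self.stride = n_param + n_local
        self.phys = [0] * (n_windows * self.stride + n_param)
        self.cwp = 0  # current window pointer

    def _index(self, area, i):
        offsets = {"param": 0, "local": self.n_param, "temp": self.stride}
        return self.cwp * self.stride + offsets[area] + i

    def read(self, area, i):
        return self.phys[self._index(area, i)]

    def write(self, area, i, value):
        self.phys[self._index(area, i)] = value

    def call(self):
        self.cwp += 1   # callee's parameter area = caller's temporary area

    def ret(self):
        self.cwp -= 1


rw = RegisterWindows()
rw.write("temp", 0, 42)          # caller places an argument
rw.call()
arg = rw.read("param", 0)        # callee sees it without any copy
rw.write("param", 0, arg + 1)    # result passed back up the same way
rw.ret()
result = rw.read("temp", 0)
```

A call only changes the window pointer; no register contents are moved.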

2. One instruction per cycle
In RISC processors, one instruction is executed per machine cycle.
A machine cycle is defined to be the time it takes to fetch two operands
from registers, perform an ALU operation, and store the result in a
register.
So RISC machine instructions are no more complicated than, and execute about
as fast as, the microinstructions of a CISC machine.
3. Hardwired instructions
With simple, one cycle instructions, there is no need for microinstructions.
The machine instructions can be hardwired.
These instructions are executed faster than instructions implemented
with microinstructions, since it is not necessary to access a microprogram
control memory during instruction execution.
4. Reduced number of instructions
A RISC processor provides a limited number of instructions, which simplifies
the design of the control unit.

5. Simple addressing modes
A RISC processor uses simple addressing modes.
Almost all instructions use simple addressing modes.
This simplifies the instruction set and the
control unit.
6. Instruction pipelining
The process of fetching the next instruction while the current instruction is
being executed is known as pipelining.
The CPU contains several independent units that work in parallel.
One of them fetches the instructions, and the other ones decode and execute
them.
At any instant, several instructions are in various stages of processing.
The instructions have the following two phases.
Instruction Fetch (I).
Execute (E).

The instruction fetch phase fetches the instruction to be executed from
memory.
The execute phase performs an ALU operation with register input and
output to execute the instruction.
In the case of load and store instructions, three phases are required:
Instruction Fetch (I).
Execute (E).
Data Transfer (D).
Here also the instruction fetch phase fetches the instruction to be executed.
In the execute phase the memory address is calculated, and in the data transfer
phase the actual data is transferred from register to memory or from memory to
register, depending upon the instruction.
In two way instruction pipelining, the I and E phases of two different
instructions are performed simultaneously.
In three way instruction pipelining, three instructions can be overlapped.

Two way instruction pipelining

Three way instruction pipelining
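
The benefit of overlapping phases can be estimated with a small calculation. This is a sketch under idealized assumptions that are not stated in the text: every phase takes one time unit, every instruction has the same number of phases, and there are no stalls or memory conflicts. The function names are illustrative.

```python
def sequential_time(n_instructions, n_phases):
    """Time units when each instruction finishes before the next one starts."""
    return n_instructions * n_phases


def pipelined_time(n_instructions, n_phases):
    """Time units when phases of successive instructions overlap fully.

    The first instruction needs n_phases units; each later instruction
    completes one unit after its predecessor.
    """
    return n_phases + (n_instructions - 1)


two_way = pipelined_time(10, 2)      # I and E overlapped
three_way = pipelined_time(10, 3)    # I, E and D overlapped
```

For ten two phase instructions this gives 11 time units instead of 20. Real pipelines do somewhat worse, because loads, stores and branches introduce idle slots.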

RISC addressing modes

The small instruction set of a typical RISC processor consists mostly of
register to register operations, and simple load and store operations for
memory access.
Each operand is brought into a processor register with a load instruction and
results are transferred to memory by means of a store instruction.
In this architecture almost all instructions have simple register addressing,
and thus it uses only a few addressing modes.
A RISC processor has three basic addressing modes.
Register addressing,
Immediate operand addressing and
Relative to PC addressing for branch instructions.
Register addressing: In register addressing, the instruction usually
consists of three kinds of fields: an opcode field which specifies the operation, one or
two source register fields and one destination register field.
For example: ADD R1, R2, R3 (R3 ← R1 + R2)

Immediate addressing mode: In immediate operand addressing mode, the
second source is an immediate operand.
The operation is performed with the data specified in the source register
field and the immediate operand, and the result is stored in the destination
register.
Example: ADD R1, #100, R2 (R2 ← R1 + 100)
Relative to PC addressing: In relative to PC addressing, the instruction
usually consists of three fields: opcode field, condition field and address
field.
The opcode field specifies the operation.
The condition field specifies one of many possible branch conditions.
The address field specifies the signed offset which is to be added to the
contents of the PC to calculate the new address when the branch condition is satisfied.
Example: JMP COND, R1(R2) (PC ← R1 + R2)

Design for low power consumption

Power consumption is becoming the limiting factor in the amount of
functionality that can be placed in devices.
ARM processors are designed for power efficient processing.
Most components are currently fabricated using CMOS technology.
CMOS technology is cost efficient and consumes less power than other
technologies.
The sources of power consumption on a CMOS chip can be classified as
Static power dissipation.
Dynamic power dissipation.
Dynamic power is frequency dependent, while static power is not.
Until recently a 5V supply was standard, but many modern processors
operate on a 3V power supply.
The latest technologies operate with supplies of between 1 and 2V, and this
will reduce further in future.

CMOS power components

Switching power:
This is the power dissipated by charging and discharging the gate
output capacitance.
Short-circuit power:
During a transition on the input of a CMOS gate, both the p and n transistors
can conduct simultaneously, resulting in a transitory conducting path from
Vdd to Vss.
This causes a power dissipation which is a small fraction of the total.
Leakage current:
A very small current called leakage current flows through the
transistors when they are in the OFF state.
The power dissipation due to leakage current is small and can be
neglected.

CMOS circuit power

Neglecting the power consumption due to leakage current and short circuits, the
total power dissipation of a CMOS circuit is the sum of the power
dissipation due to all the gates in the circuit.
It is given by,

$$P = \frac{1}{2} \cdot f \cdot V_{dd}^{2} \cdot \sum_{g} A_g \cdot C_g$$

Where,
f = clock frequency,
Vdd = supply voltage,
Ag = gate activity factor,
Cg = gate load capacitance.
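
The effect of the supply voltage can be checked numerically. The sketch below assumes the usual CMOS switching power expression P = 1/2 · f · Vdd² · Σ(Ag · Cg), with f, Ag and Cg as defined above and Vdd the supply voltage. The gate count, activity factor and capacitance values are hypothetical numbers chosen only to illustrate the scaling.

```python
def cmos_switching_power(f_hz, vdd, gates):
    """Switching power in watts.

    gates: iterable of (activity_factor, load_capacitance_in_farads) pairs,
    one pair per gate in the circuit.
    """
    return 0.5 * f_hz * vdd ** 2 * sum(a * c for a, c in gates)


# Hypothetical circuit: 1000 gates, activity factor 0.5, 10 fF load each.
gates = [(0.5, 10e-15)] * 1000

p_5v = cmos_switching_power(50e6, 5.0, gates)
p_3v = cmos_switching_power(50e6, 3.0, gates)
ratio = p_3v / p_5v
```

Because power depends on the square of Vdd, moving from 5V to 3V at the same clock frequency cuts the switching power to 36% of its former value in this model.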

Low power circuit design

Various approaches to low power design are as follows,
Use a low power supply voltage, Vdd.
Keep the circuit activity factor as small as possible.
Simplify the circuit and use the minimum number of gates to implement it.
Use the minimum clock frequency. A lower clock frequency allows
operation at a reduced Vdd, resulting in low power consumption.
Reducing Vdd:
By reducing Vdd, we can reduce power dissipation. However, reducing Vdd
also reduces the performance of the circuit.

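The maximum operating frequency is reduced as Vdd is reduced:

$$f_{max} \propto \frac{(V_{dd} - V_t)^{2}}{V_{dd}}$$

where Vt is the transistor threshold voltage.
So by decreasing Vt, we can improve the performance.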

ARM Architecture Basics

The ARM (Advanced RISC machine) processor is basically RISC.


ARM is a 32 bit processor.
The ARM processor provides solutions for
Open platforms running complex operating systems for wireless,
consumer and imaging applications.
Embedded real time systems for mass storage, automotive, industrial
and networking applications.
Secure applications including smart cards and SIM cards.
The RISC design was adapted by ARM to create a flexible embedded
processor.
So ARM architecture is not a pure RISC architecture.
The ARM architecture incorporates a number of features from RISC
design, but rejects a number of other features.
The ARM instruction set differs from pure RISC instruction set since the
ARM instruction set is made suitable for embedded applications.

Architecture Inheritance

The features of RISC which are accepted by ARM processors are,
A large uniform register file.
A load store architecture.
Uniform and fixed length instruction fields.
Three address instruction formats.
The features of RISC which are rejected by ARM processors are,
Register windows.
Delayed branches.
Single cycle execution of all instructions.
In addition the ARM architecture gives,
Control over the ALU and shifter in every data processing instruction.
Load and store multiple instructions.

ARM core dataflow model

Load-store architecture:
It has two instruction types, load and store, for transferring data in and
out of the processor respectively.
LOAD : This instruction copies data from memory to registers in the
processor core.
STORE : This instruction copies data from registers in the processor
core to memory.
The ARM processor instruction set does not include instructions that
directly manipulate data in memory.
Data processing is carried out only in registers.
Data bus:
Data enters the ARM core through the data bus.
The data is either an instruction opcode or a data item.
Data and instructions share the same bus.
Instruction decoder:
This unit decodes the instruction opcode read from the memory and
then the instruction is executed.


Register file:
This is a bank of 32 bit registers used for storing data items.
Sign extend:
The ARM core is a 32 bit processor. So most instructions of ARM
processor treat registers as holding signed or unsigned 32 bit values.
When the processor reads signed 8 bit or 16 bit numbers from memory,
the sign extend hardware converts these numbers to 32 bit values and
then places them in a register file.
ALU and MAC:
Most ARM instructions take two source operands. The two
source registers Rn and Rm are used to hold these operands.
These source operands are read from the Rn and Rm registers using the
internal buses A and B respectively.
The ALU or MAC (multiply-accumulate unit) takes the operand values Rn and Rm
from the A and B buses, computes a result, and writes it via the internal C bus
to the destination register Rd in the register file.


Address register:
This holds the address generated by the load and store instructions and
places it on the address bus.
Barrel shifter:
The contents of the Rm register can optionally be preprocessed in
the barrel shifter before being applied as an input to the ALU.
Incrementer:
For load and store instructions, the incrementer updates the contents of
the address register before the processor core reads or writes the next
register value from or to the consecutive memory locations.

ARM visible registers


The register file in the ARM core contains all the registers, available to a
programmer.
The current mode of the processor decides the availability of the registers
to the programmer.
The ARM processor has a total of 37 registers.
All registers are 32- bit wide. They can be classified into two groups as,
General purpose registers and
Special purpose registers.
General purpose registers:
Registers r0-r12 are used as general purpose registers. Depending
upon the context, registers r13-r15 can also be used as general
purpose registers.
The general purpose registers hold either data or an address.

Special purpose registers:
Registers r13-r15, CPSR (current program status register) and SPSR
(saved program status register) are the special purpose registers. In user mode,
registers r13, r14 and r15 are labeled sp, lr and pc respectively.
Stack pointer (r13, sp) : Register r13 is the stack pointer. It stores the address of the
top of the stack in the current processor mode.
Link register (r14, lr) : Register r14 is the link register. The processor
stores the return address in this register when a subroutine is called.
Program counter (r15, pc) : Register r15 is the program counter and
stores the address of the next instruction to be fetched from
memory by the processor.
The unbanked registers:
Registers r0-r7 are unbanked registers. This means that each of them
refers to the same 32 bit physical register in all processor modes.
They are completely general purpose registers, with no special uses
implied by the architecture.

The banked registers:
Registers r8 to r14 are banked registers.
Almost all instructions allow the banked registers to be used wherever
a general purpose register is used.
Out of the 37 registers, 20 registers are banked registers.
Program status register:

Format of CPSR
The current program status register is accessible in all processor modes.
It contains condition code flags, interrupt disable bits, the current processor
mode and other status and control information.
User mode and system mode do not have an SPSR, because they are not
exception modes.

Control flags:
The control bits change when an exception arises and can be altered
by software.
Bits 0-4 (mode select bits):
These bits determine the processor mode.

Processor mode            Mode select bits
Abort                     10111
Fast interrupt request    10001
Interrupt request         10010
Supervisor                10011
System                    11111
Undefined                 11011
User                      10000

Bit 5 (Thumb state bit):
This bit gives the state of the core.
The state of the core determines which instruction set is being
executed.
There are three instruction sets,
1. ARM.
2. Thumb.
3. Jazelle.
Some processors have extra bits allocated to decide the state of the
processor.
The J bit in the flags field is only available on Jazelle enabled
processors.
The Jazelle J and Thumb T bits in the CPSR decide the state of the
processor.
When both the J and T bits are 0, the processor is in ARM state and
executes ARM instructions.

Thumb:
The Thumb instruction set is a reworking of the ARM set, with a few
things omitted.
Thumb instructions are 16 bits long.
This allows for greater code density in places where memory is
restricted.
In Thumb, most instructions can only address the first eight registers, and there is
no conditional execution of instructions.
So, the Thumb instruction set always comes along with the full ARM
instruction set.
Jazelle:
Jazelle executes 8 bit instructions.
It is a hybrid mix of software and hardware.
It is designed to speed up the execution of Java bytecodes.
The Jazelle technology and a specially modified version of the Java
virtual machine are needed to execute Java bytecodes.

Bits 6 and 7 (interrupt masks):
There are two interrupts available on the ARM processor core.
1. Interrupt request (IRQ) and
2. Fast interrupt request (FIQ)
These are maskable interrupts and their masking is controlled by bits 6
and 7 of the CPSR.
Bit 6 (F) controls FIQ and bit 7 (I) controls IRQ.
When a mask bit is set to binary 1, the corresponding interrupt request is
masked, and when the bit is 0, the interrupt is available.
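
The control fields described so far (mode select bits 0-4, the T bit 5, and the F and I mask bits 6 and 7) can be extracted from a CPSR value with a few masks. The function name `decode_cpsr_control` and the returned dictionary layout are illustrative assumptions; the bit positions and mode encodings are the ones listed above.

```python
MODES = {
    0b10000: "User",
    0b10001: "Fast interrupt request",
    0b10010: "Interrupt request",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr_control(cpsr):
    """Decode the control byte (bits 7..0) of a 32 bit CPSR value."""
    return {
        "mode": MODES.get(cpsr & 0x1F, "reserved"),   # bits 0-4
        "thumb": bool(cpsr & (1 << 5)),               # T bit
        "fiq_masked": bool(cpsr & (1 << 6)),          # F bit
        "irq_masked": bool(cpsr & (1 << 7)),          # I bit
    }


# 0xD3 = 1101 0011: I = 1, F = 1, T = 0, mode = 10011
info = decode_cpsr_control(0x000000D3)
```

The example value 0xD3 decodes to Supervisor mode in ARM state with both interrupts masked.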
Condition code flags:
These flag bits are updated by the operations performed by the ALU.
The condition code flags are usually modified by,
1. Execution of a comparison instruction.
2. Execution of some other arithmetic, logical and move instructions.

Bit 28 (overflow flag, V):


It is set in one of two ways,
1. For an addition or subtraction, V is set to 1 if signed overflow
occurs.
2. For non addition/subtraction, V is normally left unchanged.

Bit 29 (carry flag, C):


It is set in one of four ways,
1. For an addition, including the comparison instruction CMN, C is
set to 1 if the addition produced a carry, and to 0 otherwise.
2. For a subtraction, including the comparison instruction CMP, C is
set to 0 if the subtraction produced a borrow, and to 1 otherwise.
3. For non-addition/subtraction that incorporate a shift operation, C
is set to the last bit shifted out of the value by the shifter.
4. For other non-addition/subtraction, C is normally left unchanged.

Bit 30 (zero flag, Z):
It is set to 1 if the result of the instruction is zero (which often indicates
an equal result from a comparison), and to 0 otherwise.
Bit 31 (negative flag, N):
It is set to bit 31 of the result of the instruction.
If this result is regarded as a two's complement signed integer,
N = 1 if the result is negative and N = 0 if it is positive or zero.
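
The rules for N, Z, C and V on addition and subtraction can be sketched on 32 bit values as follows. The function names are illustrative. The subtraction is modelled as a + NOT(b) + 1, which is one common way to obtain the "C = 1 means no borrow" behaviour described above; it is an assumption about the implementation, not something the text states.

```python
MASK = 0xFFFFFFFF


def add_flags(a, b):
    """Result and flags of a 32 bit addition (as for CMN)."""
    a &= MASK
    b &= MASK
    full = a + b
    result = full & MASK
    flags = {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(full > MASK),                          # carry out of bit 31
        # Signed overflow: both operands differ in sign from the result.
        "V": (((a ^ result) & (b ^ result)) >> 31) & 1,
    }
    return result, flags


def sub_flags(a, b):
    """Result and flags of the 32 bit subtraction a - b (as for CMP)."""
    a &= MASK
    b &= MASK
    full = a + (~b & MASK) + 1
    result = full & MASK
    flags = {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(full > MASK),                          # 1 = no borrow, 0 = borrow
        # Signed overflow: operands differ in sign and result sign differs from a.
        "V": (((a ^ b) & (a ^ result)) >> 31) & 1,
    }
    return result, flags


equal = sub_flags(5, 5)[1]
borrow = sub_flags(3, 5)[1]
overflow = add_flags(0x7FFFFFFF, 1)[1]
```

Comparing equal values sets Z and C, comparing 3 with 5 clears C (a borrow occurred) and sets N, and adding 1 to the largest positive value sets V.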
The memory system:
The ARM processor views memory as a linear collection of bytes
numbered in ascending order from zero to 2^32 - 1.
The ARM7TDMI processor is bi-endian and can treat words in
memory as being stored in either,
1. Little endian (or)
2. Big endian format.
Little endian is traditionally the default format for ARM processors.

Little endian:
In little endian format, the lowest addressed byte in a word is
considered the least significant byte of the word.
The highest addressed byte is the most significant.
So the byte at address 0 of the memory system connects to data lines 7
through 0.
For a word aligned address A, the figure shows how the word at
address A, the halfword at address A and A+2 and the byte addresses A,
A+1, A+2 and A+3 map on to each other when the core is configured
as little endian.

Bits 31-24             Bits 23-16             Bits 15-8              Bits 7-0
Word at address A (all 32 bits)
Halfword at address A+2                       Halfword at address A
Byte at address A+3    Byte at address A+2    Byte at address A+1    Byte at address A

Big endian:
In big endian format, the ARM processor stores the most significant
byte of a word at the lowest numbered byte and the least significant
byte at the highest numbered byte.
So the byte at address 0 of the memory system connects to data lines
31 through 24.
For a word aligned address A, the figure shows how the word at
address A, the halfword at address A and A+2 and the byte addresses A,
A+1, A+2 and A+3 map on to each other when the core is configured
as big endian.

Bits 31-24             Bits 23-16             Bits 15-8              Bits 7-0
Word at address A (all 32 bits)
Halfword at address A                         Halfword at address A+2
Byte at address A      Byte at address A+1    Byte at address A+2    Byte at address A+3
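
The two byte orderings can be reproduced with Python's built-in integer conversions. The word value 0x11223344 is an arbitrary example, and index 0 of each byte string stands for address A.

```python
word = 0x11223344

# Byte strings as they would sit in memory at addresses A, A+1, A+2, A+3.
little = word.to_bytes(4, "little")
big = word.to_bytes(4, "big")

# Halfwords read back from the same memory images.
little_half_at_a = int.from_bytes(little[0:2], "little")       # bits 15-0
little_half_at_a2 = int.from_bytes(little[2:4], "little")      # bits 31-16
big_half_at_a = int.from_bytes(big[0:2], "big")                # bits 31-16
big_half_at_a2 = int.from_bytes(big[2:4], "big")               # bits 15-0
```

In little endian format the byte at address A is the least significant byte 0x44; in big endian format it is the most significant byte 0x11.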

ARM instruction execution

ARM instructions are classified as,
Data processing instructions.
Data transfer instructions.
Branch instructions.
Data processing instructions:
These are two operand instructions.
One operand is always a register and the other operand is either a
second register or an immediate value.
The second operand is routed through the barrel shifter to the ALU.
An arithmetic or logical operation is performed on the operands in the
ALU and the result from the ALU is written back to the destination
register.
When the second operand is an immediate value, it is extracted from the
current instruction at the top of the instruction pipeline and routed
through the barrel shifter to the ALU.
In parallel, the PC value is incremented and copied back into
both the address register and r15 in the register bank, and the next
instruction is loaded into the instruction pipeline.
All these operations are performed in a single clock cycle.

Data processing instruction datapath activity

Data transfer instructions:


Data transfer instructions are executed in two or more cycles.
In the first cycle a memory address is computed in a manner similar to
the way a data processing instruction computes its result.
A register is used as a base address, to which an offset is added.
The offset is either taken from a second register or extracted from the
current instruction as a 12 bit immediate value. An immediate offset is routed through the
shifter without any shift.
The computed address from the ALU is sent to the address register.
In the second cycle the actual data transfer takes place.
It is important to note that the PC value is incremented and stored in the
register bank at the end of the first cycle, so that the address register is
free to accept the data transfer address for the second cycle.
At the end of the second cycle the PC is loaded into the address register to
fetch the next instruction.

STR datapath activity

Branch instructions:
Branch instructions are executed in three cycles.
In the first cycle, a 24 bit immediate field is extracted from the
instruction and then shifted left by two bit positions using the barrel shifter to
give a word aligned offset.
This offset is added to the PC and the result is loaded into the address
register.
In the second cycle, for a branch with link, the return address (taken from the PC) is loaded
into the link register r14 through the ALU.
The third cycle is used to fill the instruction pipeline.
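
The target address calculation of the first cycle can be sketched as follows. The function names are illustrative. The code additionally assumes the standard ARM state convention that the PC value seen by an instruction is its own address plus 8, and that the return address of a branch with link is the address of the following instruction; neither detail is spelled out in the text above.

```python
def branch_target(instr_addr, imm24):
    """Target of an ARM state branch located at instr_addr."""
    offset = imm24 & 0xFFFFFF
    if offset & 0x800000:        # sign extend the 24 bit field
        offset -= 1 << 24
    offset <<= 2                 # shift left two bits: word aligned byte offset
    pc = instr_addr + 8          # assumed ARM state PC value during execution
    return (pc + offset) & 0xFFFFFFFF


def link_value(instr_addr):
    """Return address saved in r14 by a branch with link (assumed convention)."""
    return instr_addr + 4


forward = branch_target(0x8000, 2)           # two words past the PC value
to_self = branch_target(0x8000, 0xFFFFFE)    # offset of -2 words
```

An immediate field of 0xFFFFFE encodes an offset of -8 bytes, which brings the branch back to its own address.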

First two cycles of branch instruction

ARM organization and implementation


3 stage pipeline ARM organization:
The main components of an ARM organization with a 3 stage pipeline are,
Register bank :
It stores the processor state. It has two read ports and one write port
which can each be used to access any register.
It also has an additional read port and an additional write port that give
special access to r15, the program counter (PC).
Barrel shifter :
It is used to shift or rotate one operand by any number of bits.
ALU :
It performs arithmetic and logical functions required by the instruction
set.
Address register and incrementer :
They select and hold all memory addresses and generate sequential
addresses when required.

3-Stage pipelining

5 stage pipeline ARM organization:
The pipeline provided by ARM7 is very cost effective.
For higher performance, we require processor organizations which
support a larger number of pipeline stages.
The time required to execute a program is given by,

$$T_{prog} = \frac{N_{inst} \times CPI}{F_{clk}}$$

Tprog : Time required to execute a given program.
Ninst : Number of ARM instructions executed in the program.
CPI : Average number of clock cycles per instruction.
Fclk : Processor's clock frequency.
There are some ways to increase the performance,
Increase the clock rate, Fclk : To achieve this it is necessary to
simplify the logic in each pipeline stage and therefore to increase the number of pipeline stages.

Thus, to give higher performance the ARM9 core employs a 5 stage pipeline.
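
The execution time relation Tprog = Ninst × CPI / Fclk can be evaluated directly. The instruction count, CPI values and clock frequencies below are hypothetical numbers chosen only to show how a higher clock rate and a lower CPI combine; they are not measured figures for any ARM core.

```python
def program_time(n_inst, cpi, f_clk_hz):
    """Execution time in seconds: Tprog = Ninst * CPI / Fclk."""
    return n_inst * cpi / f_clk_hz


# Hypothetical comparison for the same one million instruction program.
t_short_pipeline = program_time(1_000_000, 1.9, 40e6)
t_long_pipeline = program_time(1_000_000, 1.5, 100e6)
speedup = t_short_pipeline / t_long_pipeline
```

With these assumed figures the program takes 47.5 ms on the slower organization and 15 ms on the faster one.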


Stage      Operations
FETCH      Instruction fetch
DECODE     Thumb/ARM instruction decoder
EXECUTE    Shift, ALU
MEMORY     Memory access
WRITE      Register write

5-Stage pipelining

ARM9TDMI 5 Stage Pipelining:
It has separate instruction and data memories to support 5 stage
pipelining.
Data dependency is a pipeline hazard which arises when an instruction
needs to use the result of one of its predecessors before that result has
returned to the register file.
It provides forwarding paths to solve the problem of data dependencies
without stalling the 5 stage pipeline.
This concept is known as data forwarding.
There are some cases in which forwarding paths cannot avoid a
pipeline stall due to data dependencies.
For example,
LDR R0, [R7]
ADD R4, R0, R2
This instruction sequence suffers a single cycle penalty due to the load use
interlock on register R0.
In such cases, compilers are encouraged not to put a dependent
instruction immediately after a load instruction.
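
The load use interlock can be illustrated with a small checker that counts one stall cycle whenever an instruction reads the register loaded by the LDR immediately before it. The tuple encoding of instructions, the function name and the independent SUB instruction used for rescheduling are assumptions made for this sketch.

```python
def count_load_use_stalls(program):
    """program: list of (opcode, destination, source_registers) tuples.

    Counts one stall for every instruction that uses the destination of
    an LDR placed directly in front of it.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "LDR" and prev_dest in cur_sources:
            stalls += 1
    return stalls


hazard = [
    ("LDR", "R0", ["R7"]),
    ("ADD", "R4", ["R0", "R2"]),   # needs R0 one cycle too early
]
scheduled = [
    ("LDR", "R0", ["R7"]),
    ("SUB", "R5", ["R6", "R1"]),   # hypothetical independent instruction
    ("ADD", "R4", ["R0", "R2"]),
]
```

Moving an independent instruction between the load and its use removes the stall, which is the reordering a compiler is encouraged to perform.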

The stages of the 5 stage pipeline are,
Fetch:
In this stage the processor fetches the instruction from memory and places
it in the instruction pipeline.
Decode:
In this stage,
1. The instruction is decoded and
2. The register operands are read from the register file.
Execute:
In this stage,
1. An operand is shifted.
2. The ALU result is generated.
3. If the instruction is a load or a store, the memory address is computed
in the ALU.
Memory:
In this stage, data memory is accessed if required.

Write:
In this stage, the results generated by the instruction are written back to
the register file, including any data loaded from memory.

Three stage pipelined instruction execution

ARM implementation

ARM clocking scheme:
Most ARMs do not operate with edge sensitive registers.
The ARM clocking scheme is based around 2 phase non overlapping
clocks generated internally from a single input clock signal.
This scheme allows the use of level sensitive transparent latches.
Data movement in this scheme is controlled by passing the data
alternately through latches open during phase 1 and latches open
during phase 2.

ARM datapath timing:

As shown in the figure, the register read buses are valid early in phase 1.
One operand is passed through the barrel shifter, and the output of the barrel shifter is
valid later in phase 1.
The ALU has input latches, and they are open when valid data arrives.
The ALU gets the valid operands later in phase 1, and the latches close at the end of phase 1 so that the phase 2 precharge
does not get through to the ALU.
The ALU then continues to process the operands in phase 2.
At the end of phase 2 the ALU output is valid, and the result is latched in the
destination register.
The minimum datapath cycle time is given by,
T(min) = Register read time + shifter delay + ALU delay + Register write
set up time + Phase 2 to Phase 1 non overlap time.
Adder Design:
Ripple carry adder circuit

The ARM supports 32 bit addition, and the adder has a significant effect on the
datapath cycle time.
As a result it also has a significant effect on the processor's performance.
A ripple carry adder has a worst case carry path 32 gates long.
In order to reduce the worst case carry path and to allow a higher clock rate,
ARM2 uses a 4 bit carry look ahead circuit.
4 bit carry look ahead circuit
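
The difference between the two adder styles can be sketched at the bit level. This is a behavioural model, not the ARM circuit: `ripple_carry_add` passes the carry through every bit position in turn, while `cla4` derives the carries of a 4 bit block from generate and propagate terms. In the model the look ahead carries are written as a recurrence for brevity; in hardware each carry would be expanded into a flat expression of the g, p and carry in signals. All function names are illustrative.

```python
def ripple_carry_add(a, b, carry_in=0, width=32):
    """Add bit by bit; each carry waits for the carry of the bit below."""
    result, carry = 0, carry_in
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry


def cla4(a, b, carry_in):
    """4 bit carry look ahead block built from generate/propagate terms."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]   # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]   # propagate
    c = [carry_in]
    for i in range(4):
        c.append(g[i] | (p[i] & c[i]))
    total = sum((p[i] ^ c[i]) << i for i in range(4))
    return total, c[4]


def cla_add32(a, b, carry_in=0):
    """32 bit adder made of eight 4 bit look ahead blocks."""
    result, carry = 0, carry_in
    for block in range(8):
        shift = 4 * block
        nibble, carry = cla4((a >> shift) & 0xF, (b >> shift) & 0xF, carry)
        result |= nibble << shift
    return result, carry
```

With look ahead blocks the carry only has to pass through eight block boundaries instead of 32 individual bit positions.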

ALU functions:
Along with the addition, ALU does address computations for memory
transfer, branch calculations, bit wise logical functions and so on.

ARM2 ALU logic

ARM6 carry select adder scheme:
The carry select adder used in ARM6 computes the sums of various fields
of the word for a carry in of both zero and one, and the final result is then
selected by using the correct carry in value to control a multiplexer.

In this scheme, the worst case addition time is significantly shorter than that of the 4
bit carry look ahead adder.

ARM6 ALU Organization:
The ARM6 carry select adder does not easily lead to a merging of the arithmetic and
logical functions into a single structure as was used on ARM2.
Instead, a separate logic unit runs in parallel with the adder, and a
multiplexer selects the output from the adder or from the logic unit as
required.

Carry arbitration adder:
The ARM9TDMI uses an improved adder logic called the carry arbitration
adder.
It computes all intermediate carry values using a parallel prefix tree,
which is a very fast parallel logic structure.

A    B    C          u    v
0    0    0          0    0
0    1    unknown    1    0
1    0    unknown    1    0
1    1    1          1    1

The above table shows the values of u and v for inputs A, B and C
(carry) for a particular bit position.
When C is unknown, the values of u and v are 1 and 0, respectively.
It is important to note that u gives the carry out if the carry in is one
and v gives the carry out if the carry in is zero.
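
The (u, v) encoding can be sketched in code. From the definitions above, u is the carry out for a carry in of one and v the carry out for a carry in of zero, which for a single bit gives u = A OR B and v = A AND B. Two adjacent pairs can be merged into one pair for the combined bit range. Because this merge is associative it can be arranged as a parallel prefix tree; the sketch below applies it sequentially for simplicity, so it models the logic but not the speed of the real structure. All function names are illustrative.

```python
def bit_uv(a, b):
    """(u, v) for one bit position with input bits a and b."""
    return a | b, a & b


def combine(high, low):
    """Merge (u, v) pairs of two adjacent bit ranges; low is less significant."""
    u_hi, v_hi = high
    u_lo, v_lo = low
    # The carry leaving the low range selects between u_hi and v_hi.
    return (u_hi if u_lo else v_hi, u_hi if v_lo else v_hi)


def carries(a, b, carry_in=0, width=32):
    """Carry into every bit position, followed by the final carry out."""
    out, prefix = [carry_in], None
    for i in range(width):
        pair = bit_uv((a >> i) & 1, (b >> i) & 1)
        prefix = pair if prefix is None else combine(pair, prefix)
        u, v = prefix
        out.append(u if carry_in else v)
    return out


def arbitration_add(a, b, carry_in=0, width=32):
    """Sum and carry out computed from the arbitrated carries."""
    c = carries(a, b, carry_in, width)
    total = 0
    for i in range(width):
        total |= (((a >> i) ^ (b >> i) ^ c[i]) & 1) << i
    return total, c[width]
```

Each sum bit is then simply A XOR B XOR carry in for its position, with all carries already known.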

The barrel shifter:
In the ARM architecture the shift time is critical, since it contributes
directly to the datapath cycle time.
In order to minimize the delay through the shifter, a cross
bar switch matrix is used instead of actually shifting the data step by step.
Each input is connected to each output through a switch.

In the above figure a 4x4 matrix is shown. ARM processors use a 32x32
matrix.
Precharging sets all outputs to logic 0, so those which are not connected to
any input during switching remain at 0, giving the zero filling required by
the shift operations.
For rotate right, the right shift diagonal is enabled together with the complementary left
shift diagonal.
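
The switch matrix idea can be modelled by listing which (output, input) switches are closed. In this sketch a shift enables one diagonal of the matrix, outputs that are connected to no input keep their precharged value 0, and rotate right enables two complementary diagonals. The function names and the set-of-pairs representation are assumptions made for illustration.

```python
def crossbar(value, closed):
    """Drive the outputs from the inputs through the closed switches.

    closed: set of (out_bit, in_bit) pairs. Outputs start at 0, which
    models the precharged state.
    """
    out = 0
    for out_bit, in_bit in closed:
        out |= ((value >> in_bit) & 1) << out_bit
    return out


def left_diagonal(k, width=32):
    """Switches for a left shift by k: input i drives output i + k."""
    return {(i + k, i) for i in range(width - k)}


def right_diagonal(k, width=32):
    """Switches for a right shift by k: input i + k drives output i."""
    return {(i, i + k) for i in range(width - k)}


def lsl(value, k, width=32):
    return crossbar(value, left_diagonal(k, width))


def lsr(value, k, width=32):
    return crossbar(value, right_diagonal(k, width))


def ror(value, k, width=32):
    k %= width
    closed = right_diagonal(k, width) | left_diagonal(width - k, width)
    return crossbar(value, closed)
```

For the 4x4 case, `ror(0b1011, 1, 4)` enables the right shift diagonal for 1 and the left shift diagonal for 3, so bit 0 wraps around to bit 3.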

Multiplier Design:
The older ARM cores support multiplication with a 32 bit result.
They use the barrel shifter and ALU to generate the product.
Here, multiplication is implemented using the modified Booth algorithm.
On the other hand, recent ARM cores support multiplication with a 64 bit
result.
For high performance multiplication they use carry save adders.
In this technique, the carry output from bit i during step j is applied to
the carry input of bit i+1 during the next step j+1.
After addition of the carry components in the last row, one more step is
required in which the carries are allowed to ripple from the least to the
most significant bit.
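
The carry save technique can be sketched on whole words. Each step adds one partial product without propagating carries: the sum bits and the carry bits are kept as two separate words, and the carry out of bit i is fed to bit i+1 in the next step. Only the final step performs a normal carry propagating addition. For simplicity the sketch uses plain shift and add partial products rather than Booth recoding, and the function names are illustrative.

```python
def carry_save_add(x, y, z):
    """Add three words without carry propagation.

    Returns (sum_word, carry_word) with x + y + z == sum_word + carry_word.
    """
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1   # carry from bit i feeds bit i+1
    return s, c


def csa_multiply(a, b, width=32):
    """Unsigned width x width multiply with a 2*width bit result."""
    mask = (1 << (2 * width)) - 1
    s, c = 0, 0
    for i in range(width):
        if (b >> i) & 1:
            s, c = carry_save_add(s, c, (a << i) & mask)
            s &= mask
            c &= mask
    # Final step: the saved carries ripple from least to most significant bit.
    return (s + c) & mask


product = csa_multiply(0xFFFFFFFF, 0xFFFFFFFF)
```

Since no carry has to ripple inside the loop, every intermediate step has the delay of a single full adder regardless of the word length.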

High Speed Multiplier Organization

ARM Register Bank:
The ARM register bank consists of 31 general purpose registers, each
32 bits wide.
Each bit in a register is implemented using a register cell circuit.

The register cell consists of an asymmetric cross coupled CMOS inverter
pair.
When the register contents are changed, the cell is overwritten by a
strong signal from the ALU bus.

Read buses A and B are provided to read the state of the cell.
A read operation is activated by the control signals read A and read B.
The register cells are arranged column wise to form a 32 bit register.
Such columns are packed together to form the complete register bank.
The decoders for the read and write enable lines are packed
above the columns.
In the ARM processor the program counter is a part of the register bank and has
two write and three read ports.
The other registers in the bank have only one write port and two read ports.
The PC is kept at one end of the register array.

ARM core datapath buses

ARM control logic structure

It consists of three structural components,


Instruction Decoder PLA.
Distributed Secondary Control.
Decentralized control units.

Instruction decoder PLA: It uses internal cycle counter and some of the
instruction bits to identify the class of operation to be performed on the
datapath in the next cycle.
Distributed Secondary Control : It uses information from PLA to select
other instruction bits or processor state information to control the datapath.
Decentralized Control Units : They control the datapath for specific
instructions that take a variable number of cycles to complete their
execution.
The cycle count block indicates the current cycle number in a multicycle
instruction.
According to the cycle count, the PLA generates different control outputs.
The cycle count also determines whether the current cycle is the last
cycle of the current instruction. If it is, it initiates the transfer of
the next instruction from the instruction pipeline.
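
The interaction between the cycle counter and the decoder PLA can be sketched as a table lookup. The table contents, the operation names and the function `run_instruction` are hypothetical and chosen only for illustration. They are not the real PLA contents.

```python
# Hypothetical stand-in for the decoder PLA: for each
# (instruction class, cycle number) it gives the datapath operation
# and a flag telling whether this is the last cycle of the instruction.
PLA = {
    ("data_processing", 1): ("alu_operation", True),
    ("load", 1): ("compute_address", False),
    ("load", 2): ("read_memory", False),
    ("load", 3): ("write_register", True),
}


def run_instruction(instruction_class):
    """Step the cycle counter until the PLA reports the last cycle."""
    operations = []
    cycle = 1
    while True:
        operation, is_last = PLA[(instruction_class, cycle)]
        operations.append(operation)
        if is_last:
            # At this point the next instruction would be taken
            # from the instruction pipeline.
            break
        cycle += 1
    return operations
```

In this sketch `run_instruction("load")` steps through three cycles, while a data processing instruction finishes in one.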

Physical Design:
There are two principal mechanisms used to implement an ARM
processor core.
Hard Macrocell:
It is a physical layout.
It can be used only on the particular process for which it has been
designed.
For every new process, the layout needs to be modified and
recharacterized.
Soft Macrocell:
It is a synthesizable design expressed in a hardware description
language such as VHDL.
It can readily be ported to a new process technology.
Recent ARM processor cores are available in both hard and soft forms.

ARM7TDMI core

The ARM7TDMI is the current low end ARM core.
It is widely used in digital mobile telephones.
Features:
ARM7TDMI core is a member of the ARM family of general purpose
32 bit microprocessors.
ARM family offers high performance for low power consumption and
small size.
ARM7TDMI core uses a pipeline to increase the speed of the flow of
instructions to the processor.
It uses a 3 stage pipeline with the stages:
Fetch
Decode and
Execute.
ARM7TDMI core has a Von Neumann architecture, with a single 32
bit data bus carrying both instruction and data.

Cont..

Data handled by ARM7TDMI can be 8 bit (byte), 16 bit (halfword) or 32 bit
(word).
ARM7TDMI core instruction set enables us to implement specialized
additional instructions using coprocessors to extend functionality.
ARM7TDMI processor contains hardware extensions for advanced
debugging features.
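
The 3 stage pipeline listed among the features can be illustrated with a small timing sketch. It assumes an ideal case in which every instruction spends exactly one cycle in each stage and nothing stalls. The function name `pipeline_schedule` and the instruction labels are invented for this example.

```python
STAGES = ["Fetch", "Decode", "Execute"]


def pipeline_schedule(instructions):
    """Return {cycle: [(stage, instruction), ...]} for an ideal pipeline."""
    schedule = {}
    for i, name in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1  # instruction i enters Fetch in cycle i + 1
            schedule.setdefault(cycle, []).append((stage, name))
    return schedule


sched = pipeline_schedule(["I1", "I2", "I3"])
```

In cycle 3 all three stages are busy: I1 is in Execute, I2 in Decode and I3 in Fetch. The three instructions finish in 5 cycles instead of the 9 needed without overlap.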

ARM7TDMI Organization

CLOCK SIGNALS:

MCLK : Memory clock input. This is the main clock for all memory accesses
and processor operations.
WAIT : When LOW the processor extends an access over a number of cycles of
MCLK, which is useful for accessing slow memory.
ECLK : External clock output.
MEMORY INTERFACE:
MREQ : Memory request : This is LOW when the processor requires memory
access during the following cycle.
SEQ : Sequential address : This is HIGH when the address of the next
memory cycle is closely related to that of the last memory access.
LOCK : Locked operation : This is HIGH when the processor is performing a
locked memory access. It is used to prevent the memory controller from
allowing another device to access the memory. It is active only during
the data swap instructions.
R / W : Read / Write : This is LOW when the processor is performing a
read cycle and HIGH during a write cycle.

MAS[1:0]:
Memory access size : Indicates to the memory system the size of data
transfer required for both read and write cycles. These signals become
valid before the falling edge of MCLK and remain valid until the rising
edge of MCLK.
The binary values 00, 01 and 10 represent byte, halfword and word
respectively.
BL[3:0]:
Byte latch control : The values on the data bus are latched on the
falling edge of MCLK when these signals are high.
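
As a small illustration of the MAS[1:0] encoding given above, a memory system model could decode the two bits as follows. The names `MAS_SIZES` and `decode_mas` are made up for this sketch. The value 11 is not listed in these notes, so the sketch rejects it.

```python
# Encodings listed in the notes: 00 = byte, 01 = halfword, 10 = word.
MAS_SIZES = {
    0b00: ("byte", 1),
    0b01: ("halfword", 2),
    0b10: ("word", 4),
}


def decode_mas(mas):
    """Map the two-bit MAS[1:0] value to (name, size in bytes)."""
    if mas not in MAS_SIZES:
        raise ValueError("MAS value is not one of the listed encodings")
    return MAS_SIZES[mas]
```

For example, `decode_mas(0b01)` gives a halfword transfer of 2 bytes.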
MMU INTERFACE:
TRANS:
Memory translate : When the processor is in user mode, this is LOW. It
can be used either to tell the memory management system when address
translation is on, or as an indicator of non-user mode activity.
MODE[4:0]:
Processor mode : These are the inverse of the internal status bits
indicating the current processor mode.
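
Because the MODE[4:0] outputs are the inverse of the internal mode bits, external logic has to invert them before comparing. The sketch below assumes the standard ARM mode bit encodings (for example 10000 for User mode). These encodings are not listed in these notes. The function names are invented for illustration.

```python
# Standard ARM mode encodings (assumed here, not given in the notes).
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_mode_pins(pins):
    """Invert the five MODE[4:0] pin values and look up the mode name."""
    mode_bits = ~pins & 0b11111
    return MODES.get(mode_bits, "unknown")


def trans_level(mode_name):
    """TRANS is LOW (0) in User mode and HIGH (1) otherwise."""
    return 0 if mode_name == "User" else 1
```

With these assumptions, the pin pattern 01111 corresponds to User mode, for which TRANS is LOW.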

ABORT:
Memory abort : the memory system uses this signal to tell the processor that
a requested access is not allowed.
STATUS SIGNAL:
TBIT:
When the processor is executing the Thumb instruction set, this is HIGH.
It is LOW when executing the ARM instruction set.
CONFIGURATION:
BIGEND:
Big endian configuration : selects how the processor treats bytes in memory.
HIGH for big endian format.
LOW for little endian format.
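
The effect of the BIGEND setting on byte order can be shown with a short sketch. The helper `store_word` is hypothetical. It only lists the bytes of a 32 bit word in order of increasing memory address.

```python
def store_word(value, bigend):
    """Return the four bytes of a word in increasing address order."""
    byteorder = "big" if bigend else "little"
    return list((value & 0xFFFFFFFF).to_bytes(4, byteorder))


big = store_word(0x11223344, bigend=True)      # [0x11, 0x22, 0x33, 0x44]
little = store_word(0x11223344, bigend=False)  # [0x44, 0x33, 0x22, 0x11]
```

In big endian format the most significant byte sits at the lowest address. In little endian format the least significant byte does.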
INTERRUPTS:
FIQ:
Fast interrupt request : Taking this LOW causes the processor to be
interrupted if the appropriate enable in the processor is active.
The signal is level sensitive and must be held LOW until a suitable response
is received from the processor.

IRQ:
Interrupt request : As FIQ, but with lower priority. It can be taken LOW
to interrupt the processor when the appropriate enable is active.
ISYNC:
Synchronous interrupts : Set this HIGH if IRQ and FIQ are
synchronous to the processor clock. Set it LOW for asynchronous
interrupts.
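
The priority relation between the two interrupt inputs can be sketched as follows. The function `pending_interrupt` and its parameters are invented for this example. Pin values are 0 for LOW (asserted) and 1 for HIGH, and the enables stand for the internal enable state of the processor.

```python
def pending_interrupt(fiq_pin, irq_pin, fiq_enabled, irq_enabled):
    """Return which interrupt request would be taken, or None."""
    # FIQ is checked first because it has the higher priority.
    if fiq_pin == 0 and fiq_enabled:
        return "FIQ"
    if irq_pin == 0 and irq_enabled:
        return "IRQ"
    return None
```

When both inputs are LOW and both are enabled, the sketch returns "FIQ". An input that is LOW but disabled is ignored.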
INITIALIZATION:
RESET:
Used to start the processor from a known address.
A LOW level causes the instruction being executed to terminate
abnormally.
When HIGH for at least one clock cycle, the processor restarts from
address 0.

BUS CONTROL:
ENIN:
Enable input : This must be LOW for the data bus to be driven during a
write cycle.

ENOUT:
Enable output : during a write cycle, this signal is driven LOW before
the rising edge of MCLK and remains LOW for the entire cycle.
DBE:
Data bus enable : Must be HIGH for data to appear on either the
bidirectional or unidirectional data output bus.
When LOW, the bidirectional data bus is placed into high impedance
state and data output is prevented on the unidirectional data output bus.
ABE:
Address bus enable : The address bus drivers are disabled when this is LOW.
ABE must be HIGH if there is no system requirement to disable the
address drivers.
ALE:
Address latch enable : The signal is provided for backwards
compatibility with older ARM processors.
It enables the address signals to be held valid for the complete
duration of a memory access cycle.

APE:
Address pipeline enable : Selects whether the address bus and other
signals operate in pipelined mode (APE is HIGH) or depipelined mode
(APE is LOW).
BUSEN:
Data bus configuration : A static configuration signal that selects
whether the bidirectional data bus (D[31:0]) or the unidirectional data
buses (DIN[31:0] and DOUT[31:0]) are used to transfer data between
the processor and memory.
When BUSEN is LOW, D[31:0] is used.
When BUSEN is HIGH, DIN[31:0] and DOUT[31:0] are used.
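
The combined effect of BUSEN and DBE on write data can be sketched with a simplified model. The function `write_data_buses` is hypothetical. In this model `None` means that the bus carries no write data, either because it is not selected or because it is not driven.

```python
def write_data_buses(busen, dbe, value):
    """Return (D, DOUT) as seen during a write cycle in this toy model."""
    if not dbe:
        # DBE LOW: D is high impedance and output on DOUT is prevented.
        return (None, None)
    if busen:
        # BUSEN HIGH: the unidirectional output bus DOUT[31:0] is used.
        return (None, value)
    # BUSEN LOW: the bidirectional bus D[31:0] is used.
    return (value, None)
```

The sketch ignores timing and the ENIN and ENOUT signals. It is only meant to show which bus is selected.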
DEBUG INTERFACE:
The ARM7TDMI processor contains hardware extensions for advanced
debugging features.
DBGACK:
Debug acknowledge : when the processor is in debug state this is high.

DBGEN:
Debug enable : A static configuration signal that disables the debug
features of the processor when held LOW.
This signal must be HIGH to enable the debug function.
DBGRQ :
Debug request : This is a level sensitive input that, when HIGH, causes
the ARM7TDMI core to enter debug state after executing the current
instruction.
It allows external hardware to force the core into debug state, in
addition to the debugging features provided by the EmbeddedICE logic.
EXTERN0:
External input 0 : This is connected to the EmbeddedICE debug logic
and enables breakpoints and watchpoints to be dependent on an
external condition.
EXTERN1:
External input 1 : This is connected to the EmbeddedICE debug logic
and enables breakpoints and watchpoints to be dependent on an
external condition.

COMMRX:
Communication channel receive : When the communication channel
receive buffer is full this is HIGH.
This signal changes after the rising edge of MCLK.
COMMTX:
Communication channel transmit : When the communication channel
transmit buffer is empty this is HIGH.
This signal changes after the rising edge of MCLK.
EXEC:
Executed : This is HIGH when the instruction in the execution unit is
not being executed, for example because it has failed its condition
code check.
RANGEOUT0:
This is HIGH when EmbeddedICE watchpoint unit 0 has matched the
conditions currently present on the address, data and control buses.
RANGEOUT1:
This is HIGH when EmbeddedICE watchpoint unit 1 has matched the
conditions currently present on the address, data and control buses.
