ARM Architecture

Kyoung-Su Kim
E-mail: kimks@rayman.sejong.ac.kr
Real-Time Graphics Lab., Sejong Univ.
Sept 14, 2005

Outline
• Introduction
• ARM architecture
  - Architecture version & variants
  - Programmers model
  - Instruction Set
  - ARM extension
• Coprocessor Interface
• Processor Cores
  - ARM7, ARM9, StrongARM
• AMBA
• Operating System Support
  - Memory System
  - Stack & Subroutine system
  - ARM Software development

Introduction
• Advanced RISC Machines (now known as ARM) was established as a joint venture between Acorn, Apple and VLSI in November 1990
• ARM is the industry's leading provider of 16/32-bit embedded RISC microprocessor solutions
• The company licenses its high-performance, low-cost, power-efficient RISC processors, peripherals, and system-chip designs to leading international electronics companies
• ARM provides comprehensive support required in developing a complete system

ARM architecture
• 32 bit RISC processor of load/store architecture
• Architecture version
  - Version 1 (obsolete)
    - Basic data processing
    - Byte, word and multi-word load/store
    - Software interrupt
    - 26 bit address bus
  - Version 2 (obsolete)
    - Multiply & Multiply-accumulate
    - Coprocessor support
    - Atomic instruction for thread synchronization
    - 26 bit address bus

ARM architecture
• Architecture version (contd)
  - Version 3
    - 32 bit address bus
    - Add CPSR, SPSR
    - Add MRS, MSR. Modify exception handler
    - Add Data abort mode and undef mode
  - Version 4
    - Half word transfer
    - Introduce THUMB processor state
    - Add privileged System mode for operating system
    - 2 word distance of PC from current instruction: PC+8 behavior (at ARM state)
    - First fully formalized architecture
  - Version 5
    - Improve ARM/THUMB inter-working
    - Add CLZ instruction for efficient integer divide
    - Add software breakpoint
    - Add more coprocessor support
    - Tighter definition of arithmetic flags

ARM architecture
• Architecture Variants
  - THUMB (symbol: T)
    - THUMB instruction set: 16 bit re-encoded subset of 32 bit ARM instruction set
    - Small code size (up to 40 % compression)
    - Simplified design
• Architecture Variants (contd)
  - Long Multiply Instruction (M variant)
    - 32x32 = 64 bit. Provides full 64 bit result
  - Enhanced DSP instructions (E variant)
    - Carefully chosen additions to the native ARM instruction set for DSP applications
    - Multiply with Q15 fixed-point integers. Saturation
    - 64 bit transfer
    - First introduced in v5
• Variants in Processor core
  - D: On-chip debug. Halt in response to a debug request
  - I: Embedded ICE. On-chip breakpoint

Programmers Model
• Features of the ARM programmers model
  - 32 bit RISC processor (32-bit data & address bus)
  - Big and Little Endian operating modes
  - Fast interrupt response (for real-time applications)
  - Virtual Memory System Support
  - Excellent high-level language support
  - Simple but powerful instruction set
  - Best of RISC + Best of CISC
• Endianness (configured by input signal)
  - Big Endian
    - Most significant byte is at lowest address
    - Word is addressed by byte address of most significant byte

                      Bits 31-24  23-16  15-8  7-0   Word Address
      Higher Address       8       9      10    11        8
                           4       5      6     7         4
      Lower Address        0       1      2     3         0

  - Little Endian
    - Least significant byte is at lowest address
    - Word is addressed by byte address of least significant byte

                      Bits 31-24  23-16  15-8  7-0   Word Address
      Higher Address       11      10     9     8         8
                           7       6      5     4         4
      Lower Address        3       2      1     0         0

Programmers Model
• Registers
  - 37 registers
    - 31 general 32 bit registers
    - 6 status registers
  - 16 general registers and one or two status registers are visible at any time
  - The visible registers depend on the processor mode
  - The other registers (the banked registers) are switched in to support IRQ, FIQ, Supervisor, Abort and Undefined mode processing
  - R0 to R15 are directly accessible
    - R0 to R14 are general purpose
    - R15 holds the Program Counter (PC)
  - CPSR (Current Program Status Register) contains the condition code flags and the current mode bits
  - 5 SPSRs (Saved Program Status Registers), which are loaded with the CPSR when an exception occurs
• Operating mode (configured by software)
  - User mode (usr): the normal program execution state
  - FIQ mode (fiq): designed to support a data transfer or channel process
  - IRQ mode (irq): used for general purpose interrupt handling
  - Supervisor mode (svc): a protected mode for the operating system
  - Abort mode (abt): entered after a data or instruction prefetch abort
  - Undefined mode (und): entered when an undefined instruction is executed
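The two byte orders can be illustrated with a minimal Python sketch. The helper names `store_word` and `load_word` are hypothetical and only model how a 32 bit word is laid out at increasing byte addresses; they are not part of any ARM tool.

```python
def store_word(word, big_endian):
    """Return the four bytes of a 32-bit word in increasing address order."""
    order = "big" if big_endian else "little"
    return list(word.to_bytes(4, order))


def load_word(byte_values, big_endian):
    """Rebuild a 32-bit word from four bytes given in increasing address order."""
    order = "big" if big_endian else "little"
    return int.from_bytes(bytes(byte_values), order)


# Big endian: most significant byte (0x11) sits at the lowest address.
big = store_word(0x11223344, big_endian=True)
# Little endian: least significant byte (0x44) sits at the lowest address.
little = store_word(0x11223344, big_endian=False)
```

Reading the bytes back with the wrong byte order yields a byte-swapped word, which is why the endianness setting has to match between the core and the memory system.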

Programmers Model
• Registers (contd)

  User32    Fiq32     Supervisor32  Abort32   IRQ32     Undefined32
  R0        R0        R0            R0        R0        R0
  R1        R1        R1            R1        R1        R1
  R2        R2        R2            R2        R2        R2
  R3        R3        R3            R3        R3        R3
  R4        R4        R4            R4        R4        R4
  R5        R5        R5            R5        R5        R5
  R6        R6        R6            R6        R6        R6
  R7        R7        R7            R7        R7        R7
  R8        R8_fiq    R8            R8        R8        R8
  R9        R9_fiq    R9            R9        R9        R9
  R10       R10_fiq   R10           R10       R10       R10
  R11       R11_fiq   R11           R11       R11       R11
  R12       R12_fiq   R12           R12       R12       R12
  R13       R13_fiq   R13_svc       R13_abt   R13_irq   R13_und
  R14       R14_fiq   R14_svc       R14_abt   R14_irq   R14_und
  R15(PC)   R15(PC)   R15(PC)       R15(PC)   R15(PC)   R15(PC)
  CPSR      CPSR      CPSR          CPSR      CPSR      CPSR
            SPSR_fiq  SPSR_svc      SPSR_abt  SPSR_irq  SPSR_und

  - R13: Stack pointer (in common use)
    - Individual stack for each processor mode
  - R14: Link register
  - R15: Program counter

Programmers Model
• Processor Status Registers
  - N, Z, C and V are the condition code flags
    - may be changed as a result of arithmetic and logical operations in the processor
    - may be tested by all instructions to determine if the instruction is to be executed
    - N: Negative. Z: Zero. C: Carry. V: oVerflow
  - The I and F bits are the interrupt disable bits
  - The M0, M1, M2, M3 and M4 bits are the mode bits

  Bit   Field   Meaning
  31    N       Negative/Less Than
  30    Z       Zero
  29    C       Carry/Borrow/Extend
  28    V       Overflow
  7     I       IRQ disable
  6     F       FIQ disable
  4-0   M4-M0   Mode Bits

Programmers Model
• Exceptions
  - Break the normal execution of the program
    - Handle the interrupts from peripherals
    - Guarantee completion of the instruction currently executing in the pipeline
  - Type of exception
    - FIQ (Fast Interrupt reQuest)
      - Externally generated by taking the nFIQ input LOW
      - Fast handling for data or channel transfer
    - IRQ (Interrupt ReQuest)
      - Normal interrupt caused by a LOW level on the nIRQ input
    - ABORT
      - Signaled by the external ABORT input
      - Indicates that the current memory access cannot be completed
    - Software interrupt
      - Generated by the software interrupt instruction (SWI)
      - Enters Supervisor mode, usually to request a particular supervisor function. OS support
• Exceptions (contd)
  - Type of exception (contd)
    - Undefined instruction trap
      - When the ARM comes across an instruction which it cannot handle, it offers it to any coprocessors which may be present
      - If a coprocessor can perform this instruction but is busy at that time, ARM will wait until the coprocessor is ready or until an interrupt occurs
      - If no coprocessor can handle the instruction then ARM will take the undefined instruction trap
  - Exception Priorities
    (1) Reset (highest priority)
    (2) Data abort
    (3) FIQ
    (4) IRQ
    (5) Prefetch abort
    (6) Undefined Instruction, Software interrupt (lowest priority)
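As a rough illustration of the status register layout described above, the following Python sketch unpacks the flag, interrupt disable and mode fields of a PSR value. The bit positions of I and F (7 and 6) and the 5 bit mode encodings in `MODES` come from general ARM architecture documentation rather than from these slides, and `decode_psr` is a hypothetical helper name.

```python
# Mode bit encodings (M4..M0) for the six modes listed in this deck.
# Assumption: standard ARM encodings, not stated on the slides.
MODES = {
    0b10000: "usr",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "svc",
    0b10111: "abt",
    0b11011: "und",
}


def decode_psr(psr):
    """Split a 32-bit CPSR/SPSR value into its named fields."""
    return {
        "N": (psr >> 31) & 1,   # Negative / Less Than
        "Z": (psr >> 30) & 1,   # Zero
        "C": (psr >> 29) & 1,   # Carry / Borrow / Extend
        "V": (psr >> 28) & 1,   # Overflow
        "I": (psr >> 7) & 1,    # IRQ disable
        "F": (psr >> 6) & 1,    # FIQ disable
        "mode": MODES.get(psr & 0x1F, "unknown"),
    }


# Z and C set, both interrupts disabled, Supervisor mode.
fields = decode_psr(0x600000D3)
```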

ARM architecture
• Instruction Set
  - Instruction Format
    - 3 address instruction format: used in ARM state
    - 2 address instruction format: used in ARM and THUMB state

Instruction Set (contd)
• Conditional execution
  - All ARM instructions are conditionally executed
    - The execution may or may not take place depending on the values of the N, Z, C and V flags in the CPSR
    - All THUMB instructions are decompressed to ARM instructions with the Always condition
  - Condition field in instruction: Cond, bits [31:28]
    0000 = EQ - Z set (equal)
    0001 = NE - Z clear (not equal)
    0010 = CS - C set (unsigned higher or same)
    0011 = CC - C clear (unsigned lower)
    0100 = MI - N set (negative)
    0101 = PL - N clear (positive or zero)
    0110 = VS - V set (overflow)
    0111 = VC - V clear (no overflow)
    1000 = HI - C set and Z clear (unsigned higher)
    1001 = LS - C clear or Z set (unsigned lower or same)
    1010 = GE - N set and V set, or N clear and V clear (greater or equal)
    1011 = LT - N set and V clear, or N clear and V set (less than)
    1100 = GT - Z clear, and either N set and V set, or N clear and V clear (greater than)
    1101 = LE - Z set, or N set and V clear, or N clear and V set (less than or equal)
    1110 = AL - always
    1111 = NV - never
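The condition table can be restated as a small Python predicate. This is only a sketch of the flag tests listed above; `condition_passed` is a hypothetical name, and the flags are passed in as booleans rather than read from a real CPSR.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if the 4-bit condition field passes for the given flags."""
    checks = {
        0b0000: z,                       # EQ
        0b0001: not z,                   # NE
        0b0010: c,                       # CS
        0b0011: not c,                   # CC
        0b0100: n,                       # MI
        0b0101: not n,                   # PL
        0b0110: v,                       # VS
        0b0111: not v,                   # VC
        0b1000: c and not z,             # HI
        0b1001: (not c) or z,            # LS
        0b1010: n == v,                  # GE
        0b1011: n != v,                  # LT
        0b1100: (not z) and (n == v),    # GT
        0b1101: z or (n != v),           # LE
        0b1110: True,                    # AL
        0b1111: False,                   # NV
    }
    return bool(checks[cond])
```

Writing GE as `n == v` and LT as `n != v` is just a compact form of the "N set and V set, or N clear and V clear" wording in the table.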

Instruction Set (contd)
• Control instruction
  - Branch and branch with link
    - Jump to desired instruction
    - Save the current PC for return (with L bit)

      Bits    31-28  27-25  24  23-0
      Field   cond   101    L   24-bit signed word offset

  - Branch and exchange
    - Jump to desired instruction with exchange of instruction set
    - Rm[0] == 1: Subsequent instructions are THUMB
    - Rm[0] == 0: Subsequent instructions are ARM

      Bits    31-28  27-6                    5  4  3-0
      Field   cond   0001001011111111111100  L  1  Rm

• Data processing instruction

      Bits    31-28  27-26  25  24-21   20  19-16  15-12  11-0
      Field   cond   00     #   opcode  S   Rn     Rd     operand 2

  - opcode: arithmetic/logic function
  - S: set condition codes
  - Rn: first operand register
  - Rd: destination register
  - operand 2 with bit 25 = 1: immediate

      Bits    11-8  7-0
      Field   #rot  8-bit immediate

    - #rot: immediate alignment
  - operand 2 with bit 25 = 0: register, immediate shift length

      Bits    11-7    6-5  4  3-0
      Field   #shift  Sh   0  Rm

    - #shift: immediate shift length
    - Sh: shift type
    - Rm: second operand register
  - operand 2 with bit 25 = 0: register, register shift length

      Bits    11-8  7  6-5  4  3-0
      Field   Rs    0  Sh   1  Rm

    - Rs: register shift length
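The immediate form of operand 2 (a 4 bit #rot field plus an 8 bit immediate) can be decoded as in the sketch below. The slides only call #rot the "immediate alignment"; the rule that the 8 bit value is rotated right by twice #rot is taken from general ARM documentation and is an assumption here, as are the helper names.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bit positions."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def decode_immediate_operand2(operand2):
    """Expand the 12-bit immediate operand 2 field into a 32-bit value.

    Assumes the usual ARM rule: value = imm8 rotated right by 2 * #rot.
    """
    rot = (operand2 >> 8) & 0xF      # bits 11-8
    imm8 = operand2 & 0xFF           # bits 7-0
    return ror32(imm8, 2 * rot)


value = decode_immediate_operand2(0x4FF)   # imm8 = 0xFF, #rot = 4
```

Under this rule only values that fit in 8 contiguous bits at an even rotation can be encoded directly; other constants need more than one instruction or a load from memory.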

• Data processing instruction (contd)

  Opcode [24:21]  Mnemonic  Meaning                        Effect
  0000            AND       Logical bit-wise AND           Rd := Rn AND Op2
  0001            EOR       Logical bit-wise exclusive OR  Rd := Rn EOR Op2
  0010            SUB       Subtract                       Rd := Rn - Op2
  0011            RSB       Reverse subtract               Rd := Op2 - Rn
  0100            ADD       Add                            Rd := Rn + Op2
  0101            ADC       Add with carry                 Rd := Rn + Op2 + C
  0110            SBC       Subtract with carry            Rd := Rn - Op2 + C - 1
  0111            RSC       Reverse subtract with carry    Rd := Op2 - Rn + C - 1
  1000            TST       Test                           Scc on Rn AND Op2
  1001            TEQ       Test equivalence               Scc on Rn EOR Op2
  1010            CMP       Compare                        Scc on Rn - Op2
  1011            CMN       Compare negated                Scc on Rn + Op2
  1100            ORR       Logical bit-wise OR            Rd := Rn OR Op2
  1101            MOV       Move                           Rd := Op2
  1110            BIC       Bit clear                      Rd := Rn AND NOT Op2
  1111            MVN       Move negated                   Rd := NOT Op2
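The "Effect" column can be checked with a small Python model of the instructions that write Rd. This is a sketch only: `data_processing` is a hypothetical helper, results are truncated to 32 bits, and the flag-setting behaviour (S bit, TST/TEQ/CMP/CMN) is deliberately left out.

```python
MASK32 = 0xFFFFFFFF


def data_processing(mnemonic, rn, op2, carry=0):
    """Return the 32-bit value written to Rd for one data processing opcode."""
    effects = {
        "AND": lambda: rn & op2,
        "EOR": lambda: rn ^ op2,
        "SUB": lambda: rn - op2,
        "RSB": lambda: op2 - rn,
        "ADD": lambda: rn + op2,
        "ADC": lambda: rn + op2 + carry,
        "SBC": lambda: rn - op2 + carry - 1,
        "RSC": lambda: op2 - rn + carry - 1,
        "ORR": lambda: rn | op2,
        "MOV": lambda: op2,
        "BIC": lambda: rn & ~op2,
        "MVN": lambda: ~op2,
    }
    # Truncate to 32 bits, as a register would.
    return effects[mnemonic]() & MASK32


difference = data_processing("SUB", 5, 7)   # wraps around to 0xFFFFFFFE
```

Note how SBC with C = 1 behaves like a plain subtract, while C = 0 subtracts one more; this is what lets multi-word subtraction chain through the carry flag.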
• Shift operation
  - In any data processing instruction, the second register operand can have a shift operation applied to it
  - Logical shift
    - LSL: Logical shift left by 0 to 31 places
      - Fill the vacated bits at the least significant end of the word with zeros
    - LSR: Logical shift right by 0 to 32 places
      - Fill the vacated bits at the most significant end of the word with zeros

Instruction Set (contd)
• Shift operation (contd)
  - Arithmetic shift
    - ASL: Arithmetic shift left (= LSL)
    - ASR: Arithmetic shift right. Sign extends the shifted bits
  - Rotation: ROR, RRX
• Multiply Instruction
  - Product of two 32 bit values in registers
  - May need more cycles and power in implementation
    - Convert to another instruction if possible
    - Ex) b = a * 5 : b = a + (a << 2)

      Bits    31-28  27-24  23-21  20  19-16    15-12    11-8  7-4   3-0
      Field   cond   0000   mul    S   Rd/RdHi  Rn/RdLo  Rs    1001  Rm

  Opcode [23:21]  Mnemonic  Meaning                              Effect
  000             MUL       Multiply (32-bit result)             Rd := (Rm * Rs) [31:0]
  001             MLA       Multiply-accumulate (32-bit result)  Rd := (Rm * Rs + Rn) [31:0]
  100             UMULL     Unsigned multiply long               RdHi:RdLo := Rm * Rs
  101             UMLAL     Unsigned multiply-accumulate long    RdHi:RdLo += Rm * Rs
  110             SMULL     Signed multiply long                 RdHi:RdLo := Rm * Rs
  111             SMLAL     Signed multiply-accumulate long      RdHi:RdLo += Rm * Rs
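The shift types and the multiply examples can be modelled on 32 bit values as in the following Python sketch. All function names are hypothetical, and the model ignores the carry-out that the real barrel shifter produces.

```python
MASK32 = 0xFFFFFFFF


def lsl(x, n):
    """Logical shift left: zeros enter at the least significant end."""
    return (x << n) & MASK32


def lsr(x, n):
    """Logical shift right: zeros enter at the most significant end."""
    return (x & MASK32) >> n


def asr(x, n):
    """Arithmetic shift right: the sign bit is copied into vacated bits."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret as a negative Python integer
    return (x >> n) & MASK32


def ror(x, n):
    """Rotate right: bits leaving the low end re-enter at the high end."""
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32


def times5(a):
    """b = a * 5 rewritten as an add with a shifted operand: a + (a << 2)."""
    return (a + lsl(a, 2)) & MASK32


def umull(rm, rs):
    """Unsigned multiply long: return (RdHi, RdLo) of the 64-bit product."""
    product = (rm & MASK32) * (rs & MASK32)
    return product >> 32, product & MASK32
```

The `times5` rewrite corresponds to a single ADD whose second operand is shifted left by two places, which is why the multiplier can often be avoided for small constants.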


Instruction Set (contd)
• Data transfer instruction
  - Single data transfer (LDR, STR)
    - Single word (32 bit), half word (16 bit) and byte (8 bit) transfer
    - Addressing
      - Register offset: Address = base register ± offset register
      - Immediate offset: Address = base register ± immediate constant
    - Pre/Post indexing
      - Post-indexing: modify address after use
      - Pre-indexing: modify address before use
    - Write back
      - If enabled, update the base register
• Data transfer instruction (contd)
  - Multiple data transfer (LDM, STM)
    - Load (LDM) or store (STM) any subset of the currently visible registers
    - Use
      - Stack: maintaining full or empty stacks which can grow up or down memory
      - Context switching: save or restore the working registers
      - Block copy: moving large blocks of data around main memory
    - Addressing
      - Pre/Post indexing
      - Auto increment or decrement
      - Write back the base register
    - Special bit
      - PSR & force user bit
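The difference between pre-indexing, post-indexing and write back for single data transfers can be sketched as below. `effective_address` is a hypothetical helper; it assumes, as in the usual ARM definition, that a post-indexed transfer always updates the base register.

```python
MASK32 = 0xFFFFFFFF


def effective_address(base, offset, pre_index, write_back):
    """Return (address used for the transfer, base register value afterwards).

    `offset` may be negative to model a subtracted offset.
    """
    if pre_index:
        # Pre-indexing: the offset is applied before the memory access.
        address = base + offset
        new_base = address if write_back else base
    else:
        # Post-indexing: the access uses the unmodified base, and the base
        # is updated afterwards (assumed to always be written back).
        address = base
        new_base = base + offset
    return address & MASK32, new_base & MASK32


# Pre-indexed with write back: access 0x1004 and leave the base at 0x1004.
result = effective_address(0x1000, 4, pre_index=True, write_back=True)
```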

Instruction Set (contd)
• Data transfer instruction (contd)
  - Single data swap (SWP)
    - Swap a byte or word quantity between a register and external memory
    - Implemented as a memory read followed by a memory write which are locked together
    - Atomic instruction
      - Cannot be interrupted during execution
      - External memory management unit is locked during operation by LOCK signal output
    - Use
      - Synchronization in the multi-threading program (OS support)
      - Lock
      - Semaphore
• PSR instruction (MRS, MSR)
  - The MRS and MSR instructions are formed from a subset of the Data Processing operations
  - These instructions allow access to the CPSR and SPSR registers:
    - The MRS instruction allows the contents of the CPSR or SPSR_<mode> to be moved to a general register
    - The MSR instruction allows the contents of a general register to be moved to the CPSR or SPSR_<mode> register
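The use of an atomic swap for a lock can be sketched with a toy memory model in Python. Everything here is illustrative: `Memory`, `swp`, `try_acquire`, `release` and the address `LOCK_ADDRESS` are hypothetical, and the atomicity of `swp` is simply assumed by the single-threaded model rather than enforced by hardware.

```python
LOCK_ADDRESS = 0x100   # hypothetical location of the lock word
UNLOCKED, LOCKED = 0, 1


class Memory:
    """Word-addressed toy memory with a swap operation."""

    def __init__(self):
        self.words = {}

    def swp(self, address, new_value):
        """Model of SWP: read the old word and write the new one as one step."""
        old_value = self.words.get(address, UNLOCKED)
        self.words[address] = new_value
        return old_value


def try_acquire(memory):
    """Swap LOCKED into the lock word; the lock is ours if it was UNLOCKED."""
    return memory.swp(LOCK_ADDRESS, LOCKED) == UNLOCKED


def release(memory):
    """Give the lock back by storing UNLOCKED."""
    memory.words[LOCK_ADDRESS] = UNLOCKED


memory = Memory()
first = try_acquire(memory)    # succeeds, lock word becomes LOCKED
second = try_acquire(memory)   # fails, lock word was already LOCKED
```

Because the read and the write cannot be separated, two contenders can never both observe UNLOCKED, which is the property a lock or semaphore needs.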


Instruction Set (contd)
• Coprocessor instructions
  - Coprocessor
    - General mechanism to extend the instruction set through the addition of coprocessors to the core
    - Example: system controller such as MMU & cache, FPU
  - Registers
    - private to coprocessor
  - ARM controls the data flow
    - Coprocessor concerns only the data processing and memory transfer operations
  - Coprocessor data operation (CDP)
    - This class of instruction is used to tell a coprocessor to perform some internal operation
    - No result is communicated back to ARM, and it will not wait for the operation to complete

      Bits    31-28  27-24  23-20  19-16  15-12  11-8  7-5   4  3-0
      Field   cond   1110   Cop1   CRn    CRd    CP#   Cop2  0  CRm

• Coprocessor instructions (contd)
  - Coprocessor data transfers (LDC, STC)
    - Load (LDC) or store (STC) a subset of a coprocessor's registers directly to memory
    - ARM is responsible for supplying the memory address, and the coprocessor supplies or accepts the data and controls the number of words transferred
  - Coprocessor register transfers (MRC, MCR)
    - Communicate information directly between ARM and a coprocessor
• Software Interrupt Instruction (SWI)
  - Used to enter Supervisor mode in a controlled manner
  - The instruction causes the software interrupt trap to be taken, which effects the mode change

      Bits    31-28  27-24  23-0
      Field   cond   1111   24-bit (interpreted) immediate

ARM architecture
• ARM version 6
  - Improved memory management
  - Multiprocessing
    - Add new synchronization instructions (LDREX, STREX)
  - Improved exception handling
  - New bit in PSR
  - Mixed endian support
  - Media extension
    - ARM SIMD (16 bit 2 way and 8 bit 4 way)
    - FFT, MPEG4
    - Saturation, Selection

ARM architecture
• ARM extension
  - Jazelle: ARM's Java extension (symbol: J)
    - Instruction extension (Java state), not a coprocessor
    - Implemented in the ARM pipeline as a FSM
    - Dynamic re-mapping of stack to registers
• ARM extension (contd)
  - [Figure: pipeline with Jazelle logic turned ON. FETCH (instruction fetch of the bytecode instruction stream), DECODE (Java decode, Thumb decode, ARM decode, register decode and read, stack management), EXECUTE (shift + ALU; compute partial products, sum/accumulate & saturation), MEMORY (memory access), WRITEBACK (register write), with control signals]
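The "16 bit 2 way" SIMD with saturation mentioned for the media extension can be illustrated with a Python sketch that adds the two signed halfwords of a 32 bit word independently and clamps each result. The function names are hypothetical and the sketch does not claim to match the encoding or flag behaviour of any particular ARMv6 instruction.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def saturate16(x):
    """Clamp x to the signed 16-bit range instead of letting it wrap."""
    return max(-0x8000, min(0x7FFF, x))


def saturating_add16x2(a, b):
    """Add the two 16-bit lanes of a and b independently, with saturation."""
    low = saturate16(to_signed16(a) + to_signed16(b))
    high = saturate16(to_signed16(a >> 16) + to_signed16(b >> 16))
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


# High lane: 0x7FFF + 1 saturates at 0x7FFF. Low lane: 1 + 2 = 3.
packed = saturating_add16x2(0x7FFF0001, 0x00010002)
```

Saturation matters for audio and video kernels such as FFT or MPEG4 decoding, where a wrapped overflow would turn a loud sample or bright pixel into its opposite.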

Coprocessor Interface
• Implementation dependent
  - For ARM7 (von Neumann architecture)
  - [Figure: Memory, ARM7 and Coprocessor, connected by the nCPI, CPA and CPB signals]
• Coprocessor present / absent (CPA)
  - nCPI goes LOW to execute a coprocessor instruction
  - Each coprocessor copies the instruction
  - Each coprocessor inspects the CP# field to see which coprocessor it is for
  - Every coprocessor in a system must have a unique number
  - If that number matches the contents of the CP# field, the coprocessor should drive the CPA (coprocessor absent) line LOW
  - If no coprocessor has a number which matches the CP# field, CPA and CPB will remain HIGH, and ARM7 will take the undefined instruction trap

Coprocessor Interface (contd)
• Busy-waiting
  - If CPA goes LOW, ARM watches the CPB (coprocessor busy) line
  - ARM will busy-wait while CPB is HIGH, unless an enabled interrupt occurs
  - When CPB goes LOW, the instruction continues to completion
• Pipeline following
• Data transfer cycles
  - Coprocessor must supply or accept data at ARM7 bus rate
  - ARM7 will continue to increment the address until CPA and CPB go HIGH
• Privileged Instructions
• Idempotency
  - Any action taken by the coprocessor before it goes not-busy must be idempotent, i.e. must be repeatable with identical results after an interrupt

Processor Cores
• ARM7
  - Two main blocks: datapath and decoder
  - Register bank (r0 to r15)
    - Two read ports to A-bus/B-bus
    - One write port from ALU-bus
    - Additional read/write ports for program counter r15
  - Barrel shifter / ALU
  - Address register / incrementer
  - Single Memory Port
    - holds either PC address (with increment) or operand address
  - [Figure: ARM7 organization. A[31:0], address register, incrementer, PC, register bank, multiply register, barrel shifter, ALU, data in register, data out register, D[31:0], instruction decode and control, connected by the A bus, B bus and ALU bus]

Processor Cores
• ARM7 (contd)
  - Pipeline: 3 Stage pipeline
    - Fetch: fetch instruction code from memory into the instruction pipeline
    - Decode: instruction decoded to obtain control signals for the datapath ready for the next stage
    - Execute: instruction owns the datapath - register read; shifting; ALU results generated and write-back
  - [Figure: three consecutive instructions at PC, PC+4 and PC+8 overlapping in fetch, decode and execute over time; R15 holds PC+8 while the first instruction executes]
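The PC+8 behaviour follows directly from the 3 stage pipeline: while one instruction executes, the instruction two words later is being fetched. The Python sketch below builds an idealized schedule (no stalls, no branches); `pipeline_schedule` is a hypothetical helper and the addresses are made up for the example.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_schedule(addresses):
    """Map each clock cycle to the instruction address held in every stage.

    Idealized model: one instruction enters the pipeline per cycle and
    every stage takes exactly one cycle.
    """
    schedule = {}
    for issue_cycle, address in enumerate(addresses):
        for stage_index, stage in enumerate(STAGES):
            cycle = issue_cycle + stage_index
            schedule.setdefault(cycle, {})[stage] = address
    return schedule


schedule = pipeline_schedule([0x8000, 0x8004, 0x8008, 0x800C])
# In cycle 2 the instruction at 0x8000 executes while 0x8008 is fetched,
# so a read of the PC during execute sees the executing address plus 8.
executing = schedule[2]["execute"]
fetching = schedule[2]["fetch"]
```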

Processor Cores
• ARM7 (contd)
  - 2 Phase Non-overlapping clocking scheme
    - [Figure: phase 1 and phase 2 within 1 clock cycle]
  - Datapath timing
    - [Figure: datapath timing across phase 1 and phase 2. ALU operands latched, register read time, read bus valid, shift time, shift out valid, ALU time, ALU out, register write time, precharge invalidates buses]

Processor Cores
• ARM7 (contd)
  - Multi-cycle operation
    - Single cycle throughput for most simple data processing instructions
    - Multi-cycle for mul, load/store
    - [Figure: pipeline timing for the sequence ADD, STR, ADD, ADD, ADD. After decode, STR takes a calc. addr. cycle and a data xfer cycle, which delays the instructions that follow]

Processor Cores
• ARM7 (contd)
  - Memory Interface
    - Pipelined addressing
      - mREQ must be valid before actual reference cycle
    - De-pipelined addressing
    - Cycle type

Processor Cores
• ARM9
  - Separate memory ports (Instruction, Data) to improve CPI
  - 5 Stage Pipeline
  - Multi-cycle operation: MUL, multiple load/store
  - Data forwarding
  - Datapath
    - Almost same as ARM7
    - Compatible with ARM7
  - [Figure: ARM9TDMI organization. Fetch (next pc, +4, I-cache), decode (instruction decode, register read, immediate fields), execute (shift, ALU, mul, forwarding paths, pre-index/post-index), buffer/data (D-cache, load/store address, byte repl., rot/sgn ex), write-back (register write); pc + 4 and pc + 8 feed r15]

Processor Cores
• ARM9 (contd)
  - Pipeline stages

    ARM7TDMI:  Fetch (instruction fetch) | Decode (Thumb decompress, ARM decode) | Execute (reg read, shift/ALU, reg write)
    ARM9TDMI:  Fetch (instruction fetch) | Decode (decode, reg read) | Execute (shift/ALU) | Memory (data memory access) | Write (reg write)

Processor Cores
• StrongARM
  - Separate instruction and data port
    - Harvard architecture
  - 5 Stage pipeline
    - same as ARM9
  - First developed by DEC, now Intel
    - SA1110: v4
    - XScale: v5TE
  - [Figure: StrongARM core organization. Fetch (next pc, +4, I-cache), decode (instruction decode, register read, branch offset + disp giving the branch target), execute (shift, ALU & multiply, forwarding paths), buffer/data (D-cache, load/store address, rotate, rot/sgn ex), write-back (register write)]

Processor Cores
• StrongARM (contd)
  - Branch target adder in decode stage
    - Applied to B, BL and return from function call
    - Reduces the taken branch penalty to 1 cycle
  - Example:
      CMP r0, #0
      BNE label
  - [Figure: pipeline timing for the example. fetch CMP, read r0, set CCs, (buffer), (write); fetch BNE, + disp, (execute), (buffer), (write); fetch .., (decode), (execute), (buffer); fetch tgt, decode, execute. The instruction fetched after BNE is the single penalty cycle]

Processor Cores
• StrongARM (contd)
  - Multiply implementation
    - [Table columns: Memory port, Multiplier, Branch adder; only the Multiplier values survive]

      Core       Multiplier
      ARM7       8 bit
      ARM9       8 bit
      StrongARM  12 bit

    - Reduces the issue latency of MUL to 1~3 cycles
      - Compared to ARM7, ARM9 (1~4 cycles)

AMBA
• Advanced Microcontroller Bus Architecture
  - Standard of on-chip communication between different macrocells for high performance embedded system design
  - Hierarchical Bus architecture

AMBA
• AMBA buses
  - AHB (Advanced High Performance Bus)
    - Connects high-performance system modules
  - ASB (Advanced System Bus)
    - Subset of AHB
  - APB (Advanced Peripheral Bus)
    - Simple interface for low-performance peripherals
• AMBA buses (contd)
  - AHB
    - burst transfers
    - split transactions
    - single-cycle bus master handover
    - single-clock edge operation
    - wider data bus configurations (64/128 bits)
    - multiple bus masters (up to 16)
    - pipelined operation
  - ASB
    - burst transfers
    - pipelined operation
    - multiple bus masters
  - APB
    - low power
    - latched address and control
    - simple interface
    - suitable for many peripherals

AMBA
• AMBA AHB component
  - Master
    - Initiates read and write operations by providing an address and control information. Only one bus master is allowed to actively use the bus at any one time.
  - Slave
    - Responds to a read or write operation within a given address-space range. The bus slave signals back to the active master the success, failure or waiting of the data transfer.
• AMBA AHB component (contd)
  - Arbiter
    - Ensures that only one bus master at a time is allowed to initiate data transfers. Can use a priority scheme.
  - Decoder
    - Decodes the address of each transfer and provides a select signal for the slave that is involved in the transfer.

AMBA
• AMBA APB
  - APB bridge: only master in APB. Acts as slave in AHB
  - APB slave: peripherals
  - Simple protocol
• Processor core
  - Master in AHB
  - Connects through the memory interface of the core

Operating System Support
• Memory System
  - Memory hierarchy
  - Cache system
    - Temporal locality
    - Spatial locality

Operating System Support
• Cache system (contd)
  - Single cache shared between instruction and data
  - Separate data and instruction cache
• Cache system (contd)
  - Write strategy
    - Write-through
      - All writes are passed to main memory immediately
      - If there is a hit, the cache is updated to hold the new value
      - Processor slows down to main memory speed during write
    - Write-through with buffered write
      - Use a buffer to hold data to write back to main memory
      - Processor is only slowed down to write buffer speed (which is fast)
      - Write buffer transfers data to main memory (slowly) while the processor continues its tasks
    - Copy-back
      - Write operation updates the cache, but not main memory
      - Cache remembers that it is different from main memory via a dirty bit
      - It is copied back to main memory only when the cache line is reused by new data
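The difference between write-through and copy-back can be made concrete by counting main memory writes in a toy cache model. This Python sketch assumes a direct-mapped cache with one word per line and allocation on every copy-back write; those choices, and the class name `Cache`, are illustrative assumptions and not taken from the slides.

```python
class Cache:
    """Toy direct-mapped cache, one word per line, that counts memory writes."""

    def __init__(self, num_lines, copy_back):
        self.num_lines = num_lines
        self.copy_back = copy_back
        self.lines = {}            # index -> [address, value, dirty]
        self.memory = {}
        self.memory_writes = 0

    def _write_memory(self, address, value):
        self.memory[address] = value
        self.memory_writes += 1

    def write(self, address, value):
        index = address % self.num_lines
        line = self.lines.get(index)
        hit = line is not None and line[0] == address
        if self.copy_back:
            # Copy-back: only a dirty line that is being replaced reaches memory.
            if line is not None and not hit and line[2]:
                self._write_memory(line[0], line[1])
            self.lines[index] = [address, value, True]   # mark line dirty
        else:
            # Write-through: update the cache on a hit, always write memory.
            if hit:
                line[1] = value
            self._write_memory(address, value)


write_through = Cache(num_lines=4, copy_back=False)
copy_back = Cache(num_lines=4, copy_back=True)
for value in range(3):
    write_through.write(0, value)   # three memory writes
    copy_back.write(0, value)       # no memory writes yet
copy_back.write(4, 9)               # same index: dirty line 0 is copied back
```

Repeated writes to the same location cost one memory write each under write-through, but only a single write at eviction time under copy-back.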

Operating System Support
• Memory System (contd)
  - CP15 system control coprocessor
    - On-chip coprocessor which controls the on-chip cache, memory management and other system configuration signals
    - Mapped as a coprocessor of number 15
    - Register transfer format:

      Bits    31-28  27-24  23-21  20  19-16  15-12  11-8  7-5   4  3-0
      Field   cond   1110   000    L   CRn    Rd     1111  Cop2  1  CRm

      - L: load from coprocessor/store to coprocessor
  - Protection unit
    - Embedded system with fixed and controlled application
    - Does not need a full virtual memory system
    - Ex) Stand-alone mp3 player
  - Memory Management Unit
    - General purpose application where the range and number of programs is unknown at design time
    - Needs virtual memory system support
    - Ex) PDA

Operating System Support
• Protection Unit
  - CP15 registers

    Register     Purpose
    0            ID Register
    1            Configuration
    2            Cache Control
    3            Write Buffer Control
    5            Access Permissions
    6            Region Base and Size
    7            Cache Operations
    9            Cache Lock Down
    15           Test
    4, 8, 10-14  UNUSED

  - [Figure: physical address space from 0x0 to 0xf..f divided into Region 0, Region 1, Region 2 and Region 3. Configure for each region: 1. Cacheable 2. Use Write buffer 3. Privileged access 4. Enable / Disable 5. Size and Base Address 6. ......]

Operating System Support
• Protection Unit (contd)
  - ARM protection unit
    - [Figure: address bits [31:12] are compared against region 0 to region 7; a priority encoder selects the attribute registers (cacheable, bufferable, permissions) of the matching region]
• Memory Management Unit
  - Virtual memory system
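The region lookup performed by the protection unit can be sketched in Python as below. The slides only show a priority encoder; the rule used here, that the highest-numbered enabled region containing the address wins, is an assumption, and `Region` and `lookup_region` are hypothetical names.

```python
from collections import namedtuple

# Per-region configuration, loosely following the attribute list above.
Region = namedtuple("Region", "base size enabled cacheable bufferable")


def lookup_region(regions, address):
    """Return (index, region) for the matching region, or None if no match.

    Assumption: when regions overlap, the highest-numbered enabled
    region that contains the address takes priority.
    """
    for index in range(len(regions) - 1, -1, -1):
        region = regions[index]
        if region.enabled and region.base <= address < region.base + region.size:
            return index, region
    return None


regions = [
    Region(0x0000, 1 << 32, True, False, False),   # region 0: background
    Region(0x1000, 0x1000, True, True, True),      # region 1: cached RAM
    Region(0x1800, 0x0800, False, True, False),    # region 2: disabled
]
match = lookup_region(regions, 0x1900)   # region 2 is disabled, region 1 wins
```

An overlapping background region plus a few smaller, higher-priority regions is enough for a fixed application such as the stand-alone player mentioned earlier, without any page tables.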

Memory Management Unit (contd)
• ARM MMU
  - Translates virtual address to physical address
  - Controls memory access permission
  - Uses 2 level page table with TLB
    - Page: fixed-size chunk of memory
    - TLB: cache of virtual to physical address mappings
  - [Figure: a virtual address is translated to a physical address through the page table for each memory access; with two levels, through the 1st level page table and then the 2nd level page table]
• Section translation sequence
  - CP15 register 2: translation table base address [31:14]
  - Virtual address: table index [31:20], section index [19:0]
  - First-level descriptor address: translation table base address [31:14], table index [13:2], 00 [1:0]
  - Section descriptor:

      Bits    31-20                 19-12     11-10  9  8-5     4  3  2  1-0
      Field   section base address  00000000  AP     0  domain  ?  C  B  10

  - Physical address: section base address [31:20], section index [19:0]
  - The memory access with the physical address returns the data
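The section translation sequence amounts to two bit-field concatenations, which the following Python sketch spells out. The function names and the example values are hypothetical; the field boundaries are the ones listed above.

```python
def section_descriptor_address(translation_table_base, virtual_address):
    """Address of the first-level descriptor for a virtual address.

    Translation table base [31:14], table index (VA[31:20]) in [13:2], 00 in [1:0].
    """
    table_index = (virtual_address >> 20) & 0xFFF
    return (translation_table_base & 0xFFFFC000) | (table_index << 2)


def section_physical_address(section_descriptor, virtual_address):
    """Physical address: section base [31:20] joined with VA[19:0]."""
    return (section_descriptor & 0xFFF00000) | (virtual_address & 0x000FFFFF)


# Example values chosen only for illustration.
descriptor_address = section_descriptor_address(0x00004000, 0x12345678)
physical = section_physical_address(0xABC00C12, 0x12345678)
```

Each first-level entry therefore maps a whole 1 MB section, because the low 20 bits of the virtual address pass through unchanged.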

Memory Management Unit (contd)
• Small page translation sequence
  - Virtual address: first level table index [31:20], page table index [19:12], page offset [11:0]
  - CP15 register 2: translation table base address [31:14]
  - First-level descriptor address: translation table base address [31:14], table index [13:2], 00 [1:0]
  - First-level descriptor:

      Bits    31-10                    9  8-5     4-2  1-0
      Field   page table base address  0  domain  ???  01

  - Second-level descriptor address: page table base address [31:10], page table index [9:2], 00 [1:0]
  - Second-level descriptor:

      Bits    31-12              11-10  9-8  7-6  5-4  3  2  1-0
      Field   page base address  AP3    AP2  AP1  AP0  C  B  10

  - Physical address: page base address [31:12], page offset [11:0]
  - The memory access with the physical address returns the data
• CP15 control registers

    Register   Purpose
    0          ID Register
    1          Control
    2          Translation Table Base
    3          Domain Access Control
    5          Fault Status
    6          Fault Address
    7          Cache Operations
    8          TLB Operations
    9          Read Buffer Operations
    10         TLB lockdown
    13         Process ID Mapping
    14         Debug Support
    15         Test & Clock Control
    4, 11-12   UNUSED
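The two-level walk for a small page can be sketched in Python with a dictionary standing in for main memory. `translate_small_page` is a hypothetical helper, the descriptor type checks use the low two bits shown in the layouts above (01 for a page table entry, 10 for a small page), and access permission and domain checks are left out.

```python
def translate_small_page(memory, translation_table_base, virtual_address):
    """Walk both table levels and return the physical address.

    `memory` maps word addresses to 32-bit descriptor values.
    """
    # Level 1: table base [31:14] joined with VA[31:20] as a word index.
    first_index = (virtual_address >> 20) & 0xFFF
    first_address = (translation_table_base & 0xFFFFC000) | (first_index << 2)
    first_descriptor = memory[first_address]
    if first_descriptor & 0x3 != 0b01:
        raise ValueError("first-level descriptor is not a page table entry")

    # Level 2: page table base [31:10] joined with VA[19:12] as a word index.
    second_index = (virtual_address >> 12) & 0xFF
    second_address = (first_descriptor & 0xFFFFFC00) | (second_index << 2)
    second_descriptor = memory[second_address]
    if second_descriptor & 0x3 != 0b10:
        raise ValueError("second-level descriptor is not a small page")

    # Physical address: page base [31:12] joined with the page offset VA[11:0].
    return (second_descriptor & 0xFFFFF000) | (virtual_address & 0xFFF)


# Descriptor values chosen only for illustration.
memory = {0x0000448C: 0x00008001, 0x00008114: 0xCAFE0002}
physical = translate_small_page(memory, 0x00004000, 0x12345678)
```

A real MMU would first consult the TLB and only perform this walk on a miss, since the walk costs two extra memory accesses.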

Stack & Subroutine System
• Stack and Subroutine System
  - Idea of stack
    - The multiple load/store instructions can be used to implement last-in-first-out storage called a STACK
    - A stack is a portion of main memory used to store data temporarily
    - A PUSH operation stores a number of registers onto the stack memory

Stack & Subroutine System (contd)
• Stack (contd)
  - Pop operation
  - Implemented by LDM/STM instruction
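Push and pop with multiple register transfers can be modelled in Python as below. The sketch assumes a full descending stack (the stack pointer addresses the last item pushed and the stack grows towards lower addresses), which is only one of the stack types the multiple transfer instructions support; `push` and `pop` are hypothetical helpers.

```python
def push(memory, sp, registers, register_list):
    """Store the listed registers below sp and return the new stack pointer.

    The lowest-numbered register ends up at the lowest address.
    """
    for reg in sorted(register_list, reverse=True):
        sp -= 4                      # full descending: decrement before store
        memory[sp] = registers[reg]
    return sp


def pop(memory, sp, registers, register_list):
    """Reload the listed registers from the stack and return the new sp."""
    for reg in sorted(register_list):
        registers[reg] = memory[sp]
        sp += 4                      # increment after load
    return sp


memory = {}
registers = {0: 10, 1: 20, 14: 0x8000}
sp = push(memory, 0x1000, registers, [0, 1, 14])   # sp becomes 0x0FF4
registers.update({0: 0, 1: 0, 14: 0})              # clobber the registers
sp = pop(memory, sp, registers, [0, 1, 14])        # registers restored
```

Saving r14 together with the working registers on entry and reloading them on exit is the usual way a subroutine preserves the caller's context.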

Stack & Subroutine System (contd)
• Subroutine
  - Subroutines allow you to modularize your code so that it is more reusable
• Subroutine with saving the context

ARM Software Development
• ARM Software Development
  - ARM software development toolkit
    - armcc, armasm, armlink
  - ARMulator
    - Cycle accurate simulator
    - MMU, coprocessor
    - Profiler
  - Boot-up code
    - On reset, processor starts at address 0x0
  - ARM procedure call standard
    - Can inter-work assembly routine with C/C++ program
