
ARM Architecture

Sept 14, 2005

Kyoung-Su Kim
E-mail: kimks@rayman.sejong.ac.kr
Real-Time Graphics Lab., Sejong Univ.

Outline

- Introduction
- ARM architecture
    Architecture version & variants
    Programmers model
    Instruction Set
    ARM extension
- Coprocessor Interface
- Processor Cores
    ARM7, ARM9, StrongARM
- AMBA
- Operating System Support
    Memory System
    Stack & Subroutine system
    ARM Software development

Introduction

- Advanced RISC Machines (now known as ARM) was established as a joint venture between Acorn, Apple and VLSI in November 1990
- ARM is the industry's leading provider of 16/32-bit embedded RISC microprocessor solutions
- The company licenses its high-performance, low-cost, power-efficient RISC processors, peripherals, and system-chip designs to leading international electronics companies
- ARM provides the comprehensive support required to develop a complete system
- 32 bit RISC processor with a load/store architecture

ARM architecture

- Architecture version
    Version 1 (obsolete)
      Basic data processing
      Byte, word and multi-word load/store
      Software interrupt
      26 bit address bus
    Version 2 (obsolete)
      Multiply & Multiply-accumulate
      Coprocessor support
      Atomic instruction for thread synchronization
      26 bit address bus

- Architecture version (contd)
    Version 3
      32 bit address bus
      Add CPSR, SPSR
      Add MRS, MSR. Modify exception handler
      Add Data abort mode and undef mode
    Version 4
      Half word transfer
      Introduce THUMB processor state
      Add Privileged mode for operating system
      2 word distance of PC from current instruction
      PC+8 behavior (at ARM state)
      First fully formalized architecture
    Version 5
      Improve ARM/THUMB inter-working
      Add CLZ instruction for efficient integer divide
      Add software breakpoint
      Add more coprocessor support
      More tight definition of arithmetic flags

- Architecture Variants
    THUMB (symbol as a T)
      THUMB instruction set: 16 bit re-encoded subset of 32 bit ARM instruction set
      Small code size (up to 40 % compression)
      Simplified design
    Long Multiply Instruction (M variant)
      32x32 = 64 bit. Provide full 64 bit result

- Architecture Variants (contd)
    Enhanced DSP instructions (E variant)
      Carefully chosen addition to native ARM instruction for DSP application
      Multiply with Q15 fixed-point values. Saturation
      64 bit transfer
      First introduced in v5

- Variants in Processor core
    D: On-chip debug. Halt in response
    I: Embedded ICE. On-chip breakpoint
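The E-variant bullet mentions Q15 multiplication with saturation. As a rough illustration only, the arithmetic can be sketched in Python; the helper names (`saturate_q15`, `q15_from_float`, `q15_mul`) are made up for this sketch and do not correspond to ARM mnemonics, and real DSP instructions differ in operand widths and rounding details.

```python
Q15_MIN, Q15_MAX = -32768, 32767  # range of a signed 16-bit Q15 value


def saturate_q15(value: int) -> int:
    """Clamp to the Q15 range instead of wrapping around."""
    return max(Q15_MIN, min(Q15_MAX, value))


def q15_from_float(x: float) -> int:
    """Convert a real number in [-1, 1) to Q15 by scaling with 2**15."""
    return saturate_q15(round(x * 32768))


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values; the Q30 product is shifted back to Q15."""
    return saturate_q15((a * b) >> 15)


half = q15_from_float(0.5)
print(q15_mul(half, half))        # 0.5 * 0.5 expressed in Q15
print(q15_mul(Q15_MIN, Q15_MIN))  # (-1) * (-1) does not fit, so it saturates
```

Saturation matters because (-1) * (-1) = +1 is not representable in Q15; clamping to the largest value gives a small error instead of a sign flip.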

Programmers Model

- Features of the ARM programmers model
    32 bit RISC processor (32-bit data & address bus)
    Big and Little Endian operating modes
    Fast interrupt response (for real-time applications)
    Virtual Memory System Support
    Excellent high-level language support
    Simple but powerful instruction set
    Best of RISC + Best of CISC

- Endianness (configured by input signal)
    Big Endian
      Most significant byte is at lowest address
      Word is addressed by byte address of most significant byte

                        Bits 31..24   23..16   15..8   7..0    Word address
        Higher address       8        9        10      11      8
                             4        5        6       7       4
        Lower address        0        1        2       3       0

    Little Endian
      Least significant byte is at lowest address
      Word is addressed by byte address of least significant byte

                        Bits 31..24   23..16   15..8   7..0    Word address
        Higher address       11       10       9       8       8
                             7        6        5       4       4
        Lower address        3        2        1       0       0

- Registers
    37 registers
      31 general 32 bit registers
      6 status registers
    16 general registers and one or two status registers are visible at any time
    The visible registers depend on the processor mode
    The other registers (the banked registers) are switched in to support IRQ, FIQ, Supervisor, Abort and Undefined mode processing
    R0 to R15 are directly accessible
      R0 to R14 are general purpose
      R15 holds the Program Counter (PC)
    CPSR - Current Program Status Register contains condition code flags and the current mode bits
    5 SPSRs (Saved Program Status Registers) which are loaded with the CPSR when an exception occurs

- Operating mode (configured by software)
    User mode (usr)
      the normal program execution state
    FIQ mode (fiq)
      designed to support a data transfer or channel process
    IRQ mode (irq)
      used for general purpose interrupt handling
    Supervisor mode (svc)
      a protected mode for the operating system
    Abort mode (abt)
      entered after a data or instruction prefetch abort
    Undefined mode (und)
      entered when an undefined instruction is executed
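The two byte orderings can be checked with a small Python sketch. It uses the standard `struct` module to lay a 32-bit word out in a byte array; `store_word` is a helper name invented for this example, and the byte array merely stands in for memory.

```python
import struct


def store_word(memory: bytearray, address: int, value: int, big_endian: bool) -> None:
    """Store a 32-bit word at a byte address using the selected byte order."""
    fmt = ">I" if big_endian else "<I"
    struct.pack_into(fmt, memory, address, value)


memory = bytearray(4)

store_word(memory, 0, 0x11223344, big_endian=True)
print([hex(b) for b in memory])   # most significant byte 0x11 sits at address 0

store_word(memory, 0, 0x11223344, big_endian=False)
print([hex(b) for b in memory])   # least significant byte 0x44 sits at address 0
```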

- Registers (contd)

    User32     Fiq32      Supervisor32  Abort32    IRQ32      Undefined32
    R0         R0         R0            R0         R0         R0
    R1         R1         R1            R1         R1         R1
    R2         R2         R2            R2         R2         R2
    R3         R3         R3            R3         R3         R3
    R4         R4         R4            R4         R4         R4
    R5         R5         R5            R5         R5         R5
    R6         R6         R6            R6         R6         R6
    R7         R7         R7            R7         R7         R7
    R8         R8_fiq     R8            R8         R8         R8
    R9         R9_fiq     R9            R9         R9         R9
    R10        R10_fiq    R10           R10        R10        R10
    R11        R11_fiq    R11           R11        R11        R11
    R12        R12_fiq    R12           R12        R12        R12
    R13        R13_fiq    R13_svc       R13_abt    R13_irq    R13_und
    R14        R14_fiq    R14_svc       R14_abt    R14_irq    R14_und
    R15(PC)    R15(PC)    R15(PC)       R15(PC)    R15(PC)    R15(PC)

    CPSR       CPSR       CPSR          CPSR       CPSR       CPSR
               SPSR_fiq   SPSR_svc      SPSR_abt   SPSR_irq   SPSR_und

    R13: Stack pointer (in common)
      Individual stack for each processor mode
    R14: Link register
    R15: Program counter

- Processor Status Registers
    The N, Z, C and V bits are condition code flags
      may be changed as a result of arithmetic and logical operations in the processor
      may be tested by all instructions to determine if the instruction is to be executed
      N : Negative. Z : Zero. C : Carry. V : oVerflow
    The I and F bits are the interrupt disable bits
    The M0, M1, M2, M3 and M4 bits are the mode bits

      Bit    31  30  29  28  27 .. 8   7   6   5   4   3   2   1   0
      Field  N   Z   C   V   .         I   F   .   M4  M3  M2  M1  M0

      N: Negative/Less Than    Z: Zero    C: Carry/Borrow/Extend    V: Overflow
      I: IRQ disable    F: FIQ disable    M4..M0: Mode Bits

- Exceptions
    Break the normal execution of program
      Handle the interrupts from peripherals
      Guarantee completion of the instruction currently executing in the pipeline
    Type of exception
      FIQ (Fast Interrupt reQuest)
        Externally generated by taking the nFIQ input LOW
        Fast handling for data or channel transfer
      IRQ (Interrupt ReQuest)
        Normal interrupt caused by a LOW level on the nIRQ input
      ABORT
        Signaled by the external ABORT input
        Indicates that the current memory access cannot be completed
      Software interrupt
        Generated by the software interrupt instruction (SWI)
        Getting into Supervisor mode, usually to request a particular supervisor function. OS support

- Exceptions (contd)
    Type of exception (contd)
      Undefined instruction trap
        When the ARM comes across an instruction which it cannot handle it offers it to any coprocessors which may be present
        If a coprocessor can perform this instruction but is busy at that time, ARM will wait until the coprocessor is ready or until an interrupt occurs
        If no coprocessor can handle the instruction then ARM will take the undefined instruction trap
    Exception Priorities
      (1) Reset (highest priority)
      (2) Data abort
      (3) FIQ
      (4) IRQ
      (5) Prefetch abort
      (6) Undefined Instruction, Software interrupt (lowest priority)
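To make the status register layout concrete, here is a small Python sketch that splits a PSR value into flags, interrupt disable bits and mode. The mode bit patterns in `MODES` are not given on the slides; they are taken from the ARM architecture documentation and should be treated as an assumption of this example, as should the function name `decode_psr`.

```python
# Mode bit patterns (M4..M0); assumed from the ARM architecture documentation.
MODES = {
    0b10000: "usr",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "svc",
    0b10111: "abt",
    0b11011: "und",
}


def decode_psr(psr: int) -> dict:
    """Split a 32-bit program status register value into its fields."""
    return {
        "N": bool((psr >> 31) & 1),   # Negative/Less Than
        "Z": bool((psr >> 30) & 1),   # Zero
        "C": bool((psr >> 29) & 1),   # Carry/Borrow/Extend
        "V": bool((psr >> 28) & 1),   # Overflow
        "I": bool((psr >> 7) & 1),    # IRQ disable
        "F": bool((psr >> 6) & 1),    # FIQ disable
        "mode": MODES.get(psr & 0x1F, "unknown"),
    }


print(decode_psr(0x600000D3))
```

The sample value has Z and C set, both interrupts disabled and the mode bits of Supervisor mode.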

Instruction Set

- Instruction Format
    3 address instruction format
      used in ARM state
    2 address instruction format
      used in ARM and THUMB state

- Conditional execution
    All ARM instructions are conditionally executed
      The execution may or may not take place depending on the values of the N, Z, C and V flags in the CPSR
      All THUMB instructions are decompressed to Always conditional instruction
    Condition Field in instruction: Cond, bits 31..28

      0000 = EQ - Z set (equal)
      0001 = NE - Z clear (not equal)
      0010 = CS - C set (unsigned higher or same)
      0011 = CC - C clear (unsigned lower)
      0100 = MI - N set (negative)
      0101 = PL - N clear (positive or zero)
      0110 = VS - V set (overflow)
      0111 = VC - V clear (no overflow)
      1000 = HI - C set and Z clear (unsigned higher)
      1001 = LS - C clear or Z set (unsigned lower or same)
      1010 = GE - N set and V set, or N clear and V clear (greater or equal)
      1011 = LT - N set and V clear, or N clear and V set (less than)
      1100 = GT - Z clear, and either N set and V set, or N clear and V clear (greater than)
      1101 = LE - Z set, or N set and V clear, or N clear and V set (less than or equal)
      1110 = AL - always
      1111 = NV - never
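The condition table can be written down directly as a lookup. The following Python sketch is only an illustration of the table above; `cond_field` and `condition_passed` are names chosen for this example.

```python
def cond_field(instruction: int) -> int:
    """Extract the 4-bit condition field from bits 31..28."""
    return (instruction >> 28) & 0xF


def condition_passed(cond: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Evaluate a condition code against the N, Z, C and V flags."""
    checks = {
        0b0000: z,                      # EQ
        0b0001: not z,                  # NE
        0b0010: c,                      # CS
        0b0011: not c,                  # CC
        0b0100: n,                      # MI
        0b0101: not n,                  # PL
        0b0110: v,                      # VS
        0b0111: not v,                  # VC
        0b1000: c and not z,            # HI
        0b1001: (not c) or z,           # LS
        0b1010: n == v,                 # GE
        0b1011: n != v,                 # LT
        0b1100: (not z) and (n == v),   # GT
        0b1101: z or (n != v),          # LE
        0b1110: True,                   # AL
        0b1111: False,                  # NV
    }
    return bool(checks[cond])


# After comparing equal values Z is set, so EQ passes and NE fails.
print(condition_passed(0b0000, n=False, z=True, c=True, v=False))
print(condition_passed(0b0001, n=False, z=True, c=True, v=False))
```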

- Control instruction
    Branch and branch with link
      Jump to desired instruction
      Save the current PC for return (with L bit)

      Bits    31..28   27..25   24   23..0
      Field   cond     101      L    24-bit signed word offset

    Branch and exchange
      Jump to desired instruction with exchange of instruction set
      Rm[0] == 1: Subsequent inst. are THUMB.
      Rm[0] == 0: Subsequent inst. are ARM.

      Bits    31..28   27..6                     5   4   3..0
      Field   cond     0001001011111111111100    L   1   Rm

- Data processing instruction

      Bits    31..28   27..26   25   24..21   20   19..16   15..12   11..0
      Field   cond     00       #    opcode   S    Rn       Rd       operand 2

      opcode: arithmetic/logic function
      S: set condition codes
      Rn: first operand register
      Rd: destination register

    operand 2, immediate (# = 1)
      Bits    11..8   7..0
      Field   #rot    8-bit immediate
      #rot: immediate alignment

    operand 2, register with immediate shift (# = 0)
      Bits    11..7    6..5   4   3..0
      Field   #shift   Sh     0   Rm
      #shift: immediate shift length. Sh: shift type. Rm: second operand register

    operand 2, register with register shift (# = 0)
      Bits    11..8   7   6..5   4   3..0
      Field   Rs      0   Sh     1   Rm
      Rs: register shift length
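The field layout of a data processing instruction can be exercised with a short decoder. This Python sketch only handles the immediate and immediate-shift forms of operand 2; the function names are invented here, and the rule that the 8-bit immediate is rotated right by twice the `#rot` value is assumed from the ARM instruction set documentation rather than stated on the slide.

```python
def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bit positions."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def decode_data_processing(instr: int) -> dict:
    """Extract the fields of a data processing instruction word."""
    fields = {
        "cond": (instr >> 28) & 0xF,
        "immediate": (instr >> 25) & 1,   # the '#' bit
        "opcode": (instr >> 21) & 0xF,
        "S": (instr >> 20) & 1,
        "Rn": (instr >> 16) & 0xF,
        "Rd": (instr >> 12) & 0xF,
    }
    if fields["immediate"]:
        rot = (instr >> 8) & 0xF
        imm8 = instr & 0xFF
        # Assumption: the immediate is rotated right by 2 * #rot.
        fields["operand2_value"] = ror32(imm8, 2 * rot)
    else:
        fields["Rm"] = instr & 0xF
        fields["Sh"] = (instr >> 5) & 0x3
        if ((instr >> 4) & 1) == 0:
            fields["shift_length"] = (instr >> 7) & 0x1F
    return fields


print(decode_data_processing(0xE3A014FF))  # MOV r1, #0xFF000000
print(decode_data_processing(0xE0810102))  # ADD r0, r1, r2, LSL #2
```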

- Data processing instruction (contd)

    Opcode [24:21]   Mnemonic   Meaning                          Effect
    0000             AND        Logical bit-wise AND             Rd := Rn AND Op2
    0001             EOR        Logical bit-wise exclusive OR    Rd := Rn EOR Op2
    0010             SUB        Subtract                         Rd := Rn - Op2
    0011             RSB        Reverse subtract                 Rd := Op2 - Rn
    0100             ADD        Add                              Rd := Rn + Op2
    0101             ADC        Add with carry                   Rd := Rn + Op2 + C
    0110             SBC        Subtract with carry              Rd := Rn - Op2 + C - 1
    0111             RSC        Reverse subtract with carry      Rd := Op2 - Rn + C - 1
    1000             TST        Test                             Scc on Rn AND Op2
    1001             TEQ        Test equivalence                 Scc on Rn EOR Op2
    1010             CMP        Compare                          Scc on Rn - Op2
    1011             CMN        Compare negated                  Scc on Rn + Op2
    1100             ORR        Logical bit-wise OR              Rd := Rn OR Op2
    1101             MOV        Move                             Rd := Op2
    1110             BIC        Bit clear                        Rd := Rn AND NOT Op2
    1111             MVN        Move negated                     Rd := NOT Op2

- Data processing instruction (contd)
    Shift operation
      In any data processing instruction, the second register operand can have a shift operation applied to it.
      Logical shift
        LSL: Logical shift left by 0 to 31 places
          Fill the vacated bits at the least significant end of the word with zeros.
        LSR: Logical shift right by 0 to 31 places
          Fill the vacated bits at the most significant end of the word with zeros.

- Data processing instruction (contd)
    Shift operation (contd)
      Arithmetic shift
        ASL: Arithmetic shift left. Same as LSL
        ASR: Arithmetic shift right
          Sign extend the shifting bits
      Rotation: ROR, RRX

- Multiply Instruction
    Product of two 32 bit values in registers
    May need more cycles and power in implementation
      Convert to another instruction if possible
      Ex) b = a * 5 : b = a + (a << 2)

      Bits    31..28   27..24   23..21   20   19..16    15..12    11..8   7..4   3..0
      Field   cond     0000     mul      S    Rd/RdHi   Rn/RdLo   Rs      1001   Rm

    Opcode [23:21]   Mnemonic   Meaning                                Effect
    000              MUL        Multiply (32-bit result)               Rd := (Rm * Rs) [31:0]
    001              MLA        Multiply-accumulate (32-bit result)    Rd := (Rm * Rs + Rn) [31:0]
    100              UMULL      Unsigned multiply long                 RdHi:RdLo := Rm * Rs
    101              UMLAL      Unsigned multiply-accumulate long      RdHi:RdLo += Rm * Rs
    110              SMULL      Signed multiply long                   RdHi:RdLo := Rm * Rs
    111              SMLAL      Signed multiply-accumulate long        RdHi:RdLo += Rm * Rs
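The shift types and the long multiplies are easy to model on 32-bit values. The Python sketch below is an illustration only: the function names mirror the mnemonics but are ordinary helpers written for this example, and RRX (which involves the carry flag) is left out.

```python
MASK32 = 0xFFFFFFFF


def lsl(value: int, n: int) -> int:
    """Logical shift left: vacated low bits are filled with zeros."""
    return (value << n) & MASK32


def lsr(value: int, n: int) -> int:
    """Logical shift right: vacated high bits are filled with zeros."""
    return (value & MASK32) >> n


def asr(value: int, n: int) -> int:
    """Arithmetic shift right: vacated high bits copy the sign bit."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret as a negative number
    return (value >> n) & MASK32


def ror(value: int, n: int) -> int:
    """Rotate right: bits leaving the low end re-enter at the high end."""
    n %= 32
    value &= MASK32
    return ((value >> n) | (value << (32 - n))) & MASK32


def times5(a: int) -> int:
    """b = a * 5 rewritten as b = a + (a << 2), avoiding a multiply."""
    return (a + lsl(a, 2)) & MASK32


def umull(rm: int, rs: int) -> tuple:
    """Unsigned 32x32 multiply; returns (RdHi, RdLo)."""
    product = (rm & MASK32) * (rs & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def smull(rm: int, rs: int) -> tuple:
    """Signed 32x32 multiply; returns (RdHi, RdLo)."""
    def signed(x: int) -> int:
        x &= MASK32
        return x - (1 << 32) if x & 0x80000000 else x
    product = (signed(rm) * signed(rs)) & 0xFFFFFFFFFFFFFFFF
    return (product >> 32) & MASK32, product & MASK32


print(hex(lsr(0x80000000, 4)), hex(asr(0x80000000, 4)))
print(times5(7))
print([hex(x) for x in umull(0xFFFFFFFF, 2)], [hex(x) for x in smull(0xFFFFFFFF, 2)])
```

The last line shows why signed and unsigned long multiplies are separate instructions: the same bit patterns give different high words.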

- Data transfer instruction
    Single data transfer (LDR, STR)
      Single word (32 bit), half word (16 bit) and byte (8 bit) transfer
      Addressing
        Register offset
          Address = base register +/- offset register
        Immediate offset
          Address = base register +/- immediate constant
        Post-indexing: modify address after use
        Pre-indexing: modify address before use
      Write back
        If enabled, update the base register

- Data transfer instruction (contd)
    Multiple data transfer (LDM, STM)
      load (LDM) or store (STM) any subset of the currently visible registers
      Use
        Stack: maintaining full or empty stacks which can grow up or down memory
        Context switching: Save or restore the working registers
        Block copy: moving large blocks of data around main memory
      Addressing
        Pre/Post indexing
        Auto increment or decrement
        Write back the base register
      Special bit
        PSR & force user bit
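The difference between pre-indexing, post-indexing and write back is easiest to see on a toy machine. In this Python sketch registers and memory are plain dictionaries and `ldr` is a helper written for the example, not a model of any particular core; it assumes, as in the ARM instruction set, that post-indexed accesses always update the base register.

```python
def ldr(regs: dict, memory: dict, rd: int, rn: int, offset: int,
        pre_index: bool, write_back: bool = False) -> None:
    """Load a word into rd using base register rn and a signed offset."""
    base = regs[rn]
    if pre_index:
        address = base + offset          # modify the address before use
        if write_back:
            regs[rn] = address           # optional base register update
    else:
        address = base                   # use the unmodified base ...
        regs[rn] = base + offset         # ... then modify it after use
    regs[rd] = memory[address]


memory = {0x100: 11, 0x104: 22}

regs = {0: 0, 1: 0x100}
ldr(regs, memory, rd=0, rn=1, offset=4, pre_index=True)
print(regs)   # loaded from 0x104, base unchanged

regs = {0: 0, 1: 0x100}
ldr(regs, memory, rd=0, rn=1, offset=4, pre_index=True, write_back=True)
print(regs)   # loaded from 0x104, base updated

regs = {0: 0, 1: 0x100}
ldr(regs, memory, rd=0, rn=1, offset=4, pre_index=False)
print(regs)   # loaded from 0x100, base updated afterwards
```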

- Data transfer instruction (contd)
    Single data swap (SWP)
      Swap a byte or word quantity between a register and external memory
      Implemented as a memory read followed by a memory write which are locked together
      Atomic instruction
        Can't be interrupted during execution
        External memory management unit is locked during operation by LOCK signal output
      Use
        Synchronization in the multi-threading program (OS support)
        Lock
        Semaphore

- PSR instruction (MRS, MSR)
    The MRS and MSR instructions are formed from a subset of the Data Processing operations
    These instructions allow access to the CPSR and SPSR registers:
      The MRS instruction allows the contents of the CPSR or SPSR_<mode> to be moved to a general register
      The MSR instruction allows the contents of a general register to be moved to the CPSR or SPSR_<mode> register

- Coprocessor instructions
    Coprocessor
      General mechanism to extend the instruction set through the addition to the core
      Example : system controller such as MMU & cache. FPU
      Registers
        private to coprocessor
      ARM controls the data flow
      Coprocessor concerns only the data processing and memory transfer operations
    Coprocessor data operation (CDP)
      This class of instruction is used to tell a coprocessor to perform some internal operation
      No result is communicated back to ARM, and it will not wait for the operation to complete

      Bits    31..28   27..24   23..20   19..16   15..12   11..8   7..5   4   3..0
      Field   cond     1110     Cop1     CRn      CRd      CP#     Cop2   0   CRm

- Coprocessor instructions (contd)
    Coprocessor data transfers (LDC, STC)
      Load (LDC) or store (STC) a subset of a coprocessor's registers directly to memory
      ARM is responsible for supplying the memory address, and the coprocessor supplies or accepts the data and controls the number of words transferred
    Coprocessor register transfers (MRC, MCR)
      Communicate information directly between ARM and a coprocessor

- Software Interrupt Instruction (SWI)
    Used to enter Supervisor mode in a controlled manner
    The instruction causes the software interrupt trap to be taken, which effects the mode change

      Bits    31..28   27..24   23..0
      Field   cond     1111     24-bit (interpreted) immediate

ARM extension

- ARM extension
    ARM version 6
      Improved memory management
      Multiprocessing
        Add new synchronization instruction (LDREX, STREX)
      Improved exception handling
        New bit in PSR
      Mixed endian support
      Media extension
        ARM SIMD (16 bit 2 way and 8 bit 4 way)
        FFT, MPEG4
        Saturation, Selection

- ARM extension (contd)
    Jazelle : ARM's Java extension (symbol as a J)
      Instruction extension (Java state), not a coprocessor
      Implemented in the ARM Pipeline as a FSM
      Dynamic re-mapping of Stack to Registers

      [Figure: ARM pipeline (FETCH, DECODE, EXECUTE, MEMORY, WRITEBACK) with Jazelle logic turned ON. The bytecode instruction stream passes through instruction fetch, Java decode and stack management, alongside the ARM decode and Thumb decode paths, into register read, shift + ALU, memory access and register write.]

Coprocessor Interface

- Implementation dependent
    For ARM7 (von Neumann architecture)

    [Figure: Memory, ARM7 and Coprocessor blocks; ARM7 and the coprocessor are connected by the nCPI, CPA and CPB signals]

- Coprocessor present / absent (CPA)
    nCPI LOW to execute a coprocessor instruction
    Each coprocessor copies the instruction
    Each coprocessor inspects the CP# field to see which coprocessor it is for
    Every coprocessor in a system must have a unique number
    If that number matches the contents of the CP# field the coprocessor should drive the CPA (coprocessor absent) line LOW
    If no coprocessor has a number which matches the CP# field, CPA and CPB will remain HIGH, and ARM7 will take the undefined instruction trap

- Busy-waiting
    If CPA goes LOW, ARM watches the CPB (coprocessor busy) line
    ARM will busy-wait while CPB is HIGH, unless an enabled interrupt occurs
    When CPB goes LOW, the instruction continues to completion

- Pipeline following
- Data transfer cycles
    Coprocessor must supply or accept data at ARM7 bus rate
    ARM7 will continue to increment the address until CPA, CPB high
- Privileged Instructions
- Idempotency
    Any action taken by the coprocessor before it goes not-busy must be idempotent, i.e. must be repeatable with identical results after interrupt

Processor Cores

- ARM7
    Two main blocks: datapath and decoder
    Register bank (r0 to r15)
      Two read ports to A-bus / B-bus
      One write port from ALU-bus
      Additional read/write ports for program counter r15
    Barrel shifter / ALU
    Address registers / incrementer
    Single Memory Port
      holds either PC address (with increment) or operand address

    [Figure: ARM7 organization: A[31:0], address register, incrementer, PC, register bank, multiply register, barrel shifter, ALU, A bus, B bus and ALU bus, data in register, data out register, D[31:0], instruction decode & control]

- ARM7 (contd)
    Pipeline: 3 Stage pipeline
      Fetch : fetch instruction code from memory into the instruction pipeline
      Decode : instruction decoded to obtain control signals for the datapath ready for the next stage
      Execute : instruction owns the datapath - register read; shifting; ALU results generated and write-back

    [Figure: three instructions overlapping in fetch, decode and execute over time. While the instruction at address PC executes, PC+4 is in decode and PC+8 is being fetched, so R15 reads as PC+8]
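The overlap of the three stages, and why an executing instruction observes its own address plus 8, can be traced with a few lines of Python. This is a simplified sketch that assumes every instruction spends exactly one cycle in each stage with no stalls or branches; `pipeline_trace` is a name made up for the example.

```python
def pipeline_trace(first_address: int, count: int) -> list:
    """Return, cycle by cycle, which instruction address occupies each stage."""
    def stage(cycle_offset: int):
        # An instruction enters fetch at cycle i, decode at i + 1, execute at i + 2.
        if 0 <= cycle_offset < count:
            return first_address + 4 * cycle_offset
        return None

    trace = []
    for cycle in range(count + 2):
        trace.append({
            "cycle": cycle,
            "fetch": stage(cycle),
            "decode": stage(cycle - 1),
            "execute": stage(cycle - 2),
        })
    return trace


for row in pipeline_trace(0x8000, 3):
    print(row)
```

In the cycle where the first instruction executes, the address being fetched is two words ahead of it, which is the PC+8 behavior mentioned for ARM state.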

- ARM7 (contd)
    Multi-cycle operation
      Single cycle throughput for most simple data processing instructions
      Multi-cycle for mul, load/store

    [Figure: pipeline timing for the sequence ADD, STR, ADD, ADD, ADD. Each ADD takes fetch, decode, execute; the STR takes fetch, decode, calc. addr., data xfer, which delays the instructions behind it]

- ARM7 (contd)
    2 Phase Non-overlapping clocking scheme

    [Figure: 1 clock cycle divided into phase 1 and phase 2]

    Datapath timing

    [Figure: datapath timing over phase 1 and phase 2: register read time, read bus valid, shift time, shift out valid, ALU operands latched, ALU time, ALU out, register write time, precharge invalidates buses]

- ARM7 (contd)
    Memory Interface
      De-pipelined addressing
      Pipelined addressing
        mREQ must be valid before actual reference cycle
    Cycle type

- ARM9
    5 Stage Pipeline
    Separate memory port (Instruction, Data)
      to improve CPI (cycles per instruction)
    Multi-cycle operation: MUL, multiple load/store
    Data forwarding

    [Figure: ARM9TDMI organization: next pc, +4, I-cache, fetch, pc + 4, pc + 8, instruction decode, r15, immediate fields, register read, mul, LDM/STM, shift, pre-index, post-index, ALU, forwarding paths, mux, execute, load/store address, D-cache, buffer/data, byte repl., rot/sgn ex, register write, write-back; B, BL, MOV pc, SUBS pc and LDR pc paths]

- ARM9 (contd)
    Datapath
      Almost same as ARM7
      Compatible to ARM7

    [Figure: pipeline comparison.
     ARM7TDMI: Fetch (instruction fetch), Decode (Thumb decompress, ARM decode, reg read), Execute (shift/ALU, reg write).
     ARM9TDMI: Fetch (instruction fetch), Decode (decode, reg read), Execute (shift/ALU), Memory (data memory access), Write (reg write)]

- StrongARM
    Harvard architecture
      Separate instruction and data port
    5 Stage pipeline
      same as ARM9
    First developed by DEC, now Intel
    *SA1110: v4
    *XScale: v5TE

    [Figure: StrongARM core organization: next pc, +4, I-cache, fetch, pc + 4, pc + 8, instruction decode, r15, + disp branch target adder, immediate fields, register read, LDM/STM, shift, pre-index, post-index, ALU & multiply, forwarding paths, mux, execute, load/store address, D-cache, buffer/data, rotate, rot/sgn ex, register write, write-back; B, BL, MOV pc, SUBS pc and LDR pc paths]

- StrongARM (contd)
    Branch target adder in decode stage
      Applied B, BL and return from function call
      Reduce the taken branch penalty to 1 cycle

      CMP r0, #0
      BNE label

    [Figure: pipeline timing for the sequence above. CMP: fetch, read r0, set CCs, (buffer), (write). BNE: fetch, + disp, (execute), (buffer), (write). The following instruction is fetched but its (decode), (execute), (buffer) are discarded: this is the penalty cycle. The branch target then goes through fetch, decode, execute]

- StrongARM (contd)
    Multiply implementation
      Comparison columns on the slide: Memory port, Multiplier, Branch adder (only the Multiplier values are recoverable)

                   Multiplier
      ARM7         8 bit
      ARM9         8 bit
      StrongARM    12 bit

      Reduce the issue latency of MUL to 1~3 cycles
        Compared to ARM7, ARM9 (1~4 cycles)

AMBA

- Advanced Microcontroller Bus Architecture
    Standard of on-chip communication between different macrocells for high performance embedded system design
    Hierarchical Bus architecture

- AMBA buses
    AHB (Advanced High Performance Bus)
      Connect between high-performance system modules
    ASB (Advanced System Bus)
      Subset of AHB
    APB (Advanced Peripheral Bus)
      Simple interface for low-performance peripherals

- AMBA buses (contd)
    AHB
      - burst transfers
      - split transactions
      - single-cycle bus master handover
      - single-clock edge operation
      - wider data bus configurations (64/128 bits)
      - multiple bus masters (up to 16)
      - pipelined operation
    ASB
      - burst transfers
      - pipelined operation
      - multiple bus masters
    APB
      - low power
      - latched address and control
      - simple interface
      - suitable for many peripherals

- AMBA AHB component
    Master
      Initiates read and write operations by providing an address and control information. Only one bus master is allowed to actively use the bus at any one time.
    Slave
      Responds to a read or write operation within a given address-space range. The bus slave signals back to the active master the success, failure or waiting of the data transfer.

- AMBA AHB component (contd)
    Arbiter
      Ensures that only one bus master at a time is allowed to initiate data transfers. Can use a priority scheme.
    Decoder
      Decodes the address of each transfer and provides a select signal for the slave that is involved in the transfer.

- AMBA APB
    APB bridge: only master in APB. Act as slave in AHB
    APB slave: peripherals
    Simple protocol

- Processor core
    Master in AHB
    Connect through the memory interface of core

Operating System Support

- Memory System
    Memory hierarchy
    Cache system
      Temporal locality
      Spatial locality

- Cache system (contd)
    Single cache shared between instruction and data
    Separate data and instruction cache

- Cache system (contd)
    Write strategy
      Write-through
        All writes are passed to main memory immediately
        If there is a hit, the cache is updated to hold the new value
        Processor slows down to main memory speed during write
      Write-through with buffered write
        Use a buffer to hold data to write back to main memory
        Processor only slowed down to write buffer speed (which is fast)
        Write buffer transfers data to main memory (slowly), processor continues its tasks
      Copy-back
        Write operation updates the cache, but not main memory
        Cache remembers that it is different from main memory via a dirty bit
        It is copied back to main memory only when the cache line is used by new data
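The write strategies differ mainly in when main memory is updated. The Python sketch below contrasts write-through and copy-back under strong simplifying assumptions: one word per cache line, no capacity limit, and an explicit `evict` call standing in for a line being replaced by new data. The class and attribute names are invented for this illustration.

```python
class WriteThroughCache:
    """Every write goes to main memory; a hit also updates the cached copy."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.lines = {}           # address -> cached value
        self.memory_writes = 0

    def write(self, address: int, value: int) -> None:
        if address in self.lines:
            self.lines[address] = value   # hit: keep the cache consistent
        self.memory[address] = value      # always passed to main memory
        self.memory_writes += 1


class CopyBackCache:
    """Writes update only the cache; dirty lines are copied back on eviction."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.lines = {}           # address -> (value, dirty bit)
        self.memory_writes = 0

    def write(self, address: int, value: int) -> None:
        self.lines[address] = (value, True)   # mark the line dirty

    def evict(self, address: int) -> None:
        value, dirty = self.lines.pop(address)
        if dirty:                             # copy back only modified data
            self.memory[address] = value
            self.memory_writes += 1


wt = WriteThroughCache({})
cb = CopyBackCache({})
for value in (1, 2, 3):
    wt.write(0x40, value)
    cb.write(0x40, value)
cb.evict(0x40)
print(wt.memory_writes, cb.memory_writes)   # memory traffic of each strategy
```

Repeated writes to the same location reach main memory every time with write-through, but only once, at eviction, with copy-back.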

- Memory System (contd)
    CP15 system control coprocessor
      On-chip coprocessor which controls the on-chip cache, memory management and other system configuration signals
      Mapped as a coprocessor of number 15

      Bits    31..28   27..24   23..21   20   19..16   15..12   11..8   7..5   4   3..0
      Field   cond     1110     000      L    CRn      Rd       1111    Cop2   1   CRm

      L: load from coprocessor/store to coprocessor

    Protection unit
      Embedded system with fixed and controlled application
      Not need full virtual memory system
      Stand-alone mp3 player
    Memory Management Unit
      General purpose application where the range and number of programs is unknown at design time
      Need virtual memory system support
      PDA

- Protection Unit

    Register       Purpose
    0              ID Register
    1              Configuration
    2              Cache Control
    3              Write Buffer Control
    5              Access Permissions
    6              Region Base and Size
    7              Cache Operations
    9              Cache Lock Down
    15             Test
    4, 8, 10-14    UNUSED

    [Figure: physical address space from 0x0 to 0xf..f divided into Region 0, Region 1, Region 2 and Region 3. Configure for each region: 1. Cacheable 2. Use Write buffer 3. Privileged access 4. Enable / Disable 5. Size and Base Address 6. ......]
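The CP15 register transfer format above can be turned into a small encoder. This Python sketch simply packs the fields from the bit layout; `encode_cp15_transfer` and its parameter names are made up for the example, and the default condition 1110 (always) is an assumption.

```python
def encode_cp15_transfer(load: bool, crn: int, rd: int,
                         crm: int = 0, cop2: int = 0, cond: int = 0b1110) -> int:
    """Pack a CP15 register transfer into a 32-bit instruction word.

    load=True moves a coprocessor register to Rd, load=False moves Rd
    to the coprocessor register, following the L bit of the format.
    """
    return (
        (cond << 28)
        | (0b1110 << 24)
        | (0b000 << 21)
        | (int(load) << 20)      # L bit
        | ((crn & 0xF) << 16)    # CP15 register number
        | ((rd & 0xF) << 12)     # ARM register
        | (0b1111 << 8)          # coprocessor number 15
        | ((cop2 & 0x7) << 5)
        | (1 << 4)
        | (crm & 0xF)
    )


# Read CP15 register 0 (ID Register) into r0.
print(hex(encode_cp15_transfer(load=True, crn=0, rd=0)))
# Write r0 to CP15 register 1.
print(hex(encode_cp15_transfer(load=False, crn=1, rd=0)))
```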

- Protection Unit (contd)
    ARM protection unit

    [Figure: address bits 31..12 are compared against region 0 to region 7; a priority encoder selects the matching region's attribute registers (cacheable, bufferable, permissions)]

- Memory Management Unit
    Virtual memory system

    [Figure: a virtual address is looked up in the page table to give a physical address for the memory access and data; in the two-level form the virtual address indexes a 1st level page table, then a 2nd level page table, to give the physical address]

- Memory Management Unit (contd)
    ARM MMU
      Translates virtual address to physical address
      Controls memory access permission
      Use 2 level page table with TLB
        Page: fixed size of chunk of memory
        TLB: cache of virtual to physical address mapping

- Memory Management Unit (contd)
    Section translation sequence

      virtual address                   [31:20] table index | [19:0] section index
      CP15 register 2                   [31:14] translation table base address
      first-level descriptor address    [31:14] translation table base address | [13:2] table index | [1:0] 00
      section descriptor                [31:20] section base address | [19:12] 00000000 | [11:10] AP | [9] 0 | [8:5] domain | [4] ? | [3] C | [2] B | [1:0] 10
      physical address                  [31:20] section base address | [19:0] section index

      The first-level descriptor is fetched by a memory access; the physical address is then used for the memory access that returns the data

- Memory Management Unit (contd)
    Small page translation sequence

      virtual address                    [31:20] first level table index | [19:12] page table index | [11:0] page offset
      CP15 register 2                    [31:14] translation table base address
      first-level descriptor address     [31:14] translation table base address | [13:2] table index | [1:0] 00
      first-level descriptor             [31:10] page table base address | [9] 0 | [8:5] domain | [4:2] ??? | [1:0] 01
      second-level descriptor address    [31:10] page table base address | [9:2] page table index | [1:0] 00
      second-level descriptor            [31:12] page base address | [11:10] AP3 | [9:8] AP2 | [7:6] AP1 | [5:4] AP0 | [3] C | [2] B | [1:0] 10
      physical address                   [31:12] page base address | [11:0] page offset

      Each descriptor is fetched by a memory access; the physical address is then used for the memory access that returns the data

- Memory Management Unit (contd)
    CP15 control registers

    Register    Purpose
    0           ID Register
    1           Control
    2           Translation Table Base
    3           Domain Access Control
    5           Fault Status
    6           Fault Address
    7           Cache Operations
    8           TLB Operations
    9           Read Buffer Operations
    10          TLB lockdown
    13          Process ID Mapping
    14          Debug Support
    15          Test & Clock Control
    4, 11-12    UNUSED
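The section and small page translation sequences amount to a few shifts and masks. The Python sketch below walks the tables for both cases, assuming the field positions shown in the translation diagrams; memory is a dictionary from word address to descriptor, permissions and domains are ignored, and `translate` is a helper name invented for the example.

```python
def translate(memory: dict, ttb: int, va: int) -> int:
    """Translate a virtual address by walking the page tables."""
    # First level: bits [31:20] of the virtual address index the table.
    l1_address = (ttb & 0xFFFFC000) | ((va >> 20) << 2)
    l1 = memory.get(l1_address, 0)

    if l1 & 0b11 == 0b10:
        # Section: [31:20] from the descriptor, [19:0] from the address.
        return (l1 & 0xFFF00000) | (va & 0x000FFFFF)

    if l1 & 0b11 == 0b01:
        # Page table: bits [19:12] of the address index the second level.
        l2_address = (l1 & 0xFFFFFC00) | (((va >> 12) & 0xFF) << 2)
        l2 = memory.get(l2_address, 0)
        if l2 & 0b11 == 0b10:
            # Small page: [31:12] from the descriptor, [11:0] page offset.
            return (l2 & 0xFFFFF000) | (va & 0x00000FFF)

    raise LookupError("translation fault at 0x%08X" % va)


TTB = 0x4000
memory = {
    0x4004: 0x80000C02,   # first-level entry 1: section at 0x80000000
    0x4008: 0x00008001,   # first-level entry 2: page table at 0x8000
    0x8004: 0x90005FF2,   # second-level entry 1: small page at 0x90005000
}
print(hex(translate(memory, TTB, 0x00100ABC)))   # section mapping
print(hex(translate(memory, TTB, 0x00201234)))   # small page mapping
```

A TLB would cache the result of this walk so that most accesses avoid the extra memory reads.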

- Stack and Subroutine System
    Idea of stack
      The multiple load/store instructions can be used to implement last-in-first-out storage called a STACK.
      A stack is a portion of main memory used to store data temporarily
      A PUSH operation stores a number of registers onto the stack memory.

- Stack (contd)
    Pop operation
    Implemented by LDM/STM instruction

- Subroutine
    Subroutines allow you to modularize your code so that it is more reusable.

- Subroutine with saving the context

- ARM Software Development
    ARM software development toolkit
      armcc, armasm, armlink
    ARMulator
      Cycle accurate simulator
      MMU, coprocessor
      Profiler
    Boot-up code
      On reset, processor starts at address 0x0
    ARM procedure call standard
      Can inter-work assembly routine with C/C++ program
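Push and pop with the multiple transfer instructions can be modelled on a toy register file. The Python sketch below assumes a full descending stack (the stack pointer points at the last item pushed and the stack grows toward lower addresses), which is one of several stack types the multiple transfers support; `stmfd` and `ldmfd` are helper functions named after the corresponding mnemonics, and the dictionaries stand in for registers and memory.

```python
def stmfd(regs: dict, memory: dict, sp: int, register_list: list) -> None:
    """Push registers: decrement the stack pointer before each store."""
    address = regs[sp]
    for r in sorted(register_list, reverse=True):
        address -= 4
        memory[address] = regs[r]     # lowest register ends at lowest address
    regs[sp] = address                # write back the base register


def ldmfd(regs: dict, memory: dict, sp: int, register_list: list) -> None:
    """Pop registers: load, then increment the stack pointer."""
    address = regs[sp]
    for r in sorted(register_list):
        regs[r] = memory[address]
        address += 4
    regs[sp] = address                # write back the base register


SP, LR = 13, 14
regs = {0: 10, 1: 20, SP: 0x1000, LR: 0x8000}
memory = {}

stmfd(regs, memory, SP, [0, 1, LR])   # save context on subroutine entry
regs[0], regs[1], regs[LR] = 0, 0, 0  # the subroutine body clobbers them
ldmfd(regs, memory, SP, [0, 1, LR])   # restore context before returning
print(regs)
```

Saving the link register together with the working registers is what allows a subroutine to call further subroutines and still return correctly.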
