1
3. What Is ARM?
Advanced RISC Machine
First RISC microprocessor for commercial use
Market-leader for low-power and cost-sensitive embedded applications
Architectural simplicity
which allows
Very small implementations
which result in
Very low power consumption
2
3 ARM Architecture
Typical RISC architecture:
• Large uniform register file
• Load/store architecture
• Simple addressing modes
• Uniform and fixed-length instruction fields
Enhancements:
• Each instruction controls the ALU and shifter
• Auto-increment
and auto-decrement addressing modes
• Multiple Load/Store
• Conditional execution
3
3 Pipeline Organization
Increases speed –
most instructions executed in single cycle
Versions:
3-stage (ARM7TDMI and earlier)
5-stage (ARMS, ARM9TDMI)
6-stage (ARM10TDMI)
4
3 Pipeline Organization
3-stage pipeline: Fetch – Decode - Execute
Three-cycle latency,
one instruction per cycle throughput
i
n
s
t i Fetch Decode Execute
r
u Fetch Decode Execute
i+1
c
t
i i+2 Fetch Decode Execute
o cycle
n
t t+1 t+2 t+3 t+4 5
3 Pipeline Organization
6
3 Operating Modes
7
3 Exceptions
Priorit
Exception Mode IV Address
y
Reset Supervisor 1 0x00000000
8
3 ARM Registers
Current Program Status Register (CPSR)
Saved Program Status Register (SPSR)
On exception, entering mod mode:
(PC + 4) LR
CPSR SPSR_mod
PC IV address
R13, R14 replaced by R13_mod, R14_mod
In case of FIQ mode R7 – R12 also replaced
9
3 ARM Registers
System & User FIQ Supervisor Abort IRQ Undefined
R0 R0 R0 R0 R0 R0
R1 R1 R1 R1 R1 R1
R2 R2 R2 R2 R2 R2
R3 R3 R3 R3 R3 R3
R4 R4 R4 R4 R4 R4
R5 R5 R5 R5 R5 R5
R6 R6 R6 R6 R6 R6
R7 R7_fiq R7 R7 R7 R7
R8 R8_fiq R8 R8 R8 R8
R9 R9_fiq R9 R9 R9 R9
R10 R10_fiq R10 R10 R10 R10
R11 R11_fiq R11 R11 R11 R11
R12 R12_fiq R12 R12 R12 R12
R13 R13_fiq R13_svc R13_abt R13_irq R13_und
R14 R14_fiq R14_svc R14_abt R14_irq R14_und
R15 (PC) R15 (PC) R15 (PC) R15 (PC) R15 (PC) R15 (PC)
10
3 Instruction Set
11
3 ARM Instruction Set
Conditional execution:
Each data processing instruction
prefixed by condition code
Result – smooth flow of instructions through pipeline
16 condition codes:
signed greater
EQ equal MI negative HI unsigned higher GT
than
12
3 ARM Instruction Set
Data processing
instructions
Data transfer
instructions
Block transfer
instructions
Branching instructions
Multiply instructions
Software interrupt
instructions
13
3 Data Processing Instructions
14
3 Data Processing Instructions
Arithmetic operations:
ADD, ADDC, SUB, SUBC, RSB, RSC
Bit-wise logical operations:
AND, EOR, ORR, BIC
Register movement operations:
MOV, MVN
Comparison operations:
TST, TEQ, CMP, CMN
15
3 Data Processing Instructions
e.g.:
if (z==1) R1=R2+(R3*4)
compiles to
( SINGLE INSTRUCTION ! )
16
3 Data Transfer Instructions
Load/store instructions
Used to move signed and unsigned
Word, Half Word and Byte to and from registers
Can be used to load PC
(if target address is beyond branch instruction range)
18
3 Multiply Instructions
Instructions:
19
3 Software Interrupt
SWI instruction
Forces CPU into supervisor mode
Usage: SWI #n
31 28 27 24 23 0
Cond Opcode Ordinal
20
3 Branching Instructions
Branch (B):
jumps forwards/backwards
up to 32 MB
Branch link (BL): same
+ saves (PC+4) in LR
Suitable for function call/return
Condition codes for conditional branches
21
3 Thumb Instruction Set
22
3 THUMB Instruction Set
More traditional:
No condition codes
Two-address data processing instructions
Access to R0 – R8 restricted to
MOV, ADD, CMP
PUSH/POP for stack manipulation
Descending stack (SP hardwired to R13)
No MSR and MRS, must change to ARM to
modify CPSR (change using BX or BLX)
ARM entered automatically after RESET or entering exception mode
Maximum 255 SWI calls
23
4. Creating Assembly listing
24
4. Efficient C Programming for ARM
•Data Types
•Loops
•Register allocation
•Function Calls
•Pointer Aliasing
•Structure Arrangement
•Endianness
•Inline Functions and assembly
•Portability Issues
25
4. Data Types
26
4. Rules for efficient use of Data-types
• For local variables which work from
registers, don’t use char or short because they
are stored in 32-bit space and extra
instructions are needed to contain the range of
char. Signed or unsigned int are
recommended
• For array and global data held in main
memory, use type with smallest size.
• Use Explicit cast when reading global
values into local variables
• Avoid char or short for function
arguments and returns.
27
4. Loops
• Use loops that count down to zero. This saves the
space used by a register to hold the comparison value.
Comparison with zero doesn’t add to overhead
• If it is known that minimum one loop will occour,
usage of do-while saves overhead by avoiding the zero-
comparison.
• Use unsigned loop counters and use i!=0 rather than
i>0, to avoid usage of cmp instruction abd reduce
overhead
• If loop over head is high, and repeat count is low it
helps if we Unroll the loop
28
4. Efficient Register Allocation
29
4. Calling Functions Efficiently
Try to restrict functions to four arguments. This will
make them more efficient to call. Use structures to group
related arguments and pass structure pointers instead of
multiple arguments.
Define small functions in the same source file and before
the functions that call them. The compiler can then
optimize the function call or inline the small function.
Critical functions can be inlined using the _inline
keyword.
30
4. Efficient Structure Arrangement
31
4. Bit-fields
32
4. Endianness and Alignment
Avoid using unaligned data if you can.
Use the type char* for data that can be at any byte
alignment. Access the data by reading bytes and
combining with logical operations. Then the code won't
depend on alignment or ARM endianness configuration.
For fast access to unaligned structures, write different
variants according to pointer alignment and processor
endianness.
33
4. Division
34
4. Inline Functions and Assembly
35
GNU For ARM Development
Arm-elf-gcc
Objcopy
Ld
make
36
5. Arm-elf-gcc
The arm compiler component of gcc.
Typical usage:
37
5. ld
38
5. Objcopy
39