Anda di halaman 1dari 39

ARM

1
3. What Is ARM?
 Advanced RISC Machine
 First RISC microprocessor for commercial use
 Market-leader for low-power and cost-sensitive embedded applications

 Architectural simplicity
which allows
 Very small implementations
which result in
 Very low power consumption

2
3 ARM Architecture
 Typical RISC architecture:
• Large uniform register file
• Load/store architecture
• Simple addressing modes
• Uniform and fixed-length instruction fields
 Enhancements:
• Each instruction controls the ALU and shifter
• Auto-increment
and auto-decrement addressing modes
• Multiple Load/Store
• Conditional execution

3
3 Pipeline Organization

 Increases speed –
most instructions executed in single cycle
 Versions:
 3-stage (ARM7TDMI and earlier)
 5-stage (ARMS, ARM9TDMI)
 6-stage (ARM10TDMI)

4
3 Pipeline Organization
 3-stage pipeline: Fetch – Decode - Execute
 Three-cycle latency,
one instruction per cycle throughput

i
n
s
t i Fetch Decode Execute
r
u Fetch Decode Execute
i+1
c
t
i i+2 Fetch Decode Execute
o cycle
n
t t+1 t+2 t+3 t+4 5
3 Pipeline Organization

 Pipeline flushed and refilled on branch,


causing execution to slow down
 Special features in instruction set
eliminate small jumps in code
to obtain the best flow through pipeline

6
3 Operating Modes

 Seven operating modes:


 User
 Privileged:
• System (version 4 and above)
• FIQ
• IRQ
• Abort
exception modes
• Undefined
• Supervisor

7
3 Exceptions

Priorit
Exception Mode IV Address
y
Reset Supervisor 1 0x00000000

Undefined instruction Undefined 6 0x00000004

Software interrupt Supervisor 6 0x00000008

Prefetch Abort Abort 5 0x0000000C

Data Abort Abort 2 0x00000010

Interrupt IRQ 4 0x00000018

Fast interrupt FIQ 3 0x0000001C


Table 1 - Exception types, sorted by Interrupt Vector addresses

8
3 ARM Registers
 Current Program Status Register (CPSR)
 Saved Program Status Register (SPSR)
 On exception, entering mod mode:
 (PC + 4) LR
 CPSR SPSR_mod
 PC  IV address
 R13, R14 replaced by R13_mod, R14_mod
 In case of FIQ mode R7 – R12 also replaced

9
3 ARM Registers
System & User FIQ Supervisor Abort IRQ Undefined
R0 R0 R0 R0 R0 R0
R1 R1 R1 R1 R1 R1
R2 R2 R2 R2 R2 R2
R3 R3 R3 R3 R3 R3
R4 R4 R4 R4 R4 R4
R5 R5 R5 R5 R5 R5
R6 R6 R6 R6 R6 R6
R7 R7_fiq R7 R7 R7 R7
R8 R8_fiq R8 R8 R8 R8
R9 R9_fiq R9 R9 R9 R9
R10 R10_fiq R10 R10 R10 R10
R11 R11_fiq R11 R11 R11 R11
R12 R12_fiq R12 R12 R12 R12
R13 R13_fiq R13_svc R13_abt R13_irq R13_und
R14 R14_fiq R14_svc R14_abt R14_irq R14_und
R15 (PC) R15 (PC) R15 (PC) R15 (PC) R15 (PC) R15 (PC)

CPSR CPSR CPSR CPSR CPSR CPSR


SPSR_fiq SPSR_svc SPSR_abt SPSR_irq SPSR_und

10
3 Instruction Set

 Two instruction sets:


 ARM
• Standard 32-bit instruction set
 THUMB
• 16-bit compressed form
• Code density better than most CISC
• Dynamic decompression in pipeline

11
3 ARM Instruction Set

 Conditional execution:
 Each data processing instruction
prefixed by condition code
 Result – smooth flow of instructions through pipeline
 16 condition codes:

signed greater
EQ equal MI negative HI unsigned higher GT
than

unsigned lower signed less than


NE not equal PL positive or zero LS LE
or same or equal

unsigned higher signed greater


CS VS overflow GE AL always
or same than or equal

CC unsigned lower VC no overflow LT signed less than NV special purpose

12
3 ARM Instruction Set

ARM instruction set

Data processing
instructions
Data transfer
instructions
Block transfer
instructions
Branching instructions

Multiply instructions
Software interrupt
instructions

13
3 Data Processing Instructions

 Arithmetic and logical operations


 3-address format:
 Two 32-bit operands
(op1 is register, op2 is register or immediate)
 32-bit result placed in a register
 Barrel shifter for op2 allows full 32-bit shift
within instruction cycle

14
3 Data Processing Instructions
 Arithmetic operations:
 ADD, ADDC, SUB, SUBC, RSB, RSC
 Bit-wise logical operations:
 AND, EOR, ORR, BIC
 Register movement operations:
 MOV, MVN
 Comparison operations:
 TST, TEQ, CMP, CMN

15
3 Data Processing Instructions

e.g.:

if (z==1) R1=R2+(R3*4)

compiles to

EQADDS R1,R2,R3, LSL #2

( SINGLE INSTRUCTION ! )

16
3 Data Transfer Instructions

 Load/store instructions
 Used to move signed and unsigned
Word, Half Word and Byte to and from registers
 Can be used to load PC
(if target address is beyond branch instruction range)

LDR Load Word STR Store Word


LDRH Load Half Word STRH Store Half Word
LDRSH Load Signed Half Word STRSH Store Signed Half Word
LDRB Load Byte STRB Store Byte
LDRSB Load Signed Byte STRSB Store Signed Byte
17
3 Multiply Instructions
 Integer multiplication (32-bit result)
 Long integer multiplication (64-bit result)
 Built in Multiply Accumulate Unit (MAC)
 Multiply and accumulate instructions add product to
running total

18
3 Multiply Instructions
 Instructions:

MUL Multiply 32-bit result

MULA Multiply accumulate 32-bit result

UMULL Unsigned multiply 64-bit result

UMLAL Unsigned multiply accumulate 64-bit result

SMULL Signed multiply 64-bit result

SMLAL Signed multiply accumulate 64-bit result

19
3 Software Interrupt

 SWI instruction
 Forces CPU into supervisor mode
 Usage: SWI #n

31 28 27 24 23 0
Cond Opcode Ordinal

 Maximum 224 calls



Suitable for running privileged code and
making OS calls

20
3 Branching Instructions

 Branch (B):
jumps forwards/backwards
up to 32 MB
 Branch link (BL): same
+ saves (PC+4) in LR
 Suitable for function call/return
 Condition codes for conditional branches

21
3 Thumb Instruction Set

 Compressed form of ARM


 Instructions stored as 16-bit,
 Decompressed into ARM instructions and
 Executed
 Lower performance (ARM 40% faster)
 Higher density (THUMB saves 30% space)
 Optimal – “interworking”
(combining two sets) – compiler
supported

22
3 THUMB Instruction Set
 More traditional:
 No condition codes
 Two-address data processing instructions
 Access to R0 – R8 restricted to
 MOV, ADD, CMP
 PUSH/POP for stack manipulation
 Descending stack (SP hardwired to R13)
 No MSR and MRS, must change to ARM to
modify CPSR (change using BX or BLX)
 ARM entered automatically after RESET or entering exception mode
 Maximum 255 SWI calls

23
4. Creating Assembly listing

 The following lines of code produce assembly


listing
#arm-elf-gcc -02 -fomit-frame-pointer -c -o test.o
test.c
#arm-elf-objdump -d test.o > test.txt

24
4. Efficient C Programming for ARM

•Data Types
•Loops
•Register allocation
•Function Calls
•Pointer Aliasing
•Structure Arrangement
•Endianness
•Inline Functions and assembly
•Portability Issues

25
4. Data Types

Data Type Implementation

Char Unsigned 8-bit


Short Signed 16-bit
Int Signed 32-bit
Long Signed 32-bit
Long long signed 64-bit

26
4. Rules for efficient use of Data-types
• For local variables which work from
registers, don’t use char or short because they
are stored in 32-bit space and extra
instructions are needed to contain the range of
char. Signed or unsigned int are
recommended
• For array and global data held in main
memory, use type with smallest size.
• Use Explicit cast when reading global
values into local variables
• Avoid char or short for function
arguments and returns.
27
4. Loops
• Use loops that count down to zero. This saves the
space used by a register to hold the comparison value.
Comparison with zero doesn’t add to overhead
• If it is known that minimum one loop will occour,
usage of do-while saves overhead by avoiding the zero-
comparison.
• Use unsigned loop counters and use i!=0 rather than
i>0, to avoid usage of cmp instruction abd reduce
overhead
• If loop over head is high, and repeat count is low it
helps if we Unroll the loop
28
4. Efficient Register Allocation

 Try to limit the number of local variables in


the internal loop of functions to 12. The
compiler should be able to allocate these to
ARM registers.
 You can guide the compiler as to which
variables are important by ensuring these
variables are used within the innermost loop.

29
4. Calling Functions Efficiently
 Try to restrict functions to four arguments. This will
make them more efficient to call. Use structures to group
related arguments and pass structure pointers instead of
multiple arguments.
 Define small functions in the same source file and before
the functions that call them. The compiler can then
optimize the function call or inline the small function.
 Critical functions can be inlined using the _inline
keyword.

30
4. Efficient Structure Arrangement

 Lay structures out in order of increasing element size.


Start the structure with the smallest elements and finish
with the largest.
 Avoid very large structures. Instead use a hierarchy of
smaller structures.
 For portability, manually add padding (that would
appear implicitly) into API structures so that the layout
of the structure does not depend on the compiler.

31
4. Bit-fields

 Avoid using bit-fields. Instead use #define or


enum to define mask values.
 Test, toggle, and set bit-fields using integer
logical AND, OR, and exclusive OR oper-ations
with the mask values. These operations compile
efficiently, and you can test, toggle, or set
multiple fields at the same time.

32
4. Endianness and Alignment
 Avoid using unaligned data if you can.
 Use the type char* for data that can be at any byte
alignment. Access the data by reading bytes and
combining with logical operations. Then the code won't
depend on alignment or ARM endianness configuration.
 For fast access to unaligned structures, write different
variants according to pointer alignment and processor
endianness.

33
4. Division

 Avoid divisions as much as possible. Do not use


them for circular buffer handling.
 If you can't avoid a division, then try to take
advantage of the fact that divide routines often
generate the quotient n/d and modulus n%d
together.

34
4. Inline Functions and Assembly

 Use inline functions to declare new operations


or primitives not supported by the C compiler.
 Use inline assembly to access ARM
instructions not supported by the C compiler.
Examples are coprocessor instructions or
ARMv5E extensions.

35
GNU For ARM Development
Arm-elf-gcc
Objcopy
Ld
make

36
5. Arm-elf-gcc
 The arm compiler component of gcc.
 Typical usage:

 arm-elf-gcc -mcpu=arm7tdmi -mthumb -O2 -g


-c hello_world.c
 arm-elf-gcc -mcpu=arm7tdmi -mthumb -o
hello_world hello_world.o -lc
 -mcpu specifies the processor type, -m to
select thumb or arm mode.

37
5. ld

 It’s the linker used to link arm applications.


 Typical usage:
− ld -g -Tlpc2104-rom.ld -Map=main.map --cref -o
helloworld helloworld.o boot.o crt0.o -lc –lm.
 -T for linker script, -Map produces map file, -l for
libraries.

38
5. Objcopy

 Converts .obj to downloadable .hex format


 Typical usage:
Objcopy --output-target ihex main main.hex

39

Anda mungkin juga menyukai