14.1. Introduction
The MOS Technology 6502 was used in many personal computers [121], including the BBC
Computer from Acorn. In the early 1980s Acorn were looking to upgrade their
product and designed a RISC device known as the Acorn RISC Machine, or simply
ARM (version V1). Apple Computer, Acorn and VLSI combined to further develop
the ARM, and it is now known as the Advanced RISC Machine. The ARM business was
converted to sell processor cores, the first product being the ARM6.
The ARM CPU is contained in a range of devices from a range of manufacturers. This has
led to a diverse family line, as illustrated in figure 151. Currently the CPU core is known
as Cortex. There are three Cortex ranges available: the A range, which is enhanced for
large applications, the R range, enhanced for real time operations, and the M range for
microcontrollers. Within the M series there is the mid range M3, the larger M4 with DSP
(Digital Signal Processing) capability and the smaller M0, which is designed to compete
with 8 and 16 bit microcontrollers [122].
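[Figure 151. The ARM family line: ARM1, ARM7, ARM9 and ARM11, followed by the Cortex ranges: Cortex A (A5, A8, A9, A15) for applications, Cortex R (R4, R5, R7) for real time and Cortex M (M0, M3, M4) for microcontrollers.]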
[121] Personal computer here is used in the general sense and applies to the era prior to a personal computer
implying an IBM compatible PC.
[122] Not shown in the figure is the M1 variant, which is designed to be implemented in FPGAs (field
programmable gate arrays).
[Figure 152. Block diagram. Labels: ARM core, NVIC interface, ETM interface, hardware divider, 32 bit ALU, single cycle 32 bit multiplier, control logic, Thumb/Thumb2 decode, instruction interface, data interface, configurable NVIC, DAP, ETM external trace macro, serial wire viewer, data watch points, flash patch, bus matrix, code interface.]
[123] This list would also include the stack pointer, program counters and the status registers.
[124] The word general is used here as it is possible to move small numbers and patterns. For example there is
code for mov #0x33333333, where in this case each byte of the operand is the same. However, mov
#0x12345678 is not a valid instruction.
125
[Figure: register set. General registers from R0, with R13 (PSP), LR and PC (R15); special registers: XPSR, interrupt mask registers and control.]
MOV32
A pseudo instruction or directive. The assembler will generate two
instructions, MOV and MOVT, to perform the operation. E.g. MOV32
R0,#0x12345678 becomes MOV R0,#0x5678 and MOVT R0,#0x1234 [126].
MOVS
Adding an S will change the status bits as appropriate.
E.g. MOVS R0,R1 will set the Z flag if the contents of R1 are zero.
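The way MOV32 splits a 32 bit constant can be sketched with a small Python model. This is only an illustration of the behaviour described above, not assembler output; the function names mov, movt and mov32 are made up for the sketch.

```python
def mov(imm16):
    """Model MOV Rd,#imm16: load the low half-word, clearing the upper 2 bytes."""
    return imm16 & 0xFFFF


def movt(reg, imm16):
    """Model MOVT Rd,#imm16: load the upper half-word, keeping the lower 2 bytes."""
    return ((imm16 & 0xFFFF) << 16) | (reg & 0xFFFF)


def mov32(value):
    """Model MOV32 Rd,#value as the MOV then MOVT pair the assembler generates."""
    low = value & 0xFFFF            # 0x5678 for 0x12345678
    high = (value >> 16) & 0xFFFF   # 0x1234 for 0x12345678
    reg = mov(low)
    reg = movt(reg, high)
    return reg


print(hex(mov32(0x12345678)))       # 0x12345678

# Reversing the order loses the upper half-word, because MOV clears it.
wrong = movt(0, 0x1234)
wrong = mov(0x5678)
print(hex(wrong))                   # 0x5678
```

The second part of the sketch shows why the MOV must come first: a MOV issued after the MOVT would clear the upper two bytes again.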
The Load and Store Instructions.
The load (LDR) and store (STR) instructions have two forms.
LDR Rt,[Re+offset] / STR Rt,[Re+offset]
In this form the register Re plus the offset defines the effective memory address and Rt
is the target register. Often the program counter is used as Re. Consider the following code:
0x08001234 LDR R0,[pc+0x120]
0x08001238 Next instruction
           Other instructions
           End of function/procedure
0x08001358 DCD 0x12345678
0x0800135C more data
Figure 156. Sample code fragment using the LDR instruction.
1. The LDR instruction will be decoded as LDR R0,[pc+0x120].
2. The offset 0x120 will be added to the current program counter to give an effective
   address of 0x08001238 + 0x0120 = 0x08001358.
3. The contents of memory location 0x08001358 are loaded into register R0. R0
   becomes 0x12345678.
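The three steps can be checked with a short Python sketch. It follows the simplification used in the text, where the program counter is taken as the address of the next instruction; on a real Cortex-M device the PC value used by a Thumb instruction is the instruction address plus 4, word aligned, and the assembler accounts for this when it calculates the offset. The dictionary standing in for memory and the function name are assumptions made for the illustration.

```python
# Memory modelled as a dictionary: address -> 32 bit word.
memory = {0x08001358: 0x12345678}   # the DCD 0x12345678 of figure 156


def ldr_pc_relative(pc, offset, memory):
    """Model LDR Rt,[pc+offset]: return (effective address, loaded value)."""
    effective_address = pc + offset
    return effective_address, memory[effective_address]


# The program counter already points at the next instruction (0x08001238).
address, r0 = ldr_pc_relative(0x08001238, 0x120, memory)
print(hex(address), hex(r0))        # 0x8001358 0x12345678
```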
LDR Rd,=Address / STR Rd,=Address
The second form of the load and store instructions is actually a pseudo instruction to the
assembler. Since 32 bit immediate operands are not permitted, the assembler generates two
program lines:
(i) The address or operand is placed in memory using the define constant double
(DCD) pseudo instruction.
(ii) At run time this Address is loaded into the register using the indirect addressing
[pc+offset] [127], where the assembler has calculated the offset.
In summary the LDR Rd,=Address will generate the code shown bold in figure 156.
Figure 157 expands the example for the situation where the contents of a memory address
need to be loaded into a register. The address is first loaded into one register as per figure
156 and then this register is used as the pointer or index to the target memory.
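[Figure 157. Loading the contents of a memory address into a register. Labels: R0, R1, the program counter when the offset is calculated, address 20000800 and DCD 0x12345678.]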
[126] Note the order is important. The MOV instruction will clear the upper 2 bytes whereas the MOVT
instruction will not modify the lower 2 bytes.
[127] PC equates to R15.
IO_register = 0x01234567;
For a CISC this would be implemented with an assembly language instruction of the
form:
MOV IO_register,#0x01234567
This instruction would occupy 3 double words in memory and would take 4 clock cycles
to execute. See figure 158.
[Figure 158. The four clock cycles (cycle 1 to cycle 4) of the CISC instruction, with the address of IO_Register and the value to be stored (0x01234567).]
R1,[R0 + offset]
[128] This explanation assumes the original ARM instructions. With the Thumb2 instruction set there will be a
reduction in the size of the instructions.
where offset is the difference between the new address and the value in R0. This will add
some complexity to the compiler.
Arithmetic and Logic Operations
Mathematical and logic operations using the ARM are of the following form:
ADD Rd,Rs,operand2
where Rd is the destination register,
      Rs is one of the source registers, or operand1,
      operand2 is the second operand and could be a register or an
      immediate number, as summarised with the MOV instruction.
Note that appending an S to the instruction will cause the flags to be modified as appropriate.
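The effect of the S suffix can be sketched in Python. The model below assumes 32 bit registers and the usual ARM status flags N (negative), Z (zero), C (carry) and V (signed overflow); the function name adds is made up for the sketch, and a plain ADD would return the same result while leaving the flags untouched.

```python
MASK32 = 0xFFFFFFFF


def adds(rs, operand2):
    """Model ADDS Rd,Rs,operand2: return (Rd, flags) for 32 bit operands."""
    a = rs & MASK32
    b = operand2 & MASK32
    full = a + b                     # unbounded Python integer
    rd = full & MASK32               # what fits in the destination register
    flags = {
        "N": (rd >> 31) & 1,                          # bit 31 of the result
        "Z": int(rd == 0),                            # result is zero
        "C": int(full > MASK32),                      # carry out of bit 31
        "V": ((~(a ^ b) & (a ^ rd)) >> 31) & 1,       # signed overflow
    }
    return rd, flags


print(adds(0xFFFFFFFF, 1))   # result 0: Z and C set
print(adds(0x7FFFFFFF, 1))   # result 0x80000000: N and V set
```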
Link and stack operations.
All programs use functions or subroutines. Since the same block of code may be used
many times, subroutines reduce the memory requirements. However, there is a
performance penalty in pushing and popping return addresses on and off the stack [129].
The ARM addresses the performance challenge by using an intermediate register known
as the link register, R14 (LR). On a jump to subroutine or function call a register to register
transfer occurs, the program counter R15 being transferred to the LR. The operation is
reversed for the return.
void fn1( )
{
}                  BX LR         LR -> PC

void fn( )
{                  PUSH {LR}     Stack = LR = ret add1
    fn1();         BL.W fn1      PC = address fn1, LR = ret address2
}                  POP {PC}      Stack -> PC

void main( )
{
    fn();          BL.W fn       PC = address fn(), LR = ret address1
}
Figure 160. Saving the return address in the ARM link register.
Consider the example of figure 160. The main program will invoke the function fn( ). The
assembly instruction to do this is BL.W fn. The BL instruction will
(i) Place the return address into the link register. This is effectively a MOV
R14,R15 operation, a register to register operation which the ARM processor has
been optimised to perform.
(ii) Branch to the function. This is effectively a MOV R15,[R15,offset], another
register instruction the ARM has been optimised to perform.
The first function fn( ) invokes a second function fn1( ). If the previous procedure is
repeated the return address would be lost, as R14, the link register, is overwritten. Hence at
the start of the first function the LR is pushed onto the stack.
If no further functions are called, as in fn1( ), the PUSH to save the LR is not required.
To exit from a function, that is to execute a return from subroutine, the instruction is BX
LR. This is effectively a MOV R15,R14 instruction.
[129] For the software, functions have the advantages that they may be written once and reused, they are easier
to test and they make the code more readable.
Since the first function saved the LR on the stack it should be restored before the exit.
The full code will be POP {LR} to restore the LR and BX LR to return from the function.
This operation may be optimised by bypassing the LR and placing the return address
directly into the program counter, that is, POP {PC}.
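The call sequence of figure 160 can be traced with a small Python model holding only the program counter, the link register and the stack. It is a sketch of the behaviour described above, not a processor simulator; the class name Cpu, its method names and all four addresses are hypothetical values chosen for the illustration.

```python
class Cpu:
    """Only the state discussed in the text: PC, LR and the stack."""

    def __init__(self):
        self.pc = 0
        self.lr = 0
        self.stack = []

    def bl(self, target, return_address):
        self.lr = return_address     # (i) return address into the link register
        self.pc = target             # (ii) branch to the function

    def push_lr(self):
        self.stack.append(self.lr)   # PUSH {LR}

    def pop_pc(self):
        self.pc = self.stack.pop()   # POP {PC}

    def bx_lr(self):
        self.pc = self.lr            # BX LR


# Hypothetical addresses, for the illustration only.
FN, FN1 = 0x08000100, 0x08000200
RET_ADDRESS1, RET_ADDRESS2 = 0x08000014, 0x08000108


def run():
    cpu = Cpu()
    cpu.bl(FN, RET_ADDRESS1)         # main: BL.W fn
    cpu.push_lr()                    # fn:   PUSH {LR}, saves ret address1
    cpu.bl(FN1, RET_ADDRESS2)        # fn:   BL.W fn1, LR is overwritten
    lr_inside_fn1 = cpu.lr
    cpu.bx_lr()                      # fn1:  BX LR, back in fn
    pc_after_fn1 = cpu.pc
    cpu.pop_pc()                     # fn:   POP {PC}, back in main
    return lr_inside_fn1, pc_after_fn1, cpu.pc, cpu.stack


lr_inside_fn1, pc_after_fn1, final_pc, final_stack = run()
print(hex(pc_after_fn1), hex(final_pc))
```

Without the push_lr step the first return address would be gone by the time fn finishes, which is the reason the text gives for the PUSH {LR} at the start of fn( ).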
If-then-else operations.
Microcontrollers make decisions. If a test results in a true condition the microcontroller
executes one path in the program; if it is false it executes a second. Traditionally
there can be significant overheads in the assembler code to execute the if-then-else
operations. See figure 161.
;if (R0 == R1)
_if_then_else
        cmp     r0,r1
        bne     _else
; then
        { then operations }     ;handle true condition
        bra     _exit
; else
_else
        { else operations }     ;handle else condition
_exit
Figure 162 gives working code that matches the ARM assembler syntax. Starting in
column 1 are the labels used. Across at least one space are the instructions followed by
the operands. In this case there is only one instruction, the B or branch always131. The
remaining instructions are all pseudo instructions or directives to the assembler.
[130] Combinations between IT and IT <x><y><z>, where x, y and z are either T or E, are possible. There must
be at least one true conditional instruction. After the first true/then instruction the then and else may be
mixed. For example ITETE is acceptable. The conditions on the instructions must match the IT instruction.
[131] The assembler will accept the syntax B . where . represents the current program counter. The B Loop
has been used here just for clarity.
                AREA
                EXPORT
__Vectors
                DCD     0
                DCD     Reset_Handler
                AREA
                EXPORT
Reset_Handler   PROC
;
Loop            B       Loop
                ENDP
                END
[132] While in this example the stack pointer is not required, a value is necessary to move the assembler
forward to the next location, which is required.
[133] The ARM simulator and debugger on start up will run to the label Reset_Handler, so this label is
required. Users cannot substitute their own.
[Flowchart fragment: Done? Yes: exit.]
All designs are an iterative process. At this point the number of variables and their assignment is not cast
in concrete. Changes can be made as necessary.
To test if all numbers have been added the counter R1 will be compared with the number
of elements R0 using the compare instruction. If these are equal the Z flag is set. If they
are not equal Z is cleared. When the result is not equal the addition must be repeated. The
required code is:
        cmp     r1,r0           ;all done ?
        bne     Next            ;no - do next
The array of numbers.
If the data is fixed it will be placed in ROM. Typically it will be coded between the ENDP
and END directives in the code template of figure 162. Possible data might be:
        AREA Table, DATA, READONLY
Tbe     DCB 0x12,0x34,0x56,0x78,0x21,0x43,0x65,0x87
        DCB 0x93,0x39,0x05,0x50,0x75,0x57,0x21,0x81
The table has been given the label Tbe. Each element in the table is a byte defined by
the directive define constant byte DCB. Since each element is a byte it is loaded into R4
using the LDRB instruction and the pointer is incremented by one after each pass.
If the data were 32 bits (DCD directive) the LDR rather than the LDRB instruction would
be used and the pointer (R3) incremented by 4 after each pass.
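The summation loop can be modelled in Python to check the expected result before running the assembly code in the simulator. The sketch mirrors the register usage of the example: R0 holds the number of elements, R1 the counter, R3 the pointer and R4 the byte just loaded; R2 is assumed to hold the running sum. The function name sum_table is made up, and the pointer is modelled as an index into the table rather than a memory address.

```python
# The table Tbe, one byte per element (DCB).
TBE = bytes([0x12, 0x34, 0x56, 0x78, 0x21, 0x43, 0x65, 0x87,
             0x93, 0x39, 0x05, 0x50, 0x75, 0x57, 0x21, 0x81])


def sum_table(table):
    """Model the loop; like the assembly version it needs at least one element."""
    r0 = len(table)   # number of elements
    r1 = 0            # counter of elements used so far
    r2 = 0            # running sum (assumed register)
    r3 = 0            # pointer, modelled as an index
    while True:
        r4 = table[r3]                  # LDRB: load one byte
        r2 = (r2 + r4) & 0xFFFFFFFF     # add to the sum, 32 bit register
        r3 += 1                         # byte data: pointer moves by one
        r1 += 1                         # one more element done
        z_flag = (r1 == r0)             # cmp r1,r0 sets Z when equal
        if z_flag:                      # bne Next is not taken, so exit
            break
    return r2


print(hex(sum_table(TBE)))              # 0x4f3
```

For 32 bit data the same model would step the pointer by 4 bytes per pass, matching the change from LDRB to LDR described above.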
[Figure: system block diagram. Labels: debug system, internal bus, peripherals, memory, clock & reset, input output.]
this instruction set often required multiple instructions to duplicate the full ARM
instruction set.
The Thumb2 instruction set was introduced in 2003 and enhances the Thumb instruction
set with additional 32 bit instructions. Overall, Thumb2 gives a code density similar to
Thumb but with a performance similar to the original ARM. Figure 165 illustrates the
relationship between the ARM, Thumb and Thumb2 instruction sets. As illustrated,
Thumb2 enhances the ARM instruction set with some additional instructions. The
STMicroelectronics STM32F107VC used in the Keil MCBSTM32C EVB uses the
Thumb2 instruction set.
[Figure 165. Relationship between the ARM instruction set and the Thumb2 instruction set.]
            F04F0010    MOV     r0,#0x10
            F04F0100    MOV     r1,#0x00
            F04F0200    MOV     r2,#0x00
            4B09        LDR     r3,[pc,#36];
0x0800002E  781C        LDRB    r4,[r3]
0x08000030  4422        ADD     r2,r2,r4
0x08000032  F1030301    ADD     r3,#0x01
0x08000036  F1010101    ADD     r1,#0x01
0x0800003A  4281        CMP     r1,r0
0x0800003C  D1F7        BNE     0x0800002E
The assembler will give a warning message. Normally no action is required by the programmer.
destination to fill up the pipeline. To reduce the overhead of branch instructions, the
processor includes a prediction unit that calculates the branch address so that the
destination address is available if required.
[Figure: pipeline timing by clock cycle for instructions N, N+1, N+2 (branch), Next and the branch taken destination, each passing through the fetch, decode and execution stages.]
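The cost of refilling the pipeline can be estimated with a short Python sketch. It is a rough model under stated assumptions, not a description of the actual Cortex timing: a three stage fetch, decode and execution pipeline, one instruction completing per clock once the pipeline is full, and a fixed penalty in clock cycles for every branch that is taken. The function names and the default penalty of 2 cycles are assumptions for the illustration; the real penalty depends on the processor and on how early the branch address is available.

```python
def total_cycles(n_instructions, branches_taken, stages=3, branch_penalty=2):
    """Clock cycles for a run of instructions in a simple pipeline model."""
    fill = stages - 1                         # cycles before the first result
    return fill + n_instructions + branches_taken * branch_penalty


def average_time_ns(n_instructions, branches_taken, clock_hz=50e6,
                    stages=3, branch_penalty=2):
    """Average time per instruction in nanoseconds."""
    cycles = total_cycles(n_instructions, branches_taken, stages, branch_penalty)
    clock_period_ns = 1e9 / clock_hz          # 20 ns at 50 MHz
    return cycles * clock_period_ns / n_instructions


# Hypothetical workload: 100 instructions of which 10 are taken branches.
print(total_cycles(100, 10))                  # 122
print(round(average_time_ns(100, 10), 2))     # 24.4
```

With no taken branches the average approaches one clock period per instruction as the run gets longer, which is the ideal case for the pipeline.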
14.10. Exercises.
1. In the example of section 14.5 one register (R0) was used to contain the size of the
   array while a second (R1) was used to count the elements of the array that had
   been used. When R1=R0 the operations were complete. An alternative approach is
   to set the counter to the size of the array and on each operation decrement the
   counter. Modify the code to use this technique/approach [137].
2. The following data is stored in memory:
   Tbe     DCB 0x12,0x34,0x56,0x78,0x21,0x43,0x65,0x87
           DCB 0x93,0x39,0x05,0x50,0x75,0x57,0x21,0x81
   Using the flowchart of figure 168 as a guide, write the code to sort the data and
   place it into a second table at address Tbe2.
3. Register-register operations in the ARM microcontroller take one clock cycle.
   With a 50MHz clock each instruction takes 20ns. Some instructions take
   additional cycles. When the branch instruction is taken the pipeline must be
   flushed. Assume that figure 167 describes the timing profile of critical code.
   Estimate the average instruction time.
[136] The last 2 ADD instructions of figure 166 are examples of 32 bit instructions that are not word aligned.
[137] In undertaking this exercise review the consequences of adding S to the instruction, i.e. ADDS instead
of ADD, SUBS instead of SUB etc.
[Figure 168. Flowchart for the sort. Labels: Counter1 = 16; Counter2 = 15; pointer to first entry; swap entries; not done; decrement Counter1; not done.]
Instruction  Operation                               Description
ADC          Rd := Rd + Rm + C                       Add with carry
ADD          Rd := Rn + { imm, Rm }                  Add constant or register to register
AND          Rd := Rd & Rm                           Logical AND
ASR          Rd := (signed) Rm >> { imm5, Rs }       Arithmetic shift right
B            R15 := label                            Branch conditional or unconditional
BIC          Rd := Rd AND NOT Rm                     Bit Clear
BKPT [1]                                             Enter debug state Breakpoint
BL                                                   Branch with link
BLX [1]                                              Branch with link and exchange
BX                                                   Branch and exchange
CMN                                                  Compare negated register
CMP                                                  Compare constant or register
EOR                                                  Exclusive OR
LDMIA                                                Load multiple registers from memory
LDR          Rd := [address][31:0]                   Load 32-bit word from memory
LDRB         Rd := ZeroExtend ([address][7:0])       Load byte from memory
LDRH         Rd := ZeroExtend ([address][15:0])      Load 16-bit halfword from memory
LDRSB        Rd := SignExtend ([address][7:0])       Load signed byte from memory
LDRSH        Rd := SignExtend ([address][15:0])      Load signed 16-bit halfword from memory
LSL          Rd := Rm << { imm5, Rs }                Logical Shift Left
LSR          Rd := (unsigned) Rm >> { imm5, Rs }     Logical Shift Right
MOV          Rd := { imm, Rm }                       Move constant or register to register
MUL          Rd := Rm * Rs                           Multiply register
MVN          Rd := NOT Rm                            Move inverted register to register
NEG          Rd := -Rm                               Negate
ORR          Rd := Rd OR Rm                          Logical OR
POP          Pop register list                       Load multiple registers from stack
PUSH         Push register list                      Save multiple registers to stack
ROR          Rd := Rd ROR Rs[7:0]                    Rotate right
SBC          Rd := Rd - Rm - NOT C                   Subtract with Carry
STMIA        Store register list                     Store multiple registers to memory
STR          [address][31:0] := Rd                   Store 32-bit word to memory
STRB         [address][7:0] := Rd[7:0]               Store byte to memory
STRH         [address][15:0] := Rd[15:0]             Store 16-bit half-word to memory
SUB          Rd := Rn - { imm, Rm }                  Subtract constant or register
SWI          Software interrupt                      Call software interrupt function
TST          Set CPSR flags on: Rn AND Rm            Test bits