[Figure: CPSR format. Fields from the top bit down: N Z C V | unused | I F T | mode]
Not all ARM processors are capable of executing Thumb instructions; those that are have a T in their name, such as the ARM7TDMI described in Section 9.1 on page 248.
Thumb 16-bit compressed instruction set.
On-chip Debug support, enabling the processor to halt in response to a debug request.
An enhanced Multiplier, with higher performance than its predecessors and yielding a full 64-bit result.
Embedded ICE hardware to give on-chip breakpoint and watchpoint support.
ychang@CS.NCHU, 2012 ychang@CS.NCHU,
Thumb entry
The normal way for a processor to switch to executing Thumb instructions is by executing a Branch and Exchange instruction (see BX, p. 115 and example, p. 116). This instruction sets the T bit if the bottom bit of the specified register is set, and switches the PC to the address given in the remainder of the register.
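The effect of BX described above can be sketched in Python. This is only an illustrative model, not processor code: the function name `bx` and the way state is passed around are hypothetical, and the model assumes the T bit sits at bit 5 of the CPSR.

```python
T_BIT = 1 << 5  # position of the Thumb (T) bit in the CPSR

def bx(cpsr, rm_value):
    """Model BX Rm: return the new (cpsr, pc) pair."""
    if rm_value & 1:
        cpsr |= T_BIT      # bottom bit set: continue in Thumb state
    else:
        cpsr &= ~T_BIT     # bottom bit clear: continue in ARM state
    pc = rm_value & ~1     # the remaining bits give the target address
    return cpsr, pc
```

For example, `bx(0x10, 0x8001)` models a jump to Thumb code at address 0x8000, while `bx(0x30, 0x8000)` models a return to ARM code at the same address.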
Thumb exit
An explicit switch back to an ARM instruction stream can be caused by executing a Thumb BX instruction (see example, p. 117). An implicit return takes place whenever an exception is taken, since exception entry is always handled in ARM code.
Thumb Systems
A typical embedded system will include a small amount of fast 32-bit memory on the same chip as the ARM core and will execute speed-critical routines (such as digital signal processing algorithms) in ARM code from this memory. The bulk of the code will not be speed-critical and may execute from a 16-bit off-chip ROM.
r13 is used as a stack pointer. r14 is used as the link register. r15 is the program counter (PC).
The remaining registers (r8 to r12 and the CPSR) have only restricted access:
A few instructions allow the Hi registers (r8 to r15) to be specified.
The CPSR condition code flags are set by arithmetic and logical operations and control conditional branching.
[Figure: Thumb-accessible registers: r0 to r7; Hi registers r8 to r12, SP (r13), LR (r14), PC (r15); CPSR]
Thumb-ARM similarities
All Thumb instructions are 16 bits long. They map onto ARM instructions so they inherit many properties of the ARM instruction set:
The load-store architecture.
Support for 8-bit byte, 16-bit half-word and 32-bit word data types.
A 32-bit unsegmented memory.
Thumb-ARM differences
In order to achieve a 16-bit instruction length, a number of characteristic features of the ARM instruction set have been abandoned:
Most Thumb instructions are executed unconditionally. (All ARM instructions are executed conditionally.)
Many Thumb data processing instructions use a 2-address format (the destination register is the same as one of the source registers). (ARM data processing instructions, with the exception of the 64-bit multiplies, use a 3-address format.)
Thumb instruction formats are less regular than ARM instruction formats, as a result of the dense encoding.
Typical uses of branch instructions include:
Short conditional branches to control (for example) loop exit.
Medium-range unconditional branches to 'goto' sections of code.
Long-range subroutine calls.
ARM handles all these with the same instruction, typically wasting many bits of the 24-bit offset in the first two cases. Thumb has to be more efficient, using different formats for each of these cases.
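[Figure: Thumb branch instruction binary encodings]
(1) B<cond> <label> (Thumb target): bits 15-12 = 1101, bits 11-8 = cond, bits 7-0 = 8-bit offset
(2) B <label> (Thumb target): bits 15-11 = 11100, bits 10-0 = 11-bit offset
(3) BL <label> (Thumb target): bits 15-12 = 1111, bit 11 = H, bits 10-0 = 11-bit offset
(3a) BLX <label> (ARM target): bits 15-11 = 11101, bits 10-1 = 10-bit offset, bit 0 = 0
(4) B{L}X Rm: bits 15-8 = 01000111, bit 7 = L, bit 6 = H, bits 5-3 = Rm, bits 2-0 = 000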
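As a rough illustration of how the short offset fields are used, the following Python sketch computes the target of a conditional or unconditional Thumb branch. It assumes the usual Thumb rule that the offset is sign-extended, shifted left by one bit (instructions are half-word aligned) and added to the PC, which reads as the branch address plus 4. The function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def branch_target(instr, address):
    """Return the target of a Thumb B<cond> or B instruction at `address`,
    or None if `instr` is neither of those two formats."""
    pc = address + 4  # the PC reads two instructions ahead of the branch
    # Conditional branch: 1101 cond offset8 (cond 1110 and 1111 are not branches)
    if (instr >> 12) == 0b1101 and ((instr >> 8) & 0xF) < 0b1110:
        return pc + (sign_extend(instr & 0xFF, 8) << 1)
    # Unconditional branch: 11100 offset11
    if (instr >> 11) == 0b11100:
        return pc + (sign_extend(instr & 0x7FF, 11) << 1)
    return None
```

With these assumptions the 8-bit offset reaches roughly 256 bytes either side of the branch and the 11-bit offset roughly 2 Kbytes, which is why the long-range subroutine call needs a separate format.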
[Figure: Thumb software interrupt binary encoding: bits 15-8 = 11011111, bits 7-0 = 8-bit immediate]
The address of the next Thumb instruction is saved in r14_svc.
The CPSR is saved in SPSR_svc.
The processor disables IRQ, clears the Thumb bit and enters supervisor mode by modifying the relevant bits in the CPSR.
The PC is forced to address 0x08.
The ARM SWI handler is then entered. The normal return instruction restores the Thumb execution state.
Assembler format: SWI <8-bit immediate>
The equivalent ARM instruction has an identical assembler syntax; the 8-bit immediate is zero-extended to fill the 24-bit field in the ARM instruction.
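The zero-extension of the immediate can be sketched as follows. This is a minimal illustration, assuming the ARM SWI encoding with the 'always' condition (top byte 0xEF followed by the 24-bit immediate); the function name is hypothetical.

```python
def thumb_swi_to_arm(instr):
    """Translate a Thumb SWI (11011111 imm8) into the equivalent ARM SWI."""
    if (instr >> 8) != 0b11011111:
        raise ValueError("not a Thumb SWI instruction")
    imm8 = instr & 0xFF
    # ARM SWI: 'always' condition (1110), opcode bits 1111, 24-bit immediate.
    # The 8-bit value is zero-extended, so the upper 16 immediate bits stay 0.
    return 0xEF000000 | imm8
```

A Thumb `SWI 0x12` (0xDF12) therefore corresponds to the ARM word 0xEF000012.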
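[Figure: Thumb data processing instruction binary encodings]
bits 15-10 = 000110, bit 9 = A, bits 8-6 = Rm, bits 5-3 = Rn, bits 2-0 = Rd
bits 15-10 = 000111, bit 9 = A, bits 8-6 = #imm3, bits 5-3 = Rn, bits 2-0 = Rd
bits 15-13 = 001, bits 12-11 = Op, bits 10-8 = Rd/Rn, bits 7-0 = #imm8
bits 15-13 = 000, bits 12-11 = Op, bits 10-6 = #sh, bits 5-3 = Rn, bits 2-0 = Rd
bits 15-10 = 010000, bits 9-6 = Op, bits 5-3 = Rm/Rs, bits 2-0 = Rd/Rn
bits 15-10 = 010001, bits 9-8 = Op, bit 7 = D, bit 6 = M, bits 5-3 = Rm, bits 2-0 = Rd/Rn
bits 15-12 = 1010, bit 11 = SP/PC select, bits 10-8 = Rd, bits 7-0 = #imm8
bits 15-8 = 10110000, bit 7 = A, bits 6-0 = #imm7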
All the data processing instructions that operate with and on the Lo registers update the condition code bits (the S bit is set in the equivalent ARM instruction). The instructions that operate with and on the Hi registers do not change the condition code bits, with the exception of CMP which only changes the condition codes. The instructions that are indicated above as requiring 1 or 2 Hi regs must have one or both register operands specified in the Hi register area. #imm3, #imm7 and #imm8 denote 3-, 7- and 8-bit immediate fields respectively. #sh denotes a 5-bit shift amount field.
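[Figure: Thumb single register data transfer instruction binary encodings]
bits 15-13 = 011, bit 12 = B, bit 11 = L, bits 10-6 = #off5, bits 5-3 = Rn, bits 2-0 = Rd
bits 15-12 = 1000, bit 11 = L, bits 10-6 = #off5, bits 5-3 = Rn, bits 2-0 = Rd
bits 15-12 = 0101, bits 11-9 = Op, bits 8-6 = Rm, bits 5-3 = Rn, bits 2-0 = Rd
bits 15-11 = 01001, bits 10-8 = Rd, bits 7-0 = #off8
bits 15-12 = 1001, bit 11 = L, bits 10-8 = Rd, bits 7-0 = #off8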
These instructions are a carefully derived subset of the ARM single register transfer instructions, and have exactly the same semantics as the ARM equivalent. In all cases the offset is scaled to the size of the data type, so the range of the 5-bit offset is 32 bytes in a load or store byte instruction, 64 bytes in a load or store half-word instruction and 128 bytes in a load or store word instruction.
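The offset ranges quoted above follow directly from the scaling rule, as this short Python sketch shows. The helper names are hypothetical and are only meant to make the arithmetic explicit.

```python
def offset_range_bytes(offset_bits, size_bytes):
    """Number of bytes spanned by an unsigned offset field scaled by data size."""
    return (1 << offset_bits) * size_bytes

def byte_offset(off5, size_bytes):
    """Byte offset selected by a 5-bit offset field for a given transfer size."""
    return (off5 & 0x1F) * size_bytes

# 5-bit offset with byte, half-word and word transfers
ranges = [offset_range_bytes(5, size) for size in (1, 2, 4)]
```

Here `ranges` holds the 32, 64 and 128 byte figures from the text, and the largest word offset that can be encoded is `byte_offset(31, 4)`, or 124 bytes.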
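[Figure: Thumb multiple register data transfer instruction binary encodings]
bits 15-12 = 1100, bit 11 = L, bits 10-8 = Rn, bits 7-0 = reg list
bits 15-12 = 1011, bit 11 = L, bits 10-9 = 10, bit 8 = R, bits 7-0 = reg list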
The block copy forms of the instruction use the LDMIA and STMIA addressing modes. The base register may be any of the Lo registers, and the register list may include any subset of these registers but should not include the base register. The stack forms use SP (r13) as the base register. In addition to the eight registers which may be specified in the register list, the link register (LR, or r14) may be included in the PUSH form and the PC (r15) may be included in the POP form.
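A minimal Python sketch of the stack forms is given below. It assumes the standard Thumb PUSH/POP layout (1011, a load bit, the fixed bits 10, an R bit for LR or PC, and an 8-bit register list); the function name and its parameters are hypothetical.

```python
def encode_push_pop(pop, lo_regs, extra=False):
    """Encode a Thumb PUSH (pop=False) or POP (pop=True) instruction.

    lo_regs: iterable of Lo register numbers (0 to 7) for the register list.
    extra:   include LR in a PUSH, or PC in a POP (the R bit).
    """
    reg_list = 0
    for r in lo_regs:
        if not 0 <= r <= 7:
            raise ValueError("only Lo registers r0 to r7 fit in the list")
        reg_list |= 1 << r
    return (
        (0b1011 << 12)        # stack-form opcode
        | (int(pop) << 11)    # L bit: 1 = load (POP), 0 = store (PUSH)
        | (0b10 << 9)         # fixed bits
        | (int(extra) << 8)   # R bit: LR for PUSH, PC for POP
        | reg_list
    )
```

A typical subroutine entry and exit pair, PUSH {r4-r7, LR} and POP {r4-r7, PC}, is produced by `encode_push_pop(False, range(4, 8), extra=True)` and `encode_push_pop(True, range(4, 8), extra=True)`.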
Thumb Implementation
The Thumb instruction set can be incorporated into a 3-stage pipeline ARM processor macrocell with relatively minor changes. The 5-stage pipeline implementations are trickier. The biggest addition is the Thumb instruction decompressor in the instruction pipeline; this logic translates a Thumb instruction into its equivalent ARM instruction.
[Figure: The Thumb instruction decompressor organization. Labels: data in; mux (select high or low half-word); Thumb decompressor; mux; instruction pipeline; ARM instruction decoder; immediate fields; B operand bus]
Performing a look-up to translate the major and minor opcodes.
Zero-extending the 3-bit register specifiers to 4-bit specifiers.
Mapping other fields across as required.
Example
Thumb: ADD Rd, #imm8
ARM: ADDS Rd, Rd, #imm8
The simplicity of the decompression logic is crucial to the efficiency of the Thumb instruction set. There would be little merit in the Thumb architecture if it resulted in complex, slow and power-hungry decompression logic.
Example
Thumb: ADD Rd, #imm8
ARM: ADDS Rd, Rd, #imm8
Since the only conditional Thumb instructions are branches, the 'always' condition is used in translating all other Thumb instructions. Whether or not a Thumb data processing instruction should modify the condition codes in the CPSR is implicit in the Thumb opcode; this must be made explicit in the ARM instruction. The Thumb 2-address format can always be mapped into the ARM 3-address format by replicating a register specifier.
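[Figure: Thumb to ARM instruction mapping for ADD Rd, #imm8]
Thumb instruction: bits 15-13 = 001, bits 12-11 = 10, bits 10-8 = Rd, bits 7-0 = #imm8
ARM instruction: bits 31-28 = 1110 (always condition), bits 27-26 = 00, bit 25 = 1, bits 24-21 = 0100, bit 20 = 1, bits 19-16 = 0 Rd, bits 15-12 = 0 Rd, bits 11-8 = 0000 (zero shift), bits 7-0 = #imm8 (immediate value)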
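The mapping in this example can be sketched in software as below. This is only an illustration of the field translation, not a description of the hardware: the function name is hypothetical, and the real decompressor is combinational logic in the instruction pipeline.

```python
def decompress_add_imm8(thumb):
    """Map Thumb 'ADD Rd, #imm8' onto ARM 'ADDS Rd, Rd, #imm8'."""
    if (thumb >> 11) != 0b00110:
        raise ValueError("not a Thumb ADD Rd, #imm8 instruction")
    rd = (thumb >> 8) & 0x7   # 3-bit specifier, zero-extended to 4 bits
    imm8 = thumb & 0xFF
    return (
        (0b1110 << 28)    # 'always' condition
        | (1 << 25)       # immediate operand
        | (0b0100 << 21)  # ADD opcode
        | (1 << 20)       # S bit: update the condition codes
        | (rd << 16)      # Rn = Rd (2-address to 3-address mapping)
        | (rd << 12)      # Rd
        | (0b0000 << 8)   # zero shift field
        | imm8            # immediate value copied across
    )
```

For instance, the Thumb half-word 0x3105 (ADD r1, #5) maps to the ARM word 0xE2911005 (ADDS r1, r1, #5).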
Thumb applications
Thumb Properties
The Thumb code requires 70% of the space of the ARM code.
The Thumb code uses 40% more instructions than the ARM code.
With 32-bit memory, the ARM code is 40% faster than the Thumb code.
With 16-bit memory, the Thumb code is 45% faster than the ARM code.
Thumb code uses 30% less external memory power than ARM code.
So where performance is all-important, a system should use 32-bit memory and run ARM code. Where cost and power consumption are more important, a 16-bit memory system and Thumb code may be a better choice.
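The first two figures are consistent with each other, as a quick calculation shows: 40% more instructions, each half the width of an ARM instruction, gives about 70% of the code size. The sketch below just spells out that arithmetic; the names are hypothetical and the 1.40 ratio is the figure quoted above.

```python
THUMB_BITS = 16
ARM_BITS = 32
INSTRUCTION_RATIO = 1.40  # Thumb uses 40% more instructions (figure quoted above)

def thumb_code_size_ratio(instruction_ratio=INSTRUCTION_RATIO):
    """Thumb code size as a fraction of the equivalent ARM code size."""
    return instruction_ratio * THUMB_BITS / ARM_BITS

ratio = thumb_code_size_ratio()  # about 0.70
```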
Thumb Systems
A high-end 32-bit ARM system may use Thumb code for certain non-critical routines to save power or memory requirements.
A low-end 16-bit system may have a small amount of on-chip 32-bit RAM for critical routines running ARM code, but use off-chip Thumb code for all non-critical routines.