Stack Processing
A stack is usually implemented as a linear data structure that grows up (an ascending stack) or down (a descending stack) in memory. A stack pointer holds the address of the current top of the stack, either by pointing to the last valid data item pushed onto the stack (a full stack), or by pointing to the vacant slot where the next data item will be placed (an empty stack). ARM multiple register transfer instructions support all four forms of stack.
- Full ascending: grows up; the base register points to the highest address containing a valid item.
- Empty ascending: grows up; the base register points to the first empty location above the stack.
- Full descending: grows down; the base register points to the lowest address containing a valid item.
- Empty descending: grows down; the base register points to the first empty location below the stack.
The ARM architecture uses the load-store multiple instructions to carry out stack operations.
The pop operation (removing data from a stack) uses a load multiple instruction; similarly, the push operation (placing data onto the stack) uses a store multiple instruction. When using a stack you have to decide whether the stack will grow up or down in memory. A stack is either ascending (A) or descending (D). Ascending stacks grow towards higher memory addresses; in contrast, descending stacks grow towards lower memory addresses. When you use a full stack (F), the stack pointer sp points to an address that is the last used or full location (i.e., sp points to the last item on the stack). In contrast, if you use an empty stack (E) the sp points to an address that is the first unused or empty location (i.e., it points after the last item on the stack).
There are a number of load-store multiple addressing mode aliases available to support stack operations (see Table). Next to the pop column is the actual load multiple instruction equivalent.
For example, a full ascending stack would have the notation FA appended to the load multiple instruction, giving LDMFA. The assembler translates this into an LDMDA instruction.
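The aliases can be summarised as a small lookup table. The sketch below is a Python model, not assembler output; the names `STACK_ALIASES`, `pop_instruction` and `push_instruction` are made up for illustration. The pairings themselves are the standard ARM ones (for instance, LDMFA is LDMDA and STMFD is STMDB).

```python
# Stack-type suffix -> (pop equivalent, push equivalent).
# A pop is a load multiple, a push is a store multiple.
STACK_ALIASES = {
    "FA": ("LDMDA", "STMIB"),  # full ascending
    "FD": ("LDMIA", "STMDB"),  # full descending
    "EA": ("LDMDB", "STMIA"),  # empty ascending
    "ED": ("LDMIB", "STMDA"),  # empty descending
}


def pop_instruction(stack_type):
    """Return (alias, equivalent load multiple) for a pop."""
    load, _store = STACK_ALIASES[stack_type]
    return "LDM" + stack_type, load


def push_instruction(stack_type):
    """Return (alias, equivalent store multiple) for a push."""
    _load, store = STACK_ALIASES[stack_type]
    return "STM" + stack_type, store


for stack_type in STACK_ALIASES:
    print(pop_instruction(stack_type), push_instruction(stack_type))
```

Note that the push and pop of one stack type always use opposite addressing modes, so a pop exactly undoes the matching push.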
Example 20 The STMFD instruction pushes registers onto the stack, updating the sp. The figure shows a push onto a full descending stack. You can see that when the stack grows, the stack pointer points to the last full entry in the stack.
PRE
  r1 = 0x00000002
  r4 = 0x00000003
  sp = 0x00080014
NOTE: The stack pointer points to the last full entry in the stack.
Example 21 In contrast, the next figure shows a push operation on an empty stack using the STMED instruction. The STMED instruction pushes the registers onto the stack but updates register sp to point to the next empty location.
PRE
  r1 = 0x00000002
  r4 = 0x00000003
  sp = 0x00080010
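The two pushes can be traced with a small Python model. This is a sketch under assumptions: the instructions are taken to be STMFD sp!, {r1,r4} and STMED sp!, {r1,r4} (the register list is inferred from the PRE values, it is not shown in the text), and memory is modelled as a dictionary from address to word.

```python
def stmfd(mem, sp, values):
    """Model STMFD sp!, {...}: sp is decremented first, then the
    registers are stored, so sp ends on the last full word."""
    sp -= 4 * len(values)
    for i, value in enumerate(values):      # lowest register at lowest address
        mem[sp + 4 * i] = value
    return sp


def stmed(mem, sp, values):
    """Model STMED sp!, {...}: the registers are stored at and below the
    old sp, so sp ends on the next empty word."""
    new_sp = sp - 4 * len(values)
    for i, value in enumerate(values):
        mem[new_sp + 4 + 4 * i] = value
    return new_sp


r1, r4 = 0x00000002, 0x00000003

mem_fd = {}
sp_fd = stmfd(mem_fd, 0x00080014, [r1, r4])
print(hex(sp_fd), {hex(a): v for a, v in sorted(mem_fd.items())})

mem_ed = {}
sp_ed = stmed(mem_ed, 0x00080010, [r1, r4])
print(hex(sp_ed), {hex(a): v for a, v in sorted(mem_ed.items())})
```

In both cases r1 lands at 0x8000c and r4 at 0x80010; only the final sp differs (0x8000c for the full stack, 0x80008 for the empty stack).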
Stack Examples
STMFD sp!, {r0,r1,r3-r5}
STMED sp!, {r0,r1,r3-r5}
STMFA sp!, {r0,r1,r3-r5}
STMEA sp!, {r0,r1,r3-r5}
[Figure: memory layout after each of the four pushes, showing r0, r1, r3, r4 and r5 stored between the old SP and the new SP; address labels 0x418, 0x400 and 0x3e8.]
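The four layouts can be recomputed with a short Python model. It is a sketch that assumes the old SP is 0x400 (the middle address label in the figure); `MODES` and `push_layout` are illustrative names, not part of any ARM tool.

```python
REGS = [0, 1, 3, 4, 5]   # register list {r0,r1,r3-r5}
OLD_SP = 0x400           # assumed starting value of sp

# stack type -> (direction, full); direction is +1 for ascending and
# -1 for descending, full means sp points at the last stored item.
MODES = {
    "FD": (-1, True),
    "ED": (-1, False),
    "FA": (+1, True),
    "EA": (+1, False),
}


def push_layout(stack_type, sp, regs):
    """Return (new_sp, {address: register name}) for STM<type> sp!, {regs}."""
    direction, full = MODES[stack_type]
    new_sp = sp + direction * 4 * len(regs)
    if direction > 0:
        lowest = sp + 4 if full else sp
    else:
        lowest = new_sp if full else new_sp + 4
    # The lowest-numbered register always goes to the lowest address.
    layout = {lowest + 4 * i: f"r{r}" for i, r in enumerate(sorted(regs))}
    return new_sp, layout


for stack_type in MODES:
    new_sp, layout = push_layout(stack_type, OLD_SP, REGS)
    print(stack_type, hex(new_sp), {hex(a): r for a, r in sorted(layout.items())})
```

Under that assumption the descending stacks end with sp = 0x3ec and the ascending stacks with sp = 0x414, which fits inside the 0x3e8 to 0x418 range labelled in the figure.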
Load-Store Instructions
Three basic forms to move data between ARM registers and memory:
- Single register load and store instructions
  (a byte, a 16-bit halfword, or a 32-bit word)
- Multiple register load and store instructions
- Swap instruction
Example 21 The swap instruction loads a word from memory into register r0 and overwrites the memory with register r1.
PRE
  mem32[0x9000] = 0x12345678
  r0 = 0x00000000
  r1 = 0x11112222
  r2 = 0x00009000

  SWP r0, r1, [r2]

POST
  mem32[0x9000] = 0x11112222
  r0 = 0x12345678
  r1 = 0x11112222
  r2 = 0x00009000
This instruction is particularly useful when implementing semaphores and mutual exclusion in an operating system. You can see from the syntax that this instruction can also have a byte size qualifier B, so this instruction allows for both a word and a byte swap.
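The swap example can be replayed with a few lines of Python. This is only a behavioural sketch: registers and memory are dictionaries, and the function name `swp` is chosen to mirror the mnemonic.

```python
def swp(mem, regs, rd, rm, rn):
    """Model SWP Rd, Rm, [Rn]: load the word at address Rn into Rd and
    store Rm to the same address."""
    address = regs[rn]
    old_word = mem[address]
    mem[address] = regs[rm]   # store first, so SWP rX, rX, [Rn] also works
    regs[rd] = old_word


mem32 = {0x9000: 0x12345678}
regs = {"r0": 0x00000000, "r1": 0x11112222, "r2": 0x00009000}

swp(mem32, regs, "r0", "r1", "r2")
print(hex(mem32[0x9000]), {name: hex(value) for name, value in regs.items()})
```

On real hardware the load and the store form one indivisible bus transaction; the Python model runs in a single thread, so it does not need to reproduce that.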
[Figure: memory words at 0x00009000, 0x00009004 and 0x00009008 and registers r0, r1 and r2, before (PRE) and after (POST) the SWP instruction.]
Concept of SEMAPHORE
In computer science, a semaphore is a variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment. A semaphore, in its most basic form, is a protected integer variable that can facilitate and restrict access to shared resources in a multi-processing environment. The two most common kinds of semaphores are counting semaphores and binary semaphores. Counting semaphores represent multiple resources, while binary semaphores, as the name implies, represent two possible states (generally 0 or 1; locked or unlocked).
A semaphore is accessed using the following operations:
- wait() is called when a process wants access to a resource. This is equivalent to an arriving restaurant customer trying to get an open table. If there is an open table, that is, the semaphore is greater than zero, the customer can take that resource and sit at the table. If there is no open table and the semaphore is zero, the process must wait until a resource becomes available.
- signal() is called when a process is done using a resource, equivalent to the patron finishing a meal.
The following is an implementation of this counting semaphore (where the value can be greater than 1):
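As a minimal sketch of such a counting semaphore (not the original listing), the Python class below builds wait() and signal() on `threading.Condition`. The class name and the `blocking` parameter are assumptions added for illustration; Python's standard library already offers `threading.Semaphore` for real use.

```python
import threading


class CountingSemaphore:
    """Counting semaphore: value is the number of free resources."""

    def __init__(self, value):
        self._value = value
        self._cond = threading.Condition()

    def wait(self, blocking=True):
        """Take one resource. Blocks while the value is zero, or returns
        False immediately if blocking is False."""
        with self._cond:
            while self._value == 0:
                if not blocking:
                    return False
                self._cond.wait()
            self._value -= 1
            return True

    def signal(self):
        """Give one resource back and wake one waiting caller."""
        with self._cond:
            self._value += 1
            self._cond.notify()

    @property
    def value(self):
        with self._cond:
            return self._value


tables = CountingSemaphore(2)        # a restaurant with two tables
tables.wait()                        # first customer is seated
tables.wait()                        # second customer is seated
print(tables.wait(blocking=False))   # no table left for a third customer
tables.signal()                      # one patron finishes the meal
print(tables.wait(blocking=False))   # the third customer is now seated
```

The check and the decrement happen under the same lock, which is what makes the integer "protected" in the sense used above.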
In this implementation, a process wanting to enter its critical section has to acquire the binary semaphore, which then gives it mutual exclusion until it signals that it is done. For example, we have semaphore s, and two processes, P1 and P2, that want to enter their critical sections at the same time. P1 first calls wait(s). The value of s is decremented to 0 and P1 enters its critical section. While P1 is in its critical section, P2 calls wait(s), but because the value of s is zero, it must wait until P1 finishes its critical section and executes signal(s). When P1 calls signal, the value of s is incremented to 1, and P2 can then proceed to execute in its critical section (after decrementing the semaphore again). Mutual exclusion is achieved because only one process can be in its critical section at any time.
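The P1/P2 scenario can be tried out with two Python threads standing in for the two processes. This is a sketch, not the implementation the text refers to: it uses the standard `threading.Semaphore(1)` as the binary semaphore (acquire is wait, release is signal), and the function name `run_two_processes` is hypothetical.

```python
import threading


def run_two_processes(iterations=1000):
    """Run P1 and P2, each entering the critical section `iterations` times.
    Returns (final counter, largest number of processes seen inside)."""
    s = threading.Semaphore(1)   # binary semaphore, initially 1 (unlocked)
    state = {"inside": 0, "max_inside": 0, "counter": 0}

    def process():
        for _ in range(iterations):
            s.acquire()                          # wait(s)
            state["inside"] += 1
            state["max_inside"] = max(state["max_inside"], state["inside"])
            state["counter"] += 1                # the critical section
            state["inside"] -= 1
            s.release()                          # signal(s)

    p1 = threading.Thread(target=process)
    p2 = threading.Thread(target=process)
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    return state["counter"], state["max_inside"]


print(run_two_processes())
```

Because every update happens between wait and signal, no increment is lost and the recorded maximum occupancy of the critical section is 1.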
Example 22 This example shows a simple data guard that can be used to protect data from being written by another task. The SWP instruction holds the bus until the transaction is complete.
loop
        LDR     r1, =semaphore   ; load the address of the semaphore
        MOV     r2, #1
        SWP     r3, r2, [r1]     ; hold the bus until complete
        CMP     r3, #1
        BEQ     loop
The address pointed to by the semaphore contains either the value 0 or the value 1. When the semaphore equals 1, the service in question is being used by another process. The routine will continue to loop until the service is released by the other process, in other words, until the semaphore address location contains the value 0.
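The logic of the loop can be modelled in Python. This is a sketch under assumptions: a `threading.Lock` stands in for the bus being held during SWP, the address 0x9000 is arbitrary, and `release` is an added helper showing how the owning process would write 0 back; none of these appear in the original example.

```python
import threading

_bus = threading.Lock()   # stands in for the bus being held during SWP


def swp(memory, address, new_value):
    """Atomically store new_value at address and return the old word."""
    with _bus:
        old_value = memory[address]
        memory[address] = new_value
        return old_value


def try_acquire(memory, address):
    """One pass of the loop: swap in 1 and report whether the old value
    showed that the service was free."""
    return swp(memory, address, 1) != 1


def acquire(memory, address):
    """Spin until the semaphore is obtained (the BEQ loop)."""
    while not try_acquire(memory, address):
        pass


def release(memory, address):
    """Mark the service as free again."""
    memory[address] = 0


memory = {0x9000: 0}                  # semaphore word, initially free
print(try_acquire(memory, 0x9000))    # the service was free and is now taken
print(try_acquire(memory, 0x9000))    # busy: the old value was already 1
release(memory, 0x9000)
print(try_acquire(memory, 0x9000))    # free again after the release
```

Writing 1 while the semaphore is already 1 changes nothing, which is why a failed attempt needs no cleanup before looping.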
3. Load-Store Instructions
4. Software Interrupt Instruction
5. Program Status Register Instructions (MSR, MRS)
Binary encoding
  Bits 31-28: COND
  Bits 27-24: OPCODE
  Bits 23-0: 24-bit (interpreted) immediate
Description
The 24-bit immediate field does not influence the operation of the instruction but may be interpreted by the system code. If the condition is passed, the instruction enters supervisor mode using the standard ARM exception entry sequence. In detail, the processor actions are:
1. Save the address of the instruction after the SWI in r14_svc.
2. Save the CPSR in SPSR_svc.
3. Enter supervisor mode and disable IRQs (but not FIQs) by setting CPSR[4:0] to binary 10011 and CPSR[7] to 1.
4. Set the PC to 0x08, the SWI exception vector, and begin executing the instructions there.
To return to the instruction after the SWI, the system routine must not only copy r14_svc back into the PC, but it must also restore the CPSR from SPSR_svc.
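The entry and return steps can be traced with a small Python model. It is a sketch under stated assumptions: the processor state is a plain dictionary, `pc` holds the address of the SWI instruction itself (pipeline effects are ignored), and the vector address 0x08 is the standard ARM SWI vector.

```python
MODE_MASK = 0x1F
MODE_USER = 0b10000
MODE_SVC = 0b10011
I_BIT = 1 << 7          # CPSR[7], set to disable IRQs
SWI_VECTOR = 0x08       # standard ARM SWI exception vector


def swi_entry(state):
    """Apply the four exception entry steps to the state dictionary."""
    state["r14_svc"] = state["pc"] + 4                  # step 1
    state["spsr_svc"] = state["cpsr"]                   # step 2
    state["cpsr"] = (state["cpsr"] & ~MODE_MASK) | MODE_SVC | I_BIT  # step 3
    state["pc"] = SWI_VECTOR                            # step 4


def swi_return(state):
    """Return to the caller: restore both the PC and the CPSR."""
    state["pc"] = state["r14_svc"]
    state["cpsr"] = state["spsr_svc"]


# User mode with only the V flag (bit 28) set, SWI located at 0x8000.
state = {"pc": 0x8000, "cpsr": (1 << 28) | MODE_USER}
swi_entry(state)
print({name: hex(value) for name, value in state.items()})
swi_return(state)
print({name: hex(value) for name, value in state.items()})
```

The FIQ disable bit (CPSR[6]) is left untouched by `swi_entry`, matching the statement that only IRQs are disabled.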
Example 23 Here we have a simple example of an SWI call with SWI number 0x123456, used by ARM toolkits as a debugging SWI. Typically the SWI instruction is executed in user mode.
PRE
  cpsr = nzcVqift_USER
  pc = 0x00008000
  lr = 0x003fffff ; lr = r14
  r0 = 0x12

  0x00008000  SWI 0x123456

POST
  cpsr = nzcVqIft_SVC
  spsr = nzcVqift_USER
  pc = 0x00000008
  lr = 0x00008004
  r0 = 0x12
Since SWI instructions are used to call operating system routines, you need some form of parameter passing. This is achieved using registers. In this example, register r0 is used to pass the parameter 0x12. The return values are also passed back via registers. Code called the SWI handler is required to process the SWI call. The handler obtains the SWI number using the address of the executed instruction, which is calculated from the link register lr.
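How a handler might recover the SWI number can be sketched in Python. The model assumes ARM state (4-byte instructions), so the SWI sits at lr - 4, and it assumes the usual encoding with the condition in bits 31-28 and 1111 in bits 27-24. The function names are illustrative.

```python
COND_AL = 0b1110   # "always" condition code


def encode_swi(number, cond=COND_AL):
    """Build the 32-bit SWI instruction word for a 24-bit SWI number."""
    if not 0 <= number < (1 << 24):
        raise ValueError("SWI number must fit in 24 bits")
    return (cond << 28) | (0xF << 24) | number


def swi_number_from_lr(mem32, lr):
    """Fetch the SWI instruction at lr - 4 and keep its low 24 bits."""
    instruction = mem32[lr - 4]
    return instruction & 0x00FFFFFF


mem32 = {0x00008000: encode_swi(0x123456)}
print(hex(mem32[0x00008000]))                      # the instruction word
print(hex(swi_number_from_lr(mem32, 0x00008004)))  # number seen by the handler
```

In assembly the same two steps are typically a load from [lr, #-4] followed by a BIC that clears the top eight bits.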
Byte organizations
Little-endian mode:
- with the lowest-order byte residing in the low-order bits of the word
Big-endian mode:
- the lowest-order byte stored in the highest bits of the word
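The two byte orders can be seen by packing the same word both ways. The sketch uses Python's standard `struct` module; the word 0x12345678 is just an example value, and the byte strings are listed from the lowest memory address upwards.

```python
import struct

word = 0x12345678

little = struct.pack("<I", word)   # little-endian: lowest-order byte first
big = struct.pack(">I", word)      # big-endian: highest-order byte first

print([hex(b) for b in little])
print([hex(b) for b in big])

# Reading the bytes back with the matching order recovers the word.
print(hex(struct.unpack("<I", little)[0]), hex(struct.unpack(">I", big)[0]))
```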
Thumb Mode
Thumb is a 16-bit instruction set
- Optimized for code density from C code
- Improved performance from narrow memory
- Subset of the functionality of the ARM instruction set
Although a Thumb implementation uses more instructions, the overall memory footprint is reduced. Code density was the main driving force for the Thumb instruction set. Because it was also designed as a compiler target, rather than for hand-written assembly code, we recommend that you write Thumb-targeted code in a high-level language like C or C++.
The higher registers r8 to r12 are only accessible with MOV, ADD, or CMP instructions.
CMP and all the data processing instructions that operate on low registers update the condition flags in the cpsr.
Thumb Entry
ARM cores start up, after reset, executing ARM instructions
Thumb state is entered by executing a Branch and Exchange instruction (BX)
- Sets the T bit if the bottom bit of the specified register was set
- Switches the PC to the address given in the remainder of the register
Thumb Exit
Executing a Thumb BX instruction
ARM-Thumb Interworking
ARM-Thumb interworking is the name given to the method of linking ARM and Thumb code together for both assembly and C/C++.
To call a Thumb routine from an ARM routine, the core has to change state. This state change is shown in the T bit of the cpsr. The BX and BLX branch instructions cause a switch between ARM and Thumb state while branching to a routine. The BX lr instruction returns from a routine, also with a state switch if necessary.
There are two versions of the BX or BLX instructions: an ARM instruction and a Thumb equivalent. The ARM BX instruction enters Thumb state only if bit 0 of the address in Rn is set to binary 1; otherwise it enters ARM state. The Thumb BX instruction does the same.
Syntax: BX  Rn
        BLX Rn | label
Interworking Instructions
Interworking is achieved using the Branch Exchange instructions
- In Thumb state: BX Rn
- In ARM state (on Thumb-aware cores only): BX<condition> Rn
- Where Rn can be any register (r0 to r15)
This performs a branch to an absolute address in the 4GB address space by copying Rn to the program counter
Bit 0 of Rn specifies the state to change to
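The effect of BX on the PC and the T bit can be written as a short Python model. This is a sketch: the function name `bx` and the example addresses are illustrative, and the alignment requirements of the destination state are not checked.

```python
def bx(rn_value):
    """Model BX Rn: return (new_pc, thumb_state).

    Bit 0 of Rn selects the state (1 for Thumb, 0 for ARM); the
    remaining bits give the branch target."""
    thumb_state = bool(rn_value & 1)
    new_pc = rn_value & 0xFFFFFFFE
    return new_pc, thumb_state


print(bx(0x00008001))   # target 0x8000, entered in Thumb state
print(bx(0x00009000))   # target 0x9000, entered in ARM state
```

This is why a routine label with 1 added is used to reach Thumb code: the extra bit never becomes part of the address.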
Example 24
        ; Start off in ARM state
        CODE32
        ADR     r0, Into_Thumb+1    ; generate branch target address & set bit 0,
                                    ; hence arrive in Thumb state
        BX      r0                  ; branch exchange to Thumb
        CODE16                      ; assemble subsequent as Thumb
Into_Thumb
        ADR     r5, Back_to_ARM     ; generate branch target to word-aligned address,
                                    ; hence bit 0 is cleared
        BX      r5                  ; branch exchange to ARM
        CODE32                      ; assemble subsequent as ARM
Back_to_ARM
Summary
BIC r0,r1,r2 sets r0 to r1 AND NOT r2. It uses the second source operand as a mask: where a bit in the mask is 1, the corresponding bit in the first source operand is cleared.
- CMP: compare
- CMN: negated compare, uses an addition to set the status bits
- TST: bit-wise test, a bit-wise AND
- TEQ: bit-wise negated test, an exclusive-or
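These operations can be modelled on 32-bit values in Python. It is a simplified sketch: the function names are illustrative, the comparison instructions return a dictionary of flags instead of writing the cpsr, and TST and TEQ model only N and Z (the shifter carry is ignored).

```python
MASK = 0xFFFFFFFF


def bic(a, b):
    """BIC: clear every bit of a whose mask bit in b is 1."""
    return a & ~b & MASK


def _nz(result):
    return {"N": result >> 31, "Z": int(result == 0)}


def cmp_flags(a, b):
    """CMP: flags of a - b; the result itself is discarded."""
    result = (a - b) & MASK
    flags = _nz(result)
    flags["C"] = int(a >= b)                          # set when no borrow
    flags["V"] = ((a ^ b) & (a ^ result)) >> 31       # signed overflow
    return flags


def cmn_flags(a, b):
    """CMN: flags of a + b; the result itself is discarded."""
    total = a + b
    result = total & MASK
    flags = _nz(result)
    flags["C"] = total >> 32                          # carry out of bit 31
    flags["V"] = (~(a ^ b) & (a ^ result) & MASK) >> 31
    return flags


def tst_flags(a, b):
    """TST: N and Z of a AND b."""
    return _nz(a & b)


def teq_flags(a, b):
    """TEQ: N and Z of a EOR b; Z is set when a equals b."""
    return _nz(a ^ b)


print(bin(bic(0b1111, 0b0101)))
print(cmp_flags(5, 5), cmn_flags(1, 0xFFFFFFFF))
print(tst_flags(0b1000, 0b0100), teq_flags(7, 7))
```

CMN with 1 and 0xFFFFFFFF sets Z because 0xFFFFFFFF is -1 in two's complement, so the addition compares 1 against its own negation.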