
The Acorn RISC Machine

The first ARM processor was developed at Acorn Computers Limited, of Cambridge, England, between October 1983 and April 1985. At that time, and until the formation of Advanced RISC Machines Limited (which was later renamed simply ARM Limited) in 1990, ARM stood for Acorn RISC Machine.

Architectural inheritance
At the time the first ARM chip was designed, the only examples of RISC architectures were the Berkeley RISC I and II and the Stanford MIPS (which stands for Microprocessor without Interlocking Pipeline Stages).

Features used
The ARM architecture incorporated a number of features from the Berkeley RISC design, but a number of other features were rejected. Those that were used were:
• a load-store architecture;
• fixed-length 32-bit instructions;
• 3-address instruction formats.

Features rejected
Register windows
The register banks on the Berkeley RISC processors incorporated a large number of registers, 32 of which were visible at any time. Procedure entry and exit moved the visible window to give each procedure access to new registers, which reduces the data traffic between the processor and memory resulting from register saving and restoring. The principal problem with register windows is the large chip area occupied by the large number of registers.

This feature was therefore rejected on cost grounds, although the shadow registers used to handle exceptions on the ARM are not too different in concept.

Delayed branches
Branches cause pipeline problems since they interrupt the smooth flow of instructions. Most RISC processors avoid this problem by using delayed branches, where the branch takes effect after the following instruction has executed. They work well on single-issue pipelined processors, but they do not scale well to superscalar implementations and can interact badly with branch prediction mechanisms.

On the original ARM, delayed branches were not used because they made exception handling more complex.

Single-cycle execution of all instructions
Although the ARM executes most data processing instructions in a single clock cycle, many other instructions take multiple clock cycles. A simple load or store instruction requires at least two memory accesses (one for the instruction and one for the data).

Single-cycle operation of all instructions is only possible with separate data and instruction memories, which were considered too expensive for the intended ARM application areas. Instead of single-cycle execution of all instructions, the ARM was designed to use the minimum number of cycles required for memory accesses.

The ARM programmer's model


A processor's instruction set defines the operations that the programmer can use to change the state of the system incorporating the processor. This state usually comprises the values of the data items in the processor's visible registers and the system's memory. A processor will typically have many invisible registers involved in executing an instruction; the values of these registers before and after the instruction is executed are not significant. Only the values in the visible registers have any significance.

Current Program Status Register (CPSR)


The CPSR is used in user-level programs to store the condition code bits. These bits are used, for example, to record the result of a comparison operation and to control whether or not a conditional branch is taken.

The bits at the bottom of the register control the processor mode. The 'T' field is used to switch between the ARM and Thumb instruction sets. The 'I' and 'F' flags control normal and fast interrupts respectively; setting a flag disables (masks) the corresponding interrupt. Finally, the 'mode' field selects one of seven execution modes. User mode is the main execution mode. By running application software in user mode, the operating system can achieve protection and isolation.

The other modes are entered as follows:
• Fast interrupt processing mode is entered whenever the processor receives an interrupt signal from the designated fast interrupt source.
• Normal interrupt processing mode is entered whenever the processor receives an interrupt signal from any other interrupt source.
• Software interrupt mode is entered when the processor encounters a software interrupt instruction.
• Undefined instruction mode is entered when the processor attempts to execute an instruction that is supported neither by the main integer core nor by one of the coprocessors.
• System mode is used for running privileged operating system tasks.
• Abort mode is entered in response to memory faults.

The condition code flags have the following meanings:
• N: Negative; the last ALU operation which changed the flags produced a negative result (the top bit of the 32-bit result was a one).
• Z: Zero; the last ALU operation which changed the flags produced a zero result (every bit of the 32-bit result was zero).
• C: Carry; the last ALU operation which changed the flags generated a carry-out, either as a result of an arithmetic operation in the ALU or from the shifter.
• V: oVerflow; the last arithmetic ALU operation which changed the flags generated an overflow into the sign bit.
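
The flag definitions above can be made concrete with a small model. The following is a minimal Python sketch, not ARM code, of how N, Z, C and V could be derived for a 32-bit addition; the function name add_with_flags and the dictionary layout are illustrative assumptions, and the carry produced by the shifter is not modelled.

```python
MASK32 = 0xFFFFFFFF  # keep values within 32 bits


def add_with_flags(a, b):
    """Add two 32-bit values; return (result, flags) for an ADD that sets flags.

    Illustrative model only: the name and return shape are assumptions.
    """
    a &= MASK32
    b &= MASK32
    full = a + b                   # unbounded Python integer sum
    result = full & MASK32         # what a 32-bit register would hold

    n = (result >> 31) & 1         # N: top bit of the result is one
    z = int(result == 0)           # Z: every bit of the result is zero
    c = int(full > MASK32)         # C: carry out of bit 31
    # V: both operands have the same sign and the result's sign differs
    v = (((a ^ result) & (b ^ result)) >> 31) & 1
    return result, {"N": n, "Z": z, "C": c, "V": v}


if __name__ == "__main__":
    print(add_with_flags(0x7FFFFFFF, 1))   # signed overflow case
    print(add_with_flags(0xFFFFFFFF, 1))   # unsigned carry-out case
```

The two sample calls separate the cases that are easy to confuse: the first overflows into the sign bit without a carry-out, and the second produces a carry-out and a zero result without signed overflow.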

The memory system


In addition to the processor register state, an ARM system has memory state. Memory may be viewed as a linear array of bytes numbered from zero up to 2^32 - 1. Data items may be 8-bit bytes, 16-bit half-words or 32-bit words. Words are always aligned on 4-byte boundaries (that is, the two least significant address bits are zero) and half-words are aligned on even byte boundaries.
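
As a minimal sketch of the alignment rule just described, the Python check below tests whether an address is suitably aligned for a byte, half-word or word access. The helper name is_aligned and the choice to raise ValueError on bad input are assumptions made for illustration.

```python
ADDRESS_SPACE = 1 << 32   # byte addresses run from 0 to 2^32 - 1


def is_aligned(address, size):
    """Return True if `address` is aligned for an access of `size` bytes.

    size is 1 (byte), 2 (half-word) or 4 (word). Hypothetical helper.
    """
    if not 0 <= address < ADDRESS_SPACE:
        raise ValueError("address outside the 32-bit address space")
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    # Word: two least significant bits zero; half-word: lowest bit zero.
    return address & (size - 1) == 0


if __name__ == "__main__":
    for addr, size in [(0x1000, 4), (0x1002, 4), (0x1002, 2), (0x1001, 2)]:
        print(hex(addr), size, is_aligned(addr, size))
```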

Load-store architecture
In common with most RISC processors, ARM employs a load-store architecture. The instruction set will only process (add, subtract, and so on) values which are in registers (or specified directly within the instruction itself), and will always place the results of such processing into a register. The only operations which apply to memory state are ones which copy memory values into registers (load instructions) or copy register values into memory (store instructions).

ARM does not support 'memory-to-memory' operations, in which an instruction operates directly on values held in memory. Therefore all ARM instructions fall into one of the following three categories:

1. Data processing instructions
These use and change only register values. For example, an instruction can add two registers and place the result in a register.

2. Data transfer instructions
These copy memory values into registers (load instructions) or copy register values into memory (store instructions).

3. Control flow instructions
Normal instruction execution uses instructions stored at consecutive memory addresses. Control flow instructions cause execution to switch to a different address, either permanently (branch instructions) or saving a return address to resume the original sequence (branch and link instructions).
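
The three categories can be illustrated with a toy interpreter. This is a minimal Python sketch of a load-store machine, not a model of the real ARM instruction encoding: the tuple format, the run function and the use of list indices as branch targets are all assumptions made for illustration.

```python
def run(program, memory, max_steps=100):
    """Execute a list of toy instructions against a dict-based memory.

    Hypothetical instruction forms:
      ("LDR", rd, addr)    data transfer: memory -> register
      ("STR", rs, addr)    data transfer: register -> memory
      ("ADD", rd, rn, rm)  data processing: registers only
      ("B", target)        control flow: jump to program index `target`
    """
    regs = [0] * 16
    pc = 0          # index into `program`, not a byte address
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, *args = program[pc]
        pc += 1     # default: continue with the next consecutive instruction
        if op == "LDR":
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STR":
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":
            rd, rn, rm = args
            regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF
        elif op == "B":
            (target,) = args
            pc = target
        else:
            raise ValueError("unknown instruction: " + op)
        steps += 1
    return regs


if __name__ == "__main__":
    memory = {0x100: 5, 0x104: 7, 0x108: 0}
    program = [
        ("LDR", 0, 0x100),   # r0 := mem[0x100]
        ("LDR", 1, 0x104),   # r1 := mem[0x104]
        ("ADD", 2, 0, 1),    # r2 := r0 + r1 (no memory operand allowed)
        ("STR", 2, 0x108),   # mem[0x108] := r2
    ]
    run(program, memory)
    print(memory[0x108])
```

Adding two values held in memory takes four instructions here, because the ADD can only see registers; that is the practical consequence of the load-store rule.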

Supervisor mode
The ARM processor supports a protected supervisor mode. The protection mechanism ensures that user code cannot gain supervisor privileges without appropriate checks being carried out to ensure that the code is not attempting illegal operations. System-level functions can only be accessed through specified supervisor calls. These functions generally include any accesses to hardware peripheral registers, and widely used operations such as character input and output.

The ARM instruction set


All ARM instructions are 32 bits wide (except the compressed 16-bit Thumb instructions) and are aligned on 4-byte boundaries in memory. The most notable features of the ARM instruction set are:
• The load-store architecture.
• 3-address data processing instructions (that is, the two source operand registers and the result register are all independently specified).
• Conditional execution of every instruction.
• The inclusion of very powerful load and store multiple register instructions.
• The ability to perform a general shift operation and a general ALU operation in a single instruction that executes in a single clock cycle.
• Open instruction set extension through the coprocessor instruction set, including adding new registers and data types to the programmer's model.
• A very dense 16-bit compressed representation of the instruction set in the Thumb architecture.

The I/O system
The ARM handles I/O (input/output) peripherals (such as disk controllers, network interfaces, and so on) as memory-mapped devices with interrupt support. The internal registers in these devices appear as addressable locations within the ARM's memory map and may be read and written using the same (load-store) instructions as any other memory locations.

Peripherals may attract the processor's attention by making an interrupt request using either the normal interrupt (IRQ) or the fast interrupt (FIQ) input. Both interrupt inputs are level-sensitive and maskable. Some systems may include direct memory access (DMA) hardware external to the processor to handle high-bandwidth I/O traffic.

ARM exceptions
The ARM architecture supports a range of interrupts, traps and supervisor calls, all grouped under the general heading of exceptions. The general way these are handled is the same in all cases:

1. The current state is saved by copying the PC into r14_exc and the CPSR into SPSR_exc (where exc stands for the exception type).
2. The processor operating mode is changed to the appropriate exception mode.
3. The PC is forced to a value between 0x00 and 0x1C, the particular value depending on the type of exception.

The instruction at the location the PC is forced to (the vector address) will usually contain a branch to the exception handler. The exception handler will use r13_exc, which will normally have been initialized to point to a dedicated stack in memory, to save some user registers for use as work registers.
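
The three steps of exception entry can be sketched as follows. This is a simplified Python model under stated assumptions: the state dictionary, the enter_exception name and the mode abbreviations are illustrative, and the sketch ignores details such as the exact return-address offset saved for each exception type and the masking of interrupts on entry. The vector addresses are the standard ARM ones in the range 0x00 to 0x1C.

```python
# Exception type -> (vector address, exception mode entered)
VECTORS = {
    "reset":          (0x00, "svc"),
    "undefined":      (0x04, "und"),
    "swi":            (0x08, "svc"),
    "prefetch_abort": (0x0C, "abt"),
    "data_abort":     (0x10, "abt"),
    "irq":            (0x18, "irq"),
    "fiq":            (0x1C, "fiq"),
}


def enter_exception(state, kind):
    """Apply the three exception-entry steps to a toy processor state."""
    vector, mode = VECTORS[kind]
    # 1. Save the current state in the banked registers of the new mode.
    state["r14_" + mode] = state["pc"]
    state["spsr_" + mode] = state["cpsr"]
    # 2. Change the operating mode (a fresh dict keeps the saved SPSR intact).
    state["cpsr"] = dict(state["cpsr"], mode=mode)
    # 3. Force the PC to the vector address for this exception type.
    state["pc"] = vector
    return state


if __name__ == "__main__":
    cpu = {"pc": 0x8000, "cpsr": {"mode": "usr", "flags": "nzcv"}}
    enter_exception(cpu, "irq")
    print(hex(cpu["pc"]), cpu["cpsr"]["mode"], hex(cpu["r14_irq"]))
```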

ARM development tools


ARM Limited has developed a coherent range of tools, and many third-party and public domain tools are also available, such as an ARM back-end for the gcc C compiler. C or assembler source files are compiled or assembled into ARM object format (.aof) files, which are then linked into ARM image format (.aif) files.

The image format files can be built to include the debug tables required by the ARM symbolic debugger (ARMsd), which can load, run and debug programs either on hardware such as the ARM Development Board or using a software emulation of the ARM (the ARMulator). The ARM C compiler is compliant with the ANSI (American National Standards Institute) standard for C and is supported by the appropriate library of standard functions.

The compiler uses the ARM Procedure Call Standard for all externally available functions. It can produce assembly source output instead of ARM object format, and it can also produce Thumb code. The ARM assembler is a full macro assembler which produces ARM object format output that can be linked with output from the C compiler. The linker takes one or more object files and combines them into an executable program.

The linker resolves symbolic references between the object files and extracts object modules from libraries as needed by the program. It can assemble the various components of the program in a number of different ways, depending on whether the code is to run in RAM (Random Access Memory, which can be read and written) or ROM (Read Only Memory), whether overlays are required, and so on.

The ARM symbolic debugger is a front-end interface to assist in debugging programs running either under emulation (on the ARMulator) or remotely on a target system such as the ARM Development Board. The remote system must support the appropriate remote debug protocols, either via a serial line or through a JTAG test interface. The debugger allows the setting of breakpoints, which are addresses in the code that, if executed, cause execution to halt so that the processor state can be examined.

The ARMulator (ARM emulator) is a suite of programs that models the behaviour of various ARM processor cores in software on a host system. It can operate at various levels of accuracy:
• Instruction-accurate modelling gives the exact behaviour of the system state without regard to the precise timing characteristics of the processor.
• Cycle-accurate modelling gives the exact behaviour of the processor on a cycle-by-cycle basis, allowing the exact number of clock cycles that a program requires to be established.
• Timing-accurate modelling presents signals at the correct time within a cycle, allowing logic delays to be accounted for.

All these approaches run considerably slower than the real hardware.

The ARM Development Board is a circuit board incorporating a range of components and interfaces to support the development of ARM-based systems.

3-stage pipeline ARM organization


The principal components are:
• The register bank, which stores the processor state. It has two read ports and one write port which can each be used to access any register, plus an additional read port and an additional write port that give special access to r15, the program counter.
• The barrel shifter, which can shift or rotate one operand by any number of bits.
• The ALU, which performs the arithmetic and logic functions required by the instruction set.
• The address register and incrementer, which select and hold all memory addresses and generate sequential addresses when required.
• The data registers, which hold data passing to and from memory.
• The instruction decoder and associated control logic.

In a single-cycle data processing instruction, two register operands are accessed, the value on the B bus is shifted and combined with the value on the A bus in the ALU, then the result is written back into the register bank.


The 3-stage pipeline


The pipeline has the following stages:
• Fetch; the instruction is fetched from memory and placed in the instruction pipeline.
• Decode; the instruction is decoded and the datapath control signals prepared for the next cycle. In this stage the instruction 'owns' the decode logic but not the datapath.
• Execute; the instruction 'owns' the datapath; the register bank is read, an operand shifted, the ALU result generated and written back into a destination register.



When the processor is executing simple data processing instructions, the pipeline enables one instruction to be completed every clock cycle. An individual instruction takes three clock cycles to complete, so it has a three-cycle latency, but the throughput is one instruction per cycle.
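
The difference between latency and throughput can be seen by laying out which instruction occupies which stage in each cycle. The sketch below is an idealized Python model that assumes every instruction is a single-cycle data processing instruction with no stalls; the function names pipeline_schedule and total_cycles are illustrative.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_schedule(n_instructions):
    """Return {cycle: {stage: instruction index}} for an ideal 3-stage pipeline.

    Cycles are numbered from 1. Assumes no multi-cycle instructions or stalls.
    """
    schedule = {}
    for i in range(n_instructions):
        for offset, stage in enumerate(STAGES):
            # Instruction i enters fetch in cycle i + 1 and advances one
            # stage per cycle.
            schedule.setdefault(i + offset + 1, {})[stage] = i
    return schedule


def total_cycles(n_instructions):
    """Cycles needed to complete n instructions: fill latency plus one each."""
    if n_instructions <= 0:
        return 0
    return n_instructions + len(STAGES) - 1


if __name__ == "__main__":
    for cycle, stages in sorted(pipeline_schedule(4).items()):
        print(cycle, stages)
    print("cycles for 1000 instructions:", total_cycles(1000))
```

For a long run of instructions the two extra cycles needed to fill the pipeline become negligible, which is why the throughput approaches one instruction per cycle even though each instruction has a three-cycle latency.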

Figure: ARM multi-cycle instruction 3-stage pipeline operation

5-stage pipeline ARM organization


The time, T, required to execute a given program is given by:

\[ T = \frac{N_{inst} \times CPI}{f_{clk}} \]

where N_inst is the number of ARM instructions executed in the course of the program, CPI is the average number of clock cycles per instruction, and f_clk is the processor's clock frequency.

Since the number of instructions executed is constant for a given program, there are only two ways to increase performance:
• Increase the clock rate, f_clk. This requires the logic in each pipeline stage to be simplified and, therefore, the number of pipeline stages to be increased.
• Reduce the average number of clock cycles per instruction, CPI. This requires either that instructions which occupy more than one pipeline slot in a 3-stage pipeline ARM are re-implemented to occupy fewer slots, or that pipeline stalls caused by dependencies between instructions are reduced, or a combination of both.
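
A short worked example shows how the two levers combine. The Python sketch below assumes the relation T = N_inst × CPI / f_clk; the instruction count, CPI values and clock frequencies are made-up illustrative numbers, not measurements of any ARM core.

```python
def execution_time(n_inst, cpi, f_clk_hz):
    """Return program execution time in seconds: T = N_inst * CPI / f_clk."""
    return n_inst * cpi / f_clk_hz


if __name__ == "__main__":
    n_inst = 1_000_000   # hypothetical instruction count, fixed for the program

    # Hypothetical baseline: higher CPI at a lower clock rate.
    t_base = execution_time(n_inst, cpi=1.9, f_clk_hz=25e6)
    # Hypothetical improved design: lower CPI and a doubled clock rate.
    t_fast = execution_time(n_inst, cpi=1.5, f_clk_hz=50e6)

    print("baseline: %.4f s" % t_base)
    print("improved: %.4f s" % t_fast)
    print("speed-up: %.2fx" % (t_base / t_fast))
```

Because the two factors multiply, doubling the clock rate while also cutting CPI gives more than a twofold speed-up in this example.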

The 5-stage pipeline


The pipeline has the following stages:
• Fetch; the instruction is fetched from memory and placed in the instruction pipeline.
• Decode; the instruction is decoded and register operands read from the register file. There are three operand read ports in the register file, so most ARM instructions can source all their operands in one cycle.
• Execute; an operand is shifted and the ALU result generated. If the instruction is a load or store, the memory address is computed in the ALU.
• Buffer/data; data memory is accessed if required. Otherwise the ALU result is simply buffered for one clock cycle to give the same pipeline flow for all instructions.
• Write-back; the results generated by the instruction are written back to the register file, including any data loaded from memory.

Data forwarding
The only way to resolve data dependencies without stalling the pipeline is to introduce forwarding paths. Data dependencies arise when an instruction needs to use the result of one of its predecessors before that result has returned to the register file. Forwarding paths allow results to be passed between stages as soon as they are available, and the 5-stage ARM pipeline requires each of the three source operands to be forwarded from any of three intermediate result registers.

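Even with forwarding, it is not possible to avoid a pipeline stall in all cases. Consider the following code sequence:

    LDR rN, [..]      ; load rN from somewhere
    ADD r2, r1, rN    ; and use it immediately

The processor cannot avoid a one-cycle stall here, because the value loaded into rN only enters the processor at the end of the buffer/data stage and it is needed by the following instruction at the start of the execute stage. The only way to avoid this stall is to encourage the compiler (or assembly language programmer) not to put a dependent instruction immediately after a load instruction.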
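
The load-use rule can be checked mechanically, which is roughly what a scheduling compiler has to do. The Python sketch below counts one stall for every instruction that reads a register loaded by the instruction immediately before it. The tuple format (op, destination, sources) and the function name are illustrative assumptions, and the one-cycle penalty follows the simplified description above rather than the timing of any specific core.

```python
def count_load_use_stalls(program):
    """Count load-use stalls in a list of (op, dest, sources) tuples.

    A stall is counted when an instruction uses, as a source, the register
    loaded by the immediately preceding LDR. Hypothetical model only.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "LDR" and prev_dest in cur_sources:
            stalls += 1
    return stalls


if __name__ == "__main__":
    dependent = [
        ("LDR", "r3", ("r0",)),        # load r3 from somewhere
        ("ADD", "r2", ("r1", "r3")),   # and use it immediately: one stall
    ]
    rescheduled = [
        ("LDR", "r3", ("r0",)),
        ("SUB", "r5", ("r6", "r7")),   # independent instruction fills the slot
        ("ADD", "r2", ("r1", "r3")),
    ]
    print(count_load_use_stalls(dependent))
    print(count_load_use_stalls(rescheduled))
```

Moving an independent instruction between the load and its first use removes the stall without changing the result, which is the reordering the text asks of the compiler or programmer.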

ARM instruction execution


1. Data processing instructions:

2. Data transfer instructions:

3. Branch instructions:
