
ES ZG512

Embedded System Design


Introduction to ARM
Surabhi Narayanan
BITS Pilani Bangalore Professional Development Center
Pilani Campus
Reference Materials
• Steve Furber, ARM System-on-chip Architecture, Second
Edition, Pearson, 2007
• ARM Architecture Reference Manual

BITS Pilani, Pilani Campus


Where do Processors Spend Time?

Instruction Type         Dynamic Usage
Data movement            43%
Control flow             23%
Arithmetic operations    15%
Comparisons              13%
Logical operations        5%
Others                    1%

Source: Steve Furber, ARM System-on-chip Architecture, Second Edition, Pearson, 2007


Without Pipelining
[Figure: laundry timeline from 6 PM to midnight. Loads A, B, C and D run one after another; each load takes 30 + 40 + 20 minutes.]

Sequential laundry takes 6 hours for 4 loads.

If they learned pipelining, how long would laundry take?

With Pipelining
[Figure: laundry timeline from 6 PM to midnight. Loads A, B, C and D overlap; the timeline is divided into slots of 30, 40, 40, 40, 40 and 20 minutes.]

Pipelined laundry takes 3.5 hours for 4 loads.
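The two laundry totals can be checked with a short calculation. This is a minimal sketch; the function names are illustrative and not part of the slides. It assumes every load passes through the same three stages and that, once the pipeline is full, a new load finishes every time the slowest stage completes.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load finishes all of its stages before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """The first load passes through every stage; each further load
    adds one slot of the slowest stage."""
    return sum(stage_minutes) + (loads - 1) * max(stage_minutes)


stages = [30, 40, 20]  # minutes per stage, as on the slide
print(sequential_minutes(stages, 4) / 60)  # 6.0 hours
print(pipelined_minutes(stages, 4) / 60)   # 3.5 hours
```

The pipelined total, 30 + 4 x 40 + 20 = 210 minutes, matches the slot widths shown in the figure.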


Pipeline
Example of a 3-stage pipeline:
• Fetch, Decode, Execute
• 3 cycle latency

              Cycle
Instruction   t        t+1      t+2      t+3      t+4
i             Fetch    Decode   Execute
i+1                    Fetch    Decode   Execute
i+2                             Fetch    Decode   Execute
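The timing diagram for the 3-stage pipeline can be reproduced with a small simulation. This is a sketch of an ideal pipeline with no stalls; the names `pipeline_schedule` and `total_cycles` are illustrative.

```python
STAGES = ["Fetch", "Decode", "Execute"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Map each cycle offset to the (instruction, stage) pairs active in it,
    assuming one instruction enters the pipeline per cycle and nothing stalls."""
    schedule = {}
    for i in range(num_instructions):
        for s, stage in enumerate(stages):
            schedule.setdefault(i + s, []).append((i, stage))
    return schedule


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed to complete all instructions in an ideal pipeline."""
    return num_stages + num_instructions - 1


for cycle, work in sorted(pipeline_schedule(3).items()):
    print(f"t+{cycle}:", ", ".join(f"i+{i} {stage}" for i, stage in work))

print(total_cycles(3))  # 5 cycles, t to t+4
```

Each instruction still takes 3 cycles from fetch to execute (the latency), but once the pipeline is full one instruction completes per cycle.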


Pipeline Hazards
• Structural Hazards
– These occur when a single piece of hardware is used in more than one stage of the pipeline, so two instructions may need it at the same time.
– Example: one memory unit shared by successive instructions.

• Control Hazards
– These occur when a decision needs to be made, but the information needed to make it is not yet available.
– Example: a branch instruction in a pipeline. In some RISC architectures, the instruction following the branch is executed whether or not the branch is taken. This is called a delayed branch.


Pipeline Hazards
• Data Hazards
– These occur when reads and writes of data happen in a different order in the pipeline than in the program code.
– RAW (Read After Write): one instruction reads a location after an earlier instruction writes new data to it, but in the pipeline the write occurs after the read.
– WAR (Write After Read): a write should occur after a read, but the pipeline causes the write to happen first.
– WAW (Write After Write): two writes occur out of order.

To resolve these hazards, the pipeline is stalled.
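The three data hazard types can be told apart by comparing the registers that two instructions read and write. The sketch below is illustrative only: the dictionary layout and the function name are assumptions, and it reports the dependences that could turn into hazards, not whether a particular pipeline would actually stall on them.

```python
def classify_hazards(first, second):
    """Return the possible data hazards between two instructions.

    `first` comes before `second` in program order. Each instruction is a
    dict with 'reads' and 'writes' sets of register names.
    """
    hazards = []
    if first["writes"] & second["reads"]:
        hazards.append("RAW")  # second reads what first writes
    if first["reads"] & second["writes"]:
        hazards.append("WAR")  # second overwrites what first still reads
    if first["writes"] & second["writes"]:
        hazards.append("WAW")  # both write the same location
    return hazards


add = {"reads": {"r1", "r2"}, "writes": {"r0"}}  # ADD r0, r1, r2
sub = {"reads": {"r0", "r4"}, "writes": {"r3"}}  # SUB r3, r0, r4
print(classify_hazards(add, sub))  # ['RAW']
```

Here SUB needs the value of r0 that ADD produces, so it cannot use r0 until ADD has written it.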


CISC Processor Architecture
• CISC – Complex Instruction Set Computer
• Instructions are of variable format and length
• Large number of instructions
• Practically any instruction can reference memory
• Single set of work registers
• Micro-programmed instruction decode logic. Complexity of instruction is handled at this level
• Great variety of addressing modes
• No instruction pipelines
• Programmer’s coding effort is simplified
• Example: x86 Architecture


RISC Processor Architecture
• RISC – Reduced Instruction Set Computer
• Fixed 32-bit instruction size
• Load-store architecture – instructions that operate on data work only on registers
• Large register bank of thirty-two 32-bit registers
• Hard-wired instruction decode logic (CISC processors use large microcode ROMs)
• Pipelined execution
• Single-cycle execution (for most instructions)
• Performance is enhanced
• Example: ARM Architecture


ARM - Characteristics
• ARM cores are very simple compared to other general purpose processors.
• Both the ARM ISA and the pipeline design are aimed at minimising energy consumption.
• ARM architecture is highly modular.
• While being small and low power, ARM processors provide high performance.

ARM - Characteristics
• 32-bit RISC Processor Core
– Means all operations are 32 bits
• Soft Processor
– ARM licenses the core; it does not manufacture the chip itself
• Load-Store Architecture
– All memory operations are either a load or a store; no other operation accesses memory
• Pipelined architecture
• Von Neumann or Harvard
– Initial versions of ARM follow the Von Neumann architecture (single memory)
– Later versions, such as ARM9, follow the Harvard architecture (separate instruction and data memories)


ARM - Characteristics
• T: Thumb mode
• D: Debug interface (JTAG)
• M: Multiplier
• I: ICE interface (trace, breakpoints)

• Two instruction sets
– 32-bit ARM mode
– 16-bit Thumb (the op-code is 16 bits wide, but the operands are 32 bits)
– Thumb instructions are compressed. After being fetched, each 16-bit instruction has to be decompressed to 32 bits, which involves a slight delay.
– In return, memory is saved.
– Not all 32-bit instructions have 16-bit versions.

RISC Characteristics not available in ARM
• Register Window
– Large number of register banks with 32 visible at any point of time
– Procedure entry and exit instructions move the visible window to the new registers
– It requires a large chip area because of the large number of registers, thereby increasing cost, so it was rejected in the ARM architecture

• Delayed Branches
– The branch takes effect after the following instruction is executed
– Affects the atomicity of the instructions, so it was rejected in the ARM architecture

• Single Cycle Instructions
– Simple load and store instructions require two memory accesses (one for the instruction and one for the data)
– Single-cycle load and store require separate data and instruction memories, thereby increasing cost, so this was rejected in the ARM architecture

ARM Cores
• ARMv1 to ARMv3 – Obsolete
• ARMv4 – only 32-bit ARM instructions
• ARMv4T – 32-bit ARM with Thumb
• ARMv5TE – Enhanced instruction set
– Support for DSP instruction sets
• ARMv5TEJ – Supports Jazelle DBX
– Java byte codes are directly executed by the ARM core (DBX – Direct Bytecode eXecution)
– Java byte codes are recognized and processed by a Jazelle decoding engine
• ARMv6
– SIMD (Single Instruction Multiple Data)
– Thumb-2: some instructions can have 32-bit op-codes, some 16-bit
– NEON (Advanced SIMD, an extension introduced with ARMv7-A): supports image and video processing

ARM Cores
ARMv7-A – For high performance computing
ARMv7-R – For real time system
ARMv7-M – For microcontrollers
ARMv8 – 32/64 bit CPU, multithreaded, high performance



Development of the ARM Architecture
v4
• Halfword and signed halfword / byte support
• System mode
• Extensions: Thumb instruction set (v4T)

v5
• Improved interworking
• Saturated arithmetic
• DSP MAC instructions
• Extensions: Jazelle (5TEJ)

v6
• SIMD instructions
• Multi-processing
• v6 memory architecture
• Unaligned data support
• Extensions: Thumb-2 (6T2), TrustZone (6Z), Multicore (6K), Thumb only (6-M)

v7
• Thumb-2
• Extensions:
– v7A (applications): NEON
– v7R (real time): HW Divide
– v7M (microcontroller): HW Divide and Thumb-2 only

Note that implementations of the same architecture can be different:
– ARM7TDMI: architecture v4T. Von Neumann core with 3-stage pipeline
– ARM920T: architecture v4T. Harvard core with 5-stage pipeline and MMU


Architecture Revisions
[Figure: architecture version (up to ARMv7) plotted against time, 1994 to 2006]
• ARMv4: ARM7TDMI-S, StrongARM, ARM92xT, ARM720T, SC100
• ARMv5: ARM9x6E, ARM926EJ-S, ARM102xE, ARM1026EJ-S, XScale, SC200
• ARMv6: ARM1136JF-S, ARM1176JZF-S, ARM1156T2F-S

Source: http://www.arm.com/files/ppt/ARM_Teaching_Material.ppt


Latest ARM Processor Cores
Based on ARMv7 and ARMv8 cores:
• Cortex-A – Application Processor
• Cortex-R – Embedded Real Time Processor
• Cortex-M – Embedded Microcontrollers
• SECURECORE – Securecore processors

Application processors:
Smartphones, Feature Phones, Tablets / eReaders, Advanced Personal Media Players, Digital Television, Set-top Boxes, Satellite Receivers, High-End Printers, Personal Navigation Devices, Server, Enterprise, Wearables, Home Networking

Embedded Real Time processors:
Automotive Control Systems, Wireless and Wired Sensor Networks, Wireless base station infrastructure, Mass Storage Controllers, Printers, Network Devices

Embedded Microcontrollers:
Merchant MCUs, Automotive Control Systems, Motor Control Systems, White Goods controllers, Smart Meters, Sensors, Internet of Things

Securecore processors:
SIMs, Smart Cards, Advanced Payment Systems, Electronic Passports, Electronic Ticketing, Transportation


Which architecture is my processor?

Source: http://www.arm.com/files/ppt/ARM_Processors_Architectures_-_Uni_Program_.pptx


ARM Architecture Profiles
Application profile (ARMv7-A, e.g. Cortex-A8)
• Memory management support (MMU)
• Highest performance at low power
• Influenced by multi-tasking OS system requirements
• TrustZone and Jazelle-RCT for a safe, extensible system

Real-time profile (ARMv7-R, e.g. Cortex-R4)
• Protected memory (MPU)
• Low latency and predictability for ‘real-time’ needs
• Evolutionary path for traditional embedded business

Microcontroller profile (ARMv7-M, e.g. Cortex-M3)
• Lowest gate count entry point
• Deterministic and predictable behaviour a key priority
• For deeply embedded systems


ARM Nomenclature
ARMxyzTDMIEJFS
• x: Series
• y: MMU
• z: Cache
• T: Thumb
• D: Debugger
• M: Multiplier
– ARM processor has a HW multiplier unit
• I: EmbeddedICE macrocell
– It is the HW circuit that generates trace information, used for advanced debugging
• E: Enhanced Instructions
– Enhanced instruction set, may be DSP instructions
• J: Java acceleration by Jazelle
– HW support to run Java byte code
• F: Vector Floating-point
– HW support for floating point instructions
• S: Synthesizable version
– ARM can be modified, since it comes with the soft core


ARM Nomenclature - Examples
• ARM7TDMI
– ARM7 family of processors, with the Thumb instruction set, Debug support, HW Multiplier support and EmbeddedICE macrocell support
– It is the basis of all ARM cores.
– All ARM cores have TDMI support

• ARM946E-S
– ARM9xx family of soft processors, with the DSP instruction set
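The naming scheme can be applied mechanically to a core name. The sketch below is a hypothetical helper, not an ARM tool: it handles only the letters listed on the nomenclature slide, and it assumes that when the name has three or more digits the last two are the y and z digits. Names that use other letters (for example the Z in ARM1176JZF-S) are rejected.

```python
import re

FEATURES = {
    "T": "Thumb instruction set",
    "D": "Debug support",
    "M": "Hardware multiplier",
    "I": "EmbeddedICE macrocell",
    "E": "Enhanced (DSP) instructions",
    "J": "Jazelle Java acceleration",
    "F": "Vector floating-point",
    "S": "Synthesizable version",
}


def decode_core_name(name):
    """Split a name such as 'ARM946E-S' into series, y/z digits and features."""
    match = re.fullmatch(r"ARM(\d+)([TDMIEJF]*)(-?S)?", name.upper())
    if match is None:
        raise ValueError(f"name does not follow the ARMxyzTDMIEJFS scheme: {name}")
    digits, letters, synthesizable = match.groups()
    if len(digits) >= 3:
        series, y, z = digits[:-2], digits[-2], digits[-1]
    else:
        series, y, z = digits, None, None
    features = [FEATURES[letter] for letter in letters]
    if synthesizable:
        features.append(FEATURES["S"])
    return {"series": series, "y": y, "z": z, "features": features}


print(decode_core_name("ARM7TDMI"))
print(decode_core_name("ARM946E-S"))
```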


Data Sizes and Instruction Sets
ARM is a 32-bit load / store RISC architecture
– The only memory accesses allowed are loads and stores
– Most internal registers are 32 bits wide
– Most instructions execute in a single cycle

When used in relation to ARM cores
– Halfword means 16 bits (two bytes)
– Word means 32 bits (four bytes)
– Doubleword means 64 bits (eight bytes)


Data Sizes and Instruction Sets
ARM cores implement two basic instruction sets
– ARM instruction set – instructions are all 32 bits long
– Thumb instruction set – instructions are a mix of 16 and 32 bits
• Thumb-2 technology added many extra 32- and 16-bit instructions to the original 16-bit Thumb instruction set

Depending on the core, may also implement other instruction sets
– VFP instruction set – 32-bit (vector) floating point instructions
– NEON instruction set – 32-bit SIMD instructions
– Jazelle-DBX – provides acceleration for Java VMs (with additional software support)
– Jazelle-RCT – provides support for interpreted languages


ARM Processor Modes
ARM has seven basic operating modes
– Each mode has access to its own stack space and a different subset of registers
– Some operations can only be carried out in a privileged mode

Mode               Description                                                                     Class
Supervisor (SVC)   Entered on reset and when a Supervisor Call (SVC) instruction is executed       Privileged, exception mode
FIQ                Entered when a high priority (fast) interrupt is raised                         Privileged, exception mode
IRQ                Entered when a normal priority interrupt is raised                              Privileged, exception mode
Abort              Used to handle memory access violations                                         Privileged, exception mode
Undef              Used to handle undefined instructions                                           Privileged, exception mode
System             Privileged mode using the same registers as User mode                           Privileged
User               Mode under which most applications / OS tasks run                               Unprivileged


The ARM Register Set
• R0 to R15 are directly accessible
Registers reserved for specific functions:
• SP (R13) – Stack Pointer
• LR (R14) – Link Register
– Used to save the return address of the subroutine (or exception)
– Can be used as a general purpose register if the return address is stored on the stack
• PC (R15) – Program Counter
• CPSR: Current Program Status Register
– Contains condition code flags and the current mode bits
• SPSR: Saved Program Status Register
– Loaded with the CPSR when an exception occurs

The ARM Register Set
ARM has 37 registers, all 32 bits long.
A subset of these registers is accessible in each mode.
Note: System mode uses the User mode register set.

User mode   IRQ        FIQ        Undef      Abort      SVC
r0-r7
r8-r12                 r8-r12
r13 (sp)    r13 (sp)   r13 (sp)   r13 (sp)   r13 (sp)   r13 (sp)
r14 (lr)    r14 (lr)   r14 (lr)   r14 (lr)   r14 (lr)   r14 (lr)
r15 (pc)
cpsr        spsr       spsr       spsr       spsr       spsr

Legend: the User mode column is the current mode; the other columns show the banked-out registers of each mode.

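The figure of 37 registers follows from the banking table. The sketch below counts them and shows which physical register a name such as r13 refers to in a given mode. The suffix naming (for example `r13_irq`) and the function names are illustrative assumptions, not taken from the slides.

```python
# Registers that each exception mode replaces with its own private copy.
BANKED = {
    "FIQ": [f"r{n}" for n in range(8, 15)],  # r8 to r14
    "IRQ": ["r13", "r14"],
    "Undef": ["r13", "r14"],
    "Abort": ["r13", "r14"],
    "SVC": ["r13", "r14"],
}


def total_registers():
    """r0-r15 shared set, the banked copies, one CPSR and one SPSR per exception mode."""
    shared = 16
    banked = sum(len(regs) for regs in BANKED.values())
    status = 1 + len(BANKED)
    return shared + banked + status


def physical_register(mode, reg):
    """Name the physical register that `reg` selects in `mode`.
    User and System mode have no banked copies, so they use the shared set."""
    if reg in BANKED.get(mode, []):
        return f"{reg}_{mode.lower()}"
    return reg


print(total_registers())               # 37
print(physical_register("FIQ", "r8"))  # r8_fiq
print(physical_register("IRQ", "r8"))  # r8
```

The count is 16 + (7 + 2 + 2 + 2 + 2) + (1 + 5) = 37.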
Program Status Register
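The condition code flags and mode bits held in the CPSR can be extracted with shifts and masks. The bit positions and mode encodings below follow the classic layout in the ARM Architecture Reference Manual (N, Z, C, V in bits 31 to 28; I, F, T in bits 7 to 5; mode in bits 4 to 0) and are not stated on the slides, so treat this as a sketch under that assumption. The function name is illustrative.

```python
MODE_NAMES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undef",
    0b11111: "System",
}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its flags and mode name."""
    return {
        "N": bool((cpsr >> 31) & 1),  # negative
        "Z": bool((cpsr >> 30) & 1),  # zero
        "C": bool((cpsr >> 29) & 1),  # carry
        "V": bool((cpsr >> 28) & 1),  # overflow
        "I": bool((cpsr >> 7) & 1),   # IRQ disabled
        "F": bool((cpsr >> 6) & 1),   # FIQ disabled
        "T": bool((cpsr >> 5) & 1),   # Thumb state
        "mode": MODE_NAMES.get(cpsr & 0x1F, "unknown"),
    }


print(decode_cpsr(0x600000D3))
```

For the value 0x600000D3, Z and C are set, both interrupt types are disabled, the core is in ARM state, and the mode is Supervisor.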



Program Counter (r15)
When the processor is executing in ARM state:
– All instructions are 32 bits wide
– All instructions must be word aligned
– Therefore the pc value is stored in bits [31:2] with bits [1:0] undefined (as an instruction cannot be halfword or byte aligned)

When the processor is executing in Thumb state:
– All instructions are 16 bits wide
– All instructions must be halfword aligned
– Therefore the pc value is stored in bits [31:1] with bit [0] undefined (as an instruction cannot be byte aligned)

When the processor is executing in Jazelle state:
– All instructions are 8 bits wide
– Processor performs a word access to read 4 instructions at once
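The alignment rules for the three states can be illustrated by clearing the low bits of an address. This is only a model of which pc bits are significant: the slides describe the low bits as undefined, not as masked by hardware, and the names below are illustrative.

```python
INSTRUCTION_BYTES = {"ARM": 4, "Thumb": 2, "Jazelle": 1}


def aligned_pc(address, state):
    """Clear the address bits that carry no information in the given state:
    bits [1:0] in ARM state, bit [0] in Thumb state, none in Jazelle state."""
    size = INSTRUCTION_BYTES[state]
    return address & ~(size - 1)


def next_pc(pc, state):
    """Address of the following instruction for fixed-size instructions."""
    return pc + INSTRUCTION_BYTES[state]


for state in INSTRUCTION_BYTES:
    print(state, hex(aligned_pc(0x8003, state)))
# ARM 0x8000, Thumb 0x8002, Jazelle 0x8003
```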
