
Embedded Computer: Memory System, Input/Output

Outline

Memory System
  Types of memory
  Caches

Input/Output
  Memory Mapped I/O
  Polling
  Interrupt

Ingo Sander ingo@imit.kth.se

September 4, 2007

IL2206 Embedded Systems

Memory System

The memory bottleneck

Most instructions in a RISC processor can execute in a single clock cycle
BUT access to the main memory (typically SDRAM) is slow
If memory access time can be shortened, the system would perform considerably better

Memory Performance

Latency

Time between the following two time instances:
  the time instance where the processor issues a request to the memory
  the time instance where the requested data arrives and is available for use by the processor

Memory Bandwidth

Rate at which information can be transferred from the memory system
If R is the number of requests that the memory can serve simultaneously and L is the latency, then BW = R/L
Example:
  A 32-bit memory with latency 20 ns has a bandwidth BW = 32 bit / 20 ns = 1.6 Gbit/s = 200 MByte/s
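The bandwidth example can be checked with a few lines of C. This is only a sketch of BW = R/L; the function names `bandwidth_bits_per_s` and `bits_to_mbytes` are made up for illustration and are not part of the course material.

```c
/* Bandwidth in bits per second: BW = R / L,
 * where R is the amount of data served per request (in bits)
 * and L is the latency (in seconds). */
static double bandwidth_bits_per_s(double request_bits, double latency_s)
{
    return request_bits / latency_s;
}

/* Convert bits per second to MByte per second (1 MByte = 10^6 bytes). */
static double bits_to_mbytes(double bits_per_s)
{
    return bits_per_s / 8.0 / 1e6;
}
```

Calling `bandwidth_bits_per_s(32.0, 20e-9)` gives about 1.6e9 bit/s, and passing that result to `bits_to_mbytes` gives about 200 MByte/s.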

Types of memory

ROM (Read Only Memory)
  Mask-programmable
  Flash programmable (can be reprogrammed, but has long access times)
RAM (Random Access Memory)
  SRAM (Static RAM)
  DRAM (Dynamic RAM)

SRAM vs. DRAM

SRAM
  Faster
  Easier to integrate with logic
  Higher power consumption
DRAM
  Denser
  Must be refreshed

Synchronous DRAM

Clock signal is used internally to pipeline accesses
  Memory must be fast enough to respond to request
  Request takes multiple clock cycles
Provides burst mode access:
  1, 2, 4, 8 locations

Flash issues

Flash is programmed at system voltages
Erasure time is long
Must be erased in blocks
Limited number of erasures
A Flash memory is very useful in combination with SRAM or SDRAM devices, since it can load these devices at power-on

Memory Access Times and Costs

Memory Technology   Typical Access Time             $ per GB in 2004
SRAM                0.5 ns - 5 ns                   $4000 - $10000
DRAM                50 ns - 70 ns                   $100 - $200
Magnetic disk       5,000,000 ns - 20,000,000 ns    $0.5 - $2

Source: Patterson and Hennessy, 2004

Embedded system memories

Large fast memories are very expensive
Embedded systems have to be produced at a low cost
  A single SRAM main memory is in general too expensive
  A combination of fast and slow memories is often still feasible

Memory is a bottleneck

While the CPU is fast, each memory access takes a long time and slows down the system
[Figure: CPU (fast) connected via Bus (slow) to Memory (very slow)]

Caches

Large fast memories are too expensive, but small fast memories are feasible
A cache memory is a small but fast memory that is located near the CPU to reduce memory access times
Ideally, the processor only needs to access the cache and not the main memory
Caches can increase performance if most memory requests do not need to access the main memory
[Figure: CPU (fast) with Cache (fast), connected via Bus (slow) to Memory (very slow)]

Caches and CPUs

[Figure: the CPU exchanges address and data with the cache controller, which connects to the cache and, via address and data lines, to the main memory]
2000 Wolf (Morgan Kaufman)

Cache operation

Many main memory locations are mapped onto one cache entry
May have caches for:
  instructions;
  data;
  data + instructions (unified).
Memory access time is no longer deterministic!
2000 Wolf (Morgan Kaufman)

Terms

Cache hit: required location is in cache.
Cache miss: required location is not in cache.
Working set: set of locations used by program in a time interval.
2000 Wolf (Morgan Kaufman)

Types of misses

Compulsory (cold): location has never been accessed.
Capacity: working set is too large.
Conflict: multiple locations in working set map to same cache entry.
2000 Wolf (Morgan Kaufman)

Memory system performance

h = cache hit rate.
$t_{cache}$ = cache access time, $t_{main}$ = main memory access time.
Average memory access time:
  $t_{av} = h \, t_{cache} + (1 - h) \, t_{main}$
2000 Wolf (Morgan Kaufman)

Write operations

Write-through: immediately copy write to main memory
  Causes unnecessary memory communication
  Memory always has a valid copy of the cache block
Write-back: write to main memory only when location is removed from cache
  Tries to minimize communication with memory
  Memory may have an invalid copy of the cache block, which must be updated when the cache block is replaced
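The average memory access time formula from the "Memory system performance" slide can be written as a small C function. This is a sketch only; the function name `average_access_time` is a placeholder, and the numbers in the usage note are assumed values, not measurements.

```c
/* Average memory access time:
 *   t_av = h * t_cache + (1 - h) * t_main
 * h       : cache hit rate, between 0.0 and 1.0
 * t_cache : cache access time
 * t_main  : main memory access time (same unit as t_cache) */
static double average_access_time(double h, double t_cache, double t_main)
{
    return h * t_cache + (1.0 - h) * t_main;
}
```

With an assumed hit rate of 0.9, a 5 ns cache and a 70 ns main memory, `average_access_time(0.9, 5.0, 70.0)` gives 0.9 * 5 + 0.1 * 70 = 11.5 ns. A hit rate of 0 gives the main memory time and a hit rate of 1 gives the cache time.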

Replacement

Replacement policy: strategy for choosing which cache entry to throw out to make room for a new memory location.
Two popular strategies:
  Random.
  Least-recently used (LRU).
In a write-back cache, replacing a modified (dirty) cache entry also means writing its contents back to memory. Thus a cache miss can be expensive!

Cache performance benefits

Keep frequently-accessed locations in fast cache.
Cache retrieves more than one word at a time.
  Sequential accesses are faster after first access.
2000 Wolf (Morgan Kaufman)

Data Transfer to Cache

Words are transferred between cache and processor
Blocks (of multiple words, given by the block size) are transferred between cache and memory
[Figure: word transfer between CPU and Cache; block transfer between Cache and Main Memory]

Cache organizations

Fully-associative: any memory location can be stored anywhere in the cache (almost never implemented).
Direct-mapped: each memory location maps onto exactly one cache entry.
N-way set-associative: each memory location can go into one of N entries.

Direct-mapped cache

A direct-mapped cache consists of several cache lines, where each cache line has a status bit, a tag and data (cache block)
There is a given mapping for each memory location!
[Figure: cache lines 0 to 7, each with status bit, tag and a cache block of four words (Wd 0 to Wd 3), next to main memory blocks (Block 0, Block 1, ..., Block 1024, ...) with their memory addresses]

Example Direct Mapped Cache

Cache has 2 KBytes (512 words), organized as 64 cache lines with a block size of 8 words
Memory has 64 KBytes (16 KWords), which can be seen as 2048 blocks of 8 words
Address size is 16 bits
The direct map technique uses the modulo (remainder) operation to map on a cache block
  Block 0, 64, 128, ... is mapped on Block 0 in the cache
  Block 1, 65, 129, ... is mapped on Block 1 in the cache

Example Direct Mapped Cache

Memory address (16 bits): Tag (5 bits) | Block (6 bits) | Word (3 bits) | Byte Offset (2 bits)
[Figure: main memory blocks 0 to 2047 (addresses 0x0000, 0x0020, ..., 0xFFE0) mapped onto cache lines 0 to 63; a block has 8 words; each cache line holds Valid (1 bit), Tag (5 bits) and Data (8 words of 32 bits)]

Direct-mapped cache

[Figure: a cache line with valid bit, tag (e.g. 0xabcd) and data bytes; the address is split into tag, index and offset; the tag comparison yields hit, and the offset selects the value (byte, or halfword/word)]
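The 16-bit address layout of this example (5-bit tag, 6-bit block/line index, 3-bit word, 2-bit byte offset) can be tried out with shifts and masks in C. The following is a minimal sketch; the type `cache_addr` and the functions `split_address` and `block_number` are hypothetical names introduced only for illustration.

```c
#include <stdint.h>

/* Fields of a 16-bit address for the example cache:
 * 64 lines, 8 words per block, 4 bytes per word. */
typedef struct {
    unsigned tag;   /* bits 15..11 (5 bits) */
    unsigned line;  /* bits 10..5  (6 bits), the cache line index */
    unsigned word;  /* bits 4..2   (3 bits), word within the block */
    unsigned byte;  /* bits 1..0   (2 bits), byte within the word */
} cache_addr;

static cache_addr split_address(uint16_t addr)
{
    cache_addr f;
    f.byte = addr & 0x3u;
    f.word = (addr >> 2) & 0x7u;
    f.line = (addr >> 5) & 0x3Fu;
    f.tag  = (addr >> 11) & 0x1Fu;
    return f;
}

/* Main memory block number: one block is 8 words = 32 bytes. */
static unsigned block_number(uint16_t addr)
{
    return (unsigned)addr >> 5;
}
```

The line index is the block number modulo 64, which is the modulo mapping described in the example: address 0x0800 lies in memory block 64 and maps to cache line 0 with tag 1, and address 0xFFE0 lies in block 2047 and maps to line 63 with tag 31.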

Direct-mapped cache locations

Many locations map onto the same cache block.
Conflict misses are easy to generate:
  Array a[] uses locations 0, 1, 2, ...
  Array b[] uses locations 1024, 1025, 1026, ...
  Operation a[i] + b[i] generates conflict misses.
2000 Wolf (Morgan Kaufman)

Example 2-way set-associative cache

Memory address (16 bits): Tag (6 bits) | Set (5 bits) | Offset (5 bits)
[Figure: main memory blocks mapped onto 32 sets (Set 0 to Set 31), each with Way 0 and Way 1; a block has 8 words; each entry holds Valid (1 bit), Tag (6 bits) and Data (8 words of 32 bits)]
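The a[i] + b[i] conflict can be reproduced with a tiny direct-mapped cache model. This is a sketch under stated assumptions: the cache is assumed to have 1024 one-word entries (so that locations 0 and 1024 collide), and `cache_access` and `count_misses` are hypothetical helper names.

```c
#include <stdbool.h>

#define CACHE_ENTRIES 1024u  /* assumption: 1024 one-word entries */

static bool valid[CACHE_ENTRIES];
static unsigned tags[CACHE_ENTRIES];

/* Invalidate all cache entries. */
static void cache_reset(void)
{
    for (unsigned i = 0; i < CACHE_ENTRIES; i++)
        valid[i] = false;
}

/* Returns true on a hit. On a miss, the entry is replaced. */
static bool cache_access(unsigned location)
{
    unsigned index = location % CACHE_ENTRIES;  /* direct mapping */
    unsigned tag = location / CACHE_ENTRIES;
    if (valid[index] && tags[index] == tag)
        return true;
    valid[index] = true;
    tags[index] = tag;
    return false;
}

/* Models "a[i] + b[i]" for i = 0..n-1, repeated `passes` times,
 * starting from an empty cache, and counts the misses. */
static unsigned count_misses(unsigned a_base, unsigned b_base,
                             unsigned n, unsigned passes)
{
    unsigned misses = 0;
    cache_reset();
    for (unsigned p = 0; p < passes; p++) {
        for (unsigned i = 0; i < n; i++) {
            if (!cache_access(a_base + i)) misses++;
            if (!cache_access(b_base + i)) misses++;
        }
    }
    return misses;
}
```

With a[] at location 0 and b[] at location 1024, `count_misses(0, 1024, 8, 2)` reports a miss on every access, because a[i] and b[i] keep evicting each other. Moving b[] to location 512 leaves only the compulsory misses of the first pass.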

Set-Associative Caches

[Figure: a cache with 8 blocks, each holding tag and data, organized in different ways]
One-way set associative (direct-mapped): 8 sets (blocks 0 to 7), 1 element per set
Two-way set associative: 4 sets (0 to 3), 2 elements per set
Eight-way set associative (fully associative): 1 set, 8 elements per set

Fully associative cache

There is complete freedom where to place a block in the cache
But all blocks have to be searched for the correct tag pattern
In order to have acceptable performance, the tags must be searched in parallel

Example caches

StrongARM:
  16 Kbyte, 32-way, 32-byte block instruction cache.
  16 Kbyte, 32-way, 32-byte block data cache (write-back).
Nios II:
  512 Bytes to 64 KBytes direct-mapped I- and D-cache with a cache block size of 4 (D), 16 (D) or 32 (I&D) Bytes

Summary Memory Systems

Memory is a bottleneck in the system
Different memories exist
  Cost increases with memory performance
A cache memory can significantly decrease execution time at low cost
  Execution time is very hard to predict
    Problem for design of real-time systems
  Locality is important to utilize caches efficiently
There can be several levels of different caches
  Embedded systems usually have only one cache level

Input/Output

Input and Output Devices

Input/Output devices are used to communicate with the environment
An example is a UART (Universal Asynchronous Receiver/Transmitter)
These devices (like other peripheral devices) are controlled by reading and writing to registers
[Figure: I/O device with Status Register, Mode Register and Data Register; Register Select, Data Bus and Control Signals on the processor side; Input and Output on the environment side]

Serial communication

Characters are transmitted separately
[Figure: signal over time: no char, start, bit 0, bit 1, ..., bit n-1, stop]
2000 Morgan Kaufman (Wayne Wolf)

Universal Asynchronous Receiver/Transmitter (UART)

Component for serial to parallel conversion
Has a serial receiver/transmitter
Many parameters can be configured
  Baud rate
  Number of bits per character
  Parity bits
  Length of stop bit

Memory-Mapped I/O

Peripheral components can be connected to the processor by memory-mapped I/O
The components are reached through their own address range within the address space
Memory-mapped I/O requires extra hardware for address decoding
[Figure: the CPU address bus feeds a decoder that drives Chip Enable; Register Select, Read/Write and the data bus connect the CPU to the peripheral, which provides the interface to the environment]

Memory-Mapped I/O

The chip-enable output has to be active when the input of the decoder is a correct address
Other address bits are used for register select
The decoder can be implemented with a small block of programmable logic or custom hardware (VHDL)

Example Memory-Mapped I/O

A device with eight 8-bit registers shall be connected to the address 0x1000
[Figure: address 0x00001002 on the address bus (ADR31-ADR0); ADR31-ADR3 feed the decoder, whose output CE is active when ADR12 = 1 and all others are 0; ADR2, ADR1, ADR0 = 0, 1, 0 drive RS2, RS1, RS0 and select register R2 out of R0 ... R7; the device uses D7-D0 of the data bus (D31-D0)]
The registers can now be accessed in the address range 0x1000 (R0) to 0x1007 (R7)
  movia r1, 0x1002
  movi r3, 0x08
  stb r3, 0(r1)      # sets bit 3 and clears all other bits of device register R2

Accessing Memory Locations in C

Symbolic names can be defined for memory locations
  #define MEM_LOCATION 0x18
Functions can be defined to access memory
  peek can be used to read a memory location (byte)
    char peek(volatile char *location) {return *location;}
  poke can be used to write to a memory location (byte)
    void poke(volatile char *location, char newval) {*location = newval;}
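The address decoding of the example device can be modelled in C. The sketch below mirrors the decoder in software: the upper address bits (ADR31-ADR3) produce chip enable, and the lower three bits select the register. The names `chip_enable` and `register_select` are hypothetical; in the real system this logic is hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEVICE_BASE 0x00001000u  /* base address from the example */
#define RS_BITS     3u           /* ADR2..ADR0 select one of 8 registers */

/* Chip enable is active when ADR31..ADR3 match the device base,
 * i.e. ADR12 = 1 and all other upper address bits are 0. */
static bool chip_enable(uint32_t addr)
{
    return (addr >> RS_BITS) == (DEVICE_BASE >> RS_BITS);
}

/* Register select is given by the three lowest address bits. */
static unsigned register_select(uint32_t addr)
{
    return addr & ((1u << RS_BITS) - 1u);
}
```

For the address 0x00001002 from the example, `chip_enable` is true and `register_select` returns 2, which corresponds to register R2. Address 0x1008 is already outside the device.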

Don't do this!

Memory locations shouldn't be accessed directly!
Software shall be flexible
  Hardware could change
Programmers may make mistakes that the compiler would not make (e.g. memory alignment)
HAL (Hardware Abstraction Layer) offers optimized device drivers to access peripheral devices and memory

Busy Wait I/O

Busy Wait I/O is the most basic way to communicate with an I/O device
The processor waits until the I/O device has completed its current task
Disadvantage: Processor cannot be used for other tasks during the waiting period!
This method is also often called polling!
Example: Sending a string via a serial link
Busy Wait I/O pseudo code:
  Characters = String;
  While not all characters sent
    Send next character;
    While Sender = Busy
      Wait;
  Done!

C-Programming Testing of Bits

In order to test specific bits, the other bits need to be masked
Example: Busy Flag: Busy = 1; Non-Busy = 0
[Figure: Status Sender register at 0x1000 with bits 7 to 0, busy flag (BF) at bit 5; Sender Buffer register at 0x1001]

  #define Status  ((char *) 0x1000)
  #define SendBuf ((char *) 0x1001)
  char *myString = "Hello World";
  char *current_char;

Here you should use HAL functions!

  current_char = myString;
  while (*current_char != '\0') {
    poke(SendBuf, *current_char++);
    while ((peek(Status) & 0x20) != 0)
      ;  /* Mask needed, since other bits in status register may not be zero */
  }

Simultaneous busy/wait input and output

Example: Copying characters from input to output
Busy Wait I/O pseudo code:
  Loop
    While inBuffer busy
      Wait;
    Read Character
    Copy Character to Output Buffer
    Send Character
    While outBuffer busy
      Wait;
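The busy-wait loop with bit masking can be tried on a host PC by replacing the hardware registers with a simulated device. This is a sketch, not the course's driver code: the `sim_sender` type and its functions are hypothetical, and the simulated device is assumed to stay busy for a fixed number of status reads after each write.

```c
#include <stddef.h>
#include <stdint.h>

#define BUSY_MASK 0x20u  /* bit 5 = busy flag (BF), as on the slide */

/* Hypothetical simulated sender, standing in for the registers
 * at 0x1000 (status) and 0x1001 (send buffer). */
typedef struct {
    uint8_t status;       /* bits other than BF may be non-zero */
    uint8_t send_buf;     /* last byte written to the device */
    char sent[64];        /* log of all characters sent so far */
    size_t count;
    unsigned busy_ticks;  /* status reads left until the device is ready */
} sim_sender;

/* Write one character; the device becomes busy afterwards. */
static void sim_write(sim_sender *dev, char c)
{
    dev->send_buf = (uint8_t)c;
    if (dev->count < sizeof dev->sent - 1) {
        dev->sent[dev->count++] = c;
        dev->sent[dev->count] = '\0';
    }
    dev->status |= BUSY_MASK;
    dev->busy_ticks = 3;  /* assumed duration of one transmission */
}

/* Read the status register; each read lets the simulated device progress. */
static uint8_t sim_read_status(sim_sender *dev)
{
    if (dev->busy_ticks > 0 && --dev->busy_ticks == 0)
        dev->status &= (uint8_t)~BUSY_MASK;
    return dev->status;
}

/* Busy-wait transmission; returns how many polls found the device busy. */
static unsigned send_string_polling(sim_sender *dev, const char *s)
{
    unsigned polls = 0;
    while (*s != '\0') {
        sim_write(dev, *s++);
        /* Mask needed: other status bits may be set. */
        while ((sim_read_status(dev) & BUSY_MASK) != 0)
            polls++;
    }
    return polls;
}
```

The returned poll count makes the cost of busy waiting visible: every counted poll is processor time that was spent doing nothing but testing the flag. Starting with other status bits set (for example `status = 0x81`) shows why the mask is needed, since comparing the whole register against zero would never terminate.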

Interrupt I/O

Busy/wait is very inefficient.
  CPU can't do other work while testing device.
  Hard to do simultaneous I/O.
Interrupts allow a device to change the flow of control in the CPU.
  Causes subroutine call to handle device.
2000 Wolf (Morgan Kaufman)

Interrupt Scheme

[Figure: the Device sends an Interrupt Request to the CPU; the CPU answers with Interrupt Acknowledge; Data/Address lines connect CPU and Device]

Interrupt physical interface

CPU and device are connected by CPU bus
CPU and device handshake:
  device asserts interrupt request;
  CPU asserts interrupt acknowledge when it can handle the interrupt.
2000 Wolf (Morgan Kaufman)

Interrupt behavior

Based on subroutine call mechanism
Interrupt forces next instruction to be a subroutine call to a predetermined location
  Return address is saved to resume executing foreground program
2000 Wolf (Morgan Kaufman)

Programming Interrupt

[Figure: the foreground program is doing something when an interrupt event occurs; control passes via the interrupt vector (branch to interrupt handler) to the interrupt handler]
Interrupt Handler:
  Save registers
  Handle interrupt
  Restore registers
  Restore PC
  Clear interrupt disable flag

Receive-Send with Polling

Assume a program that, as part of its duties, receives characters and sends them on to another device
Solution with polling:
  loop
    Wait for new character;
    Do something;
    Send character;
  end loop;
System cannot do anything while it waits for a new character or until the sender is ready
System resources are utilized very inefficiently!

Better Receive-Send Implementation with Interrupt

Parallelization of duties
  Wait for new character (interrupt)
    If a character is received, it is stored in a buffer
  Do something (foreground program)
    Work with the stored buffer elements
  Send character if transmitter ready (interrupt)
    Check if transmitter is ready and send the first character of the buffer

Better Receive-Send Implementation with Interrupt

System can do other things while waiting for receiver or sender
Buffer is needed to store elements
Size of buffer must be chosen carefully
  too small => buffer overflow
  too large => too expensive design

Typical Embedded Design Problems

Embedded systems are inherently parallel (concurrent), since they interact with a heterogeneous environment
  Parallelization allows for faster processing, since work can be done in parallel
  Waiting times can be avoided
The need for buffers is a logical consequence of parallelization
System designer needs to find the right amount of parallelization and the right buffer size!

Send-Receive with Circular Buffer (Wolf)

Independent receive and send, realized by two interrupt routines
Receive-interrupt routine
  Puts a character into the queue
Send-interrupt routine
  Sends a character when the sender is ready
[Figure: queue with head and tail pointers]

Send-Receive with Circular Buffer (Wolf)

A circular buffer can be realised in memory with one pointer for the head and one for the tail
If a pointer is at the end of the buffer, the next position is the start of the buffer
[Figure: circular buffer holding the characters f, g, h, i, with tail and head pointers]

Send-Receive sequence diagram (Wolf)

[Figure: sequence diagram with :foreground, :input, :output and :queue; the queue contents change from empty to a, back to empty, then to b, bc and c]
2000 Wolf (Morgan Kaufman)
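A circular buffer with head and tail indices could look as follows in C. This is a minimal sketch, not Wolf's code: the buffer size of 8 and all names (`ring_buffer`, `rb_put`, `rb_get`, ...) are assumptions, and one slot is deliberately kept free so that a full buffer can be told apart from an empty one.

```c
#include <stdbool.h>
#include <stddef.h>

#define BUF_SIZE 8u  /* assumed size; holds at most BUF_SIZE - 1 characters */

typedef struct {
    char data[BUF_SIZE];
    size_t head;  /* position of the next character to remove */
    size_t tail;  /* position where the next character is stored */
} ring_buffer;

static void rb_init(ring_buffer *rb)
{
    rb->head = 0;
    rb->tail = 0;
}

/* After the last position, continue at the start of the buffer. */
static size_t rb_next(size_t pos)
{
    return (pos + 1) % BUF_SIZE;
}

static bool rb_is_empty(const ring_buffer *rb)
{
    return rb->head == rb->tail;
}

static bool rb_is_full(const ring_buffer *rb)
{
    return rb_next(rb->tail) == rb->head;
}

/* Called by the receive side. Returns false on overflow. */
static bool rb_put(ring_buffer *rb, char c)
{
    if (rb_is_full(rb))
        return false;
    rb->data[rb->tail] = c;
    rb->tail = rb_next(rb->tail);
    return true;
}

/* Called by the send side. Returns false if nothing is queued. */
static bool rb_get(ring_buffer *rb, char *out)
{
    if (rb_is_empty(rb))
        return false;
    *out = rb->data[rb->head];
    rb->head = rb_next(rb->head);
    return true;
}
```

In an interrupt-driven system, `rb_put` would be called from the receive-interrupt routine and `rb_get` from the send-interrupt routine. On real hardware the indices would typically also need to be declared `volatile` or otherwise protected, which this host-side sketch leaves out. A `false` return from `rb_put` is the buffer overflow case that a too-small buffer produces.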

Debugging interrupt code

What if you forget to save and restore registers?
  Foreground program can exhibit mysterious bugs
  Bugs will be hard to repeat, since they depend on interrupt timing
It is difficult to debug an interrupt routine!
2000 Wolf (Morgan Kaufman)

Prioritized Interrupts

Some CPUs (such as Nios II) support several interrupt levels in hardware
Otherwise, extra hardware (priority decoder) can be used to create several levels of interrupt

Interrupt prioritization

Masking: interrupt with priority lower than current priority is not recognized until pending interrupt is complete.
Non-maskable interrupt (NMI): highest priority, never masked.
  Often used for power-down.
2000 Wolf (Morgan Kaufman)

Example: Prioritized I/O

[Figure: sequence diagram with :interrupts, :foreground, :A, :B and :C; interrupt requests arrive in the order B, C, A, then A and B together]
2000 Wolf (Morgan Kaufman)

Sources of interrupt overhead

Handler execution time
Interrupt mechanism overhead
Register save/restore
Pipeline-related penalties
Cache-related penalties

Summary

Peripherals can be made accessible to software by memory-mapped I/O
Two basic approaches for communication with I/O devices
  polling: processor checks if data has arrived
  interrupt: processor is notified if data has arrived
Interrupt is not always better than polling!
