
The Exam 2 Ultimate Study Guide

Relocation

The OS is highest in memory, so a process starts at address zero with a maximum address of
(memory size - OS size).
Base address (the first, i.e. smallest, physical address), aka the relocation address.
Limit address (the largest physical address the process can access), aka the bound.
When a process is loaded it is placed in a segment of contiguous memory; if it doesn't fit, the OS
waits for a process to terminate.
Dynamic Relocation: the OS can move processes while they run. Privileged bound and base
registers are added (context switches must save them in the PCB and restore them to the CPU).
The hardware checks the address against the bound and adds the base in parallel.
Static Relocation: the OS adjusts processes at load time; processes cannot be moved while the
program runs.
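The base-and-bound check above can be sketched in a few lines of Python (a toy model; the function name and the example sizes are made up for illustration):

```python
def base_and_bound(vaddr, base, bound):
    """Dynamic relocation: check the virtual address against the bound
    and add the base register (real hardware does both in parallel)."""
    if vaddr >= bound:
        raise MemoryError("protection fault: address exceeds bound")
    return base + vaddr

# A process loaded at physical address 4096 with a 1 KB bound:
assert base_and_bound(100, base=4096, bound=1024) == 4196
```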

Compilation: lexical locations change to labels and instructions are converted to assembly.

Assembly: the labels become addresses in a logical address space.

Linking: addresses are adjusted in a new logical address space because of library routines
(addresses shift).

Loading: addresses are translated into actual physical addresses in the system (addresses
determined by where the system has room for the process).

Relocation Advantages

OS can easily move a process during execution
Processes can grow over time
Protection is easy
Simple, fast hardware

Relocation Disadvantages

Requires contiguous allocation
Sharing is hard
Degree of multiprogramming is limited (all active processes must fit in memory)
Slows down hardware because of the add on every memory reference
Complicates memory management

MEMORY MANAGEMENT
The OS tracks which memory is available and which is utilized.
Given a memory request from a starting process, the OS must
choose which gap to use for the process.
External Fragmentation: unused memory between units of
allocation, i.e. gaps between processes.
Internal Fragmentation: unused memory within a unit of
allocation.

Eliminating External Fragmentation:

Compaction: relocating programs to coalesce holes, moving gaps to be adjacent.
Swapping: preempt processes, roll them out to disk, and reclaim their memory. Allows the
total memory used by all processes to exceed the physical memory.

First Fit: To allocate n bytes, use the first available free block with a block size larger than n.
Requirements:
Free list sorted by address
Allocation requires a search for a good fit
De-allocation requires a check for coalescing with adjacent free slots
Advantages
Simple to implement
Produces larger free blocks toward the end of the address space
Disadvantages
Slow allocation
External fragmentation

Uniprogramming

Only one program is in memory at a time. One process executes at a time
and is always loaded starting at address 0, executing in a contiguous
section of memory. The OS gets a fixed part of memory, meaning that the
max address is (memory size - OS size). The OS is protected using
address checking. The compiler generates physical addresses because it
knows exactly where the program is.

Relocation allows multiple processes to share memory, but it makes it
expensive for processes to grow.

Virtual Memory Goals

Transparency
Processes should coexist in memory, should not know they share
memory, and should not care about what physical portion of memory
they get.
Safety
Processes must not be able to corrupt each other or the OS.
Efficiency
Performance of the CPU and memory should not be degraded due to
sharing.

Best Fit: To allocate n bytes, use the smallest available free block with a block size larger than n.
Requirements:
Free list sorted by size
Allocation requires a search for a good fit
De-allocation requires a check for coalescing with adjacent free slots
Advantages
Relatively simple
Works well with small allocations
Disadvantages
Slow de-allocation (because of coalescing a size-sorted free list)
External fragmentation

Worst Fit: To allocate n bytes, use the largest available free block with a block size larger than n.
Requirements:
Free list sorted by size
Allocation is fast
De-allocation requires a check for coalescing with adjacent free slots
Advantages
Works well if allocations are medium sized
Disadvantages
Slow de-allocation (coalescing)
External fragmentation
Large partitions can't be allocated because it breaks large free blocks up over time
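The three fit policies can be sketched over a toy free list of (address, size) holes (an illustration only, not a real allocator; following the definitions above, a block qualifies when its size is strictly larger than n):

```python
def allocate(free_list, n, policy="first"):
    """Pick a free block of size > n from free_list, a list of
    (address, size) pairs. Returns the chosen block, or None."""
    candidates = [b for b in free_list if b[1] > n]
    if not candidates:
        return None
    if policy == "first":
        return min(candidates, key=lambda b: b[0])  # lowest address
    if policy == "best":
        return min(candidates, key=lambda b: b[1])  # smallest adequate block
    if policy == "worst":
        return max(candidates, key=lambda b: b[1])  # largest block

holes = [(0, 100), (200, 30), (400, 60)]
assert allocate(holes, 25, "first") == (0, 100)
assert allocate(holes, 25, "best") == (200, 30)
assert allocate(holes, 25, "worst") == (0, 100)
```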

Overlays

When a process is too large to fit into memory, it is divided into pieces called overlays.
The overlay manager swaps overlays in and out.
The programmer had to manually divide the program into overlays.
Page Table

A kernel data structure; one table per process (part of the process' state). This keeps
different processes from corrupting each other's page tables and accessing each
other's memory.
Stored in physical memory.
Flags: dirty bit (has it been written?), resident bit (is the page resident in memory?),
clock/reference bit (has the page been referenced?)
Translation Lookaside Buffers (TLB)

Caches recently accessed frame/page pairings in the MMU.
On a TLB miss, the translation is fetched from the page table and the TLB is updated.
On a TLB hit, the translation can be completed in 1 cycle.
Has a high hit ratio because it exploits locality within the system.
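A standard way to quantify the TLB's benefit is the effective access time; this is a common textbook model rather than something from this guide, and the timings below are made up:

```python
def effective_access_time(hit_ratio, tlb_ns, mem_ns):
    """A TLB hit costs one TLB lookup plus one memory access; a miss
    adds a second memory access to fetch the page table entry."""
    hit_cost = tlb_ns + mem_ns
    miss_cost = tlb_ns + 2 * mem_ns
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost

# 98% hit ratio, 1 ns TLB, 100 ns memory: close to one memory access.
print(effective_access_time(0.98, 1, 100))  # 103.0
```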
Virtual Memory: The Big Picture

A process' virtual address space has its code, data, and stack.
Code pages are stored in a user's file on disk.
Data and stack pages are also stored in a file, though it is not typically visible to users;
it exists only while the program is executing.
The OS determines which portions of a process' virtual address space are mapped in
memory at any one time.

Virtual Memory
Process' view of memory

Can be much larger than the size of physical memory; no longer limited by the size
of the machine. Only portions of the virtual address space are in physical memory.

Paging

Divides a process' virtual address space into fixed-size pages. Stores a copy of that
address space on disk.
Views physical memory as a series of equal-size frames.
Moves pages into frames in memory (based on a policy).
Allows shared memory; the memory used by a process no longer needs to be
contiguous.
A shared page may exist in different parts of the virtual address space of each
process, but the virtual addresses map to the same physical address.
No external fragmentation, but internal fragmentation is introduced.

Performance Issues with Virtual Address Translation
SPEED: a virtual memory reference requires two memory references (one to get the
page table entry, one to get the data; the TLB improves this by exploiting locality).
SPACE: overhead in the page table data structure and fragmentation with pages. Page
tables can be very large.

Frames

Partitioned physical memory.
Frame/offset pair: (frame number, frame offset)

Pages

Partitioned virtual address space.
Page/offset pair: (page number, page offset)

Virtual Address Translation

Page Faults

References to non-mapped pages generate page faults.
Page faults are handled by:
1. Processor runs the interrupt handler
2. OS blocks the running process
3. OS starts a read of the unmapped page
4. OS resumes or initiates another process
5. Read of the page completes (interrupt received)
6. OS maps the missing page into memory
7. OS restarts the faulting process


(Uppercase letters are counts; the matching lowercase letters are bit counts, x = log2(X).)

Number of virtual addresses: N = 2^n
Number of virtual address bits: n = log(N) = c + a
Page size: A = 2^a
Page offset bits: a = log(A)
Number of pages: C = 2^c = 2^(n-a)
Page number bits: c = log(C) = n - a
Number of physical addresses (bytes of physical memory): M = 2^m
Number of physical address bits: m = log(M) = d + a
Frame size: A = 2^a
Frame offset bits: a = log(A)
Number of frames: D = 2^d = 2^(m-a)
Frame number bits: d = log(D) = m - a
Number of entries in the first level: E = 2^e
Number of bits representing the first level: e = log(E)
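These relationships are simple arithmetic; a small Python sketch (the 32-bit/30-bit address widths and 4 KB page size are example values, not from this guide):

```python
import math

def paging_params(vaddr_bits, paddr_bits, page_size):
    """Derive offset, page-number, and frame-number quantities
    from n, m, and the page size (A = 2^a)."""
    a = int(math.log2(page_size))              # page/frame offset bits
    return {
        "offset_bits": a,
        "page_number_bits": vaddr_bits - a,    # c = n - a
        "num_pages": 2 ** (vaddr_bits - a),    # 2^(n-a)
        "frame_number_bits": paddr_bits - a,   # d = m - a
        "num_frames": 2 ** (paddr_bits - a),   # 2^(m-a)
    }

# 32-bit virtual addresses, 30-bit physical addresses, 4 KB pages:
p = paging_params(32, 30, 4096)
assert p["offset_bits"] == 12
assert p["num_pages"] == 2 ** 20
assert p["num_frames"] == 2 ** 18
```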

1. Program gives vaddr to the CPU to translate.
2. MMU splits the vaddr into the page number and the offset.
3. Offset bits don't change between pages and frames, so they are sent along without
translation.
4. The page number is translated to a frame number:
1. Page number is sent simultaneously to the TLB and the page table.
1. If there is a hit in the TLB (the pair was recently used), it stops looking in
the page table and sends the frame number along. If there is a miss in the
TLB then the page table is used instead and the TLB's translation is
updated.
2. Page number is used to index into the page table. The PTBR (page table
base register) is added to the page number to get the index. If the page
exists in physical memory then the frame number is taken from the page
table. If it doesn't exist in physical memory, but it exists on disk then the
page is moved from disk into a frame and its frame number is passed
along.
5. Append the offset to the frame number to get the full address.
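Steps 2 through 5 above can be sketched in Python; the 4 KB page size and the dictionary-based TLB and page table are simplifying assumptions for illustration:

```python
PAGE_OFFSET_BITS = 12                 # 4 KB pages, assumed for illustration
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1

def translate_vaddr(vaddr, page_table, tlb):
    """Split the vaddr, consult the TLB first, fall back to the page
    table on a miss (refilling the TLB), then append the offset."""
    page, offset = vaddr >> PAGE_OFFSET_BITS, vaddr & OFFSET_MASK
    if page in tlb:                   # TLB hit: one-step translation
        frame = tlb[page]
    else:                             # TLB miss: walk the page table
        frame = page_table[page]
        tlb[page] = frame             # update the TLB's translation
    return (frame << PAGE_OFFSET_BITS) | offset

tlb = {}
assert translate_vaddr(0x1ABC, {0x1: 0x7}, tlb) == 0x7ABC
assert tlb == {0x1: 0x7}              # the miss refilled the TLB
```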

Caches
SRAM: the L1, L2, and L3 caches between the CPU and main memory. Cache
misses are served from DRAM.
DRAM: the VM system's cache; caches virtual pages in main memory. Slower
than SRAM, but faster than disk. Misses are expensive because they are
served from disk.
Fully Associative Cache: each cache access examines all blocks of the cache
to see if any of them contains the location being accessed. Not feasible for
realistically large caches.
Set Associative Cache: restricts any given memory location to only a small set
of positions within the cache.
Segmentation Fault
Mapping invalid because the address is unallocated to the process.

Temporal Locality
If a process accesses an item in memory, it will tend to reference the same item again soon.

Spatial Locality
If a process accesses an item in memory, it will tend to reference an adjacent item soon.

Multilevel Paging

Add additional levels of indirection to the page table by subdividing the page number into
k parts, creating a k-depth tree of page tables. Each entry in the first-level page table
points to a second-level page table, each entry in the second-level page table points to a
third-level page table, and so on. The kth-level page table entries actually hold the
bit/frame-number information that is usually in a page table. Saves space because you
only allocate the page-table levels that you need; no allocation is necessary for unused
parts of the tree. There will always only ever be one first-level page table.
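A two-level walk (k = 2) can be sketched as follows; the 10-bit indices and 12-bit offset are assumed example parameters (a 32-bit virtual address), and unallocated second-level tables are represented as None:

```python
def lookup(vaddr, level1, offset_bits=12, index_bits=10):
    """Two-level page table walk: the page number is split into a
    first-level index and a second-level index."""
    offset = vaddr & ((1 << offset_bits) - 1)
    i2 = (vaddr >> offset_bits) & ((1 << index_bits) - 1)
    i1 = vaddr >> (offset_bits + index_bits)
    level2 = level1[i1]
    if level2 is None or level2[i2] is None:
        raise LookupError("page fault: no mapping")
    return (level2[i2] << offset_bits) | offset

# Only one second-level table is allocated; the rest of the tree is free.
level1 = [None] * 1024
level1[0] = [None] * 1024
level1[0][3] = 0x42                   # page 3 -> frame 0x42
assert lookup((3 << 12) | 0x5, level1) == (0x42 << 12) | 0x5
```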

Hashed/Inverted Page Tables

Makes page tables proportional to the size of the physical address space: one entry
for each physical page. Has less overhead because there is no per-process data
structure required. Works well with larger virtual address spaces.
Each entry contains

Residence bit

Occupier: the page number of the page in the frame

Protection bits

Hash page number to find the corresponding frame number.

Hash function uses the PID and the page number. The result of the hash function
is added to the PTBR to get the index address of the right page table entry.

At the address of the page table the PID and the page number are checked
against the ones stored in the page table.

If they match then the right frame has been found and the frame number (the
index into the table) is sent along.

If they don't match, the frame isn't holding the information for that
page and it needs to be swapped in.
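The probe described above can be sketched in Python; hashing collisions are ignored here for simplicity (a real inverted page table chains or re-probes), and the table layout is illustrative:

```python
def ipt_lookup(pid, page, table, ptbr=0):
    """Inverted page table probe: hash (PID, page number), add the
    PTBR, and check the stored (PID, page) pair. On a match, the
    index into the table is the frame number."""
    index = (hash((pid, page)) + ptbr) % len(table)
    entry = table[index]
    if entry == (pid, page):
        return index                  # frame number found
    raise LookupError("page fault: frame not holding this page")

table = [None] * 8
frame = hash((7, 3)) % 8
table[frame] = (7, 3)                 # process 7, page 3 occupies this frame
assert ipt_lookup(7, 3, table) == frame
```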

Memory Mapping
Area: contiguous chunk of allocated virtual memory whose pages are related in some
way (examples: heap, user stack, data segment, code segment), allows the virtual
address space to have gaps, AKA segments
Memory mapping: associating the contents of a virtual memory area with an object on
disk. There are two types of object:

Regular file in the file system: an area can be mapped to a contiguous section of a
regular disk file. The extra space in the file section is padded with zeros.

Anonymous file: contains all zeros, created by the kernel, no data transfer between
disk and memory. AKA demand-zero pages

Fetch Policy: decides when a page is assigned a frame.
When a process starts: the OS could immediately assign
frames, create mappings for all the pages, and load all of
the pages.
Demand Paging: the OS creates the mapping for each
page in response to a page fault when accessing that
page. Never wastes time creating a page mapping that
goes unused, but it has to pay the full cost of a page
fault for each page. Linux uses this for demand-zero
pages because of their low page fault cost.
GOOD WHEN: the process has limited spatial locality
and when the page fault cost is low.
1. A process arrives
2. The OS stores the process' virtual address
space on disk and does not put any of it into
memory.
3. As the process executes, pages are faulted in.
Prepaging: OS anticipates future use of pages and
pre-loads them
1. Process needing k pages arrives
2. If k frames are free, the OS allocates all k pages to
the free frames. If there aren't enough free frames,
the OS must replace some frames.
3. The OS puts the first page in a frame and
updates the page table (put the frame number
in the first entry), and so on.
4. The OS marks all TLB entries as invalid
(flushes the TLB).
5. OS starts a process.
6. As process executes, OS loads TLB entries as
each page is accessed, replacing an existing
entry if the TLB is full.
Clustered Paging: each page fault causes a
cluster of neighboring pages to be fetched,
including the one causing the fault. In Linux, the
cluster of pages are put into RAM in the page
cache, this way subsequent page faults can
quickly find pages in the page cache. Changes
major page faults (reading disk) to minor page
faults (just updating the page table). AKA
readaround
Overlays: programmer indicates when to load and
remove pages from frames.

Page Placement Policy: selecting a page frame for a page.
Allocating too few frames to a process can cause a massive jump in page faults.

Page Replacement Policy: selecting an inactive page to move to disk in order to
free up a page frame.

The Working Set Model
The set of all pages that a process referenced in the last T (window size) seconds;
these pages are kept in memory. Assumes that recently referenced pages are likely
to be referenced again (locality!). After the initial paging overhead, references to
the working set result in hits with no disk traffic. Allows pre-paging.
If T is too big, you waste a lot of memory.
If T is too small, you kick pages out too soon and thrashing will occur.

Thrashing
Memory is overcommitted and pages are tossed out while they are still in use.
Pages are swapped in and out continuously. Can happen when the working set
size exceeds the size of physical memory (either the sum of all working sets or a
single process' working set), or when T is too small.

Load Control
The number of processes that can reside in memory at one time. Reduces
thrashing. Adds a new stage to the process life cycle: suspended. When the total
number of pages needed is greater than the number of frames available,
processes are swapped out to disk. A process in any state can be suspended, and
a suspended process can only go to ready. Modern OSes don't use a load control
system; they use best-effort systems.

Freeing Pages in Advance:
Low Water Mark: a minimum number of free frames; if the number of free frames
drops below the low-water mark, the replacement policy starts freeing pages until
the high-water mark is passed.
Advantages: last-minute freeing in response to a page fault would delay the
faulting process; evicting dirty pages (pages that have been written to) requires
writing out to disk first, and evicting early allows us to write back several pages in
a single operation (maximizing efficiency).

Freeing pages when you need them:
When a page is referenced that is not in memory and memory is full:
1. The OS selects a page to replace (using a page replacement algorithm)
2. It invalidates the old page in the page table
3. It starts loading the new page into physical memory from disk
4. Context switch to another process while I/O is being done
5. Get an interrupt when the page is finished being loaded into memory
6. Update the page table entry
7. Continue the faulting process

Paging Advantages
Eliminates external fragmentation
Allows sharing of memory pages amongst processes
Enables processes to run when they are only partially loaded into main memory

Paging Disadvantages
Translation from virtual to physical addresses is time consuming
Requires hardware support (TLB) to be efficient
Requires a more complex OS to maintain the page table

Belady's Anomaly
Increasing the number of frames available may increase the number of page
faults rather than decreasing them as one would expect. Optimal and LRU are
immune to this.
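Belady's anomaly is easy to reproduce with a small FIFO simulation; the reference string below is the classic textbook example, not one from this guide:

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement."""
    frames, faults = deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.popleft()      # evict the oldest page
            frames.append(page)
    return faults

# More frames, yet *more* faults under FIFO:
refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
assert fifo_faults(refs, 3) == 9
assert fifo_faults(refs, 4) == 10
```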
Page Coloring: assumes that pages that would not conflict without virtual address
translation should not conflict with address translation. It leaves intact any careful
allocation done at the virtual address level. Cache conscious.
Bin Hopping: assumes pages that are mapped into frames around the same time
are likely to be accessed around the same time and should therefore be given
non-conflicting frames.

Global Page Replacement
Considers all pages when evicting. Unix uses this.

Local Page Replacement
Only considers the pages owned by the faulting process. If a process has many
page faults, it will have to give up its own frames.

FIFO
Throw out the oldest page. Easy to implement, but the OS can throw out
frequently accessed pages.

Optimal
Look into the future and throw out the page that will be accessed farthest in
the future. Impossible in real life.

LRU
Approximation of optimal; uses the past to predict the future: throw out the
page that has not been used in the longest time.
Option One: keep a time stamp for each page representing the last access.
The OS must record a time stamp on each memory access and search all
pages to find one to toss. Slow! Lots of things to update often.
Option Two: keep a doubly linked list of pages, where the front of the list is
the most recently used and the end is the least recently used. Move a page
to the front on access. Expensive: the OS must modify lots of pointers on
each memory access.

Clock
Approximation of LRU: sweep the pages in a circle, clearing reference bits,
and replace the first page found with its reference bit clear (see Clock
Second Chance below).
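Option Two's recency list can be modeled with Python's OrderedDict, which plays the role of the doubly linked list (a sketch of the policy, not of how an OS would implement it):

```python
from collections import OrderedDict

def lru_faults(refs, nframes):
    """Count page faults under LRU. The OrderedDict keeps the most
    recently used page at the end, like the front of the linked list."""
    frames, faults = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)          # touched: now most recent
        else:
            faults += 1
            if len(frames) == nframes:
                frames.popitem(last=False)    # evict the LRU page
            frames[page] = True
    return faults

# With 3 frames, the re-reference of page 1 saves it; page 2 is evicted.
assert lru_faults([1, 2, 3, 1, 4], 3) == 4
```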

Heap Memory Management

Explicit Memory Management: the programmer explicitly manages all the memory.
Allocation & deallocation: malloc/new & free/delete
Anything may or may not be a pointer.

User Role
Allocate memory by requesting a number of bytes
Request deallocation of memory when it's no longer used

Runtime System Role
Receives requests for memory
Identifies a good location for the allocation; if there is not enough space, it
requests more memory from the OS
Returns a pointer
Frees the allocation on request

Runtime system must:
Handle arbitrary request sequences (memory may be allocated and freed in any order)
Make immediate responses to requests (no improving performance by buffering or
reordering requests)
Use only the heap to store its own data structures
Align blocks (to be able to hold any type of data structure)
Not modify allocated blocks (can only manipulate or change free blocks; can't move or
modify blocks after they're allocated)
If you run out of heap:
Ask the OS for additional memory (a large chunk)
Insert the new chunk into the free list
Try to allocate again, this time successfully


Challenges for the User
More code to maintain
If you free an object too soon: core dump
If you free an object too late: wasted space and a slow system
If you never free: best case wasted space, worst case you run out of memory
Organizing Free Lists

LIFO order + first fit: the allocator inspects the most recently used blocks
first; freeing a block is constant time.
Address order + first fit: the address of each block in the list is less than the
address of its successor. Better memory utilization than LIFO first fit, but has
a linear-time search to locate the predecessor of the freed block.
Buddy System: a segregated-fits free list where the bins are powers of 2;
requested block sizes are rounded up to the nearest power of 2. Has fast
searching and coalescing, but can cause significant internal fragmentation.
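The buddy system's size rounding, and the internal fragmentation it causes, can be sketched as follows (the 16-byte minimum block size is an assumed parameter):

```python
def buddy_block_size(n, min_block=16):
    """Round a request up to the next power of two, as the buddy
    system's bins require; the slack is internal fragmentation."""
    size = min_block
    while size < n:
        size *= 2
    return size

assert buddy_block_size(48) == 64     # 16 bytes lost to internal fragmentation
assert buddy_block_size(64) == 64     # exact powers of two waste nothing
```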
Clock Second Chance It's cheaper to replace a page that hasn't been
written since it doesn't need to be written back to disk. Modify the clock
algorithm to allow dirty pages to always survive one sweep of the clock hand
to avoid extra expensive disk writes.

Check both the reference bit and the modify bit to determine which page to
replace (reference bit, modify bit).

On a page fault, the OS searches for a page in the lowest non-empty class:
(0,0) not recently used or modified: replace
(0,1) not recently used, but modified: the OS needs to write it out, but it may
not be needed anymore
(1,0) recently used, not modified: may be needed again soon, but doesn't
need to be written
(1,1) recently used and modified


Implementation One: the OS goes around at most three times searching for the
(0,0) class.

If the OS finds (0,0) it replaces that page

If the OS finds (0,1)

It indicates an I/O to write that page

Locks the page in memory until the I/O completes

Clears the modified bit

Continues the search in parallel with the I/O

For pages with the reference bit set, the reference bit is cleared

On the second pass (no (0,0) page found on first pass), pages that were
(0,1) or (1,0) may have changed.
Implementation Two: the OS goes around at most three times searching for the
(0,0) class.

If the OS finds (0,0) it replaces that page

If the OS finds (0,1) clear the dirty bit and move on. Remember (possibly in
a list of pages) that the page is dirty, write only if the page is evicted.

For pages with the reference bit set, the reference bit is cleared

On the second pass (no (0,0) page found on first pass), pages that were
(0,1) or (1,0) may have changed.
Algorithm

When a page is referenced its reference bit is set, if the page was written
to (modified) the dirty bit is set.

During first sweep of hand the used bit is cleared

During second sweep the dirty bit is cleared (OS keeps track of what
pages are really dirty)

Only replace pages that are (0,0), so if a page is dirty it has to wait 2 full
sweeps to be replaced.
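The sweep above can be sketched as a single replacement decision over a circular list of (reference, dirty) pairs; this is a simplified model of Implementation Two, where cleared dirty bits are only remembered, not written immediately:

```python
def clock_second_chance(pages, hand=0):
    """Return the index of the victim page. pages is a circular list
    of [ref, dirty] pairs. The first sweep clears reference bits, a
    later pass clears dirty bits; evict the first (0,0) page found."""
    n = len(pages)
    while True:
        ref, dirty = pages[hand]
        if ref == 0 and dirty == 0:
            return hand               # victim found
        if ref:
            pages[hand][0] = 0        # give the page a second chance
        else:
            pages[hand][1] = 0        # survived a sweep; note write-back
        hand = (hand + 1) % n

# Page 0 was (1,0): its reference bit is cleared on the first sweep,
# and it is the first page found in class (0,0) on the next lap.
assert clock_second_chance([[1, 0], [0, 1], [1, 1]]) == 0
```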

Bump Pointer

Contiguous allocation for all requested blocks; the pointer begins at the start of the
heap. As requested, bytes are allocated and the pointer is bumped past the
allocation. No recycling of memory; you can't bump back memory. Quite fast.
Pointers aren't nulled out afterwards, so they still hold their old addresses; you'll get
SEGFAULTs if you try to access them.
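A bump allocator is only a few lines; this sketch hands out offsets into a pretend heap (the class name and heap size are illustrative):

```python
class BumpAllocator:
    """Bump-pointer allocation: hand out contiguous byte ranges and
    push the pointer forward; there is no free(), so nothing is recycled."""
    def __init__(self, heap_size):
        self.ptr, self.end = 0, heap_size
    def alloc(self, nbytes):
        if self.ptr + nbytes > self.end:
            raise MemoryError("out of heap")
        addr = self.ptr
        self.ptr += nbytes            # bump past the allocation
        return addr

h = BumpAllocator(64)
assert h.alloc(16) == 0
assert h.alloc(8) == 16               # allocations are contiguous
```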

Free List

Divides memory into blocks and maintains a list of free blocks (stored on the
heap). To allocate memory, find a block in the free list using a policy (first fit,
best fit, worst fit, etc.). To deallocate memory, put it back on the free list.

Free Block
Pointer to the next free block
Size of the free block
Free space

Case One: A perfectly sized free block is found

Remove element from list

Link previous element with the next element

Return the current element to the user (skipping the header)

Case Two: The free block is bigger than requested

Divide the free block into two blocks

Keep the first block in the free list

Allocate the second block to the user

Deallocation with a free list

User passes a pointer to the memory block

The start of the entry is identified

Find the location in the free list

Add to the list, coalescing entries if needed to avoid false fragmentation

Binning: Exact Fit Have a bin for each block size, except for the last bin,
which holds all larger free blocks (which can be broken up later!). Faster
allocation, but takes up more space. AKA simple segregated storage.

Binning: Range Have a bin for a range of block sizes (requires fewer bins,
but you have to search for a good-sized block within the bin), except for the
final bin, which holds larger free blocks. AKA segregated fits.

Non-copying Reclamation
Uses free-list allocation and reclamation, only way for explicit memory
management
Mark Sweep: free list + trace + sweep-to-free
1. Get a pointer to free space from the free list (that uses binning)
2. If there is no memory of the right size then a collection is triggered
3. Mark Phase: Transitive closure marking all the reachable objects
4. Sweep Phase: sweep the memory for unreachable objects
populating the free list (put unreachable objects back in the free
list). Can be made incremental by organizing heap in blocks and
sweeping one block at a time on demand.

Advantages: space efficient; incremental object reclamation is possible;
garbage collection is simple and very fast.
Disadvantages: relatively slower allocation time (because of the free list);
poor locality of contemporaneously allocated (close to each other in time)
objects (because the free list is sorted by size).

Copying Reclamation
(Generally) uses bump pointer allocation
En masse reclamation
Mark-Compact: bump allocation + trace + compact
1. Use the bump pointer to allocate
2. Mark Phase: transitive closure marking all the reachable objects
3. Compact Phase: copy all the remaining (reachable) objects to one end
of the heap; this is why we can use the bump pointer

Advantages: fast allocator that exhibits good locality; space efficient.
Disadvantages: the garbage collector has an expensive (slow) multi-pass
collection; worse total performance than mark-sweep.
Semi-Space: bump allocation + trace + evacuate
Divides the heap into a to-space and a from-space.
1. Trace Phase: when the to-space is full, garbage collection is triggered
and the trace algorithm is run. The to-space becomes the free space,
and reachable objects are copied into the other part of the heap.
2. Mark Phase:
1. Copies an object to the to-space (the other half of the heap) when
the collection first encounters it.
2. Installs forwarding pointers (to tell where objects are now, in case
another pointer still points to the old address).
3. Performs the transitive closure, updating pointers as it goes, so the
to-space will hold only reachable objects.
4. Reclaims the from-space en masse (the other half is wiped clean).
5. Starts allocating again into the to-space.

Advantages: fast allocation time (same speed as mark-compact because of
the bump pointer); good locality of contemporaneously allocated objects;
good locality of objects connected by pointers (best locality of the three,
since copying during the trace places related objects near each other);
garbage collection is faster than mark-compact (collection is done in a
single pass).
Disadvantages: wasted space (can only use half the heap at a time).

Automatic Memory Management

The programmer explicitly allocates memory, but the system manages it with
garbage collection. The program and system know all pointers.
Less user code than explicit memory management, meaning fewer sources of
errors.
Protects against some classes of memory errors: with no free() operation you
can't free prematurely, free twice, or forget to free at all.
Not perfect; memory can still leak. Garbage collectors are conservative.
Safe Pointers
So that programs may not access arbitrary addresses in memory, the
compiler can identify and provide to the garbage collector all of the
pointers, therefore once garbage, always garbage. Runtime system
can move objects by updating pointers.
Pause Time
Time the program is paused from doing its work while garbage is being
collected
Garbage
In theory: any object the program will never reference again (dead objects).
This can't be identified by the compiler or runtime system.
Evaluating garbage collection algorithms:
Space efficiency
Efficiency of the allocator
Time to allocate
Locality of contemporaneously allocated objects (allocated around the same time)
Time to identify and collect garbage

Taxonomy of Garbage Collector Design Choices

Incrementality
Bounded tracing time
Conservative assumption: all objects in the rest of the heap are live
Remember pointers from the rest of the heap (add the remembered set to
the roots for tracing)
Composability (creation of hybrid collectors)
Concurrency (allocator and garbage collector operate concurrently)
Parallelism (concurrency among multiple garbage collection threads)
Distribution (distribution across multiple machines; implies the other four
aspects)

Identifying Garbage
Reference counting:
Count the number of references to each object, if the reference
number is 0 the object is garbage
Doesn't work for circularly linked lists with no external pointers (cycles)
Tracing:
Trace reachability from program roots (registers, stack, stack
variables) and mark reachable objects
Objects not traced are unreachable.
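Tracing can be sketched as a transitive closure over a hypothetical object graph; the cycle below is exactly the case where reference counting fails but tracing succeeds:

```python
def reachable(roots, points_to):
    """Tracing: transitive closure from the roots. points_to maps an
    object name to the names it references; anything unvisited is garbage."""
    marked, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(points_to.get(obj, []))
    return marked

# A and B form a cycle with no external pointers: their reference counts
# never reach zero, but tracing correctly leaves them unmarked.
heap = {"root": ["x"], "x": [], "A": ["B"], "B": ["A"]}
assert reachable(["root"], heap) == {"root", "x"}
```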
Heap Organization & Incremental Collection
It takes too long to trace the whole heap at once (long pause times). Also,
why collect long-lived objects repeatedly? Incremental Collection: divide
the heap into increments and collect them one at a time.
Generational Hypothesis: young objects die more quickly than older ones;
older objects are more likely to survive garbage collection.
Generational Heap Organization: divide the heap into a young space and
an old space.

Allocate into the young space. When young space fills up,
garbage collect it and copy into the old space (emptying the
young space). Allocate into young space again.

When old space fills up, collect the young and the old spaces
and move still live objects to the new to space

Device Hardware
The hardware associated with an I/O device consists of four
pieces:
1. Bus: allows device to communicate with the CPU, typically
shared by multiple devices
2. Device Port: typically consists of four registers
1. Status Register: holds whether the device is busy, data is ready,
or an error occurred.
2. Control Register: the command to perform, what we want the
device to do
3. Data-in Register: data being sent from the device to the CPU
4. Data-out Register: data being sent from the CPU to the device
3. Controller: receives commands from the system bus, translates
commands into device actions, and reads data from and writes
data to the system bus
4. Device: the device itself

Transfer unit: does it transfer by character (modems, old keyboards) or
by block (most everything else)?
Access method: sequential (look through all data in order) or random
(most common)
Timing: synchronous or asynchronous
Shared or dedicated (to one machine or process; potential for deadlock)
Operations: input, output, or both


I/O Communication

The OS typically communicates with I/O devices through their controllers.
The OS places information in the device's registers for the controller to
send to the device.
The controller places information from the device in the registers for the
OS.
Improving Performance of I/O

I/O Buffering: devices typically contain a small on-board memory where
they can store data temporarily before transferring to/from memory.
1. A disk buffer stores a block when it is read from the disk
2. It is transferred over the bus by the DMA controller into a buffer in
physical memory
3. The DMA controller interrupts the CPU when the transfer is complete
Advantages: cope with speed mismatches between device and CPU;
cope with devices that have different data transfer sizes; minimize the
time a user process is blocked on a write (write to a kernel buffer
immediately and return control to the user program; the write is done
later)

Caching: improve disk performance by reducing the number of disk
accesses.
Keep recently used disk blocks in main memory after the I/O call that
brought them into memory completes.
Writing to a write-through cache: write to all levels of memory
containing the block, including the disk. Very reliable.
Writing to a write-back cache: write only to the fastest memory
containing the block; write to slower memories and disk sometime
later. Faster.
Standard Interfaces

The OS provides a high-level interface to devices to simplify the
programmer's job.
Standard interfaces are provided for related devices.
Device intricacies are encapsulated in device drivers; new devices are
supported by providing a device driver for the device.
I/O services provided by the OS

Uniform naming of files and devices

Access control (protection; only send input to input devices, etc.)

Operation appropriate to the devices

Device allocation and release

Buffering, caching, and spooling to allow efficient


communication

I/O scheduling

Error handling and failure recovery associated with devices

Device drivers that implement device-specific behaviors

Polling (AKA programmed I/O)
Handles data promptly (which makes it a good choice for modems or keyboards, where data is lost if not moved from the device fast enough), but the OS has to keep stopping and checking the device.
1. OS keeps checking until the status of the I/O device is idle; how often the OS checks is called the polling rate. If not idle, the CPU moves on to another task but keeps the operation on its todo list
2. OS sets the command register and, if it's an output operation, places a value in data-out
3. OS sets the status register to command-ready
4. Controller reacts to command-ready and sets the status register to busy
5. Controller reads the command register and performs the command; if it's an input operation, it places a value in data-in
6. Assuming the operation succeeded, the controller changes the status to idle; otherwise it sets the status to error
7. CPU observes the change to idle and, if it was an input operation, reads the data
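The handshake above can be modeled as a toy loop; the `Controller` object and its fields are invented stand-ins for real device registers, not a real driver:

```python
# Toy model of the polling handshake. The "registers" are fields on a
# fake controller object; everything here is illustrative.

class Controller:
    def __init__(self):
        self.status, self.command = "idle", None
        self.data_in, self.data_out = None, None

    def tick(self):                      # device side of the handshake
        if self.status == "command-ready":
            self.status = "busy"         # step 4: controller goes busy
            if self.command == "read":
                self.data_in = 0x42      # step 5: pretend the device produced a byte
            self.status = "idle"         # step 6: operation succeeded

def polled_read(ctrl):
    while ctrl.status != "idle":         # step 1: busy-wait until device is idle
        pass
    ctrl.command = "read"                # step 2: set the command register
    ctrl.status = "command-ready"        # step 3: signal the controller
    ctrl.tick()                          # (in hardware the device runs concurrently)
    while ctrl.status not in ("idle", "error"):
        pass                             # step 7: wait for completion
    return ctrl.data_in                  # read the data on success
```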

Interrupts
Rather than have CPU continually check the device, the device can
interrupt the CPU when it completes an I/O operation. On an I/O
interrupt:
1. Determine which device caused the interrupt
2. If the last command was an input operation, retrieve the data from
the device register
3. Start the next operation for that device

Direct Memory Access (DMA)
The device uses a more sophisticated controller that can write directly to memory. The DMA controller doesn't have data-in/out registers but has an address register. The DMA controller has a buffer and registers (which keep track of the memory address and count for referencing data in memory) and is connected to the drive.
1. CPU tells the DMA controller the location of the source and the destination of the transfer
2. The DMA controller operates the bus and interrupts the CPU when the entire transfer is complete, instead of when each word is ready (no need to keep checking, because the whole block of memory is transferred at once)
3. The DMA controller and the CPU compete for the memory bus, slowing down the CPU somewhat (still better performance than if the CPU did the transfer itself)

Disks
AKA secondary storage

Disks vs Memory:
- Disks last forever(ish); memory is volatile
- Disks are large (order of 100s-1000s of GB); memory is small (order of GB)
- Disks are cheap; memory is expensive

How Magnetic Disks Work:
Data is stored magnetically on thin metallic film bonded to a rotating disk of glass, ceramic, or aluminum. The disk is always spinning.
Disk Parts
- Sector: the minimum unit of transfer on the disk; tracks are split into these. A contiguous collection of bytes on the disk
- Block: comprised of consecutive sectors
- Track: concentric ring on the disk with the bits laid out serially
- Surface: a concentric collection of tracks in circular shape
- Platter: thin disk that holds magnetic material; each platter has two surfaces
- Cylinder: matching tracks on each surface; the set of tracks that line up along the same track index, creating a hollow cylinder
- Spindle: runs through the center of the disk; creates the axis around which the platters spin
- Read/Write Head: at the end of the arms; performs the reads and writes. Reads by sensing a magnetic field, writes by creating one. Floats on an air cushion provided by the spinning disk
- Arm: extends the head over the surface; each surface has its own head and arm
- Comb: made up of the arms and the read/write heads; moves all arms at once, in unison

Seek Time
Time to position the head over the track/cylinder; depends on how fast the hardware can move the arm.
- Max Seek Time: time to go from the innermost to the outermost track
- Average Seek Time: average across seeks between each possible pair of tracks
- Head Switch Time: time to move from a track on one surface to the same track on a different surface

Rotation Time
Time for the sector to rotate underneath the head; depends on how fast the disk can spin. The head starts reading into the buffer as soon as it settles.

Transfer Time
Time to move the bytes from disk to memory. Transfer time is faster for the outermost tracks.
- Surface Transfer Time: time to transfer one or more sequential sectors to/from the surface after the head reads/writes the first sector. Much smaller than seek time or rotation time
- Host Transfer Time: time to transfer data between host memory and the disk buffer

Read Time
Seek time + rotation time + transfer time
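The read-time formula can be checked with quick arithmetic. The drive parameters below (5 ms seek, 7200 RPM, 100 MB/s) are made-up example values, and the average rotational delay is taken as half a revolution:

```python
# Rough disk access-time arithmetic for: read time = seek + rotation + transfer.
# Parameter values are illustrative assumptions, not from the notes.

def read_time_ms(seek_ms, rpm, bytes_to_read, bandwidth_mb_s):
    rotation_ms = (60_000 / rpm) / 2             # average wait: half a revolution
    transfer_ms = bytes_to_read / (bandwidth_mb_s * 1e6) * 1000
    return seek_ms + rotation_ms + transfer_ms

# e.g. 5 ms seek, 7200 RPM, one 4 KB sector, 100 MB/s bandwidth
t = read_time_ms(5.0, 7200, 4096, 100)   # roughly 9.2 ms
```

Note how the seek and rotation terms dominate: the transfer of 4 KB itself costs only ~0.04 ms, which is why minimizing head movement matters so much.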

Reducing Disk Overhead
To get the fastest disk response time we have to minimize seek time and rotational latency:
- Make disks smaller
- Spin disks faster
- Schedule disk operations to minimize head movement
- Place commonly used files on the outside tracks (you can fit more data on the outside track and it spins by faster)
- Lay out data on disk so that related data are on nearby tracks (file systems)
- Pick a good block size: too small means a low transfer rate because more seeks are needed for the same amount of data; too big means internal fragmentation

Disk Head Scheduling
The OS maximizes disk I/O throughput by minimizing head movement. Requests are generated by running programs and then handled by the disk driver, where the queue forms. After I/O the info goes back to the CPU.

FCFS/FIFO
Handle requests in the order they arrive.

SSTF
Shortest Seek Time First: serve the request closest to the current head position. Possible starvation (bad for response time); assumes the queue is static (not realistic, new requests will arrive).

SCAN/elevator/LOOK
Move the head in one direction until the end of the disk is reached, then reverse. LOOK reverses the head when no requests exist between the head and the edge. A request might have to wait 2 full traversals of the disk to be handled (unfair).

C-SCAN/C-LOOK
Circular scan: move the head in one direction until an edge of the disk is reached, then reset to the opposite edge. C-LOOK resets the head to the opposite edge when there are no requests between the head and the approaching edge. Response time is closer to 1 traversal; the jump is expensive, but can be optimized.
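As a sketch of how the policies differ, this compares total head movement for FCFS and SSTF on one request queue (the track numbers and start position are arbitrary example values):

```python
# Total head movement (in tracks) under FCFS vs SSTF for one request queue.

def fcfs_movement(start, requests):
    total, pos = 0, start
    for r in requests:                 # serve strictly in arrival order
        total += abs(r - pos)
        pos = r
    return total

def sstf_movement(start, requests):
    total, pos, pending = 0, start, list(requests)
    while pending:                     # always serve the closest pending track
        nearest = min(pending, key=lambda r: abs(r - pos))
        total += abs(nearest - pos)
        pos = nearest
        pending.remove(nearest)
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
# from track 53: FCFS moves 640 tracks, SSTF only 236
```

SSTF wins here, but a steady stream of nearby requests could starve the far-away ones, which is the fairness problem the notes mention.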

Improving Disk Performance
Partitioning
Disks are partitioned to minimize the largest possible seek time; each partition is a collection of cylinders, logically separate. Lowers seek time.
Interleaving
Allocate blocks so that they are temporally contiguous relative to the speed with which a second disk request can be received and the rotational speed of the disk, instead of physically contiguous allocation. Interleaves files so that they are contiguous in time rather than space.
Buffering
Read blocks from the disk ahead of the user's request and place them in a buffer on the disk controller (read those that are spinning by the head anyway); reduces the number of seeks and exploits locality (likely to use blocks around the current one).

Flash Storage

Spinning Disk vs Flash
Flash has no moving parts, uses less power, is more resistant to physical damage, and has better random access performance.

                                   DISKS      FLASH
Capacity/Cost                      Excellent  Good
Sequential bandwidth/Cost          Good       Good
Random reads/writes per sec/Cost   Poor       Good
Power consumption                  Fair       Good
Physical size                      Good       Excellent

NAND Flash Units

The File System
Presents applications with persistent (lasting through power cycles), named data.

Operations (on NAND flash)
- Erase block: before a page can be written, its bits need to be set to 1; erases must be done in block units; takes several ms, not cheap
- Read page (tens of microseconds)
- Write page (tens of microseconds)

Durability
Flash memory stops reliably storing bits:
- After too many erasures
- After a few years without power
- After a nearby cell is read many times (read disturb)

To improve durability:
- Error correcting codes (means extra bytes in every page)
- Management of defective pages/erasure blocks: firmware marks them as bad
- Wear leveling: spread updates to wear the disk out in a uniform way (remapping helps with this)
- Spares: for both wear leveling and remapping bad pages and blocks

Remapping Flash Drives: keep one block empty (a free erasure block, i.e. an already erased block) so a write can go to the spare block instead of erasing in place.

Time to read/erase/write a single page:
- Before remapping: (number of pages in a block) * (read time + write time) + erase time
- After remapping: (erase time / number of pages in a block) + read time + write time
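Plugging example numbers into the two formulas shows why remapping helps; the timings used here (64 pages per block, 25 us read, 200 us write, 2000 us erase) are assumptions for illustration:

```python
# Cost of updating one page, per the two formulas above. Units: microseconds.
# All parameter values are made-up example numbers.

def no_remap_us(pages_per_block, read_us, write_us, erase_us):
    # update in place: copy every page of the block out and back, plus the erase
    return pages_per_block * (read_us + write_us) + erase_us

def remap_us(pages_per_block, read_us, write_us, erase_us):
    # write into a pre-erased spare block; erase cost amortized over its pages
    return erase_us / pages_per_block + read_us + write_us

slow = no_remap_us(64, 25, 200, 2000)   # 64*225 + 2000 = 16400 us
fast = remap_us(64, 25, 200, 2000)      # 2000/64 + 225 = 256.25 us
```

With these numbers, remapping makes a page update about 64x cheaper, because the block erase is both moved off the critical path and amortized.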

Block vs Sector
The OS may choose to use a larger block size than the sector size; if we only have large files we want large blocks. Most systems allow transferring many sectors between interrupts.

File: a named collection of related information recorded on secondary storage; a sequence of blocks. Has two parts:
- Data: what a user or application puts in the file; an array of untyped bytes
- Metadata: file attributes added and managed by the OS (name, type, location, size, creator, security info)
Directory: provides names for files; a list of human-readable names; a mapping of each name to a specific underlying file or directory (hard link); creates the namespace of files
- Hard Link: a mapping from a name to a file or a directory
- Soft Link: a mapping from a name to another name
Disk: an array of blocks, where a block is a fixed-size data array
File System Functionality:
- Translate from file name and offset to data block
- Manage disk layout: pick the blocks that constitute a file; balance locality with expandability (the file might grow); manage free space
- Provide the file naming organization, such as a hierarchical name space

File System Concepts:
Metadata:
- Superblock: holds important file system metadata (file system type, number of blocks in the file system)
- File Header: describes where the file is on disk and the attributes of the file
Data:
- Files: contain data and have metadata like length, creation time, etc.
- Directories: map file names to file headers

File System Implementation
- File Header: owner ID, size, last modified time, and location of all data blocks
- Data blocks: directory data blocks (human-readable names) and file data blocks (data)

Typical File System Profile
- Most files are small (so we need strong support for small files, and the block size can't be too large, to avoid internal fragmentation)
- Most disk space is consumed by large files (so we must allow large files, and accessing large files should be reasonably efficient)
- I/O operations access/target both small and large files (so the per-file cost must be low, but large files must also have good performance)

File Operations

Create()
Creates a new file with some metadata and a name for the file in a directory.
1. OS allocates disk space (checks disk quotas and permissions, etc.)
2. OS creates metadata for the file (name, location on disk, etc.)
3. OS adds an entry to the directory that contains the file

Link()
Creates a hard link: a new name for some underlying file.
1. OS adds an entry to the directory with the new name
2. OS increments the counter in the file header that tracks the number of directory entries pointing to that file
3. The new name points to the same underlying file

Unlink()
Removes a name for a file from its directory. If it is the last link, the file and its resources are deleted.
1. OS finds the directory containing the file
2. OS clears the headers
3. OS frees the disk blocks used by the file and its headers
4. OS removes the entry from the directory

Open()
Creates a per-file data structure referred to by a file descriptor. Returns the file descriptor to the caller.
1. OS checks if the file is already open by another process
2. If the file is not already open, the OS finds the file
3. OS copies the file descriptor into the system-wide open file table
4. OS checks the protection of the file against the requested mode; if not okay, abort the open
5. OS increments the open count
6. OS creates an entry in the process' file table pointing to the entry in the system-wide file table
7. OS initializes the current file pointer to the start of the file
8. OS returns the index into the process' file table (this index is used for other file operations)

Close()
Closes the file.
1. OS removes the entry for the file in the process' file table
2. OS decrements the open count in the system-wide file table
3. If the open count == 0, the OS removes the entry from the system-wide file table
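The two-level bookkeeping behind open() and close() can be sketched as below; the table layout and all names are invented for illustration, not any real kernel's structures:

```python
# Minimal sketch of open-file bookkeeping: a system-wide table with open
# counts, and a per-process table whose indices act as file descriptors.

system_table = {}                 # file name -> {"count": n}

def os_open(proc_table, name):
    entry = system_table.setdefault(name, {"count": 0})
    entry["count"] += 1           # step 5: increment the open count
    proc_table.append({"sys": entry, "pos": 0})   # steps 6-7: per-process entry
    return len(proc_table) - 1    # step 8: the index is the file descriptor

def os_close(proc_table, fd):
    entry = proc_table[fd]["sys"]
    proc_table[fd] = None         # step 1: drop the per-process entry
    entry["count"] -= 1           # step 2: decrement the system-wide count
    if entry["count"] == 0:       # step 3: last close removes the system entry
        for name, e in list(system_table.items()):
            if e is entry:
                del system_table[name]
```

Two opens of the same file share one system-wide entry but get distinct descriptors and file positions, which is why the open count is needed.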

Read(fileID, from, size, bufAddress)
Random access: OS reads <size> bytes from <from> into <bufAddress>.

Read(fileID, size, bufAddress)
Sequential access: OS reads <size> bytes from the current file position into <bufAddress> and increments the current file position by <size>.

Write()
Similar to read(), but copies the buffer to the file.

Seek()
Updates the file pointer.

Fsync()
Syncs the file system; does not return until all data is written to persistent storage.

Finding and Organizing Files on the Disk

Contiguous Allocation
OS maintains an ordered list of free disk blocks and allocates a contiguous chunk of free blocks when it creates a file. All file data is stored contiguously on disk; the file header only has to contain the start location and the size. Used in CDs. First-fit, best-fit, worst-fit placement/allocation policies.
Advantages
- Simple
- Best performance for sequential access
Disadvantages
- Poor random access performance
- External fragmentation
- File expansion is tricky

Linked Allocation
File stored as a linked list of blocks. The file header has a pointer to the first and last sector/block allocated to that file (the pointer to the last block makes it easier to grow the file). Each sector has a pointer to the next sector.
Advantages
- No external fragmentation (minimum allocation unit is a block, so no wasted space)
- Files can expand more easily than with contiguous allocation
Disadvantages
- Worse sequential access than contiguous, but okay
- Worse random access than contiguous (we must read every block to find the next block)

Direct Allocation
File header points to each data block.
Advantages
- Easy to create, grow, and shrink files
- Little fragmentation
- Supports random access
- Good for small files
Disadvantages
- Index node (inode) is big or variable-sized

Indexed Allocation
Create a non-data index block for each file; it contains a list of pointers to the file's blocks (the number of pointers depends on the size of the pointer and the size of the block). The file header has a pointer to the index block (the file header has no direct knowledge of where file info is stored on the disk; it no longer points to data blocks). The OS allocates an array to hold the pointers to all the blocks when it creates the file, but allocates the blocks only on demand.
Advantages
- Supports both random and sequential access
- Not much fragmentation
Disadvantages
- Maximum file size
- Lots of seeks since data isn't contiguous
Variants: linked index blocks (a linked list of index blocks), the File Allocation Table (FAT), and multilevel index blocks.

The File Allocation Table (FAT) is an array of 32-bit entries; each element in the array represents a data block in the system. The file number is the index of the file's first data block; each entry holds the number of the file's next data block. If a data block x is free, then FAT[x] == 0. Find free blocks by scanning the FAT.
Disadvantages
- Poor random access (requires sequential traversal)
- Limited access control (any user can read/write any file; no file owner data)
- No support for hard links (metadata is stored in the directory entry)
- Volume and file size are limited (no more than 2^28 blocks, files no bigger than 4 GB)
- No support for transactional updates
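The sequential-traversal disadvantage is visible in a tiny FAT walk; the table below is a made-up example, with 0 marking a free block and -1 standing in for the end-of-file marker:

```python
# FAT chain walk: the table entry for block x holds the number of the file's
# next block. Here 0 = free and -1 = end-of-file (an illustrative convention).

def fat_blocks(fat, first_block):
    blocks, b = [], first_block
    while b != -1:            # follow the chain until the end-of-file marker
        blocks.append(b)
        b = fat[b]
    return blocks

# a file starting at block 2, laid out as 2 -> 5 -> 3; block 4 is free
fat = {2: 5, 5: 3, 3: -1, 4: 0}
```

Reaching the file's n-th block requires following n entries one by one, which is why FAT random access is poor.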

Multilevel Index Blocks
The root is the inode; it contains the file's metadata (owner, permissions, setuid (file is always executed with the owner's permissions)) and a set of pointers: the first 10 point directly to data blocks, and the last 3 point to indirect blocks (single, double, and triple indirect). With n pointers per block, the total structure holds 10 + n + n^2 + n^3 blocks.
- Simple to implement because of the fixed structure
- Allows file growth/appends
- Easy to access small files
- Efficient in sequential reads
- Tree structure makes it efficient to find blocks
- Asymmetric (efficiently supports both big and small files)
- Has a file size limitation
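The file size limitation follows directly from the 10 + n + n^2 + n^3 formula; the block and pointer sizes below (4 KB blocks, 4-byte pointers) are assumed example values:

```python
# Max file size for the inode structure above: 10 direct pointers plus
# single, double, and triple indirect blocks -> 10 + n + n^2 + n^3 blocks,
# where n = pointers per block = block size / pointer size.

def max_file_bytes(block_size, ptr_size):
    n = block_size // ptr_size
    blocks = 10 + n + n**2 + n**3
    return blocks * block_size

# e.g. 4 KB blocks and 4-byte pointers give n = 1024, roughly 4 TB max
size = max_file_bytes(4096, 4)
```

The n^3 term dominates, so almost all of the capacity comes from the single triple-indirect pointer, while the 10 direct pointers keep small files cheap (the "asymmetric" property).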

Directories
A directory is a file that contains a collection of mappings from file name to file number (inumber). Those mappings are directory entries. Only the OS can modify directories (this ensures the integrity of the mapping); application programs can read directories. Directories create a name space for the files (the same names can be used in different directories).
- Simple and Stupid Directories: one name space for the entire disk. Use a special area of the disk to hold the directory; the directory contains <name, index> pairs. If one user uses a name, no one else can
- Simple User-Based Directories: each user has a separate directory, but all of each user's files must still have unique names; names can be reused by different users
- Multi-Level Directories: tree-structured hierarchical name space; what modern OSes use. Directories are stored on disk just like files, except the file header for directories has a special flag bit. User programs read directories just like any other file, but only special system calls can write directories. Each directory contains <name, inumber> pairs in no particular order; the inumber is the index into the array of inodes on disk. There is one special root directory (stored in a special location on disk)

File System Layout on Disk
- MBR: master boot record
- Partition Table: contains the addresses of the first and last blocks of each partition
- Super Block: contains metadata of the file system
- Free Space Management: free space info
- Root Directory: holds root directory info

How do you find the blocks of a file?
1. Get the inumber from the directory that contains the file
2. With the inumber, find the inode (the inumber is the index into the inode array)
3. The inode points to the file's blocks; find the correct block you are looking for

Fast File System (FFS)
- Smart index structure: a multilevel index allows locating all blocks of a file (efficient for both large and small files)
- Smart locality heuristics: block group placement
Optimizing Directories
- Maintain the notion of a current working directory (CWD)
- The OS can cache the data blocks of the CWD

Finding Free Space
- Need a list of free data blocks, represented as a bitmap: one bit for each block on the disk; if the bit is 1 the block is allocated, if 0 it is free
- Need a list of free inodes

FFS Block Group Placement
- Divide the partition into block groups (sets of nearby tracks)
- Distribute the free space bitmap and inode array among block groups (previously they were in a single contiguous region, which meant lots of seeks when reading metadata and then data)
- Place files in block groups: when a new file is created, FFS looks for inodes in the same block group as the file's directory; when a new directory is created, FFS places it in a different block group from the parent's directory (distributes the load)
- Place data blocks, filling in all the free space you need in first-free order (trade short-term for long-term locality)
- Optimizes placement for when a file's data and metadata, and other files within the same directory, are accessed together
Reserved space
- Give up about 10% of storage to allow the flexibility needed to achieve locality (when the disk is close to full, it is hard to optimize locality)
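The free-space bitmap described above can be sketched with a few lines of bit twiddling; the bit-within-byte ordering (least significant bit first) is an arbitrary choice for illustration:

```python
# Free-space bitmap: one bit per disk block, 1 = allocated, 0 = free.
# Stored as a bytearray; bit ordering within a byte is LSB-first (a choice).

def find_free_block(bitmap):
    for byte_idx, byte in enumerate(bitmap):
        if byte != 0xFF:                       # some bit in this byte is 0
            for bit in range(8):
                if not (byte >> bit) & 1:
                    return byte_idx * 8 + bit  # block number of first free bit
    return None                                # disk full

def allocate(bitmap, block_no):
    bitmap[block_no // 8] |= 1 << (block_no % 8)
```

One byte summarizes eight blocks, so a whole disk's allocation state fits in a small, cacheable structure, which is what makes distributing it per block group cheap.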

NTFS: Flexible Tree with Extents
- Extent: a contiguous allocation of file data; track ranges of contiguous blocks rather than single blocks (all you need is the start and the length). There is no fragmentation within an extent (if the data can't be contiguous in one extent, the file uses multiple extents). Allocated in units of blocks
- Flexible Tree: a file is represented by a variable-depth tree
- Master File Table (MFT): an array of 1 KB records holding the trees' roots; contains the file records
- A basic file may have two extents (when the data couldn't be put contiguously in one extent)
- A small file can have its data resident in the MFT record itself (no extents needed; a small optimization)
- If a file's metadata takes up too much room in its record, an attribute list holds pointers to the other records holding the rest of the data

NTFS Metadata Files
NTFS stores most metadata in ordinary files with well-known numbers: 0 (the master file table itself), 5 (root directory), 6 (free space bitmap), 8 (list of bad blocks), 9 (access control list for every file; handles user permissions for files)

NTFS Locality Heuristics
Uses best fit: finds the smallest region large enough to fit the file. Caches allocation status for a small area of the disk (writes that occur together get clustered together).

Path: string that identifies a file or directory
- Absolute: starts with /, the root directory
- Relative: with respect to the current working directory
Mount: allows multiple file systems to form a single logical hierarchy; a mapping from some path in the existing file system to the root directory of the mounted file system.

FS and disks: importance to users
- Persistence (data preserved between jobs, power cycles, or crashes)
- Speed (get to data quickly)
- Size (can store lots of data)
- Sharing/protection (users can share or keep private data where appropriate)
- Ease of use (user can easily find, modify, or examine data)

Hardware provides:
- Disk provides nonvolatile memory (persistence)
- Speed gained through random access
- Disks are getting bigger and bigger (size)

OS provides:
- Redundancy allows recovery from some additional failures (persistence)
- Allows the owner to control privileges (sharing/protection)
- Ease of use for the user

RAID (Redundant Array of Inexpensive Disks)

RAID-0
Disk striping: disk blocks are broken down and stored on different disks so they can be accessed concurrently. Gives higher disk bandwidth but poor reliability (failure of a single disk causes data loss).
RAID-1
Mirrored disks: write the same thing to both disks; on failure, use the surviving disk. Expensive (must write each change twice).
RAID-3
Byte-striped with parity (bytes written to the same spot on each disk); parity allows us to detect and correct errors in one of the disks; the parity disk is written every time a change is made.
RAID-4
Block-striped with parity (blocks written to the same spot on each disk); the parity disk is written every time a change is made.
RAID-5
Block-interleaved distributed parity: there is no single parity disk; parity and data are distributed across all disks.
RAID-10
Stripes using RAID-0 (disk striping) across reliable logical disks (reliable because of RAID-1, mirrored disks).
RAID-50
Stripes using RAID-0 (disk striping) across groups of disks with block-interleaved distributed parity (RAID-5).
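The parity used by RAID-3/4/5 is plain XOR, which is what makes single-disk recovery possible; the two-byte "blocks" below are arbitrary example data:

```python
# XOR parity as used by RAID-4/5: the parity block is the XOR of the data
# blocks, so any single lost block can be rebuilt from the survivors.

def parity(blocks):
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b            # XOR byte-by-byte across all blocks
    return bytes(out)

data = [b"\x0f\x0f", b"\xf0\xf0", b"\xaa\x55"]
p = parity(data)
# "lose" data[1]: XOR of the surviving data blocks and parity rebuilds it
rebuilt = parity([data[0], data[2], p])
```

Rebuilding works because XORing a value in twice cancels it out, so the missing block is exactly the XOR of everything that survived.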

Caching FS Data Structures
The OS caches FS data structures (bitmap, directories, file headers, indirect blocks, data blocks) to get good performance.
- Write Through: write changes immediately back to disk. Consistent, but slow (we have to wait for the write to hit the disk and generate an interrupt)
- Write Back: delay writing modified data (for example, until the page is replaced in memory). Better performance, but can cause inconsistencies (data can be lost in a crash)

File System Inconsistency
Metadata structures don't match (for example, bitmaps and inode structures); we don't care about user data!
File system operations update multiple data structures:
- Move a file between directories: delete the file from the old directory; add the file to the new directory
- Create a new file: allocate space on disk for the header and data; write a new header to disk; add the new file to a directory; write the data to disk

Example: appending a data block to a file
Three structures must be updated: add the new data block (data block struct), update the inode struct, update the data block bitmap.
What if only a single write succeeds?
- Just the data block is written to disk: the data is written, but there is no way to get to it (the block still looks free in the data block bitmap); as if the write never occurred
- Just the updated inode is written to disk: the pointer value in the inode points to garbage; the data block bitmap says the block is free but the inode says it's used
- Just the updated bitmap is written to disk: the data block bitmap says the block is used but no inode points to it; no idea which file should contain the data block
What if two writes succeed?
- Inode and data block bitmap succeed: the file system is consistent, but reading the new block returns garbage
- Inode and data block struct succeed: file system inconsistency, must be fixed (update the bitmap)
- Data block bitmap and data block struct succeed: file system inconsistency; no idea which file should contain the data block

Transactions
Group actions together so that they are:
- Atomic (all happen or none happen)
- Serializable (transactions appear to happen one after the other)
- Durable (once it happens it sticks; goes all the way out to disk)
Advantages: reliability; asynchronous write-behind. Disadvantage: all data is written twice.
To get durability we must:
- Commit: indicate when a transaction is finished
- Roll back: recover from an aborted transaction
Keep a write-ahead log on disk of all changes in the transaction; the log records everything the OS tries to do. Once the OS writes all the changes to the log, the transaction is committed. The changes are then written behind to the disk, logging all the writes. If a crash comes after a commit, the log is replayed. Most file systems use write-ahead logging, AKA journaling file systems; it is the most reliable way to recover.

UNIX's approach to FS consistency
- Unix uses synchronous write-through (block until the change is out to disk; expensive) to keep metadata consistent
- If a crash occurs, run fsck to scan the entire disk from the root for consistency:
  - Prior to the update of the inode bitmap: writes disappear (don't know if they're trash or not)
  - Data block referenced in an inode but not in the data bitmap: update the data bitmap from the inode info
  - File created but not in any directory: delete the file
  - Check for in-progress operations and fix problems
- Unix uses asynchronous write-back for user data (less consistent than synchronous write-through); it doesn't guarantee blocks are written to disk in any particular order
  - Write-back is forced after fixed time intervals; data within a time interval can be lost
  - The user can issue a sync command to force the OS to send all outstanding writes to disk
- Issues: need to get the reasoning exactly right; synchronous writes lead to poor performance; recovery (fsck) is slow; if you want atomicity for many file operations, use transactions

Journaling
- Write all metadata changes to a transaction log before sending any changes to disk
  - File changes: update directory, allocate blocks
  - Transactions: create directory, delete file
- Eliminates the need for fsck after a crash
- In the event of a crash, read the log:
  - If there is no log (all updates made it to disk), do nothing
  - If the log is not complete (no commit; partial transaction), do nothing
  - If the log is completely written (committed), apply any changes that are left to disk

Goal | Physical Characteristics | Design Implication
High performance | Large cost to initiate I/O | Organize storage to access data in large sequential units; use caching
Named data | Large capacity; survives crashes; shared across programs | Support files and directories with meaningful names
Controlled sharing | Device may store data from many users | Include metadata for access control
Reliability | Crash can occur during updates; storage devices can fail; flash memory wears out | Use transactions; use redundancy to detect and correct failures; migrate data to even the wear
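The commit-then-replay rule of journaling can be sketched with a tiny in-memory log; the record format and names are invented, and real journals of course keep the log itself on disk:

```python
# Tiny write-ahead log sketch: record every change, then a commit record,
# then apply. After a "crash", replay only if a commit record is present.

log = []

def txn_write(changes):              # changes: {block_no: data}
    for blk, data in changes.items():
        log.append(("write", blk, data))
    log.append(("commit",))          # durable once the commit record is logged

def recover(disk):
    if ("commit",) not in log:       # partial transaction: do nothing
        return
    for rec in log:
        if rec[0] == "write":        # replay committed writes to disk
            disk[rec[1]] = rec[2]

disk = {}
txn_write({7: b"new"})
recover(disk)                        # after a crash, the write survives
```

Either the commit record made it to the log (replay finishes the transaction) or it didn't (the partial transaction is ignored), so the disk never ends up in the half-updated states listed above.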
