
CS 300 Midterm II

CMT 300, April 20th, 2013

Name and Student #:


By writing my name, I swear by the honor code.

Read all of the following information before starting the exam:

Aids: This is an open-book test. You are welcome to use the textbook, my slides, and handwritten notes. No sharing of notes/slides/textbook between students. NO INTERNET USAGE. NO LAPTOPS. NO SMARTPHONES.

Show all work, clearly and in order, if you want to get full credit. I reserve the right to take off points if I cannot see how you logically got to the answer (even if your final answer is correct). Circle or otherwise indicate your final answers. Please keep your written answers brief; be clear and to the point. I will take points off for rambling and for incorrect or irrelevant statements.

This test has 4 problems and is worth 65 points. There is a 5th problem worth 10 bonus points. Good luck!

Errata: Typo in 3f). The question meant to ask for shortest seek time first; disk scheduling does not have a clock algorithm. +1.5 points for everyone.

1. (15 points) Synchronization.

Consider a set of queues as shown in the figure above, and the following code that moves an item from one queue (denoted source) to another queue (denoted destination). Each queue can be both a source and a destination.

    void AtomicMove(Queue *source, Queue *destination) {
        Item *thing; /* thing being transferred */
        if (source == destination) {
            return; // same queue; nothing to move
        }
        source->lock.Acquire();
        destination->lock.Acquire();
        thing = source->Dequeue();
        if (thing != NULL) {
            destination->Enqueue(thing);
        }
        destination->lock.Release();
        source->lock.Release();
    }

Assume there are multiple threads that call AtomicMove() concurrently.

a. (5 pts) Give an example involving no more than three queues illustrating a scenario in which AtomicMove() does not work correctly.

If one thread transfers from A to B, another transfers from B to C, and another from C to A, then you can get deadlock if all three acquire the lock on their first queue before any of them acquires the second.

b. (5 pts) Modify AtomicMove() to work correctly.

One solution is to impose a total order on how locks are acquired/released. The following code uses the source/destination object addresses to impose such an order, i.e., whichever of the source/destination objects has the lower address acquires its lock first (the modified code is the two if/else blocks):

    void AtomicMove(Queue *source, Queue *destination) {
        Item *thing; /* thing being transferred */
        if (source == destination) {
            return; // same queue; nothing to move
        }
        if (source < destination) {
            source->lock.Acquire();
            destination->lock.Acquire();
        } else { // destination < source
            destination->lock.Acquire();
            source->lock.Acquire();
        }
        thing = source->Dequeue();
        if (thing != NULL) {
            destination->Enqueue(thing);
        }
        if (source < destination) {
            source->lock.Release();
            destination->lock.Release();
        } else { // destination < source
            destination->lock.Release();
            source->lock.Release();
        }
    }

c. (5 pts) Assume now that a queue can be either a source or a destination, but not both. Does AtomicMove() work correctly in this case? Use no more than two sentences to explain why or why not. If not, give a simple example illustrating a scenario in which AtomicMove() does not work correctly.

The code presented at point (a) works correctly in this case, as it cannot lead to deadlock. This is because AtomicMove() always acquires the lock of the source first and the lock of the destination second, and no queue can play both roles.

(Next, we give a proof; this proof wasn't required for receiving full credit.) The fact that AtomicMove() always acquires the source lock first guarantees that you cannot end up with a cycle. Indeed, assume this is not the case, i.e., thread T1 holds the lock of queue1 and requests the lock of queue2, T2 holds the lock of queue2 and requests the lock of queue3, ..., Tn holds the lock of queuen and waits for the lock of queue1. Since T1 holds the lock of queue1 but not queue2, it follows that queue1 is a source queue, while queue2 is a destination queue. Furthermore, since Tn holds the lock of queuen and waits for queue1, it follows that queue1 is a destination queue. But queue1 cannot be a source and a destination at the same time, which invalidates the hypothesis that the code can lead to deadlock.
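The same address-ordering idiom can be written with plain POSIX mutexes. This is our own sketch, not part of the exam solution; the Queue type is reduced to just its lock:

    #include <pthread.h>

    typedef struct Queue {
        pthread_mutex_t lock;
        /* ... items ... */
    } Queue;

    // Acquire both locks in a globally consistent (address) order,
    // so that a cycle of waiting threads can never form.
    static void lock_pair(Queue *a, Queue *b) {
        if (a < b) {
            pthread_mutex_lock(&a->lock);
            pthread_mutex_lock(&b->lock);
        } else {
            pthread_mutex_lock(&b->lock);
            pthread_mutex_lock(&a->lock);
        }
    }

    static void unlock_pair(Queue *a, Queue *b) {
        pthread_mutex_unlock(&a->lock);
        pthread_mutex_unlock(&b->lock);
    }

A move operation would then call lock_pair(source, destination), do the dequeue/enqueue, and finish with unlock_pair(source, destination).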

2. (20 points) Virtual Memory: Suppose that we have a paging system and a 64-bit virtual address split as follows:

    6 Bits [Segment ID] | 11 Bits [Table ID] | 11 Bits [Table ID] | 11 Bits [Table ID] | 11 Bits [Page ID] | 14 Bits [Offset]

a. (2 pts) How big is a page in this system? Explain in one sentence.

Since the offset is 14 bits, a page is 2^14 = 16384 bytes, or 16 KB.
Grading: 1 pt for the answer, 1 pt for some explanation (which may have just been showing the math); -1/2 pt for 16 Kb instead of 16 KB.

b. (2 pts) How many segments does this system support? Explain in one sentence.

Since there is a 6-bit segment ID, there are 2^6 = 64 possible segments.
Grading: 1 pt for the answer, 1 pt for some explanation (which may have just been showing the math).

c. (2 pts) Assume that the page tables are divided into page-sized chunks (so that they can be paged to disk). How much space have we allowed for a PTE in this system? Explain in one sentence.

Since the leaves of the page table contain 11 bits to point at pages (the field marked Page ID), a 2^14-byte page must contain 2^11 PTEs, which means that a PTE is simply 2^14 / 2^11 = 8 bytes in size.
Grading: 1 pt for the answer, 1 pt for some explanation (which may have just been showing the math); -1/2 pt for correct math but wrong units (bits instead of bytes).

d. (2 pts) Show the format of a page table entry, complete with bits required to support the clock algorithm and copy-on-write optimizations.

We need as many bits of physical page number as of virtual (non-offset) page number, so the physical page number is 50 bits. For the clock algorithm, we need Valid (V), Use (U), and Dirty (D) bits. For copy-on-write, we need a Read-only (R) bit. Note that V and D are needed for pretty much any paging algorithm.
Grading: 1 pt for the PPN and its size, 1 pt for the other 4 bits; -1/2 pt for a PPN that is too small (4, 8, 11, etc. appeared frequently); -1/2 pt for one or two of the 4 bits being wrong.

e. (2 pts) Assume that a particular user is given a maximum-sized segment full of data. How much space is taken up by the page tables for this segment? Explain. Note: you should leave this number as sums and products of powers of 2!

The full page table will be a tree-like structure. The top-level table (1 of them) points at 2^11 second-level tables, each of which points at 2^11 third-level tables, each of which points at 2^11 leaf tables, each of which finally points at 2^11 data pages. All of these tables are a full page (2^14 bytes) in size. So:

    Answer (just page tables) = 2^14 x (1 + 2^11 + 2^11 x 2^11 + 2^11 x 2^11 x 2^11) = 2^14 x (1 + 2^11 + 2^22 + 2^33) bytes

Grading: 1/2 pt for using page size = 16 KB, 1/2 pt for counting the lowest level of tables (2^33), 1/2 pt for counting the rest of the tables (1 + 2^11 + 2^22), 1/2 pt slop for other random things.

(A small program reproducing this arithmetic follows.)
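As a sanity check, the arithmetic in parts (a), (c), and (e) can be reproduced with a short C program (our own illustration, not part of the exam solution):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint64_t page_size     = 1ULL << 14;  // 14-bit offset -> 16 KB pages
        uint64_t ptes_per_page = 1ULL << 11;  // 11-bit index at each level
        uint64_t pte_size      = page_size / ptes_per_page;  // 8 bytes

        // Page-table pages for one maximum-sized segment:
        // 1 top-level + 2^11 + 2^22 + 2^33 lower-level tables, each one page.
        uint64_t table_pages = 1 + (1ULL << 11) + (1ULL << 22) + (1ULL << 33);

        printf("page size       = %llu bytes\n", (unsigned long long)page_size);
        printf("PTE size        = %llu bytes\n", (unsigned long long)pte_size);
        printf("page-table size = %llu bytes\n",
               (unsigned long long)(table_pages * page_size));
        return 0;
    }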

f. (4 pts) Suppose the system has 16 Gigabytes of DRAM and that we use an inverted page table instead of a forward page table. Also, assume a 14-bit process ID. If we use a minimum-sized page table for this system, how many entries are there in this table? Explain. What does a page-table entry look like? (Round to the nearest byte, but support the clock algorithm and copy-on-write.)

A minimum-sized inverted page table requires one entry per physical page. 16 GB = 2^34 bytes, so the number of physical pages is 2^34 / 2^14 = 2^20. Thus, at a minimum, we need enough entries to cover one per page, namely 2^20. A page-table entry needs a TAG (the VPN, for matching against the virtual page) and a PID (to support multiprogramming). Thus, we need at least: a 50-bit VPN, a 14-bit PID, a 20-bit PPN, and the V, U, D, R bits (88 bits, i.e., 11 bytes when rounded to the nearest byte). A complete solution would require some decision on how to implement the actual hash table; for instance, you might need another 20-bit field to link entries together (to connect the various entries in a hash bucket), etc.
Grading: 1 pt for the 14-bit PID, 1 pt for the 50-bit VPN, 1 pt for the 20-bit PPN, 1 pt for the other 4 bits; up to 1 pt lost in total for wrong field sizes. Putting [PID, VPN] and calling that the hash key, or something along those lines, was sufficient for those 2 points. Also, "page tag" was not acceptable unless they put VPN.

g. (2 pts) A common attack point for Internet worms is to exploit a buffer overrun. (Wikipedia: "A buffer overflow occurs when data written to a buffer also corrupts data values in memory addresses adjacent to the destination buffer due to insufficient bounds checking.") This can occur when copying data from one buffer to another without first checking that the data fits within the destination buffer. What can we add to the PTE to prevent this problem, and how should we use it (use no more than two sentences)?

Add an Execute bit to the PTE. Unless this bit is set, the processor will refuse to execute code in the given page. We prevent the buffer-overrun attack by clearing the Execute bit for any PTEs associated with stack pages.
Grading: 1 pt for the execute bit, 1 pt for clearing it on stack pages.

h. (4 pts) What is the FIFO page replacement algorithm? Explain in one or two sentences why this is a bad algorithm for page replacement. Explain what makes the clock algorithm different from a FIFO page replacement algorithm (even though pages are examined for replacement in order) and how it is superior to FIFO.

The FIFO page replacement algorithm treats all pages as entries in a FIFO queue. When bringing in a new page from disk, you replace the page at the head of the FIFO (the oldest page) and put the new page at the tail. FIFO is a bad algorithm because the oldest page might be frequently used, yet FIFO replaces it anyway. The clock algorithm is different because, although it goes through pages one at a time looking for victims (like FIFO), it can skip over a page that was used recently. Thus, it won't replace frequently used pages the way FIFO does. (A sketch of the clock algorithm follows.)
Grading: 1 pt for the description of FIFO, 1 pt for why FIFO is bad, 1 pt for how clock is different, 1 pt for how clock is better.
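To make the FIFO/clock contrast concrete, here is a minimal sketch of the clock algorithm in C (our own illustration, not part of the exam; the frame table and its fields are assumptions):

    #include <stdbool.h>

    #define NUM_FRAMES 1024

    struct frame {
        bool valid;
        bool use;   // set by hardware on every reference to the page
        int  page;  // virtual page currently held in this frame
    };

    static struct frame frames[NUM_FRAMES];
    static int hand = 0;  // the clock hand

    // Pick a victim frame: give recently used frames a second chance by
    // clearing their use bit; evict the first frame whose use bit is clear.
    int clock_evict(void) {
        for (;;) {
            struct frame *f = &frames[hand];
            int current = hand;
            hand = (hand + 1) % NUM_FRAMES;
            if (!f->valid || !f->use) {
                return current;  // FIFO would evict here unconditionally
            }
            f->use = false;      // second chance: skip this page for now
        }
    }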

3. (20 points) Disk Subsystem: Suppose that we build a disk subsystem to handle a high rate of I/O by coupling many disks together. Properties of this system are as follows:

- Uses 10 GB disks that rotate at 10,000 RPM, have a data transfer rate of 10 MB/s (for each disk), have an 8 ms average seek time, and use a 32 KB block size.
- Has a SCSI interface with a 2 ms controller command time.
- Is limited only by the disks (assume that no other factors affect performance).
- Has a total of 20 disks.
- Each disk can handle only one request at a time, but each disk in the system can be handling a different request.
- The data is not striped (all I/O for each request has to go to one disk).

a. (4 pts) What is the average service time to retrieve a single disk block from a random location on a single disk, assuming no queuing time (i.e., the unloaded request time)? Hint: there are four terms in this service time!

Service = Controller + Seek + Rotational + Transfer
Controller = 2 ms
Seek = 8 ms
Rotational = 1/2 x [(60 s/min) / (10,000 RPM)] = 0.003 s = 3 ms
Transfer = (32 x 1024 bytes) / (10 x 10^6 bytes/s) = 0.0032768 s = 3.2768 ms
Service = 2 + 8 + 3 + 3.28 = 16.28 ms
Grading: 1 point for each term.

b. (3 pts) Assume that the OS is not particularly clever about disk scheduling and passes requests to the disk in the same order that it receives them from the application (FIFO). If the application requests are randomly distributed over a single disk, what is the bandwidth (bytes/sec) that can be achieved?

The disk can retrieve 1 block every service period. So: BW = 32768 bytes / 0.01628 s = 2.013 MB/s.

c. (3 pts) Suppose that the application has requests outstanding for all disks (but they are still randomly distributed, handed FIFO to the disks). What is the maximum number of I/Os per second (IOPS) for the whole disk subsystem (an I/O here is a block request)?

For one disk, we can get 1 I/O every service time (from part a). For 20 disks, we multiply that rate by 20: IOPS = 20 / 0.01628 s, or about 1229 IOPS. (A short program checking parts (a)-(c) follows.)
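The arithmetic for parts (a)-(c) can be checked with a few lines of C (our own illustration, not part of the exam solution):

    #include <stdio.h>

    int main(void) {
        double controller = 0.002;                   // 2 ms command time
        double seek       = 0.008;                   // 8 ms average seek
        double rotation   = 0.5 * (60.0 / 10000.0);  // half a revolution at 10,000 RPM
        double transfer   = (32.0 * 1024.0) / 10e6;  // 32 KB at 10 MB/s
        double service    = controller + seek + rotation + transfer;

        printf("service time   = %.5f s\n", service);       // ~0.01628 s
        printf("bandwidth      = %.3f MB/s\n",
               (32.0 * 1024.0) / service / 1e6);            // ~2.013 MB/s
        printf("subsystem IOPS = %.1f\n", 20.0 / service);  // ~1228.7
        return 0;
    }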

d. (3 pts) Assume that the application cannot alter the random nature of its disk requests. Explain how the operating system could use scheduling to increase the IOPS. What does the application have to do in order to allow this type of optimization?

The operating system needs to queue up a bunch of requests so that it can use something like the elevator algorithm (SCAN) to reorder them. In the limit, the OS could queue so many requests that each disk rarely (or never) has to seek or wait for rotation, so Service = Controller + Transfer (or Service = Controller + Transfer + Rotational) and IOPS = 20 / Service. In order to allow this optimization, the application must be able to have multiple requests outstanding (i.e., it needs to give a bunch of requests to the operating system without waiting for any responses).

e. (4 pts) Assume that you have a RAID-5 system with five disks. One of the disks fails. Explain how the operating system can continue to satisfy block reads for that disk. Can the system deal with two failures? Why or why not?

A read of a block on the failed disk can be satisfied by reading the corresponding blocks of the same stripe from the four surviving disks and XORing them together; the parity block provides exactly the redundancy needed to reconstruct one missing block. The system cannot deal with two failures: with only a single parity block per stripe, two missing blocks leave the reconstruction with two unknowns, so the lost data cannot be recovered.

f. (3 pts) Disk requests come in to the driver for cylinders 8, 24, 20, 5, 41, 8, in that order. A seek takes 6 ms per cylinder. Calculate the total seek time for the above sequence of requests assuming the following disk scheduling policies: a) FCFS and b) shortest seek time first (per the errata; the exam as printed said "clock", but there is no clock algorithm for disk scheduling). In all cases, the disk arm is initially at cylinder 20. Explain your answer.

FCFS: the arm moves 12 + 16 + 4 + 15 + 36 + 33 = 116 cylinders, so the total seek time is 116 x 6 ms = 696 ms (see the check below).
Errata: There is no clock algorithm for disk scheduling; this was a typo, so everyone gets at least 1.5 points for this question.
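A quick check of the FCFS total for part (f), as our own illustration:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int requests[] = {8, 24, 20, 5, 41, 8};
        int pos = 20, cylinders = 0;

        // FCFS: service requests in arrival order, summing arm movement.
        for (int i = 0; i < 6; i++) {
            cylinders += abs(requests[i] - pos);
            pos = requests[i];
        }
        printf("%d cylinders -> %d ms\n", cylinders, cylinders * 6);  // 116 -> 696 ms
        return 0;
    }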

4. (10 points) I/O

a. (4 pts) Suppose that the primary access pattern for files is sequential, large-file access. If the primary goal is to access these files at the highest possible rate, explain (1) how files should be constructed from blocks and (2) how blocks of the file should be laid out on the disk.

(1) Since you are only interested in reading the files sequentially, the simplest thing to do is to link each block to the next (putting pointers in index blocks leads to needing to seek back to the index blocks as you read the data). (2) Lay out the blocks sequentially on the disk to get the highest bandwidth.

b. (3 pts) TRUE or FALSE? EXPLAIN. Memory-mapped I/O devices cannot be accessed by user-level threads.

FALSE. The kernel can map the I/O space of the device into the memory space of a user program, thereby allowing user threads to access the device.

c. (3 pts) TRUE or FALSE? EXPLAIN. In a modern operating system using memory protection through virtual memory, the hardware registers of a memory-mapped I/O device can only be accessed by the kernel.

FALSE. As in part (b), the kernel can map the device's registers into the address space of a user process, after which user-level code can access them directly. (A sketch of this mechanism follows.)
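On Linux, one concrete form of this mechanism is mapping a device's register region through /dev/mem. The sketch below is our own illustration, not part of the exam; the physical address and register layout are made up:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        // Hypothetical physical address of a device's register block.
        off_t dev_regs = 0x10000000;

        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        if (fd < 0) { perror("open"); return 1; }

        // The kernel maps the device's I/O space into this process's
        // address space; afterwards, registers are read and written
        // with ordinary loads and stores.
        volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                       MAP_SHARED, fd, dev_regs);
        if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        uint32_t status = regs[0];  // read a (hypothetical) status register
        printf("status = 0x%08x\n", status);

        munmap((void *)regs, 4096);
        close(fd);
        return 0;
    }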

5. (10 points) EXTRA CREDIT: FLASH. Suppose that a new disk technology provided access times that are of the same order of magnitude as memory access times. What, if anything, must be changed in the following three OS components to take advantage of the quicker access time? If something doesn't change, be very specific about why it doesn't change. If it will change, contrast these changes with the current implementation and be as specific as possible in your answers (i.e., identify what would change and why).

Process Scheduler: The scheduler deals with processes at the task level, using arrival and run times, so while run times might change, the scheduler itself does not need to change.

Memory Management: Use write-through instead of write-back, since it is now cheap to go to disk. There is less need to prefetch; instead, we save the cost of bringing in memory locations that go unused and do more on-demand paging, rather than relying on predicting access patterns and missing. Mainly, the fast disk allows memory management to balance hit cost and miss cost instead of focusing only on increasing hit rates.

Device Driver: Use polling instead of interrupts for the new disk's device driver, since it will be in heavy use due to the lower access time. Alternatively: no change to the structure of the device driver from the OS's point of view.

Scrap Page
(please do not remove this page from the test packet)
