

CYCLE-II

ELECTRONICS AND COMMUNICATION ENGINEERING


CASE STUDIES


1. DESIGN OF RTOS KERNEL


INTRODUCTION
A real-time operating system (RTOS) is an operating system (OS) intended for real-time applications. Such operating systems serve application requests in near real time. A real-time operating system offers programmers more control over process priorities: an application's process priority level may exceed that of a system process. Real-time operating systems also keep critical sections of system code to a minimum, so that an application's request can interrupt the kernel almost immediately.

A key characteristic of an RTOS is the level of its consistency concerning the amount of time it takes to accept and complete an application's task; the variability is jitter. A hard real-time operating system has less jitter than a soft real-time operating system. The chief design goal is not high throughput, but rather a guarantee of a soft or hard performance category: an RTOS that can usually or generally meet a deadline is a soft real-time OS, but if it can meet deadlines deterministically it is a hard real-time OS.

An RTOS has an advanced algorithm for scheduling. Scheduler flexibility enables a wider, computer-system orchestration of process priorities, but a real-time OS is more frequently dedicated to a narrow set of applications. Key factors in a real-time OS are minimal interrupt latency and minimal thread-switching latency; a real-time OS is valued more for how quickly or how predictably it can respond than for the amount of work it can perform in a given period of time. Two basic designs exist:

- Event-driven designs switch tasks only when an event of higher priority needs servicing; this is called preemptive priority, or priority scheduling.
- Time-sharing designs switch tasks on a regular clock interrupt, and on events; this is called round robin.

Time-sharing designs switch tasks more often than strictly needed, but give smoother multitasking, giving the illusion that a process or user has sole use of a machine. Early CPU designs needed many cycles to switch tasks, during which the CPU could do nothing else useful. For example, with a 20 MHz 68000 processor (typical of the late 1980s), task-switch times are roughly 20 microseconds; in contrast, a 100 MHz ARM CPU (from 2008) switches in less than 3 microseconds. Because of this, early OSes tried to minimize wasted CPU time by avoiding unnecessary task switching.

In typical designs, a task has three states:
1. Running (executing on the CPU);
2. Ready (ready to be executed);
3. Blocked (waiting for an event, I/O for example).
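These three states and the legal transitions between them can be captured in a small C sketch (the type and function names are illustrative, not taken from any particular RTOS):

```c
/* The three task states and their legal transitions. */
typedef enum { TASK_RUNNING, TASK_READY, TASK_BLOCKED } task_state_t;

typedef struct {
    int          id;
    task_state_t state;
} task_t;

/* A blocked task becomes ready when the event it waits on occurs. */
void task_on_event(task_t *t) {
    if (t->state == TASK_BLOCKED)
        t->state = TASK_READY;
}

/* The scheduler dispatches a ready task onto the CPU. */
void task_dispatch(task_t *t) {
    if (t->state == TASK_READY)
        t->state = TASK_RUNNING;
}

/* A running task blocks when it must wait for an event or I/O. */
void task_block(task_t *t) {
    if (t->state == TASK_RUNNING)
        t->state = TASK_BLOCKED;
}
```

A real kernel would drive these transitions from the scheduler and interrupt handlers; here they are plain functions so the state machine is easy to follow.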



Most tasks are blocked or ready most of the time because generally only one task can run at a time per CPU. The number of items in the ready queue can vary greatly, depending on the number of tasks the system needs to perform and the type of scheduler that the system uses. On simpler non-preemptive but still multitasking systems, a task has to give up its time on the CPU to other tasks, which can cause the ready queue to have a greater number of overall tasks in the ready-to-be-executed state.

Usually the data structure of the ready list in the scheduler is designed to minimize the worst-case length of time spent in the scheduler's critical section, during which preemption is inhibited and, in some cases, all interrupts are disabled. But the choice of data structure depends also on the maximum number of tasks that can be on the ready list. If there are never more than a few tasks on the ready list, then a doubly linked list of ready tasks is likely optimal. If the ready list usually contains only a few tasks but occasionally contains more, then the list should be sorted by priority. That way, finding the highest-priority task to run does not require iterating through the entire list. Inserting a task then requires walking the ready list until reaching either the end of the list, or a task of lower priority than that of the task being inserted.

Care must be taken not to inhibit preemption during this search: longer critical sections should be divided into small pieces. If an interrupt occurs that makes a high-priority task ready during the insertion of a low-priority task, that high-priority task can be inserted and run immediately, before the low-priority task is inserted. The critical response time, sometimes called the flyback time, is the time it takes to queue a new ready task and restore the state of the highest-priority task to running.
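The priority-sorted insertion just described might look like this in C (a singly linked list with the highest priority at the head; all names are illustrative):

```c
#include <stddef.h>

typedef struct ready_node {
    int priority;              /* larger value = higher priority */
    struct ready_node *next;
} ready_node_t;

/* Insert a task so the list stays sorted by descending priority.
 * Walking stops at the end of the list or at the first task of
 * lower priority, exactly as described in the text. */
void ready_insert(ready_node_t **head, ready_node_t *task) {
    while (*head != NULL && (*head)->priority >= task->priority)
        head = &(*head)->next;
    task->next = *head;
    *head = task;
}
```

With this layout, the highest-priority ready task is always found at the head in O(1); the cost of keeping the list ordered is paid at insertion time, which is why the text warns against inhibiting preemption during the walk.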
In a well-designed RTOS, readying a new task will take 3 to 20 instructions per ready-queue entry, and restoration of the highest-priority ready task will take 5 to 30 instructions. In more advanced systems, real-time tasks share computing resources with many non-real-time tasks, and the ready list can be arbitrarily long. In such systems, a scheduler ready list implemented as a linked list would be inadequate.

Algorithms

Some commonly used RTOS scheduling algorithms are:
- Cooperative scheduling
  - Round-robin scheduling
- Preemptive scheduling
  - Fixed-priority pre-emptive scheduling, an implementation of preemptive time slicing
  - Fixed-priority scheduling with deferred preemption
  - Fixed-priority non-preemptive scheduling
  - Critical-section preemptive scheduling
  - Static-time scheduling
- Earliest-deadline-first (EDF) scheduling
- Advanced scheduling using stochastic and MTG methods
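Of the algorithms listed, earliest-deadline-first is simple to sketch: the scheduler always picks the ready task whose absolute deadline is nearest (illustrative names; an array stands in for the ready list):

```c
#include <stddef.h>

typedef struct {
    int           id;
    unsigned long deadline;   /* absolute deadline, in ticks */
} edf_task_t;

/* Return the index of the ready task with the earliest deadline,
 * or -1 if the ready set is empty. */
int edf_pick(const edf_task_t *ready, int n) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (best < 0 || ready[i].deadline < ready[best].deadline)
            best = i;
    return best;
}
```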


STRUCTURE OF RTOS

REAL-TIME SYSTEM COMPONENTS

To build a predictable system, all of its components, hardware and software, plus good design must contribute to that predictability. Having both good hardware and a good RTOS is a necessary but not sufficient condition for building a correct real-time system: a wrongly designed system with excellent hardware and software building blocks may still lead to disaster. This document deals with the RTOS building block. In general, a good RTOS can be defined as one that has bounded (predictable) behaviour under all system-load scenarios.

REAL-TIME SYSTEM TYPES

An embedded system does not necessarily need predictable behaviour, and in that case it is not a real-time system. However, a quick overview of possible embedded systems shows that you will rapidly find the need for some predictable behaviour, and therefore most embedded systems need to be real-time for at least some of their functionality. In a well-designed real-time system, each individual deadline should be met; with the actual state of the art, it is sometimes hard and also costly to achieve this requirement. Therefore people have invented different types of real-time systems.


HARD REAL-TIME: Missing an individual deadline results in catastrophic failure of the system (and people will hopefully invest sufficient money in the project to avoid this catastrophic failure). It also means that the cost of a failure is very high.

FIRM REAL-TIME: Missing a deadline entails an unacceptable quality reduction. Technically there is no difference from hard real-time, but economically the disaster risk and its associated cost are limited compared to the previous case.

SOFT REAL-TIME: Deadlines may be missed and can be recovered from. The reduction in system quality and performance is acceptable and introduces no cost beyond a technical or feature cost.

NON-REAL-TIME: No deadlines have to be met. These systems are defined in terms of average performance.

ECONOMICAL REAL-TIME: We recently introduced this notion because it is in practice impossible to design a system that will never miss any deadline. The real question is therefore what price or economic damage one is willing to pay for a missed deadline that introduces a safety hazard or a quality-of-service reduction.

RTOS KERNEL

RTOSes are quite large and complex pieces of software that provide a wide variety of services which together form an abstraction layer, allowing embedded-application programmers to do many useful things simply by making requests to the RTOS. Services include network communication, file-system management, distributed-system management, redundancy management and dynamic loading of application software. The most basic services reside in a part of the RTOS called the kernel. Application-software developers rely on RTOS kernels for basic services such as scheduling and inter-task communication.

RTOS KERNEL: Tasks
1. A task is the basic unit of execution in an RTOS.
2. The RTOS scheduler needs to be deterministic, typically O(1) or O(n).
3. Scheduling policies available in an RTOS are:
   a. Clock-driven
   b. Priority-driven (RMS and EDF)

TASK CONTROL BLOCK
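The task control block (TCB) is the kernel's per-task bookkeeping record: everything the kernel must save and restore to suspend and resume a task. A representative, hypothetical layout in C (field names are illustrative):

```c
#include <stdint.h>

#define TASK_NAME_LEN 16

typedef enum { TASK_RUNNING, TASK_READY, TASK_BLOCKED } task_state_t;

/* A minimal task control block. On a context switch the kernel saves
 * the outgoing task's registers on its stack, records the stack
 * pointer here, and restores the incoming task's stack pointer. */
typedef struct tcb {
    uint32_t     *stack_ptr;           /* saved stack pointer at last switch */
    task_state_t  state;               /* running / ready / blocked */
    uint8_t       priority;            /* scheduling priority */
    char          name[TASK_NAME_LEN]; /* for debugging */
    struct tcb   *next;                /* link in ready or wait queue */
} tcb_t;
```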


RTOS KERNEL: Memory

Memory is at a premium in the environments where an RTOS works. An RTOS may support virtual-memory (MMU) and memory-protection (MPU) models, and distinguishes user-space from kernel-space memory.


User-space programs participate with the kernel for memory services, and the kernel can act as a central pool of memory for specialized applications. Typical kernel memory primitives (the XXX_ prefix is a placeholder for a vendor-specific namespace):

XXX_Kmalloc, XXX_Kmap, XXX_Mmap, XXX_PassToUserSpace, XXX_PurgeMemory/Kfree

RTOS KERNEL: Timers

A timer is a software entity derived from the hardware clock. Timers provide a mechanism to introduce task delays, to help synchronize tasks and, of course, to provide the time. Examples include watchdog timers and programmable timers.


Based on these programmable hardware timers, the RTOS kernel can create software timer structures associated with tasks, used for scheduling, synchronization and time-stamping. Typical primitives (again with a placeholder XXX_ namespace):

XXX_SetTimer, XXX_AddtoTimerQueue, XXX_isExpired, XXX_RunAtExpiry, XXX_PurgeTimerQueue
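Such a software timer queue, keyed by absolute expiry tick, might be sketched as follows (these names are as illustrative as the XXX_ ones above):

```c
#include <stddef.h>

typedef void (*timer_fn)(void *arg);

typedef struct sw_timer {
    unsigned long    expiry;      /* absolute tick at which the timer fires */
    timer_fn         callback;    /* run at expiry (cf. XXX_RunAtExpiry)   */
    void            *arg;
    struct sw_timer *next;        /* queue kept sorted by expiry */
} sw_timer_t;

/* Insert keeping the queue sorted by expiry (cf. XXX_AddtoTimerQueue). */
void timer_add(sw_timer_t **q, sw_timer_t *t) {
    while (*q && (*q)->expiry <= t->expiry)
        q = &(*q)->next;
    t->next = *q;
    *q = t;
}

/* Called from the hardware tick: fire every timer that has expired. */
void timer_tick(sw_timer_t **q, unsigned long now) {
    while (*q && (*q)->expiry <= now) {
        sw_timer_t *t = *q;
        *q = t->next;
        t->callback(t->arg);
    }
}

/* Example callback: count expirations. */
void timer_count(void *arg) { (*(int *)arg)++; }
```

Because the queue is sorted, the tick handler only ever inspects the head, keeping the per-tick cost bounded.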

RTOS KERNEL: I/O

I/O is slow compared to the CPU. I/O can be interrupt-driven, polled, or DMA-based, and devices may be mapped into memory space or into a separate I/O space. Typical primitives:

XXX_IORead/IOWrite, XXX_IOMap/Unmap, XXX_BindInterrupt
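Memory-mapped register access is conventionally done through volatile pointers. A minimal sketch: in a real system the base would be a fixed hardware address such as (volatile uint32_t *)0x40000000 (an illustrative address, not any particular chip); here it is a parameter so the code can run against an ordinary array.

```c
#include <stdint.h>

/* Read a 32-bit device register at word offset 'reg' from 'base'.
 * 'volatile' stops the compiler caching or reordering the access. */
static inline uint32_t io_read(volatile uint32_t *base, unsigned reg) {
    return base[reg];
}

/* Write a 32-bit device register. */
static inline void io_write(volatile uint32_t *base, unsigned reg, uint32_t val) {
    base[reg] = val;
}
```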


RTOS KERNEL: Inter-process Communication

Most of the time tasks cannot run in isolation; they need to talk to each other. Synchronization, protection and sharing are the goals of IPC.

Common IPC mechanisms include:
- Semaphores (binary, mutual-exclusion)
- Message queues
- Pipes / named pipes
- Shared memory
- Signals/slots
- Mail slots
- Sockets/XTI

At bottom, each is a common shared data structure residing in kernel or user space, plus a mechanism to access it.

RTOS KERNEL: Device Drivers

A device driver is a piece of software that enables devices to be connected to a particular processor via various interfaces; it controls, manages and configures the devices connected to the system.
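Returning to the IPC mechanisms above, a message queue can be sketched as a bounded ring buffer. A real RTOS would block a task on a full or empty queue; this sketch simply reports failure, and all names are illustrative:

```c
#include <stddef.h>

#define MSGQ_CAP 8

typedef struct {
    int    buf[MSGQ_CAP];
    size_t head, tail, count;
} msgq_t;

/* Post a message; returns 0 on success, -1 if the queue is full. */
int msgq_post(msgq_t *q, int msg) {
    if (q->count == MSGQ_CAP)
        return -1;                 /* full: a real kernel would block */
    q->buf[q->tail] = msg;
    q->tail = (q->tail + 1) % MSGQ_CAP;
    q->count++;
    return 0;
}

/* Receive the oldest message; returns 0 on success, -1 if empty. */
int msgq_pend(msgq_t *q, int *msg) {
    if (q->count == 0)
        return -1;                 /* empty: a real kernel would block */
    *msg = q->buf[q->head];
    q->head = (q->head + 1) % MSGQ_CAP;
    q->count--;
    return 0;
}
```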

A host-controller driver enables the system to accept a particular type of device; client drivers are device-specific. A protocol layer converts device requests into a form that is understood by the corresponding host controllers through their drivers.

Expectations from an RTOS:
- Deadline-driven operation
- Working with a dearth of resources
- Intricate I/O interfaces (touch panels, push buttons, ...)
- Fail-safe, robust behaviour
- Availability

Example of an RTOS:

VxWORKS
VxWorks was developed from an older operating system, VRTX. VRTX did not function properly as an operating system, so Wind River acquired the rights to resell VRTX and developed it into its own workable operating system, VxWorks (some say the name means "VRTX now works"). Wind River then developed a new kernel for VxWorks and replaced the VRTX kernel with it. This enabled VxWorks to become one of the leading operating systems created for real-time applications.

Memory Management in VxWorks

The VxWorks memory-management system does not use swapping or paging, because the system allocates memory within the physical address space without needing to swap data in and out of that space due to memory constraints. VxWorks assumes that there is enough physical memory available to operate its kernel and the applications that will run on the operating system; therefore VxWorks does not directly support a virtual-memory system. The amount of memory available to a VxWorks system depends on the platform's hardware and the constraints imposed by the memory-management unit. This amount is usually determined dynamically by the platform depending on how much memory is available, but on some architectures it is a hard-coded value. The value is returned by the sysMemTop() routine, which sets the amount of memory available to the operating system for the session. There is an extra virtual-memory support component, the VxVMI option, an architecture-independent interface to the MMU; it is packaged separately as an add-on.

Routines that provide an interface to system-level functions (most of the kernel) are loaded into system memory from ROM and linked together into one module that cannot be relocated. This section of the operating system is loaded into the bottom part of memory, starting at address 0. The rest of the available memory is known as the System Memory Pool and is essentially the heap available to the application developer, together with the dynamic memory routines for manipulating the heap.
(Figure: VxWorks memory map: the kernel starting at address 0, with the System Memory Pool above it divided into dynamic partitions, up to the top of memory, labelled 0xFFF.)
In most cases the VxWorks memory model is flat, and the operating system has just the limits of the available physical memory to work with. When memory needs to be allocated, the dynamic memory-management routines use the ANSI-standard C function malloc(), a first-fit algorithm: it searches for the first available block of the size the program requires, splits off the amount of memory required for the task, and leaves the rest. When the memory needs to be freed, the ANSI C function free() is used.

Using the first-fit method can fragment all of the memory very quickly, and obviously, if lots of different-sized tasks are loaded into memory between lots of free spaces, this can leave no space available at all when the next allocation request arrives. Searching through many fragments of memory also takes longer when trying to find space for an allocation. To reduce fragmentation it is advised that the dynamic partition sizes be allocated at system initialisation, dependent on the size of the tasks and task data. Another way to reduce fragmentation across system memory is to partition the memory into two or more sections at start-up and have the offending tasks use the other partitions.

Processes in VxWorks

A feature of a real-time operating system like VxWorks is that the programmer developing applications for it can control the process scheduling. This means that the programmer is in control of how and when the processes run, rather than leaving the operating system to work it out. In terms of other operating systems, a VxWorks task is similar to a thread, with the exception that VxWorks has one process in which all the threads run. In an RTOS, these tasks must endeavor to appear to run concurrently when, obviously, with only a single CPU this cannot happen, so clever scheduling algorithms must be enforced.

Pre-emptive Task Scheduling

VxWorks schedules processes using priority pre-emptive scheduling. This means that a priority is associated with each task, ranging from 0 to 255 (0 being the highest priority and 255 the lowest). The CPU is allocated to the task with the highest priority, and this task will run until it completes or waits, or is pre-empted by a task with a higher priority. There are several factors the programmer should take into consideration when assigning priorities to processes:
1. Priority is a method of ensuring continuity of tasks in the system and preventing a specific task from hogging the CPU while other important tasks are waiting. Priority levels are therefore not a method of synchronisation (making tasks run in a specific order).
2. Tasks that have to meet strict deadlines should be assigned a high priority, and the number of high-priority tasks should be kept to a minimum. If there are lots of high-priority tasks, they will all compete for CPU time and the lower-priority tasks may be left out.
3. Tasks that are CPU-intensive should be assigned a lower priority. If such tasks need to meet deadlines and need a higher priority, task locking can be used, as described below.
4. VxWorks kernel tasks must have a higher priority than application tasks in order to ensure continuity of the core operating-system processes.

Round Robin Scheduling

There is another (optional) method of process scheduling, used when a task (or a set of tasks) of equal priority is competing for CPU time. In round-robin scheduling each task is given a small slice of CPU time to run, called a quantum, after which the next task is allowed to run for its quantum.
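This quantum-driven rotation among equal-priority tasks can be sketched as moving the head of a priority group's list to its tail when the quantum expires (illustrative names):

```c
#include <stddef.h>

typedef struct rr_task {
    int id;
    struct rr_task *next;
} rr_task_t;

/* Quantum expired: move the head task to the tail of its group's
 * list and return the new head, which runs next. */
rr_task_t *rr_rotate(rr_task_t *head) {
    if (head == NULL || head->next == NULL)
        return head;               /* nothing to rotate */
    rr_task_t *old = head, *tail = head;
    head = head->next;
    while (tail->next != NULL)
        tail = tail->next;
    tail->next = old;
    old->next = NULL;
    return head;
}
```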
VxWorks provides this time-slicing mechanism to a number of processes until a process completes, and then passes the next process (or set of processes with the same priority) to the CPU. The only problem with relying heavily on round-robin scheduling is that if higher-priority tasks are consuming all of the CPU time, the lower-priority tasks will never be able to access the CPU.

Inter-process Communication (IPC) with VxWorks

As with processes in all operating systems, tasks in VxWorks often need to communicate with one another; this is known as IPC. A typical synchronisation problem encountered while processes are communicating is a race condition: the result on shared data is determined by timing, and the timing has gone wrong. The OS must avoid these race conditions and must achieve mutual exclusion, ensuring that one process is not accessing shared data (while a process is in its critical region) at the same time as another.



VxWorks uses a data type called the semaphore to achieve mutual exclusion. A semaphore provides an integer variable which counts waiting processes, resources, or anything else that is shared. There are three types of semaphore available to the VxWorks programmer:
1. The binary semaphore. This has only two states, full or empty: a task taking an empty semaphore waits until a give operation occurs, and a full semaphore waits for a take. Binary semaphores are used with interrupt status registers, but are not advanced enough to handle mutual exclusion on their own.
2. The counting semaphore. This variable starts at 0, indicating that nothing is waiting; a positive integer indicates the number of things waiting. It is generally used where there is a fixed number of resources but multiple requests for them.
3. The mutual-exclusion (mutex) semaphore. This is an advanced implementation of the binary semaphore, extended further to ensure that the critical region of a task is not interfered with. A key difference is that the mutex semaphore protects against priority inversion, a problem which can be encountered when using a priority pre-emptive scheduler, as explained below.

Shared memory (e.g. in the form of a global information bus) is also used to pass information between tasks, as are message queues, which are used when both tasks are running.

Priority Inversion

A quick note on priority inversion, which, although rare, can mean success or failure for a real-time operating system. Consider this: a low-priority task holds a shared resource that is required by a high-priority task. Pre-emptive scheduling dictates that the high-priority task should be executed, but it is blocked until the low-priority task releases the resource. This effectively inverts the priorities of the tasks.
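Stepping back to the semaphore types above, the take/give behaviour of a counting semaphore can be sketched as follows. This is a single-threaded simulation with illustrative names; it counts available resources (the more common convention), and a real implementation would need atomicity and a wait queue:

```c
typedef struct {
    int count;    /* number of available resource units */
} csem_t;

/* Take: returns 0 on success, -1 if the caller would have to block. */
int csem_take(csem_t *s) {
    if (s->count > 0) {
        s->count--;
        return 0;
    }
    return -1;    /* a real kernel would put the task on a wait queue */
}

/* Give: release one resource unit. */
void csem_give(csem_t *s) {
    s->count++;
}
```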
If the higher-priority process is starved of a resource, the system could hang, or corrective measures could be invoked, such as a system reset.

VxWorks experienced a real problem with priority inversion when the operating system was used for NASA's Mars Pathfinder lander in 1997. A high-priority information-management task that moved shared data around the system ran frequently. Each time something was moved, it was synchronised by locking a mutex semaphore. A low-priority meteorological data-gathering task would publish data to the information-management bus, again acquiring the mutex while writing. Unfortunately, an interrupt would sometimes cause the medium-priority communications task to be scheduled. If this happened while the higher-priority task was blocked waiting for the lower-priority task to finish, the communications task would pre-empt the lower-priority task, preventing the higher-priority task from running at all. This caused a timeout, which was recognised by a safety algorithm; that in turn caused system resets which appeared random to the supervisors back on Earth.
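The standard cure, and reportedly the fix uploaded to Pathfinder, is priority inheritance: while a task holds a mutex that a higher-priority task wants, it temporarily runs at the waiter's priority, so medium-priority tasks can no longer pre-empt it. A sketch (lower number = higher priority, matching VxWorks; names are illustrative):

```c
/* Priority inheritance: a mutex owner temporarily runs at the
 * highest priority of any task waiting on that mutex. */
typedef struct {
    int base_prio;      /* priority the task was created with */
    int cur_prio;       /* effective priority right now */
} pi_task_t;

/* A higher-priority waiter boosts the current mutex owner. */
void pi_block_on(pi_task_t *owner, const pi_task_t *waiter) {
    if (waiter->cur_prio < owner->cur_prio)
        owner->cur_prio = waiter->cur_prio;
}

/* On releasing the mutex, the owner drops back to its base priority. */
void pi_release(pi_task_t *owner) {
    owner->cur_prio = owner->base_prio;
}
```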
(Figure: priority inversion on the Pathfinder information bus: the low-priority meteorological task holds the bus mutex semaphore, the high-priority bus-management task blocks waiting for it, and the medium-priority task pre-empts the mutex holder.)

Power Management

Operating systems have to deal with power management of the resources within the system they are operating. The operating system controls all of the devices within the system and thus has to decide which ones need to be switched on and operating, and which ones are not currently needed and should be sleeping or switched off. It is entirely possible for the operating system to shut down a device that will be needed again quickly, causing a delay while the device restarts. It is equally possible for a device to be kept on too long, wasting power. The only way around this is to use algorithms that enable the operating system to make appropriate decisions about which device to shut down and what conditions dictate when it should be powered up again. Examples of such management are slowing disks down during periods of inactivity and reducing CPU power during idle periods. This is called dynamic power management. The problem now, however, is that processors have become very power-efficient and are therefore unlikely to be the primary consumers of energy in a system. This means that dynamic power management is becoming outdated, and new methods of power management will need to be introduced into the operating system to tackle those system devices which are still inefficiently consuming energy.

Footnote on Power Management

The information above is general to most operating systems, not specific to VxWorks. Dynamic device power management in an RTOS is essential, however, especially in an embedded system that, for instance, runs on batteries or needs to be economical with regard to heat output. Usually the operating system uses routines that are part of the architecture, for instance Intel's ACPI interfaces. VxWorks can make use of the built-in interfaces on specific PowerPC architectures.

Conclusion

VxWorks is a versatile real-time operating system which offers the developer a great deal of control over the synchronisation and scheduling of tasks. This is definitely one of the operating system's stronger points, and in an RTOS it is essential in order to make sure things happen when they should and that they don't interfere with one another, e.g. in a safety-critical system in aerospace or healthcare. Allocation of memory is one of VxWorks' weaker points, with fragmentation common because of the first-fit algorithm used to allocate memory space. Although there are ways to work around the fragmentation, I believe something to compact the memory at a specific point would be helpful, though this may bring its own problems in finding the correct time to do it; if compaction takes time, it may interfere with the deadlines of critical tasks in the RTOS.


2. STUDY OF REAL TIME OPERATING SYSTEMS


A real-time operating system (RTOS) is an operating system (OS) intended to serve real-time application requests. A key characteristic of an RTOS is the level of its consistency concerning the amount of time it takes to accept and complete an application's task; the variability is jitter. A hard real-time operating system has less jitter than a soft real-time operating system. The chief design goal is not high throughput, but rather a guarantee of a soft or hard performance category. An RTOS that can usually or generally meet a deadline is a soft real-time OS, but if it can meet a deadline deterministically it is a hard real-time OS.

An RTOS has an advanced algorithm for scheduling. Scheduler flexibility enables a wider, computer-system orchestration of process priorities, but a real-time OS is more frequently dedicated to a narrow set of applications. Key factors in a real-time OS are minimal interrupt latency and minimal thread-switching latency; a real-time OS is valued more for how quickly or how predictably it can respond than for the amount of work it can perform in a given period of time.

Design philosophies

The most common designs are:
- Event-driven: switches tasks only when an event of higher priority needs servicing; called preemptive priority, or priority scheduling.

- Time-sharing: switches tasks on a regular clock interrupt, and on events; called round robin.

Time-sharing designs switch tasks more often than strictly needed, but give smoother multitasking, giving the illusion that a process or user has sole use of a machine.

Early CPU designs needed many cycles to switch tasks, during which the CPU could do nothing else useful. For example, with a 20 MHz 68000 processor (typical of the late 1980s), task-switch times are roughly 20 microseconds; in contrast, a 100 MHz ARM CPU (from 2008) switches in less than 3 microseconds. Because of this, early OSes tried to minimize wasted CPU time by avoiding unnecessary task switching.

Scheduling

In typical designs a task has three states:
1. Running (executing on the CPU);
2. Ready (ready to be executed);
3. Blocked (waiting for an event, I/O for example).

Most tasks are blocked or ready most of the time because generally only one task can run at a time per CPU. The number of items in the ready queue can vary greatly, depending on the number of tasks the system needs to perform and the type of scheduler that the system uses. On simpler non-preemptive but still multitasking systems, a task has to give up its time on the CPU to other tasks, which can cause the ready queue to have a greater number of overall tasks in the ready-to-be-executed state.

Usually the data structure of the ready list in the scheduler is designed to minimize the worst-case length of time spent in the scheduler's critical section, during which preemption is inhibited and, in some cases, all interrupts are disabled. But the choice of data structure depends also on the maximum number of tasks that can be on the ready list. If there are never more than a few tasks on the ready list, then a doubly linked list of ready tasks is likely optimal. If the ready list usually contains only a few tasks but occasionally contains more, then the list should be sorted by priority, so that finding the highest-priority task to run does not require iterating through the entire list. Inserting a task then requires walking the ready list until reaching either the end of the list, or a task of lower priority than that of the task being inserted.

Care must be taken not to inhibit preemption during this search: longer critical sections should be divided into small pieces. If an interrupt occurs that makes a high-priority task ready during the insertion of a low-priority task, that high-priority task can be inserted and run immediately, before the low-priority task is inserted. The critical response time, sometimes called the flyback time, is the time it takes to queue a new ready task and restore the state of the highest-priority task to running. In a well-designed RTOS, readying a new task will take 3 to 20 instructions per ready-queue entry, and restoration of the highest-priority ready task will take 5 to 30 instructions. In more advanced systems, real-time tasks share computing resources with many non-real-time tasks, and the ready list can be arbitrarily long.
In such systems, a scheduler ready list implemented as a linked list would be inadequate.

Algorithms

Some commonly used RTOS scheduling algorithms are:
- Cooperative scheduling
  - Round-robin scheduling
- Preemptive scheduling
  - Fixed-priority pre-emptive scheduling, an implementation of preemptive time slicing
  - Fixed-priority scheduling with deferred preemption
  - Fixed-priority non-preemptive scheduling
  - Critical-section preemptive scheduling
  - Static-time scheduling
- Earliest-deadline-first (EDF) scheduling
- Advanced scheduling using stochastic and MTG methods

Intertask communication and resource sharing

Multitasking systems must manage the sharing of data and hardware resources among multiple tasks. It is usually "unsafe" for two tasks to access the same specific data or hardware resource simultaneously; "unsafe" means the results are inconsistent or unpredictable. There are three common approaches to resolve this problem:



Temporarily masking/disabling interrupts

General-purpose operating systems usually do not allow user programs to mask (disable) interrupts, because the user program could then control the CPU for as long as it wished. Modern CPUs don't allow user-mode code to disable interrupts, as such control is considered a key operating-system resource. Many embedded systems and RTOSes, however, allow the application itself to run in kernel mode for greater system-call efficiency, and also to permit the application greater control of the operating environment without requiring OS intervention.

On single-processor systems, if the application runs in kernel mode and can mask interrupts, interrupt disablement is often the best (lowest-overhead) solution to prevent simultaneous access to a shared resource. While interrupts are masked, the current task has exclusive use of the CPU, since no other task or interrupt can take control, so the critical section is protected. When the task exits its critical section, it must unmask interrupts; pending interrupts, if any, will then execute. Temporarily masking interrupts should only be done when the longest path through the critical section is shorter than the desired maximum interrupt latency, or else this method increases the system's maximum interrupt latency. Typically this method of protection is used only when the critical section is just a few instructions long and contains no loops. It is ideal for protecting hardware bit-mapped registers when the bits are controlled by different tasks.

Binary semaphores

When the critical section is longer than a few source-code lines or involves lengthy looping, an embedded/real-time algorithm must resort to mechanisms identical or similar to those available on general-purpose operating systems, such as semaphores and OS-supervised inter-process messaging.
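The interrupt-masking approach described above can be sketched as follows. The functions irq_disable() and irq_restore() stand in for the real port-specific primitives (typically reading and writing a CPU status register); here they just toggle a flag so the sketch can run anywhere:

```c
/* Simulated interrupt-enable state; a real port would read/write a
 * CPU status register instead. */
static int irq_enabled = 1;

/* Disable interrupts, returning the previous state so that
 * critical sections can nest safely. */
static int irq_disable(void) {
    int was = irq_enabled;
    irq_enabled = 0;
    return was;
}

static void irq_restore(int was) {
    irq_enabled = was;
}

static volatile unsigned shared_counter;

void increment_shared(void) {
    int key = irq_disable();   /* enter critical section */
    shared_counter++;          /* short, loop-free body, as the text advises */
    irq_restore(key);          /* pending interrupts run after this */
}
```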
Such mechanisms involve system calls, and usually invoke the OS's dispatcher code on exit, so they typically take hundreds of CPU instructions to execute, while masking interrupts may take as few as one instruction on some processors. But for longer critical sections, there may be no choice; interrupts cannot be masked for long periods without increasing the system's interrupt latency. A binary semaphore is either locked or unlocked. When it is locked, tasks must wait for the semaphore to unlock. A binary semaphore is therefore equivalent to a mutex. Typically a task will set a timeout on its wait for a semaphore. There are several well-known problems with semaphore based designs such as priority inversion and deadlocks. In priority inversion a high priority task waits because a low priority task has a semaphore. A typical solution is to have the task that owns a semaphore run at (inherit) the priority of the highest waiting task. But this simplistic approach fails when there are multiple levels of waiting: task A waits for a binary semaphore locked by task B, which waits for a binary semaphore locked by task C. Handling multiple levels of inheritance without introducing instability in cycles is complex and problematic. In a deadlock, two or more tasks lock semaphores without timeouts and then wait forever for the other task's semaphore, creating a cyclic dependency. The simplest deadlock scenario occurs when two tasks alternately lock two semaphores, but in the opposite order. Deadlock is prevented by ELECTRONICS AND COMMUNICATION ENGINEERING Page22



careful design, or by having floored semaphores, which pass control of a semaphore to the higher-priority task on defined conditions.

Message passing

The other approach to resource sharing is for tasks to send messages in an organized message-passing scheme. In this paradigm, the resource is managed directly by only one task. When another task wants to interrogate or manipulate the resource, it sends a message to the managing task. Although their real-time behavior is less crisp than that of semaphore systems, simple message-based systems avoid most protocol deadlock hazards and are generally better behaved than semaphore systems. However, problems like those of semaphores are possible. Priority inversion can occur when a task is working on a low-priority message and ignores a higher-priority message (or a message originating indirectly from a high-priority task) in its incoming message queue. Protocol deadlocks can occur when two or more tasks wait for each other to send response messages.

Interrupt handlers and the scheduler

Since an interrupt handler blocks the highest-priority task from running, and since real-time operating systems are designed to keep thread latency to a minimum, interrupt handlers are typically kept as short as possible. The interrupt handler defers all interaction with the hardware as long as possible; typically all that is necessary is to acknowledge or disable the interrupt (so that it won't occur again when the interrupt handler returns). The interrupt handler then queues work to be done at a lower priority level, such as unblocking a driver task by releasing a semaphore or sending a message. A scheduler often provides the ability to unblock a task from interrupt handler context.

An OS maintains catalogs of the objects it manages, such as threads, mutexes, and memory. Updates to these catalogs must be strictly controlled.
For this reason it can be problematic when an interrupt handler calls an OS function while the application is also in the act of doing so. The OS function called from an interrupt handler could find the object database to be in an inconsistent state because of the application's update. There are two major approaches to dealing with this problem: the unified architecture and the segmented architecture. RTOSs implementing the unified architecture solve the problem by simply disabling interrupts while the internal catalog is updated. The downside of this is that interrupt latency increases, potentially losing interrupts. The segmented architecture does not make direct OS calls but delegates the OS-related work to a separate handler. This handler runs at a higher priority than any thread but lower than the interrupt handlers. The advantage of this architecture is that it adds very few cycles to interrupt latency. As a result, OSes which implement the segmented architecture are more predictable and can deal with higher interrupt rates than the unified architecture.

Memory allocation

Memory allocation is more critical in an RTOS than in other operating systems. First, speed of allocation is important. A standard memory allocation scheme scans a linked list of indeterminate length to find a suitable free memory block. This is unacceptable in an RTOS, since memory allocation has to occur within a bounded amount of time. The simple fixed-size-blocks algorithm works quite well for simple embedded systems because of its low overhead.



Examples

Here are a few popular RTOSes:

Integrity
Nucleus
RTXC Quadros
ThreadX
VxWorks

Here are some of the open source RTOSes/OSes:

eCOS
uClinux

Integrity

The INTEGRITY RTOS was designed so that embedded developers could ensure their applications met the highest possible requirements for security, reliability, and performance. To achieve this, INTEGRITY uses hardware memory protection to isolate and protect embedded applications. Secure partitions guarantee each task the resources it needs to run correctly and fully protect the operating system and user tasks from errant and malicious code, including denial-of-service attacks, worms, and Trojan horses. Unlike other memory-protected operating systems, INTEGRITY never sacrifices real-time performance for security and protection.

Nucleus

Nucleus OS is a real-time operating system (RTOS) and toolset created by the Embedded Systems Division of Mentor Graphics for various central processing unit (CPU) platforms. Nucleus OS is an embedded software solution and runs in an estimated 2.11 billion devices worldwide. Development is typically done on a host computer running Windows or Linux. Applications are compiled to run on various target CPU architectures and tested using the actual target boards or in a simulation environment. The Nucleus RTOS is designed for embedded systems applications including consumer electronics, set-top boxes, cellular phones, and other portable and handheld devices. For limited-memory systems, Nucleus RTOS can be scaled down to a memory footprint as small as 13 KB for both code and data.


ThreadX

ThreadX, developed and marketed by Express Logic, Inc. of San Diego, California, USA, is a real-time operating system (RTOS). Similar RTOSes are available from other vendors, such as VxWorks, Nucleus RTOS, OSE, QNX, and LynxOS. The author of ThreadX (as well as Nucleus) is William Lamie, who is the President and CEO of Express Logic, Inc. The name ThreadX is derived from the fact that threads are used as the executable modules, while the letter X represents switching, i.e., it switches threads. ThreadX can be seen as the "QThreads" of SystemC implemented in a preemptive fashion. Like most RTOSes, ThreadX uses a multitasking kernel with preemptive scheduling, fast interrupt response, memory management, interthread communication, mutual exclusion, event notification, and thread synchronization features. Major distinguishing characteristics of ThreadX include priority inheritance, preemption threshold, efficient timer management, picokernel design, event chaining, fast software timers, and compact size. ThreadX is distributed under a marketing model in which source code is provided and licenses are royalty-free.

ThreadX is generally used in real-time embedded systems, especially deeply embedded systems. Developing embedded systems using ThreadX is usually done on a host machine running Linux or Microsoft Windows, cross-compiling target software to run on various target processor architectures. Several ThreadX-aware development tools are available, such as Wind River Workbench, ARM RealView, Green Hills Software's MULTI, Metrowerks CodeWarrior, IAR C-SPY, Lauterbach TRACE32, and visionCLICK. Hewlett-Packard has licensed ThreadX for all of its Inkjet, LaserJet, and all-in-one devices; earlier it used LynxOS for multifunction LaserJet printers, and many printers still use LynxOS. ThreadX is widely used in a variety of consumer electronics, medical devices, data networking applications, and SoC development.
VxWorks

VxWorks is a real-time operating system developed as proprietary software by Wind River Systems of Alameda, California, USA. First released in 1987, VxWorks is designed for use in embedded systems.

VxWorks' memory management system does not use swapping or paging, because the system allocates memory within the physical address space without needing to swap data in and out of this space due to memory constraints. VxWorks assumes that there is enough physical memory available to run its kernel and the applications that will run on the operating system. Therefore VxWorks does not directly support a virtual memory system.


3. DEVELOPMENT OF DEVICE DRIVERS FOR RTLINUX


RTLinux Overview

This section is intended to give users a top-level understanding of RTLinux. It is not designed as an in-depth technical discussion of the system's architecture. Readers interested in the topic can start with Michael Barabanov's Master's Thesis (a PostScript version is available for download at www.rtlinux.org/documents/papers/thesis.ps).

The basic premise underlying the design of RTLinux is that it is not feasible to identify and eliminate all aspects of kernel operation that lead to unpredictability. These sources of unpredictability include the Linux scheduling algorithm (which is optimized to maximize throughput), device drivers, uninterruptible system calls, the use of interrupt disabling, and virtual memory operations. The best way to avoid these problems is to construct a small, predictable kernel separate from the Linux kernel, and to make it simple enough that operations can be measured and shown to have predictable execution. This is the course taken by the developers of RTLinux. This approach has the added benefit of maintainability: prior to the development of RTLinux, every time new device drivers or other enhancements to Linux were needed, a study would have to be performed to determine that the change would not introduce unpredictability.

Consider first the basic Linux kernel without hard realtime support. The Linux kernel separates the hardware from user-level tasks. The kernel has the ability to suspend any user-level task once that task has outrun the ``slice of time'' allotted to it by the CPU. Assume, for example, that a user task controls a robotic arm. The standard Linux kernel could potentially preempt the task and give the CPU to one which is less critical (e.g. one that boots up Netscape). Consequently, the arm will not meet strict timing requirements. Thus, in trying to be ``fair'' to all tasks, the kernel can prevent critical events from occurring.

Now consider a Linux kernel modified to support hard realtime.
An additional layer of abstraction, termed a ``virtual machine'' in the literature, has been added between the standard Linux kernel and the computer hardware. As far as the standard Linux kernel is concerned, this new layer appears to be actual hardware. More importantly, this new layer introduces its own fixed-priority scheduler. This scheduler assigns the lowest priority to the standard Linux kernel, which then runs as an independent task. It then allows the user to both introduce and set priorities for any number of real-time tasks.

The abstraction layer introduced by RTLinux works by intercepting all hardware interrupts. Hardware interrupts not related to realtime activities are held and then passed to the Linux kernel as software interrupts when the RTLinux kernel is idle and the standard Linux kernel runs. Otherwise, the appropriate realtime interrupt service routine (ISR) is run. The RTLinux executive is itself nonpreemptible. Unpredictable delays within the RTLinux executive are eliminated by its small size and limited operations.

Realtime tasks have two special attributes: they are ``privileged'' (that is, they have direct access to hardware), and they do not use virtual memory. Realtime tasks are written as special Linux modules that can be dynamically loaded into memory. They are not expected to execute Linux system calls. The initialization code for a realtime task initializes the real-time task



structure and informs RTLinux of its deadline, period, and release-time constraints. Non-periodic tasks are supported through the use of interrupts. In contrast with some other approaches to realtime, RTLinux leaves the Linux kernel essentially untouched. Via a set of relatively simple modifications, it manages to convert the existing Linux kernel into a hard real-time environment without hindering future Linux development.

The Basic API: Writing RTLinux Modules

This chapter introduces critical concepts that must be grasped in order to successfully write RTLinux modules. It also presents the basic Application Programming Interface (API) used in all RTLinux programs. It then steps the user through the creation of a basic ``Hello World'' programming example, which is intended to help the user develop their very first RTLinux program.

Understanding an RTLinux Program

In the latest versions of RTLinux, programs are not created as standalone applications. Rather, they are modelled as modules which are loaded into the Linux kernel space. A Linux module is nothing but an object file, usually created with the -c flag argument to gcc. The module itself is created by compiling an ordinary C language file in which the main() function is replaced by a pair of init/cleanup functions:

int init_module();
void cleanup_module();

As its name implies, the init_module() function is called when the module is first loaded into the kernel. It should return 0 on success and a negative value on failure. Similarly, cleanup_module() is called when the module is unloaded.

The Basic API

Now that we understand the general structure of modules, and how to load and unload them, we are ready to look at the RTLinux API.

Creating RTLinux POSIX Threads

A realtime application is usually composed of several ``threads'' of execution. Threads are lightweight processes which share a common address space.
Conceptually, Linux kernel control threads are also RTLinux threads (with one for each CPU in the system). In RTLinux, all threads share the Linux kernel address space. To create a new realtime thread, we use the pthread_create(3) function. This function must only be called from the Linux kernel thread (i.e., from init_module()):

#include <pthread.h>

int pthread_create(pthread_t *thread, pthread_attr_t *attr,
                   void *(*start_routine)(void *), void *arg);



The thread is created using the attributes specified in the ``attr'' thread attributes object. If attr is NULL, default attributes are used. For more detailed information, refer to the POSIX functions pthread_attr_init(3), pthread_attr_setschedparam(3), and pthread_attr_getschedparam(3), as well as these RTL-specific functions: pthread_attr_getcpu_np(3) and pthread_attr_setcpu_np(3),
which are used to get and set general attributes for the scheduling parameters and the CPUs on which the thread is intended to run. The ID of the newly created thread is stored in the location pointed to by ``thread''. The function pointed to by start_routine is taken to be the thread code; it is passed the ``arg'' argument. To cancel a thread, use the POSIX function pthread_cancel(pthread_t thread).

Time Facilities

RTLinux provides several clocks that can be used for timing functionality, such as referencing for thread scheduling and obtaining timestamps. Here is the general timing API:

#include <rtl_time.h>

int clock_gettime(clockid_t clock_id, struct timespec *ts);
hrtime_t clock_gethrtime(clockid_t clock);

struct timespec {
    time_t tv_sec;   /* seconds */
    long tv_nsec;    /* nanoseconds */
};

To obtain the current clock reading, use the clock_gettime(3) function, where clock_id is the clock to be read and ts is a structure which stores the value obtained.

A Simple ``Hello World'' RTLinux Program

We'll now write a small program that uses all of the API that we've learned thus far. This program will execute two times per second, and during each iteration it will print the message:

I'm here, my arg is 0

Code Listing

Save the following code under the filename hello.c:

#include <rtl.h>
#include <time.h>



#include <pthread.h>

pthread_t thread;

void *start_routine(void *arg)
{
    struct sched_param p;
    p.sched_priority = 1;
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &p);
    pthread_make_periodic_np(pthread_self(), gethrtime(), 500000000);
    while (1) {
        pthread_wait_np();
        rtl_printf("I'm here; my arg is %x\n", (unsigned) arg);
    }
    return 0;
}

int init_module(void)
{
    return pthread_create(&thread, NULL, start_routine, 0);
}

void cleanup_module(void)
{
    pthread_cancel(thread);
    pthread_join(thread, NULL);
}

Compiling and Executing ``Hello World''

In order to execute our program, we must first do the following:

1. Compile the source code and create a module. We can normally accomplish this by using the Linux GCC compiler directly from the command line. To simplify things, however, we'll create a Makefile. Then we'll only need to type ``make'' to compile our code.

2. Locate and copy the rtl.mk file. The rtl.mk file is an include file which contains all the flags needed to compile our code. For simplicity, we'll copy it from the RTLinux source tree and place it alongside our hello.c file.

3. Insert the module into the running RTLinux kernel. The resulting object binary must be ``plugged in'' to the kernel, where it will be executed by RTLinux.
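A minimal Makefile for the steps above might look like the following sketch. It assumes rtl.mk has been copied into the current directory and that it defines the CC, INCLUDE, and CFLAGS variables, as in the RTLinux distribution; adjust paths for your installation.

```makefile
# Assumes rtl.mk (copied from the RTLinux source tree) sits next to hello.c
include rtl.mk

all: hello.o

hello.o: hello.c
	$(CC) $(INCLUDE) $(CFLAGS) -c hello.c

clean:
	rm -f *.o
```

Typing ``make'' then produces hello.o, which can be inserted into the running RTLinux kernel (e.g. with insmod).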

The Advanced API: Getting More Out of Your RTLinux Modules

RTLinux has a rich assortment of functions which can be used to solve most realtime application problems. This chapter describes some of the more advanced concepts.



Using Floating Point Operations in RTLinux POSIX Threads

The use of floating-point operations in RTL POSIX threads is prohibited by default. The RTL-specific function pthread_setfp_np(3) is used to change the status of floating-point operations:

int pthread_setfp_np(pthread_t thread, int flag);

To enable FP operations in the thread, set the flag to 1. To disable FP operations, pass 0. The examples/fp directory contains several examples of tasks which use floating point and the math library.

RTLinux Inter-Process Communication (IPC)

The general philosophy of RTLinux requires the realtime component of an application to be lightweight, small, and simple. Applications should be split in such a way that, as long as timing restrictions are met, most of the work is done in user space. This approach makes for easier debugging and a better understanding of the realtime part of the system. Consequently, communication mechanisms are necessary to interface RTLinux tasks and Linux. RTLinux provides several mechanisms which allow communication between realtime threads and user-space Linux processes. The most important are realtime FIFOs and shared memory.

Using Shared Memory
For shared memory, you can use the excellent mbuff driver by Tomasz Motylewski (motyl@chemie.unibas.ch). It is included with the RTLinux distribution and is installed in the drivers/mbuff directory. A manual is included with the package. Here, we'll just briefly describe the basic mode of operation. First, the mbuff.o module must be loaded into the kernel. Two functions are used to allocate blocks of shared memory, connect to them, and eventually deallocate them.

#include <mbuff.h>

void *mbuff_alloc(const char *name, int size);
void mbuff_free(const char *name, void *mbuf);
The first time mbuff_alloc is called with a given name, a shared memory block of the specified size is allocated. The reference count for this block is set to 1. On success, the pointer to the newly allocated block is returned. NULL is returned on failure. If the block with the specified name already exists, this function returns a pointer that can be used to access this block and increases the reference count.

Waking and Suspending RTLinux Threads


Interrupt-driven RTLinux threads can be created using the thread wakeup and suspend functions:

int pthread_wakeup_np(pthread_t thread); int pthread_suspend_np(void);



RTLinux Serial Driver (rt_com)

rt_com(3) is a driver for the 8250 and 16550 families of UARTs commonly used in PCs (COM1, COM2, etc.). The available API is as follows:

#include <rt_com.h>
#include <rt_comP.h>

void rt_com_write(unsigned int com, char *pointer, int cnt);
int rt_com_read(unsigned int com, char *pointer, int cnt);
int rt_com_setup(unsigned int com, unsigned int baud, unsigned int parity,
                 unsigned int stopbits, unsigned int wordlength);

#define RT_COM_CNT n

struct rt_com_struct {
    int magic;                  /* unused */
    int baud_base;              /* base rate, 115200 (BASE_BAUD in rt_comP.h) for standard ports */
    int port;                   /* port number */
    int irq;                    /* interrupt number (IRQ) for the port */
    int flag;                   /* flags set for this port */
    void (*isr)(void);          /* address of the interrupt service routine */
    int type;
    int ier;                    /* a copy of the IER register */
    struct rt_buf_struct ibuf;  /* the port input buffer */
    struct rt_buf_struct obuf;  /* the port output buffer */
} rt_com_table[RT_COM_CNT];

where:

rt_com_write(3) writes cnt characters from buffer pointer to the realtime serial port com.
rt_com_read(3) attempts to read cnt characters into buffer pointer from the realtime serial port com.
rt_com_setup(3) is used to dynamically change the parameters of each realtime serial port.

Interfacing RTLinux Components to Linux
RTLinux threads, sharing a common address space with the Linux kernel, can in principle call Linux kernel functions. This is usually not a safe thing to do, however, because RTLinux threads may run even while Linux has interrupts disabled. Only functions that do not modify Linux kernel data structures (e.g., vsprintf) should be called from RTLinux threads.



RTLinux provides two delayed execution mechanisms to overcome this limitation: soft interrupts and task queues.

Writing RTLinux Schedulers


Most users will never be required to write a scheduler. Future versions of RTLinux are expected to have a fully customizable scheduler, but in the meantime, here are some points to help the rest of you along (for i386):

The scheduler is implemented in the scheduler/rtl_sched.c file. The scheduler's architecture-dependent files are located in include/arch- and scheduler/i386.

The scheduling decision is made in the rtl_schedule() function. Thus, by modifying this function, it is possible to change the scheduling policy. Further questions in this area may be addressed directly to the FSM Labs crew.

Conclusions

When choosing a real-time operating system it is important to know what features you need. Our comparison of FreeRTOS and RTLinux has shown that two real-time operating systems can in fact be very dissimilar. One of the biggest differences between FreeRTOS and RTLinux is their size. One thing that may need to be considered is what platforms are available for the project. The platform choice is often directly related to the processing power needed, or to restrictions on physical size. FreeRTOS supports many small processors and microcontrollers. If a microcontroller can do the job at hand, then FreeRTOS is probably the way to go. On the other hand, if much processing power is needed, the platform of choice is probably something larger, like an x86 system; then RTLinux is a good choice.

Aside from their platform support, the two operating systems are very different in what features they have. FreeRTOS is very simple, and though that can be considered a bad thing, a small project can probably benefit from the simplicity. Large projects, however, will profit from the extra functions in RTLinux. Many real-time projects include one part that is time-critical and one that isn't. If the non-time-critical part is a relatively large part of the project, then RTLinux has much to offer. If a network connection, a graphical interface, or something else fairly complex is needed, RTLinux definitely has the upper hand.

One last thing to consider is the economics. In a commercial project, the ability to use the GPL license doesn't always exist. In this case it is worth noting that FreeRTOS can be used in a commercial project without paying royalties, while RTLinux can't.

The schedulers in the two systems differ in that FreeRTOS's is smaller and simpler, while RTLinux of course has a bigger and more complex scheduler. FreeRTOS's scheduler works well with a limited number of predefined tasks (an embedded system).
You will probably run into trouble, however, when trying to do things close to the system limit, because there is no scheduler more advanced than highest-priority-first. Both systems might run into problems when the idle process gets starved: not necessarily problems with the RT tasks themselves, but with things like logging and control, especially if RTLinux is used for data processing.


4. SOFTWARE DEVELOPMENT FOR DSP APPLICATIONS


Introduction

Digital signal processing (DSP) refers to various techniques for improving the accuracy and reliability of digital communications. The theory behind DSP is quite complex. Basically, DSP works by clarifying, or standardizing, the levels or states of a digital signal. A DSP circuit is able to differentiate between human-made signals, which are orderly, and noise, which is inherently chaotic.

All communications circuits contain some noise. This is true whether the signals are analog or digital, and regardless of the type of information conveyed. Noise is the eternal bane of communications engineers, who are always striving to find new ways to improve the signal-to-noise ratio in communications systems. Traditional methods of optimizing the S/N ratio include increasing the transmitted signal power and increasing the receiver sensitivity. (In wireless systems, specialized antenna systems can also help.) Digital signal processing dramatically improves the sensitivity of a receiving unit. The effect is most noticeable when noise competes with a desired signal. A good DSP circuit can sometimes seem like an electronic miracle worker. But there are limits to what it can do. If the noise is so strong that all traces of the signal are obliterated, a DSP circuit cannot find any order in the chaos, and no signal will be received.

DSP Applications

The main applications of DSP are audio signal processing, audio compression, digital image processing, video compression, speech processing, speech recognition, digital communications, RADAR, SONAR, seismology, and biomedicine.
Specific examples are speech compression and transmission in digital mobile phones, room correction of sound in hi-fi and sound reinforcement applications, weather forecasting, economic forecasting, seismic data processing, analysis and control of industrial processes, medical imaging such as CAT scans and MRI, MP3 compression, computer graphics, image manipulation, hi-fi loudspeaker crossovers and equalization, and audio effects for use with electric guitar amplifiers.

Digital signal processor

A digital signal processor (DSP) is a specialized microprocessor with an architecture optimized for the fast operational needs of digital signal processing. DSPs often use special memory architectures that are able to fetch multiple data and/or instructions at the same time:

Super Harvard architecture
Harvard architecture
Modified von Neumann architecture
Use of direct memory access
Memory-address calculation unit



From Analog to Digital

Classical DSP applications work with real-world signals, such as sound and radio waves, that originate in analog form. Analog means the signals are continuous; they change smoothly from one state to another. Digital computers, on the other hand, treat information discontinuously, as a discrete series of binary numbers. This permits an exactness of measurement and control impossible in analog systems. The goal of digital signal processing is to use the power of digital computation to manage and modify the signal data.

Therefore, the first stage in many DSP systems is to translate smooth real-world signals into a "bumpy" digital approximation. While a sound wave can be depicted as an undulating line, its digital representation looks more like an ascending and descending staircase. This translation is accomplished by an Analog-to-Digital Converter (ADC). In essence, ADCs work like a movie camera, clicking off a series of snapshots that, when strung together, approximate the continuous flow of actual events. The "snapshots" taken by ADCs are actually a series of voltage measurements that trace the rise and fall of the analog signal, like points in a connect-the-dots drawing. If the ADC has done its job well, the data points give a detailed and accurate rendering of the signal.

After a certain amount of clean-up work (to remove extraneous frequencies, among other things), the ADC passes its digitized signal information to a DSP, which does the bulk of the processing. Eventually, when the DSP has finished its chores, the digital data may be turned back into an analog signal, albeit one that is quite different from and much improved over the original. For instance, a DSP can filter the noise from a signal, remove unwanted interference, amplify certain frequencies and suppress others, encrypt information, or analyze a complex waveform into its spectral components.
In plainer language, DSPs can clean the crackle and hiss from music recordings, remove the echo from communications lines, make internal organs stand out more clearly in medical CAT scans, scramble cellular phone conversations to protect privacy, and assess seismic data to pinpoint new oil reserves. Of course there are also DSP applications that don't require Analog-to-Digital translation. The data is digital from the start, and can be manipulated directly by the DSP. An example of this is computer graphics, where DSPs create mathematical models of things like weather systems, images, and scientific simulations.

Different DSPs For Different Jobs

One way to classify DSP devices and applications is by their dynamic range. The dynamic range is the spread of numbers, from small to large, that must be processed in the course of an application. It takes a certain range of values, for instance, to describe the entire waveform of a particular signal, from deepest valley to highest peak. The range may get even wider as calculations are performed, generating larger and smaller numbers through multiplication and division. The DSP device must have the capacity to handle the numbers so generated. If it doesn't, the numbers may "overflow," skewing the results of the computation.

The processor's capacity is a function of its data width (i.e., the number of bits it manipulates) and the type of arithmetic it performs (i.e., fixed or floating point). A 32-bit processor has a wider dynamic range than a 24-bit processor, which has a wider range than a 16-bit processor. And floating-point chips have wider ranges than fixed-point devices. Each type of processor is ideal for a particular

[Type the document title]


range of applications. Sixteen-bit fixed-point DSPs such as Motorola's DSP56100 family are just right for voice-grade systems such as phones, since they work with a relatively narrow range of sound frequencies. Hi-fidelity stereo sound has a wider range, calling for a 16-bit ADC and a 24-bit fixed-point DSP like the DSP56002. (The ADC's 16-bit width is needed to capture the complete high-fidelity signal; the DSP must be 24 bits to accommodate the larger values that result when the signal data is manipulated.) Image processing, 3-D graphics, and scientific simulations have a much wider dynamic range and require a 32-bit floating-point processor.

Micro-Line Embedded DSP Boards

The micro-line family of embedded DSP and FPGA boards provides a range of TMS320C6000 DSP processor and FPGA capabilities for commercial and industrial applications. They can be utilized as stand-alone embedded DSP or FPGA systems, or integrated as mezzanine plug-in modules with customer or application-specific hardware designs. The boards provide extensive access to the many device-level interfaces supported natively by the DSP and FPGA resources. To complement this, a variety of hardware and software options are available for configuring and tailoring complete systems to application-specific needs. Options include support for analog and digital I/O, Ethernet, FireWire, video frame capture and output (via FireWire), and non-volatile data storage using on-board FLASH memory or FireWire-enabled hard drives and CompactFLASH memory.

Integer DSP processor boards:

C641xCPU: 400/500 MHz TMS320C6410/6413/6418 DSP + 400k/1M-gate Spartan-3E FPGA

C6412Compact: 720 MHz TMS320C6412 DSP + 1M/5M/4M-gate Spartan-3E FPGA + USB 2.0 interface + 10/100Base-T Ethernet interface + IEEE 1394a FireWire interface

C6412Compact



Floating-point DSP processor boards:
C6713CPU: 300 MHz TMS320C6713 DSP + 400k/1M-gate Spartan-3E FPGA
C6713Compact: 300 MHz TMS320C6713 DSP + 250k/500k/1M-gate Virtex-II FPGA + IEEE1394a FireWire interface

C6713CPU

Development Tools & Software
Micro-line embedded DSP hardware is complemented by a fully integrated software framework for developing application software. This provides the means to develop and debug software for the TMS320C6000 DSP processor and to load the final application code into the on-board FLASH ROM, allowing the DSP to boot autonomously with application software. Developers utilizing the FireWire capabilities of micro-line have access to the TMS320C6000-based FireWire API and can optionally select a Windows-based host API for integrating the micro-line hardware with a host PC. The framework includes:
Texas Instruments Code Composer Studio: IDE with advanced C code generation tools, JTAG-based C-source debugging and pre-emptive multitasking support for TMS320C6000 DSPs
USB JTAG in-circuit emulator
Micro-line FLASH filesystem and supporting host-PC software tools for downloading and managing executable software via an RS-232 link
Micro-line FireWire API for embedded micro-line DSPs
Unibrain FireAPI host computer software interface for integrating micro-line hardware with host computers



Texas Instruments Code Composer Studio
Code Composer Studio is an open, powerful TMS320C6000 integrated development environment for Windows-based computers. It uses an intuitive system of advanced code generation and development tools that can slash overall micro-line DSP system coding time and eliminate many real-time problems in minutes.

Multitasking on TMS320C6000 DSPs
Bundled with Code Composer Studio is DSP/BIOS, a universal layer of scalable, real-time foundation software for TI DSPs. DSP/BIOS can be embedded within each TMS320C6000 processor in micro-line target applications to provide a variety of basic run-time services:
Pre-emptive multitasking
Interrupt management
Timer and periodic function management
Memory management
On-the-fly run-time analysis using the RTDX (Real-Time Data eXchange) interface

PMP1000 - Texas Instruments C6414 72 GIPS Parallel DSP Board
The PMP1000 is a parallel-processing DSP board that features industry-leading processing power and data throughput. The processing power is provided by multiple units of Texas Instruments' highest-performance TMS320C6414 DSPs, and the high data throughput is facilitated by the internal mechanization of the PMP1000, which combines multiple high-speed buses with cross-point port switching to connect "anything to anything" at full parallel bus bandwidth. Thanks to this throughput and processing power, many processes that were previously performed off-line can now be performed in real time. The unique PMP1000 architecture incorporates a master DSP, called the Program Execution Processor (PEP), and options of four or eight slave DSPs. In the PMP1000 mechanization, all slave DSPs execute program threads managed by the PEP. The slave DSPs are mounted four to a daughter card called a "Quad DSP Array" or QDA. Each slave DSP has 64 megabytes of external SDRAM (mapped into each of the DSPs) in addition to more than one megabyte of internal RAM. Operating at a clock rate of 1 GHz, each DSP is capable of executing up to 8 instructions per clock cycle, for a peak processing performance of 8 GIPS per DSP or 72 GIPS for the entire board.
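The peak figures quoted above follow from simple arithmetic: clock rate times instructions per cycle per DSP, summed over the master plus eight slave DSPs. A quick sketch:

```c
/* Peak-throughput arithmetic behind the PMP1000 figures quoted above. */
unsigned peak_gips_per_dsp(unsigned clock_ghz, unsigned instr_per_cycle) {
    return clock_ghz * instr_per_cycle;   /* 1 GHz x 8 instr/cycle = 8 GIPS */
}

unsigned peak_gips_board(unsigned num_dsps) {
    /* 9 DSPs (one PEP plus eight slaves) x 8 GIPS each = 72 GIPS peak */
    return num_dsps * peak_gips_per_dsp(1, 8);
}
```

As with any peak rating, sustained throughput depends on how well the application keeps all execution units busy.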



PMP1000 Features
Parallel processing with up to 9 Texas Instruments TMS320C6414 DSPs
Up to 72 GIPS peak processing power
64 MB of memory mapped into each of the 8 main processing DSPs (>512 MB of total DSP memory)
Continuous input data processing up to 400 MB/s (depending on application)
>2,000 MB/s internal I/O via switching network
Unique Program Execution Processor for dynamic thread allocation
Advanced parallel DSP OS simplifies user programming
Designed to create true high-speed, real-time systems in otherwise non-real-time environments (such as the PC)
400 MB/s continuous transfer over the PCI-X bus

Real-Time Parallel DSP Software Development System
Writing software to operate parallel processors has traditionally been a complex undertaking. The DSP Software Development System is designed to make it as simple as possible to create an executable application. The DSP Software Integrator provides unique development and optimization environments with innovative tools for creating and debugging an application. Despite the high-performance nature of the PMP1000's hardware configuration, it lends itself to a software mechanization that is unique, easy to use and efficient, especially for a parallel-processing system. The PMP1000 consists of 8 DSPs (or 4 when using a single QDA module) that process program threads as directed, plus a master DSP designated as the Program Execution Processor (PEP). The primary tasks of the PEP are to dynamically distribute the threads and to manage all data flow. The PEP executes main(), which consists of processor function calls (program threads) and data-transfer functions. Signatec uses the term "thread" to be consistent with generally accepted usage; in the PMP1000 world, a thread is a C function. In some applications, this function will be the entire computational process to be applied to a data set.

Software Integrator
The PMP1000 Software Integrator ties together a number of software components from Signatec and Texas Instruments. It provides a true Windows interface for the TI tools and supplies a quick link to the text editing and compiler/linker facilities of a C program-editing environment. The Integrator



consists of a Development Environment, a Debugging Environment and an Optimization Environment.

Development Environment
User programs are written in C and contain source code for both the PMP1000 and the PC. The PMP1000 code looks very similar to any other C program that runs on a single processor: it consists mostly of data-transfer function calls (from the DSP library) and processing function calls to the DSPs.

The Preprocessor
The Preprocessor is a key element of the development environment. It splits and translates user source code into two types: PEP code and DSP code. The PEP and DSP source code is compiled by the C6414 compiler contained in Code Composer Studio and linked to the DSP library to create the executable code for the PEP and the DSPs. The Preprocessor performs extensive code translation of the user code to convert it into a format where executable functions are allocated to available core DSPs in a manageable fashion. Directives may also be placed in the source code to provide control over the code generation.

Software Development System Contents
The PMP1000 software supports code development and board operation under Windows 2000/XP. Most applications require some level of interaction between code executing on the PMP1000 and code executing on the PC. All PC software supplied by Signatec for the PMP1000 is written in Microsoft Visual C/C++. All DSP software is written for and compiled with Texas Instruments Code Composer Studio for the C6414 DSP. The complete software system contains the following items:
1.) Signatec support: Software Integrator, PMP1000 example programs with source code, DSP function library, PC function library, Windows device drivers, Linux device drivers
2.) TI Code Composer Studio with compiler, linker and debugger
3.) Microsoft Visual C/C++ (for PC code)

Real-Time DSP Drivers & Libraries
Signatec's DSP products are all provided with the following software at no additional charge:
A full product installation utility
Product drivers for 32-bit/64-bit operating systems



A complete library of functions for developing user-specific applications
Source code examples that illustrate how to use the function library for building applications
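The PMP1000 programming model described above, in which a "thread" is simply a C function that the PEP dispatches to a slave DSP, can be sketched as follows. The type and function names are illustrative, not Signatec's actual API.

```c
#include <stddef.h>

/* A PMP1000-style "thread": an ordinary C function applied to a block
 * of samples. */
typedef int (*dsp_thread_fn)(short *data, size_t len);

/* Example thread: the entire computational process for one data set,
 * here simply doubling each sample in place. */
int scale_block(short *data, size_t len) {
    for (size_t i = 0; i < len; i++)
        data[i] = (short)(data[i] * 2);
    return 0;
}

/* Stand-in for the PEP dispatcher: on the real board it would choose an
 * idle slave DSP and manage the data transfer; here it just runs the
 * function locally. */
int pep_dispatch(dsp_thread_fn fn, short *data, size_t len) {
    return fn(data, len);
}
```

Because a thread is just a function pointer plus a data block, the PEP can hand the same code to whichever slave DSP is free, which is what makes dynamic thread allocation practical.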

Four-core MSC8154 Processor and Development Support
Freescale Semiconductor's MSC8154 processor is a four-core version of Freescale's award-winning, high-performance MSC8156 digital signal processor (DSP). The MSC8154 is a single device that features four fully programmable 1 GHz DSP cores delivering 4 GHz of DSP processing power, plus the innovative MAPLE-B multi-standard baseband-specific accelerator. This combination, delivered in a single system-on-chip, is ideal for use in cost-optimized systems. The industry-leading SC3850 DSP core enables the MSC8154 to deliver up to 32 GMACS of 16-bit performance. The processor features high-speed standard interfaces (dual sRIO, dual SGMII, dual DDR3 and PCI Express) and large embedded multilevel memory with high-speed DDR interfaces. The Freescale MSC8154AMC reference development system is a high-density, single-width, full-height DSP platform based around three MSC8154 DSPs. With a high level of performance and integration, the MSC8154AMC is an ideal enablement platform for customers and third parties who are developing and debugging solutions for the next generation of wireless standards such as 3G-LTE, WiMAX, HSDPA+ and TDD-LTE. Each MSC8154 has 1 GB of associated 64-bit-wide DDR3 memory split into two banks. High-throughput RapidIO links connect the MSC8154 processors to each other and to the data backplane; the RapidIO interfaces interconnect via the IDT CPS10Q sRIO switch. For the control/data plane, each of the two RGMII Gigabit Ethernet ports links to the backplane ports and a front-panel port through an Ethernet switch. A Module Management Controller (MMC) performs hot swapping and board control. Freescale also offers a full set of development tools and enablement software for the MSC8154 and MSC8156 DSPs, including a highly optimized 3G-LTE software library.
Freescale's MSC8156 DSP wireless infrastructure customers are already utilizing this 3GPP LTE software kernel library, which provides highly optimized modules for both the uplink and downlink shared channels. The code is delivered with comprehensive test harnesses, documentation and recommendations on how to architect code for interfacing with the MSC8154 or the MSC8156.

DSP Device Drivers
Digital signal processors (DSPs) are now often integrated on-chip with numerous peripheral devices, such as serial ports, UARTs, PCI or USB ports. As a result, developing device drivers for DSPs requires significantly more time and effort than ever before. This section describes a DSP device-driver architecture that reduces overall driver development time by reusing code across multiple devices, and then looks in depth at an audio codec driver created using this architecture. The design and code examples are based on drivers developed for Texas Instruments' DSP/BIOS operating system, though the same approach will work in any system.



Writing a codec driver for a DSP actually involves programming three different peripherals: the codec itself, the serial port, and the DMA controller. Figure 1 shows the data flow between the different peripherals, the DSP, and the DSP's internal memory. Later, we'll show a much more detailed implementation of a DSP codec driver, but first let's discuss some driver-architecture issues that enable better code reuse at both the application and driver levels.

A DSP codec driver often involves configuring more than just the codec

Device-driver architecture
A device driver performs two main functions: device configuration and initialization, and data movement. Device configuration is, by definition, specific to a particular device. Data movement, on the other hand, is more generic. In the case of a streaming-data peripheral like a codec, the application ultimately expects to send or receive a stream of buffers. The application shouldn't have to worry about how the buffers are managed or what type of codec is being used, beyond issues such as data precision (the number of bits in a sample). Any driver can be divided into two parts: a class driver that handles the application interface and OS specifics, and a mini-driver that addresses the hardware specifics of a particular device.

Any driver can be divided into two parts: a class driver and a mini-driver
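The split can be sketched as two C layers; the structure and function names below are illustrative, not the actual DSP/BIOS driver API.

```c
#include <stddef.h>

/* Mini-driver: the hardware-specific half. A codec mini-driver would
 * program the codec, the serial port and the DMA controller behind
 * these entry points. */
struct mini_driver {
    int (*open)(void);
    int (*submit)(short *buf, size_t len);  /* queue one buffer */
    int (*close)(void);
};

/* Class driver: the application- and OS-facing half. It deals only in
 * streams of buffers and forwards them to whichever mini-driver is
 * bound, so the application never sees which codec is underneath. */
struct class_driver {
    const struct mini_driver *md;
};

int class_issue(struct class_driver *cd, short *buf, size_t len) {
    return cd->md->submit(buf, len);        /* device-agnostic path */
}

/* A stub mini-driver standing in for real hardware. */
static int stub_open(void) { return 0; }
static int stub_submit(short *buf, size_t len) { (void)buf; return (int)len; }
static int stub_close(void) { return 0; }
const struct mini_driver stub_md = { stub_open, stub_submit, stub_close };
```

Porting the driver to a new codec then means writing a new mini-driver table while the class driver, and the application above it, stay unchanged.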


Modern DSPs
Modern signal processors yield greater performance, due in part to both technological and architectural advancements such as smaller design rules, fast-access two-level cache, (E)DMA circuitry and a wider bus system. Not all DSPs provide the same speed, and many kinds of signal processors exist, each better suited to a specific task, ranging in price from about US$1.50 to US$300. Texas Instruments produces the C6000 series DSPs, which have clock speeds up to 1.2 GHz and implement separate instruction and data caches. They also have an 8 MiB second-level cache and 64 EDMA channels. The top models are capable of as many as 8000 MIPS (million instructions per second), use VLIW (very long instruction word) encoding, perform eight operations per clock cycle and are compatible with a broad range of external peripherals and various buses (PCI, serial, etc.). TMS320C6474 chips each contain three such DSPs, and the newest generation of C6000 chips supports floating-point as well as fixed-point processing. Freescale produces a multi-core DSP family, the MSC81xx, based on StarCore architecture processors; the latest MSC8144 DSP combines four programmable SC3400 StarCore DSP cores, each with a clock speed of 1 GHz. XMOS produces a multi-core, multi-threaded line of processors well suited to DSP operations, available in speeds ranging from 400 to 1600 MIPS. The processors have a multi-threaded architecture that allows up to 8 real-time threads per core, meaning that a 4-core device would support up to 32 real-time threads. Threads communicate with each other through buffered channels capable of up to 80 Mbit/s. The devices are easily programmable in C and aim to bridge the gap between conventional microcontrollers and FPGAs. CEVA, Inc. produces and licenses three distinct families of DSPs.
Perhaps the best known and most widely deployed is the CEVA-TeakLite DSP family, a classic memory-based architecture with 16-bit or 32-bit word widths and single or dual MACs. The CEVA-X DSP family offers a combination of VLIW and SIMD architectures, with different members of the family offering dual or quad 16-bit MACs. The CEVA-XC DSP family targets software-defined radio (SDR) modem designs and leverages a unique combination of VLIW and vector architectures with 32 16-bit MACs. Analog Devices produces the SHARC-based DSPs, which range in performance from 66 MHz/198 MFLOPS (million floating-point operations per second) to 400 MHz/2400 MFLOPS. Some models support multiple multipliers and ALUs, SIMD instructions, and audio processing-specific components and peripherals. The Blackfin family of embedded digital signal processors combines the features of a DSP with those of a general-purpose processor. As a result, these processors can run simple operating systems such as μClinux, velOSity and Nucleus RTOS while operating on real-time data.

[Type the document title]


NXP Semiconductors produces DSPs based on TriMedia VLIW technology, optimized for audio and video processing. In some products the DSP core is hidden as a fixed-function block in a SoC, but NXP also provides a range of flexible single-core media processors. The TriMedia media processors support both fixed-point and floating-point arithmetic, and have specific instructions to deal with complex filters and entropy coding. Most DSPs use fixed-point arithmetic, because in real-world signal processing the additional range provided by floating point is not needed, and there is a large speed and cost benefit due to reduced hardware complexity. Floating-point DSPs may be invaluable in applications where a wide dynamic range is required. Product developers might also use floating-point DSPs to reduce the cost and complexity of software development in exchange for more expensive hardware, since it is generally easier to implement algorithms in floating point. Generally, DSPs are dedicated integrated circuits; however, DSP functionality can also be produced using field-programmable gate arrays (FPGAs). Embedded general-purpose RISC processors are becoming increasingly DSP-like in functionality. For example, the OMAP3 processors include both an ARM Cortex-A8 and a C6000-series DSP.

Things that have DSPs
Some typical and well-known items that contain one (or many) embedded DSPs:
the biggie: cell phones
fax machines
DVD players and other home audio equipment
your car (for example, the anti-lock braking system)
computer disk drives
satellites
the "switch" at your local telephone company (more than a lot)
digital radios
high-resolution printers
digital cameras

SymNet 8x8 DSP

The SymNet 8x8 DSP is the original model in the SymLink series of network audio processors. It is the hardware platform used to execute system designs created in SymNet Designer software. Up to sixteen 8x8 DSPs, or other SymLink hardware models, can be networked together in a ring topology via the low-latency, 64-channel SymLink bus to provide high channel-count processing systems for use in convention centers, arenas, university and corporate campuses, large houses of worship, theaters, hotels and casinos.

Features



Single rack space
Open architecture
Over 300 DSP modules, including: echo cancelling, automixing, matrixing, room EQ, feedback elimination, speaker management
Expandable via SymLink (64-channel local high-speed audio and data bus)
Eight (8) mic/line inputs with phantom power
Eight (8) outputs
Two (2) RS-232 ports and one (1) RS-485 port
Eight (8) analog control inputs
Six (6) open-collector outputs
Three (3) relay outputs

TMS320C6424 DSP
This section provides the timing specification for the DDR2 interface as a PCB design and manufacturing specification. The design rules constrain PCB trace length, PCB trace skew, signal integrity, cross-talk and signal timing. These rules, when followed, result in a reliable DDR2 memory system without the need for a complex timing-closure process. For more information regarding guidelines for using this DDR2 specification, see Understanding TI's PCB Routing Rule-Based DDR Timing Specification (SPRAAV0). The DDR2 interface schematic is shown for a x32 DDR2 memory system; the x16 DDR2 system schematic is identical except that the high-word DDR2 device is deleted. Pin numbers for the C6424 can be obtained from the pin description section of its data sheet, and the DDR2 device pin numbers can be obtained from their device-specific data sheets.


5. SERIAL COMMUNICATION DRIVERS FOR ARM PROCESSOR


ARM PROCESSOR
ARM is the industry's leading provider of 32-bit embedded microprocessors, offering a wide range of processors based on a common architecture that deliver high performance, industry-leading power efficiency and reduced system cost. Combined with the broadest ecosystem in the industry, with over 900 partners delivering silicon, tools and software, the portfolio of more than 20 processors is able to meet every application challenge. With more than 25 billion processors already created and in excess of 16 million shipped every day, ARM truly is The Architecture for the Digital World. ARM processor features include:

A load/store architecture
An orthogonal instruction set
Mostly single-cycle execution
A 16 × 32-bit register file
An enhanced power-saving design

This case study traces the ARM processors from their beginnings as a proprietary solution to a particular set of problems in a particular company to their current status as a highly successful, flexible and customizable set of processors available on the open market. While some aspects of this story are of purely anecdotal interest, others shed light on ARM design decisions which were taken in an unusual set of circumstances to meet specific goals, and which are now seen to meet the demands of an innovative and exciting marketplace requiring good performance and low power consumption, balanced with low cost. British readers will probably be familiar with Acorn Computers Ltd, its products, and its history of phenomenal success in the UK computer market of the early 1980s. Other readers may not have had access to as much information on the vibrant home computer market in the UK then, or to Acorn's record for technical innovation. The story starts with the original development of the ARM processor and ends with the establishment of ARM Ltd as a global force in the microprocessor industry. In between, it sheds some light on various design decisions taken in the genesis of the ARM design.



The development of the ARM chip at Acorn
The history of the ARM processor family is closely intertwined with that of the British personal computer industry, and reflects differences between the development of the British and American computer industries. A number of different manufacturers achieved prominence in this briefly flowering market, but then never gained a great deal of success beyond the UK and Europe. The smaller size of the UK market (compared to the US) also ensured that even the most successful companies could not achieve the size of their American rivals, affecting their ability to invest in research and development and to ride out the ups and downs of the market for personal and home computers.

Acorn's background
The first ARM chip, the Acorn RISC Machine, was developed between 1983 and 1985 by the advanced research and development team at Acorn Computers, a pioneering developer of microcomputers in the UK. During this time Acorn was one of the leading names in the British personal computer market. Other significant players were Sinclair, another Cambridge start-up, and to a lesser extent the American companies Apple, Commodore and Tandy, along with a host of smaller British developers producing a wide range of machines targeted at the booming home computer market. Acorn's initial success was sealed when the British Broadcasting Corporation (BBC) commissioned a new home computer model from the company, to be sold as the BBC Microcomputer and to tie in with a public computer education programme shown on BBC television in the UK. The release of the BBC Micro in 1982 caught the crest of the home computer wave in Britain, and the BBC name gave Acorn's design added credibility compared with competing machines from the many other developers in this market. Sales exceeded all expectations: original estimates by the BBC and Acorn were that at best tens of thousands of units would be sold.
In fact, to date nearly two million BBC Micro-compatible computers have been sold by Acorn, and it quickly grew from a small company with tens of staff into a medium-sized company employing hundreds, with an annual turnover of tens of millions of pounds. The BBC Micro was based around the 8-bit 6502 processor from Rockwell, the same chip that powered the Apple II. Initial models featured colour graphics and 32 Kbytes of random-access memory. Data was stored on audio cassettes; hard and floppy disk drive interfaces were also available, and Acorn was an early proponent of local area networking with its Econet system. Another important feature of the BBC Micro was its capacity to accept a second processor attached via an expansion port known as the Tube. Connectivity, interoperability and networking were familiar concepts to many BBC Micro users long before they were established in the rest of the personal computer world, via such options as the Tube, which required a degree of interoperability between host and second processor, and Acorn's Econet local area networking standard. Acorn was to continue to release 6502-based variants of the BBC Micro for four more years. Production of the most successful model, the Master, only ceased in May 1993, and these computers form the backbone of computing provision in many British schools. However, it was clear to the advanced research and development team that there was no clear step forward to the next generation



of processors, no obvious 16-bit processor to use in future Acorn systems. One Acorn model, the Communicator, used a 16-bit 6502 derivative, the 65C816 processor, the same device as used in the Apple IIGS, but Acorn's designers were not convinced that this chip represented the advance they were looking for. The team tried all of the 16- and 32-bit processors then on the market but found none satisfactory for their purposes; in particular, the data bandwidth was not sufficiently greater than that offered by the 6502 to justify basing the next generation of Acorn computers upon them. Processors were tested by building BBC Micro 'second processor' units based upon them, and it became clear that no chip would be found to fit the very precise requirements on which the Acorn design team had settled.

Acorn's processor requirements
Acorn's aim at that time was to produce personal computers which met the needs of the business community by providing office automation facilities. Clearly, more power was needed than was offered by the 6502. In the fine tradition of the computer hobbyist, the design team decided to develop their own processor, which would provide an environment with some similarities to the familiar 6502 instruction set but lead Acorn and its products directly into the world of 32-bit computing. Acorn has always been renowned for the calibre of its research and development staff. It was able to pick the cream of graduates from Cambridge University, home of a highly regarded computer science faculty, as well as attracting staff from around the world. To them, designing a processor from scratch to meet their carefully specified criteria was an obvious thing to do. Acorn's phenomenal success with its 8-bit computers had created a research and development environment where staff could afford to pursue advanced projects which would not necessarily result in immediately saleable products, and were actively encouraged to do so.
Genesis of ARM in comparison with other RISC processors
In fact, many of the commercially available RISC processors intended for use as the CPU of a personal computer or workstation were designed or developed in-house by system developers, at a time when microprocessor developers were either concentrating on improving their CISC designs or designing RISC chips for supporting roles or as embedded controllers. For example, Sun developed the SPARC RISC chip and architecture for its own computer workstations, while notable RISC processors from established chip producers include Intel's i860 graphics processor and AMD's 29000, which has mainly been used as a graphics accelerator or in printers. However, both Sun's and MIPS' efforts were based on earlier research at Stanford and Berkeley universities respectively, while Acorn's project was effectively begun from scratch, although reports on the Berkeley and Stanford research were read by the Acorn team and were part of the inspiration behind designing a RISC processor. One of the reasons the ARM was designed as a small-scale processor was that the resources to design it were not sufficient to allow the creation of a large and complex device. While this is now presented



as (and genuinely is) a technical plus for the ARM processor core, it began as a necessity for a processor designed by a team of talented but inexperienced designers (outside of university projects, most team members were programmers and board-level circuit designers) using new tools, some of which were far from state of the art. With these restrictions on design and testing, it is hardly a surprise that a small device was developed. While the ARM was developed as a custom device for a highly specific purpose, the team designing it felt that the best way to produce a good custom chip was to produce a chip with good all-round performance.

Designing the first ARM
Work on the development of what was to become the ARM began in 1983. Working samples were received in 1985. The team developing it included Steve Furber, now ICL Professor of Computer Engineering at Manchester University, and Roger Wilson, both of whom had worked on the design of the BBC Micro, as well as Robert Heaton, now of Obsidian Technology, who led the VLSI design group within Acorn. The design team worked in secret to create a chip which met their requirements. As described earlier, these were for a processor which retained the ethos of the 6502 but in a 32-bit RISC environment, implemented in a small device which it would be possible to design and test easily, and to fabricate cheaply. First the instruction set was specified by Wilson, based on his knowledge gained as the author of much of the original software for the BBC Micro, including its BASIC interpreter. The important initial decisions were to use a fixed instruction length and a load/store model. Other design decisions were taken on an instruction-by-instruction basis.
Modelling the ARM1 instruction set
The first model of the ARM instruction set was written in BASIC, an approach which made it easy to set everything out and develop a prototype quickly, but proved less flexible when the hardware design needed to be tested and precise timings derived. The subsequent model of the ARM hardware was also written in BASIC; it required a BBC Micro fitted with a 6502 second processor to run, and no further testing was required to verify the design. A team of four people worked on the design, with the two VLSI designers working on the device sharing a single workstation. The actual physical design of the chips was achieved using VLSI Technology's custom design tools. An event-driven simulator was designed, also in BASIC, which allowed the support chips (the video controller VIDC and the memory controller MEMC, which both had slightly more complex timing requirements, and the I/O controller IOC) to be designed and tested. A development of this simulator, since rewritten in Modula-2 and then in C and known as ASIM, is still used by both Acorn and ARM Ltd for design and testing today.

The world's first commercial RISC processor



The first ARM processor, ARM1, yielded working silicon the first time it was fabricated, in April 1985 at VLSI Technology. It bettered its stated design goals while using fewer than 25 000 transistors, and the samples were fabricated on a 3 µm process. There was a great deal of excitement at, and confidence in, the new chip. The ARM was used internally at Acorn, and by Acorn developers once it was made available as a second-processor add-on for the BBC Micro; this device used the ARM1 as an additional coprocessor and accelerator for the 6502-based BBC Micro. In fact, this second processor was used to improve the performance of the simulation tools the team had designed, to finish the support chips, and to develop the next ARM processor. The second-processor add-on also enabled third-party developers to start working with the processor and contemplating software that would exploit its advanced features. The purpose of releasing the second processor was to ensure that, by the time a complete ARM-based system was released, potential users and developers had some experience of the ARM and were not deterred from developing application software for it by the novelty of the technology or the lack of wide support for it in the market.

Improving on ARM1

The experience of designing ARM1, and of programming the sample chips, showed that there were some areas where the instruction set could be improved to maximize the performance of systems based around it. In particular, the Multiply and Multiply-Accumulate instructions were added to improve performance by eliminating the slow subroutines previously needed for multiplication. Without this addition the ARM could have been 'horribly slow' in some circumstances, according to Furber. The addition also facilitated real-time digital signal processing, which was to be used to generate sounds, an important feature of home and educational computers.
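To see what the hardware Multiply and Multiply-Accumulate instructions replaced, here is a shift-and-add software multiply in C. This is an illustrative sketch of the kind of slow subroutine involved, not Acorn's actual library code:

```c
#include <stdint.h>

/* Shift-and-add multiply: the kind of slow software routine that the
 * ARM2's MUL/MLA instructions were added to eliminate. One loop
 * iteration per bit of the multiplier. */
uint32_t soft_mul(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;    /* add the shifted multiplicand */
        a <<= 1;            /* next bit position */
        b >>= 1;
    }
    return result;
}

/* Multiply-accumulate, the core operation of DSP filters:
 * acc + a * b, here costing a full software multiply per sample. */
uint32_t soft_mla(uint32_t acc, uint32_t a, uint32_t b)
{
    return acc + soft_mul(a, b);
}
```

A DSP inner loop calls the multiply-accumulate once per sample, which is why doing it in a software loop rather than a single instruction made sound generation impractical.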

A coprocessor interface was also added to the ARM at this stage, which would enable a floating-point accelerator and other coprocessors to be used with the ARM. Even after all these additions the ARM2 maintained its small die size and low transistor count; the die was 5.4 mm square and the transistor count around 25 000. This second device was also improved by being fabricated on a 2 µm process. That this was an extraordinary achievement, and that the ARM is an unusual processor in terms of size/performance, is shown more clearly in Figure 1.1, which shows the relative die size of the ARM and other processors.

Operating modes

In all states there are seven modes of operation:

- User mode is the usual ARM program execution state, and is used for executing most application programs.
- Fast interrupt (FIQ) mode is used for handling fast interrupts.

- Interrupt (IRQ) mode is used for general-purpose interrupt handling.



- Supervisor mode is a protected mode for the operating system.
- Abort mode is entered after a Data Abort or instruction Prefetch Abort.
- System mode is a privileged user mode for the operating system.
- Undefined mode is entered when an Undefined Instruction exception occurs.
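The current mode is selected by the bottom five bits (M[4:0]) of the CPSR, using architecturally defined encodings. A small C sketch decoding them:

```c
#include <stdint.h>

/* ARM CPSR mode field M[4:0] encodings (architecturally defined). */
enum {
    MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12,
    MODE_SVC = 0x13, MODE_ABT = 0x17, MODE_UND = 0x1B,
    MODE_SYS = 0x1F
};

/* Decode the mode bits of a CPSR value to a printable name. */
const char *arm_mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case MODE_USR: return "User";
    case MODE_FIQ: return "FIQ";
    case MODE_IRQ: return "IRQ";
    case MODE_SVC: return "Supervisor";
    case MODE_ABT: return "Abort";
    case MODE_UND: return "Undefined";
    case MODE_SYS: return "System";
    default:       return "Illegal";
    }
}

/* All modes other than User are privileged. */
int arm_mode_is_privileged(uint32_t cpsr)
{
    return (cpsr & 0x1Fu) != MODE_USR;
}
```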

Modes other than User mode are collectively known as privileged modes. Privileged modes are used to service interrupts or exceptions, or to access protected resources.

Architecture


The ARM architecture forms the basis around which every ARM processor is built. Over time the architecture has evolved to include features that meet the growing demand for new functionality, high performance and the needs of new and emerging markets. The ARM architecture supports implementations across a wide range of performance points, and is established as the leading architecture in many market segments: from very small implementations of ARM processors up to very efficient implementations of advanced designs using state-of-the-art micro-architecture techniques. Implementation size, performance and low power consumption are key attributes of the ARM architecture. Architecture extensions were developed to provide support for Java acceleration (Jazelle), security (TrustZone), SIMD, and Advanced SIMD (NEON) technologies. The ARMv8-A architecture adds a Cryptographic Extension as an optional feature.



The ARM architecture is generally described as a Reduced Instruction Set Computer (RISC) architecture, as it incorporates these typical RISC architecture features:

- A uniform register file.
- A load/store architecture, where data processing operates only on register contents, not directly on memory contents.
- Simple addressing modes, with all load/store addresses determined from register contents and instruction fields only.

Enhancements to a basic RISC architecture enable ARM processors to achieve a good balance of high performance, small code size, low power consumption and small silicon area.

Serial communication drivers

ARM PrimeCell UART (PL011)

The UART (PL011) is an Advanced Microcontroller Bus Architecture (AMBA) compliant System-on-Chip (SoC) peripheral that is developed, tested, and licensed by ARM.



The UART is an AMBA slave module that connects to the Advanced Peripheral Bus (APB). The UART includes an Infrared Data Association (IrDA) Serial InfraRed (SIR) protocol ENcoder/DECoder (ENDEC). The features of the UART are covered under the following headings: Features, Programmable parameters, and Variations from the 16C550 UART.

Programmable parameters

The following key parameters are programmable:

- communication baud rate, integer and fractional parts
- number of data bits
- number of stop bits
- parity mode
- FIFO enable (16 deep) or disable (1 deep)
- FIFO trigger levels, selectable between 1/8, 1/4, 1/2, 3/4 and 7/8
- internal nominal 1.8432 MHz clock frequency (1.42 MHz to 2.12 MHz) to generate shorter bit duration
- hardware flow control.

Additional test registers and modes are implemented for integration testing.

Variations from the 16C550 UART

The UART varies from the industry-standard 16C550 UART device as follows:

- receive FIFO trigger levels are 1/8, 1/4, 1/2, 3/4 and 7/8
- the internal register map address space, and the bit function of each register, differ
- the deltas of the modem status signals are not available.

The following 16C550 UART features are not supported:

- 1.5 stop bits (only 1 or 2 stop bits are supported)
- independent receive clock.

UART hardware flow control

The hardware flow control feature is fully selectable, and enables you to control the serial data flow by using the nUARTRTS output and nUARTCTS input signals. The figure shows how two devices can communicate with each other using hardware flow control.



When RTS flow control is enabled, the nUARTRTS signal is asserted until the receive FIFO is filled up to the programmed watermark level. When CTS flow control is enabled, the transmitter can only transmit data when the nUARTCTS signal is asserted. Hardware flow control is selected through bits 14 (RTSEn) and 15 (CTSEn) in the UART control register (UARTCR). Table 2-3 shows how you must set these bits to enable RTS and CTS flow control, both simultaneously and independently.

Operation

The operation of the UART is described in the following sections: Interface reset, Clock signals, UART operation, IrDA SIR operation, UART character frame, and IrDA data modulation.

Interface reset

The UART and IrDA SIR ENDEC are reset by the global reset signal PRESETn and a block-specific reset signal nUARTRST. An external reset controller must use PRESETn to assert nUARTRST asynchronously and negate it synchronously to UARTCLK. PRESETn must be asserted LOW for a period long enough to reset the slowest block in the on-chip system, and then be taken HIGH again. The UART requires PRESETn to be asserted LOW for at least one period of PCLK.

Clock signals

The frequency selected for UARTCLK must accommodate the desired range of baud rates:

FUARTCLK(min) >= 16 x baud_rate(max)
FUARTCLK(max) <= 16 x 65535 x baud_rate(min)
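These clock constraints, and the PL011's fractional baud divisor (divisor = UARTCLK / (16 x baud rate), with the integer part written to UARTIBRD and the fractional part, in 64ths, to UARTFBRD), can be sketched as small C helpers. The function names here are invented for illustration:

```c
#include <stdint.h>

/* Check FUARTCLK >= 16 * baud_max and FUARTCLK <= 16 * 65535 * baud_min. */
int uartclk_ok(uint64_t fuartclk_hz, uint32_t baud_min, uint32_t baud_max)
{
    return fuartclk_hz >= 16ull * baud_max &&
           fuartclk_hz <= 16ull * 65535ull * baud_min;
}

/* FUARTCLK must be no more than 5/3 times the frequency of PCLK. */
int pclk_ratio_ok(uint64_t fuartclk_hz, uint64_t fpclk_hz)
{
    return 3u * fuartclk_hz <= 5u * fpclk_hz;
}

/* Split the baud-rate divisor into its integer and fractional parts.
 * Working in 1/64ths keeps the arithmetic all-integer:
 * div64 = round(64 * clk / (16 * baud)) = round(4 * clk / baud).
 * (Assumes 4 * uartclk_hz fits in 32 bits, i.e. UARTCLK < ~1 GHz.) */
void pl011_baud_divisors(uint32_t uartclk_hz, uint32_t baud,
                         uint32_t *ibrd, uint32_t *fbrd)
{
    uint32_t div64 = (4u * uartclk_hz + baud / 2u) / baud;
    *ibrd = div64 >> 6;     /* integer part      -> UARTIBRD */
    *fbrd = div64 & 0x3Fu;  /* fractional 64ths  -> UARTFBRD */
}
```

For example, with UARTCLK = 4 MHz and 115200 baud the divisor is 2.17, which splits into IBRD = 2 and FBRD = 11.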

[Type the document title]


For example, for a range of baud rates from 110 baud to 460800 baud, the UARTCLK frequency must be within the range 7.3728 MHz to 115 MHz. The frequency of UARTCLK must also be within the required error limits for all baud rates to be used. There is also a constraint on the ratio of the PCLK and UARTCLK clock frequencies: the frequency of UARTCLK must be no more than 5/3 times the frequency of PCLK:

FUARTCLK <= 5/3 x FPCLK

This allows sufficient time to write the received data to the receive FIFO.

About the programmer's model

The base address of the UART is not fixed, and can be different for any particular system implementation. However, the offset of any particular register from the base address is fixed. The following locations are reserved, and must not be used during normal operation:

- locations at offsets 0x008 through 0x014 and 0x01C are reserved and must not be accessed
- locations at offsets 0x04C through 0x07C are reserved for possible future extensions
- locations at offsets 0x080 through 0x08C are reserved for test purposes
- locations at offsets 0x090 through 0xFCC are reserved for future test purposes
- locations at offsets 0xFD0 through 0xFDC are used for future identification registers
- locations at offsets 0xFE0 through 0xFFC are used for identification registers.
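A driver can guard against touching the reserved regions. The following sketch encodes the "must not be accessed" ranges listed above (the identification registers at 0xFD0 and upward are readable and so are not flagged); the function name is invented for illustration:

```c
#include <stdint.h>

/* Return nonzero if a PL011 register offset falls in one of the
 * reserved ranges listed above and so must not be touched during
 * normal operation. */
int pl011_offset_reserved(uint32_t off)
{
    return (off >= 0x008 && off <= 0x014) || off == 0x01C ||
           (off >= 0x04C && off <= 0x07C) ||  /* possible future extensions */
           (off >= 0x080 && off <= 0x08C) ||  /* test purposes              */
           (off >= 0x090 && off <= 0xFCC);    /* future test purposes       */
}
```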

USB

How do we port USB drivers to ARM? What parts of the code need to be changed? If changes are needed, where are they made? Does the USB host controller code need to be changed?



If you move a driver to a new hardware platform then any platform-dependent code needs to be modified. Here are some areas you need to look at:

1) Driver binding. This happens during the initialization phase of the driver. You might have to make changes here. First try to get the driver to bind; if you can do that, you are heading in the right direction.
2) Send/Receive. The routines that send data out or receive data need to be modified.
3) Interrupt routine. How interrupts are enabled, disabled, and checked for.

USB interface

Universal Serial Bus (USB) is an industry standard developed in the mid-1990s that defines the cables, connectors and communications protocols used in a bus for connection, communication and power supply between computers and electronic devices. USB was designed to standardize the connection of computer peripherals, such as keyboards, pointing devices, digital cameras, printers, portable media players, disk drives and network adapters, to personal computers, both to communicate and to supply electric power. It has become commonplace on other devices, such as smartphones, PDAs and video game consoles. USB has effectively replaced a variety of earlier interfaces, such as serial and parallel ports, as well as separate power chargers for portable devices.

Power

The board is usually powered by the USB connection, and can also be powered by an external 5 V regulated supply. On-board regulators generate the 3.3 V and 1.8 V supplies.

I2C Controller for Serial EEPROMs

The I2C bus provides a simple two-wire means of communication. The protocol supports multiple masters and provides a low-speed connection between intelligent control devices, such as microprocessors, and general-purpose circuits, such as memories. This reference design documents an I2C controller designed to interface with serial EEPROM devices. The design can be used with a microprocessor to read the configuration data from a serial EEPROM that supports the I2C protocol.
It is intended to be a simple I2C master using 7-bit addresses and providing random read cycles only. Typically, serial EEPROMs are programmed at the time of board assembly; they store configuration information which is read by a microprocessor during power-up. The design is implemented in Verilog and VHDL. Lattice design tools are used for synthesis, place and route, and simulation. The design can be targeted to multiple Lattice device families, and its small size makes it portable across different FPGA/CPLD architectures.

Applicable documents

- National Semiconductor NM24C16 16,384-Bit Serial EEPROM
- Philips I2C Specification
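With 7-bit addressing, the first byte the master sends after the START condition carries the slave address in bits [7:1] and the read/write flag in bit 0 (1 = read, 0 = write). A minimal C sketch of forming that byte:

```c
#include <stdint.h>

#define I2C_READ  1u
#define I2C_WRITE 0u

/* Build the address byte sent after START: 7-bit slave address in
 * bits [7:1], R/W flag in bit 0. */
uint8_t i2c_addr_byte(uint8_t addr7, uint8_t rw)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (rw & 1u));
}
```

For example, a serial EEPROM at address 0x50 is addressed with 0xA0 for a write and 0xA1 for a read; a random read first writes the word address (a dummy write), then issues a repeated START with the read address byte.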



- Lattice Semiconductor Data Sheets

Theory of Operation

Overview

This I2C controller provides an interface between standard microprocessors and I2C serial EEPROM devices. It acts as an I2C master to support random read cycles from the serial EEPROM. The design consists of the following modules:

- I2c: top-level module
- I2c_clk: clock generation module
- I2c_rreg: read register module
- I2c_st: state machine module

Interrupt handler

The driver required an interrupt handler to manage reads and writes. We needed both a top half for master reads, and a bottom half for slave receives.

Master reads

In i2c-algo-pxa, master_xfer() calls i2c_pxa_do_xfer(), which calls adap->wait_for_interrupt(). wait_for_interrupt points to i2c_pxa_handler(). So the way a master read works is:

1. Wait until the bus is free.
2. Write the slave address to the IDBR (I2C Data Buffer Register).
3. Send a start condition.
4. Transfer a byte (the slave address) by calling adap->transfer().
5. Call wait_for_interrupt(). At this point the kernel switches to other tasks; it waits a really long time (relatively speaking) until an interrupt happens.
6. Call i2c_pxa_readbytes() to read the bytes into a buffer. For each byte this loop:
   - calls adap->transfer() to initiate the transfer of a byte
   - calls adap->wait_for_interrupt() to wait for it to actually show up
   - calls adap->read_byte() to read the value of the byte and stuff it in a buffer.
7. Send a stop condition.

wait_for_interrupt() puts the current task on a wait queue, where it stays until an I2C interrupt occurs.
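The per-byte read loop described above (transfer, wait, read into the buffer) can be mocked on a host to show its shape. The adapter structure and mock functions below are invented stand-ins for the i2c-algo-pxa ops, not the kernel's actual types; the mock slave replaces real hardware and interrupts:

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-ins for adap->transfer, adap->wait_for_interrupt and
 * adap->read_byte, backed by a mock slave memory. */
struct i2c_adapter_mock {
    void    (*transfer)(struct i2c_adapter_mock *adap);
    int     (*wait_for_interrupt)(struct i2c_adapter_mock *adap);
    uint8_t (*read_byte)(struct i2c_adapter_mock *adap);
    const uint8_t *slave_data;   /* mock slave memory   */
    size_t         pos;          /* current read offset */
};

static void mock_transfer(struct i2c_adapter_mock *adap) { (void)adap; }
static int  mock_wait(struct i2c_adapter_mock *adap) { (void)adap; return 0; }
static uint8_t mock_read_byte(struct i2c_adapter_mock *adap)
{
    return adap->slave_data[adap->pos++];
}

/* The master-read loop: initiate a byte transfer, wait for its
 * completion, then collect the byte. Returns bytes read. */
size_t i2c_master_read(struct i2c_adapter_mock *adap,
                       uint8_t *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        adap->transfer(adap);                /* initiate the transfer */
        if (adap->wait_for_interrupt(adap))  /* wait for completion   */
            break;
        buf[i] = adap->read_byte(adap);      /* stuff it in a buffer  */
    }
    return i;
}

/* Demo: read three bytes from a mock slave. */
size_t demo_read(uint8_t *buf)
{
    static const uint8_t rom[3] = { 0xDE, 0xAD, 0xBE };
    struct i2c_adapter_mock adap = {
        mock_transfer, mock_wait, mock_read_byte, rom, 0
    };
    return i2c_master_read(&adap, buf, 3);
}
```

In the real driver, wait_for_interrupt() sleeps on a wait queue instead of returning immediately, which is exactly the part that cannot be reused for slave receives.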



Slave receive

The approach taken for master reads won't work for slave receives: we can't call wait_for_interrupt() from within an interrupt handler (top half). What's needed is a bottom half for the interrupt handler, which is called after returning from the interrupt. Using drivers/block/floppy.c as a model, the interrupt handler does:

1. DECLARE_WORK(i2c_pxa_work, NULL, NULL): the i2c_pxa_work structure is created by this macro.
2. PREPARE_WORK(&i2c_pxa_work, (void *)i2c_pxa_slave_receive): point the work structure at the function which needs to be run.
3. schedule_work(&i2c_pxa_work): put the work on the work queue.
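A simplified host-side analogue of this top-half/bottom-half split is sketched below. The queue here is an invented stand-in for the kernel's workqueue machinery, not the real API: the "interrupt handler" only queues a work item, and the work function (the slave-receive routine) runs later, outside interrupt context, when the queue is drained:

```c
#include <stddef.h>

/* A work item: the deferred function and its argument. Stand-in for
 * the kernel's struct work_struct. */
struct work {
    void (*func)(void *arg);
    void *arg;
    struct work *next;
};

static struct work *work_queue;

/* Analogue of schedule_work(): called from the (simulated) top half,
 * it only links the item onto the queue and returns immediately. */
void sched_work(struct work *w)
{
    w->next = work_queue;
    work_queue = w;
}

/* Drain the queue: what the kernel does after the interrupt returns.
 * The queued functions are the bottom halves. */
void run_work_queue(void)
{
    while (work_queue) {
        struct work *w = work_queue;
        work_queue = w->next;
        w->func(w->arg);
    }
}

/* Demo bottom half: a slave-receive routine that consumes one byte. */
static int received;
static void slave_receive(void *arg) { received += *(int *)arg; }

int demo_slave_receive(void)
{
    int byte = 7;
    struct work w = { slave_receive, &byte, NULL };
    sched_work(&w);    /* top half: just queue the work  */
    run_work_queue();  /* later: the bottom half runs    */
    return received;
}
```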



SPI Bus Controller Driver

This is a completely new driver type used by Nut/OS. While normal device drivers offer a few functions similar to the C runtime library I/O, the bus controller driver offers its services to other device drivers, which control a specific device attached to a bus. As explained above, the bus controller deals with the SPI hardware, removing this burden from the SPI device driver. However, only parts of the controller driver are platform-dependent. The file /dev/spibus.c contains those parts which are not. Besides some optional routines, it contains the function NutRegisterSpiDevice, which updates the NUTSPINODE structure and then calls the bus controller initialization and the device initialization. We will look at all of this one after the other.

Let's first return to the NUTSPIBUS structure. It mainly contains a number of function pointers, to be used by the device driver to communicate with the hardware chip that is connected to the bus. A certain sequence has to be followed here. First, the device driver must allocate the bus, to make sure that no other driver will use it concurrently. If successful, it can call the bus transfer routine, specifying a read buffer, a write buffer and the number of bytes to transfer. This call may be repeated, and finally the driver must release the bus. Access to the SPI controller hardware, as well as bus arbitration, is implemented in the bus controller driver.

SPI Bus Device Driver

In general an SPI bus device driver is designed like any other Nut/OS device driver, providing functions for open, read, write, close and ioctl. However, the new structure NUTSPINODE has been defined. It contains several items which let the device driver and the bus controller driver communicate with each other. Some items of the NUTSPINODE structure are statically set, like the SPI mode, the SPI data rate and so on. Others, like the chip select or the pointer to the NUTSPIBUS structure, are not; these items are set when calling NutRegisterSpiDevice. Note that the statically set values are initial values only and may change during runtime. The NUTSPINODE structure is actually part of the NUTDEVICE structure. Without this structure, which is attached to the interface control block pointer dev_icb of NUTDEVICE, the driver is not ready for the new framework. In many drivers this pointer is not used and is set to NULL; network drivers use it to store some network-specific values.
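The allocate, transfer, release sequence can be sketched in C. The structures below loosely mirror the NUTSPIBUS idea of a function-pointer table but are simplified inventions, backed by a loopback "bus" instead of real SPI hardware:

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-in for NUTSPIBUS: a table of function pointers the
 * device driver uses to talk to the bus controller. */
struct spibus {
    int (*alloc)(struct spibus *bus);            /* claim the bus     */
    int (*transfer)(struct spibus *bus, const uint8_t *tx,
                    uint8_t *rx, size_t len);    /* full-duplex xfer  */
    int (*release)(struct spibus *bus);          /* free the bus      */
    int busy;
};

static int lb_alloc(struct spibus *b)
{
    if (b->busy)        /* another driver holds the bus */
        return -1;
    b->busy = 1;
    return 0;
}

static int lb_release(struct spibus *b) { b->busy = 0; return 0; }

static int lb_transfer(struct spibus *b, const uint8_t *tx,
                       uint8_t *rx, size_t n)
{
    size_t i;
    if (!b->busy)       /* must allocate the bus first */
        return -1;
    for (i = 0; i < n; i++)
        rx[i] = tx[i];  /* loopback: MISO wired to MOSI */
    return 0;
}

/* A device-driver transfer cycle following the required sequence:
 * allocate, transfer (possibly repeated), release. */
int spi_device_xfer(struct spibus *bus, const uint8_t *tx,
                    uint8_t *rx, size_t n)
{
    if (bus->alloc(bus))
        return -1;
    int rc = bus->transfer(bus, tx, rx, n);
    bus->release(bus);
    return rc;
}
```

In the real framework the bus pointer is reached through the NUTSPINODE attached to dev_icb, and arbitration blocks rather than fails when the bus is busy; the sketch only shows the required call ordering.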
