CYCLE-II
CASE STUDIES
Event-driven designs switch tasks only when an event of higher priority needs servicing; this is called preemptive priority, or priority, scheduling. Time-sharing designs switch tasks on a regular clock interrupt as well as on events; this is called round robin. Time-sharing designs switch tasks more often than strictly needed, but give smoother multitasking, giving the illusion that a process or user has sole use of a machine. Early CPU designs needed many cycles to switch tasks, during which the CPU could do nothing else useful. For example, with a 20 MHz 68000 processor (typical of the late 1980s), task switch times are roughly 20 microseconds. (In contrast, a 100 MHz ARM CPU (from 2008) switches in less than 3 microseconds.) Because of this, early OSes tried to minimize wasted CPU time by avoiding unnecessary task switching. In typical designs, a task has three states:
1. Running (executing on the CPU);
2. Ready (ready to be executed);
3. Blocked (waiting for an event, I/O for example).
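The three states and the two switching policies can be made concrete with a small sketch. It assumes a fixed table of tasks, treats a larger number as a higher priority, and uses a hypothetical pick_next() helper that is not taken from any particular RTOS: blocked tasks are skipped, the highest-priority ready task wins, and ties are broken round robin by starting the search just after the current task.

```c
/* Hypothetical task table and scheduler helper; not any real RTOS API. */

/* The three task states described above. */
typedef enum { TASK_RUNNING, TASK_READY, TASK_BLOCKED } task_state;

typedef struct {
    const char *name;
    int priority;          /* assumption: larger number = higher priority */
    task_state state;
} task;

/* Return the index of the task that should run next, or -1 if every task
 * is blocked (the CPU would then run an idle loop).
 * 'current' is the index of the task now running, or -1 if none.
 * - Blocked tasks are never chosen.
 * - The highest priority always wins (preemptive priority scheduling).
 * - The scan starts just after 'current' and visits 'current' last, and a
 *   tie never replaces the earlier candidate, so equal-priority tasks
 *   take turns (round robin). */
int pick_next(const task *tasks, int n, int current)
{
    int best = -1;
    for (int k = 1; k <= n; k++) {
        int i = (current + k) % n;          /* circular scan */
        if (tasks[i].state == TASK_BLOCKED)
            continue;
        if (best < 0 || tasks[i].priority > tasks[best].priority)
            best = i;
    }
    return best;
}
```

An event-driven kernel would call pick_next() when an event makes a task ready; a time-sharing kernel would also call it on every clock tick, which is what rotates tasks of equal priority.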
STRUCTURE OF RTOS
REAL-TIME SYSTEM COMPONENTS
To build a predictable system, all of its components, hardware and software, together with a good design, must contribute to this predictability. Having both good hardware and a good RTOS is a necessary but not sufficient requirement for building a correct real-time system: a badly designed system built from excellent hardware and software building blocks may still lead to disaster. This document deals with the RTOS building block. In general, a good RTOS can be defined as one that has bounded (predictable) behavior under all system load scenarios.
REAL-TIME SYSTEM TYPES
An embedded system does not necessarily need predictable behavior, and in that case it is not a real-time system. However, a quick overview of possible embedded systems shows that the need for some predictable behavior arises quickly, so most embedded systems need to be real-time for at least some of their functionality. In a well-designed real-time system, every individual deadline should be met. With the current state of the art, however, this requirement is sometimes hard and costly to achieve, so different types of real-time systems have been defined.
ELECTRONICS AND COMMUNICATION ENGINEERING
HARD REAL-TIME: Missing an individual deadline results in catastrophic failure of the system (and people will hopefully invest sufficient money in the project to avoid this catastrophic failure). It also means that the cost of a failure is very high.
FIRM REAL-TIME: Missing a deadline entails an unacceptable quality reduction. Technically there is no difference from hard real-time, but economically the disaster risk and associated cost are limited compared with the previous case.
SOFT REAL-TIME: Deadlines may be missed and can be recovered from. The reduction in system quality and performance is acceptable and introduces no cost other than a technical or feature cost.
NON REAL-TIME: No deadlines have to be met. These systems are defined in terms of average performance.
ECONOMICAL REAL-TIME: We recently introduced this notion because in practice it is impossible to design a system that will never miss any deadline. The real question is therefore what price, or economic damage, one is willing to pay for a missed deadline that introduces a safety hazard or a reduction in quality of service.
RTOS KERNEL
RTOSes are quite large and complex pieces of software that provide a wide variety of services, which together form an abstraction layer that allows embedded application programmers to do many
RTOS KERNEL: Memory
Memory is at a premium in the environments where an RTOS runs.
Supports virtual memory (MMU) and memory protection (MPU) models.
User-space and kernel-space memory.
User-space programs interact with the kernel to obtain services; the kernel can also act as a central pool of memory for specialized applications.
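A common way to give applications a central memory pool with bounded allocation time is a fixed-size block pool. The sketch below is illustrative only: the pool layout, the block size and the function names pool_init, pool_alloc and pool_free are assumptions, not a specific RTOS's API. Because every block has the same size, allocation and release are constant-time list operations and the pool cannot fragment.

```c
#include <stddef.h>

/* Hypothetical fixed-block memory pool. Sizes are illustrative. */
#define BLOCK_SIZE 32   /* bytes per block */
#define NUM_BLOCKS 8    /* blocks in the pool */

typedef union block {
    union block *next;              /* link used only while the block is free */
    unsigned char data[BLOCK_SIZE]; /* payload while the block is in use */
} block;

typedef struct {
    block storage[NUM_BLOCKS];
    block *free_list;               /* singly linked list of free blocks */
} pool;

/* Thread every block onto the free list (done once at start-up). */
void pool_init(pool *p)
{
    p->free_list = NULL;
    for (int i = NUM_BLOCKS - 1; i >= 0; i--) {
        p->storage[i].next = p->free_list;
        p->free_list = &p->storage[i];
    }
}

/* Pop the head of the free list: O(1), returns NULL when exhausted. */
void *pool_alloc(pool *p)
{
    block *b = p->free_list;
    if (b != NULL)
        p->free_list = b->next;
    return b;
}

/* Push a block back onto the head of the free list: O(1). */
void pool_free(pool *p, void *ptr)
{
    block *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}
```

The trade-off is internal waste: a request smaller than BLOCK_SIZE still consumes a whole block, which is why systems often keep several pools of different block sizes.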
RTOS KERNEL: Timers
A timer is a software entity derived from a hardware clock.
Timers provide a mechanism to introduce task delays, to help synchronize tasks and, of course, to keep track of time.
Watchdog timers, programmable timers
Based on these hardware-programmable timers, the RTOS kernel can create software timer structures associated with tasks.
Uses: scheduling, synchronization, time-stamping
XXX_SetTimer
XXX_AddtoTimerQueue
XXX_isExpired
XXX_RunAtExpiry
XXX_PurgeTimerQueue
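The timer operations listed above can be sketched as a sorted software timer queue driven by a periodic hardware tick. This is an illustrative sketch only: timer_set, timer_is_expired, timer_tick and timer_purge are hypothetical stand-ins for XXX_SetTimer/XXX_AddtoTimerQueue, XXX_isExpired, XXX_RunAtExpiry and XXX_PurgeTimerQueue, and wrap-around of the tick counter is ignored for brevity.

```c
#include <stddef.h>

/* Hypothetical software timers built on a periodic hardware tick. In a
 * real kernel timer_tick() runs from the tick interrupt, so the queue
 * would need interrupt locking; tick-counter wrap-around is ignored. */
typedef struct sw_timer {
    unsigned long expiry;             /* absolute tick at which it fires */
    void (*callback)(void *arg);      /* work to run at expiry */
    void *arg;
    struct sw_timer *next;
} sw_timer;

static sw_timer *timer_queue = NULL;  /* sorted by expiry, earliest first */
static unsigned long now_ticks = 0;   /* ticks since start-up */

/* SetTimer + AddtoTimerQueue: arm 't' to fire 'delay' ticks from now. */
void timer_set(sw_timer *t, unsigned long delay, void (*cb)(void *), void *arg)
{
    t->expiry = now_ticks + delay;
    t->callback = cb;
    t->arg = arg;
    sw_timer **pp = &timer_queue;
    while (*pp != NULL && (*pp)->expiry <= t->expiry)
        pp = &(*pp)->next;            /* equal expiries keep FIFO order */
    t->next = *pp;
    *pp = t;
}

/* isExpired */
int timer_is_expired(const sw_timer *t)
{
    return now_ticks >= t->expiry;
}

/* Tick handler: advance time, then RunAtExpiry for every due timer.
 * Because the queue is sorted, the scan stops at the first pending one. */
void timer_tick(void)
{
    now_ticks++;
    while (timer_queue != NULL && timer_is_expired(timer_queue)) {
        sw_timer *t = timer_queue;
        timer_queue = t->next;        /* dequeue before the callback runs */
        t->callback(t->arg);
    }
}

/* PurgeTimerQueue: discard all pending timers without running them. */
void timer_purge(void)
{
    timer_queue = NULL;
}
```

Keeping the queue sorted makes each tick cheap (only the head is inspected) at the cost of a linear search when a timer is armed.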
RTOS KERNEL: I/O
I/O is slow compared with the CPU.
I/O methods: interrupt-driven, polling, DMA.
I/O map: memory space and I/O space.
XXX_IORead/IOWrite
XXX_IOMap/Unmap
XXX_BindInterrupt
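Interrupt-driven I/O usually pairs a short interrupt service routine with a buffer that a task drains later, so the CPU is not tied up polling a slow device. This is a minimal single-producer/single-consumer ring buffer sketch; rx_isr_put() stands in for a hypothetical receive ISR, and on real hardware the byte would come from a device data register rather than a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-producer/single-consumer receive buffer. The
 * receive interrupt is the only writer of 'head' and the consuming task
 * the only writer of 'tail'. 'volatile' suits a single-core MCU; a
 * multi-core part would also need atomic operations or memory barriers. */
#define RING_SIZE 16u   /* power of two so an index can be masked */

typedef struct {
    uint8_t buf[RING_SIZE];
    volatile uint32_t head;   /* total bytes written by the ISR */
    volatile uint32_t tail;   /* total bytes read by the task */
} ring;

/* Called from the (hypothetical) RX interrupt. */
bool rx_isr_put(ring *r, uint8_t byte)
{
    if (r->head - r->tail == RING_SIZE)
        return false;                          /* full: byte is dropped */
    r->buf[r->head & (RING_SIZE - 1u)] = byte;
    r->head++;                                 /* publish after storing */
    return true;
}

/* Task side: returns false when nothing is waiting, so the task can block
 * on a semaphore signalled by the ISR instead of polling the device. */
bool task_get(ring *r, uint8_t *out)
{
    if (r->head == r->tail)
        return false;                          /* empty */
    *out = r->buf[r->tail & (RING_SIZE - 1u)];
    r->tail++;
    return true;
}
```

Using free-running counters and masking (rather than wrapping the indices) lets "full" and "empty" be told apart without wasting a slot.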
RTOS KERNEL: Inter-process Communication
Most of the time, tasks cannot run in isolation; they need to communicate with each other.
Synchronization, protection and sharing are the goals of IPC.
Semaphores (binary, mutual exclusion)
Message queues
Pipes/named pipes
Shared memory
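Message queues combine two of the IPC goals listed above: protection (only one task manipulates the queue at a time) and synchronization (a receiver waits until a message arrives). This is a minimal bounded-queue sketch that uses POSIX threads as a stand-in for RTOS primitives; msg_queue, queue_init, queue_send and queue_receive are hypothetical names, not an RTOS API and not the POSIX mq_* message-queue functions.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical bounded message queue on POSIX threads. Senders block
 * while the queue is full; receivers block while it is empty. */
#define QUEUE_CAPACITY 4

typedef struct {
    int msgs[QUEUE_CAPACITY];
    int head;                     /* index of the oldest message */
    int count;                    /* messages currently stored */
    pthread_mutex_t lock;         /* protection: one task inside at a time */
    pthread_cond_t not_empty;     /* synchronization: wait for a message */
    pthread_cond_t not_full;      /* synchronization: wait for free space */
} msg_queue;

void queue_init(msg_queue *q)
{
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

void queue_send(msg_queue *q, int msg)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAPACITY)          /* re-check after wake-up */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->msgs[(q->head + q->count) % QUEUE_CAPACITY] = msg;
    q->count++;
    pthread_cond_signal(&q->not_empty);         /* wake one receiver */
    pthread_mutex_unlock(&q->lock);
}

int queue_receive(msg_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int msg = q->msgs[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);          /* wake one sender */
    pthread_mutex_unlock(&q->lock);
    return msg;
}
```

Messages are delivered in FIFO order; an RTOS queue would typically add time-outs and possibly priority ordering on top of this basic pattern.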
A host controller driver enables the system to accept a particular type of device. Client drivers are device-specific. The protocol layer converts a device request into a form that the corresponding host controller can understand, through the host controller's driver.
Expectations from RTOS
Deadline-driven
Work with a dearth of resources
VxWORKS
VxWorks was developed from an older operating system, VRTX. VRTX did not function properly as a complete operating system, so Wind River acquired the rights to resell VRTX and developed it into their own workable operating system, VxWorks (some say the name means "VRTX now works"). Wind River later developed a new kernel for VxWorks and replaced the VRTX kernel with it. This enabled VxWorks to become one of the leading operating systems for real-time applications.
Memory Management in VxWorks
The VxWorks memory management system does not use swapping or paging, because the system allocates memory within the physical address space and has no need to swap data in and out of that space to cope with memory constraints. VxWorks assumes that there is enough physical memory available to operate its kernel and the applications that will run on the operating system.
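VxWorks's malloc() uses a first-fit search, described in more detail in the next paragraph. The sketch below is not VxWorks code: it models a fixed region of "physical" memory with a side table of segments, and the names heap_init, ff_alloc and ff_free are hypothetical. Merging adjacent free segments on release is a design choice of this sketch. It shows how first fit takes the lowest-addressed hole that is large enough, and how holes can leave enough total free memory but no single block big enough.

```c
/* Hypothetical first-fit allocator over a fixed region (no paging, no
 * swapping). Sizes are abstract units; a real allocator stores headers
 * inside the heap itself, here a side table keeps the sketch readable. */
#define HEAP_UNITS 100
#define MAX_SEGS   32

typedef struct { int start, size, free; } seg;

static seg segs[MAX_SEGS];   /* segments kept in address order */
static int nsegs;

void heap_init(void)
{
    segs[0] = (seg){0, HEAP_UNITS, 1};
    nsegs = 1;
}

/* Return the start offset, or -1 if no single free segment is big enough. */
int ff_alloc(int size)
{
    for (int i = 0; i < nsegs; i++) {
        if (!segs[i].free || segs[i].size < size)
            continue;                        /* first fit: keep scanning */
        if (segs[i].size > size && nsegs < MAX_SEGS) {
            /* split: shift later segments up, leave the remainder free */
            for (int j = nsegs; j > i + 1; j--)
                segs[j] = segs[j - 1];
            segs[i + 1] = (seg){segs[i].start + size, segs[i].size - size, 1};
            segs[i].size = size;
            nsegs++;
        }
        segs[i].free = 0;
        return segs[i].start;
    }
    return -1;
}

/* Release the segment starting at 'start' and merge free neighbours. */
void ff_free(int start)
{
    for (int i = 0; i < nsegs; i++) {
        if (segs[i].start != start)
            continue;
        segs[i].free = 1;
        if (i + 1 < nsegs && segs[i + 1].free) {      /* merge with next */
            segs[i].size += segs[i + 1].size;
            for (int j = i + 1; j < nsegs - 1; j++)
                segs[j] = segs[j + 1];
            nsegs--;
        }
        if (i > 0 && segs[i - 1].free) {              /* merge with previous */
            segs[i - 1].size += segs[i].size;
            for (int j = i; j < nsegs - 1; j++)
                segs[j] = segs[j + 1];
            nsegs--;
        }
        return;
    }
}
```

For example, after five 20-unit allocations fill the 100-unit heap, freeing the second and fourth leaves 40 units free, yet a 30-unit request fails because no single hole is larger than 20 units.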
In most cases the VxWorks memory model is flat, and the operating system works within the limits of the available physical memory. When memory needs to be allocated, the dynamic memory management routines use the ANSI C function malloc(), which uses a first-fit algorithm: it searches for the first free block large enough for the request, splits off the amount required, and leaves the rest free. When the memory is no longer needed, the ANSI C function free() releases it. The first-fit method can fragment memory very quickly: if many differently sized tasks are loaded into memory between many free spaces, there may be no single block large enough when the next allocation request arrives, even though enough memory is free in total. Searching through many fragments also makes allocation slower. To reduce fragmentation, it is advisable to allocate the dynamic partitions at system initialization, sized according to the tasks and their data. Another way to reduce fragmentation of the system memory as a whole is to divide the memory into two or more partitions at start-up and have the offending tasks use the other partitions.
Processes in VxWorks
A feature of a real-time operating system like VxWorks is that the programmer developing applications for it can control process scheduling. This means that the programmer, rather than the operating system, decides how and when the processes run. Compared with other operating systems, a VxWorks task is similar to a thread, except that VxWorks has a single process in which all the threads run. In an RTOS, these tasks must endeavor to
[Figure: task priority diagram; labels include Low Priority, Medium Priority, Bus Management and Meteorological]
Power Management
Operating systems have to manage the power of the resources within the system on which they operate. The operating system controls all of the devices in the system and therefore has to decide which ones need to be switched on and operating, and which ones are not currently needed and should be sleeping or switched off. The operating system may shut down a device that will be needed again soon, causing a delay while the device restarts; conversely, a device may be kept on too long and waste power. The only way around this issue is to use algorithms that let the operating system decide appropriately which device to shut down and under what conditions it should be powered up again. Examples of such management are slowing disks down during periods of inactivity and reducing CPU power during idle periods. This is called dynamic power management. The problem now, however, is that processors have become very power efficient and are therefore unlikely to be the primary consumers of energy in systems. This means that dynamic power management is becoming outdated and a new method of power management will need to be
Scheduling
Most tasks are blocked or ready most of the time, because generally only one task can run at a time per CPU. The number of items in the ready queue can vary greatly, depending on the number of tasks
Integrity
The INTEGRITY RTOS was designed so that embedded developers could ensure their applications met the highest possible requirements for security, reliability, and performance. To achieve this, INTEGRITY uses hardware memory protection to isolate and protect embedded applications. Secure partitions guarantee each task the resources it needs to run correctly and fully protect the operating system and user tasks from errant and malicious code, including denial-of-service attacks, worms, and Trojan horses. Unlike other memory-protected operating systems, INTEGRITY never sacrifices real-time performance for security and protection.
Nucleus
Nucleus OS is a real-time operating system (RTOS) and toolset created by the Embedded Systems Division of Mentor Graphics for various central processing unit (CPU) platforms. Nucleus OS is an embedded software solution and is in an estimated 2.11 billion devices worldwide. Development is typically done on a host computer running Windows or Linux. Applications are compiled to run on various target CPU architectures and tested on the actual target boards or in a simulation environment. The Nucleus RTOS is designed for embedded systems applications including consumer electronics, set-top boxes, cellular phones, and other portable and handheld devices. For limited-memory systems, Nucleus RTOS can be scaled down to a memory footprint as small as 13 KB for both code and data.
ThreadX
ThreadX, developed and marketed by Express Logic, Inc. of San Diego, California, USA, is a real-time operating system (RTOS). Similar RTOSes are available from other vendors, such as VxWorks, Nucleus RTOS, OSE, QNX, LynxOS, etc. The author of ThreadX (as well as Nucleus) is William Lamie, who is the President and CEO of Express Logic, Inc. The name ThreadX is derived from the fact that threads are used as the executable modules and the letter X represents switching, i.e., it switches threads. ThreadX can be seen as the "QThreads" of SystemC implemented in a preemptive fashion. Like most RTOSes, ThreadX uses a multitasking kernel with preemptive scheduling, fast interrupt response, memory management, inter-thread communication, mutual exclusion, event notification, and thread synchronization features. Major distinguishing characteristics of ThreadX include priority inheritance, preemption-threshold, efficient timer management, picokernel design, event-chaining, fast software timers, and compact size. ThreadX is distributed using a marketing model in which source code is provided and licenses are royalty-free.
ThreadX is generally used in real-time embedded systems, especially deeply embedded systems. Development with ThreadX is usually done on a host machine running Linux or Microsoft Windows, cross-compiling the target software to run on various target processor architectures. Several ThreadX-aware development tools are available, such as Wind River Workbench, ARM RealView, Green Hills Software's MULTI, Metrowerks CodeWarrior, IAR C-SPY, Lauterbach TRACE32, and visionCLICK. Hewlett-Packard has recently licensed ThreadX for all of its Inkjet, LaserJet and all-in-one devices; earlier it used LynxOS for multifunction LaserJet printers, and many printers still use LynxOS. ThreadX is widely used in a variety of consumer electronics, medical devices, data networking applications, and SoC development.
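Preemption-threshold is one of the ThreadX features listed above. In ThreadX a numerically lower priority value means a higher priority, and, to our understanding, a thread's preemption threshold (set through the preempt_threshold argument of tx_thread_create()) acts as a ceiling: only threads whose priority is higher than that ceiling may preempt it. The check below is a hypothetical model of that rule, not ThreadX source code; thread_params and can_preempt are invented names.

```c
#include <stdbool.h>

/* Hypothetical model of the preemption-threshold rule. Convention as in
 * ThreadX: a numerically LOWER value is a HIGHER priority (0 is highest).
 * The threshold T of a running thread with priority P satisfies T <= P. */
typedef struct {
    unsigned priority;
    unsigned preempt_threshold;
} thread_params;

/* A ready thread may preempt only if its priority is strictly higher
 * (numerically lower) than the running thread's threshold. With
 * threshold == priority this is ordinary preemptive priority scheduling. */
bool can_preempt(const thread_params *running, unsigned candidate_priority)
{
    return candidate_priority < running->preempt_threshold;
}
```

For a running thread with priority 20 and threshold 15, a ready thread of priority 18 would preempt under plain priority scheduling but is held off here, while priority 14 still preempts. This lets a group of related threads avoid preempting one another, which reduces context switches.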
VxWorks
VxWorks is a real-time operating system developed as proprietary software by Wind River Systems of Alameda, California, USA. First released in 1987, VxWorks is designed for use in embedded systems. Its memory management system does not use swapping or paging, because the system allocates memory within the physical address space and has no need to swap data in and out of that space to cope with memory constraints. VxWorks assumes that there is enough physical memory available to operate its kernel and the applications that will run on the operating system. Therefore VxWorks does not have a directly supported virtual memory system.
, and pthread_attr_setcpu_np(3), which are used to get and set general attributes for the scheduling parameters and the CPUs on which the thread is intended to run. The ID of the newly created thread is stored in the location pointed to by ``thread''. The function pointed to by start_routine is taken to be the thread code. It is passed the ``arg'' argument. To cancel a thread, use the POSIX function:
    int pthread_cancel(pthread_t thread);
Time Facilities
RTLinux provides several clocks that can be used for timing functionality, such as referencing for thread scheduling and obtaining timestamps. Here is the general timing API:
    #include <rtl_time.h>
    int clock_gettime(clockid_t clock_id, struct timespec *ts);
    hrtime_t clock_gethrtime(clockid_t clock);
    struct timespec {
        time_t tv_sec;   /* seconds */
        long tv_nsec;    /* nanoseconds */
    };
To obtain the current clock reading, use the clock_gettime(3) function, where clock_id is the clock to be read and ts is a structure that stores the value obtained.
A Simple ``Hello World'' RTLinux Program
We'll now write a small program that uses all of the API that we've learned thus far. This program will execute two times per second, and during each iteration it will print the message:
    I'm here, my arg is 0
Code Listing
Save the following code under the filename hello.c:
    #include <rtl.h>
    #include <time.h>
2. Locate and copy the rtl.mk file. The rtl.mk file is an include file which contains all the flags needed to compile our code. For simplicity, we'll copy it from the RTLinux source tree and place it alongside of our hello.c file.
3. Insert the module into the running RTLinux kernel. The resulting object binary must be ``plugged in'' to the kernel, where it will be executed by RTLinux.
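The RTLinux program itself is a kernel module, built with rtl.mk and inserted into the running RTLinux kernel as the steps above describe, so it cannot be reproduced or tested here. As a rough user-space comparison only, the same "run twice per second" pattern can be written with standard POSIX calls: a thread reads CLOCK_MONOTONIC with clock_gettime() and sleeps until absolute deadlines with clock_nanosleep() and TIMER_ABSTIME. The periodic_job structure and run_periodic() helper are hypothetical, and an ordinary Linux thread gets no hard real-time guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* User-space POSIX comparison only: NOT the RTLinux program and not hard
 * real-time. periodic_job and run_periodic() are hypothetical names. */
typedef struct {
    int arg;          /* value printed each period, as in the message above */
    int iterations;   /* bounded so the example terminates */
    long period_ns;   /* 500000000 gives two runs per second */
    int completed;    /* periods actually executed */
} periodic_job;

/* Advance an absolute time by 'ns' nanoseconds, keeping tv_nsec normalised. */
static void add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec++;
    }
}

static void *start_routine(void *p)
{
    periodic_job *job = p;
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < job->iterations; i++) {
        add_ns(&next, job->period_ns);
        /* Sleeping until an absolute deadline (not for a relative delay)
         * stops the loop's own execution time from shifting later periods. */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        printf("I'm here, my arg is %d\n", job->arg);
        job->completed++;
    }
    return NULL;
}

/* Run the job in its own thread and wait for it; returns 0 on success. */
int run_periodic(periodic_job *job)
{
    pthread_t thread;
    if (pthread_create(&thread, NULL, start_routine, job) != 0)
        return -1;
    return pthread_join(thread, NULL);
}
```

Calling run_periodic() with periodic_job job = {0, 4, 500000000L, 0} prints "I'm here, my arg is 0" four times over roughly two seconds.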
The Advanced API: Getting More Out of Your RTLinux Modules
RTLinux has a rich assortment of functions which can be used to solve most real-time application problems. This chapter describes some of the more advanced concepts.
    #include <mbuff.h>
    void *mbuff_alloc(const char *name, int size);
    void mbuff_free(const char *name, void *mbuf);
The first time mbuff_alloc is called with a given name, a shared memory block of the specified size is allocated. The reference count for this block is set to 1. On success, the pointer to the newly allocated block is returned. NULL is returned on failure. If the block with the specified name already exists, this function returns a pointer that can be used to access this block and increases the reference count.
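The mbuff semantics above (allocate on the first use of a name, then share the block and count references) can be illustrated with an in-process registry. This is a hypothetical sketch, not the mbuff implementation: shared_alloc and shared_free are invented names, the memory is shared only among threads of one process rather than between RTLinux and Linux user space, and freeing the block when the count returns to zero is an assumed release rule, since the text does not describe mbuff_free() in detail.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical name-keyed, reference-counted buffer registry. */
#define MAX_NAMED 8

typedef struct {
    char name[32];
    void *mem;
    int size;
    int refs;       /* 0 means the slot is unused */
} named_block;

static named_block table[MAX_NAMED];

/* First call with a name allocates (zeroed) memory with refs = 1; later
 * calls return the same block and increment refs ('size' is then ignored).
 * Returns NULL on failure. */
void *shared_alloc(const char *name, int size)
{
    int free_slot = -1;
    for (int i = 0; i < MAX_NAMED; i++) {
        if (table[i].refs > 0 && strcmp(table[i].name, name) == 0) {
            table[i].refs++;                 /* existing block: share it */
            return table[i].mem;
        }
        if (table[i].refs == 0 && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0 || strlen(name) >= sizeof table[0].name)
        return NULL;
    void *mem = calloc(1, (size_t)size);
    if (mem == NULL)
        return NULL;
    named_block *b = &table[free_slot];
    strcpy(b->name, name);
    b->mem = mem;
    b->size = size;
    b->refs = 1;
    return mem;
}

/* Drop one reference; the memory is released when the count reaches zero
 * (assumed behaviour). */
void shared_free(const char *name, void *mem)
{
    for (int i = 0; i < MAX_NAMED; i++) {
        if (table[i].refs > 0 && table[i].mem == mem &&
            strcmp(table[i].name, name) == 0) {
            if (--table[i].refs == 0) {
                free(table[i].mem);
                table[i].mem = NULL;
            }
            return;
        }
    }
}
```

Keying the block by name lets independent parts of a program find the same buffer without passing pointers around, which is the property the text describes for mbuff_alloc().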
The scheduler is implemented in the scheduler/rtl_sched.c file. The scheduler's architecture-dependent files are located in include/arch-i386 and scheduler/i386.
The scheduling decision is taken in the rtl_schedule() function, so by modifying this function it is possible to change the scheduling policy. Further questions in this area may be addressed directly to the FSM Labs crew.
Conclusions
When choosing a real-time operating system, it is important to know what features you need. Our comparison of FreeRTOS and RTLinux has shown that two real-time operating systems can in fact be very dissimilar. One of the biggest differences between FreeRTOS and RTLinux is their size. One thing that may need to be considered is which platforms are available for the project. The platform choice is often directly related to the processing power needed or to restrictions on physical size. FreeRTOS supports many small processors and microcontrollers. If a microcontroller can do the job at hand, then FreeRTOS is probably the way to go. On the other hand, if much processing power is needed, the platform of choice is probably something larger, like an x86 system, and then RTLinux is a good choice.
Besides their platform support, the two operating systems are very different in the features they have. FreeRTOS is very simple; though that can be considered a bad thing, a small project can probably benefit from the simplicity. Large projects, however, will profit from the extra functions in RTLinux. Many real-time projects include one part that is time-critical and one that isn't. If the non-time-critical part is a relatively large part of the project, then RTLinux has much to offer. If a network connection, a graphical interface or something else fairly complex is needed, RTLinux definitely has the upper hand.
One last thing to consider is the economics. In a commercial project, the option of using the GPL license doesn't always exist. In this case it is worth noting that FreeRTOS can be used in a commercial project without paying royalties, while RTLinux can't.
The schedulers in the two systems are very different: FreeRTOS has a smaller and simpler scheduler, while RTLinux has a bigger and more complex one. FreeRTOS's scheduler works well with a limited number of predefined tasks (an embedded system), but it, or you, will probably run into trouble when trying to do things close to the system limit, because there is no more advanced scheduling policy than highest-priority-first. Both systems might run into problems when the idle process gets starved; these are not necessarily problems with the real-time task, but problems with things like logging and control. Especially if RTLinux is used for data pro
C6412Compact
C6713CPU
Development Tools & Software
Micro-line embedded DSP hardware is complemented by a fully integrated software framework for developing application software. This provides the means to develop and debug software for the TMS320C6000 DSP processor and to load the final application code into the on-board FLASH ROM, allowing the DSP to boot autonomously with the application software. Developers using the FireWire capabilities of micro-line will have access to the TMS320C6000-based FireWire API and can optionally select a Windows-based Host API for integrating the micro-line hardware with a host PC.
Texas Instruments Code Composer Studio: IDE with advanced C code generation tools, JTAG-based C-source debugging and preemptive multitasking support for TMS320C6000 DSPs
USB JTAG in-circuit emulator
Micro-line FLASH filesystem and supporting host-PC-resident software tools for downloading and managing executable software via an RS-232 link
micro-line FireWire API for embedded micro-line DSPs
Unibrain FireAPI host computer software interface for integrating micro-line hardware with host computers
Timer and periodic function management
Memory management
On-the-fly run-time analysis using the RTDX (Real-Time Data eXchange) interface
PMP1000 - Texas Instruments C6414 72 GIPS Parallel DSP Board
The PMP1000 is a parallel processing DSP board that features industry-leading processing power and data throughput capabilities. The processing power is provided by multiple Texas Instruments' highest-performance TMS320C6414 DSPs, and the high data throughput is provided by the internal mechanization of the PMP1000, which uses multiple high-speed buses combined with cross-point port switching to connect "anything to anything" at full parallel bus bandwidth. Because of the throughput and processing power of the PMP1000, many processes that were previously performed off-line can now be performed in real time. The unique PMP1000 architecture incorporates a master DSP called the Program Execution Processor (PEP) and options of four or eight slave DSPs. In the PMP1000 mechanization, all slave DSPs execute program threads managed by the PEP. The slave DSPs are mounted four to a daughter card called a "Quad DSP Array" or QDA. Each of these slave DSPs has 64 megabytes of external SDRAM (mapped into each of the DSPs) in addition to more than one megabyte of internal RAM. Operating at a clock rate of 1 GHz, each DSP can execute up to 8 instructions per clock cycle, for a peak processing performance of 8 GIPS per DSP (1 GHz x 8 instructions per cycle) or 72 GIPS for the entire board (nine DSPs at 8 GIPS each: the eight slaves plus the PEP).
Real-Time Parallel DSP Software Development System
Writing software to run on parallel processors has traditionally been a complex undertaking. The DSP Software Development System is designed to make it as simple as possible to create an executable application. The DSP Software Integrator provides unique development and optimization environments with innovative tools for creating and debugging an application. Despite the high-performance nature of the hardware configuration of Signatec's PMP1000 parallel DSP board, it lends itself surprisingly well to a software mechanization that is unique, easy to use, and efficient, especially for a parallel processing system. The PMP1000 consists of 8 DSPs (or 4 when using a single QDA module) that process program threads as directed, and a master DSP designated as the Program Execution Processor (PEP). The primary tasks of the PEP are to distribute the threads dynamically and to manage all data flow. The PEP executes main, which consists of processor function calls (program threads) and data transfer functions. Signatec uses the term thread to be consistent with generally accepted usage. In the PMP1000 world, a thread is a C function. In some applications, this function will be the entire computational process to be applied to a data set.
Software Integrator
The PMP1000 Software Integrator ties together a number of software components from Signatec and Texas Instruments. It provides a true Windows interface for the TI tools and supplies a quick link to the text editing and compiler/linker facilities of a C program-editing environment. The Integrator
Four-core MSC8154 Processor and Development Support
Freescale Semiconductor introduced the MSC8154 processor, a four-core version of Freescale's award-winning, high-performance MSC8156 digital signal processor (DSP). The MSC8154 is a single device that features four fully programmable 1 GHz DSP cores, delivering 4 GHz of aggregate DSP processing power, plus the innovative MAPLE-B multi-standard baseband-specific accelerator. This combination, delivered in a single system-on-chip, is ideal for cost-optimized systems. The industry-leading SC3850 DSP core enables the MSC8154 to deliver up to 32 GMACS of 16-bit performance. The processor features high-speed standard interfaces (dual sRIO, dual SGMII, dual DDR3 and PCI Express) and large embedded multilevel memory with high-speed DDR interfaces.
The Freescale MSC8154AMC reference development system is a high-density, single-width, full-height DSP platform based around three MSC8154 DSPs. With a high level of performance and integration, the MSC8154AMC is an ideal enablement platform for customers and third parties who are developing and debugging solutions for next-generation wireless standards such as 3G-LTE, WiMAX, HSDPA+ and TDD-LTE. Each MSC8154 has 1 GB of associated 64-bit-wide DDR3 memory split into two banks. High-throughput RapidIO links connect the MSC8154 processors to each other and to the data backplane; the RapidIO interfaces interconnect via the IDT CPS10Q sRIO switch. For the control/data plane, each of the two RGMII gigabit Ethernet ports links to the backplane ports and a front-panel port through an Ethernet switch. A Module Management Controller (MMC) performs hot swapping and board control. Freescale also offers a full set of development tools and enablement software for the MSC8154 and MSC8156 DSPs, including a highly optimized 3G-LTE software library.
Freescale's MSC8156 DSP wireless infrastructure customers are already using this 3GPP LTE software kernel library, which provides highly optimized modules for both the uplink and downlink shared channels. The code is delivered with comprehensive test harnesses, documentation and recommendations on how to architect code for interfacing with the MSC8154 or the MSC8156.
DSP device drivers
Digital signal processors (DSPs) are now often integrated on-chip with numerous peripheral devices, such as serial ports, UARTs, PCI, or USB ports. As a result, developing device drivers for DSPs requires significantly more time and effort than ever before. Here we describe a DSP device-driver architecture that reduces overall driver development time by reusing code across multiple devices. We'll also look in depth at an audio codec driver created using this architecture. The design and code examples are based on drivers developed for the Texas Instruments DSP/BIOS operating system, though the same approach will work in any system.
A DSP codec driver often involves configuring more than just the codec.
Device-driver architecture
A device driver performs two main functions: device configuration and initialization, and data movement. Device configuration is, by definition, specific to a particular device. Data movement, on the other hand, is more generic. In the case of a streaming data peripheral like a codec, the application ultimately expects to send or receive a stream of buffers. The application shouldn't have to worry about how the buffers are managed or what type of codec is being used, beyond issues such as data precision (number of bits per sample). Any driver can therefore be divided into two parts: a class driver that handles the application interface and OS specifics, and a mini-driver that addresses the hardware specifics of a particular device.
Any driver can be divided into two parts: a class driver and a mini-driver
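The class-driver/mini-driver split can be expressed in C with a table of function pointers. In this sketch all names are hypothetical and do not reproduce the DSP/BIOS driver interfaces: the class driver implements the application-facing calls once, each device supplies only a configure hook and a transfer hook, and a fake codec that merely counts samples stands in for real hardware.

```c
#include <stddef.h>

/* Device-specific hooks a (hypothetical) mini-driver must provide. */
typedef struct {
    int (*configure)(void *dev, int sample_bits);       /* configuration */
    int (*transfer)(void *dev, short *buf, size_t n);   /* data movement */
} mini_driver_ops;

/* Class-driver instance: generic code plus a pointer to the hooks. */
typedef struct {
    const mini_driver_ops *ops;
    void *dev;                        /* mini-driver private state */
} class_driver;

/* Class-driver entry points: identical for every codec. */
int stream_open(class_driver *cd, const mini_driver_ops *ops, void *dev,
                int sample_bits)
{
    cd->ops = ops;
    cd->dev = dev;
    return ops->configure(dev, sample_bits);
}

int stream_write(class_driver *cd, short *buf, size_t n)
{
    if (buf == NULL || n == 0)
        return -1;                    /* generic checks written only once */
    return cd->ops->transfer(cd->dev, buf, n);
}

/* Example mini-driver: a fake codec that only counts samples. */
typedef struct { int bits; size_t samples_out; } fake_codec;

static int fake_configure(void *dev, int sample_bits)
{
    ((fake_codec *)dev)->bits = sample_bits;
    return 0;
}

static int fake_transfer(void *dev, short *buf, size_t n)
{
    (void)buf;                        /* a real codec would DMA this buffer */
    ((fake_codec *)dev)->samples_out += n;
    return 0;
}

const mini_driver_ops fake_codec_ops = { fake_configure, fake_transfer };
```

Supporting a new codec then means writing a new mini_driver_ops table; stream_open() and stream_write() stay unchanged.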
Modern DSPs:
Modern signal processors yield greater performance, due in part to both technological and architectural advancements such as smaller design rules, fast-access two-level caches, (E)DMA circuitry and wider bus systems. Not all DSPs provide the same speed, and many kinds of signal processors exist, each better suited to a specific task, ranging in price from about US$1.50 to US$300.
Texas Instruments produces the C6000 series DSPs, which have clock speeds of 1.2 GHz and implement separate instruction and data caches. They also have an 8 MiB second-level cache and 64 EDMA channels. The top models are capable of as many as 8000 MIPS (million instructions per second), use VLIW (very long instruction word), perform eight operations per clock cycle and are compatible with a broad range of external peripherals and various buses (PCI/serial/etc.). TMS320C6474 chips each have three such DSPs, and the newest-generation C6000 chips support floating-point as well as fixed-point processing.
Freescale produces a multi-core DSP family, the MSC81xx. The MSC81xx is based on StarCore Architecture processors, and the latest MSC8144 DSP combines four programmable SC3400 StarCore DSP cores. Each SC3400 StarCore DSP core has a clock speed of 1 GHz.
XMOS produces a multi-core, multi-threaded line of processors well suited to DSP operations. They come in various speeds ranging from 400 to 1600 MIPS. The processors have a multi-threaded architecture that allows up to 8 real-time threads per core, meaning that a 4-core device would support up to 32 real-time threads. Threads communicate with each other over buffered channels that are capable of up to 80 Mbit/s. The devices are easily programmable in C and aim at bridging the gap between conventional microcontrollers and FPGAs.
CEVA, Inc. produces and licenses three distinct families of DSPs.
Perhaps the best known and most widely deployed is the CEVA-TeakLite DSP family, a classic memory-based architecture with 16-bit or 32-bit word widths and single or dual MACs. The CEVA-X DSP family offers a combination of VLIW and SIMD architectures, with different members of the family offering dual or quad 16-bit MACs. The CEVA-XC DSP family targets software-defined radio (SDR) modem designs and leverages a unique combination of VLIW and vector architectures with 32 16-bit MACs.
Analog Devices produces the SHARC-based DSPs, which range in performance from 66 MHz/198 MFLOPS (million floating-point operations per second) to 400 MHz/2400 MFLOPS. Some models support multiple multipliers and ALUs, SIMD instructions and audio-processing-specific components and peripherals. The Blackfin family of embedded digital signal processors combines the features of a DSP with those of a general-purpose processor. As a result, these processors can run simple operating systems like μClinux, velOSity and Nucleus RTOS while operating on real-time data.
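The MAC counts quoted above matter because most DSP kernels reduce to chains of multiply-accumulate operations. A direct-form FIR filter is the standard example: each output is the sum of coefficient times sample over all taps. The sketch below is plain, portable C in Q15 fixed point (a number format assumed here for illustration), with a wide accumulator and final saturation similar to what DSP MAC hardware provides; the function name fir_q15 is hypothetical, and production code would normally use a vendor's optimized library or intrinsics.

```c
#include <stddef.h>
#include <stdint.h>

/* One output sample of a direct-form FIR filter in Q15 fixed point.
 * history[0] is the newest input sample, history[taps-1] the oldest.
 * Each loop iteration is one multiply-accumulate (MAC). */
int16_t fir_q15(const int16_t *coeffs, const int16_t *history, size_t taps)
{
    int64_t acc = 0;                             /* wide accumulator, like DSP guard bits */
    for (size_t k = 0; k < taps; k++)
        acc += (int32_t)coeffs[k] * history[k];  /* Q15 x Q15 = Q30 product */
    acc >>= 15;                                  /* Q30 -> Q15 (assumes arithmetic shift) */
    if (acc > INT16_MAX) acc = INT16_MAX;        /* saturate instead of wrapping */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

With four coefficients of 0.25 (0x2000) applied to four samples of 0.5 (0x4000), the result is 0.5 (0x4000); a DSP with single-cycle MACs would compute each tap in one cycle.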
The SymNet 8x8 DSP is the original model in the SymLink series of network audio processors. It is the hardware platform used to execute system designs created in SymNet Designer software. Up to sixteen 8x8 DSPs, or other SymLink hardware models, can be networked together in a ring topology via the low latency, 64-channel SymLink Bus to provide high channel-count processing systems for use in convention centers, arenas, university and corporate campuses, large houses of worship, theaters, hotels, and casinos.
Features
Load/store architecture
An orthogonal instruction set
Mostly single-cycle execution
A 16 x 32-bit register file
Enhanced power-saving design
ARM processors from their beginnings as the proprietary solution to a particular set of problems in a particular company to their current status as a highly successful, flexible and customizable set of processors available on the open market. While some aspects of this story are of purely anecdotal interest, others shed light on some ARM design decisions, which were taken in an unusual set of circumstances to meet specific goals and are now seen to meet the demands of an innovative and exciting marketplace requiring good performance and low power consumption, balanced with low cost. British readers will probably be familiar with Acorn Computers Ltd, its products and its history of phenomenal success in the UK computer market of the early 1980s. Other readers may not have had access to as much information on the vibrant UK home computer market of that time, or to Acorn's record of technical innovation. The story starts with the original development of the ARM processor and ends with the establishment of ARM Ltd as a global force in the microprocessor industry. In between, it sheds some light on various design decisions that were taken in the genesis of the ARM design.
A coprocessor interface was also added to the ARM at this stage, enabling a floating-point accelerator and other coprocessors to be used with the ARM. Even after all these additions, the ARM2 kept its small die size and low transistor count: the die was 5.4 mm square and the transistor count around 25 000. This second device was also improved by being fabricated in a 2 µm process. Figure 1.1 compares the relative die sizes of the ARM and other processors. It shows more clearly that this was an extraordinary achievement, and that the ARM is an unusual processor in terms of size/performance.
Operating modes
In all states there are seven modes of operation: User mode is the usual ARM program execution state and is used for executing most application programs. Fast interrupt (FIQ) mode is used for handling fast interrupts.
Modes other than User mode are collectively known as privileged modes. Privileged modes are used to service interrupts or exceptions, or to access protected resources.
Architecture
The ARM architecture forms the basis around which every ARM processor is built. Over time the ARM architecture has evolved to include features that meet the growing demand for new functionality and high performance, and the needs of new and emerging markets. The architecture supports a very broad range of performance points. These range from very small ARM processors to very efficient implementations of advanced designs that use state-of-the-art microarchitecture techniques. It is established as the leading architecture in many market segments. Implementation size, performance, and low power consumption are key attributes of the ARM architecture. Architecture extensions were developed to support Java acceleration (Jazelle), security (TrustZone), SIMD, and Advanced SIMD (NEON) technologies. The ARMv8-A architecture adds a Cryptographic Extension as an optional feature.
Enhancements to a basic RISC architecture enable ARM processors to achieve a good balance of high performance, small code size, low power consumption and small silicon area.
Serial communication drivers
ARM PrimeCell UART (PL011)
The UART (PL011) is an Advanced Microcontroller Bus Architecture (AMBA) compliant System-on-Chip (SoC) peripheral that is developed, tested, and licensed by ARM.
low-power mode
When RTS flow control is enabled, the nUARTRTS signal is asserted until the receive FIFO is filled up to the programmed watermark level. When CTS flow control is enabled, the transmitter can only transmit data while the nUARTCTS signal is asserted.
Hardware flow control is selected through bits 14 (RTSEn) and 15 (CTSEn) in the UART control register (UARTCR). Table 2-3 shows how you must set these bits to enable RTS and CTS flow control, either simultaneously or independently.
Operation
The operation of the UART is described in the following sections: Interface reset, Clock signals, UART operation, IrDA SIR operation, UART character frame, and IrDA data modulation.
Interface reset
The UART and IrDA SIR ENDEC are reset by the global reset signal PRESETn and a block-specific reset signal nUARTRST. An external reset controller must use PRESETn to assert nUARTRST asynchronously and negate it synchronously to UARTCLK. PRESETn must be asserted LOW for a period long enough to reset the slowest block in the on-chip system, and then be taken HIGH again. The UART requires PRESETn to be asserted LOW for at least one period of PCLK.
Clock signals
The frequency selected for UARTCLK must accommodate the desired range of baud rates:
$$F_{UARTCLK}(\min) \geq 16 \times \text{baud\_rate}(\max)$$
$$F_{UARTCLK}(\max) \leq 16 \times 65535 \times \text{baud\_rate}(\min)$$
USB
How do we port USB drivers to ARM? Which parts of the code need to be changed? Do we need to make changes in the code, and if so, where? Does the USB host controller code need to be changed?