
Terje Mathisen. Pentium Secrets. Internet: http://www.gamedev.net/page/resources/_/technical/general-programming/pentiumsecrets-r213

Comments: Basic, good, easy to understand; tool: P5Stat; Pentium.

Summary: In this article Terje Mathisen describes his journey toward understanding the Pentium's machine-specific registers (MSRs). He observes that Intel withholds information from users in its manuals, so the best place to start collecting information for code optimization is the documented RDMSR and WRMSR instructions. These instructions work on a set of 64-bit MSRs. To use them, the identifier of the desired MSR is first loaded into the ECX register. RDMSR then transfers the contents of the selected MSR into the register pair EDX:EAX (the high 32 bits in EDX, the low 32 bits in EAX), while WRMSR copies the contents of EDX:EAX into the selected MSR.

Because Intel does not document some of these registers in the Pentium user manual, he decided to read them with a test program. Reading all the MSRs showed that almost all of them hold static values, except MSR 10h. A little research revealed that MSR 10h is a running counter that has incremented on every clock cycle since the Pentium was powered on. It is the most precise counter available to 80x86 programs, so using it to time a program gives the most accurate execution time. The RDTSC instruction returns the same value as reading MSR 10h with RDMSR, but slightly faster. RDMSR can only be executed in kernel mode (ring 0), and the operating system can restrict RDTSC to kernel mode as well.

He then found a utility that dynamically displays two of the 38 internal counters at a time. Only the executable file was available, so he disassembled it into a listing and worked out how these registers are used. The lower 32 bits of MSR 11h control the Pentium hardware counters.
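
A minimal sketch of the EDX:EAX convention and of reading MSR 10h, written in Python. It assumes a Linux system with the `msr` kernel module loaded and root privileges: that driver exposes each CPU's MSRs as `/dev/cpu/N/msr`, where reading 8 bytes at file offset n performs RDMSR for register n on that CPU. The helper names (`join_edx_eax`, `split_edx_eax`, `read_msr`, `tsc_seconds`) are hypothetical, and the 200 MHz clock in the usage note is only an illustrative figure.

```python
import os
from typing import Tuple

TSC_MSR = 0x10  # MSR 10h: the time-stamp counter described above


def join_edx_eax(edx: int, eax: int) -> int:
    """Combine the EDX (high) and EAX (low) halves returned by RDMSR/RDTSC."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)


def split_edx_eax(value: int) -> Tuple[int, int]:
    """Split a 64-bit value into the (EDX, EAX) pair that WRMSR expects."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


def read_msr(msr: int, cpu: int = 0) -> int:
    """Read one MSR through the Linux msr driver (assumes root + module loaded).

    The driver executes RDMSR in kernel mode with ECX = msr and hands the
    EDX:EAX result back as 8 little-endian bytes.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        data = os.pread(fd, 8, msr)
    finally:
        os.close(fd)
    return int.from_bytes(data, "little")


def tsc_seconds(start: int, end: int, clock_hz: float) -> float:
    """Convert two TSC readings into elapsed seconds.

    On the Pentium the TSC increments once per core clock cycle; the
    modulo keeps the difference correct if the 64-bit counter wrapped.
    """
    return ((end - start) % (1 << 64)) / clock_hz
```

For example, `join_edx_eax(0x1, 0x2)` gives `0x100000002`, and on a 200 MHz Pentium a TSC difference of 200,000,000 cycles corresponds to one second. On a live system, calling `read_msr(TSC_MSR)` before and after a code section gives the two readings to pass to `tsc_seconds`.
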
The lower 16 bits of MSR 11h control the contents of MSR 12h, and the next 16 bits control the contents of MSR 13h; these two registers are the hardware performance counters. Because only these two counter registers exist, no more than two counters can be monitored at the same time. The encoding is the same for the lower and the upper 16 bits of MSR 11h. Bits 0-5 are an index into the list of available hardware counters (events). Setting bit 6 enables counting while the processor runs in ring 0, 1 or 2; setting bit 7 enables counting in ring 3. Bit 8 selects whether the counter counts the number of occurrences of the event or the number of CPU cycles spent on it. By measuring both quantities for the same event at the same time, in the two different counters, the average duration of that event can be computed. Using all this information he wrote P5Stat, a program that profiles code with the different hardware counters. Since each run samples two counters, running the program 20 times covers all 38 counters. With these counters the crucial bottlenecks of a program can be found and then removed by rewriting some of the inner loops.

Difficult points/statements/keywords/words: The table of Pentium counters differs from the table of counters given in the Embedded Pentium Processor Family Developer's Manual. Why?

Important points if any*: Tools developed to access the hardware performance counters are useful for code optimization, because they can identify the crucial bottlenecks.

Pros: MSR 10h is the most precise counter available to 80x86 programs, so it can be used to measure the execution time of a program. By setting bit 6 or bit 7 of MSR 11h, a counter can be made to count events in kernel mode or in user mode. With the hardware performance counters, the average duration of a particular event can be measured accurately. The counters reveal a program's crucial bottlenecks, which can then be removed by rewriting some of the inner loops.

Cons: Intel withholds information from users in its manuals. RDMSR is a kernel-mode instruction (and the operating system can restrict RDTSC to kernel mode as well), so these counters cannot always be read directly from user mode. The Pentium supports only two performance counters, so only two events can be monitored at a time.
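
The MSR 11h layout described in the summary can be sketched as a small encoder. It follows the bit positions given above (event index in bits 0-5, ring 0-2 counting in bit 6, ring 3 counting in bit 7, cycle counting in bit 8, counter 0 in the low half and counter 1 in the high half). The function names and the event index `0x03` are hypothetical placeholders, not values taken from Intel's event table, and actually writing the result to MSR 11h would still require WRMSR in ring 0.

```python
EVENT_MASK = 0x3F        # bits 0-5: index of the event to count
COUNT_RING0_2 = 1 << 6   # bit 6: count while in rings 0, 1 and 2
COUNT_RING3 = 1 << 7     # bit 7: count while in ring 3
COUNT_CYCLES = 1 << 8    # bit 8: count cycles instead of occurrences


def encode_counter(event: int, kernel: bool, user: bool, cycles: bool) -> int:
    """Build one 16-bit half of MSR 11h."""
    if not 0 <= event <= EVENT_MASK:
        raise ValueError("event index must fit in bits 0-5")
    field = event
    if kernel:
        field |= COUNT_RING0_2
    if user:
        field |= COUNT_RING3
    if cycles:
        field |= COUNT_CYCLES
    return field


def encode_msr11(counter0: int, counter1: int) -> int:
    """Low half drives MSR 12h (counter 0); high half drives MSR 13h (counter 1)."""
    return ((counter1 & 0xFFFF) << 16) | (counter0 & 0xFFFF)


def average_cycles(cycle_count: int, event_count: int) -> float:
    """Average duration of one event; returns 0.0 if the event never occurred."""
    return cycle_count / event_count if event_count else 0.0


# Same (hypothetical) event in both counters, user mode (ring 3) only:
# counter 0 counts occurrences, counter 1 counts the cycles spent on them.
EVENT = 0x03
msr11 = encode_msr11(
    encode_counter(EVENT, kernel=False, user=True, cycles=False),
    encode_counter(EVENT, kernel=False, user=True, cycles=True),
)
```

Here `msr11` evaluates to `0x01830083`. After the measured code has run, passing the cycle count from MSR 13h and the event count from MSR 12h to `average_cycles` gives the average number of cycles per event, which is the "average time" measurement the article describes.
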

Otto Bruggeman. Intel Performance Counter Monitor - A better way to measure CPU utilization. Internet: http://software.intel.com/en-us/articles/intel-performance-countermonitor/

Comments: Basic, good, a little difficult to understand; tool: PCM (Intel).

Summary: The article gives an overview of the Intel Performance Counter Monitor. The complexity of computers keeps increasing in order to boost the performance and capacity of modern processors. Software that understands and dynamically adjusts the resource utilization of these processors gains power and performance advantages, and to do so it needs to measure how the processor's resources are actually used. The Intel Performance Counter Monitor (PCM) provides routines and utilities that estimate the use of internal processor resources by means of the performance counters, which can ultimately help improve performance. The CPU utilization shown by the top command (in Linux) and reported by the Windows Task Manager is basically the share of time slots that the CPU scheduler assigns to the execution of running programs. CPU utilization measured this way was a good predictor of the remaining CPU capacity for compute-bound processes on processors of the 1980s, but it is not reliable for modern processors with features such as multiple cores, multiple processors, multithreading and Hyper-Threading. It also fails to indicate the true CPU utilization when most of the workload is memory-throughput-intensive, because a few threads can saturate the capacity of the memory controller even when more cores are available. Intel provides PCM to give more precise results based on the performance counters. The tool uses dynamic data obtained from the performance monitoring units (PMUs). Intel implements a set of routines that can be called from user C++ applications, and these support both core and uncore PMUs.
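
To make concrete what top reports, here is a sketch of the time-slot measure on Linux. The first line of `/proc/stat` lists the cumulative CPU time (in clock ticks) spent in each state, and utilization is the share of non-idle time between two samples. This assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal, ...); counting iowait as idle is a common convention rather than the only one, and the function names are hypothetical.

```python
from typing import Tuple


def parse_cpu_times(stat_line: str) -> Tuple[int, int]:
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = stat_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    ticks = [int(x) for x in fields[1:9]]  # user .. steal; guest time is already in user
    idle = ticks[3] + ticks[4]             # idle + iowait
    return idle, sum(ticks)


def utilization(before: str, after: str) -> float:
    """Fraction of scheduler time slots spent running tasks between two samples."""
    idle0, total0 = parse_cpu_times(before)
    idle1, total1 = parse_cpu_times(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 1.0 - (idle1 - idle0) / elapsed


def read_stat_line(path: str = "/proc/stat") -> str:
    """Read the aggregate CPU line (Linux only)."""
    with open(path) as f:
        return f.readline()
```

For example, `utilization("cpu  100 0 50 800 50 0 0 0 0 0", "cpu  400 0 150 1100 150 0 0 0 0 0")` returns 0.5: half of the scheduler's time slots were busy. The number says nothing about how many instructions the cores completed in those slots or whether they were stalled on memory, which is the gap the article says PCM fills.
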
Core metrics include instructions retired, elapsed core clock ticks, core frequency, and cache hits and misses. Uncore metrics include the bytes read from and written to the memory controller and the data transferred over the Intel QuickPath Interconnect links. PCM also includes easy-to-use command-line and graphical utilities. On Linux, the MSR kernel module is provided with the Linux kernel, and the various metrics can be displayed graphically with KSysGuard (KDE System Guard). These utilities can be used to expose the fundamental performance bottlenecks of processes. The Intel PCM package also contains a Windows service that creates performance counters which can be displayed by the Perfmon program (provided with Microsoft Windows); different Perfmon counters are created for the different Intel processor versions. PCM also provides an easy-to-use library: the performance counters are first initialized, after which the state of any counter can be captured before and after a code section, and comparing the two states reveals the performance bottlenecks of that code. Using PCM, a CPU scheduler could learn the nature of the background processes and then schedule tasks more efficiently.

Important points if any*: Software that understands and dynamically adjusts the resource utilization of modern processors has performance and power advantages. A CPU scheduler should know the nature of the background processes so that it can schedule tasks efficiently.

Pros: The Intel Performance Counter Monitor (PCM) provides routines and utilities to estimate the use of internal resources by means of the performance counters. This can eventually help increase performance. Intel implements a set of routines that can be called from user C++ applications, and these support both core and uncore PMUs. PCM also includes easy-to-use command-line and graphical utilities, which can be used to expose the fundamental performance bottlenecks of processes. PCM also provides an easy-to-use library.

Cons: CPU utilization as measured by the operating system is not reliable on modern processors with features such as multiple cores, multiple processors, multithreading and Hyper-Threading. It also fails to indicate the true CPU utilization when most of the workload is memory-throughput-intensive, because a few threads can saturate the capacity of the memory controller even when more cores are available.
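
The before/after measurement pattern that the PCM library supports can be sketched independently of PCM itself. The snapshot type and its field names below are hypothetical stand-ins, not PCM's actual C++ API; in practice the values would come from the core counters (instructions retired, core clock ticks, cache hits and misses) captured around the code section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSnapshot:
    """Hypothetical snapshot of core counters taken at one point in time."""
    instructions_retired: int
    core_cycles: int
    cache_hits: int
    cache_misses: int
    time_s: float


@dataclass(frozen=True)
class SectionMetrics:
    ipc: float               # instructions retired per core clock tick
    cache_hit_ratio: float   # hits / (hits + misses)
    avg_frequency_hz: float  # core clock ticks per second of wall time


def section_metrics(before: CounterSnapshot, after: CounterSnapshot) -> SectionMetrics:
    """Derive core metrics for the code executed between two snapshots."""
    instructions = after.instructions_retired - before.instructions_retired
    cycles = after.core_cycles - before.core_cycles
    hits = after.cache_hits - before.cache_hits
    misses = after.cache_misses - before.cache_misses
    seconds = after.time_s - before.time_s
    return SectionMetrics(
        ipc=instructions / cycles if cycles else 0.0,
        cache_hit_ratio=hits / (hits + misses) if hits + misses else 0.0,
        avg_frequency_hz=cycles / seconds if seconds > 0 else 0.0,
    )
```

With made-up snapshots `CounterSnapshot(0, 0, 0, 0, 10.0)` and `CounterSnapshot(1_500_000_000, 1_000_000_000, 900, 100, 10.5)`, `section_metrics` reports an IPC of 1.5, a hit ratio of 0.9 and an average core frequency of 2 GHz. A low IPC together with a low hit ratio typically points at the memory-bound case described in the Cons, which plain OS CPU utilization cannot reveal.
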
