© Copyright IBM Corporation 2016. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.
ABSTRACT
• Affinity and Virtualization are two of the later components of Workload Characterization to be understood, monitored and tuned when optimizing Power/AIX performance
• Think-Think
• Less about any storage I/O and more about intensive processing of the data
• More focused on SRAD, L1/L2/L3/L4 cache, SMT-1/2/4/8 and eCPU/vCPU tuning tactics
uptime ; vmstat -s
07:45AM up 5 days, 14:35, 7 users, load average: 68.90, 55.77, 53.68
13783904583 total address trans. faults the count of virtual-to-real memory address translations
18464514733 page ins Generally JFS/JFS2 file system VMM-initiated READs
2091137989 page outs Generally JFS/JFS2 file system VMM-initiated WRITEs
0 paging space page ins
0 paging space page outs Rule#1: Acceptable Tolerance is 5-digits/90days Uptime
0 total reclaims
6953205800 zero filled pages faults Memory used to create computational memory objects
89715 executable filled pages faults Code used to create computational memory objects
38420778917 pages examined by clock Rule#3: Pages scanned by lrud when free memory is low
5 revolutions of the clock hand Rule#4: Completed lrud scans of the entire File Cache list
20036911968 pages freed by the clock Rule#3: Pages freed by the above lrud scanning
46545239 backtracks
137657525 free frame waits Rule#2: Acceptable Tolerance is 5-digits/90days Uptime
0 extend XPT waits
1779946655 pending I/O waits IOs not completed yet when the requesting process returns to CPU
20598490595 start I/Os Generally the sum of the above page ins and page outs
2333693437 iodones Generally the start I/Os coalesced into fewer merged-data IO operations
8560316549 cpu context switches Count of threads (of processes) switching on/off SMT logical CPUs
1900589120 device interrupts Count of time-sensitive hardware device operations needing a logical CPU
70007091 software interrupts Count of time-sensitive software interrupts needing a logical CPU
1230095024 decrementer interrupts Tick-tock timing interrupts of only active logical CPUs (~10ms each)
7598754 mpc-sent interrupts
7598754 mpc-received interrupts
676750577 phantom interrupts Device interrupts from other LPARs that pop-up here, thus “phantom”
0 traps
32508428126 syscalls The workload of all processing boils-down to executing system calls for services
09:54AM up 284 days, 5:41, 3 users, load average: 21.89, 22.66, 22.57
309514798826 total address trans. faults
41703570253 page ins
21072500495 page outs
0 paging space page ins
0 paging space page outs Rule#1: Acceptable Tolerance is 5-digits/90days Uptime
0 total reclaims
178428078115 zero filled pages faults
65185546 executable filled pages faults
17792267530 pages examined by clock Rule#3: Pages scanned by lrud when free memory is low
1704 revolutions of the clock hand Rule#4: Greater than one sweep per each Day Uptime
8576311882 pages freed by the clock Rule#3: Pages freed by the above lrud scanning
3860369817 backtracks
2183 free frame waits Rule#2: Acceptable Tolerance is 5-digits/90days Uptime
0 extend XPT waits
3236515060 pending I/O waits
61228166888 start I/Os
5934695650 iodones
832490306114 cpu context switches
338511249949 device interrupts
19631493994 software interrupts
69137392875 decrementer interrupts
7968996233 mpc-sent interrupts
7964556208 mpc-receive interrupts
27779042904 phantom interrupts
0 traps
2066646390008 syscalls
When a virtual address translation (to a real memory address) is not found in the TLB/HPT, it is an address fault. The Power8
core (or PHYP on Power7/+) then calculates only the first in a range of needed translations; not every address is translated.
These generally occur when creating processes and on first access to IO. That is, address translation faults occur (and are
saved in the TLB/HPT) when virtual-to-physical memory address translations are required, i.e. when:
•creating/initiating/forking/extending processes (that is, memory is needed to store a process’ contents), i.e. zero
filled pages faults and executable filled pages faults
•instructions or data are initially read or written to/from persistent storage, i.e. page ins and page outs
•memory is needed by AIX to manage other operations, i.e. network IO mbuf allocations, creating SHMSEGs,
dynamic allocation of LVM/JFS2 fsbuf’s, etc.
This total over a standard 90 days is useful for evaluating the scale of memory usage in any historical/accumulated workload.
9digits/90days is a near-nil workload; 10digits is typically light; 11digits is a manageable hard-driving Enterprise-Class work-
load; 12digits is a rare freak/monster workload; 13digits has only been witnessed by me with huge workloads on Power8 E880s.
Using address translation faults in the denominator, useful ratios are offered with page ins and page outs.
These ratios indicate the relative VMM-initiated IO workload compared with the overall use of memory. As well, these IO ratios
can be compared with the ratio of zero filled pages faults::total address trans. faults for a comparative
sense of IO versus the memory used for creating Computational Memory.
178428078115 zero filled pages faults (pages used to create Computational Memory)
are Computational Memory pages created/constructed/generated using the instructions in
65185546 executable filled pages faults (binary code of the application/rdbms/AIX itself)
A ratio of 17842::6 (both counts expressed in units of ten million pages) is a useful relative indicator of processes that are
mostly similar versus vastly different, giving a sense of workload-character consistency (a higher ratio) versus workload-character
variability (a lower ratio).
For instance, Tivoli Storage Manager (aka TSM) typically creates many processes using the same smaller set of executable code,
while Informatica (an OLAP application) typically creates different processes using a vastly greater set of executable code. As
such, the TSM workload is mainly of the same character, while the Informatica workload can be variable and perhaps wildly
inconsistent throughout any given time-frame. This is best compared between LPARs on approximately the same time-frame.
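The consistency ratio can be computed directly from the counters; a minimal POSIX-awk sketch, using the literal values from the 284-day sample above (the units-of-ten-million scaling is how the 17842::6 form is derived):

```shell
# Sketch: derive the workload-consistency ratio from two `vmstat -s` counters.
# The literals are the 284-day LPAR's sample values from the text.
zero_filled=178428078115     # zero filled pages faults
exec_filled=65185546         # executable filled pages faults
awk -v z="$zero_filled" -v e="$exec_filled" 'BEGIN {
    # Express both in the same unit (tens of millions of pages) ...
    printf "%d::%d (units of 10M pages)\n", z/1e7, e/1e7
    # ... or collapse to a single number for comparison between LPARs.
    printf "ratio %.0f:1\n", z/e
}'
```

On a live LPAR the two counters would be scraped from `vmstat -s` output rather than hard-coded.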
09:54AM up 284 days, 5:41, 3 users, load average: 21.89, 22.66, 22.57
309514798826 total address trans. faults
41703570253 page ins
21072500495 page outs
0 paging space page ins
0 paging space page outs
0 total reclaims
178428078115 zero filled pages faults
65185546 executable filled pages faults
Initially, when executable filled pages faults pages are read in from persistent storage, they are merely File
Cache pages. Upon their first use to create Computational Memory, though, they are changed to Computational Memory. This
properly permits their subsequent re-use without re-reading from persistent storage. Of course, they may also reside as
Computational Memory without re-use until the next reboot. 6digits/90days is small&tight, i.e. TSM; 7-8digits is a typical
Enterprise-class code set; 9-10digits is a huge and vastly-variable code set, i.e. Informatica. I have not yet witnessed 11+digits/90days.
AIX:lrud has one clock hand for computational memory, and another clock hand for non-computational memory (aka
JFS/JFS2 File Cache). AIX:lrud uses one or both clock hands to examine and free memory.
Typically AIX:lrud is only scanning the JFS/JFS2 File Cache to free memory; it only scans&frees computational
memory when forced to execute paging space page outs.
Aim to keep revolutions of the clock hand at or below one revolution per day of uptime.
beetle02: vmstat -s
07:45AM up 5 days, 14:35, 7 users, load average: 68.90, 55.77, 53.68
13783904583 total address trans. faults
18464514733 page ins
2091137989 page outs
0 paging space page ins
0 paging space page outs
0 total reclaims
6953205800 zero filled pages faults
89715 executable filled pages faults
38420778917 pages examined by clock Rule #4 denominator: Acceptable is 40% and higher
5 revolutions of the clock hand
20036911968 pages freed by the clock Rule #4 numerator: Acceptable is 40% and higher
46545239 backtracks
137657525 free frame waits Rule#2: Acceptable Tolerance is 5-digits/90days Uptime
Typically, [pages freed by the clock / pages examined by the clock] is comfortably greater than 0.40,
i.e. 20036911968 / 38420778917 = 0.5215 = 52.15%
If not greater than 40%, then the further this value falls below 40%, the more likely it is that gbRAM needs to be added (or another remedy applied).
This is a contributing or confirming factor suggesting more gbRAM may be needed; it is not a definitive indicator.
pages examined by the clock is the historical accumulation of AIX:vmstat:page:sr activity (aka lrud-scanrate).
pages freed by the clock is the historical accumulation of AIX:vmstat:page:fr activity (aka lrud-freerate).
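The Rule #4 check can be scripted; a sketch below inlines the beetle02 counters as a here-document, but on a live LPAR the same awk body would be fed by `vmstat -s` directly:

```shell
# Sketch: Rule #4 -- the lrud freed/examined ratio from `vmstat -s`.
# The here-document repeats the beetle02 sample; replace it with
#   vmstat -s | awk '...'
# on a live system.
awk '
    /pages examined by clock/  { examined = $1 }
    /pages freed by the clock/ { freed = $1 }
    END {
        pct = 100 * freed / examined
        printf "freed/examined = %.2f%%%s\n", pct,
               (pct >= 40) ? "" : "  <- below 40%: consider adding gbRAM"
    }' <<'EOF'
38420778917 pages examined by clock
5 revolutions of the clock hand
20036911968 pages freed by the clock
EOF
```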
The count of free frame waits increases when free memory repeatedly reaches down to zero and bounces slightly back up. High counts
indicate a likely start/stop “stuttering” of user workload progress, as well as frustrated JFS2 default-mount storage IO throughput; this is typically
associated with harsh bursts and burns of AIX:lrud scanning&freeing, as well as higher CPU-kernel time (AIX:vmstat:cpu:sy >20%).
Second Rule of AIX Monitoring: For any 90 Days Uptime, 5-digits of free frame waits is Acceptable Tolerance. Your concern
should grow exponentially for each digit beyond 5-digits for every 90 Days Uptime.
Recommendation: If default minfree(960) & maxfree(1088), and 6+ digits of free frame waits per any 90 days uptime,
1) use vmo to tune minfree=(5*2048), maxfree=(6*2048); 2) use ioo to tune j2_MaxPageReadAhead=2048.
AIX needs free memory to drive virtually everything it does. free memory is used to create processes, make buffers, service
network IO, move JFS/JFS2 filesystem IO to/from persistent storage, etc. When free memory is exhausted, AIX can do nothing else
but invoke a high-priority mechanism called AIX:lrud to find free memory. Unfortunately many workloads suffer overwhelming
AIX:lrud activity as a normal and acceptable practice. But, we can make memory quick and convenient to release when it is other-
wise grindingly difficult and slow to release. How can we tell if memory is quick or slow to release?
I offer four methods: Monitor the count of paging space page outs, revolutions of the clock hand and free
frame waits, as well as, noting the ratio of pages freed by the clock versus pages examined by the clock.
These numbers are useful as indicators to distinguish if memory is quick or slow to release.
High counts of pending I/O waits indicate something is confounding the initiation and/or completion of read/write IOs.
Likely a few files are too large (causing typical default-mount JFS2 inode-lock contention), or free memory is grindingly
slow to release, or something else is surely not proper.
Acceptable tolerance is up to 80% of iodones; warning is 81%-100% of iodones; seek resolution if beyond 100% of
iodones, i.e. pending I/O waits / iodones => 4336608286/6106596907 = 71.01% = close to At-Risk
start I/Os are generally the sum of page ins and page outs.
The ratio of start I/Os to iodones is a relative indicator of “sequential I/O coalescence”. Sequential read-aheads
and sequential write-behinds of JFS2 default-mount I/O transactions are automatically coalesced to fewer larger I/O
transactions. This is a quick&dirty method of distinguishing a generally random IO versus sequential IO workload,
i.e. start I/Os / iodones => 53315494410 / 6106596907 = 8.73 is a low Sequential IO reduction ratio.
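The coalescence ratio as a one-liner, using the literals repeated from the example above (on a live LPAR both counters would come from `vmstat -s`):

```shell
# Sketch: sequential-coalescence ratio -- start I/Os per iodone.
# Literals repeat the worked example in the text; higher values suggest
# more sequential read-ahead/write-behind merging.
start_ios=53315494410
iodones=6106596907
awk -v s="$start_ios" -v d="$iodones" 'BEGIN {
    printf "start I/Os per iodone = %.2f\n", s/d
}'
```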
Note the paired ratios of the above for a relative sense-of-proportion of system events -- for comparison between LPARs.
What is useful about the ratio of cpu context switches : decrementer interrupts? 2.53 is Light&Sparse.
549305497 / 217018913 = an average of 2.53 context switches per decrementer interrupt
What is useful about the ratio of device interrupts : decrementer interrupts? 0.80 is Unusually Sparse.
173807206 / 217018913 = an average of 0.80 device interrupts per decrementer interrupt
What is useful about the ratio of syscalls : decrementer interrupts? 154.86 is Moderately Dense.
33608286900 / 217018913 = an average of 154.86 system calls per decrementer interrupt
What is useful about the ratio of device interrupts : syscalls : cpu context switches? Inter-LPAR comparison.
173807206 : 33608286900 : 549305497 ~= 0.80 : 154.86 : 2.53 per decrementer interrupt
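Normalizing by decrementer interrupts is simple arithmetic; a sketch reproducing the three ratios above (literals from the worked example):

```shell
# Sketch: normalize `vmstat -s` counters by decrementer interrupts so that
# LPARs with different uptimes can be compared on equal footing.
awk 'BEGIN {
    dec = 217018913                               # decrementer interrupts
    printf "ctxsw/dec   = %.2f\n", 549305497   / dec
    printf "devint/dec  = %.2f\n", 173807206   / dec
    printf "syscall/dec = %.2f\n", 33608286900 / dec
}'
```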
04:43PM up 372 days, 1:27, 31 users, load average: 34.39, 34.85, 33.77
538697619150 total address trans. faults <- 12digits/90days; a monster workload
17669186505 page ins
12605059331 page outs
0 paging space page ins
0 paging space page outs
0 total reclaims
358598173665 zero filled pages faults <- a perfectly consistent and repetitious workload
481900 executable filled pages faults <- 6digits; a microscopic code set given 372days
606227084 pages examined by clock
1 revolutions of the clock hand <- virtually nil AIX:lrud activity for free memory
401196246 pages freed by the clock
1373821354 backtracks
0 free frame waits
0 extend XPT waits
3446688584 pending I/O waits
29733996936 start I/Os
5399455824 iodones
4455468974016 cpu context switches
1404608759903 device interrupts <- a highly device-interruptive/device-interactive workload
8489526323 software interrupts
335096216091 decrementer interrupts <- use as denominator for device interrupts above
6757373231 mpc-sent interrupts
6757370813 mpc-received interrupts
17221841315 phantom interrupts
0 traps
24310229004208 syscalls
uptime ; vmstat -v
07:45AM up 5 days, 14:35, 7 users, load average: 68.90, 55.77, 53.68
25165824 memory pages
19112512 lruable pages
9201 free pages This is the number of Free Pages on the Free Memory list
4 memory pools The count of AIX logical memory pools
8028287 pinned pages Generally AIX is comprised of only pinned memory pages
80.0 maxpin percentage
3.0 minperm percentage This value is the trigger for pagingspace-pageouts
90.0 maxperm percentage
75.8 numperm percentage This is the percent of JFS/JFS2/NFS/VxFS File Cache
14492435 file pages This is the number of JFS/JFS2/NFS/VxFS File Cache pages
0.0 compressed percentage
0 compressed pages
75.8 numclient percentage This is the percent of JFS2-only File Cache
90.0 maxclient percentage
14492435 client pages This is the number of JFS2 File Cache pages
0 remote pageouts scheduled
857 pending disk I/Os blocked with no pbuf AIX pbuf exhaustion
0 paging space I/Os blocked with no psbuf AIX psbuf exhaustion
1972 filesystem I/Os blocked with no fsbuf AIX fsbuf exhaustion
9900 client filesystem I/Os blocked with no fsbuf AIX fsbuf exhaustion
209695 external pager filesystem I/Os blocked with no fsbuf AIX fsbuf exhaustion
42.4 percentage of memory used for computational pages aka COMP%
09:54AM up 284 days, 5:41, 3 users, load average: 21.89, 22.66, 22.57
60293120 memory pages
58473104 lruable pages
5666522 free pages
10 memory pools
9658355 pinned pages
90.0 maxpin percentage
3.0 minperm percentage
90.0 maxperm percentage
18.1 numperm percentage
10595551 file pages
0.0 compressed percentage
0 compressed pages
18.1 numclient percentage
90.0 maxclient percentage
10595551 client pages
0 remote pageouts scheduled
163 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2078 filesystem I/Os blocked with no fsbuf
344 client filesystem I/Os blocked with no fsbuf
41157262 external pager filesystem I/Os blocked with no fsbuf
73.0 percentage of memory used for computational pages
09:54AM up 284 days, 5:41, 3 users, load average: 21.89, 22.66, 22.57
60293120 memory pages
58473104 lruable pages
5666522 free pages
10 memory pools
9658355 pinned pages
90.0 maxpin percentage
10 memory pools * 960 (default minfree) = 9600 pages; 9600 * 4096 bytes = 39321600 bytes = 37.5 MB
• Free Memory – Ideal: midrange 5 digits of freemem (fre) for LPARs <=16gbRAM
• Free Memory – Ideal: low range 6 digits of freemem (fre) for LPARs >=24gbRAM
• Free Memory – Ideal: high range 6 digits of freemem (fre) for LPARs >=48gbRAM
• Free Memory – Ideal: low range 7 digits of freemem (fre) for LPARs >=96gbRAM
• Free Memory – Ideal: never a need for more than 7 digits of freemem (fre) for any large LPAR
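The ideal-freemem guideline above can be codified; a sketch with an illustrative function name (the thresholds are the slide's own breakpoints; 17-23 gbRAM is not covered by the guideline and falls through to the smallest band here):

```shell
# Sketch: map an LPAR's gbRAM to the ideal order-of-magnitude of freemem (fre)
# per the guideline above. Function name is illustrative, not an AIX command.
ideal_freemem_digits() {
    gb=$1
    if   [ "$gb" -ge 96 ]; then echo "low 7 digits"
    elif [ "$gb" -ge 48 ]; then echo "high 6 digits"
    elif [ "$gb" -ge 24 ]; then echo "low 6 digits"
    else                        echo "mid 5 digits"   # slide covers <=16gbRAM
    fi
}
ideal_freemem_digits 32    # -> low 6 digits
```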
beetle02: vmstat -v
As computational memory grows larger, numperm% and numclient% grow smaller.
Free memory is released by AIX:lrud scanning&freeing the File Cache for older Least-Recently-Used content.
As numperm% and numclient% grow smaller, AIX:lrud grinds harder and releases memory more slowly.
Understanding these numbers together highly characterizes the nature and intensity of any POWER/AIX workload.
# lvmo -a -v apvg15
vgname = apvg15
pv_pbuf_count = 512
total_vg_pbufs = 15872 # total_vg_pbufs / pv_pbuf_count = 15872/512 = 31 LUNs
max_vg_pbuf_count = 524288
pervg_blocked_io_count = 517938
pv_min_pbuf = 512
global_blocked_io_count = 12018771
# lvmo -a -v pgvg01
vgname = pgvg01
pv_pbuf_count = 512
total_vg_pbufs = 1024 # total_vg_pbufs / pv_pbuf_count = 1024/512 = 2 LUNs
max_vg_pbuf_count = 16384
pervg_blocked_io_count = 8612687
pv_min_pbuf = 512
global_blocked_io_count = 12018771
As such, we should only add pbuf’s by-formula on a schedule of 90-day change&observe cycles.
Use AIX:lvmo to monitor the pervg_blocked_io_count of each active LVM volume group,
i.e. lvmo -a -v rootvg ; echo ; lvmo -a -v datavg
Acceptable tolerance is 5-digits of pervg_blocked_io_count per LVM volume group for any 90 days uptime. Change
the value of AIX:lvmo:pv_pbuf_count to control total_vg_pbufs.
Otherwise, for each LVM volume group, adjust the value of AIX:lvmo:pv_pbuf_count accordingly:
If 5-digits of pervg_blocked_io_count, add ~2048 pbuf’s to total_vg_pbufs per 90-day cycle.
If 6-digits of pervg_blocked_io_count, add ~[4*2048] pbuf’s to total_vg_pbufs per 90-day cycle.
If 7-digits of pervg_blocked_io_count, add ~[8*2048] pbuf’s to total_vg_pbufs per 90-day cycle.
If 8-digits of pervg_blocked_io_count, add ~[12*2048] pbuf’s to total_vg_pbufs per 90-day cycle.
If 9-digits of pervg_blocked_io_count, add ~[16*2048] pbuf’s to total_vg_pbufs per 90-day cycle.
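The digit-count schedule above can be sketched as a small shell function (the function name is illustrative; counts of 10+ digits are beyond the slide's schedule and fall through to zero here):

```shell
# Sketch: pbufs to add to total_vg_pbufs per 90-day cycle, keyed to the
# digit-count of pervg_blocked_io_count per the schedule above.
pbufs_to_add() {
    count=$1
    digits=${#count}
    case $digits in
        5) echo 2048            ;;
        6) echo $((4  * 2048))  ;;
        7) echo $((8  * 2048))  ;;
        8) echo $((12 * 2048))  ;;
        9) echo $((16 * 2048))  ;;
        *) echo 0               ;;   # <5 digits: within tolerance; 10+: not covered
    esac
}
pbufs_to_add 8612687    # pgvg01 sample: 7 digits -> 16384
```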
The ratio of paging space I/Os blocked with no psbuf / paging space page outs is a direct measure of
intensity, i.e. 1019076 / 4195229 = 24.3%. In this example, suffering 7-digits of paging space page outs in 18
days uptime is bad enough, but when there are also paging space I/Os blocked with no psbuf, system performance
and keyboard responsiveness can stop-and-start in seconds-long cycles. One might believe AIX has crashed,
when it hasn’t. Preclude paging space page outs by any means; add more gbRAM to the LPAR.
ioo -h j2_dynamicBufferPreallocation=value
The number of 16K slabs to preallocate when the filesystem is running low on bufstructs.
A value of 16 represents 256K. The bufstructs for Enhanced JFS (aka JFS2) are now dynamic; the number of
buffers that start on a JFS2 filesystem is controlled by j2_nBufferPerPagerDevice (now restricted), but
buffers are allocated and destroyed dynamically past this initial value. If the count of external pager
filesystem I/Os blocked with no fsbuf increases, then j2_dynamicBufferPreallocation should
be increased for that filesystem, as the I/O load may be exceeding the speed of preallocation.
Heavy IO workloads may require this value to be changed; a good starting point is 5120 or 10240.
File system(s) must be remounted.
04:43PM up 372 days, 1:27, 31 users, load average: 34.39, 34.85, 33.77
97910784 memory pages
94891136 lruable pages
22397316 free pages
35 memory pools
17176909 pinned pages
80.0 maxpin percentage
3.0 minperm percentage
90.0 maxperm percentage
8.9 numperm percentage
8491570 file pages
0.0 compressed percentage
0 compressed pages
8.9 numclient percentage
90.0 maxclient percentage
8491570 client pages
0 remote pageouts scheduled
3 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2228 filesystem I/Os blocked with no fsbuf
2288501 client filesystem I/Os blocked with no fsbuf
17167931 external pager filesystem I/Os blocked with no fsbuf
68.5 percentage of memory used for computational pages