December 2002
Task
Web-based exception reporting against a computer performance database for a large, multi-platform environment
Methods
Statistical Process Control (SPC)
Multivariate Adaptive Statistical Filtering (MASF)
Programming
SAS/ITSV, SAS, Unix scripting (awk/sed/perl)
Reporting
Intranet web server, HTML, Email
Multiplatform environment
Data Collection
Performance Data Base (SAS/ITSV) for Unix, NT, Tandem, Unisys, and MVS servers
Global Exception Detectors (SAS program): CPU Util., CPU Queue, Disk Util.
Appl. Exception Detectors (SAS program): CPU Util., # of active processes, Disk IOs
Exception? Yes: e-mail notification, Web publishing
Ad-hoc analyses
EDS Structure
Exception detectors for the most important metrics: CPU, memory, and disk utilization, memory page rate, and CPU run queue
Exception Detection System database with a history of exceptions
Statistical process control daily profile chart generator
Exception server name list generator
Leader/Outsider server detector and runaway process detector
Leaders/Outsiders bar chart generator
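The exception detectors listed above can be illustrated with a minimal Python sketch of an SPC/MASF-style check. It assumes hourly samples of a single metric, groups the history by weekday and hour, and flags any new sample that falls outside the mean plus or minus three standard deviations for its time slot. The function names, the 3-sigma limits, and the tuple layout are illustrative assumptions, not the SAS implementation the slides describe.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_profile(history):
    """Build control limits from historical samples.

    history: iterable of (weekday, hour, value) tuples from past weeks.
    Returns {(weekday, hour): (lower_limit, upper_limit)}.
    """
    groups = defaultdict(list)
    for weekday, hour, value in history:
        groups[(weekday, hour)].append(value)

    profile = {}
    for key, values in groups.items():
        m = mean(values)
        s = pstdev(values)
        # Assumed 3-sigma control limits; the real system may use other limits.
        profile[key] = (m - 3 * s, m + 3 * s)
    return profile


def find_exceptions(profile, day_samples):
    """Return the samples that fall outside the control limits for their slot."""
    exceptions = []
    for weekday, hour, value in day_samples:
        limits = profile.get((weekday, hour))
        if limits is None:
            continue  # no history for this slot, so nothing to compare against
        lower, upper = limits
        if value < lower or value > upper:
            exceptions.append((weekday, hour, value))
    return exceptions


# Hypothetical data: CPU utilization (%) on Mondays at 09:00 over five weeks.
history = [(0, 9, v) for v in (40.0, 42.0, 38.0, 41.0, 39.0)]
profile = build_profile(history)
flagged = find_exceptions(profile, [(0, 9, 41.0), (0, 9, 75.0)])
print(flagged)
```

Grouping by weekday and hour reflects the MASF idea of comparing a sample only with history from comparable time periods; a coarser or finer grouping would work the same way.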
Exceptions yesterday (next to each server name is a sublist of the applications that also had exceptions, for immediate identification of the critical workload)
server19  25  4   0  10/17  unix SUN  CPUqueue  4.5 length
server2   12  0  20  10/17  unix HP   CPUqueue  5.7 length
server3   24  5   0  10/17  unix HP   CPUqueue  12.1 length
server11  15  3   0  10/17  unix HP   DiskIO    2945.0 # of I/O
server12  15  9   0  10/17  unix IBM  DiskIO    38455.0 # of I/O
server20   8         12/1   unix IBM  CPUutil   0.0 CPU sec

server13  25  2   0  11/21  unix IBM  CPUutil   274.0 CPU sec
server2   24  4   0  11/21  unix HP   CPUutil   5973.0 CPU sec
server15  25  2   9  11/20  MVS       CPUutil   3634.8 CPU sec
server16  23  5   0  11/24  unix SUN  MEMutil   239.0 Kb
server17  20  0   8  11/24  unisys    MEMutil   998.0 Kb
server18  21  0   6  11/24  tandem    MEMutil   490.0 Kb
server11   3         12/1   unix HP   MEMutil   0.0 Kb
The server's resource usage is not balanced. The CPU subsystem has excess capacity. The disk subsystem experienced most of the impact and is a possible performance and/or capacity bottleneck. The memory page rate had a few exceptions, which probably correlate with disk I/O activity, and is not a concern.
Summary
The Exception Detection System (EDS) combines the classical SPC approach with some new ideas: an EDS database that keeps a history of exceptions, and new integrative metrics such as ExtraVolume for better analysis of unusual consumption of server resources. Application-level detection has also been added to the system.
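The slides do not define ExtraVolume, so the following Python sketch rests on an assumption: that it measures the total amount by which actual values exceed the upper control limit over a day, roughly the area between the two curves. The function name and the sample figures are hypothetical.

```python
def extra_volume(actual, upper_limits):
    """Sum, over all intervals, of the amount by which actual exceeds the limit.

    Assumed definition for illustration only; intervals at or below the
    upper limit contribute nothing.
    """
    if len(actual) != len(upper_limits):
        raise ValueError("actual and upper_limits must have the same length")
    return sum(max(0.0, a - u) for a, u in zip(actual, upper_limits))


# Hypothetical hourly CPU utilization (%) against a flat upper limit of 65%.
actual = [50.0, 70.0, 90.0, 60.0]
upper = [65.0, 65.0, 65.0, 65.0]
print(extra_volume(actual, upper))  # 30.0
```

A single number per server and metric of this kind makes it possible to rank servers by how far, and for how long, they ran above their normal range, rather than only counting exceptions.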
The system adequately supports the rapid growth of the company and, because it relies on existing SAS tools, does not require buying new analysis software. It has helped reduce both the reaction time to exceptions and the time needed to prepare exception reports.
References
[1] Krajewski and Ritzman: Operations Management, Addison-Wesley Publishing Company, Inc., 1990.
[2] Jeffrey Buzen and Annie Shum: "MASF - Multivariate Adaptive Statistical Filtering," Proceedings of the Computer Measurement Group, 1995, pp. 1-10.
[3] Kevin McLaughlin and Igor Trubin: "Exception Detection System, Based on the Statistical Process Control Concept," Proceedings of the Computer Measurement Group, 2001.
Thanks!
Igor Trubin
Tech Services Capacity Planning
Capital One Services, Inc.
igor.trubin@capitalone.com