
IMPLEMENTING A SOFTWARE PROBLEM MANAGEMENT MODEL, A CASE STUDY
Jäntti Marko, Miettinen Aki
University of Kuopio, Department of Computer Science, P.O.B 1627, FIN-70211
Kuopio, Finland
mjantti@cs.uku.fi

Abstract. The primary goal of software problem management is to minimize the impact
of problems on the business and to identify the root cause of problems. At present, many
organizations are planning to implement a problem management model that is compliant
with the IT Infrastructure Library (ITIL) framework. However, the ITIL framework is a heavy
standard with a large number of difficult concepts. IT organizations need practical
guidelines to be able to implement ITIL-based processes. The purpose of this study is to
provide a checklist of issues that are essential for implementing the problem
management process. The research question in this paper is: what are the requirements
for implementing a software problem management model? A case study research
method is used in this study to evaluate the requirements.

Introduction

Software problem management is an important activity within software support
and maintenance processes (ISO/IEC 1995). Both IT companies and IT customers
need systematic approaches for managing software problems such as unavailability
of IT services, software failures, poor performance, and poor usability. According
to a recent IT service management study, many organizations are planning to
implement the problem management process in the near future and consider it as
one of the most important process development targets (Materna 2005). Hence,
we focus our research on software problem management and examine how IT
service provider organizations can use problem management methods to improve
the quality of IT services.
Most studies in this research area can be classified either into 1) software defect
management studies or 2) software problem management studies. Traditional
defect management studies (e.g. Hirmanpour and Schofield 2003, Leszak et al.
2000, Frederick and Basili 1998, Mays 1990, Quality Assurance Institute 1995)
are solely focused on software defect management. The main goal of defect
management is to detect and remove defects early in the software life cycle (Florac
1992). Defect management is also one of the key process areas of the Capability
Maturity Model (CMM) (Jalote 2000).
Problem management studies belong to the second category and are typically
based on some IT service management framework such as IT Infrastructure
Library (ITIL), which was developed at the end of the 1980s by the British Government
(Sallé 2004, OGC(1) 2002). ITIL provides guidance for IT service management,
such as a systematic framework for managing problems in IT services. IT services
can be customer tailored software projects by system integrators, ASP services,
consultation, training and hosting services etc. Many organizations have started to
"ITILize" their old service management processes because ITIL has become the
world-wide de facto standard in service management (OGC(1) 2002, Hochstein
2005). However, introducing the ITIL framework can be difficult because it
includes many complex service management concepts.
This study continues our previous work, in which we identified the
following difficulties regarding defect management models: the defect/problem
management process is seldom company-wide; project teams use different problem
management methods, which makes problem data difficult to analyse; there
are limited resources for fixing defects and problems; there is a need for good
metrics for IT service problem management; customers complain that they do not
receive problem resolution reports from IT providers and that the IT providers do
not admit that there are bugs in their software; and customers want a single
point of contact service instead of a "this is not our bug" service.
A good problem management model should attempt to solve the above-mentioned
problems. The main problem is that organizations do not have enough
information or experience to implement a problem management process that meets
requirements of the ITIL framework. One additional challenge is how to automate
the problem management process with a tool.
In this paper, we provide a list of requirements that help organizations to
implement the software problem management process. The general research
question in this paper is: what are the requirements for an ITIL-based problem
management model? In this paper, we examine the problem management process
of an IT service provider company.
The rest of the paper is structured as follows: The second section describes the
key issues required in problem management. The third section presents the research
methods used in this study. The fourth section presents findings from a case study.
Finally, the discussion and conclusions are presented.

Requirements for a Software Problem Management Model

Software engineering standards define software problems in various ways. The
problems are called defects, faults, anomalies, errors, and bugs, among other terms. In the IEEE
Standard Dictionary, defects are defined as product anomalies, for example
"omissions and imperfections found during early life cycle phases and faults
contained in software sufficiently mature for test or operation" (IEEE 1989).
According to the IEEE standard, an anomaly is "any condition that deviates from
expectations based on requirements specifications, design documents, user
documents, standards, etc., or from someone's perceptions or experiences.
Anomalies may be found during, but not limited to, the review, test, analysis,
compilation, or use of software products or applicable documentation" (IEEE
1994). The Quality Assurance Institute (1995) has defined a defect as "an instance of
one or more baselined product components not satisfying their given set of
requirements". Additionally, a fault can be defined as "a manifested inability of a
system or component to perform a required function within specified limits"
(Binder 1999). An ITIL-based problem management process focuses on
minimizing the impact of incidents and problems on the business.
In this paper we focus on identifying requirements for the implementation of a
software problem management model. The following requirements were derived
based on the research literature.
Requirement 1. Establish a Service Desk. IT organizations should implement a
service desk that is a single point of contact for users of IT services
(Kajko-Mattsson 2003). The service desk is responsible for collecting all the incidents.
The goal is for the service desk to resolve as many incidents as possible and thus
achieve a good first-time fix rate.
Requirement 2. Define the lifecycle for incidents (OGC 2002): The typical
lifecycle is 1) Incident, 2) Problem, 3) Known error, and 4) Request for Change
(RFC). An incident can be defined as "any event which is not part of the standard
operation of a service and which causes, or may cause, an interruption to, or a
reduction in, the quality of that service". An incident does not necessarily lead to a
problem or a defect. Incidents can also be service requests (for example, a user needs
instructions or advice). A problem is "an unknown underlying cause of one or
more incidents". A Known error is "an incident or problem for which the root
cause is known and for which a temporary Work-around or a permanent
alternative has been identified". There should be a traceable chain between
incidents, problems, known errors and change requests.
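
The traceable chain described in Requirement 2 can be sketched as a small data model. The Python sketch below is illustrative only: the class names, the fields, the identifiers such as INC-1, and the trace_chain helper are assumptions made for this example and are not taken from ITIL or from any support tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    incident_id: str
    description: str
    is_service_request: bool = False  # e.g. a user asking for instructions


@dataclass
class RequestForChange:
    rfc_id: str
    summary: str


@dataclass
class Problem:
    problem_id: str
    incidents: List[Incident] = field(default_factory=list)
    root_cause: Optional[str] = None   # unknown until diagnosed
    workaround: Optional[str] = None
    rfc: Optional[RequestForChange] = None

    @property
    def is_known_error(self) -> bool:
        # Known error: the root cause is known and a work-around
        # (or permanent alternative) has been identified.
        return self.root_cause is not None and self.workaround is not None


def trace_chain(problem: Problem) -> List[str]:
    """List identifiers along incident -> problem -> known error -> RFC."""
    chain = [incident.incident_id for incident in problem.incidents]
    chain.append(problem.problem_id)
    if problem.is_known_error:
        chain.append(f"known-error:{problem.problem_id}")
    if problem.rfc is not None:
        chain.append(problem.rfc.rfc_id)
    return chain


# Hypothetical example records.
inc = Incident("INC-1", "Invoice report fails to open")
prob = Problem("PRB-1", incidents=[inc])
prob.root_cause = "Missing customer id in the report query"
prob.workaround = "Re-run the report with an explicit customer id"
prob.rfc = RequestForChange("RFC-1", "Validate the customer id before querying")
print(trace_chain(prob))
```

Keeping the links on the problem record is one possible design; a real tool might instead store the links in separate relation tables.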
Requirement 3. Identify two different dimensions of problem management: 1)
proactive and 2) reactive problem management. The purpose of proactive
problem management is to identify and resolve problems and known errors before
any incident related to them occurs (ITILPeople 2005). Reactive problem management
focuses on identifying the underlying cause of reported incidents.
Requirement 4. Establish a problem management repository and a knowledge
base. The ITIL framework recommends that problem records are stored in the
Configuration Management Database (CMDB) but usually a CMDB is
implemented as a separate database. A knowledge base is a database that contains
information on problems, known errors, and their resolutions (Davis 2002).
Requirement 5. Establish the problem control activity as follows (OGC 2002).
Problem control begins when the analysis of incident data reveals recurring
incidents, or when the analysed incident does not match any previously recorded
problem or known error. Additionally, incidents classified as very serious
or significant are sent directly to problem control. The process may also
start when a problem is discovered in the infrastructure.
A) In the first phase of problem control, the problem management team
identifies and records the problem. The problem record needs to be linked to the
appropriate incident records so that the solution or work-around can
be applied to similar incidents or problems in the future. The identification
process also includes linking configuration items (CIs) with the problem.
B) Classify and categorize the problem as follows. In the second phase, the
problem is classified by category, impact, urgency and priority. The possible
categories may be, for example, network, hardware, operating system, or software.
The impact of a problem is its analysed effect on the business. Priority should be
based on the urgency and the impact of the problem.
C) The third phase of problem control is investigation and diagnosis,
which aims to find the underlying cause of the problem (Zhen 2005). The linked
incidents and their work-arounds are analysed here. The investigation may show
that the problem is not associated with any configuration item (CI) currently in the
CMDB but is procedural; for example, the cause might be insufficient testing. If
the cause of the problem relates to a registered CI, the status of the problem
is changed to known error and error control handles it.
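
The classification step in phase B, where priority is based on impact and urgency, can be illustrated with a short sketch. The three-level scale, the numeric priority range from 1 (highest) to 5 (lowest), and the function names are assumptions chosen for this example; the source only states that priority should depend on both factors.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Example categories mentioned in the text.
CATEGORIES = {"network", "hardware", "operating system", "software"}


def priority(impact: Level, urgency: Level) -> int:
    """Map impact and urgency to a priority from 1 (highest) to 5 (lowest).

    Assumed scale: HIGH/HIGH gives 1 and LOW/LOW gives 5.
    """
    return 7 - (int(impact) + int(urgency))


def classify(category: str, impact: Level, urgency: Level) -> dict:
    """Build a classification record for a problem."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return {
        "category": category,
        "impact": impact.name,
        "urgency": urgency.name,
        "priority": priority(impact, urgency),
    }


print(classify("software", Level.HIGH, Level.MEDIUM))
```

An additive rule is only one option. An organization could equally use an explicit lookup table if some impact and urgency combinations need special treatment.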
Requirement 6. Establish the error control activity as follows. The purpose of
the error control process is to correct known errors by making changes to the
infrastructure. The process works in cooperation with change management.
A) The first phase of error control begins by detecting a faulty CI or a CI that
might cause an incident. This can be done using the known error data. This data is
produced by either the development environment or the live environment. The
known errors from the development environment are the errors already known in
the development phase of the service or the product. The live environment errors
are the errors discovered when the service or the product is already in operation.
B) In the second phase of error control the possible means of resolving the
error are assessed. If necessary a request for change is generated. Priority is based
on the urgency and the impact of the error.
C) In the third phase, the request for change is linked to the known error record
to maintain the traceability chain described earlier. The impact analysis, detailed error
assessment, testing, and the final resolution of the error are escalated to change
management.
The resolution process for known errors has to be recorded in the system. All
the data, including CIs, symptoms, and resolution, are stored in the known error
database so that they can help in future investigations of incidents and problems.
Finally, after the successful resolution process all the relevant known error,
incident and problem records are closed.
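
One way a known error database could support later investigations is a lookup by configuration item and symptoms. The sketch below is a hypothetical illustration: the KnownError fields, the keyword-overlap matching rule, and the sample records are all assumptions, not a description of the case organization's tool.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class KnownError:
    error_id: str
    ci: str                    # configuration item the error belongs to
    symptoms: FrozenSet[str]   # keywords describing observed symptoms
    resolution: str


def find_known_errors(db: List[KnownError], ci: str,
                      observed: Set[str]) -> List[KnownError]:
    """Return known errors for the given CI, best symptom overlap first."""
    scored = [(len(ke.symptoms & observed), ke) for ke in db if ke.ci == ci]
    scored = [(score, ke) for score, ke in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ke for _, ke in scored]


# Invented sample data for illustration.
db = [
    KnownError("KE-1", "billing-db", frozenset({"timeout", "report"}),
               "Rebuild the report index"),
    KnownError("KE-2", "billing-db", frozenset({"login"}),
               "Reset the service account"),
    KnownError("KE-3", "web-server", frozenset({"timeout"}),
               "Restart the worker pool"),
]
matches = find_known_errors(db, "billing-db", {"timeout", "report", "slow"})
print([ke.error_id for ke in matches])
```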
Requirement 7. Define appropriate metrics for monitoring the problem
management process. Metrics could be, for example, time-based performance
metrics, quality of resolution of incidents/problems (Litten 2004) and the number
of incidents and problems classified by impact, status, service, or user group (OGC
2002).
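
The metrics named in Requirement 7 can be computed from problem records with very little code. The following sketch assumes a hypothetical record layout (id, impact, opened, closed) and uses invented sample data; it shows a time-based metric (mean resolution time) and a count of records grouped by impact.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Invented sample records; "closed" is None while the problem is open.
records = [
    {"id": "PRB-1", "impact": "high",
     "opened": datetime(2006, 2, 1, 9, 0), "closed": datetime(2006, 2, 2, 9, 0)},
    {"id": "PRB-2", "impact": "low",
     "opened": datetime(2006, 2, 3, 9, 0), "closed": datetime(2006, 2, 3, 21, 0)},
    {"id": "PRB-3", "impact": "high",
     "opened": datetime(2006, 2, 5, 9, 0), "closed": None},
]


def mean_resolution_hours(rows):
    """Mean time from opening to closing, in hours, over closed records."""
    durations = [
        (r["closed"] - r["opened"]).total_seconds() / 3600
        for r in rows
        if r["closed"] is not None
    ]
    return mean(durations) if durations else None


def count_by(rows, key):
    """Number of records per value of the given field, e.g. impact."""
    return Counter(r[key] for r in rows)


def open_closed_counts(rows):
    """Return (open, closed) record counts."""
    closed = sum(1 for r in rows if r["closed"] is not None)
    return len(rows) - closed, closed


print(mean_resolution_hours(records))
print(count_by(records, "impact"))
print(open_closed_counts(records))
```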
Requirement 8. Monitor the problem management process. Problem
management should continuously monitor the problem resolution process and the
impact of problems and errors on users or customers. Problem management
should be aware of this progress even though change management is responsible
for some parts of the resolution. The monitoring should be done against the
service level agreement (SLA), the written agreement between the service provider
and the customer (Kajko-Mattsson et al. 2004). Usually the SLA defines the
maximum number of errors per period or service availability requirements. The SLA
might also define penalties for breached service levels (OGC(2) 2002).
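
Monitoring against an SLA of the kind described in Requirement 8 amounts to comparing measured values with agreed thresholds. The sketch below is a minimal illustration; the ServiceLevelAgreement fields, the threshold values, and the function names are assumptions, and a real SLA would usually contain many more terms.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceLevelAgreement:
    max_errors_per_period: int      # assumed threshold
    min_availability_pct: float     # assumed threshold, in percent


def availability_pct(total_minutes: int, downtime_minutes: int) -> float:
    """Share of the period during which the service was available."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def sla_breaches(sla: ServiceLevelAgreement, errors_in_period: int,
                 total_minutes: int, downtime_minutes: int) -> List[str]:
    """Return the names of the service levels breached in the period."""
    breaches = []
    if errors_in_period > sla.max_errors_per_period:
        breaches.append("errors")
    if availability_pct(total_minutes, downtime_minutes) < sla.min_availability_pct:
        breaches.append("availability")
    return breaches


sla = ServiceLevelAgreement(max_errors_per_period=5, min_availability_pct=99.5)
minutes_in_30_days = 30 * 24 * 60
# 432 minutes of downtime in 30 days is 99.0 % availability.
print(sla_breaches(sla, errors_in_period=3,
                   total_minutes=minutes_in_30_days, downtime_minutes=432))
```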
Requirement 9. Generate a request for change (RFC) to the change
management team to implement the permanent resolution for the problem. Ensure
that teams use standard methods to create RFCs (Dietel 2004).
Requirement 10. Continuously improve the problem management process.
Improvement actions could include training the service support staff,
developing problem management tools, holding frequent service reviews and
inspections (Gilb and Graham 1993, Ebenau and Strauss 1994) with customers and
third party service providers, and continuously developing and evaluating
working methods.

Methods

This case study is a part of the work of an ongoing research project SOSE
(Service Oriented Software Engineering) at the University of Kuopio, Finland.
One objective of the SOSE project is to research methods for improving the
quality of software development. This paper focuses on software problem
management. The research question in this paper is: what are the requirements for
implementing a software problem management model? According to Yin (1989),
case studies can be categorized into exploratory, explanatory and descriptive case
studies. An exploratory approach was used in this study. First, we present a list
of key issues (requirements) required in problem management. Second, we
analyze how these key issues of problem management fit the case organization.
The case study included the following questions:
• Who are the stakeholders involved in the problem management process?
• What kind of problem management methods are used by the case
organization?
• What kind of metrics are used within the process?
• What kind of challenges are related to problem management?

Informal interviews were the main source of evidence in this study. The
qualitative data was collected during the problem management pilot project
(February-March 2006). Our case organization is a large IT service company with
over 15 000 employees. It supplies information systems to various industries, such
as banking and insurance, energy, telecom and media, and healthcare. The business
unit, where this study was performed, develops and maintains customer
information systems and energy data management systems. The case organization
was selected for this study because software problem management plays a very
important role in its operations and it is interested in adopting ITIL-based problem
management methods.
The data was collected through participant observation in support and maintenance
team meetings and informal interviews with service desk workers, a service support
manager, a problem manager, and a system analyst. Additionally, some challenges
regarding problem management were gathered during an ITIL training session
provided by the first author. Because we had access to the support tool and
problem database, we could also identify several tool-related difficulties. The persons
who participated in support team meetings and training sessions hold different
roles in the organization (product delivery, product support, configuration
management). The researcher's role in support team meetings was to participate in
discussions and to record the results of discussions.
A within-case analysis method was used in this study (Eisenhardt 1989). We
consider the requirement checklist as an analysis framework. Our framework is a
literature-based ideal process. Data analysis focused on how close
the case organization is to the ideal process.

Problem Management Process: Main Findings from a Case Organization

In this section, we explore how the case organization's existing problem
management process meets the requirements of the ITIL-based problem
management model presented in Section 2. The major stakeholders (see Figure 1)
involved in the problem management in the case organization are service desk
teams, product support team and product development teams. There are different
teams for different products. The case organization uses third party service
providers, for example, server and database providers.

Figure 1. Stakeholders involved in the software problem management.

Table 1 describes our analysis regarding the problem management process of the case
organization.

| Requirements | Implemented | Case Organization |
|---|---|---|
| 1. Establish a Service Desk | Yes | Both an internal and an external service desk exist. |
| 2. Define the lifecycle of Incidents | Partially | The terminology is different than in ITIL (for example, term "known error" is not used). Types of incidents are pure incidents that might lead to problems, development ideas, change requests and change orders. The incident category "service request" is not used. |
| 3. Identify two dimensions of Problem Management: a) Reactive and b) Proactive | Partially | PM methods are mainly reactive: reactive problem management means in this case resolving reported incidents. However, some teams use also proactive problem management methods such as FAQ function. A knowledge base is not being used. |
| 4. Establish the problem management repository | Yes | The organisation uses a support tool to manage incidents, problem, RFCs and known errors. |
| 5. Establish the problem control activity: a) Identify and record the problem, b) Classify and categorize the problem, c) Investigate the problem | Yes | The service desk collects incidents reported by customers. The incident is assigned to the product support team if the service desk cannot resolve it. Therefore, problem control activity is performed by product support teams. |
| 6. Establish the error control activity: a) Identify and record the error, b) Error assessment, c) Record error resolution | Partially | Product development teams are responsible for error control activities and are responsible for correcting defects and recording information on the type of the fault, the cause of the fault, the time to resolve the fault, and the resolution of the fault. |
| 7. Define appropriate metrics for monitoring the problem management process | Yes | Cases per period metrics are used such as the ratio of open cases and closed cases per month. Time based process performance metrics are not used. |
| 8. Monitor the problem management process | Partially | Currently, there are no service level requirements defined for incident and problem resolution. |
| 9. Generate a request for change for change management | No | There is no Change Advisory Board in the current process. Product development teams usually make decisions whether changes are needed. |
| 10. Continuously improve the problem management process | Yes | The case organization has recognized the need for continuous process improvement. |

Table 1. The problem management process of an energy company

The Challenges Related to ITIL-based Processes


This section presents our findings regarding the challenges of ITIL-based
processes. Table 2 describes the ITIL process area, the goal for the process area,
and challenges that were identified.

| ITIL Process Area | Goal | Challenges in the process |
|---|---|---|
| Incident Management | To restore normal service operation (energy delivery) as quickly as possible | - The lack of time-based performance metrics such as incident turnaround times. - There are a lot of duplicate incidents recorded in the database. The service desk staff is not trained to merge or relate incidents. - Customers cannot use products as search criteria for incidents. |
| Problem Management | To detect underlying causes of incidents. | - The known error concept is not visible in the current problem management process. - There is no knowledge base available for known errors. - There is no problem category for errors in third party products such as bugs in database applications. - Problem records include many data fields that are seldom used. - The connection between testing support is unclear (reported problems and errors should have links to test cases). - Incidents and problems cannot be targeted to hardware configuration items such as servers. - Problem management is not connected to service level management (there are no service level requirements defined for problem resolution). |
| Release/Configuration Management | Maintain the Configuration Management Database and the information about configuration items. | - It is difficult to close several incidents and problems with one product release because many customers use customized product versions. - It is difficult to define an appropriate frequency for delivering bug fixes to different customers. |
| Service Level Management | Maintain and negotiate Service Level Agreements and Operational Level Agreements (such as agreements between business units within the case organization) | - The organization does not have a Service Level Manager. - A lack of Service Level Agreement templates and SLM metrics: What is the number of breached/challenged SLAs/period? How many % of services are covered by SLAs? |
| Availability Management | To ensure IT service availability | How to ensure the availability of online support site? |

Table 2. The challenges related to problem management and its neighbour processes

Analysis of the Findings

The problem manager and the service support manager of the case organization
considered the major improvement actions regarding software problem management to be
1) reducing the growing number of open incidents and problems by focusing on
proactive problem management methods and 2) creating service level agreements
with IT customers to improve the IT service quality.
Firstly, the case organization uses mainly reactive problem management
methods to solve reported incidents from customers. In the long run, the
organization has to focus on proactive problem management to be able to manage
a large number of incidents and problems. A knowledge base might help as a
proactive problem management method in this case.
Secondly, the case organization needs to implement a service level management
process and establish the role of a service level manager. Service level agreements
(SLAs) are very useful for monitoring the quality of IT services. SLAs are suitable
for both service providers and customers to monitor availability, quality, usability,
and performance of the service and to ensure that critical IT services are available.
The case organization has allocated a lot of resources to process improvement
such as adopting the problem management concepts of ITIL. They already have a
well-organized service desk. The service interface between the case organization
and its customers is based on the service desk and the online support site. The
problem control activity is performed by product support or back office teams.
Product development teams are responsible for the error control, change
management, and product development. In the future, the case organization is
planning to implement a Change Advisory Board that would be responsible for
approving all the change requests. As a strength, the case organization has clearly
defined processes in its business framework WayToExcellence, such as incident and
problem management processes that are based on ITIL principles.
The transition from the current process to the ITIL-based problem management
process has caused several challenges. Combining ITIL-based problem
management concepts with the organization's existing problem management process
has been a challenge. A knowledge base function is under construction, and the
support tool needs configuration work before it can be used to measure time-based
performance data such as problem resolution times. New data fields, such as a
problem category that helps the service desk and customers find cases more
rapidly, need to be added to problem records. ITIL-based processes seem to be designed
for large organizations. In practice, one person must hold several ITIL
responsibility areas. In our case, the same person held the roles of problem manager
and change manager. The release and configuration management roles were also
assigned to the same person. According to our observations, the customers of the
case organization are also very interested in ITIL-based process improvement.

Discussion and Conclusions

This study aimed to explore the requirements for the ITIL-based problem
management model. First, we presented a problem management checklist with ten
process-related requirements. Second, we described the current problem
management process of the case organization (an IT service provider). Finally, we
analyzed how the case organization's existing problem management process meets
the requirements of the ITIL-based problem management model.
The main contribution of this study lies in helping IT organizations to identify
the key issues required for the ITIL-based problem management model. These
requirements are needed in implementing the transition from the current problem
management model to the ITIL-based model. However, the requirement checklist
we presented is not exhaustive. More research efforts are needed to explore
proactive problem management methods.
As with all case studies, there are threats to the validity of this study. First,
construct validity is problematic in case study research. Data for the case study
should be collected from several sources. In order to get a richer view of the
problem management, we need to interview more members of the service desk and
product support teams. Second, there is the threat to external validity, the
generalizability of the results. The results presented in this paper are valid only in
our case organization. In future studies we intend to improve our research
framework by exploring the introduction of a knowledge base as a part of the
problem management framework.
The main contribution of this study is that it increases understanding of the
importance of building a software problem management model and gives a general
overview of the current methods used within problem management. A
systematic problem management model adds value for both software companies
and their customers. Reactive and proactive problem management methods are
used to minimize the impact of a problem on the business and prevent problems
before they occur. Hence, problem management can be used as a way to improve
customer satisfaction.

Acknowledgments
This paper is based on research in the SOSE project (2004-2006), funded by TEKES (the
National Technology Agency), European Regional Development Fund (ERDF), ICT and
customer companies in electricity domain. We wish to thank professor Anne Eerola for her
comments, research assistant Niko Pylkkänen for his help in data collection and people in
TietoEnator for participating in interviews.

References

Benbasat, I., Goldstein, D. K. and M. Mead (1987). The Case Research Strategy in Studies of
Information Systems. MIS Quarterly (11:3), pp. 369-386.
Binder, D. (2000). Testing Object Oriented Systems. Addison Wesley.
Boardman, B. (2005). IT Best Practices. Network Computing, vol. 16, pp. 79.
Card, D. N. (1998). Learning from Our Mistakes with Defect Causal Analysis. IEEE Software,
January-February
Davis, K. (2002). Charting a knowledge base solution: empowering student-employees and
delivering expert answers. In Proceedings of the 30th Annual ACM SIGUCCS Conference
on User Services (Providence, Rhode Island, USA, November 20 - 23, 2002). SIGUCCS
'02. ACM Press, New York, NY, 236-239.
Dietel, K. (2004). Mastering IT change management step two: moving from ignorant anarchy to
informed anarchy. In Proceedings of the 32nd Annual ACM SIGUCCS Conference on User
Services (Baltimore, MD, USA, October 10 - 13, 2004). SIGUCCS '04. ACM Press, New
York, NY, 188-190.
Gilb, T. and D. Graham (1993). Software Inspection. Addison-Wesley.
Ebenau, R.G. and S.H. Strauss (1994). Software Inspection Process. New York, NY: McGraw-
Hill.
Eisenhardt, K. (1989). Building Theories from Case Study Research. Academy of Management
Review, Vol. 14:4, pp. 532-550.
Florac, W. A. (1992). Software Quality Measurement: A Framework for Counting Problems and
Defects. Technical Report, CMU/SEI-92-TR-022, The Software Engineering Institute,
Carnegie Mellon University.
Frederick, M. and V. Basili (1998). Using Defect Tracking and Analysis to Improve Software
Quality, US Air Force Research Laboratory, DACS State-of-the-Art Report SP0700-98-D-
4000.
Hirmanpour, I. and J. Schofield (2003). Defect Management through the Personal Software
process. Article in Crosstalk, The Journal of Defense Software Engineering.
Hochstein, A. Tamm, G. and W. Brenner (2005). Service-Oriented IT Management: Benefit,
Cost and Success Factors. In Proceedings of the Thirteenth European Conference on
Information Systems (Bartmann D, Rajola F, Kallinikos J, Avison D, Winter R, Ein-Dor P,
Becker J, Bodendorf F, Weinhardt C eds.), Regensburg, Germany.
IEEE (1989). IEEE Standard Dictionary of Measures to Produce Reliable Software, ANSI/IEEE
Standard 982.1-1988, p. 13
IEEE (1994). IEEE Standard Classification for Software Anomalies, IEEE Standard 1044-1993,
p. 3.
ISO/IEC (1995). ISO/IEC 12207, Information Technology: Software Life-Cycle Processes.
ISO/IEC Copyright Office.
ITILPeople.com (2005). What is ITIL? Retrieved November 11, 2005, from
http://www.itilpeople.com/What%20is%20ITIL.htm.
Jacobson, I., Booch, G. and J. Rumbaugh (1999). The Unified Software Development Process.
Addison-Wesley.
Jalote, P. (2000). CMM in Practice: Processes for Executing Software Projects at Infosys.
Addison Wesley.
Jäntti, M. and Toroi, T. (2004). UML-based Testing: A Case Study. Proceedings of 2nd Nordic
Workshop on the Unified Modeling Language (Turku, Finland, August 19-20, 2004).
Kajko-Mattsson, M. (1998). A conceptual model of software maintenance. In Proceedings of the
20th international Conference on Software Engineering (Kyoto, Japan, April 19 - 25,
1998). International Conference on Software Engineering. IEEE Computer Society,
Washington, DC, 422-425.
Kajko-Mattsson, M. (2003). Infrastructures of Virtual IT Enterprises. In Proceedings of the 19th
IEEE International Conference on Software Maintenance (Amsterdam, The Netherlands,
September 22-26, 2003). International Conference on Software Maintenance. IEEE
Computer Society, Washington, DC, 199-208.
Kajko-Mattsson, M., Ahnlund, C., and Lundberg, E. (2004). CM3: Service Level Agreement. In
Proceedings of the 20th IEEE international Conference on Software Maintenance
(September 11 - 14, 2004). ICSM. IEEE Computer Society, Washington, DC, 432-436.
Kruchten, P. (2001). The Rational Unified process, an introduction. Addison-Wesley.
Leszak, M., Perry, D. E., Stoll, D. (2000). A case study in root cause defect analysis.
Proceedings of the 22nd international conference on Software engineering, June.
Litten, K. (2004). IT Service Management: Selecting the Right Metrics for Performance
Measurement, INS Whitepaper. Retrieved November 10, 2005, from
http://www.ins.com/knowledge/whitepapers.asp.
Materna Finland Oy (2005). ITSMF Research. Retrieved November 7, 2005, from
http://www.materna.de/FI/Home/.
Mays, R.G. (1990). Experiences with Defect Prevention. IBM Systems Journal, Vol 29 No. 1.
Office of Government Commerce (1) (2002). ITIL Service Support. The Stationery Office, UK,
Ref. use in text, OGC(1).
Office of Government Commerce (2) (2002). ITIL Service Delivery. The Stationery Office, UK,
Ref. use in text, OGC(2).
Pink Elephant (2004). ITIL Process Maturity, Pink Elephant Whitepaper. Retrieved November 8,
2005, from
http://www.pinkelephant.com/en-US/ResourceCenter/PinkPapers/PinkPapersList.htm
Quality Assurance Institute (1995). Establishing A Software Defect Management Process.
Research Report number 8.
Sallé, M. (2004). IT Service Management and IT Governance: Review, Comparative
Analysis and their Impact on Utility Computing. HP Technical Report, June 2.
Yin, R. K. (2002). Case Study Research, Design and Methods, 3rd ed. Newbury Park, Sage
Publications.
Zhen, J. (2005). IT Needs Help Finding Root Causes. Computerworld, vol. 39, pp. 26.
