
IMPLEMENTING METRICS IN AN
ORGANIZATION
FRANK VOGELEZANG

The metrics community is aware of the importance of implementing metrics in an organization. Convincing decision makers in IT environments is often a difficult job. Implementing a successful metrics program for software development is an even more challenging undertaking. Most implementations start as just another project, but experience shows that a number of aspects are slightly different from a regular software project. Based on this experience Sogeti has come up with MOUSE, a compact concept for implementing a metrics program in an organization. The concept describes five important aspects of a good metrics implementation: Market view, Operation, Utilization, Service and Exploitation. This article shows the differences between the implementation of a metrics program and just another project, and explains a concept that can help handle such an implementation the right way.

Why are we using metrics?

An information system supports the realization of organizational goals. The way in which such a system is developed has been standardized by the maturing field of software engineering. Usually an information system is developed in some form of project and goes through a number of development stages. By distinguishing the relevant stages, a project can be divided into well-defined activities with matching milestones. In this way we can control a project. Every metrics professional is aware of the value metrics have in decision making. To a trained eye, metrics are everywhere [1]: many software developers use some kind of metric to establish the quality of the requirements or to establish whether produced code is ready to be tested. Effective project managers have metrics that allow them to tell when the software will be ready for delivery and whether the budget will be exceeded. In projects, metrics are usually used implicitly. Convincing decision makers in IT environments that those metrics need to be used explicitly and unambiguously is often still a difficult job. Despite significant progress in the field of metrics, implementing a successful metrics program for software development is still a challenging undertaking [2]. A metric also supports the realization of goals. By tying metrics to certain stages or milestones we can verify and control progress towards the goal.

Implementing a metrics program


Following this line of thought, implementing a metric or a metrics program should in many ways be comparable to the development of an information system. So implementing a metrics program is just another staged project. To some extent this is a valid comparison. But since a metrics program is really not the same as an information system, it requires different activities, and the stages are somewhat different. Before the decision to implement a metrics program is made, the goals that the program should serve need to be defined clearly [3]. A good framework to decide which metrics are needed for the defined goals is the GQM method [4]. Metrics can be used for various purposes. These goals and corresponding timeframes are the basis for the organization-specific elements in the implementation of a metrics program.
Figure 1: Stages in the implementation of a metrics program. (The figure maps the stages onto a Plan-Do-Check-Act cycle: PLAN covers the preparation stage (inventory, introduction); DO covers the training and research stages; CHECK covers the implementation stage (pilot, organizational implementation, evaluation); ACT covers the use of the metrics program (information, adjustments).)

Preparation stage: inventory


During this stage an inventory is drawn up of the current working methods and procedures, together with all aspects that might relate to the metrics program to be implemented, such as:

- already implemented measurements;
- the software development methodology (staging, milestones, activities, products, guidelines and standards);
- the software development environment;
- which project characteristics are general organization characteristics and which are project-specific;
- the way effort is recorded.

Preparation stage: introduction


After an analysis of the current working methods and procedures, the design of the metrics program will have to be established. Now it can be determined which employees, in which role, will be affected by the metrics program. Those employees have to be informed and, where necessary, trained to work with the metrics program. At the end of the preparation stage there is a documented consensus about what the metrics program is expected to monitor, who will be involved and in what way.

Training stage
Employees in the organization who are affected by the metrics program will have to be trained to use the metrics properly. Depending on the role, this training can range from an introductory presentation to a multiple-day training course. For the introduction of a metrics program in an IT organization, typically five categories of employees emerge:

1. Management
The management must have and convey commitment to the implementation of a metrics program. The management needs to be informed about the possibilities and impossibilities of the metrics program. They also must be aware of the possible consequences a metrics program can have for the organization. It is well known that employees can feel threatened by the introduction of a metrics program and in some cases react in quite a hostile way. Such feelings can usually be prevented by open and correct communication by the management about the true purposes of the metrics program.


2. Metrics analysts
The employees who are responsible for analyzing and reporting on the metrics data. They are also responsible for measuring and improving the quality of the metrics program itself. Usually they are already involved in the preparation stage and do not need any further training in this stage of the metrics program.

3. Metrics collectors
The employees that are actively involved in collecting or calculating the metrics have to know all the details and consequences of the metrics, to assure correct and consistent data. If the metrics that are used in the metrics program come from activities that are already common practice, the training may take only several hours. If the metrics are not common practice or involve specialist training, for instance if functional sizes have to be derived from design documents, the training may take a substantial amount of time. In the latter case this involves serious planning, especially in matrix organizations: it will not only consume time of the employee involved, but it will also affect his or her other activities.

4. Software developers
Usually a lot of the employees that are involved in the software development will be affected, directly or indirectly, by the metrics program, because they produce the information the metrics program uses. They need to understand the metrics and the corresponding vocabulary. For them the emphasis of the training needs to be on understanding the use and importance of a metrics program for the organization, because they usually do not experience any benefit from it in their personal activities, but may need to change some of their products to make measurement possible or consistent.


5. End-users or clients
Although a metrics program is set up primarily for the use of the implementing organization, end-users or clients can also benefit from it. In particular, size metrics are useful in the communication between the client and the supplier: how much will the client get for its money? Whether this audience will be part of the training stage for a metrics program depends on the openness of the implementing organization: are they willing to share information about the performance of their projects?

At the end of the training stage everyone who will be affected directly or indirectly by the metrics program has sufficient knowledge about this program. It may seem obvious, but it is essential that the training stage is finished before (but preferably not too long before) the actual implementation of the metrics program starts.

Research stage
In this stage the metrics to be implemented are mapped onto the activities within the organization that will have to supply the metrics data. The exact process of collecting the metrics data is determined and described, so that at the start of the implementation it is unambiguous how the metrics data are collected. In this stage it is useful to determine what the range of the metrics data might be. A useful concept for this is Planguage [5]. A wrong perception of the possible result of metrics data can kill a metrics program at the start. It is also important to establish at least an idea of the expected bandwidth of the metrics data beforehand, to know which deviations can be considered acceptable and which deviations call for immediate action. At the end of the research stage all procedures to collect metrics data are described, target values for each metric are known and allowable bandwidths are established for each metric in the metrics program.

Implementation stage: pilot


Unless an organization is very confident that a metrics program will work properly from the start, it is best to start the implementation with a pilot. In a pilot, metrics are collected from a limited number of activities. In a pilot all procedures are checked, experience is built up with these procedures and the first metrics data are recorded. In this way the assumptions about the metrics values and bandwidths from the research stage can be validated.


Implementation stage: evaluation


After the completion of the pilot all procedures and assumptions are evaluated and modified where necessary. When the modifications are substantial it may be necessary to test them in another pilot before the final organizational implementation of the metrics program can start.

Implementation stage: organizational implementation


The pilot and its evaluation can be considered the technical implementation of the metrics program. After completion of this stage the metrics program is technically ready to be implemented. Until now the metrics program has had little impact on the organization, because only a limited number of employees have been involved in the pilot.

The organizational implementation of a metrics program will have an impact on the organization, because the organization has formulated goals which the metrics program will monitor. These goals may not have been communicated before or may not have been made explicitly visible. Metrics will have to be collected at specified moments or about specified products or processes. This could mean a change in procedures. For the employees involved this is a change process, which can trigger resistance or even quite hostile reactions. In our experience it is essential to install an independent body within the organization that is responsible for carrying out the metrics program. How this body should operate is laid down in the MOUSE concept, which will be described in detail later on. At the end of the implementation stage the metrics program is fully operational and available throughout the organization.

Use of the metrics program: information


This stage is actually not a stage anymore. The metrics program has been implemented and is now part of the day-to-day operations. The metrics program is carried out as defined and produces information that helps the organization to keep track of the way it is moving towards its goals.


Use of the metrics program: adjustments


A mature metrics program gives continuous insight into the effectiveness of current working procedures in reaching the organizational goals. If the effectiveness is lower than desired, adjustments to the procedures should be made. The metrics program itself can then be used to track whether the adjustments result in the expected improvement. If working procedures change, it is also possible that adjustments have to be made to the metrics program. Organizational goals tend to change over time. A mature metrics program contains regular checks to validate whether it is still delivering useful metrics in relation to the organizational goals. All these aspects are covered in the MOUSE concept [6].

Key issues of the MOUSE concept


Implementing a metrics program is more than just training people and defining the use of metrics. All the lessons learned from organizations like Rabobank formed the basis for MOUSE, a concept that helps to set up the right implementation and to create the environment the method fits in [7]. MOUSE describes all activities and services that need to be carried out to get a metrics program up and running. The MOUSE concept contains all activities and services required to implement a metrics program in a successful and lasting way, clustering them into groups of key issues, described in the table below:
Market view    Communication, Evaluation, Investigation, Improvement
Operation      Application, Review, Analysis, Advice
Utilization    Training, Procedures, Organization
Service        Helpdesk, Guidelines, Information, Promotion
Exploitation   Registration, Control

A major lesson that can be learned from the MOUSE concept is how a metrics program should operate. We found out that the most suitable organizational structure for a metrics program is to concentrate expertise, knowledge and responsibilities in an independent body. An independent body has many advantages over other organizational structures.

For example, when activities are assigned to individuals in projects, many additional measures have to be taken to control the quality of the measurements, the continuation of the measurement activities and the retention of expertise about the metrics program. When responsibilities for (parts of) the metrics program are assigned to projects, additional enforcing measures have to be taken to guarantee adequate attention from the project to metrics program assignments over other project activities. Installing an independent body to oversee and/or carry out the metrics program is essential for achieving the goals the metrics program was set up for. This independent body can be either a person or a group, within or outside the organization. In the next sections the five key issues of the MOUSE concept will be explained and illustrated with examples.

Market view: communication


Communication in the MOUSE concept is an exchange of information about the metrics program, both internally (one's own organization) and externally (metrics organizations). Internal communication is essential to keep up the awareness of the goals for which the metrics program is set up. Communication with metrics organizations is important to stay informed about the latest developments. Usually an important metric in a metrics program in an IT environment is the functional size of software. For some of our clients the currently applied functional size metric (usually function point analysis) does not fulfill all their needs [8]. One of the more promising next-generation functional size metrics is COSMIC Full Function Points [9].

Market view: evaluation


If the independent body is located within the client's organization, direct and open communication is possible with stakeholders of the metrics program, to evaluate whether the metrics program is still supporting the goal it was set up for. When the independent body is positioned outside the client's organization, more formal ways to exchange information about the metrics program may be desirable. Regular evaluations or some other form of assessment of the measurement process work well for an open communication about the metrics program.

Market view: improvement


The signals that the evaluations provide are direct input for continuous improvement of the metrics program. Depending on the type of signal (operational, conceptual or managerial), further investigation may be required before a signal can be translated into a measurement process improvement.

Market view: investigation


Investigation can be both theoretical and empirical. Theoretical investigation consists of studying literature, attending seminars or following workshops. Empirical investigation consists of evaluating selected measurement tools and analyzing experience data. Usually these two ways of investigation are used in combination. An example of such an investigation is the research into early sizing techniques for COSMIC-FFP [10].

Operation: application
Application includes all activities that are directly related to the application of the metrics program. This includes activities like executing measurements, for example functional size measurements, tallying hours spent and identifying project variables. Within the MOUSE concept the client can choose to assign the functional sizing part of the operation either to the independent body or to members of the projects in the scope of the metrics program.

Operation: review
The best way to guarantee the quality of the measurement data is to incorporate review steps into the metrics program. The purpose of reviewing is threefold:
- to ensure correct use of the metrics (rules and concepts);
- to keep track of the applicability of the metrics program;
- to stay informed about developments in the organization that might influence the metrics program.


Operation: advice
During the research stage all procedures to collect metrics data are described for each metric in the metrics program. These procedures are usually described in such a way that they support the organizational goal for which the metrics program was set up. Some metrics data can also be used to support project purposes. The independent body can then be used to give advice about the use of the metrics for these purposes. For example, an aspect of the metrics program can be the measurement of the scope creep of projects during their lifetime. Functional size is measured in various stages of the project to keep track of the size as the project progresses. These functional size measures can also be used for checking the budget, as a second opinion next to a budget based on, for example, a work breakdown structure. The independent body can give advice about translating the creep ratio in the functional size into a possible increase of the budget, as the sketch below illustrates.

Operation: analysis
During the research stage target values and allowable bandwidths are established for each metric in the metrics program. The independent body will have to analyze whether these target values were realistic at the beginning of the metrics program and whether they are still realistic at present. One of the organizational goals might be to get improving values for certain metrics. In that case, the target values for those metrics and/or their allowable bandwidths will change over time.

Utilization: training

Next to the basic training at the start of a metrics program, it is necessary to maintain knowledge about the metrics program at an appropriate level. The personnel of the independent body should have refresher training on a regular basis, covering new developments (rules, regulations) in the area of the applied methods. The independent body can then decide whether it is necessary to train or inform other people involved in the metrics program about these developments. In case the independent body is outsourced, the supplier can be made responsible for keeping the knowledge up to date.


Utilization: procedures
To guarantee the correct use of a method, procedures related to the measurement activities of the metrics program are necessary. They are usually initiated and established in the research stage of the implementation. Not only the measurement activities themselves need to be described, but also facilitating processes like:
- project management;
- change management control;
- project registration;
- (project) evaluation.
After the initial description in the research stage, the independent body should monitor that all the relevant descriptions are kept up to date.

Utilization: organization
As stated earlier, the independent body can reside within or outside the organization where the metrics program is carried out. The decision about this organizational aspect is usually combined with the number of people involved in the metrics program. If the metrics program is small enough to be carried out by one person part-time, the tasks of the independent body are usually assigned to an external supplier. If the metrics program is large enough to engage one or more persons full-time, the tasks of the independent body are usually assigned to employees of the organization.

Depending on the type of organization this might not always be the best solution for a metrics program. When the goals the organization wants to achieve involve sensitive information, calling in external consultants might be a bad option, no matter how small the metrics program might be. If employees have to be trained to carry out the tasks of the independent body, they might perceive that as narrowing their options for a career within the organization. In that case it might be wise to assign these tasks to an external party specializing in these kinds of assignments, no matter how large the metrics program is. Outsourcing these assignments to an external party has another advantage: it simplifies the processes within the client's organization. Yet another advantage of outsourcing the independent body could be political: to have a really independent body do the measurement, or at least a counter-measurement.


Service: helpdesk
To support the metrics program, a helpdesk needs to be set up. All questions regarding the metrics program should be directed to this helpdesk. The helpdesk should be able to answer questions with limited impact immediately, and should be able to find the answers to more difficult questions within a reasonable timeframe. It is essential that the helpdesk reacts adequately to all kinds of requests related to the metrics program. In most cases the employees that staff the independent body constitute the helpdesk.

Service: guidelines
Decisions made regarding the applicability of a specific metric in the metrics program need to be recorded, in order to incorporate such decisions into the corporate memory and to be able to verify the validity of these decisions at a later date. Usually such decisions are documented in organization-specific guidelines for the use of that specific metric.

Service: information
The success of a metrics program depends on the quality of the collected data. It is important that those who supply the data are prepared to provide it. The best way to stimulate this is to give them information about the data in the form of analyses. These should provide answers to frequently asked questions, such as: What is the current productivity rate for this specific platform? What is the reliability of the estimates? What is the effect of team size? For questions related to functional size metrics, the experience database of the ISBSG Benchmark [11] can usually answer most of those questions.

Service: promotion
Promotion is the result of a proactive attitude of the independent body. The independent body should market the benefits of the metrics program and should sell the services it can provide based on the collected metrics. Promotion is necessary for the continuation and extension of the metrics program.


Exploitation: registration
The registration part of a metrics program consists of two components: the measurement results and the analysis results. In a metrics program in an IT environment all metrics will be filed digitally without discussion. Here a proper registration usually deals with keeping the necessary data available and accessible for future analysis. For most metrics programs it is desirable that the analysis data is stored in some form of an experience database. In this way the results of the analyses can be used to inform or advise people in the organization.

Exploitation: control
Control procedures are required to keep procedures, guidelines and the like up to date. If they no longer serve the metrics program or the goals the organization wants to achieve, they should be adjusted or discarded. Special attention needs to be given to the procedures for storing metrics data. That data should be available for as long as is necessary for the metrics program. This might be longer than the life of individual projects, so it is usually advisable to store data in a place that is independent of the projects the data comes from.

Critical success factors for implementing a metrics program


Some factors can considerably increase or decrease the success of a metrics program. A number of critical success factors is listed below. Managing all these factors well is no guarantee that the metrics program will be a success, but keeping them in mind decreases the risk of a failing metrics program significantly. Most of these success factors have to do with the metrics program itself:
- The metrics program should support clearly stated goals which are seen as relevant by most employees involved.
- The organization should be ready to contribute additional effort to execute the metrics program, for example by appointing an independent body and allowing that body sufficient time to fulfill its tasks.
- The metrics program should receive the right support (skills and commitment), not only during the implementation but continuously during operation.
- The organization should be aware of the fact that starting a metrics program will not yield results in a short period. It is a continuous process which only shows results after a period of time.
- The metrics program should be embedded in the organization. Using the MOUSE concept can help to do that in a sustainable way.
- Manage the (exaggerated) expectations towards a metrics program. A metrics program is not a goal; it is a tool for evaluating and controlling goal achievement.
- Make sure the organization is prepared to give insight into its way of working and, more in particular, how time is spent. Without proper information about the use of a metrics program, employees tend to be uncooperative.
- The organization must be willing to evaluate projects.
- The organization should understand that objective measurements are a necessity to estimate and manage activities in a uniform manner.
- Accept that comparing projects or activities is only possible when there is uniformity in stages, activities and products.

There are also success factors that are external to the metrics program and deal with the way the organization interacts with the metrics program. The internal factors are crucial for a good implementation; the external factors are important for a lasting success [12]:
- Specify how to react to different outcomes. An organization should consider all possible negative and positive outcomes and decide how to act on them. Too often only the desired outcome is specified, and there is quite a chance that an organization will not react to a different outcome.
- Implement organizational changes. An organization should act according to the outcomes of a metrics program, whether they are positive or negative. If an organization fails to act, the value of the metrics program will degrade.
- Monitor the implemented changes. An organization should verify that changes indeed constitute the intended improvement for the organization.
- Use explicit and tested assumptions. During the preparation stage and the research stage assumptions are made about the interactions between the organization and the metrics program. It is important that these assumptions are made explicit, so that they can be tested at an appropriate time.


Added value of a metrics program


Most projects need to save the organization money in some way or another. A metrics program will not do that, at least not directly. Performing the activities involved in a metrics program will cost effort, and thus time. However, for a metrics program to survive in the long run, it has to generate added value for the organization, not necessarily in the form of a financial profit [12].

The added value of a metrics program is control of resources. Because of the metrics program, the organization knows which processes or projects use which resources. This knowledge can help the organization to use resources in a more controlled way, because the expected use of resources is better understood. Knowledge about the use of resources can help to identify processes or projects that consume more resources than average. This might be a trigger to make those processes or projects more resource-efficient.

If there is only a limited number of resources, a metrics program can also be helpful in planning these resources. Because the use of these resources is better understood, they can be scheduled in more detail. This reduces the time a resource is not used because it is available earlier than expected. It also reduces the time a process or project has to wait for a resource to become available.

Metrics programs are also a valuable asset for customer satisfaction. Because processes or projects are understood in more detail, they can be better predicted. Agreements with customers can thus be met in a predictable fashion, which leads to better customer satisfaction.

A metrics program will not save you money. The added value of a metrics program is that it allows you to do more with the same amount of money, which may be even better.

About the author


Frank Vogelezang (frank.vogelezang@sogeti.nl) has been working as a practitioner and consultant in the area of software metrics for over five years. Within this area he has specialized in estimation and performance measurement within client organizations. He is a consultant for the Expertise Center Metrics of Sogeti Nederland B.V. He is a member of the Measurement Practices Committee of COSMIC and a member of the COSMIC working group of NESMA.


References
1 Fenton, N.E., Pfleeger, S.L., Software Metrics: A Rigorous & Practical Approach, 2nd edition, PWS Publishing Company, Boston (USA), 1997.
2 Briand, L.C., Differding, C.M., Rombach, H.D., Practical guidelines for measurement-based process improvement, Software Process: Improvement and Practice, nr. 2 (1996).
3 Holmes, L., Measurement program implementation approaches, chapter six in: Jones, C., Linthicum, D.S. (editors), IT Measurement: Practical Advice from the Experts, IFPUG / Addison-Wesley, Boston (USA), 2002.
4 Solingen, R. van, Berghout, E., The Goal/Question/Metric Method: A Practical Handguide, McGraw-Hill, Columbus (USA), 1999.
5 Gilb, T., Competitive Engineering: A Handbook for Systems and Software Engineering Using Planguage, Addison-Wesley, Boston (USA), to be published, see www.gilb.com.
6 Dekkers, A.J.E., The practice of function point analysis: measurement structure, Proceedings of the 8th European Software Control and Metrics conference ESCOM 1997, May 26-28, Berlin (Germany), 1997.
7 Dekkers, A.J.E., Vogelezang, F.W., COSMIC Full Function Points: Additional to or replacing FPA, Proceedings of the 13th International Workshop on Software Measurement IWSM 2003, September 23-25, Montréal (Canada), 2003.
8 Vogelezang, F.W., Dekkers, A.J.E., One year experience with COSMIC-FFP, Software Measurement European Forum SMEF 2004, January 28-30, Rome (Italy), 2004.
9 Vogelezang, F.W., COSMIC Full Function Points: the next generation of functional sizing, chapter 12 of this book, NESMA 2004.
10 Vogelezang, F.W., Lesterhuis, A., Applicability of COSMIC Full Function Points in an administrative environment: Experiences of an early adopter, Proceedings of the 13th International Workshop on Software Measurement IWSM 2003, September 23-25, Montréal (Canada), 2003.
11 Hill, P.R. (editor), The Benchmark Release 8: Analyses of the factors that affect the project duration, quality & productivity of software development & enhancement projects & package customisation projects, ISBSG, January 2004.
12 Niessink, F., Vliet, J.C. van, Measurement program success factors revisited, Information and Software Technology, 43 (2001).


THE EXPERIENCE OF REAAL INSURANCES WITH FPA AND COSMIC-FFP


MARCEL KOSTER

This article describes the experiences that REAAL Insurances has had with Function Point Analysis (FPA) and with COSMIC Full Function Points (COSMIC-FFP). A start was made with FPA in the nineties. After a period of a few years in which FPA was applied to our satisfaction, the use of FPA fell off, until its reintroduction in 2002. The move to COSMIC-FFP took place a short time afterwards. In this article the various challenges, considerations and choices concerning FPA and COSMIC-FFP will be reviewed, including the point at which we saw COCOMO as a possible supplement to COSMIC-FFP.
[Figure 2 shows three tracks along a timeline from 1990 to 2004. Budgeting method: FPA successful in the early nineties; reduced use of FPA after 1995; around 2002 a renewed need for a budgeting method, leading to the reintroduction of FPA and the switch to COSMIC-FFP. System development: many new developments; later enhancement and conversions; then new development and re-engineering. Environment/methods: a stable development environment and methods, later changing environments and methods.]

Figure 2: Budgeting and methodology developments over time.


Figure 2 shows these developments over time with regard to the budgeting methodology, system development and the environment/methods, along with the relationships between them. We will discuss these developments in more detail in this article.

Introduction and experiences with FPA in the nineties


The IT organization chose FPA as an independent budgeting method in the beginning of the nineties; there were no, or hardly any, proven alternative methods. Productivity management was not really a problem for REAAL Insurances in those days. The leading reason for this was that we worked mainly with the LINC 4th generation programming language at the time, and a productivity rate could be quickly determined. This productivity rate turned out to be valid for a number of years and it was used for a long time, to our great satisfaction, for drawing up and supporting our budgets.

Implementing FPA caused little resistance. Support for it was high, mostly because it provided welcome additional support for the budget as calculated by the project leader. Almost every project leader used FPA. The IT organization had widely accepted the FPA method at the time. The FPA estimates were always executed alongside the expert estimates. No budget was delivered that was exclusively based on FPA. The IT organization had completely affiliated itself with FPA and used the tables that were specified within FPA.

The most decisive factor in doing an FPA count is the expertise of the personnel deployed, especially in the hardware and software areas. Within ITF the influence of this factor was smaller, as the system development activities were grouped around business units. Therefore a developer usually worked with only one programming language and always in the same technical environment. He worked for a fixed business unit and could thus gain in-depth knowledge of the customer's systems. A developer specialized, for example, in systems concerning damage insurance. The different business units of REAAL Insurances, considered customers, thus have ITF contact people with knowledge of their systems, which simplifies communication greatly.


Reduced attention for FPA since 1995


The reduced attention for FPA had a number of fundamental causes. In 1995 most new development projects were completed and the focus turned to maintenance. FPA turned out to be less suited for maintenance than for development projects. In this period there were also a large number of conversion projects, mostly caused by various mergers and takeovers. There were many conversions of systems from REAAL, Helvetia and NOG to Hooge Huys systems (currently REAAL Insurances). It turned out that these projects could not be estimated with the help of FPA. These developments resulted in FPA only being used sporadically within ITF.

Implementation and experiences with FPA since 2002


In the period from when FPA was found less applicable until its reintroduction, several new technical environments were selected. So we no longer had only LINC as a technical environment, but also had to deal with others such as Forté and Visual Basic, the latter chiefly used for internet applications. During 2002 a start was made with a project to reintroduce FPA. The following points formed the foundation of this new interest:

Functionality à la carte
The customers expressed an ever increasing desire for choice. In other words, the customers wanted to buy functionality à la carte. Clarity with regard to the cost per feature is in this case essential, because a client makes his choices on this basis. It sounds simpler than it really is; functionality has to be seen as a cohesive whole. It is, for example, nonsense to build a query function while nothing can be inserted.

Objective budgeting method
A wish re-emerged from management for an objective budgeting method based on function points and productivity.

Analysis
Besides this, IT management considered it desirable to have analysis capabilities designed for managing improvement. The analysis concerns, among other things, the comparability of differences in productivity during the various build phases and the various technical environments.

Along with the points mentioned above, there was another advantage of FPA that was considered important: FPA provides an inventory of the functional user requirements in the form of an overview of all functions and data of the application to be built. The overview helps with gaining insight into the functional size of the user requirements. This method of analysis is simpler than technical documentation such as a functional design or a detailed design. In other words, function points are simple to understand, even for the non-technical user. It was still necessary to translate some of the FPA technical reports into a form that the user could understand.

Switch to COSMIC-FFP early in 2003


The reintroduction of FPA did not lead to the positive experience that the IT organization had hoped for beforehand. It can be supposed that FPA was satisfactory for LINC environments, if not optimal, because these systems had also become more complex. But FPA appeared particularly less suitable for the new environments. Our experiences with FPA can be summarized as follows:

Component Based Development (CBD)
It became apparent after the first count that FPA was not suitable for counting in this environment. The evaluation of the count established that so many questions existed around FPA in this environment that it had to be concluded that FPA is not appropriate. Example: in a CBD environment the functionality of a system is defined in use cases. Technical concepts, such as report, do not belong in use cases. Precisely the absence of such concepts makes FPA counting more problematic. There appear to be many catches if you try to apply FPA in this kind of environment; it was in any case not as straightforward as it first seemed.

System complexity
FPA does not take the increase in system complexity into account. The method only distinguishes between low, average and high functional complexity ratings. So, if functions that are first rated as high become even more complex, this cannot be indicated with FPA. It remains a fact that systems are continually becoming more complex and thus more expensive, yet this is not expressed by FPA.

More complex and ever greater output
FPA does not account for the increasing complexity of output. Here too you can only differentiate between low, average and high. For example, there was mention of a system where only 6 reports were developed. They had, however, such a high level of scope and complexity that they could not be satisfactorily expressed in function points.

Multi-tier architecture
FPA is unsuitable for working with multi-tier architectures. The documentation does not align with FPA any more. FPA requires additional rules concerning how to deal with components that call other components. An extra step was needed in order to do an FPA count, namely re-engineering the documentation into functional requirements.

Calculations
FPA does not make allowances for complex calculations, such as calculating interest.

Counting guidelines
Immediately following the reintroduction of FPA a list of counting guideline interpretations was begun. It soon became clear that several alternative interpretations were possible. Many of these rules appeared to be fairly subjective. This was caused by the number of counting guideline dialects that have appeared, and it is quite awkward to apply the guidelines uniformly.

Intake interview
After the reintroduction of FPA it became apparent that it was necessary to hold a meeting between the project leader and the person carrying out a count before the function point count could take place. There were clearly a number of questions that had to be answered before the FPA counter could successfully perform the count.

Considering the above disadvantages, a choice was made for an alternative, and so COSMIC-FFP was reviewed along with COCOMO, the latter as a possible supplement to COSMIC-FFP. The COCOMO model was developed by Boehm (1981). It is a mathematical model where the estimate is based on the lines of code in the system that is to be built. COCOMO was eventually not selected; the reason is explained in a later section. COSMIC-FFP is, like FPA, a method for measuring the functional size of an information system. While FPA was developed for administrative applications, COSMIC-FFP was designed with the special characteristics of real-time applications in mind. The expectations of COSMIC-FFP are:

Component Based Development (CBD)
The expectation is that COSMIC-FFP will result in a good count in this kind of environment. This is based on the experience of third parties. Currently we assume that, if the use cases have been written in a uniform manner, COSMIC-FFP will lead to a good count. Agreements first have to be made concerning the manner in which the use cases are defined.

System complexity
In contrast to FPA, COSMIC-FFP does allow for increasing complexity, because COSMIC-FFP does not use complexity levels such as low, average and high but simply continues counting. Thus a system becoming ever more complex can be better represented in the budget. In my opinion the following example clarifies the difference between FPA and COSMIC-FFP. Suppose that we decide to develop a car. For this purpose we define a whole series of functions. This list of functions could result in a Fiat, but it could also lead to a Mercedes. There is quite a difference in price between a Fiat and a Mercedes. COSMIC-FFP allows much better for the bells and whistles that cause a Mercedes to have a higher price, such as more expensive mirrors. Though the functionality of the mirror remains the same, the complexity often differs, and because COSMIC-FFP is not limited in its measurement conventions, the complexity can be better differentiated. Thus a much better estimate can be delivered than with FPA.

Calculations
The FPA disadvantage with calculations is not removed with COSMIC-FFP either.

Multi-tier architecture
COSMIC-FFP accounts better for multi-tier architectures. There is an extra step to identify layers. The rules that identify architecture layers could actually be somewhat improved. Our conclusion is that COSMIC-FFP can best be applied by counting each architecture tier separately.

Counting guidelines
COSMIC-FFP does not have any correction factors. The counting rules are simpler and less subjective than FPA's. The chance that different dialects of COSMIC-FFP will spring up is considerably less than with FPA. We have already spent more time with COSMIC-FFP than with FPA (after its reintroduction), but a list of counting guideline interpretations has not yet surfaced, in contrast with FPA.

Intake interview
After the reintroduction of FPA, it had proved necessary to hold a meeting before carrying out a count. After implementing COSMIC-FFP it quickly became clear that these intake meetings were no longer necessary. The few questions that did arise could be handled by phone. Our opinion is that COSMIC-FFP demands less of the documentation that is used to perform the count.


Readability of the counting report
COSMIC-FFP is based, more than FPA, on the language of the users, and the result of a COSMIC-FFP function count is therefore more readable than an FPA count. The difference in readability between an FPA and a COSMIC-FFP report is very big. Fragments of FPA and COSMIC-FFP reports are reproduced in Figure 3 to demonstrate this.

Fragment of an FPA report:

  Internal logical databases:   45
  External databases:           15 +
  Total:                        60 function points

  Internal logical databases   DET   RET   Complexity
  - ADVICE WOONVERKENNER        52     1   M
  - Settlement form              4     1   L
  - FPA Tables                  12   2-5   L
  - Financier                    7     1   L
  - Loan part                   16     1   L
  - Interest                    12     1   L

  Complexity   Number   Weight   FP
  - Low             5        7   35
  - Medium          1       10   10
  - High            0       15    0 +
  Total                          45

Fragment from a COSMIC-FFP report:

  Functionality                            Entries   Exits   Reads   Writes   Total
  Output offer                                   1       9       7        -      17
  Output spreadsheet                             1       9       7        -      17
  Output Pure Annuity time indicator             1       2       2        -       5
  Output Pure Annuity Golden Handshake           1       2       2        -       5
  Total                                                                      44 cfsu

Figure 3: A fragment from an FPA report and from a COSMIC-FFP report.


The difference in counting between FPA and COSMIC-FFP


FPA is based on counting and assessing the number of functions executed by the software. These functions are:
- input functions;
- output functions;
- query functions;
- internal databases;
- interfaces with other systems.
In contrast, COSMIC-FFP is based on counting the processing of logical groups. A logical group is, for example, the address data of a customer. Within COSMIC-FFP every entry, exit, read and write has the value of one cosmic functional size unit. The size of a functional process is equal to the number of entries, exits, reads and writes of the process.

Conclusion with regard to COSMIC-FFP

COSMIC-FFP works well for determining the functional size after the completion of the functional design. Translating this into a productivity factor is somewhat awkward. Productivity varies according to the environment (LINC, Uniface, MQ, Visual Basic, Forté) and is dependent on the degree of re-use. This makes it necessary to translate the functional design into terms of programs to be built or re-used. Only then, based on past productivity rates, can you create a budget. In other words, if the functionality is defined but it is not yet known in which technical environment the functionality will be built, then no realistic budget can be provided with the help of COSMIC-FFP; but this is not possible with FPA either. The difference in productivity between the various technical environments is too large for this. The size of the system can of course be expressed in function points.

COCOMO

Shortly after the choice for the transition to COSMIC-FFP was made, the COCOMO model was examined, because it could possibly be used in combination with COSMIC-FFP to improve control over the various system development processes.

At first sight it seemed a welcome addition. The ability to predict the lines of code count can have added value, for example, for project monitoring. The fact is that with this model a project plan can be changed in an earlier phase. If, after a certain time, 10,000 lines of programming code should have been written, and it turns out to be only 5,000, then it is possible to intervene at a relatively early moment, often even before the various signals from the project have reached those responsible.

However, we soon hit a few challenges. After all, what is the fundamental dilemma of cost estimations that are based on functional size, making the application of lines-of-code counting difficult to get going?
- The problems entailed by changing existing code. Re-use of code is not always economically viable when, for example, there is too much of it to be examined and changed;
- The publication of new development processes, methods and supporting software;
- Difficulties in inspecting existing code;
- At the start of a project not all factors (functionality, development environment, available resources, tools, and required activities) influencing, for example, the duration and cost of a project are known.

Especially the inspection of existing code takes a lot of effort. In some technical environments like Uniface, programming code might be stored spread across dozens of database tables. It takes a lot of effort to retrieve the lines of programming code for a specific application, especially when a distinction also has to be made between lines of programming code which have been developed within a certain project at a specific time. Apart from these points, we estimated that one or two persons would have to be employed full-time to keep track of developments around the lines of programming code, such as regularly checking the number of written lines of code. We decided therefore not to spend any more time and attention on this issue for the time being.


What is the current position of REAAL Insurances?


After describing the experiences with COSMIC-FFP, this section will pay attention to where REAAL Insurances currently stands with COSMIC-FFP. A few people have been retrained and we will touch upon that experience. Furthermore we will indicate in which phases of the system development process COSMIC-FFP is currently being used. We will also pay attention to the support of the project leader in the budgeting of the building phase. Next we will discuss the definition of the metrics that are used to determine the productivity rate and to follow its development. In the final part of this section we will indicate why, at least for now, we will not make a distinction between enhancement project counting and development project counting.

Retraining
It was decided to retrain a number of persons at an early stage, in order to support the COSMIC-FFP-coordinator in the propagation of the new method and to act as an expert group together with the COSMIC-FFP-coordinator. This concerned mainly project leaders that had gained some experience with FPA earlier. It was surprising to see that only one day was needed to retrain from FPA to COSMIC-FFP. Of course the necessary experience still has to be gained, but the counting guidelines turned out to be so simple that no discussions arose. The only point of discussion was the term 'logical group', as used in COSMIC-FFP. Its interpretation elicited some differences of opinion. The positive conclusion was that gaining acceptance of COSMIC-FFP within this group went smoothly.

COSMIC-FFP-counts
Currently COSMIC-FFP counts are made during the following system development phases:
- COSMIC-FFP measurement afterwards, that is to say after completion. This is done in order to get a productivity rate based on measuring several projects.
- Before development, but after completion of the functional requirements. This is done in order to gain experience with the applicability of COSMIC-FFP as a budgeting method in advance, which is to say before starting the development process.

The inflation factor of a project can also be established if a count is executed afterwards. Even though the productivity rates to be used still have a largish margin as far as we are concerned, the project leaders can gain some experience with using COSMIC-FFP to support budgeting. The counts can be used by the project leaders to have a fresh look at the budgets they determined, or they can provide additional certainty about the budget's size.

Budgeting in the development phase


This year we started supporting the project leader with the budgeting of the development phase. The project leader makes a budget and uses COSMIC-FFP for added certainty. As long as the budget determined by the project leader falls within the mentioned margins, it is not looked into any further. If this is not the case, the COSMIC-FFP-counting coordinator, together with the project leader, will have a look at the possible reasons for this deviation. The possible causes are logged in the counting administration, as these data are important for, among other things, the determination of the productivity rate. The project leader might have determined a higher estimate because the project team consists of relatively many inexperienced team members. If this kind of analysis is known, it might be decided later not to include the productivity rate of the system developed by this team in the metrics.

Registration of hours spent


After completion of a system/application the productivity is determined, based on the number of hours spent per function point. This is done not only for the system as a whole, but also for the various phases. The following control information is available to the management:
- the productivity per phase within a technical environment;
- a comparison of the productivity of a specific phase, for example 'Implementation', between the various technical environments;
- the relation to the total number of hours for a project, regarding for example the acceptance test, in the various technical environments.


Data for management reports is registered during the system development process. In doing so the phases that are important during the building of an information system are taken into account.

Distinction between enhancement and new development


Enhancement projects are counted as if they were development projects. Just like FPA, COSMIC-FFP attributes different 'weights' to enhancement functions. The choice was made not to introduce this distinction, as it looks like this would result in enhancement counts that are too low. The reason can probably be found in the complexity of the systems, where it might even be more expensive to change or remove functions than to build new ones.

Integration into the ITF Quality system


We are currently integrating the COSMIC-FFP method into the ITF Quality system. The registration of project data and the publication of benchmark numbers (the productivity rates for a specific technical environment, including the accepted margin) are centralized. These activities are considered an integral part of quality monitoring. The COSMIC-FFP-coordinator is responsible for these activities.

What are the next steps?

So far we have described our experiences with FPA and COSMIC-FFP, and our motivation for the transition to COSMIC-FFP. We outlined a few steps taken so far, but we are not there yet! We want to take the following steps to reach our goals, namely having at our disposal an objective budgeting method, an improvement-directed analysis capability and the capacity to offer customers functionality à la carte:

Applicability in the CBD environment
There are agreements in place concerning a uniform method for defining use cases. A pilot count will be made to establish whether these agreements will suffice to get accurate COSMIC-FFP counts in the CBD environment.

Distinction between enhancement and development Experience will tell if this distinction will have to be made after all. After executing a few counts we will look at this in more detail. Productivity rate After a running-in period in which a number of trial counts with COSMIC-FFP were executed, 2004 will be dedicated to determining the productivity rates for the various technical environments. This year about 25 counts will be performed by means of COSMIC-FFP. It is expected that the now somewhat large productivity rate margins (between 8 and 11 hours per COSMIC function point) will decrease and deliver a more accurate productivity rate. Counting group In 2005 a counting group will be established to perform the counts. Within a small group of counters, interpretations and assumptions can easily be aligned and more value can be attached to the independence and consistency of the executed counts. At the moment our counts are still done externally. la carte functionality in the build phase In 2005 we will be able to support the customer in offering la carte functionality in the build phase. When the functional requirements are known, counting will not be much of a problem. From 2005 we will deploy COSMIC-FFP fully in the estimation of the development phase. Fully, in the sense that we will count all projects that are larger than a specic minimum size.
YEARBOOK

la carte functionality in a previous phase We have not yet reached the point that we can provide the customer with la carte functionality for an earlier phase than the build phase. In a phase as early as this a budget might deviate as much as 50% from one that is established at the moment all functional requirements are known. If a customer has to be supported earlier on in the process in order to be able to acquire la carte functionality, based on -at that time- high level functional requirements, it will be necessary to have a detailed agreement in place. This will receive the necessary attention as of 2005. After all, the functionality la

34

THE

NESMA

ANNYVERSARY

carte concept is one of the corner stones supporting the decision to introduce a new budgeting method.

About the author


Marcel Koster (marcel.koster@reaal.nl) works in the Quality department, section ITF, of REAAL Insurances. He was the project leader supervising the reintroduction of FPA, and later the transition to COSMIC-FFP. He is still involved with COSMIC-FFP in his role of COSMIC-FFP-coordinator.

SNS REAAL Group is an innovative retail bancassurance group. With its main brands, SNS Bank and REAAL Verzekeringen, SNS REAAL Group provides services to both private and business customers. It also carries a number of market-specific brands: ASN Bank, BLG Mortgages, CVB Bank, Proteq Direct, SNS Securities and SNS Asset Management. Its market attitude is retail-based. By picking up signals in the market, we can gain an in-depth understanding of our customers' wishes, which we can translate into customer-friendly products. The customer sets the agenda, not the other way round. We come up with suitable solutions. Our business principles: customer-oriented, professional, honest and socially committed are always the starting point for doing business. SNS REAAL Group has a total balance of 53 billion, with nearly 6,000 employees. Within the main brand of REAAL Insurances (known as 'Hooge Huys' till April 5th, 2004), ITF is responsible for computerization of the insurance branch within this group. Currently ITF has 300 employees.



MEASURE AND CONTROL SOFTWARE TESTING BY QUANTIFICATION
HENRY PETERS

The view on software development and quality is changing. Users don't expect 100% error-free software applications anymore, and software quality no longer has to guarantee a long lifecycle. These shifted expectations on software quality are inevitably due to the fact that software developers apparently are not able to make better software. Related to that, the testing approach is changing as well. Testing has to be performed from this lower-quality and higher-risk expectation: test what is necessary in the given circumstances. Test quantification is essential in this situation to maintain a professional test approach. This applies particularly to the start, control and finishing decisions for test projects. At the start of a test project the main question is how much testing is required. Several effort estimation techniques are available. However, in general it remains unclear how many actual test cases have to be executed in order to reach a valid conclusion about the software quality. In everyday practice, with its time pressure and priority issues, this is quite a problem. A practical estimation method can be implemented by using function points together with some straightforward depth and quality indicators. Information from this estimation method can be used further to control the test project. Test projects are by definition somewhat unpredictable. Tight project control by means of proper, quantifiable indicators is necessary. Finally, the moment of finishing the test work has to be established, with conclusions about the remaining errors that can be expected. Here too, quantification is necessary. It is important to find and use the proper measurements.

Introduction to this article


The main questions in testing are: what do I have to test, when can I stop testing, and what can I tell about the quality of the tested object? By applying some rather simple metrics one can answer these questions. In this article the following issues will be treated:
- the changing view on system development and software quality, and the consequences of that view for software testing;
- the possibilities for quantification in software testing;
- estimating, controlling and properly finishing tests by means of quantified data.

Changing views on software development


The view on software development and software quality is changing. From early development on, the starting point was: software should not contain errors. Because it appeared to be very difficult to predict and test all possible error-generating situations, the point of view shifted to a more practical one: no errors in normal use. Errors that appeared anyway were in general repaired. The view on software errors has gradually changed further. The complexity of software applications has increased, in core functionality as well as in options for practical use. Software applications furthermore require heavy interaction between numerous software and hardware components. There is pressure from the software suppliers, with impact on software quality too: frequent new releases or versions, with new possibilities, options, sophisticated user interfaces, etc. There is also pressure towards the software suppliers: customer organisations want new services for their markets, supported by software applications that can be implemented and adapted (or replaced) rapidly. The time to market for products and services is an important pull mechanism for software development. This all inevitably leads to more software errors. We use platforms, tools and standard software packages that we don't know thoroughly and that certainly are not completely error-free. Software is brought to the market very quickly, and user organisations have no opportunity to find and eliminate all errors before they start using the software. Software is more and more considered a temporary tool, instead of a final, ultimate solution as in early development. Organisations decide more quickly to adapt the software, switch to newer versions or even completely replace software applications. Due to these developments the software user has learned to accept software that is not 100% error-proof. Maybe the conclusion should be that we are simply not capable of producing better software than currently available, within the given time frames and budgets. It appears that a lower software quality and business risks in the information processes are acceptable.

Consequences for testing


Related to the changing view on development, the testing approach is also changing. According to the traditional test theory literature, a test has to be built upon a solid base of system specifications, e.g. a functional design with sufficient detail and quality. The tester then determines the test cases in an analytical way, from the specifications that are finalised and approved by all parties involved. Depending on the design viewpoint (logical or technical) one builds a black box or a white box test. The traditional solid V-model illustrates this approach very well. However, everyday practice is rather different. A user organisation wants to use the software, project management sees the development deadlines and budget exhaustion, the supplier wants an acceptance decision in order to get paid, etc. The software applications together with the underlying and related components build a very complex structure. Analysing all error possibilities, translating them into test cases and executing all these cases is impossible. And of course, software testing is under pressure because it takes time and money and delivers no direct positive results. Selling bugs is more difficult than selling fancy software features. Project managers don't want lists of problems that should have been avoided earlier. They rather want an emergency plan in order to finish the project in a decent way by delivering the software products with acceptable quality. Testing, in the expectation of the software application stakeholders, has to be quick, cheap and practical. Testing should encompass what has to be done and can be done in a given situation. Testing has to be performed from the lower-quality and higher-risk expectation: test what is necessary in the given circumstances.

Such an approach can easily degenerate into an unstructured error-guessing process, trying to find at least the most critical errors in the remaining (narrow) time. A software tester in these kinds of situations can only perform on a professional level by controlling the test effort with proper variables and quantifiable data, and by reporting quantified results and conclusions based on these results. Quantification within the test process is essential for planning the tests, controlling the test process and taking the decision to stop testing at the right moment.

Quantication of the test process


The test process in general can be divided into a number of process phases. Within our own test process modelling we use four phases for the primary testing activities: Preparation, Specification, Execution and Consolidation. All project management, control and supporting activities are cleared from this process and placed in other, separate activity layers (1). Within the layer of the primary activities, the phases Specification and Execution require the main part of the total testing effort and are therefore the most difficult to control. The focus in the remaining part of this article will be on these two phases.

Figure 4: The test process in general can be divided into four phases: preparation, specification, execution and consolidation. In this article the focus is on Specification and Execution.


At the beginning of the specification phase the tester first collects the necessary information. This could be a functional design, but also a set of documents with system requirements, change requests, known problems and already implemented (but not tested...) solutions, etc. In any case, the test base for the tester builds a system model of the software application that has to be tested. On this base information the tester then defines a set of test cases, which are elaborated to a level that makes them executable without ambiguity. This does not always require a fully detailed technical/physical description. The realised software application will be delivered to the tester. The test cases are transformed into physical actions, like performing an actual delete transaction, and executed. These actions generate responses of the software that have to be measured (observation, counting, comparing, etc.) and evaluated into (at least) the two categories good and wrong. These reactions are registered carefully, especially the wrong results or defects. What can be quantified in this process? Three important variables emerge:
- the volume (and complexity) of the software application;
- the number of test cases;
- the number of defects.
To simplify the remainder we will assume that the volume of the system model resembles the volume of the actual realised application, that all test cases designed will also be executed, and that all defects found are also registered. In other words, there is no difference in quantities between the logical and physical domain.

(1) A detailed description can be obtained on request from DataCase. For the purpose of this article a quick view of the phases is sufficient.
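These three variables, and the metrics derived from them in the remainder of this article, could be captured as compactly as the following sketch; the names are ours for illustration, not DataCase's:

    from dataclasses import dataclass

    @dataclass
    class TestMeasurement:
        function_points: float  # volume of the software application
        test_cases: int         # designed = executed, per the assumption above
        defects: int            # registered defects

        @property
        def test_depth(self):
            """Test cases per function point (the depth metric defined below)."""
            return self.test_cases / self.function_points

        @property
        def defects_per_test_case(self):
            return self.defects / self.test_cases

    m = TestMeasurement(function_points=500, test_cases=500, defects=60)
    print(m.test_depth, m.defects_per_test_case)  # 1.0 0.12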

Variables and metrics


Software application volume
The volume of a software application can be expressed very well in the number of function points. The Function Point Analysis or FPA method to calculate this number is a well-established, standardised technique and has recently been certified by ISO as a standard Functional Sizing Method. There are several variants aimed at specific uses, like test function points and maintenance function points. Using standard function points, however, has the advantage of easy comparison of applications, as well as of development methods and development phases.


Furthermore, within standard FPA there are well-defined quick and easy techniques to make a fair estimate of the number of function points in situations where specifications are incomplete. Function points can be counted, but also derived from or checked against other known facts about the software application, like the number of data objects, screens and reports. This makes FPA a good solution for estimating the software application volume for test purposes.

The number of test cases
First we have to establish, to a certain level, what a test case is. All basic actions that a user can execute within a software application, like opening and closing a window, could be considered individual test cases. This leads to an unlimited number of cases, with no practical use for our purposes. Therefore we will use a more useful definition here: a test case has to deal with some kind of application-specific content: adding a new customer with certain characteristics, a financial transaction, deleting a customer order with a specific order status, etcetera. More sophisticated and precise definitions can no doubt be made, but they are not necessary for the purposes of the test control described here. The absolute number of test cases itself has limited value as an indicator. To come to a more solid measure, the number of test cases can be related to the application volume (expressed in function points). This is an easy way to express the depth of a test. An important advantage of this metric is that it can be applied from the outside: an exhausting and time-consuming analysis of internal decision structures and program paths is not necessary.

The number of defects
As with the test case, one can also worry about the definition of defects, failures, errors, etc. For the purpose of controlling the test process, again, this is not necessary. Of course it is good to give proper attention to defects and defect registration. For instance, defects that are merely questions by the tester or additional requirements from users can be excluded, and complex defect descriptions that in fact contain a number of defects have to be rearranged. In general this is limited to a practical quality check and no extensive defect analysis. A defect has an important feature that will be used in our control approach: an originating date and time. This can be used to establish the defect finding pattern, as will be explained later. There are several defect metrics:
- Number of defects per function point: likely to be thought of, but only useful as an overall indicator for system development including testing. One cannot determine the cause of the defects; e.g., a small number can point to good development as well as to poor testing (or both).
- Number of defects per test case: this is a more precise metric for testing. The test quality is in fact normalised here.
- Number of defects in time: this appears to be a useful metric for software quality, as will be explained in the next sections.
- Number of defects found/not found: this detection rate is in fact the essential metric for testing. The main question here is to derive and/or estimate the proper values for these two variables. This will also be elaborated further.

Metric application for planning, control and stop decisions

We will now discuss the application of the metrics described in the preceding paragraphs for planning, control and stop decision purposes.

Estimation and Planning


At the start of a test project an important issue is the required testing effort. Several estimation techniques are available. However, in general it remains unclear how many actual test cases have to be executed to enable valid conclusions about the software quality. The traditional, rather technical test depth measures are not very useful in everyday practice, with its time pressure and priority issues. A practical estimation method can be implemented by using function points together with some straightforward depth and quality indicators.


The regular use of function points for estimation purposes is:

number of function points * productivity factor (hours per function point)

The productivity factor then applies to the activity that has to be estimated, e.g. technical design and programming. One could use this for estimating the testing effort as well. For testing, however, it remains unclear what will be done in these hours. To get a more solid result, some indication of test depth has to be included. This can be accomplished by means of the metric described earlier: the number of test cases per function point. This leads to an elaboration of the formula, to:

number of function points * number of test cases per function point * hours per test case

As said before, the number of function points can be calculated, from a quick-and-easy to a detailed approach, depending on available time and information. The number of hours per test case is a simple, easily understandable measure that can be established and maintained very well in a testing environment. The number of test cases per function point causes more practical problems at first sight: how can we determine the required number of test cases per function point in a planning stage? Within DataCase we have collected data from a set of test projects. For these projects (currently about 15) the test depth as defined here is compared to the test results.
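In code the elaborated formula is trivial; the numbers in the example below are invented:

    def test_effort_hours(function_points, test_cases_per_fp, hours_per_test_case):
        """Function points * test cases per function point * hours per test case."""
        return function_points * test_cases_per_fp * hours_per_test_case

    # 500 FP tested at a depth of 1.0 test case per function point
    # ("Fair" in the table below), at 2 hours per test case:
    print(test_effort_hours(500, 1.0, 2.0))  # 1000.0 hours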

At first, an elementary classification of a set of projects was executed:
- Which of these projects were good, sufficient tests? In other words: after test execution the application could be transferred to the regular use and maintenance phases without major problems.
- Which of these projects had less quality than necessary? In other words: the application could not be transferred to regular use and maintenance without further measures. The application had to go into a special extra phase of, e.g., tight production control, limited use or shadow production, with continuation of testing, etc.


- Which projects had a quality somewhere in between those extreme cases, classified as Moderate (low) or Fair (reasonable)?
This resulted in a first, very global impression of test depth and results, shown in the next table.

Test cases per function point   Test quality
0.25                            Minimum
0.50                            Moderate
1                               Fair
1.5                             Good

As a second study, a more detailed quantitative analysis has been performed. For a number of projects a detection rate could be calculated from the available data. This detection rate was calculated as the ratio between the number of defects found in the own test (for which the test depth was also known) and the total number of defects known. The total number consists of the defects found in the own test and the defects that emerged during a certain period after this test. This period could be an acceptance test period by the user organisation, a period of production, etc. The numbers are normalized (whenever necessary by extrapolation) to a comparable period of four months. Figure 5 shows the detection rate for a number of projects as a function of the test depth, expressed in the number of test cases per function point.
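A sketch of this calculation, assuming the post-test defects are extrapolated linearly to the four-month window; the figures are invented:

    def detection_rate(found_in_test, found_after, observation_months):
        """Defects found in the own test divided by all defects known,
        with post-test defects extrapolated to a four-month period."""
        normalized_after = found_after * (4.0 / observation_months)
        return found_in_test / (found_in_test + normalized_after)

    # 90 defects found in test; 12 more surfaced in a 3-month acceptance
    # period, extrapolated to 16 over four months -> rate of about 0.85.
    print(round(detection_rate(90, 12, 3.0), 2))  # 0.85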

Figure 5: The detection rate for a number of projects as a function of the test depth, expressed in the number of test cases per function point. The diagram shows the measured points together with linear, exponential and s-curve fits and their average.

Of course these dots are not on a straight line, but a relationship can be seen. The relationship can be investigated in different ways. The diagram shows, for instance, a linear, an exponential and an s-curve relation that can be determined from these points. The average of these shows a pattern that can be expected by intuition: adding more test cases results in a higher detection rate, but the effect will gradually decrease. With this table and diagram information it is possible to estimate the percentage of defects one will probably find (as well as the percentage probably to be missed) by applying a certain number of test cases per function point. This enables the choice of planned test depth at the start of a test project. Of course there will be substantial uncertainty, because this choice is based upon statistical and still limited data. Besides that, the detection rate does not give any clue about the severity of the defects found or missed.


Control
The quantitative data and metrics that are used in the planning stage can also be used very well during project control. Test projects by definition have a certain degree of uncertainty, due to the influence of the unknown software quality on the test itself. Tight project control by means of proper, quantifiable indicators is necessary. The variables that can be used for control are:

Number of function points
Whenever required by the situation, one can limit the number of function points by putting some application parts out of scope. This can be done, for instance, for parts that appear to have very low quality after some initial testing. These parts can then be shifted back to further development. Another strategy is to shift application parts that are not immediately necessary for production to later, additional test projects. These parts (when not too small) can be expressed in function points, and the impact on the test planning can then be calculated directly. The parts cannot be too small, because the function point measure is not suitable for small application bits like individual user functions.

Number of test cases
The number of test cases and their distribution over the application parts can be changed during the test when forced by the circumstances. Testing of weak parts with a lot of defects can be increased by adding test cases; testing of parts already proven firm and stable can be limited.

Treating the defects
This is a typical control factor, a little beyond the scope of this article, but also important. By limiting the defect solution activity, the project control for testing can be improved. There are two extreme strategies, with a lot of possibilities in between:
- Repair all defects. This leads to high application quality, but gives a very unpredictable end to the test project.
- Repair no defects at all. The application will be transferred as is, or will not be transferred at all when defects are too numerous or severe. For defect repair a separate new project including a test will be defined, e.g. a next release.
The optimum strategy often lies in between these extremes. A proper decision on defect repair, however, is an important instrument for control.

Stop decision
Finally, the moment of finishing the test work has to be established, especially with conclusions about the remaining errors that can be expected. Here too, quantification is necessary, and it is important to find and use the proper measurements. There are different possibilities, which can be combined. We chose two practical approaches:
- Determine the number of test cases executed: is this sufficient for the intended detection rate? The assumption here is the correctness of the initial estimation of depth and detection rate in the planning stage.
- Determine the actual defects in time. The defect pattern shows whether or not the intended detection rate has been reached. This is a type of metric that already exists in several technical appearances.

The cumulative defect pattern in time appears to be a suitable instrument for the second approach. The time axis can be the days of actual testing activity. The cumulative number of defects is the total number of defects found up to the actual day. It is of course possible to distinguish defects, for instance according to severity, but for tracking the test process and making the proper stop decision this is not relevant. The defect pattern shows a curve that in general can be approximated by a so-called s-curve. The formula of these types of curves is:

S = a * X / (X + exp(b - c * X))

where S is the cumulative number of defects and X is the test day sequential number (or, in other words, the number of test days up to this moment). The formula has three parameters by which the curve can be tuned to the real defect data. Parameter a determines the total level, parameter b the length of the initial flat part of the s-curve, and parameter c the slope of the mid-part of the s-shape. With some basic statistical alignment, by means of correlation and quadratic deviation analysis, one can determine the optimum parameter setting. Figure 6 shows the actual and calculated defect percentages as a function of the test days for some test projects. During the test execution, estimations can be made by determining the s-curve. The s-curve is used for extrapolation of the actual data, to see the final level of the number of defects and the time at which this level will be reached. The theoretical curve is placed upon the real data and tuned by means of the three parameters.
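The fitting and extrapolation described above can be reproduced in a few lines. In the sketch below the cumulative defect counts are invented, and scipy's curve_fit stands in for whatever correlation and quadratic deviation analysis was actually used:

    import numpy as np
    from scipy.optimize import curve_fit

    def s_curve(x, a, b, c):
        """S = a*X / (X + exp(b - c*X)): cumulative defects after x test days."""
        return a * x / (x + np.exp(b - c * x))

    # Invented cumulative defect counts for test days 1..20.
    days = np.arange(1, 21)
    defects = np.array([1, 2, 4, 7, 12, 19, 28, 38, 48, 57,
                        64, 70, 74, 77, 79, 81, 82, 83, 83, 84])

    (a, b, c), _ = curve_fit(s_curve, days, defects, p0=(100, 3, 0.3), maxfev=2000)
    print(f"estimated final defect level: {a:.0f}")

    # Extrapolate: first test day on which 95% of the final level is reached.
    horizon = np.arange(1, 61)
    print(horizon[s_curve(horizon, a, b, c) >= 0.95 * a][0])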

Figure 6: The actual and calculated defect percentages as a function of the test days for some test projects.

There are some known alternatives to this method. One is simply plotting the number of defects found per day, which should show a so-called Rayleigh curve. The s-curve can be seen as the integrated curve, with the advantage of less statistical noise and therefore a better view. Figure 7 shows the calculated s-curves for a number of projects. These are the theoretical curves that have been fitted as well as possible to the real defect data. The diagram shows that the curves can vary rather strongly. In general this can be explained by the differences between the test projects involved. A test that mainly consists of limited regression testing clearly gives a picture that is different from a first-time test on a newly built and not yet stabilized software application.


Figure 7: The calculated s-curves for a number of projects.

When applying these types of metrics, some issues have to be kept in mind:
- The underlying assumption is a stable, constant "test load". When the test effort is substantially decreased from a certain day on, this will lead to a decrease in defects found. Filtering and adjusting for minor changes in test effort is not very useful, but major changes in the test have to be accounted for. Examples are complete stops due to major problems in the software or the test environment.
- Registration of the defect originating day, but also of test case execution, is necessary to keep track of the defect curve and the actual test depth. The general time unit is a test day.
- Basic, simple and pragmatic statistics will do. Too much filtering, categorizing, etc. leads to incomparable detail. This also applies to categorizing test cases or defects: it limits the available data and thus gives less usable statistical results. Ultimately every project, test case or defect could be considered unique, but that point of view does not provide any statistical result.


Finally
Applying metrics and techniques for planning, controlling and stopping tests at the right moment requires very little additional effort for registration, analysis and calculation. The additional costs are minor compared to the total costs of a test project. The application does not require specific tools, education or support; in general the available tools in a test environment will be sufficient. From the viewpoint of project costs there is hardly a reason for not implementing these instruments, while the benefits are very clear.

About the author


Henry Peters (h.peters@datacase.nl) is manager and consultant on software engineering. He is a board member of the NESMA and general manager of DataCase, an IT organisation specialised in software testing. He has over 25 years of IT experience, mainly in the area of quality management and testing, in many organisations, projects and software applications, with a focus on the measurement of test activities and results. He developed an experience-based test method, with special focus on estimation, planning, control and quantitative evaluation of test projects.



MEASURING DEFECTS TO CONTROL PRODUCT QUALITY
BEN LINDERS

Wouldn't it be nice if you could have more insight into the quality of a product while it is being developed, and not just afterwards? Would you like to be able to estimate how many defects are inserted in the product in a certain phase, and how effective a (test) phase is in capturing these defects? The simple but very effective model described in this paper makes this possible! The model is used at Ericsson to develop software for telecommunications products. It supports controlling projects by putting quality next to planning and budget, evaluating risks, and taking decisions regarding release and maintenance. This paper will first highlight why there was a need for such a model, and why existing measurements didn't fulfill this need. Then the model itself and its deployment in the projects are described. Conclusions that were drawn from the model, using feedback sessions, are described, explaining how the projects have benefited from the model. At the end we look briefly into the future, regarding both the model and the needs of the organization with respect to measurements of product quality.

Why a Model?
Within Ericsson there has always been focus on the quality of developed products, next to planning and budget. Initially measurements like fault density were used. But fault density has major drawbacks; one being that you can only measure it after a phase is completed, and another is that it does not give any insight on the causes if a product has a quality risk. For instance, high fault density can either mean that there is a quality risk, that the product was more thoroughly tested than other products, or both. The same applies for a low fault density, the reason could be that insufcient testing was done and that defects remain undetected in the product (a product quality

55

MEASURING

DEFECTS

TO

CONTROL

PRODUCT

risk), or that the product has a better quality and thus less defects were found, or both. Studies outside of Ericsson have also revealed the limited value of fault density; see for instance [1], other studies showed defect measurements that successful organizations used [2]. So there was a need for new measurements that would give more insight. The GQM metric approach was used to dene the measurements [3].

Goals:
1. Control verification activities (optimize defect detection).
2. Control development activities (minimize defect injection).
3. Predict the release quality of the product.
4. Improve the quality of the development and test processes.

There is a need for measurements usable to steer quality: measurements to plan quality at the start of the project and track it during project phases, enabling corrective and preventive actions and reducing quality risks in a project. An additional project need is to estimate the number of latent defects in a product at release. The purpose is twofold. In the first place, it is usable to decide if the product can be delivered to customers, or released, knowing the quality. Secondly, it helps to plan the support and maintenance capacity needed to resolve the defects that are anticipated to be reported by customers. Finally, it should be possible to have quality data that is analyzed together with the applied processes and the way a project is organized. This analysis provides insight into process and organizational bottlenecks, and therefore enables cost-efficient improvements.


Questions:
1. What will be the quality of the released product?
   a. Per requirement?
   b. As perceived by customers?
2. How good are inspections?
   a. How effective is the preparation?
   b. How effective is the review meeting?
3. How good are the test phases?
   a. How many test cases are needed?
   b. How effective is a test phase?
4. What is the quality of the requirement definition?
5. What is the quality of the high-level and detailed design?
6. What is the initial quality of the code (before inspections/test)?
7. Which phase/activity has the biggest influence on quality?

This list of questions is not exhaustive, but these are the first ones that come to mind when you want to measure and control quality. Certain questions can trigger additional questions; for instance, when it appears that a certain test phase is ineffective in finding defects, additional questions are needed to investigate the activities and their effectiveness.

Metrics:
1. Number of undetected defects in the released product.
2. Number of defects found per requirement/feature.
3. Number of latent defects in a product before an inspection or a test phase (available).
4. Number of defects expected to be found in an inspection.
5. Actual number of defects found in an inspection (detected).
6. Number of defects expected to be found in a test phase.
7. Actual number of defects found in a test phase (detected).
8. Size of the document/code.
9. Detection rate: percentage of defects detected (detected/available).

The metrics listed above can be collected in most projects, since the data is usually available in inspection records and defect tracking tools. But to analyze the metrics, a measurement model is needed, since the metrics are related: only when looking at a combination of several metrics can conclusions be drawn that help answer the questions and reach the goals of the measurements. To get more insight into the quality of the product during development, the software development processes must be measured with two views: introduction and detection of defects. To develop the model, descriptions from Watts Humphrey [4] and Stephen H. Kan [5] have been used.

Figure 8: Defect flow.

Introduction happens during the specification, design and coding phases; defects are introduced either into documents or into the actual product. Measuring introduction gives an indication of development phase quality. Detection of defects happens via inspections and tests during all phases of the project. Measuring detection gives insight into the effectiveness of verification phases. By using these two measurements, a project can determine whether there is a quality risk and what its origin is: too many defects in the product and/or insufficient testing to capture those defects.

Development Phase Quality


The quality of a product depends on the number of defects that are inserted during the development phases. Mistakes are made in every phase, from specification to implementation. Defects that are detected and removed increase the likely quality of the end product. However, those defects reflect the inefficiency of the development process. Defects which are not detected in the phase in which they are inserted lead to more, and more expensive, rework, and can decrease product quality if they remain in the product after release and surface when customers use the product. The aim is to remove defects as early as possible and to have as few defects as possible in the product when released, thereby delivering quality products. At the start of a phase the number of inserted defects is estimated. During execution of the phase this estimate is adjusted based on the number of defects actually detected. Since it is sometimes difficult to estimate the number of defects, an alternative method is to estimate the size of the produced documents or code, and use size multiplied by the defect density to estimate the number of defects. In all cases, it is better to do a rough estimate and adjust it during the project than to do no estimate at all. Historical data from earlier projects is very useful when estimating defect introduction. Industry data is used when no historical data is available.
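A sketch of the size-times-density alternative; the densities below are placeholders for historical or industry data, not Ericsson figures:

    # Defects per unit of size, taken from historical (or industry) data.
    defect_density = {
        "requirements": 0.5,  # per page
        "design": 1.0,        # per page
        "code": 15.0,         # per KLOC
    }

    def estimate_inserted(phase, size):
        """Estimated number of defects inserted: size * defect density."""
        return size * defect_density[phase]

    print(estimate_inserted("code", 12))    # 12 KLOC -> 180 expected defects
    print(estimate_inserted("design", 40))  # 40 pages -> 40 expected defects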

Defect Detection Effectiveness


The aim of verification is to detect the inserted defects, preferably in the earliest phase in which they can economically be detected. The effectiveness is expressed with a detection rate, that is:

Detection Rate = Number of defects detected / Number of defects present in the product

An organization has a detection rate for a certain phase, which is estimated within certain statistical limits. Initially, when no historical data of an organization is available, industry figures can be used. An alternative to the detection rate is to estimate the absolute number of defects that are likely to be found in the current phase; based on that number and the number of defects present, the detection rate is derived. During the execution of a phase, the detection rate is adjusted based on the actual number of defects detected. If, for instance, a detection rate of 50% is expected, and 46% of the expected defects are detected halfway through the phase, then either the number of defects that was inserted is higher than initially expected, or the actual detection rate is higher and fewer defects were inserted than were predicted. If the first is true, then there is a quality risk in the product, which needs to be investigated. It also gives a signal usable to improve the process phase where the defects were introduced; in a next increment of the project, defect introduction can thus be reduced. If fewer defects were inserted, and thus the resulting detection rate is higher, then further investigation is warranted to understand how this was accomplished. That would make it possible to learn and improve verification in other projects, based on the positive experiences from this one.
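A simplified reading of this check in code; this is our sketch, not Ericsson's spreadsheet, and the 10% tolerance is an assumption:

    def check_detection(expected_present, expected_rate, found_so_far, phase_progress):
        """Compare actual detections against the estimate at a point in the
        phase (phase_progress = fraction of the phase completed, 0..1)."""
        expected_found = expected_present * expected_rate * phase_progress
        if found_so_far < 0.9 * expected_found:
            return "below estimate: fewer defects inserted, or detection lagging -> investigate"
        if found_so_far > 1.1 * expected_found:
            return "above estimate: more defects inserted, or detection better -> investigate"
        return "on track"

    # 200 defects estimated present, 50% detection expected; halfway
    # through the phase only 35 are found (45-55 would be on track).
    print(check_detection(200, 0.50, 35, 0.5))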


The combination of measurements on defect insertion and defect detection gives a more detailed view of the quality of the product and the effectiveness of the development processes. This provides a project with better means to track and control quality.

Pilot Project
The defect introduction and detection model as described in the earlier paragraphs was implemented in a pilot project for a network management product. Since the project had two distinct requirements, it was divided into two increments with separate teams, overlapping in time. The model copes with these two increments separately, since different processes are used. As the first part of the project was combined for the two increments, and final testing was also combined, the basic introduction/detection model becomes:

Figure 9: Project phases. Requirements and Architecture are combined for both increments; Design-Implementation-Unit Test and FT/ST run separately per increment; Network Test, 1st customer Test and Maintenance are combined again.

A tool for the model was developed using a spreadsheet: the Project Defect Model. The purpose of the Project Defect Model is to estimate defects inserted and detected per phase, and to track defects from inspections and tests against the estimates. The model supports analysis of the data with both calculated values and graphs comparing actuals to estimates in terms of current status and trends. In the pilot, 420 defects were collected, which were analyzed and classified by introduction phase, requirement, and phase where they could have been detected. The resulting data gives an estimate of 21 latent defects in the released product, expected to be found in the first six months of operation. This estimate was used as one of the criteria in the release decision; it was decided that this would be an acceptable quality level, provided that sufficient maintenance support would be available to solve the 21 defects when detected by customers. The six-month operation period ended in June 2003, and 20 defects were actually found, a difference of 1 defect with the estimate at release.

Activity                  Available defects   Detected defects   Detection rate
Inspection                       420                 197               47%
Test in project                  223                 194               87%
MDA/FOA                           24                   9               38%
Project totals                   420                 400               95%
Maintenance                       20                  20              100%
Average / Total                  420                 420
Test phases only (plan)                                                91%

Figure 10: Defect figures from the pilot project.

Figure 11: Defects inserted per phase (Requirements, Architecture, Design, Implementation).

Based on the estimated number of latent defects, the project has a defect detection rate of 95%, i.e. 400 of the 420 defects made in the project were detected before the product was released. If we exclude the phases before test (which used inspections for verification) from the measurement, the detection rate is lower: only 91% of the defects left after inspections were detected in the test phases. This shows that inspection has contributed towards the quality of the released product. However, the average detection rate of inspections is 47%; according to industry data, inspections can detect between 60% and 80% of the available defects, so there is room for improvement.

Even more important than the data are the benefits the project received by using the model. During the project, data feedback and analysis sessions were held in which corrective actions based on the data were implemented. Major conclusions/actions included:
- A slip-through of requirement defects was detected early, in the architecture phase. Investigation showed, however, that good high-level design, combined with effective architecture inspections, revealed many requirement clarifications. The action defined was to monitor requirement defect detection in the design phase for quality risks; it turned out that both the number and the impact of the detected defects were limited. No more requirement defects were detected in later phases; the final conclusion is that the requirements, after initial clarification, reached a high quality.
- Data from defects inserted/detected, test requirement coverage, and Orthogonal Defect Classification shows that inspection effectiveness depends on several issues: good and focused inspectors, qualified moderators, sufficient preparation, and thorough inspection planning. The detailed conclusions on inspections are used to further improve reviews and inspections in future projects. Though it was known that inspections are an effective way of detecting defects (as was to be expected from many earlier studies), our data confirmed this and has led to more focus from management and buy-in for further improving inspections.
- Data also made clear that test phases discover defects that could have been found in earlier phases. Function Test finds many inspection defects, whereas System Test discovers a lot of Function Test defects. Based on trigger analysis with Orthogonal Defect Classification, we determine our test progress. Together with a requirement-based test matrix, the project is able to predict where requirements are sufficiently verified, and where there are risks of latent defects. Test focus and scope were changed during the project, based on data from the model, and the remaining quality risks are on requirements that are seldom used.

The Project Defect Model is beneficial to the project. It helps estimating, planning, and tracking quality during the project. This quality data is used in the project together with time and cost data, to take better decisions. The model also identifies quality risks at an early stage, helping the project to take corrective actions and decisions on product release and maintenance capacity planning. The teams using the model gain significant quantitative insight into their design and test processes, which they will use in future projects. Feedback sessions where defect data is analyzed by the team themselves prove to be very powerful. More detailed information about the model and the results from the pilot can be found in [6].

Data from finished and ongoing projects


Based on the results in the pilot project, the management team decided that all future projects would use the Project Defect Model to estimate and track their quality. Until now (March 2004) there are 7 projects from R&D in Rijen that have used or are using the model. The model is also used to do retrospective analysis on some older projects, to obtain data from which planning constants for future projects can be derived. Below is the data collected from the projects:

Project detection rates (inspections & test)

Project   A     B     C     D     E     F     G     H
Rate     95%   95%   90%   59%   94%   86%   89%   90%
Size      1     4     1     1     5     3     1

The detection rate shows which percentage of the defects is found within the project, before delivery to the customer. Size is a relative indication of how big the project is (man-hours and lead time). This table shows that on average 90% of the defects are detected in the project, while the customers detect 10% of the defects. If we exclude project D from the figures (a project that was expected to find fewer defects, since it integrated earlier developed and tested components), the average becomes 92%. Industry figures for best in class vary between 90% and 95% (see [7]). We conclude that bigger projects have a better detection rate. This has to do with more extensive test phases, more clarity about the interdependencies and risks between projects, and the usage of incremental development. This leads to making fewer defects and finding defects earlier.


Phase injection rates

Phase   Requirements   Architecture   Design   Code
Rate         6%            21%          15%     58%

These figures show what kinds of defects are made in projects. We see that most of the defects are coding defects, while architecture and design are the second and third biggest categories. Given the fact that much effort is put into exploring, defining and verifying the architecture, including formal inspections of architecture documents, this is an expected and wanted result.

Phase detection rates

Phase   Requirements  Architecture  Design  Code  Function Test  System Test  Network Test  Total
Rate        30%           67%         66%    40%       48%           48%          27%        47%

The phase detection rate shows which percentage of the defects available in a product at the start of a phase is captured in that phase. We see that the architecture and design inspections have a high detection rate, while the requirements and code inspections detect fewer defects. For requirements this has to do with the early stage the project is in at the time of inspection; we know from other data that the requirement defects that slip through are mostly detected during the architecture and design inspections. Function test and system test both find 48% of the defects; however, in absolute figures function test finds more defects and removes them before system test starts. The average figure of all detection phases is 47%; given industry figures this is acceptable, but there is room for improvement. The figures above from all projects help us to define planning constants that are used for future projects, thereby improving our estimation accuracy.


Feedback sessions
Organizations are increasingly relying on measurements, but many struggle to implement them. There are the usual technical problems associated with collecting data, storing it efficiently, and creating usable reports. However, the biggest challenges are often related to using the data to actually make decisions and steer the activities of the organization. Miscommunication, incomplete analysis, and corrective actions that seem to come from nowhere create resistance to the whole idea of measurements. Feedback is based on the assumption that you should give the raw data to the people who did the work, and that they should perform the analysis. Why? Because they know the story behind the data. For instance, defect detection rates are discussed with the test team leader: he knows how much and what kind of testing the team has been doing, and what they expect to find. With the Project Defect Model we do regular feedback sessions. On average once a week we look at the data, compare it to our estimates, and check whether there are differences, trends, or signals in the data that something is going wrong. This is compared with the development status from the design and test teams, and based on that we draw conclusions and take the necessary actions. We see that development teams learn a lot from the feedback they receive on defects. They see which kinds of defects they discover too late, and use the data to improve the early test and inspection processes. For instance, for defects that slip through many times, checks are added to the design and inspection checklists. Using these checks, the defects are found before or, at the latest, in inspection. One project found, by analyzing the data, that both product knowledge and test skill were the cause that insufficient defects were found in early test. The test team now takes time to study product behavior documentation, and uses coaches to support newcomers on the team. As a result, the detection rate of the early test phase increases significantly, thus reducing the number of defects available in the product when delivered to system test. During the improvements they use the data from the model to check whether they are actually making progress. The teams also get insight into where they make the most defects; using the data they are able to determine root causes and improve quality right at the start. In one project a team found out that a specific part of the project was more complex and difficult to verify. They put extra time into the investigation of possible solutions, to prevent many small but disturbing defects during design and coding, and reduced the risk that defects would slip through to late testing. An effective feedback process doesn't come easily. In the beginning it will need a lot of attention and perseverance, but once the benefits of the effort become clear, which is usually early in the process, people will start to give their support. More information about feedback in measurement systems, including the key success factors and pitfalls, can be found in [8].

Conclusions
The Project Defect Model is beneficial for our projects. It helps estimating, planning, and tracking quality during the projects. This quality data is used in the projects together with time and cost data, to improve decision-making. The model identifies quality risks at an early stage, helping the projects to take corrective actions and decisions on product release and maintenance capacity planning. The design and test teams using the model also gain significant quantitative insight into their design and test processes, which is used in future projects. Future extensions of the model will include the effort spent in design and test phases. This will enable a trade-off between appraisal cost (pre-release defect detection), rework cost (pre-release defect removal) and operational cost (post-release defect removal). By extending the model with cost data, it will evolve into a true implementation of a Cost of Quality model.

About the Author


Ben Linders (ben.linders@ericsson.com) is Specialist Operational Development & Quality at Ericsson Telecommunicatie B.V., the Netherlands. He has a Bachelor's in Computer Science, and did a Master's study on Organizational Change. He has worked in process and organizational improvement for more than 15 years, implementing high-maturity practices to improve organizational performance and bring business benefits.

Since 2000 he has led the Defect Prevention program. He coaches the implementation of Root Cause Analysis, Reviews and Inspections, and has defined and applied a Project Defect Model, used for quantitative management of the quality of products and the effectiveness of verification. He also introduces and supports the measurement system, manages continuous improvement, and is an expert and coach in several Organizational Performance & Quality areas. He is a member of several (national and international) SPI and quality-related networks, has written several papers, and regularly gives presentations.


References
1. Norman Fenton and Martin Neil, "A Critique of Software Defect Prediction Models", IEEE Transactions on Software Engineering, September/October 1999.
2. Capers Jones, "Software Measurement Programs and Industry Leadership", Crosstalk, February 2001, http://www.stsc.hill.af.mil/CrossTalk/2001/feb/jones.asp.
3. R. van Solingen and E.W. Berghout, The Goal/Question/Metric Method: A Practical Guide for Quality Improvement of Software Development, McGraw-Hill.
4. Watts Humphrey, Managing the Software Process, Chapter 16, "Managing Software Quality".
5. Stephen H. Kan, Metrics and Models in Software Quality Engineering, Chapter 6, "Defect Removal Effectiveness".
6. Ben Linders, "Controlling Product Quality During Development with a Defect Model", in: Proceedings of the 8th European SEPG Conference, London, 2003.
7. "A Business Case for Software Process Improvement Revised", DACS State of the Art Report, Measuring Return on Investment from Software Engineering and Management.
8. Ben Linders, "Make What's Counted Count: How One Company Found a Way to Use Measurements to Steer the Actions of Their Organization", in: Better Software Magazine, March 2004.


COSTS OF APPLICATION MAINTENANCE MADE MORE TRANSPARENT THROUGH THE USE OF METRICS
RICHARD SWEER

Quality costs money, and money determines quality

T not only includes the development of new software products, but, as well,

the maintenance of its applications. Even more importantly, software development accounts for only a limited part of the costs generated by IT. Nevertheless, software development is, for many organisations, the source of all that is unpleasant, and thus constitutes a critical element of quality management. Many organisations make a strict distinction between, on the one hand, those individuals who provide system development and, on the other, those in charge of application maintenance. As a consequence of the xed price / xed time approach, the developers are not always able, or willing, to devote additional effort to make easily and cheaply maintained systems available to the maintainers. The maintainers, in turn, are not always able to make clear what this additional effort will entail. The aim of the present article is to provide insight into the most important factors inuencing the costs of application maintenance. There is a wealth of literature concerning related topics, such as the quality of software products, quality management and quality systems. However, hardly any study has been done into the relationship between system development and application maintenance from the standpoint of costs. Customers not only expect an acceptable level of prompt service provision for an acceptable price, but are also thinking about what their situation will be in three to ve years. Reducing the costs of application maintenance is the

aspiration of many customers. Not only technological knowledge but also knowledge of IT processes is a prerequisite for achieving this.

Software Quality Model


It is undeniable that the use of software quality models has yielded substantial improvements within organisations. Most software quality models are nothing more or less than the distillation of practical experience of proven value. Indeed, this is where the greatest strength of such models lies: it is no longer necessary to prove that a given approach works; all that is needed is to ensure that the approach is applied with the necessary expertise. Another great advantage of these models is that they attempt to provide clues toward the attainment of an ideal software engineering process. In general, much time and effort is devoted to devising method- and technology-oriented procedures for software improvement. However, when it comes to software development, people continue to be the prime means of production. The importance of the human factor is usually underestimated. If no attention is paid to, e.g., the culture, knowledge and expertise of the developers involved, or the motivation and style of management, it is difficult to study the effect of a change in the process on the quality of the product. After all, the customer purchases a product, not a process. Info Support has developed a Software Quality Model (SQM) based on both accumulated practical experience and analyses of the literature. With the help of this SQM model, an IT organisation can obtain insight into the most important factors influencing quality and the resulting maintenance costs for applications (or application maintenance). The model's strength lies in its integral
approach to a number of (less well-known) factors. These factors are divided into five different but closely related and continually interacting areas: staff member, team, process, product and software factory (see Figure 12). In addition to these factors, the context in which software development takes place also has an important effect on software quality. Here, we distinguish between two categories of factor, namely, organisation-dependent and environment-dependent factors.


[Figure 12 diagram: the five interacting areas with their factors: staff member (knowledge & expertise, commitment, motivation, maturity, means), team (combination, leadership style, team behaviour, feedback), process (organisational management, validation and verification, application management, adequate management, methods, techniques), product (stages of development, needs and requirements, specifications, interested parties, quality tree), and software factory (characteristics, infrastructure, generators, design patterns, maintenance, development, re-use).]
Figure 12: Software Quality Model.

SQM model: context


The context in which software development takes place is divided into two categories: organisation-dependent and environment-dependent factors. The first category involves the influence of, e.g., policy, culture and structure, as well as the degree to which the experience of quality is anchored in an organisation. The second category includes the influence exerted by, e.g., the customer, user, legislator or the branch of trade in question.

SQM model: staff member


Regardless of how much is invested in improving processes, methods and techniques, these investments will yield little if the staff who must work within these improved processes and with the newest methods, techniques and tools are not capable of functioning adequately within this context, or are not willing to. A lack of commitment, motivation or feedback, poor or neglected training programmes, and insufficiently mature behaviour are all factors which can lead to the non-realisation of one's quality objectives.


SQM model: team


The ideal objective of software development is having the best team produce the best software. Many of the problems of software development can be blamed on poorly functioning teams, which is why teamwork and the creation and composition of teams are generally viewed as among the prime factors influencing quality. It is important to bear in mind in this connection that the life cycle of a team is necessarily coupled to the various phases in the life cycle of the project in question. Team behaviour (or team maturity) can raise the quality of the software process to a higher level. Such behaviour is characterised by qualities such as pro-activeness and being focused on solutions. Management often tries to motivate staff to change by creating for them a positive, attractive and challenging work context and making available to them the means they require in order to deliver good software. An optimal working environment, with modern tools and equipment, contributes greatly to motivating software developers to produce good work. Economising on such things will therefore not have a positive effect on the software produced, all the more so since such expenditures often constitute only a small fraction of total project costs.

SQM model: process


Paying attention only to development, as is traditionally customary in project management, is clearly no longer sufficient. Project management must extend over the entire software life cycle, i.e., over development, marketing, use and maintenance. Thus, project management becomes product management.
Multiproduct management, the management of an entire range of products, goes one step further. Here, re-use plays a crucial role. In the transition from development management to product management, two organisational aspects play crucial roles. The first is directed toward distinguishing a discrete function within the organisation, responsible for the evolution of a software product through its entire life cycle: the product manager. The second aspect has to do with the organisation of this function. Typically, when examining the requirements to be placed on information provision, while taking account of the needs, wishes and requirements of users and the requirements stemming from a given business process or set of business


operations, the existing organisation forms one's sole starting point. However, application maintenance, as it relates to information provision, should not, in fact, be carried out solely from the standpoint of the existing organisation, but also from the standpoint of marketing and the effects and possibilities of new technologies upon and for the organisation. This two-sided orientation is a feature of adequate application maintenance, and is essential to modern information provision. It can be stated that, in this connection, validation and verification are indispensable to supplying software of reasonable to good demonstrable quality. For this reason, it is necessary always to combine different types of validation and verification activities, e.g., inspections, walk-throughs and tests.
SQM model: software factory


As a result of the immense increase in the scope and complexity of software, development teams continue to increase in size, such that communication within teams and coordination between them become ever more difficult. In order to deal with the phenomenon of larger and more complex software, several development organisations have sought solutions, one of which is the software factory. An organisation which manages development, maintenance and re-use is referred to as a software factory: a noble aspiration, whose key word is re-use. Important basic elements within the software factory are infrastructure, model-supporting generators and templates for design and analysis.

SQM model: product


Quality management should begin at the earliest possible stage, i.e., from the first phase of development (planning). This helps prevent delays at later stages. It is important to formulate good specifications, i.e., ones that are unambiguous, complete, verifiable, consistent, alterable, traceable and usable. Various writers have, in recent decades, striven toward an optimal description of software (product) quality. The similarities and differences between these different versions have prompted an intensified call for one universal and operational set of terms for software quality. Only standardisation would seem able to meet this need.


Within the ISO 9126 standard, a distinction is made between internal software quality, external software quality and software quality-in-use. In particular with the recently developed term quality-in-use, an attempt is being made to direct one's product more clearly toward the customer's daily operations. If one distinguishes between quality needs, quality requirements and product characteristics, it is possible to distinguish two different paths. The first path entails the execution of activities directed toward specifying quality requirements (the specification path). The second path entails the execution of activities directed toward realising quality requirements (the realisation path).

The measuring of performance indicators for application maintenance


Several factors play a role in assessing software quality and the resulting application maintenance costs. The 25 most important factors have been identified in the SQM model. Experience teaches that many of these factors are hardly quantifiable, if at all, such that, unavoidably, some performance indicators in the model are quite subjective in nature. This makes it very important to define such indicators as clearly and unambiguously as possible in order to approach the ideal situation (objectivity and explicitness) as closely as possible. An aid in determining the right set of performance indicators is the quality score matrix, in which a given performance indicator can be scored by means of the selected quality criteria. NESMA has developed a number of matrices which can be used to measure application maintenance performance indicators. These matrices provide insight
into: the size of a system, its manner of documentation, how the system was designed, how the system has been built in terms of technology, the system's ancestry and what its recent maintenance has entailed. Aside from NESMA, ISO has also developed a model whose aim is to provide points of reference in determining and measuring software quality. An updated version of the ISO 9126 model is now available. It distinguishes between the internal and external quality of software (ISO 9126-3 and ISO 9126-2) with accompanying metrics. Another term used in the new ISO 9126 model is quality-in-use (ISO 9126-4).
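To make the idea of the quality score matrix concrete, the sketch below shows one possible implementation in Python. The criteria, weights and indicator scores are invented for illustration and are not taken from the NESMA matrices themselves.

    # One possible implementation of a quality score matrix. The criteria,
    # weights and indicator scores are invented for illustration and are not
    # taken from the NESMA matrices themselves.
    CRITERIA = {"objectivity": 0.40, "measurability": 0.35, "relevance": 0.25}

    # Candidate performance indicators, scored 1 (poor) to 5 (good) per criterion.
    indicators = {
        "defects per function point": {"objectivity": 5, "measurability": 4, "relevance": 5},
        "documentation completeness": {"objectivity": 2, "measurability": 2, "relevance": 4},
        "mean time to repair":        {"objectivity": 4, "measurability": 5, "relevance": 4},
    }

    def weighted_score(scores):
        return sum(CRITERIA[c] * scores[c] for c in CRITERIA)

    # Rank the indicators; the highest-scoring ones come closest to the ideal
    # of objective, explicit measurement.
    for name, scores in sorted(indicators.items(), key=lambda kv: -weighted_score(kv[1])):
        print(f"{name:28s} {weighted_score(scores):.2f}")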


The metrics of NESMA and ISO 9126 differ on a number of points. ISO's metrics are highly process-oriented, whereas those of NESMA are strongly product-directed. Both institutes indicate that the user can augment or alter the metrics as he sees fit. Both the NESMA and ISO models include, in addition to metrics for application maintenance, metrics for other purposes as well. Both the NESMA metrics and those of ISO 9126 can be used during the development process. There are, however, clear differences in the applicability of the different metrics. E.g., the ISO 9126-2 metrics are better suited for use on software programming, while the ISO 9126-3 metrics are better suited for use on products created during the requirements, analysis and design phases. Aside from the applicability of these metrics, it is also important to look at how work-intensive their use is. Collecting, registering and maintaining the data per metric can vary sharply in terms of the time and means used. In daily practice, the use of metrics is still quite limited. For the development and use of metrics, the connection to practical experience is indispensable. As a result of time limitations and limited IT budgets, there are actually few possibilities for applying software metrics in daily operations. Both customers and IT suppliers fail to devote sufficient attention to performance measurement when assessing the costs of application maintenance. However, this applies not only to maintenance costs but to IT performance measurement in general. Experience teaches that one's choice of software supplier is often determined solely by the price which the customer pays for the first delivery. This can have unforeseen consequences, both with regard to the quality of the application in question and the resulting (long-term) maintenance costs.

About the author


Richard Sweer (richards@infosupport.com), as manager of the Business Unit Finance, leads forty to fifty ICT professionals who perform (turn-key) projects for customers of Info Support. At this moment the Business Unit Finance is active in about fifteen projects for more than ten customers in The Netherlands. Besides his work as a business team manager, Richard is responsible for the development and implementation of a Professional Development Center for Info


Support. Within this unit a development-factory is being implemented for three platforms: J2EE, .NET and Open Source. This development-factory supports customers of all business units with the development of software applications.


SOFTWARE RELEASING: DO THE NUMBERS MATTER?


HANS SASSENBURG

The software industry is growing exponentially. Due to its enormous impact on today's society, the software industry has become critical. However, the ad-hoc and immature way of working is leading to an increasing number of reported serious problems. Software products are released without knowing their exact behaviour and without knowing the expected operational cost. In this article a control system for (software) product development is defined and used as a reference to conduct a series of case studies in industry. These studies revealed serious deficiencies with respect to evaluating innovation proposals, defining project scope, designing products, and implementing them. As a consequence, release decisions are characterized by a lack of quantitative information (e.g. financial consequences cannot be predicted). An economic model as often used in the semiconductor industry is introduced in this paper. This model enhances a long-term perspective on software development and enables an organisation to build, implement, monitor and evaluate a business case. As such, it also enables software release decision-making from a financial point of view. Effective application of the model requires an understanding of the expected product lifetime, the revenue, the development cost and operational cost, and the resulting profit. An important factor influencing these parameters is the type of relationship between the software manufacturer and the buyer/user of the software. This relationship determines the product development strategy used as input to business cases. It is concluded that both software manufacturers and the users of software products could benefit greatly from applying the model, given the relative immaturity of the software industry in comparison to other engineering disciplines.

On the other hand, there will remain practical limitations with respect to a purely financial approach. Information will always be imperfect and incomplete, and decision-makers are inevitably confronted with cognitive limitations that cannot be ignored.

Introduction and Outline


The amount and variety of software applications is growing exponentially. As a consequence, the impact of software applications on society is increasing rapidly. As software spreads from computers to the engines of automobiles, to robots in factories, to X-ray machines in hospitals, defects are no longer a problem to be managed: they have to be predicted and excised. Otherwise, unanticipated uses will lead to unintended consequences. However, ongoing research continues to reveal that the development process of most IT suppliers is immature. This immaturity in the software engineering discipline surfaces when new software products are developed or existing products are maintained. In 1998 roughly 28% of all software projects in the United States were stopped prematurely [1], due to either changed economic conditions or, in most cases, project failure. This does not mean that projects that do release a software product are necessarily successful. These projects do release a product to their customers, but may be confronted with considerable budget overruns, schedule delays and poor quality. Many software manufacturers have a short-term horizon that focuses on controlling the cost and schedule of the current product release, often neglecting other aspects of the software lifecycle [2].
This focus on the current release potentially leads to sub-optimisation instead of a strategic long-term approach and, as a consequence, to the premature release of software products. This leaves the manufacturer exposed to the following risks:
Unpredictable product behaviour. It is very difficult to guarantee to the user(s) what the exact functionality of the product will be. This may lead to user dissatisfaction and to unforeseen, even potentially dangerous situations in which people's lives may be at risk.


Unknown operational cost. The post-release or operational cost of the software product may be unexpectedly high. For example, the exact status of the software product and its documentation may be unknown, leading to high corrective maintenance costs. In addition, adaptive and perfective maintenance activities may be severely hampered.
Over the last decades, an increasing number of serious problems illustrating these risks have been reported. Leveson has published a collection of well-researched problems along with brief descriptions of industry-specific approaches to safety [3]. Safety problems are described in the fields of medical devices, aerospace, the chemical industry and nuclear power. Other descriptions of the consequences of software failures can be found in [4] and [5]. In this article, the results of seven case studies are presented, revealing current industry practices with respect to software development. The focus of these studies was to determine to what extent software release decisions are based on a financial analysis. These case studies were conducted using the control system described in section 1 as a reference. The results of the case studies are described in section 2. Based on the case study results, an economic model is presented in section 3, whose purpose is to enable an organisation to build, implement, monitor and evaluate a business case throughout the lifecycle of a software product. The model provides financial figures which can be used as input to the release decision process. In section 4, some implementation factors are described, focusing on the characteristics of a software manufacturer and the resulting product development strategy. In section 5 conclusions are drawn, answering the question whether software releasing should be based on financial figures or not. Some limitations with respect to a purely quantitative approach are addressed as well.
Control System
De Leeuw described a general approach to the effective control of a target system [6]. He represents a control situation by a controlling organ, a target system and an environment. The controlling organ exerts goal-directed influence on the target system, while the environment affects both the controlling organ and


the target system. Hollander adapted the control system to the controlling power of business development teams [7] in the following way: the environment is based on Porter's five forces model, comprising the company and its competitors, the customers or buyers of the product, the suppliers, the substitutes for the product and potential new entrants from other markets [8]; the controlling system consists of the project management function; the target system is the business development project. It will now be described how this control system can be practically implemented for software product development, using a business case as the underlying rationale and monitoring instrument for a project.

Business Strategy
Senior Management at a strategic level defines a business strategy, which describes the long-term expectations of business and technology developments. Business developments are addressed in terms of changes in the marketplace and organisation. Technology developments are addressed in terms of the adoption of new technologies and the new application of existing technologies. The business strategy is the input for Product Management (or the department responsible for information planning) at a tactical level to derive business cases. It is assumed here that, in general, the definition of a business case and its further implementation at operational level are executed in five sequential steps. These will be described below. For each step, examples will be given of possible methods which can be used to support a quantitative
approach.

Step 1: Investment proposal


A business case is used to define the rationale for a project that is initiated to develop a product (either a new product or a newer version of an existing product) [9]. It is in fact a proposal to start investing in a project definition. It describes the expected revenue for the vendor organisation, taking into account the expected development or pre-release cost (to develop the product) and the operational or post-release cost (to produce, deploy and maintain the product).


The business case defines in high-level terms the external product needs and constraints as input to a project at operational level. The external product needs describe the required functionality seen from the perspective of the customer(s). A distinction can be made between functional needs and non-functional needs. The functional needs describe the functionality that must be offered by the product. The non-functional needs define product properties and put constraints upon the functional needs (e.g. reliability, safety and accuracy); these are often referred to as quality attributes. Within the non-functional needs, compliance to external standards is an additional requirement. Constraints determine the boundaries of a project and may, for example, be limitations with respect to the budget and lead-time of the project and the cost price of the final product. Calculation methods such as the traditional discounted cash flow [10] and the newer real options approach [11] may be used here to build the case.
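As an illustration of the discounted cash flow calculation mentioned above, the following minimal Python sketch computes the net present value of an investment proposal. The cash flows, horizon and discount rate are invented for illustration.

    # A minimal discounted-cash-flow sketch for building an investment proposal.
    # The cash flows, horizon and discount rate below are invented for illustration.
    def npv(rate, cash_flows):
        """Net present value of cash_flows[t] received at the end of year t+1."""
        return sum(cf / (1 + rate) ** (t + 1) for t, cf in enumerate(cash_flows))

    investment   = 500_000                                # pre-release (development) cost at T = 0
    net_benefits = [150_000, 250_000, 250_000, 150_000]   # yearly revenue minus operational cost
    rate         = 0.10                                   # required rate of return

    business_case_npv = -investment + npv(rate, net_benefits)
    print(f"NPV of the proposal: {business_case_npv:,.0f}")  # positive NPV supports investing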

Step 2: Project definition


Internal stakeholders define internal product needs and constraints. The internal product needs are also expressed in functional and non-functional needs. Functional needs describe, for instance, the documentation that is needed to produce, deploy and maintain the resulting product. Non-functional product needs describe, for instance, compliance to internal standards. The combination of the external product needs and constraints and the internal product needs and constraints forms the input to the project. They are further analysed and detailed to the level where one or more project alternatives can be defined that meet the formulated needs and constraints. The project alternative that best satisfies them will be selected. At this stage, the release criteria can be defined. These are the particular criteria of a project and its resulting products that are taken into account in making the decision whether or not to release the product. Project estimation methods like COCOMO II [12] and SLIM Estimate [13] may be used here to make the optimal trade-off between functional needs, non-functional needs, lead-time and cost. Different project alternatives may be evaluated with multiple stakeholders using the Win-Win Negotiation Model [14]. The Project Definition step may lead to changes in the business case as better insights are gained.
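To illustrate how an estimation method can support the trade-off between scope, lead-time and cost, the sketch below applies the published COCOMO II.2000 nominal calibration (A = 2.94, B = 0.91, C = 3.67, D = 0.28) to a few hypothetical project alternatives. The scale-factor sum and effort-multiplier product are assumptions set to nominal values, not figures from this article.

    # Sketch of a COCOMO II-style effort/schedule estimate, using the published
    # COCOMO II.2000 nominal calibration (A = 2.94, B = 0.91, C = 3.67, D = 0.28).
    # The scale-factor sum (all factors nominal) and the effort-multiplier product
    # are assumptions for illustration, not values from the article.
    A, B, C, D = 2.94, 0.91, 3.67, 0.28

    def cocomo2(ksloc, scale_factor_sum=18.97, effort_multiplier_product=1.0):
        E = B + 0.01 * scale_factor_sum                         # scale exponent
        effort_pm = A * ksloc ** E * effort_multiplier_product  # person-months
        schedule  = C * effort_pm ** (D + 0.2 * (E - B))        # calendar months
        return effort_pm, schedule

    # Compare project alternatives that differ in scope (thousands of SLOC).
    for size in (20, 50, 100):
        pm, tdev = cocomo2(size)
        print(f"{size:4d} KSLOC -> {pm:7.1f} person-months, {tdev:5.1f} months")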


Step 3: Product design


After the project has been defined and accepted, the project starts. Further analysis of all needs and constraints will lead to the formulation of different product design alternatives. The design alternative that best satisfies the business case will be selected. Supporting methods here are, for instance, ATAM [15], SAAM [16] and CBAM [17]. After the product design has been selected, the release criteria are deployed to lower-level process and product attributes. Suppose that lead-time and budget are constraints and thus release criteria: they will put constraints on each component defined in the product design. If, for example, reliability and maintainability are part of the non-functional needs, they will have to be deployed in some way to the defined components in the product design. It may not always be possible to conduct a simple mathematical breakdown of a non-functional need. In that case, implementation rules may be defined that implicitly contribute to meeting the non-functional need at product level. Parnas, for instance, describes how a high level of extensibility or maintainability can be obtained through design rules [18]. This step may again lead to additional changes in the business case. During further implementation of the product, the project must stay aligned with the business case. The status of the project is obtained by evaluating the defined and deployed release criteria. Currently measured values and predictions of final values form the pre-release data. A steering committee may be in place to discuss the pre-release data, combined with any new insights. For instance, the business case may have been changed due to market developments, or the
service department may come up with additional product needs.

Step 4: Product release


The continuous alignment of the status of the project with the status of the external product needs and constraints and the internal product needs and constraints will finally lead to a situation where the release decision can be made. The release alternatives to be considered are: release now; release later, after the successful implementation of some corrective actions; or do not release the product and stop the project.


To answer the question of whether the product is ready for release, so-called software defect prediction or reliability models have been developed. The usefulness of these models can be questioned. Most models assume a way of working which often does not reflect reality. As a result, several models can produce dramatically different results for the same data set [19, 20]. Sometimes, no parametric prediction model can produce reasonably accurate results. Because no two models provide exactly the same answers, care must be taken to select the most appropriate model for a project, and not too much weight must be given to the value of the results [21].
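As a minimal illustration of such a reliability model, the sketch below fits a Goel-Okumoto growth curve m(t) = a(1 - exp(-b*t)) to invented weekly defect counts and estimates the residual defects at release time. This is one possible model, chosen only for illustration; as noted above, other models may give quite different answers on the same data.

    import math

    # Minimal sketch: fit a Goel-Okumoto growth model m(t) = a * (1 - exp(-b*t))
    # to cumulative defects per test week (data invented for illustration) and
    # estimate how many defects would remain if the product were released now.
    weeks   = [1, 2, 3, 4, 5, 6, 7, 8]
    defects = [12, 22, 30, 36, 41, 44, 47, 49]   # cumulative defects found

    def sse(a, b):
        """Sum of squared errors of the model against the observations."""
        return sum((a * (1 - math.exp(-b * t)) - d) ** 2 for t, d in zip(weeks, defects))

    # A coarse grid search keeps the sketch dependency-free; a real analysis
    # would use a proper optimiser and confidence intervals.
    a, b = min(((a, b) for a in range(40, 81) for b in [i / 100 for i in range(5, 61)]),
               key=lambda p: sse(*p))

    print(f"estimated total defects a = {a}, detection rate b = {b:.2f}")
    print(f"predicted residual defects at release: {a - defects[-1]}")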

Step 5: Investment and project evaluation


After the product has been released, assuming that the project is not stopped, data is needed to determine the result of the business case. A distinction is made between end-user data (for instance the revenues of the product and the buyer/user satisfaction level) and post-release data (for instance the cost of corrective maintenance). Evaluation of these data might result in changes to the business strategy and future business cases, as well as the removal of organisational process deficiencies (root-cause analysis). In Figure 13, the resulting overview is presented.

[Figure 13 diagram elements: Senior Management, business strategy, Project Steering Committee, pre-release data, Software Development Team, internal release, external release, Production / Deployment / Maintenance, product needs and constraints, End-user(s), end-user data, post-release data, business case results.]

Figure 13: Control system for software product development.

Case study results


The control system describing a business-case-driven approach to software product development was used to conduct case studies in seven large organisations developing software products, both for internal use and for external markets. These case studies revealed the following findings [22]:
Alignment between business case and project. In all cases but one, a business case was used as the rationale for a project, stating both the expected cost and benefits.² During the project, however, in most cases the Project Steering Committee and the Software Development Team failed to inform each other explicitly about the current status of the business case (new insights) and the current status of the project (progress so far and estimates to completion).
Comparison and evaluation of alternatives. This happened in most cases implicitly. At crucial decision moments (defining the project scope, selecting the product design), however, no evidence was found why one alternative was selected above another, using criteria derived from the business case. Available methods and techniques for comparison and evaluation (like software estimation methods and architecture evaluation methods) were in most cases known but not used.
Estimation of operational cost. In all cases, reliability and maintainability were considered to be important non-functional product needs, as they determine to a great extent the operational cost after product release. High reliability reduces corrective maintenance effort, and high maintainability reduces both corrective maintenance effort and adaptive/perfective maintenance effort. In nearly all cases, these non-functional needs were not deployed to the lower-level components identified in the selected product
design or software architecture. It was only during testing that much effort was spent on trying to meet a high level of reliability. No cases were found where the level of maintainability was evaluated. In all cases, reliability and maintainability could not be expressed in financial terms.
Evaluation of business case and project. After the final product release, no specific actions were undertaken to evaluate the result of the business case as a whole or the results of the decisions implemented at crucial moments during development (defining the project scope, selecting the product design, releasing the product).


Only in one situation was a plan available to evaluate the business case at predefined moments after product release by the chairman of the Project Steering Committee, who was assigned the responsibility for the investments made. In all cases, there was no defined process in place to analyse the defects found after product release and to use the results to remove process deficiencies in product development. In Figure 14, the results are illustrated in the control system.
2. In one case it was found impossible to allot benefits to a specific product release, as the clients of the product pay an annual fee for a larger set of products or services.
[Figure 14 diagram: the control system of Figure 13, annotated with the four findings: alignment between business case and project; comparison and evaluation of alternatives; estimation of operational cost; evaluation of business case and project.]

Figure 14: Control system for software product development.

An Economic Model
The challenge facing organizations today is to focus continuously on profit maximization. The software industry is no different in this respect. This can only be accomplished by managing software development and releasing software products from an economic perspective, using a business case approach throughout the subsequent phases of innovation proposal, project definition, product design, product release and post-release evaluation.


[Figure 15 panels, each showing profit over time: baseline model; delayed entry, limited competition; delayed entry, heavy competition; rushed entry, poor reliability.]

Figure 15: Examples of profit models [23].

If the exact relationships between the four main development parameters (functional needs, non-functional needs, schedule and development cost) and the additional parameters (revenue, operational cost) were known, a software manufacturer would be able to continuously apply trade-off rules among these parameters. For different combinations, the resulting profit functions could be calculated and compared. In Figure 15 some examples of profit models are given for when a software manufacturer is faced with a release decision.
When, for instance, the entry of a new product is delayed in a market with heavy competition, the probability of the manufacturer capturing the advantages of early adopters will decrease, with a negative impact on revenue and thus profit. In this section a generic economic model is used as the basis for a business case. The objective of this model is to illustrate how business decisions during the software lifecycle affect profit. For this purpose a simple product lifecycle model which is frequently used in the semiconductor industry was employed [24]. The product lifecycle as illustrated in Figure 16 is approximated by a triangle. It is assumed that market ramp-up and market decline have the same rate and duration.³


[Figure 16: revenue over time through the phases Introduction, Growth, Maturity, Saturation and Decline, approximated by a triangle.]

Figure 16: Approximation of product lifecycle to a triangle [24].

This model will be used to compare delivering a software product on time with delivering it with a delay. This is a typical issue a software manufacturer is confronted with during the development of a product, prior to the release decision.

On-time Entry
In this section, a generic economic model is presented as the basis for a business case whose objective is to determine profits. Three models are defined, using the following assumptions:⁴

3. Extended models have been described as well. Others describe, for instance, a model distinguishing a maturation period in which no market growth occurs [25, 26]. At this stage, the simplified triangle model is only used as an example to illustrate the effect of lost revenue due to a delayed market entry. Factors influencing the shapes of the product lifecycle curves, revenue curve and cost curves are not addressed in this paper.
4. These assumptions are used to define simplified functions for the case that a product is developed for a new market. The objective here is to demonstrate how a profit level can be calculated. In reality, there will be many drivers that determine the exact shape of the functions. The development cost function will, for instance, heavily depend on cost drivers like the level of reuse, experience and the maturity of the organisation. Further, a software manufacturer might define a strategy with multiple releases, where the first one is used to capture the market and further ones are aimed at adding functionality and improving quality.


Revenue model (Figure 17): Product lifetime is equal to 2W, with peak P at Tr + W. The time of market entry defines a triangle, representing market penetration. The triangle area equals total revenue.
Development cost model (Figure 18): Product development time is equal to Tr, with peak Cd at Tr/2. The start of the project at T = 0 defines a triangle, representing the development cost distribution. The triangle area equals total development cost.
Operational cost model (Figure 18): Peak Co at Tr + W. The time of market entry defines a triangle, representing the operational cost distribution. The triangle area equals total operational cost.

This leads to the following equations:

Revenue          = 1/2 · 2W · P     (1)
Development cost = 1/2 · Tr · Cd    (2)
Operational cost = 1/2 · 2W · Co    (3)

Figure 17: Revenue model (on-time entry).

Figure 18: Development cost and Operational cost model (on-time entry).

Combining the Revenue model, Development cost model and Operational cost model, the resulting profit can be calculated:

Profit = 1/2 · 2W · P - 1/2 · Tr · Cd - 1/2 · 2W · Co    (4)

The resulting breakeven point and profit level are given in Figure 19.
Figure 19: Profit model (on-time entry).
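As a quick check on the arithmetic, a minimal Python transcription of equations (1)-(4) follows. The parameter values are the ones used later in Figure 23 and are otherwise arbitrary.

    # A direct transcription of equations (1)-(4); parameter values follow Figure 23.
    def on_time_profit(W, Tr, P, Cd, Co):
        revenue          = 0.5 * 2 * W * P    # eq. (1): triangle with base 2W, height P
        development_cost = 0.5 * Tr * Cd      # eq. (2): triangle with base Tr, height Cd
        operational_cost = 0.5 * 2 * W * Co   # eq. (3): triangle with base 2W, height Co
        return revenue - development_cost - operational_cost  # eq. (4)

    print(on_time_profit(W=50, Tr=50, P=8, Cd=5, Co=5))  # -> 25.0 (arbitrary money units)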

Delayed Entry
In this section the model for the product lifecycle will be used to calculate the profit in case of delayed delivery of a product. Three models are defined, using the following assumptions:
1 Revenue model (Figure 20): Product lifetime is equal to 2W, the product is released at Tr + D, with peak P' at Tr + W, where P' = ((W - D)/W) · P. The time of market entry defines a triangle, representing market penetration. The triangle area equals total revenue.
2 Development cost model (Figure 21): Product development time is equal to Tr + D, with peak Cd' at (Tr + D)/2, where Cd' = ((Tr + D)/Tr) · Cd. The start of the project at T = 0 defines a triangle, representing the development cost distribution. The triangle area equals total development cost.
3 Operational cost model (Figure 21): Peak Co' at Tr + W, where Co' = ((W - D)/W) · Co. The time of market entry defines a triangle, representing the operational cost distribution. The triangle area equals total operational cost.

This leads to the following equations:

Revenue          = 1/2 · (2W - D) · ((W - D)/W) · P      (5)
Development cost = 1/2 · (Tr + D) · ((Tr + D)/Tr) · Cd   (6)
Operational cost = 1/2 · (2W - D) · ((W - D)/W) · Co     (7)

Figure 20: Revenue model (delayed market entry).

Figure 21: Development cost and Operational cost model (delayed market entry).

Combining the Revenue model, Development cost model and Operational cost model, the resulting profit can be calculated:

Profit = 1/2 · (2W - D) · ((W - D)/W) · P
         - 1/2 · (Tr + D) · ((Tr + D)/Tr) · Cd
         - 1/2 · (2W - D) · ((W - D)/W) · Co    (8)

The resulting breakeven point and profit level are given in Figure 22.

Figure 22: Profit model (delayed market entry).

In Figure 23 an example is presented of the relative consequences a delayed market entry can have on the profit level.⁵

Tr = 50 weeks, W = 50 weeks       D = 0 wk   D = 2.5 wk   D = 5 wk   D = 7.5 wk   D = 10 wk
Revenue                               -          -7%         -14%       -21%        -28%
Development Cost                      -          10%          21%        32%         44%
Operational Cost                      -          -7%         -14%       -21%        -28%
Profit (P = 8, Cd = 5, Co = 5)        -         -25%         -50%       -75%       -100%

Figure 23: Example of consequences of delayed market entry.
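The relative changes in Figure 23 can be checked with a short sweep over the delay D, using equations (5)-(7); the revenue and cost rows are reproduced up to rounding. Profit is printed in absolute terms from equation (8), since the normalisation behind the figure's percentage profit row is not spelled out in the text.

    # Sweep over the delay D using equations (5)-(8); parameters as in Figure 23.
    W, Tr, P, Cd, Co = 50.0, 50.0, 8.0, 5.0, 5.0

    def delayed(D):
        revenue  = 0.5 * (2 * W - D) * ((W - D) / W) * P      # eq. (5)
        dev_cost = 0.5 * (Tr + D) * ((Tr + D) / Tr) * Cd      # eq. (6)
        op_cost  = 0.5 * (2 * W - D) * ((W - D) / W) * Co     # eq. (7)
        return revenue, dev_cost, op_cost, revenue - dev_cost - op_cost  # eq. (8)

    base = delayed(0.0)
    for D in (2.5, 5.0, 7.5, 10.0):
        rev, dev, op, profit = delayed(D)
        print(f"D = {D:4.1f} wk: revenue {rev / base[0] - 1:+.1%}, "
              f"development cost {dev / base[1] - 1:+.1%}, "
              f"operational cost {op / base[2] - 1:+.1%}, profit {profit:6.1f}")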

The objective of the presented model is not only to support release decision-making. It is also meant to support the business case definition process (innovation phase) and the comparison and evaluation of alternatives (project definition phase, product design phase). Further, a proper evaluation of a business case after having released a software product should include a financial evaluation of the profit made (revenue versus cost). In other words, the objective of applying the model is to support the elimination of the negative consequences of the case study findings.


5. Implementation of this model could in practice be supported with a sensitivity analysis to see which parameters affect the outcome more strongly than others. Further, confidence limits could be considered, given the uncertainties in all quantities.


Application of the Model


Software manufacturers must have an understanding of the expected product lifetime, the revenue curve, the development cost and operational cost curves, and the resulting profit. This information, part of the business case, is not only gathered during the investment proposal phase; it must be continuously updated with respect to new market insights and the actual project status. This information will be specific to the external and internal characteristics of an organization. An important factor influencing the shapes of the revenue curve and cost curves (and thus the profit function) is the relationship between the software manufacturer and the buyer/user of the software. In Figure 24 some typical characteristics are given for different software manufacturer types.

Software manufacturer type                   Typical characteristics
Custom systems written on contract           Software made for one particular buyer.
                                             Budget and schedule fixed.
                                             Penalties for late delivery.
Custom systems written in-house              Software used to improve efficiency/effectiveness
                                             of the internal organisation.
                                             Limited number of end-users.
                                             Annual budget divided amongst different projects.
                                             Possible conflicting interests between
                                             IT-department and end-users.
Commercial software (business-to-business)   Software sold to other businesses.
                                             Many different buyers.
                                             Critical to the buyer's business.
Mass-market software (business-to-consumer)  Software sold to individual buyers.
                                             High volume of buyers.
                                             Market windows and buying seasons.
Commercial/mass-market firmware              Cost of distributing fixes very high (physical items).
                                             Many to high-volume buyers.
                                             Failures can have fatal consequences.

Figure 24: Characteristics of software manufacturer types [26].

The characteristics of the relationship between a software manufacturer and its potential buyers/users are input to the determination of a product development strategy. An important aspect here is also the possibly prescribed compliance to standards. In several markets, standards have been defined to ensure the safety of products at a cost to the manufacturer; examples are the defence industry, aerospace and medical devices. These standards will have an effect on the way software is produced and released, and thus on the development cost curve. Further, if warranty or liability conditions apply, both the development cost curve (increased need for a higher reliability level) and the operational cost curve (higher penalty for software failures) might be influenced. Knowing the relationship between a software manufacturer and the potential
buyers/users of the software, a product development strategy can be defined, providing the framework to orient a software manufacturer's development projects as well as its development process. As a starting point for developing a product development strategy, the software manufacturer must determine its primary strategic orientation. A software manufacturer must recognize that it cannot be all things to all people and that it must focus on what will distinguish it in the marketplace. Some possible product development strategic orientations are:
First Mover. This involves an orientation to getting a product to market fastest. This is typical of software manufacturers involved with rapidly changing


technology or products with rapidly changing fashion (a small market window). Pursuit of this strategy typically leads to trade-offs in optimising functional product needs, development cost and non-functional product needs.
Lowest Development Cost. This orientation is focused on minimising development cost or developing products within a constrained budget. It occurs, for instance, when software manufacturers are developing under contract for other parties, or where a company has severely constrained financial resources. It involves trade-offs between functional product needs, time-to-market and non-functional product needs.
Unique Functional Product Needs. This orientation focuses on the highest level of product features (including aspects like the latest technology and/or product innovation). It involves a trade-off between time-to-market, development cost and non-functional product needs.
Highest Non-Functional Product Needs. This orientation focuses on assuring high levels of product quality (reliability, safety, etc.). It is typical of industries requiring high quality because of the significant costs incurred in fixing post-release defects (e.g. recalls in a mass market), the need for high levels of reliability (e.g. the aerospace industry), or where there are significant safety issues (e.g. medical devices). It corresponds to the orientation of minimising operational cost. It involves a trade-off between functional product needs, time-to-market and development cost.
How does the selected product development strategy influence the economic model? Theoretically it does not, as it is assumed that each software manufacturer will strive for maximized profit. The product strategy chosen influences, however, the shapes of the revenue and cost curves and, as a result, the potential profit level. Card suggests that the number of potential buyers and the competition level together determine the kind of strategy that makes the most profit in the long run [27]. See Figure 25.

                           Few buyers                        Many buyers
Competition level: Low     Unique Functional Product Needs   First Mover
Competition level: High    Lowest Development Cost           Highest Non-Functional Product
                                                             Needs (Lowest Operational Cost)

Figure 25: Model of Software Markets [27].

It is assumed here, however, that the strategy to be chosen also depends on the company's capabilities (strengths, weaknesses and core competences), market needs and opportunities, goals, and financial resources. There is no one right strategy for a software manufacturer, but it is considered important that a strategy is chosen as input to the business case definition during the investment proposal phase.

Conclusions
Do the numbers really matter? Yes, there is no doubt that they matter. Software manufacturers will only invest in new software products as a means of profit maximization. This is true both for manufacturers selling their products to an external market and for manufacturers investing in information technology to support their internal processes. The case studies revealed, however, that although business cases were often used as the rationale for investments, decision-making in subsequent phases is characterized by a lack of quantitative information. This is especially true of
software release decisions, which lack a clear expectation of operational cost. It is more the exception than common practice that software release decisions are heavily influenced by financial considerations. How can this be explained? In the first place, Etzioni argues that it is impossible to perform the precise analysis necessary to maximize economic objectives, because limitations with respect to the information will normally exist: information is incomplete and imperfect [28]. Decision-makers will make a trade-off between the amount of information (perfection, completeness) and the cost related to searching for additional information. Beyond a certain point, obtaining additional information would lead to diminishing returns. Secondly, it


is very difficult, if not impossible, for decision-makers to escape the diverse psychological forces that influence their individual behaviour. These forces lead to cognitive limitations. A decision-maker simplifies reality, leaves out information and prefers simple rules of thumb as a consequence of limited cognitive capabilities. Although these limitations cannot be ignored, they are no excuse to avoid a more financial approach to software development in general and the software release process in particular. Both software manufacturers and the users of software products could benefit greatly from applying sound economic principles. It offers the possibility to select only those projects which offer increasing business benefits, and it might help avoid the release of software products that impose an unacceptably high risk on both the user(s) of the product and its manufacturer.

About the author


Hans Sassenburg (hsassenburg@se-cure.ch) received a Master of Science degree in Electrical Engineering from the Eindhoven University of Technology in 1986 (The Netherlands). He worked as an independent consultant till 1996, when he co-founded a consulting and training firm. From 1996 till 2001 he also worked as a guest lecturer and assistant professor at the Eindhoven University of Technology. Having sold his company, he moved in 2001 to Switzerland, where he founded a new consulting firm (SE-CURE AG, www.se-cure.ch), offering services in the field of applied business/software metrics. In 2002, he started, in parallel with his consulting activities, a PhD at the Faculty of Economics of the University of Groningen (The Netherlands). The objective of this research is to design a decision-making software release model for strategic software applications.

References

1 Chaos Report, Standish Group Report.
2 Full life-cycle management and the IT Management paradox, E.W. Berghout, M. Nijland, in D. Remeny & A. Brown (Eds.): Make or Break Issues in IT Management, Butterworth-Heinemann.
3 Safeware: System Safety and Computers, N.G. Leveson, Addison-Wesley.
4 Software Runaways: Monumental Software Disasters, R.L. Glass, Prentice-Hall.
5 Collection of Software Bugs, Prof. Thomas Huckle, TU München, www.zenger.informatik.tu-muenchen.de/persons/huckle/bugse.html.
6 Besturen van Veranderingsprocessen: Fundamenteel en Praktijkgericht Management van Organisatieveranderingen, A.C.J. de Leeuw, Van Gorcum, Assen, pp. 69-74 (in Dutch).
7 Improving Performance in Business Development, J. Hollander, doctoral dissertation, University of Groningen, The Netherlands.
8 Competitive Strategy, M.E. Porter, New York: Free Press.
9 Making the Software Business Case, D. Reifer, Addison-Wesley.
10 Comparative evaluation of software development strategies based on Net Present Value, H. Erdogmus, First International Workshop on Economics-driven Software Engineering Research, Toronto (Canada).
11 Software Design as an Investment Activity: A Real Options Perspective, K.J. Sullivan et al., in Real Options and Business Strategy: Applications to Decision Making, L. Trigeorgis (ed.), London, England: Risk Books, pp. 215-261.
12 Software Cost Estimation with COCOMO II, B.W. Boehm et al., Prentice-Hall.
13 Measures for Excellence: Reliable Software On Time Within Budget, L.H. Putnam, W. Myers, Yourdon Press Computing Series.
14 A Requirements Negotiation Model Based on Multi-Criteria Analysis, H. In et al., International Symposium on Requirements Engineering, Toronto (Canada).
15 The Architectural Trade-off Analysis Method, R. Kazman et al., Software Engineering Institute, CMU/SEI-98-TR-008.
16 Using SAAM: An Experience Report, M. DeSimone, R. Kazman, Proceedings of CASCON '95, Toronto (Canada), pp. 251-261.
17 A Foundation for the Economic Analysis of Software Architectures, J. Asundi, R. Kazman, Third International Workshop on Economics-driven Software Engineering Research, Toronto (Canada).
18 Designing Software for Ease of Extension and Contraction, D.L. Parnas, IEEE Transactions on Software Engineering, March, pp. 128-137.
19 A Critique of Software Defect Prediction Research, N. Fenton and M. Neil, IEEE Transactions on Software Engineering, Vol. 25, No. 5.
20 Important Milestones in Software Reliability Modeling, S.S. Gokhale et al., Communications in Reliability, Maintainability and Serviceability, SAE International.
21 Hardware and Software Reliability: Application and Improvement of Software Reliability Models, D. Wallace, C. Coleman, Software Assurance Technology Center, Report 323-08, NASA.
22 When can the software be released?, J.A. Sassenburg, Proceedings of the European SEPG, London (UK), 2003.
23 Technology Marketing, M. Sawhney, Lecture Section 8: Managing New Offering Realization, Kellogg Graduate School of Management, Northwestern University (USA).
24 Detailed Model Shows FPGAs' True Costs, J. Liu, EDN, 1995, pp. 153-158.
25 Economic and Productivity Considerations in ASIC Test and Design-for-Test, M. Levitt, Digest of Papers: Compcon 1992, pp. 440-445.
26 The economic models for a VLSI test strategy planning system, C.W. Wu, C.C. Wei, Proceedings SEMICON Taiwan 1997, Test Seminar, pp. 87-92.
27 They Don't Care About Quality, K. Iberle, Proceedings of the Software Testing Analysis & Review (STAR) East Conference, 2003.
28 Is Timing Really Everything?, D.N. Card, Guest Editor's Introduction, IEEE Software Magazine, Vol. 12 (5), 1995, pp. 19-22.
29 Humble Decision Making, Amitai Etzioni, Harvard Business Review, July-August 1989, pp. 122-126.

SOFTWARE LIFECYCLE MANAGEMENT: PREDICTABILITY IN SOFTWARE DEVELOPMENT

ERNST VAN WANING

The commercial development of software is hardly doable by eye: what used to be called the software crisis has long been recognized as a chronic situation. Developing software proves to be difficult enough; managing software development projects is even harder. Tools are needed that show your software development process as it is: in project execution, in plans and bids, and in comparison to others on the market. Easy to say, but what does that mean? Software development projects are like ocean crossings: you don't reach your goal on gut feelings; you make successful crossings with instruments that tell you where you are, where you are going and what weather you can expect. Managing software development projects is similar: without instruments that tell you what you have achieved and what you can expect, you are like Columbus, who did reach the other side but who, during his first trip, had no idea how long the journey would take. Columbus, by the way, never knew where he actually arrived.

The value of measurement instruments


Shifting from ocean crossings to everyday traffic leaves the importance of measurement instruments intact, as anyone with a speeding ticket may testify. A speedometer is a device that tells us the speed at which we drive. It does this in terms of kilometers (or miles) per hour. The property of communicating in meaningful terms is essential: this makes it possible to talk about speed and indeed to impose speed limits. Measurement instruments translate measurements into meaningful terms. The question of how to translate the measurements into a speed is relevant only to the designers of the device, not to its users. Indeed, we have used speedometers for over a century, and the way in which measurements have been


translated into speed has changed over time, but the concept of speed has not changed at all. It is the very property of communicating in meaningful terms that makes measurement instruments so useful: they make us aware of a situation and enable us to communicate about it with others. Measurement instruments give us immediate and meaningful knowledge about the state of our environment. Knowledge of your own performance, and of the risks you can take with that performance, is a form of the old wisdom "know thyself". Companies that are well aware of their performance are better at project planning, project bidding and project delivery than companies that know themselves less well. That by itself leads to better customer satisfaction. But there is another great advantage: knowledge of one's own performance is the key to improving it systematically.

A dashboard for software development projects


During the execution of a project we want to see immediately whether we are on schedule or not. In case we are off schedule, we want to change course effectively, which means that we need timely information to diagnose the project and make the necessary corrections. Apart from effective corrections, people will always ask when the project will be ready. Project leaders must be able to re-plan their projects. Added to that, it is often the case during a project that either the world itself or our conception of it, or both, change. This invariably results in change proposals. Always accepting change proposals will quickly ruin a project leader's credibility; always refusing will eventually lead to the same result. A better idea is to give information that helps to make a rational decision: an estimate of the consequences of a change proposal, based on actual project performance, will make such a decision much more rational. Such a tool would use statistical techniques to evaluate progress, give timely warnings that a project may run astray, and give estimates of the project's expected end date, expected costs and expected quality at delivery time, all based on actual project data. The tool would not only inform you about the current status of a project, it would also help to formulate expected developments of the project, based on actual observations.


Figure 26: An example of an overall report in a tool for managing actual project data. (The report shows eight panels plotted from Jan '96 to Jul '98: Gantt Chart, Aggregate Staffing Rate, Total Cum Effort, Total Defect Rate, Total Cum Normalized Defects, Total MTTD, Size and Total Cum Cost, each with current plan, actual and forecast curves and green and yellow control bounds, plus a table comparing plan and forecast values for elapsed months, staff, effort, defects, MTTD, size, cost, PI and MBI.)

Such a tool lets you manage a project on actual data. These data tell how much time has been spent on which parts of the project and how much has been realized within that time. In fact, the time sheet data we already collect bear that information. When they are entered into such a tool, you can quickly see how your project is doing and where attention is needed if it does not do well enough.
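The article does not show how such a tool processes time sheets internally; the fragment below is only a minimal sketch of the idea, rolling up hypothetical time sheet records per week and flagging where the cumulative actuals run ahead of an equally hypothetical plan.

```python
# A minimal sketch (not the tool described above) of managing a project on
# actual time sheet data: roll up hours per period and compare cumulative
# actuals against the plan. All records and plan figures are hypothetical.

timesheets = [           # (week, activity, hours)
    (1, "design", 120), (1, "build", 40),
    (2, "design", 60),  (2, "build", 180),
]
planned_cum_hours = {1: 150, 2: 360}   # hypothetical cumulative plan

cum = 0
for week in sorted(planned_cum_hours):
    cum += sum(h for w, _, h in timesheets if w == week)
    plan = planned_cum_hours[week]
    status = "attention needed" if cum > 1.1 * plan else "on schedule"
    print(f"week {week}: {cum}h actual vs {plan}h planned -> {status}")
```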

Effectiveness
Organizations that have started to manage their projects with metrics are usually quite happy with that decision. Everyone involved can see at a glance whether a project runs well and, if not, where extra attention is needed. This makes discussions much more directed and clearer. Moreover, you can see very early on if projects go astray, so that corrections can be effective. Because tracking projects and negotiating about them require less time, you can manage more projects and concentrate on those that really require attention. Apart from spending your time better, you will also be much better informed about your projects. Organizations that manage their projects in this way report that they have become more effective: they deliver more projects within budget, on time and at the desired level of quality. These companies manage their risks better.

Measuring your performance


Apart from the advantages already mentioned, you collect actual data about your projects. While the data tell you how your organization behaves, the tool calculates important parameters such as the performance of your software development projects and the stress put on the people working on them.

Better plans, happier customers


As humans we have the remarkable property that we can make rational decisions in environments that lack clarity and precision. Apparently we can give approximate answers to questions with inexact, incomplete and not entirely reliable knowledge; how else could we have survived in the complex world we live in? Clients expect this sort of behavior: they expect bids before projects are defined. Even clients know that exact calculations can only be made after a project has been finished. Because they cannot make use of exact information, plans and bids are always uncertain.

Project information
A twinkle in someone's eye can be the start of a project. As soon as discussions start, remarks will be made about similarities to other projects. These remarks, however inaccurate, are a first approximation of a plan. As people seem to be interested, a (somewhat) more serious plan is quickly made. There is some experience, so a feasibility study will not be necessary. Requirements and design phases are deemed necessary. There will also be corrective maintenance after first delivery. The system will probably consist of 80% business code, 5% telecom code and 15% system code. Experience indicates that the system will have about 63,000 lines of C and that there will be no more than 7 people on the project. What we know about the project is very little, if anything. Yet, to see its consequences in terms of project parameters would be extremely useful at this point.


At the very least, an estimate would give us an idea of the duration and the effort (cost) for such a project. But, as duration and effort are defined in terms of each other, we should also be able to assess the consequences of trading duration for effort. There are other indispensable parameters that may seem less obvious at first sight: do we know our productivity in software development, or the stress we work under when we develop software? If we do not use metrics and instruments as described in the section on a dashboard for software development, then we don't know these. In that case we would like to fall back on market data to find a reasonable value to assume for our project. Furthermore, such a tool must enable us to present our findings in a clear and unambiguous manner to decision makers.
Figure 27: An example of a report on Staffing and Probability Analysis. (The report shows the monthly average staff over the entire life cycle, with milestones from CSR to 99.9R; a solution panel with expected duration, effort, cost, peak staff, MTTD and start date at PI=18.3, MBI=2.8 and 63,000 effective SLOC; a risk gauge for duration, effort, peak staff and quality, empty because no constraints were set; and a control panel with adjustable pointers for PI, peak staff and effective SLOC.)

We see the number of people working on the project during its entire life cycle, a table with expected values, a graph showing the probability that certain constraints will be met (empty, because we have no constraints) and pointers (that we may manipulate) indicating productivity, peak staff and lines of code.


It is good to mention that the solution above is the most likely one. Usually this means that there is a 50% probability that we need less, but also a 50% probability that we need more. Often it is advisable to give plans with a higher guarantee of really attaining what they promise. The numbers for the entire life cycle at 50% and 80% probability are in the table below.

                     50%     80%     Unit    Explanation
Duration life cycle  16.3    17.3    Months  R&D, C&T, 99% error free
Effort               62      80      PM
Cost                 1005    1291    K$
Peak staff           7       8.5     People
MTTD                 147.6   121.5   Days    Mean Time To Defect
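The article does not explain how the tool derives the 80% figures from the most likely (50%) solution. Purely as an illustration, the sketch below assumes a normal distribution for effort and backs out the spread implied by the two effort numbers in the table; the actual tool's statistical model may well be different.

```python
# Illustration only: back out the implied effort uncertainty from the 50%
# and 80% plans in the table, assuming a normal distribution (an assumption
# of this sketch, not a statement about the tool's model).
from statistics import NormalDist

p50_effort = 62.0   # PM, most likely plan (50% probability)
p80_effort = 80.0   # PM, safer plan (80% probability)

z80 = NormalDist().inv_cdf(0.80)          # ~0.8416
sigma = (p80_effort - p50_effort) / z80   # implied spread, ~21 PM
plan = NormalDist(mu=p50_effort, sigma=sigma)

for p in (0.50, 0.80, 0.90):
    print(f"{p:.0%} plan: {plan.inv_cdf(p):.0f} PM")
```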

During the meeting where these numbers are discussed, someone states that this can and should take much less time: a year would be more than sufficient. The same person adds that if we take the maintenance phase for granted, we would be first on the market.

                     50%     80%     Unit    Explanation
Duration life cycle  12      13      Months  R&D, C&T
Effort               216     310     PM
Cost                 3503    5019    K$
Peak staff           33.1    43.5    People
MTTD                 31.0    23.7    Days    Mean Time To Defect

Investment and effort are three to four times as high as in the previous plan, the stress on the project to deliver within a year will be very high, and partly because of that the expected quality will be much lower. Moreover, a duration of 13 months instead of 12 is quite probable.


Market data
Some say that plans without data to calibrate them are mere speculation. In our case, we can compare our plans with market data. Below you see four graphs and two tables. The graphs plot the lines of code (LOC) against expected duration, expected effort, peak staff and the expected number of errors found during construction. The three lines indicate market performance: the central line is the average; the other two indicate one standard deviation below and above the average.
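As a rough illustration of reading such a graph, the sketch below checks whether a plan falls within one standard deviation of a market trend line. The power-law coefficients and the spread are hypothetical stand-ins, not QSM's actual 1999 business trend data.

```python
# Illustration only: position of a plan relative to a market trend line.
# Coefficients and sigma below are hypothetical assumptions of this sketch.
import math

a, b = 2.5, 0.35      # hypothetical: duration = a * KSLOC**b (months)
sigma_log = 0.25      # hypothetical spread of log10(duration) around the trend

def position_vs_market(ksloc: float, duration_months: float) -> str:
    expected = a * ksloc ** b
    z = (math.log10(duration_months) - math.log10(expected)) / sigma_log
    if abs(z) <= 1.0:
        return f"within 1 sigma of the market average ({expected:.1f} months)"
    return f"{z:+.1f} sigma from the market average ({expected:.1f} months)"

print(position_vs_market(63, 16.3))   # the 63 KSLOC plan from the example
```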

Figure 28: An example of comparing estimates to historical data. (Four log-log graphs plot C&T duration, C&T effort, errors SIT-FOC and C&T peak staff against effective SLOC, with average and 1-sigma trend lines from the QSM 1999 Business data set; the current solution, logged solutions and historical projects are marked, and a solution comparison panel sets the QEW 50%, QEW 80%, 'jaar 50%' and 'jaar 80%' plans out against life duration.)

Looking at duration and errors found, our plans do not look unreasonable. However, the plans for a one-year project (red circled dot) demand very much effort and a high peak staff, compared to market data.

Company data
Can we trust our plans if we compare them to our own records? Below you see how our plans compare to projects we have done previously.


Figure 29: An example of how the plans compare to projects performed previously. (The same four log-log graphs now include the company's own historical projects next to the logged solutions; the solution panel lists the 'jaar 80%' plan: 13.0 months life cycle duration, 310 PM effort, $5,027K cost, 43.6 peak staff and 23.6 days MTTD, at PI=17.3, MBI=6.1 and 63,000 effective SLOC.)
The interpretation remains the same as in the last section: the peak staff for a one-year project is, compared to our own records, extraordinarily high.

Benchmarking and performance improvement


Clients of a company that develops software for the telecom market complained that it was relatively expensive and that its software contained many errors. To answer the complaint, the company decided to benchmark itself against its peers on the market. Strong points were the high motivation and technical know-how of its staff, its effective management and a shared, genuine feeling of being part of the company. A weaker point was that a lot of pressure was put on projects. Because of the pressure, there was hardly time to test code, let alone document it. The development department was separated from the department responsible for quality. Budgets for education and training were small, but there was hardly time for education and training anyway.


Overall example view

Figure 30: An overall example of trend lines on the company's market (Telecom / Business Systems, QSM 1999 Business averages with 1-sigma lines). (Four log-log graphs plot errors SIT-FOC, MBI, productivity in SLOC per MB MM, and FUNC duration in months against effective SLOC.)
The graphs above show trend lines for the company's market. The horizontal axes show the lines of code. In reading order the graphs show the number of errors found before delivery, the pressure on the projects, the productivity per staff member and the duration of the functional design phase.

Errors found: although not discovered by the customer, four projects have a high number of errors. The company saw this as a reason for the customer to complain about the quality.

MBI (pressure on projects): most projects are on the high side. MBI is a measure of the effort per time unit, the pressure on a project. Higher MBIs lead to higher project costs and lower quality.

Productivity per staff member: contrary to what market data show, the company has a diminishing productivity as projects get larger. This company had complaints about high prices and low quality. The company spent little time on documentation and used high staffing rates. As a consequence, staff could not read up on the information they needed, but were forced to ask colleagues. The result of this strategy was both a diminishing productivity and an increasing error rate.

Functional design: four projects show that the company spent an equal amount of time on functional design, irrespective of the size of the project.

The company has since introduced a number of improvements. The development and quality groups started to work together, investments were made in better tooling and plans were based on metrics. The company started to take documentation seriously, even after the design phase. After some time people were much more familiar with the code, and work became less hectic. Management realized that project pressure could be lowered. The company had lower operating costs and the customer got better quality.

Conclusion
In our 21st-century eyes, accompanying Columbus would have been something for madmen. We refuse to cross the ocean without good navigation instruments. In the centuries after Columbus we have found out what data to collect and how to build instruments that present data about a journey in a comprehensible way. Now we know exactly where we are on the ocean, what kind of weather we can expect and when we will arrive. Comparable instruments have not only helped us to fly around the world, they have even helped us to get into space. Good metrics and instruments made it possible to plan a journey accurately in advance and to execute the plan with the flexibility to adapt quickly and effectively to unexpected circumstances. This development is closely connected with the development of science.

We have also learned to measure and interpret our measurements in other areas. Insurance companies translate measurements (counts) into insurance contracts. Like the development of navigation instruments, the development of insurance instruments is work for specialists: not only do we need the right data, we also analyze and interpret the data to translate them into contracts. The use of measurement instruments has had an enormous influence on the development of our society.

The use of measurement instruments for projects has a big positive impact on the organization using them. As knowledge about the work improves, staff involvement grows. Compare this with other areas where measurement is knowledge: the question whether a plan meets its expectations becomes a certainty, and one concentrates on the process that makes sure that the projects are delivered to the customer's expectations.

About the author


Ernst van Waning (evw@infometrics.nl) is a metrics expert with more than 25 years of experience. He is senior consultant for QSM, in which function he carried out many assessment studies of software projects, including advice on how to improve the organizations carrying out these projects. He is founder of Infometrics, a consulting company specialized in quantitative analysis of data for business performance and strategic decision support. He is a board member of NESMA, and president of ALU, the Association of Lisp Users.

QSM has developed navigation instruments for the area of software development: it has found laws and invariants wherever they are present in data about software development. These laws and invariants have found their place in SLIM Estimate, Control and Metrics. The essential characteristic of the SLIM product line is that it is based on a large set of project data. Data about projects can be collected (Control) and used (Estimate and Metrics). Reliable plans can be made with SLIM Estimate and executed with SLIM Control. The purpose of SLIM Metrics is to benchmark projects against other projects or against market trends. These products give insight into market performance and project performance in an objective way. Together they allow the systematic management of the risks involved in software development projects.

MEASURE! KNOWLEDGE! ACTION!

The Netherlands Software Metrics Users Association (NESMA)

20% CONTROL, A GUIDE FOR COST REDUCTION, QUALITY IMPROVEMENT AND IT GOVERNANCE
HENNIE HUIJGENS

There are a hundred ways to lay out a garden; the best is to hire a gardener, Karel Capek recommended in The Gardener's Year (1929). In the IT world, the last few years the opposite seems to happen. Especially those who should be busy raking the IT landscape, the consultants and advisors, are subject to heavy criticism. Bad mutual cooperation, IT service suppliers that unscrupulously promote outsourcing, models and frameworks that do not fit together: everything seems to be good for nothing. Yet all these differences are not insuperable. They are just part of a professional group that is growing up. One thing is clear: gaps appear in the theoretical models and frameworks, in the stories that advisors tell their customers, in the way the gap between strategy and operations is bridged, and with that in the way organizations are changing.

20% Control deals with bridging that gap. It tells a story about the no man's land between strategy planning and performing all those daily IT activities. Just there, in the border region that I summarize as IT control, lies a huge challenge for all the people that are professionally involved in the continuous improvement and professionalization of organizations. If innovation and professionalization of the IT industry will take shape anywhere, it will be at that borderline of business and information technology, of strategy and operations.

20% Control is about changing. Because why would you want to control the steering between strategy and operations of a business if you had not realized beforehand that steering always leads to change? Monitoring, measuring and quantifying your own performance not only forces you to reflect, it also forces you to take direct action.

Models and frameworks


Yet control is not only about figures, measurement results, scorecards and benchmarking. The softer side of it is about how you can use that hard data to persuade people to start doing their work in a different way. It's about the fear of choosing and about dreaming. It's about loyalty and confidence. To get a clear view of the borderline between strategy and operations we use all kinds of models and frameworks. As a brainstorm for 20% Control I decided to build my own model too: three circles stand for strategic, tactical and operational activities, combined with the plan-do-check-act cycle borrowed from Deming. In that model I drew up the various processes that occur in a number of widely used improvement models (e.g. ITIL, ASL, CMMI, CobiT). The result is a multi-coloured collection of bigger and smaller balls.

Figure 31: An overview of the functionality within a number of widely used models and frameworks. (Three rings for strategic, tactical and operational activities, combined with Deming's plan-do-check-act cycle; the processes of the various models are drawn in as bigger and smaller balls.)

Two things attracted my attention right away. All models focus especially on operational activities, not on steering. The models concentrate on translating strategy into plans and on putting these plans into action. Monitoring aspects hardly come up for discussion at all; measuring and evaluating the activities performed is nowhere really worked out. I was dazzled by my short investigation, because this did not at all resemble the message that all these models and frameworks were spreading: the primary goal is to improve the organization!


Practice proves: twenty percent really matters


Practical experience within a major financial service supplier in The Netherlands showed that the head office and all forty foreign offices together used more than eight thousand resources to deliver their products and services to customers. Of all these resources (e.g. information technology, embedded software, infrastructure, people, data) only fifteen hundred, therefore less than twenty percent, were necessary to support the delivery of the most important products and services. The remaining eighty percent was not directly indispensable, or turned out to be only nice-to-have.

Figure 32: Practical experience proves that less than twenty percent of the (IT) resources that an organisation is using contributes to the delivery of the most important products and services.

What such practical experience teaches is that focus is really important. You can't just want everything anymore. 'I want it now, I want it all!' won't hold in IT-land. You will have to choose. That's why the title of this article is 20% Control.

One hundred percent is just too much!


Decision makers and their advisors nowadays have to know what they want better than ever before. You can't just get away with saying that you take decisions based on intuition; for that, the risks and the stakes are simply too big. Too many bankruptcies, scandals and financial fiascos have proved in recent years that such an attitude leads to a disaster scenario. And besides that, laws and regulations and external supervisors ask for it. Since Sarbanes-Oxley, Basel II and IFRS (International Financial Reporting Standards) are on the agenda of every financial service supplier and every enterprise quoted on the stock exchange, the need to really do something with IT control is only getting bigger.

It will be clear that an organisation cannot do everything and react to every incident without thinking. Making choices is a must! And making the right choices becomes more and more the critical success factor. Especially in the territory of IT control, where so much seems to be going on, decisions have to be founded and have to be solid choices. One hundred percent is simply too much. That's why twenty percent should be enough.

IT control? What's that?


Where one person, when the idea of IT control is mentioned, thinks of checking or auditing, another looks at it from a steering point of view and speaks of having control over the information provision. Many people consider IT control to be the same as IT governance, or as planning & control. I like to describe it as follows: IT control is about the steering activities between strategy related to an enterprise's information technology on the one hand, and performing the operational activities on the other. Control builds the bridge between strategy and operations.

Figure 33: Control bridges the gap between strategy and operations. Control is about steering; the two main processes are planning daily activities and checking results. (The diagram places IT control between IT strategy and IT activities in a plan-do-check-act cycle.)


Strategy is about making choices; it gives direction to an organisation. You'll have to steer, measure whether you're on the right track, and make corrections when needed. This steering and checking, as distinct from planning a strategy, often has to do with the medium-to-long-term strategy of an organisation. Planning the operational activities then is an important goal. With regard to the steering activities, planning is the common factor.


Apart from planning, IT control is about costs, quality and risks. One thing strikes you when you observe how enterprises handle these activities: many of them favour one of these three aspects. An enterprise steers at one moment especially towards cost savings or towards quality improvement, while at another moment controlling the risks is put at the centre. That preference is almost always connected with the strategy of the organisation. A pursuit of operational excellence almost automatically leads to steering towards cost savings, and a focus on product leadership often goes together with optimizing quality, and sometimes with innovation and risk management.

             Costs                      Quality                        Risks
Object       Processes                  Products                       Events
Goal         Optimize the efficiency    Optimize the effectivity       Control the risks
Steering on  Costs, revenues and        Customer satisfaction and      Laws and rules and
             process quality            product quality                external supervisors
Models       ITIL, ASL                  CMM, EFQM, Six Sigma           CobiT
Specialists  Process specialists        Quality specialists            Auditors
Focus        Operational: with your     Tactical: staff department     Strategic: the supervisors
             boots in the mud           analyzes and advises           set the rules of the game

Figure 34: Control recognizes three management aspects: costs, quality and risks. Each aspect is characterized by its own practical implementations. The message: choosing is essential.

Joining creative and innovative growth with the management of costs, quality and risks: that is what doing business in the coming years is about. Focus on the gist of the matter, get rid of the superfluous fat and streamline the organisation and its processes. Full attention should be given to the centre of growth of an enterprise. To do so, knowing your strengths and your weaknesses is an essential prerequisite.


Choosing as the distinguishing factor


Planning has a preparatory role for all other steering activities. It is a reflection of the demands and the current priorities with regard to the IT organisation that are set in the customer's organisation. A good planning process sets a baseline for business alignment. Planning reflects how the organisation wants to create value with the help of information technology. That is why planning is put at the centre of determining an organisation's focus with regard to IT control. The 20% Control approach is based on value chain management. It links the products and services that an enterprise delivers to its customers with the processes that are performed to do so and with the resources that are needed within these processes.

Figure 35: A value chain consists of products or services that an organization delivers to its customers, the processes that are performed to deliver those products or services, and the resources that are used within these processes. Each object within a value chain is marked with a control profile (represented in the figure as white, grey or black).

Each object within such a value chain is marked with a profile that tells something about the importance of that product, process or resource with regard to the control aspects. So a product with a high control profile contributes highly to the value creation of an organisation. Such a profile is focused on costs, on quality or on risks. Besides the importance, a control profile tells an organisation something about the nature of the actions that are needed. In the field of the eighty percent less important resources it will be possible to save costs without too great risks. The twenty percent really important resources are subject to quality improvements and risk management. Value chain management helps to make the right choices and to substantiate your decisions.
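As a minimal sketch of this idea (not the author's actual method), the fragment below links hypothetical products to processes and resources and lets a high control profile propagate down the chain, separating the roughly twenty percent of resources that deserve quality and risk attention from the rest.

```python
# A minimal sketch of value chain management as described above: products
# link to processes, processes to resources, and a high control profile
# propagates down the chain. All names and profiles are hypothetical.
HIGH, LOW = "high", "low"

products = {"payments": HIGH, "reporting": LOW}
processes = {"payments": ["clearing", "settlement"], "reporting": ["batch"]}
resources = {"clearing": ["mainframe", "ops team"],
             "settlement": ["mainframe", "swift link"],
             "batch": ["report server"]}

# A resource belongs to the really important ~20% if it supports at least
# one high-profile product; the rest are candidates for cost savings.
important = set()
for product, profile in products.items():
    if profile == HIGH:
        for proc in processes[product]:
            important.update(resources[proc])

all_resources = {r for rs in resources.values() for r in rs}
print("quality/risk focus:", sorted(important))
print("cost-saving focus: ", sorted(all_resources - important))
```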


Figure 36: Where value chain management represents planning, three focuses are worked out: one based on cost reduction and productivity improvement, one on quality improvement, and one on risk control and IT governance. Where cost management focuses on the eighty percent less important resources, quality improvement and risk management mainly focus on the twenty percent important resources. (Cost management covers process efficiency, productivity, performance, service level management and outsourcing; quality management covers product improvement, customer satisfaction, delivery time, time to market, defect density and service level management; risk management covers cost management, IT governance, laws and regulations, continuity, security, and SOx, Basel II and IFRS.)

The bottom level of the value chain, the resources, often contains the objects in an organisation that are subject to IT control. Here you'll find the IT infrastructure, the computer centre, the PCs and servers and the software applications, but also the people that operate all those machines and write and maintain the programs, and the business managers that support the end users of all these. 20% Control is based on an umbrella model in which you'll find all activities that are performed by any IT organisation. With the help of that model, cost, quality and risk management suddenly get an actual meaning that is related to the daily operations. When you use this model, shadowy things such as the hidden costs of IT belong to the past. The quality of business IS management, considered unmeasurable by some people, becomes visible. And quantitative risk management no longer is something you talk about but don't really do.

Figure 37: An umbrella model, based on ITIL, ASL and BiSL, that contains all processes that are performed within an IT organisation, grouped into strategic, tactical and operational activities for infrastructure management, application management and business IS management. Implementing cost management based on such a model reduces for the greater part the occurrence of so-called hidden costs of IT.


Based on the IT process model in Figure 37, a detailed IT scorecard has been developed (see Figure 38). In this balanced scorecard the score of a project or enhancement release is measured against the average value of the organisation (internal benchmarking). By measuring the score against an external average of a complete market (e.g. all financial service suppliers, or all garbage disposers, or all ITIL/ASL users), external benchmarking becomes reality.
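The article does not spell out how the scorecard values are computed. The sketch below shows one plausible normalization against the organisation's average, so that zero means 'average' and positive means 'better than average'; the metric directions and all figures are hypothetical.

```python
# Illustration only: normalizing project metrics against the organisation's
# average for an internal-benchmarking scorecard. All numbers hypothetical.
org_avg = {"costs": 500, "productivity": 10, "time_to_market": 6,
           "defect_density": 2.0, "customer_satisfaction": 7.5}
lower_is_better = {"costs", "time_to_market", "defect_density"}

def scorecard(project: dict) -> dict:
    scores = {}
    for metric, avg in org_avg.items():
        rel = (project[metric] - avg) / avg * 100   # % deviation from average
        scores[metric] = round(-rel if metric in lower_is_better else rel)
    return scores

project_a = {"costs": 450, "productivity": 12, "time_to_market": 7,
             "defect_density": 1.5, "customer_satisfaction": 8.0}
print(scorecard(project_a))
```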

Figure 38: An IT scorecard, based on the umbrella model, shows a limited number of metrics that can be used to measure the control within an IT organisation: costs, productivity, time-to-market, defect density, customer satisfaction and risk score, plotted here for projects A and B on a scale from below -100 to above +100 around the organisation's average. The scorecard is balanced: all metrics are weighted against the organisation's average.

Summary
20% Control is about steering; it fills the gap between strategy and information planning on the one side, and daily operations in the field of infrastructure management, application management and business IS management on the other. The most important processes within control are planning (in the 20% Control approach visualized by value chain management) and, besides that, cost management, quality management and risk management. By guiding the decision makers in an organisation in assessing the value chain of products, processes and resources, an organisation gains insight into its most important resources. Besides that, the organisation gets a clear view on the most effective direction for implementing control in its IT department: a focus on efficiency improvement and cost reduction, a focus on more effective processes, higher customer satisfaction and improved quality, or a focus on risk control and IT governance (e.g. compliance with the laws and regulations of third parties). 20% Control helps organisations make the right choices, in an organised and step-by-step approach, and do the right thing at the right time: first things first.

About the author


Hennie Huijgens (hennie@goverdson.nl) is the founder of Goverdson, a small and flexible organisation for project management in the field of IT control (www.goverdson.nl). He works as an independent consultant and project manager in the field of IT control, and is a specialist in cost management, quality management, risk management and the use of IT metrics. Besides that he is a board member of the NESMA.


MEASURE! KNOWLEDGE! ACTION!

The Netherlands Software Metrics Users Association (NESMA)

IT GOVERNANCE REQUIRES QUANTITATIVE (PROJECT) MANAGEMENT


TON DEKKERS

Instead of a demand market for IT services we are now facing a supply market. Return on investment is more relevant for business than before. Financial affairs in some areas make managers cautious; value for money and transparency are now keywords. Business demands more and more governance, and the same nowadays applies to IT. We need IT governance as much as we need Corporate Governance. And at best, both governance practices arise from the same shared model for managing information and related technology (CobiT). The operationalisation of certain IT governance aspects requires quantitative (project) management.

Introduction
Because business management is interested in simple and understandable measurement, the chosen measurement method should fit in a simple measurement model. The model should also support decision making based on these performance rates. Projects and suppliers can be benchmarked when applying size measurement and knowing performance rates. This can be done pro-actively (estimation) or reactively (performance measurement). In both cases the supplier (internal or external) needs to explain the performance, showing transparency. The availability of size measurement and performance rates opens possibilities for managing and controlling contracts (essential to return on investment and value for money), projects and risks.

Corporate Governance
The essence of Corporate Governance is managing an enterprise well and being able to prove it. Corporate Governance is a process, effected by an entity's board of directors, management and other personnel, applied in a strategic setting and across the enterprise, designed to identify potential events that may affect the entity and to manage risks to be within its risk appetite, to provide reasonable assurance regarding the achievement of the entity's objectives. In other words, organisations must satisfy quality, fiduciary and security requirements for all assets and in the interest of all stakeholders, e.g. the entity, the shareholders, the employees and society in general.

Recent affairs like Enron, Ahold and Parmalat increased the attention to Corporate Governance. These kinds of affairs have led to a discussion about the tasks and responsibilities of the various roles in governing an enterprise, and new regulations have been developed. One of the most significant changes is Sarbanes-Oxley, introduced in 2002, which applies to all enterprises traded on the American stock markets and to companies with assets in the U.S.A. above a certain amount. Other examples of regulations are Basel II, which applies to banking entities, and FDA for the pharmaceutical industry. IT Governance, although not obligatory, is often seen in literature as its equivalent for IT. [1, 2]

Following the Turnbull report [3], Corporate Governance and risk management have become increasingly important to businesses, their owners and their managers. A recent issue of Internal Auditing Magazine stated that 'the biggest risks facing organisations are now technology-based'. Just as the role of any auditor will include an information systems component, so effective Corporate Governance and risk management necessitate effective IT Governance and risk management. For many organisations, information and the technology that supports it represent their most valuable assets. Moreover, in today's competitive and rapidly changing business environment, management requires increased quality, functionality and ease of use from their IT, delivered faster and faster, constantly available and at lower costs than ever before. The benefits of technology are in no doubt. However, to be successful, organisations have to understand and manage the risks associated with implementing new technologies.


IT Governance
Over the years a number of publications have tried to address this issue. Not only does the definition vary; implementations also comprise a wide area of IT, and the same applies to the proposed models. Two major organisations are actively involved in IT Governance: ISACA (Information Systems Audit and Control Association) and the IT Governance Institute. Both use the same definition: 'A structure of relationships and processes to direct and control the enterprise in order to achieve the enterprise's goals by adding value while balancing risk versus return over IT and its processes.' The CobiT (Control Objectives for Information and related Technology) model [4] they use describes mainly the processes within the IT organisation. The framework provides managers and auditors with guidelines to implement and control the management processes. CobiT will be explained later in a separate paragraph.

Another party in IT Governance is Gartner [5]. Gartner identifies attention areas that require IT governance policies and need specified goals. These goals should include the business value, along with metrics that measure the achievement of goals, not just the mechanics of the activities to get to them. IT governance areas can be grouped under two main categories:

Supply side: IT governance over the provision of IT services. This category, which affects the how of IT activities and scope, includes: standards, supplier policies, security policy, business continuity policy, IT architecture, development, centralization vs. decentralization of IT management and resources, and ownership and usage policy of data and processes. As IS organizations become leaner and more agile in their efforts to support changing business objectives, they increasingly outsource their IT resources (skills, knowledge and innovation capacity). As a result, a growing area of IT governance is sourcing governance.

Demand side: IT governance over decision processes. This category addresses the what of IT services and includes: the alignment and integration of business and IT planning, the total amount of financial and other resources to be devoted to IT in an enterprise, the allocation of IT spending and resources between business units, the criteria for assessing the value of proposed investments in IT-related projects, the relative priorities to be established for investment alternatives, the accountability for realizing the benefits of investment projects, and IT investment funding, usage and chargeback policy.


Other models are mixed bags. Clearly, in all other models governance and management are positioned separately. Governance is about ruling and regulation (the Gartner viewpoint); management is about execution, decision making and responsibilities in relation to IT activities (CobiT). Elements from both approaches (CobiT and Gartner) are more or less present, depending on the objectives of the model.

CobiT
CobiT bridges the gaps between business risks, control needs and technical issues. It presents IT activities in a manageable and logical structure, and documents good practice across this structure. CobiT's good practices are the consensus of the world's experts: they will help optimise information investments and provide a benchmark to be judged against when things do go wrong, and indeed to prevent things from going wrong in the first place. In addition it is independent of the technical IT platforms adopted in an organisation. The main theme is business orientation. It provides comprehensive guidance for management and business process owners, is firmly based in business objectives and is designed to help three distinct audiences:
Management, who need to balance risk and control investment in an often unpredictable IT environment;
Users, who need to obtain assurance on the security and controls of the IT services upon which business processes depend;
Auditors, who can use it to substantiate their opinions and/or advise management on internal controls.

The CobiT Framework explains how IT processes deliver the information that the business needs to achieve its objectives. This delivery is controlled through 34 high-level control objectives, one for each IT process, contained in the four domains: Planning and Organisation, Acquisition and Implementation, Delivery and Support, and Monitoring. On the other hand, the Framework identifies which of the seven information criteria (effectiveness, efficiency, confidentiality, integrity, availability, compliance and reliability), as well as which IT resources (people, applications, technology, facilities and data), are important for the IT processes to fully support the business objective. A summary table (Figure 39) [4] shows the domains, the IT processes and the criteria and resources.

Control objectives
More than Gartner, CobiT provides a framework for control over IT processes, so that management can map where the organisation is today, where it stands in relation to its peer group and to international standards, and where the organisation wants to be. Critical Success Factors, which define the most important management-oriented implementation guidelines to achieve control over and within its IT processes, can be identified. The direction and level of improvement is set by the Key Goal Indicators, which define measures that tell management whether an IT process has achieved its business requirements, and the Key Performance Indicators, which are lead indicators that define measures of how well the IT process is performing in enabling the goal to be reached. Developing, tracking and supporting the Goal Indicators and the Performance Indicators of the IT governance processes is usually the responsibility of a project management office, reporting to the CIO, which coordinates the processes and participants and acts as a gatekeeper to ensure that IT and business activities are coordinated with the IT governance process.

Over the years numerous measurements and measurement programs have been developed, implemented and cancelled again. Two approaches have proved valid over the years: the Goal Question Metric approach (GQM) [6] and performance measurement based on functional size. GQM is more or less an approach to develop relevant measures. Performance measurement comprises a set of measures and standard measurement methods that are already available. Therefore performance measurement is used here to show how measurement can support IT Governance.

Management view

For management, measurements and performance rates are also important. To identify which ones are relevant and what should be measured, the IT processes defined in CobiT are mapped on the information criteria and the resources, as reflected in the summary table.

Figure 39: CobiT framework summary table, ISACA, IT Governance Institute.

The application software is by far the most relevant product delivered by IT. In the model the column applications is present. In CobiT the full resource is application systems, defined as: 'Application systems are understood to be the sum of manual and programmed procedures.' For this paper the interesting part of this definition is the programmed procedures, the software. The summary table also identifies information criteria. In relation to performance the most applicable are Effectiveness and Efficiency. So the interesting CobiT processes are the ones that are marked in the column applications and with a P (or S) in the columns Effectiveness and Efficiency. Although application systems are input in the summary table, in the processes where the provision of programmed procedures is a relevant issue for control, the software can be seen as a product to deliver. When there is a product, it is easier to manage on quantitative information.

Looking at the first process, PO01 - Define a strategic IT plan, both Effectiveness and Efficiency are marked, and so is applications. The applications should meet the intentions of the business as marked in the criteria, but also support the business. In this situation the provision of software is not relevant for controlling this process: application systems are not products to deliver, they are or should be available. Processes where the provision of applications is relevant are:
PO05 Manage the IT investment;
PO10 Manage projects;
AI02 Acquire and maintain application software;
AI06 Manage changes;
DS02 Manage third-party services;
DS03 Manage performance and capacity;
DS06 Identify and allocate costs;
M01 Monitor the process.

In some other processes the applied measurement (processes) and methods support decision making as well:
PO09 Assess risks;
PO11 Manage quality;
DS01 Define and manage service levels;
M02 Assess internal control adequacy;
M04 Provide for independent audit.

Next to productivity, performance indicators like the software development cost ratio, delivery rate, defect rate, mean time to repair and capacity rate are important. Other relevant measurements for each process can be identified with the use of the GQM method.


In this section, performance (measurement) has been mentioned several times. Before going into the details of quantitative project management and its compliance with CobiT, performance measurement has to be explained first.

Performance measurement: Input - Process - Output


All performance measurement is based on activities in relation to results. The principle applied here is the Input-Process-Output model. In order to carry out an activity (the process component), resources (input) are required. Each activity results in a product (output). To be able to improve the process you need to know what goes in and what comes out. In other words, you need to measure in recognisable, objective and relevant units, from a well defined and consistent measurement process.
Figure 40: The input/process/output model. (Input: effort and material; process: activities; output: product; costs = price per unit x units.)

An example of this model (from the construction industry) is the building of a wall. The product delivered is a wall. The unit of measurement money does not describe the wall; money is needed to pay for the construction of the wall and the bricks. Hours is no solution either; it is a measure for the resources for constructing the wall. A better unit for the wall is the number of square meters. A square meter describes the size of the wall, comprising the component bricks as well as effort. The number of bricks in a square meter is easy to measure (recognisable, objective and relevant), and so is the number of hours needed for the construction of the wall. With this information the performance can be measured quite easily: the performance is the number of square meters per hour. For the next wall we can calculate the expected time (number of hours based on one square meter) and the costs (the expected hours at an hourly rate and the expected bricks at a price each). Afterwards the performance can be evaluated with the bricks used and the time spent. This type of performance rate is usually called a productivity rate.
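The wall arithmetic can be made concrete with a short sketch; all rates and quantities below are hypothetical, chosen only to show the calculation.

```python
# A worked version of the wall example with hypothetical numbers: the size
# of the product (m2) times measured rates gives expected hours and costs;
# afterwards the actuals give the realised productivity rate.

wall_m2 = 30.0
productivity = 0.8        # measured: square meters per hour
hourly_rate = 35.0        # labour cost per hour
bricks_per_m2 = 60
brick_price = 0.40

expected_hours = wall_m2 / productivity
expected_cost = expected_hours * hourly_rate + wall_m2 * bricks_per_m2 * brick_price
print(f"expected: {expected_hours:.1f} h, {expected_cost:.2f} cost units")

# evaluation afterwards, with the bricks used and the time actually spent
spent_hours, used_bricks = 41.0, 1850
print(f"realised productivity: {wall_m2 / spent_hours:.2f} m2/hour")
print(f"bricks used vs expected: {used_bricks} vs {wall_m2 * bricks_per_m2:.0f}")
```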

Consistent measurement is not possible without defining what should be measured and how the measurement will take place. More formal definitions, with scale, target, et cetera, can be used as described in Planguage [7]; for management a more simplified way to define the measures is used. In the following table the most relevant and often used performance indicators to manage and control IT are defined:

Ratio             Measurement                                                  Unit
productivity      time spent / size of the application(s)                     hours / size unit
development cost  costs / size of the application(s)                          costs / size unit
delivery          lead time needed for delivery / size of the application(s)  hours / size unit
defect            (major) defects / size of the application(s)                defects / size unit
capacity          available time per period / productivity rate               size units / period
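As a small worked example of the table, the sketch below computes the five indicators from one set of hypothetical project figures, with function points as the size unit (one possible choice).

```python
# Computing the performance indicators from the table for one delivered
# application. All project figures are hypothetical; size is expressed in
# function points (FP) as an example of a size unit.

size_fp = 400
hours_spent = 3200
costs = 280_000
lead_time_hours = 1100        # lead time needed for delivery
major_defects = 24
available_hours_per_month = 1400

productivity = hours_spent / size_fp                  # hours per FP
development_cost = costs / size_fp                    # cost per FP
delivery_rate = lead_time_hours / size_fp             # lead time per FP
defect_rate = major_defects / size_fp                 # defects per FP
capacity = available_hours_per_month / productivity   # FP per month

print(f"productivity      {productivity:.1f} hours/FP")
print(f"development cost  {development_cost:.0f} cost units/FP")
print(f"delivery rate     {delivery_rate:.1f} hours/FP")
print(f"defect rate       {defect_rate:.2f} defects/FP")
print(f"capacity          {capacity:.0f} FP/month")
```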

Looking at the ratios, size is a relevant factor in all of them. Before going into details, the way to measure and the way to implement the measurement have to be defined. Structure, definitions and standards are required to ensure the right measurement and the right interpretation. Only then can all (processes) benefit from the effort of measurement.

The Model
The measurement model described here is based on the I-P-O model. To apply the model for both analysis and prediction (estimation), the model is set up the opposite way.


Figure 41: The measurement model. (Size at the top, the productivity rate as a multiplier in the middle and gross hours (and money) at the bottom; on the right, a risk analysis of risks, opportunities, measures, influences and consequences adjusts the expected hours.)

The model starts with the output (the product) and ends with the input (hours). For software the main cost driver is the effort needed to develop and maintain the application. This is reflected in the model with size on top, the productivity rate as a multiplier in the middle and hours (and money) at the bottom. Looking at the example of the construction of the wall: when the number of square meters of the wall is known and the cost ratio (which includes productivity and materials like bricks) is derived from measurements, the expected costs can be calculated. In addition, the number of hours needed for construction is input for the schedule. The construction example works fine when the bricks, the type of the wall, the structure and the surface are always the same. When the size of the bricks is bigger, perhaps building could be faster. When the bricks are used upright instead of flat, additional things have to be considered to keep the construction stable.
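Used top-down for prediction, the model multiplies size by a norm productivity rate and then adjusts for deviations from the standard situation. The sketch below illustrates this under stated assumptions; the rates and adjustment factors are hypothetical, and expressing risk influences as multipliers is a simplification of this sketch, not a rule from the article.

```python
# Illustration only: the measurement model top-down for estimation. Size
# times the norm productivity rate gives gross hours; risks/opportunities
# adjust the result. All rates and factors are hypothetical.

size_fp = 400
norm_productivity = 8.0      # hours per FP for this platform (the norm)
hourly_rate = 90.0

gross_hours = size_fp * norm_productivity

# deviations from the standard situation found in the risk analysis,
# expressed here (a simplification) as multipliers on the expected hours
adjustments = {"inexperienced team": 1.15, "reused components": 0.90}
expected_hours = gross_hours
for factor in adjustments.values():
    expected_hours *= factor

print(f"gross estimate:    {gross_hours:.0f} hours")
print(f"adjusted estimate: {expected_hours:.0f} hours, "
      f"{expected_hours * hourly_rate:,.0f} cost units")
```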

Risk Analysis / Mitigation


It is a correct conclusion that the I-P-O model needs some enhancement. It is only usable when the circumstances are completely equal in all cases. When circumstances vary, you need to define your standards first. In the case of the construction of the wall: use a normalised brick size, flat building, the same pointing and no ornaments. Back to software development: the development process is the first step to standardise (compare flat building, no ornaments). This condition is easier to fulfil as an organisation's maturity increases, CMM level 2 and up. The development platform (the type of bricks and pointing) unfortunately is more diverse in a number of ways than desired, so in software development standards have to be set for a number of platforms. For each relevant platform, depending on the organisation, the productivity rate in the standard situation (the norm) has to be determined. This productivity rate is the basis for benchmarking (performance and improvement measurement) and estimation. Preferably, estimates and benchmarks should be based on the organisation's own performance rates, but when these rates are not available, rates provided by third parties can be used. Very useful are the project delivery rates (productivity rates) of the International Software Benchmarking Standards Group (ISBSG) [8]. This not-for-profit organisation provides rates based on a project database with over 2,000 projects from many countries, branches and development platforms. When the standard development conditions are defined, each project can be compared with these standards. The circumstances that differ from the standard are potential risks or opportunities when using the measurement model for estimating purposes. When using the measurement model for performance analysis, deviant circumstances could have influenced the hours spent. The risk analysis and the impact of the risks and opportunities (risk mitigation) on the expected hours or hours spent are reflected in the right section of the measurement model.

Measurement
Applying the measurement model for measurement purposes means using the model bottom-up. The influences of risks and opportunities have to be analysed to clean up the gross hours. With the cleaned-up hours the productivity rate is calculated. The outcome of the analysis of the specific circumstances of the evaluated project contributes to process improvement, lessons learned and future mitigation. The reliability of the results of the measurement model depends on its starting point: the size of the project (the application developed or enhanced). To assure this, one should use a well-defined method of size measurement and apply this method consistently. In case one of the business goals is benchmarking (with peer groups), this method must be a generally accepted standard.
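A sketch of this bottom-up use (Python; the influence fractions for the deviant circumstances are invented, not taken from the article):

# Measurement model used bottom-up: clean up gross hours, then derive the rate.
def productivity_rate(size_fp, gross_hours, influences):
    # influences: fraction of gross hours attributed to deviant circumstances
    net_hours = gross_hours * (1.0 - sum(influences.values()))
    return net_hours / size_fp           # hours per function point

rate = productivity_rate(size_fp=400, gross_hours=5200,
                         influences={"unstable requirements": 0.07,
                                     "inexperienced team": 0.05})
print(f"normalised productivity rate: {rate:.1f} h/fp")   # 11.4 h/fp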

ISO 14143
The importance of having a standard Functional Size Measurement Method (FSMM) is recognised. Although there was already a formal standard method available and widely used, Function Point Analysis [9, 10, 11], in some areas of software development new initiatives came up. Some made addendums to the existing methods, others tried to create new methods. One recent development is COSMIC Full Function Points [12]. To avoid misunderstanding and communication problems, there was a need for solid and useful definitions of functional size measurement. With ISO standard 14143-1 [13] a set of definitions is available. The most relevant definitions are:
- Functional Size Measurement (FSM): the process of measuring functional size.
- Functional size: a size of the software derived by quantifying the Functional User Requirements; the size in the measurement unit is derived through the assessment of Base Functional Components.
- Functional User Requirements (FUR): the representation of the practices and procedures the software must support to fulfil users' needs.
- Base Functional Component (BFC): a defined category of elementary units recognised in FURs, defined and used by an FSM method for measurement purposes.
- Functional Size Measurement Method: a specific implementation of FSM defined by a set of rules which conforms to the mandatory features of ISO/IEC 14143 part 1: a measure of the amount of information processing required to be carried out by the software (what the user wants the software to do, not how), excluding the influence of technical and quality requirements (ISO 9126).
ISO introduced a certification for FSM methods that comply with these definitions. Using a certified (standard) method contributes to increased reliability and integrity, which are important in IT Governance. At the moment only four FSM methods are certified by ISO: Function Point Analysis according to IFPUG (ISO 20926), Function Point Analysis according to NESMA (ISO 24570), Mark II Function Points (ISO 20968) and COSMIC Full Function Points (ISO 19761). Everything in this paper is valid for these standards. An organisation should investigate which method best matches its (future) process of software development.

In practice this means that the required functionality is measured more or less independently of the application that provides this functionality. Using the measurement model and the performance indicators, the impact of providing the requested functionality can be assessed in an objective way. Different applications offering the same functionality, different suppliers offering the same application, or the same functionality on a different development platform: these kinds of scenarios can be compared to support decisions on in-house, outsourced or offshore development, project revision (budget or time overrun), migrating or enhancing the application, or looking for a standard solution (COTS, package software). In CobiT terms: estimation and performance measurement based on functional size measurement support the processes PO05, PO09, PO10, AI02, AI06, DS02, DS03, DS01, DS06 and M01.

Quantitative Project Management: Applicability of Performance Measurement


As seen in the previous paragraph, a number of CobiT processes are supported by consistent estimation, risk analysis and performance measurement. When an organisation is familiar with performance indicators based on functional size, these indicators show their value in various project approaches and contracts. It is not important whether the principal or the supplier has measured the performance: it gives transparency, and decisions will be made on controllable and comparable starting points. In regard to corporate governance the performance indicators help to show the stakeholders that risk mitigation in IT costs is taken care of; in business cases they support controllable time schedules and partly the calculation of ROI, and they also make it possible to calculate the asset value of the software and the write-off on the software. Functional size measurement is also applicable in maintenance situations [14]. So it can be used for change management and, in that respect, for the CobiT processes AI02, AI06, DS03 and DS06 in maintenance situations and for overall monitoring of the process (M01). The use of a defined measurement (model, process) and a standard functional size measurement method provides a basis for the assessment of internal control (M02) and for an independent audit (M03). The last one also opens possibilities for quality control (PO11).


In the next paragraphs some specific situations are explained, showing to what extent the quantitative control information derived from FSM results contributes to management, service and contracts.

Scope Management
Scope management allows project stakeholders to treat software acquisition the same way as many other services, that is, to pay for the service based on an agreed price per delivered unit. It is a fixed price per delivered unit rather than a fixed total price method. The delivered unit is the agreed-upon size unit (new development, maintenance). The southernSCOPE [15] method, an implementation of this approach, suits the acquisition of software. It can work with both package customisation and custom development approaches to the provision of software. Based on the high-level functional and technical requirements of the application software, the scope manager performs a preliminary size measurement and from this provides early estimates of cost and duration. Based on the result the principal defines scope (priority) in relation to budget. The supplier with the best proposal (often the one with the lowest quoted price per size unit) gets the job. The scope manager controls, based on functional size measurement, the baseline functionality for the software, and thus the remaining project budget for software and the required delivery dates. Scope creep is also controlled by applying FSM in enhancement situations. In addition, methods like Evolutionary Project Management (EVO) [7] support the development of software, especially when the project starts with requirements that are not clearly defined. EVO implies in every cycle a number of items relevant to the CobiT processes, to assure optimal control of requirements and budget.
The scope manager serves both parties, principal and supplier, to get the best out of the project within budget: value for money. In Australia and Finland a (certified) scope manager is becoming a common participant in projects. CobiT processes supported: PO05, PO10, AI02, AI06, DS02, DS03, DS06 and M01.
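A minimal sketch of the scope manager's bookkeeping under this approach (Python; the unit price, budget and sizes are hypothetical and not taken from the southernSCOPE method itself):

# Price-per-delivered-unit acquisition: scope decisions become budget decisions.
unit_price = 750.0            # agreed price per delivered function point (assumed)
budget = 600_000.0            # remaining project budget for software (assumed)
baseline_fp = 650             # baseline functionality, from the preliminary count
scope_creep_fp = 45           # extra fp surfacing in enhancement counts

committed = (baseline_fp + scope_creep_fp) * unit_price
print(f"committed: {committed:,.0f}, remaining: {budget - committed:,.0f}")
# committed: 521,250, remaining: 78,750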


Service Level Agreement


In an SLA, principal and supplier agree upon services and the conditions under which to perform these services. From the principal's point of view, an SLA takes care of a number of CobiT processes. An SLA minimises risks, but it is also a kind of assurance, so the opportunities are for the supplier. The selection of and negotiations with suppliers can be improved by making performance indicators an issue in requests for information or proposals. Decisions will then not be based only on personal relations, fancy stories and faith. In order to fulfil the principal's request the supplier has to do something about functional size measurement and performance measurement. CobiT processes supported: PO05, PO10, AI02, AI06, DS03, DS06, PO09 and DS01.

Release Management
Functional size and performance indicators can also be applied to release management in an operational environment. Case 1 describes the implementation and the benefits within a governmental organisation.

Case 1: Governmental Organisation


The IT department of the public organisation has to provide three releases a year. Due to budget limitations these releases have to be delivered with the available staff. The business departments and IT management were not happy with the release process: there were always problems getting the release ready in time. Most of the releases did not contain all of the agreed functionality. This had an adverse effect upon the next release. Introducing functional size measurement could help to make the release process more manageable. Three previous releases were sized with a maintenance approach [14]. With the size delivered and the hours spent, the productivity rate was derived: 12.5 h/fp. Based on that, the number of (maintenance) fp that can be delivered in one release was calculated. The available capacity was 128 man months per release; one man month is 120 hours (21.75 days * 8 h/day * 0.7 effective). To take summer holidays into account the IT department calculated with a man month of 110 hours for the summer release and 125 hours for the other two releases (which fits nicely with the productivity rate as well). Experience in the last three releases showed that about 10% of the time was spent on maintenance of the previous release and about 15% on emergency changes. Support of the acceptance test and production test takes another 5%. This means only 70% of the time was effectively available for a release. In a regular release one man month equals 7 (m)fp; with 128 man months this is approximately 900 fp. The pilot release was limited to 800 fp. At first the business departments showed little confidence and were not pleased. The users had to agree upon a smaller than desired release and were aware of previous experiences. When the pilot release was delivered without the usual stress and contained all the agreed functionality, the departments became very positive. The four subsequent releases showed the same results. Due to downsizing of the IT department the releases are now smaller, but because of improved productivity (11.2 h/fp) the releases contain sufficient functionality and match the users' expectations. All in all the users are more satisfied than before and the release management process is under control. CobiT processes supported: PO10, AI02, AI06, DS03, DS06, M01 and PO09.
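The capacity calculation of Case 1 can be replayed in a few lines (Python; all figures come from the case description itself):

# Case 1: how many (maintenance) fp fit in one regular release?
man_months = 128              # available capacity per release
hours_per_man_month = 125     # regular (non-summer) release
effective_fraction = 0.70     # 10% previous release + 15% emergencies + 5% tests
rate_h_per_fp = 12.5          # derived from three sized releases

fp_per_man_month = hours_per_man_month * effective_fraction / rate_h_per_fp
print(f"{fp_per_man_month:.0f} (m)fp per man month")              # 7 (m)fp
print(f"release size: ~{man_months * fp_per_man_month:.0f} fp")   # ~896, i.e. ~900 fp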

Outsourcing / Offshore Control


The same applies to outsourcing. In the specific example described in Case 2, an SLA was also part of the deal. Monitoring the process (M01) and controlling performance and capacity (DS03) with regard to software development are the responsibility of the supplier.

Case 2: Utility Organisation


In this case a utility company has a system operational and a software services company has carried out the maintenance for over 10 years. Activities include software repair (bug fixing), software enhancement, help desk and knowledge maintenance.

The management of the utility company wants to get an insight into the performance of the software services company to get a grip on costs. An IT service supplier was asked to assess the current contract between both parties and to draw up a blueprint for a new contract in which pricing would be based upon delivered performance. The first step was to determine the size of the application. Because the system had been operational for almost 15 years and was not well documented, the sizing was done based on the user manual and the operational application itself. The size agreed upon was 6,900 fp. Time spent on sizing was 104 hours, including preparation and reporting.

The performance analysis was based upon a comparison between some previous projects and the corresponding invoices; this analysis took about 40 hours. Drawing up the blueprint of the new contract took another 16 hours. In total some 160 hours were spent to achieve a contract that was acceptable to both parties. The basis of the delivered performance was a productivity rate of 8.0 hours/fp. Because of the architecture of the system (modelling and reusable routines) the productivity rate was fixed at 6.5 hours/fp in the contract. The size of releases is the size according to a method to measure enhancement projects. The invoice of the software services company should state the delivered size in mfp. If required, the size of an enhancement project could be audited. For the other activities, performance indicators on a yearly basis were agreed upon. For knowledge maintenance, corrective maintenance and helpdesk the following performance indicators will be used: respectively 0.15 h/fp, 0.10 h/fp and 0.10 h/fp. If the utility company were to outsource these activities to the same supplier, one could expect synergetic advantages and work with an all-in indicator of 0.3 h/fp. The performance indicators will be reviewed after a one-year period and updated if necessary. After the first year the all-in indicator was updated to 0.35 h/fp. Average maintenance costs decreased by almost 10% and customer satisfaction increased. The latter was caused by the fact that the estimated delivery time per enhancement project was more accurate and realistic. CobiT processes supported: PO05, PO10, AI02, AI06, DS02, DS06, PO09 and DS01.
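The agreed indicators of Case 2 translate directly into a yearly effort budget; a small sketch (Python; figures from the case):

# Case 2: yearly support effort for the 6,900 fp application.
size_fp = 6900
indicators_h_per_fp = {"knowledge maintenance": 0.15,
                       "corrective maintenance": 0.10,
                       "helpdesk": 0.10}

separate = size_fp * sum(indicators_h_per_fp.values())   # 0.35 h/fp per year
all_in = size_fp * 0.30                                  # synergy when outsourced together
print(f"separate: {separate:.0f} h/year, all-in: {all_in:.0f} h/year")
# separate: 2415 h/year, all-in: 2070 h/year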

Conclusions
Quantitative project management is not equal to IT Governance, but a great number of processes in the CobiT framework are supported. Functional size and performance indicators help to get control over the provision of software. These measurements provide objective, controllable information to direct and control IT and its processes, adding value while balancing risk versus return. CobiT supports the corporate governance initiatives as well as the IT governance initiatives, and quantitative project management fits right in. Quantification of facts can only result in a working system when all parties know and accept these methods. Let's accept CobiT and do quantitative project management to support it.

About the author


Ton Dekkers (ton.dekkers@sogeti.nl) has been working as a practitioner, manager and consultant in the area of software metrics and software quality for a great number of years. Within this area he specialises in estimation, performance measurement, risk analysis, priority management and QMap (Quality Management approach). He is a regular speaker at both national and international conferences and a trainer in software estimation, risk management and QMap: Quality Tailor-Made (QTM, QMap in practice). Ton Dekkers is senior project consultant in the division Managed Delivery of Sogeti Nederland B.V. He is responsible for the Expertise Centre Metrics and for R&D in the area of estimating and performance measurement.

References
1. Bloem, Jaap, Doorn, Menno van, Realisten aan het roer (Realists At The Wheel), ViNT, Sogeti Nederland B.V., 2004.
2. Knowledge group IT Governance, NOREA, IT-Governance: een verkenning, June 2004.
3. Turnbull Report, Guidance for Directors on the Combined Code, UK, 1999.
4. ISACA/IT Governance Institute, CobiT third edition, Control Objectives, July 2000.
5. Gerrard, M., Creating An Effective IT Governance Process, Research note COM-21-2931, 2004.
6. Solingen, Rini van, Berghout, E., The Goal/Question/Metric Method, McGraw-Hill, Columbus (USA), 1999.
7. Gilb, Tom, Competitive Engineering, manuscript July 15, 2003, ch. 1 (Planguage) - ch. 10 (EVO), http://www.gilb.com.
8. ISBSG, The ISBSG Estimation, Benchmarking & Research Suite (release 8), International Software Benchmarking Standards Group, 2003, http://www.isbsg.org.au.
9. IFPUG, Function Point Counting Practices Manual, version 4.2, International Function Point Users Group, 2004, http://www.ifpug.org.
10. NESMA, Definitions and counting guidelines for the application of function point analysis: a practical manual, version 2.2, Netherlands Software Measurement user Association, 2004 (in Dutch), http://www.nesma.org.
11. UKSMA, MK II Function Point Analysis, Counting Practices Manual, version 1.3.1, United Kingdom Software Metrics Association, 1998, http://www.uksma.co.uk.
12. COSMIC, COSMIC-FFP Measurement Manual 2.2, January 2003, http://www.lrgl.uqam.ca/cosmic-fpp.
13. ISO, Information Technology - Software Measurement - Functional Size Measurement: ISO/IEC 14143-1, International Organization for Standardization, 1998.
14. Dekkers, Ton, (Extended) Functional size measurement methods are also applicable in enhancement projects, Software Measurement European Forum (SMEF 2004), January 28-30, Rome (Italy), 2004.
15. Wright, Terry, southernSCOPE, Victorian Government, Australia, http://www.egov.vic.gov.au.


MEASURE!

KNOWLEDGE! ACTION!

The Netherlands Software Metrics Users Association (NESMA)

10

FPA, MORE USEFUL NOW THAN EVER BEFORE!


JOLIJN ONVLEE AND ADRI TIMP

Function point analysis, does that still exist? We hear this question frequently. Noticeably, almost everybody in the industry has heard of FPA. Many associate FPA with the hype of the eighties, which was linked closely with the development tools in use then. But now the world is different: object oriented design, component based development, rapid application development and web applications, followed by the conclusion that, consequently, FPA is not usable. We will not deny that the world has changed, but FPA has withstood the test of time. Moreover, the position of software metrics in general and FPA specifically is stronger in 2004 than ever before.

ISO-recognition
Starting in 1986, FPA has spread, slowly but surely, throughout the world. The method was originally developed by IBM in the United States in 1979. In 1986 the user group IFPUG (International Function Point Users Group) was founded in the USA, followed by the Netherlands (NEFPUG, now NESMA), Great Britain (EFPUG, now UKSMA) and Australia (ASMA) in 1989. These countries have worked together intensively to attract attention for FPA and software metrics. FPA took off in many countries. It started slowly, but through the development of the internet the dissemination of FPA really got going. The first important international milestone was reached with the cooperation of several countries in 1998: ISO published the generic requirements that a Functional Size Measurement method needed to satisfy. After the necessary adaptations and modernizations, the FPA method was recognized by ISO as the international standard for the scope definition of user requirements in 2003. This process of obtaining the ISO acknowledgement has had a surprising side effect: the management attention for FPA is suddenly showing worldwide growth. FPA is no longer seen as a toy for developers, but as a proper management tool. Increasingly, international conferences about SPI (Software Process Improvement), Quality Management, Project Control and CMM present FPA as a method to provide more predictable pronouncements concerning system development processes, and to enable agreements on measurable objectives in client/supplier relations (software houses, outsourcing, contract management).

International rise
The process of acquiring the ISO acknowledgement has also boosted the international spread of FPA. There are presently more than twenty countries with independent user platforms like NESMA. The important new players are countries like Brazil, India, Japan and South Korea. People are convinced that a software metric such as FPA is an indispensable instrument to increase the credibility of ICT in the world. Concrete indicators and parameters instead of woolly ICT jargon: that is the maxim. Characteristically, FPA offers a unit of measurement (the function point) for systems and projects that is independent of the technical environment. It is a unit of measurement that users, project leaders and developers can all understand and handle. FPA does this at an abstract level and in the language of the user, by referring to the user functionality that the system offers in terms such as: new account, process payment, create customer, etc. Notice that not a single word of jargon is used here. Each function supplies a number of function points. The sum of the function points of the system to be developed is the system's functional size.

Project tenders increasingly ask for function point counts or an indication thereof. Each system has a specific number of function points. A supplier regularly has a different interpretation of the user requirements for a system. If a system really has a size of 6,000 FP, but a cheap supplier bases its proposal on 3,000 FP, then sooner or later the supplier will get a nasty surprise. This risk can be avoided if the customer includes a request in the RFP (request for proposal) that any offer should be accompanied by a function point analysis. Only then will it be clear what the proposal price is based on and how much insight the developer has into the system.


The size (the number of function points) of the system to be created is thus an important indicator in the tendering phase. A second key factor is the number of hours per function point that a supplier thinks he needs. The third is the hourly tariff. In this way the three separate indicators (size, productivity and cost) are available in the proposal, so that contract negotiations become more concrete. A transparent cost structure gives both the customer and the supplier the opportunity to be open and above board in reaching a mutually acceptable deal. All of this makes FPA an appropriate tool for monitoring the progress of projects and for determining whether customers are getting value for money.
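A sketch of how the three indicators combine into a proposal price (Python; the numbers are hypothetical):

# Transparent proposal price from the three separate indicators.
size_fp = 6000            # size of the system to be developed
hours_per_fp = 9.0        # productivity the supplier thinks he needs
hourly_tariff = 95.0      # hourly tariff

proposal_price = size_fp * hours_per_fp * hourly_tariff
print(f"proposal price: {proposal_price:,.0f}")   # proposal price: 5,130,000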

Trends anno 2004


The upcoming FPA countries Brazil and South Korea are a showcase. In both countries they have succeeded in bringing about their governments' adoption of function point calculation as a requirement for tenders, precisely for the reasons of transparency described above. There is a common misconception concerning FPA: that it can only be used if you fully specify the application that is to be developed. The parties involved, especially in cases of contract management, want to have an indication of the function count of the application as soon as possible. Much international research is directed at analysing the reliability of methods which, based on high-level user requirements, approximate the system size in function points as closely as possible. NESMA has carried out ground-breaking work in this area with its estimated function point count and indicative function point count methods. The indicative method is even known in international literature as 'the Dutch method', where the word Dutch is not being used in a pejorative sense for a change. These methods are successfully being used for early function point counting throughout the world. Recently, a leading American company even revealed at an international congress that they used the indicative function point count when they had to estimate the size of the installed base within a reasonable margin of error at a customer site. The existence of these methods for calculating function points early on in a project has contributed to the increased use of FPA in contract management with outsourcing and in fixed-price contracts based on function points.
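As an illustration of such an early count, here is a minimal sketch of the indicative ('Dutch') method (Python), assuming the commonly published NESMA indicative weights of 35 fp per internal logical file and 15 fp per external interface file:

# Indicative function point count, derived from the data model alone.
# The weights 35 and 15 are the commonly published NESMA indicative weights.
def indicative_fp(internal_logical_files, external_interface_files):
    return 35 * internal_logical_files + 15 * external_interface_files

print(indicative_fp(internal_logical_files=12, external_interface_files=4))
# 480 fp: an early indication of system size, long before full specification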


The outsourcing of function point counting is a new phenomenon as well. Organisations that do not have enough expertise in system sizing in function points have independent specialists do this for a fixed price per function point and a fixed duration. COSMIC-FFP (COSMIC Full Function Points) is a new method for functional size measurement. This method was developed during the last six years and expresses the size of the system in CSU (cosmic size units). The method was developed with the typical characteristics of real-time systems in mind. The COSMIC-FFP method is more laborious than FPA, certainly for administrative applications. This method was also certified by ISO in 2003. NESMA sees possible applications of COSMIC-FFP in sub-sections of systems development. A work group has been formed by NESMA that oversees COSMIC-FFP in the Netherlands. The rise of SPI processes and quality models such as CMM and CMMI sees to it that companies want their process to be repeatable but also under control. They set themselves specific productivity goals (such as a target in hours/fp by the year 2009, or 'we will outsource these systems at 3 FTE for 10,000 FP'). The use and application of software metrics is necessary to achieve this.

Modern development environments


FPA is not difficult to use with modern development methods such as object orientation, CBD, RAD and DSDM. The use of FPA is independent of the development environment and methods. FPA measures the size of functional user requirements. A development method is a roadmap leading to a goal; the goal is a working system that satisfies the requirements of the user. The methods that have been certified by ISO do not measure the path to the goal (the development method) but the goal itself. The models created by a method (such as ERDs, use cases, object diagrams) are only representations of the user requirements; they do not affect the application of measurement methods such as FPA and COSMIC-FFP.

152

THE

NESMA

ANNYVERSARY

FPA mandatory for government contracts


As already mentioned, function points have been mandatory for government contracts in South Korea and Brazil since 2003. These governments want a better grip on software development contracts:
- FPA offers a clear shopping list of the functionality to be developed;
- FPA creates a clear measurement of the functionality (function points);
- FPA makes contracts transparent (size and the proposed norm for hours/fp).
In doing this, these governments want to achieve more control over the development processes and the deliverables. They do not want to have the feeling that they are at the mercy of the software houses. The software producers cannot have things all their own way any more. The success with the governments of South Korea and Brazil has motivated other countries to do the same. The DoD (Department of Defense) in the United States is currently being lobbied, along with other departments. History teaches that the DoD is very sensitive to the need for clear agreements with its suppliers on all fronts. The chance of success is therefore anything but remote. You can see a worldwide trend in which national institutes concerned with software metrics and FPA are becoming less timorous and are seeking the limelight. In the Netherlands this trend is clearly visible with NESMA. NESMA has recently published a modernised version of the manual with FPA counting guidelines that are now 100% in line with the ISO standard. NESMA's goal is to have the use of function points made a requirement for government contracts in the Netherlands as well. Also, it will cooperate with other European countries to get a foot in the door in Brussels.
The importance of an unambiguous ISO standard can be illustrated by the following example. A Dutch software house closed a contract with a ministry for the development of a salary system. It was stated in the contract that payment would take place based on the delivered function points. A price per function point was agreed to this end. The software house had used function points for years and, over time, had developed its own set of guidelines, such as counting function points based on a data model in the third normal form. The software house took as its starting point its own guidelines, because its project productivity rates are based on them. The general terms and conditions of the contract stated that the standards of the software house would be applied. However, the ministry assumed that the counting would take place according to the NESMA guidelines. The issue surfaced at the time of delivery, the moment the number of function points was fixed and from this the price determined. After consulting an FPA expert it was concluded that the software house was formally in the right. Eventually it was all settled amicably. The example does show, however, the importance of a generally accepted standard that contracts can refer to.

One measure worldwide


Achieving ISO recognition for FPA in 2003 was an important result of worldwide cooperation. Since 1990 the premise of the national user platforms has been that the function point, just like the meter as a unit of measurement, can stand surety for the same amount of functionality anywhere in the world. Initially there were a variety of differences between the FPA versions. For example, the USA used FPA to determine productivity on completion of projects. The counting guidelines of the IFPUG were consequently fitted to the counting of physical items such as programs and files. Since its formation NESMA has tailored the counting guidelines to the functionality and user requirements, and thus to determining the size of the system in advance. Due to international developments and the trend towards bringing projects under control, the need arose in the USA to apply functionality-based FPA. Since 1990 there has been an intensive and constructive cooperation between NESMA and IFPUG (USA). Over time IFPUG has adopted the guidelines recommended by NESMA where differences existed. The last major points of difference were smoothed out in 2003. Therefore the new release of the IFPUG manual (01-02-2004) no longer contains any substantial differences with the NESMA recommendations.


Background: short historical sketch of the measurement phenomenon in ICT


Measurement within ICT is a process that has made only slow progress over the years. Everybody is convinced that the measurement of software development processes and information systems is important. Many a company presents itself in a better light than is the reality. Many companies wrestle with the questions of what should be measured and how. Organizations have to overcome a number of obstacles to develop an effective measurement programme. First, the endless discussion about the measurement itself: which metrics should be used and how do you measure them. Secondly, the volume of measurement data can become an impediment: there may be insufficient data available to achieve a reliable insight. Finally, the organization to be measured is an obstacle: the commitment to the measurement process may be promised but not fulfilled.
The first obstacle, as already mentioned, concerns the measurement itself. Three important metric categories can be distinguished in ICT:
- The size of the software or the application;
- The quality level of the software, targeted at the final product evaluation;
- The process quality, which is the measure of maturity of the process used to develop the software, such as productivity.
Let us zoom in on the first category. To determine the size, Source Lines of Code (SLOC) and Function Point Analysis (FPA) are the most utilized techniques worldwide. As FPA is within the scope of this article, here follows a short history. Function Point Analysis was developed by IBM in the seventies as a method for sizing, estimating and monitoring software development. FPA was presented and discussed in public for the first time by its originator Allan Albrecht at an IBM/Share/Guide symposium in 1979. The first international publication, written by Albrecht, appeared in 1981. Albrecht and IBM introduced a major amendment to the original theory in 1984. The International Function Point Users Group (IFPUG) was founded in 1986 and was followed in 1989 by the NEFPUG (which became the NESMA) in the Netherlands. The supervision of the method is now in the hands of the user groups in the individual countries. FPA is based on the intended functionality of an information system and is therefore a functional sizing method. The idea behind FPA is the measurement of the functionality, in other words the quantity of data processing the user will have at his disposal in the application. Furthermore, the system is considered a black box: the technology behind the system is ignored. This has a great advantage: applications can be compared with each other regardless of the (technical) environment in which they are developed. Precisely this independence from the technology used, and the fact that the function points can be determined early on in the system development process, has meant that FPA is still gaining ground on SLOC and will over time replace it as a unit of measurement.

Benchmarking
Benchmarking is a technique that makes use of external comparisons to better evaluate current performance and identify possible actions for the future. Benchmarking is a growing phenomenon. An ICT company or department can compare itself with similar organizations. Commercial advisory firms such as Gartner have developed services in this area. Comparisons are done using FPA along with other techniques. Tools are also available in the marketplace in which project data can be gathered, so that people can reflect on their own projects and organizations. There is also a non-profit initiative (ISBSG) by software metrics organizations worldwide. The goal of this initiative is to furnish standard, verified, up-to-date and representative software benchmark data for current technologies. ISBSG stands for International Software Benchmarking Standards Group, a users group established in Australia. Leading national umbrella organizations in the area of software metrics and benchmarking work together within the ISBSG to gather and analyze productivity data: Australia (ASMA), Germany (DASMA), Finland (FiSMA), Great Britain (UKSMA), Italy (GUFPI), Japan (JFPUG), the Netherlands (NESMA), USA (IFPUG), Spain (AEMES), India, South Korea and Switzerland (SWISMA). Analysis means relating the size of a project (defined in function points) to the required effort and discovering trends in specific environments or types of project. ISBSG strives towards this goal through:
- The exchange of knowledge about software benchmarking;
- The development of standards and tools for software benchmarking;
- The stimulation and support of consistent application of software benchmarking.

156

THE

NESMA

ANNYVERSARY

A database of project productivity rates has been compiled, containing over 2,000 projects. Companies can send in their project data by filling in the Project Questionnaire. You can also use this questionnaire to find out what kind of data must be gathered to create a good metrics database. When a company supplies project data, it receives a free analysis of where the project stands compared with the database. The data provided are treated in the strictest confidence and anonymity by the ISBSG. There is one central database where project data are gathered. The data are made available anonymously to universities for research purposes. A yearly report is published containing a comparison of the quantitative and qualitative characteristics of software projects and products. A strong increase in the number of projects supplied has been seen in the last six months.

Certification
People can achieve FPA certification conforming to IFPUG or NESMA guidelines. IFPUG has a certification programme leading to the CFPS (Certified Function Point Specialist) title. Currently there are 477 CFPS-certified specialists worldwide. In the Netherlands, certification as a CFPA (Certified Function Point Analyst) has been possible since 1998. The responsibility for certification is wholly owned by the examination institute EXIN, thus safeguarding independence and professionalism. CFPA was included in I-tracks in 2004. I-tracks consists of a range of complementary courses, practice oriented exams and study material. Those certified are capable of independently creating an FPA. Moreover, they know the concepts that play a role with FPA, the areas where FPA can be applied and the relevance of FPA to these areas. They can also deploy FPA in quality control, for example in the assessment of a functional design. The names of the certified specialists are published on the EXIN website. Currently there are a total of 80 names included in this list. A certificate is valid for three years, after which the exam has to be taken again. CFPS and CFPA exams are of an equivalent level as far as knowledge is concerned.

Establishing a measurement programme is not a sinecure

The introduction of a measurement programme within organizations is frequently underestimated. On the one hand you need time to build up your historical data; on the other hand the impact on existing working practices is greater than expected. You are often confronted with the idea that by running a function point training course the measurement programme is as good as implemented. Measurement has then in fact not been solidly established in the working practices, and apart from a few exceptions the initiative is doomed to a slow death. The collection of even a few quantitative data points is difficult to get going. Just as it is common practice to produce a project plan for each project, an evaluation must take place at the completion of each project. It does not necessarily have to take much time; these are data that every good project leader should have available at the end of a project. Here you can consider a minimal set of the following metrics: the size in function points, the actual time spent on the activities, the number of defects found during the acceptance test, the duration of the phases and the development environment. Keep it simple, especially in the beginning. The commitment from management is essential with this kind of implementation. This should be seen in concrete action, such as freeing up budget, management team attention for the measurement programme, and making the reporting of the function point count for the system and the changes to the system mandatory at the start of a project (for example with the budget submission). In the beginning people will have to chase up the evaluation data until it becomes routine. How long it takes before it becomes routine depends on the level of maturity of the organization. It is important to have staying power! The measurement of projects and products is not a one-time activity but a continuous process without end. Once the process is established, people will experience that with little effort much valuable management information becomes available.
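The minimal metric set mentioned above fits in a very small record; a sketch (Python; the field names are ours, not a NESMA standard):

from dataclasses import dataclass

# Minimal project evaluation record, filled in at project completion.
@dataclass
class ProjectEvaluation:
    size_fp: int                 # size in function points
    hours_spent: float           # actual time spent on the activities
    acceptance_defects: int      # defects found during the acceptance test
    phase_duration_weeks: dict   # duration per phase
    environment: str             # development environment

    @property
    def hours_per_fp(self) -> float:
        return self.hours_spent / self.size_fp

project = ProjectEvaluation(430, 4950.0, 23,
                            {"design": 6, "build": 14, "test": 5}, "Java/web")
print(f"{project.hours_per_fp:.1f} h/fp")   # 11.5 h/fp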

CMM and metrics


It is obvious that measurement plays an important role in the CMM/CMMI model. CMM recognises five levels of maturity: Initial, Repeatable, Defined, Managed and finally Optimizing. Within the maturity levels Key Process Areas (KPAs) are differentiated. KPAs indicate the topics that have to be structured within an organization to achieve a particular maturity level. Each KPA defines a cluster of related activities that, when they are executed together, represent a maturity level.


At level two (Repeatable) the metrics are mostly related to the output of the projects, the effort and the planning. These data are named, collected and retained. The data across multiple projects are exchanged via informal channels. The metrics at this level are especially linked to the KPA Software Project Tracking and Oversight. At level three (Defined) each project has its own development plan based on the standard working practices within the organization. Measurements are collected and determined for each project but are stored in a database at the organizational level. The data collection can be different for each project, but the measurement data in the organizational database are well defined. At level four (Managed) there is a standard set of metrics defined by the organization, derived from the standard development process of the organization. All projects gather the standard set of data, which are stored in the organizational level database. The data are used by the projects to gain better control and to stabilize the process in a quantitative manner. At the organizational level the data are used to define a process capability baseline. You find references to metrics at this level in the following KPAs: Quantitative Process Management and Software Quality Management. At level five (Optimizing) measurement data are used to indicate the areas where technological and process improvements can be made. They are also used to plan and evaluate these improvements. Within each KPA common features are defined. One of these common features concerns the standard Measurement & Analysis. This gives examples of what should be measured to be able to assess the level of maturity. When you look at the measurement programmes of organizations, you can also discern maturity levels within them. The table below shows the first three levels:

Formalizing the development process
- Initial: process unpredictable; project dependent on professionals; little or no process focus.
- Repeatable: projects repeat defined tasks; process is dependent on experienced people.
- Defined: processes are defined and understood.

Formalizing the measurement process
- Initial: few or none.
- Repeatable: formal procedures implemented; metrics standard developed; used in projects by experienced people.
- Defined: defined metrics standard; standard applied.

Scope of the metrics
- Initial: used randomly by projects or not at all.
- Repeatable: project estimation method exists; metrics have a project focus; data are available on a project basis.
- Defined: data gathered and retained; specific tools for gathering data; metrics have a product focus.

Support
- Initial: no historical data; no database.
- Repeatable: product-level database.
- Defined: standardized database across projects.

Evaluation of the metrics
- Initial: few or no metrics.
- Repeatable: project metrics and the management thereof operational.
- Defined: product-level metrics and the management thereof operational.

Metrics support for the sake of management control
- Initial: management not supported with metrics.
- Repeatable: some metrics support the management; basic monitoring.
- Defined: product-level metrics and control.
It has to be emphasized that software metrics maturity and general process maturity are not the same. Metrics maturity is only one dimension of process maturity. There is a clear connection between the two: it cannot be that an organization at level two would have a measurement level of 1. The opposite can be true: when an organization is at level 2, the measurement level can be higher. In other words, a specific level of measurement is required to achieve a particular process maturity level.

Informative websites:
www.exin.nl
www.nesma.nl
www.isbsg.org
www.functiepuntanalyse.nl
www.cosmicon.com

Summary
In this article we have touched upon a number of topics around measurement and FPA: the history up to the latest international developments and the current state of affairs, such as the ISO recognition; the setting up of measurement programmes through to benchmarking; and measurement within CMM and certification programmes. All in all, we can draw the conclusion that within this wide range of topics the function point has become the worldwide unit of measurement of functionality, enabling an improved comparability of productivity data.

About the authors

Jolijn Onvlee (ooa@onvlee.com) works as an independent consultant at Onvlee Opleidingen & Advies, and within the FPA Expertise Centrum. She is a board member of the NESMA (Netherlands Software Metrics Association) and a member of the NESMA FPA counting practices committee and of the COSMIC-FFP committee. She is a member of the EXIN examination committee.

Adri Timp (a.timp@interpay.nl) works for Interpay Nederland bv. He is chairman of the FPA counting practices committee of the NESMA and vice-chairperson of the FPA counting practices committee of the IFPUG (International Function Point Users Group). He is chairman of the EXIN examination committee.


MEASURE!

KNOWLEDGE! ACTION!

The Netherlands Software Metrics Users Association (NESMA)

11

BENCHMARKING OF APPLICATION
MAINTENANCE AND SUPPORT
THEO KERSTEN

For years now IT projects have been benchmarked. Productivity studies are conducted especially in application development projects. There is, however, no public benchmark available for application maintenance and support. This article focuses on the benchmarking of applications, in particular the productivity of application maintenance and support. It aims to highlight the issues surrounding the benchmarking of maintenance and support, and the prerequisites for benchmarking, first in general and later focused on the so-called ISBSG method. This article tries to offer support to beginners and those already working with a method. It will also offer a general survey of what is available and the latest developments. An additional advantage of benchmarking application maintenance and support lies in the quantification of the concept of 'quality', resulting from an extensive benchmark in which defects and other issues are registered as well.

The structure of this article


The second section of this article will deal with general issues in benchmarking and the benchmarking of software in particular, and it will indicate the application in current projects. Section 3 will cover how to benchmark application maintenance and support and the specific issues surrounding this topic. Section 4 will explain how, on this basis, the ISBSG (International Software Benchmarking Standards Group) has developed a benchmark. Finally, section 5 will cover how to make progress in defining application maintenance and support in the Netherlands, and specifically within NESMA. The appendix contains references to the most relevant literature.


ISBSG
The ISBSG is a non-profit organisation of co-operating software metrics associations and function point groups. This group has developed a definition of the requirements for benchmarking IT projects, which concerns especially the building and enhancement of applications. On this basis a questionnaire was developed to register all required items. The data can then be forwarded to the ISBSG. After validation the ISBSG will include these data in a database and make it possible to compare similar projects. For this purpose a project profile report was created. After sending in a questionnaire the submitter receives a project profile report in return, in which his project is compared with similar projects. The ISBSG also supplies information based on the database content ([4] to [7]).

Size and effort


This article focuses on benchmarking the productivity of building and maintaining application software. In measuring productivity, at least two measures have to be related to each other: a functional size and a measure of effort. Many studies have been made of building applications, leading to the conclusion that 'hours per function point' is the best measure of productivity. This measure is widely accepted by now. There have been considerably fewer studies covering application management, so the number of tools for metrics support is limited. Here, productivity is based on the number of managed function points per person, for example per FTE (Full Time Equivalent).

Function points are the best measure for application projects and management.

Project: hours per function point


The size of an application is indicated by, for example, Lines of Code or function points. The concept of Lines of Code, or Source Lines of Code, can be specified further as Effective Lines of Code (ELOCS), in order to count only real contributions to productivity while excluding comment lines etc. Furthermore, LOCS are usually based on counting only code created by hand. Generated code differs from code created manually: the size of hand-written code is a measure of productivity, whereas generated code is not. In addition, function points have come to be used as a measure of functional size. This is now almost universally accepted as a better size measure for application software. The effort can be calculated in some measure of time: hours, months or years. This often leads to a discussion on the definition of a man-month. Hours give rise to fewer discussions, but it will of course remain tricky to determine which hours to include: direct hours or indirect hours. Benchmark studies spend a lot of time on properly defining concepts so as to provide an unambiguous picture. In practice this means that considerable knowledge is presupposed in order to generate benchmark data. Everybody wants to keep things simple, but a simple productivity measure like hours per fp can lead to major misunderstandings for the above-mentioned reasons. These are the pitfalls of benchmarking; this article does not deal with them. In this document we want to focus on a method to measure application maintenance and support.

Maintenance: number of function points managed by 1 FTE


Within projects a function point is the best measure of size; this is true for application maintenance and support as well. However, we use man-years for effort measurement instead of man-hours. This is a measure which, when expressed in hours, can result in enormous differences, though it is a clear-cut concept. One man-year is also often referred to as 1 FTE (Full Time Equivalent). Based on these two measurements, the maintenance productivity measurement is expressed in the number of function points managed by 1 FTE.
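A minimal sketch of this maintenance productivity measure (Python; the portfolio size and staffing are invented):

# Maintenance productivity: function points managed per FTE.
portfolio_fp = 10_000       # total functional size of the maintained applications
maintenance_fte = 3.0       # assigned staff, in full time equivalents

print(f"{portfolio_fp / maintenance_fte:.0f} fp managed per FTE")   # 3333 fp per FTE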

Project Benchmarking
Productivity has been studied longest in relation to projects. An important publication is Software Engineering Economics by Barry W. Boehm (1981). It featured the first extremely systematic study of productivity. The COCOMO model that developed as a result is in its second revision. Amongst further studies another relevant one is by Putnam, as described in Controlling Software Development (Larry H. Putnam and Wayne Myers, 1996). He found that the productivity curve was a Rayleigh curve. Besides the two measurements size (ELOCS in Putnam's first study) and effort (hours), Putnam considered duration the most relevant. His research focussed on resource allocation in a project over time. He discovered this to be similar to the course of hardware projects, where the Rayleigh curve had been in use for some time.
Benchmarks are also performed by research bureaus contracted for this purpose, such as the Gartner ADS (Application Development and Support) benchmark. The users will still be left with the question: 'What exactly was compared?' For this purpose the collective software metrics associations of several countries (amongst others NESMA, the UKSMA (United Kingdom Software Metrics Users Association) and the ASMA (Australian Software Metrics Users Association)) and the function point user groups (for instance the IFPUG (International Function Point Users Group) and the JFPUG (Japanese Function Point Users Group)) have decided to co-operate under the name ISBSG (popularly pronounced as 'icebags'). Information about the ISBSG and this mutual co-operation can be found at the ISBSG's website (www.isbsg.org). The ISBSG has developed a database of 2,500 projects. For a small fee these data can be acquired in the form of an Excel spreadsheet, so one can use them to do a benchmark oneself. A reality checker for projects was also developed on this basis. Information on this reality checker can be found on the website mentioned earlier, as well as on the NESMA website. Everybody can send in new projects, so the database is still expanding. After checking the reliability of the information, a project is included together with a reliability indicator.
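A do-it-yourself benchmark against the acquired spreadsheet might look as follows (Python with pandas; the file name and column names are assumptions, the real column layout must be taken from the ISBSG documentation):

import pandas as pd

# Compare one project against similar ISBSG projects.
# File and column names are assumed; check the ISBSG data documentation.
db = pd.read_excel("isbsg_projects.xlsx")
similar = db[(db["Development Platform"] == "MF") & (db["Language Type"] == "3GL")]

own_rate = 5200 / 400                     # own project: hours per function point
peer_rates = similar["Effort Hours"] / similar["Size FP"]
print(f"own: {own_rate:.1f} h/fp, peer median: {peer_rates.median():.1f} h/fp")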

Productivity attributes in projects and management



As indicated in the last section, the two measures size and effort are not alone in determining productivity. The environment also has a strong influence on the number of hours spent per function point. The following issues come to mind:
- The important factor duration, mentioned earlier;
- The hardware environment used (PC, mini, mainframe or a combination);
- The programming environment used (3GL, 4GL, generator);
- The importance of the application to the company;
- The number of users;
- The type of company;
- The type of application;
- The number of interfaces;
- The characteristics of the project phases.
Attention is paid to these points in the brochure Profonder Productivity Attributes (NESMA, 1995). Any tool gives rise to the question as to how far environmental factors have been taken into account and to what extent we compare projects under the same conditions. The ISBSG makes inquiries into all these issues and examines the database as far as the productivity attributes concerned have been reported. The ISBSG analysis for projects has resulted in several reports:
- The Benchmark Release 6 (ISBSG, 2000);
- Practical Project Estimation (ISBSG, 2001);
- The Software Metrics Compendium (ISBSG, 2002);
- The Benchmark Release 8 (ISBSG, 2004).

Maintenance directed by size or by number of defects


In the preceding text we conveniently assumed that management productivity can be expressed in a measure of size, namely function points, and an effort, namely man-years. A totally different approach arises when one considers that application software management is also to a great extent influenced by the number of defects in the software. Some will thus take the number of detected defects as the starting point for the management effort. These are actually the two most common choices. There is of course a relation between the defect count and the size. In projects as well as in management, quality is developing as a separate subject in benchmarking, next to productivity. Measuring the size (in function points), combined with the number of defects, supplies a good indication for the quantification of quality. Benchmark data from many commonly used tools, as well as the ISBSG benchmark, will thus make a useful contribution to the quantification of quality.

In projects we see this in the number of defects found in the first month of production. Maintenance deals with the number of defects per thousand function points per year. Usually an error categorisation is applied, such as critical, important, normal and minor. The combination of size and number of defects gives an indication of the quality of the application software.
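A sketch of this quality quantification (Python; the defect counts are invented):

# Quality indication: defects per thousand function points per year.
size_fp = 6900
defects_per_year = {"critical": 2, "important": 9, "normal": 31, "minor": 55}

density = sum(defects_per_year.values()) / (size_fp / 1000.0)
print(f"{density:.1f} defects per 1,000 fp per year")   # 14.1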

Maintenance and support benchmarking


We have seen the importance of keeping environmental factors/productivity attributes the same in projects. It is important in projects to establish to which project phases the number of hours per function point applies. Within a project one can define the following phases:
- Feasibility study;
- Specification phase;
- Technical design phase;
- Build phase;
- Unit test;
- System test;
- User acceptance test;
- Implementation.
Especially the phases from technical design to system test can be executed in a production-line manner, and are best suited for benchmarking. The scope of the phases can differ per company. Many disciplines work together and time tracking is often kept separate. In management we find a similar situation. The maintenance of applications can be seen as a number of services.

This will be elaborated upon in the next section.

Services
In management a first distinction can be made between:
- Process Maintenance;
- Functional Maintenance;
- Technical Application Maintenance;
- Operations Maintenance;
- Network Maintenance;
- User Maintenance.


This key classification is especially important in defining application maintenance and support, so as not to include time spent in other areas. There is a strong similarity with project phases, where one selects the production-line phases as best suited for benchmarking. In maintenance one looks for production-line-like services. The list above does not include extra services such as:
- Queries;
- Quick Services;
- Answering (user) enquiries;
- Training.
These are, after all, not mandatory for application management: without them, maintenance will roll on, since they are not indispensable to 'keep things going'. They are often referred to as 'Support'.

Application maintenance
After specifying which services do not fall under application management, it is important to recognise which do. The core of technical application maintenance consists of the following sub-services:
- Corrective maintenance;
- Preventive maintenance;
- (Minor) enhancement maintenance;
- Perfective maintenance.
Scope of maintenance and project


The last two services in application maintenance, (minor) enhancement maintenance and perfective maintenance, can blur the boundary between maintenance and project work. That is why larger companies often make use of release management. As a rough distinction between common terms like fixes, upgrades and releases, one can say that fixes relate to corrective/preventive maintenance, and releases to projects. Upgrades will then mostly be defined as interim changes that cannot wait until the release is ready. Upgrades can be dealt with as minor enhancement maintenance in application management, but also as a project. If minor enhancement maintenance is ascribed to application maintenance, it will be possible to control this service, just as with projects, on the basis of agreements about hours per function point. In the Netherlands, hours per enhancement function point are sometimes used; at the moment, however, this complicates international comparisons. If enhancement function points are adopted by IFPUG as well, internationally comparable material will become available.

The ISBSG Questionnaire, ISO/IEC 14764 and ITIL


An important problem in benchmarking application maintenance and support is the fact that the methods in maintenance are far less standardised than in projects, where one finds uniform project phasing. The best-known standard in maintenance is ITIL. On the one hand the ISBSG questionnaire follows the terminology used in ITIL. On the other hand it also follows ISO/IEC 14764, which is the ISO standard for information technology software maintenance. The ISBSG questionnaire Data Collection Questionnaire Application Software Maintenance and Support (ISBSG, version 1.1, 2003) can be found at www.isbsg.org, under 'Maintenance and Support Collection Package'.

The services
ISBSG first makes a distinction between Development and Enhancement on the one side, and Maintenance and Support on the other. Maintenance comprises:
- Corrections;
- Corrective maintenance;
- Preventive maintenance.
Enhancements comprise:
- Adaptive maintenance;
- Perfective maintenance.
In order to define the scope of project and management, the ISBSG has opted for enhancements to mean modifications taking fewer than five days' work. Support comprises:
- Problem analysis;
- Queries;
- Quick service;
- User support and assistance.
The most important data about these services are therefore the number of hours per service. Furthermore, a distinction is made according to the type of maintenance:
- Application software maintenance;
- Functional maintenance;
- Operational maintenance;
- User maintenance.
The number of interventions triggered by these services has to be recorded as well:
- The number of incidents for corrective maintenance;
- The number of problems to be solved for preventive maintenance;
- The number of problems to be analysed for problem analysis.
This enables an analysis based on the number of hours per service on the one side, and the number of triggers on the other side. Then, for example, it can be analysed whether corrective maintenance correlates better with the defect count or with the functional size of the application.
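A minimal sketch of that kind of analysis, assuming invented per-application records (the real ISBSG questionnaire collects far more attributes):

```python
# Sketch: does corrective-maintenance effort track the defect count
# or the functional size better? All records are invented.
import numpy as np

apps = [
    # (corrective hours/year, incidents/year, size in function points)
    (420, 35, 1200),
    (610, 50, 1500),
    (200, 12, 900),
    (880, 71, 2100),
]
hours, incidents, size = (np.array(c, dtype=float) for c in zip(*apps))

print("hours vs incidents:", np.corrcoef(hours, incidents)[0, 1])
print("hours vs size:     ", np.corrcoef(hours, size)[0, 1])
```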
Productivity attributes
Earlier in this article a number of productivity attributes were mentioned. In the ISBSG repository the following productivity attributes (amongst others) have to be registered:
- Type of organisation;
- Type of application;
- Age of application;
- Importance of application;
- Hardware platform;
- Programming languages;
- Technical size (lines of code);
- Error count per period;
- Total number of calls to the helpdesk concerning the application;
- User experience;
- Documentation;
- Application change rate per period;
- Environments alongside the production environment (development and test environments, amongst others);
- Use of test tools;
- Number of interfaces;
- Interface similarities.
All terms are defined in the questionnaire itself.

The future: recording experiences in ISBSG


Literature about application maintenance and support experience can only be found sporadically. The number of modules and tools for maintenance estimates and evaluations available on the market is minimal. There are ITIL modules on the market which sometimes have metric features available. The ISBSG questionnaire could play an important role in benchmarking maintenance. A frequently consulted project database was created in only a short time, which is available to everybody at a low price, for research as well as benchmarking purposes. The accessibility is high, due to the data being available in spreadsheet format. This enables everyone to link the data easily with personal benchmark data to gain a better insight into productivity as well as quality. The first completed maintenance questionnaires have reached the ISBSG. During the latest workshop of the ISBSG, in Bangalore (India) in September 2004, plans were made for the next steps. In the meantime the ISBSG tries to make the questionnaires conform even more to best practice and international standards. To this end, research is being done into how to make the questionnaire conform better to the ISO standards, and into what can be achieved with certification in this area.

Experiences in the Netherlands


The Netherlands is also represented in the ISBSG; NESMA has been involved from the start. There is a separate NESMA benchmarking group that at first tried to develop a Dutch benchmark, but in the end decided to join in with the ISBSG. This group monitors the benchmarking activities in the Netherlands. Most large companies have several benchmark tools and try to match their standards to the tool standards. On the one hand, experience shows that this raises many questions and problems. On the other hand, there turns out to be considerable conformity in the manner in which projects and maintenance are organised. With projects the methods followed appear to be more in agreement than with management. So it is just as well there is an association like ITSMF (IT Service Management Forum) that monitors standardisation in management.

In 2004 ITSMF Netherlands characterises itself as the one and only platform for IT service organisations, clients and suppliers of IT services. Its goal is to further innovation, support the discipline of IT service management and stimulate the exchange of knowledge with related fields. ITSMF is an association with 450 member companies, and it has the exchange of knowledge and experience between ITIL practitioners as its core concern. It is a special association, in which clients and suppliers are equally represented. Having both parties represented is beneficial to the exchange of knowledge: it gives ITSMF the capacity to offer a full and independent view of the field in its activities, as it can draw on the knowledge of both groups. Consequently the NESMA benchmarking group has joined forces with ITSMF. Just like NESMA, ITSMF Netherlands has a good reputation internationally. We hope our co-operation will make us stronger.
Personal experience
One way to start doing benchmarks is by first registering the most important items in your environment and building up a personal benchmarking database. The items to be registered can be found in the ISBSG questionnaires. It is important to conform to the general definitions in order to enable comparisons. This entails scrutinising your definitions as to their relation with the more generic definitions. After gaining some experience it might be possible to fill out an ISBSG questionnaire and send it in. With this experience you might order an ISBSG CD-ROM to use the ISBSG data as external comparison material. This will be especially useful once you have acquired a feeling for how your work relates to the ISBSG database. Then you might, for example, be able to make estimates concerning the impact of new environments on your own company.

About the author


Theo Kersten (theo.kersten@atosorigin.com) is an IT consultant at ATOS Origin. He has more than 8 years' experience in IT metrics and is a member of the ISBSG and the NESMA benchmarking group.

References
1. Software Engineering Economics, Barry W. Boehm, 1981.
2. Controlling Software Development, L.H. Putnam and W. Myers, 1996.
3. Productiviteitsattributen, NESMA, 1995.
4. The Benchmark Release 6, ISBSG, 2000.
5. Practical Project Estimation, ISBSG, 2001.
6. The Software Metrics Compendium, ISBSG, 2002.
7. The Benchmark Release 8, ISBSG, 2004.
8. Data Collection Questionnaire Application Software Maintenance and Support, Version 1.1, ISBSG, 2003.

MEASURE!

KNOWLEDGE! ACTION!

The Netherlands Software Metrics Users Association (NESMA)

12

COSMIC FULL FUNCTION POINTS, THE NEXT GENERATION
FRANK VOGELEZANG

Function point analysis is a very solid method for measuring the functional size of a complete data-driven information system. It has been a proven method for a quarter of a century and will still be useful for many years. But more and more we can see a trend that software development no longer delivers complete systems, but rather assemblies of components. We also see more and more devices with event-driven software instead of data-driven software. This calls for a next generation of functional sizing techniques that can deal with software of a different nature than the software Albrecht designed function point analysis for. One of the more promising next generation techniques for functional sizing is the Full Function Point method of COSMIC, the COmmon Software Measurement International Consortium. In this article the method is described in brief, and the advantages and disadvantages of this next generation technique are explained, to give you a clear picture of when it is useful to deploy COSMIC Full Function Points instead of function point analysis.

Historical perspective

Function point analysis is a very solid method for measuring the functional size of a complete data-driven information system. No other method has lasted so long and gained such widespread acceptance. That is why function point analysis is celebrating its 25th anniversary this year. But the field of software engineering has made tremendous progress in these years. Today we see, for example, information systems that are composed of smaller components instead of being built as complete systems, embedded software in devices that is event-driven instead of data-driven, web-based software that only presents information without direct data-related functionality, and hybrids of all kinds. Nowadays a lot of software is being built on entirely different principles than those for which Albrecht designed function point analysis in 1979.

In 1994 a working group of ISO/IEC was set up to establish an international standard for functional size measurement. In 1997 this working group produced ISO/IEC standard 14143-1, which covered the general concepts of software functional size measurement. In a later publication (14143-3:2003) criteria were added to verify whether proposed functional sizing methods comply with this generic standard. The current NESMA method for function point analysis is recognized as a valid functional sizing method in the ISO/IEC standard 24570 [1].

In late 1998, some members of this working group decided to develop a new functional sizing method, starting from basic established software engineering principles. This method should be equally applicable to data-driven business application software, real-time event-driven software and infrastructure software, and was aimed to be compliant with ISO/IEC 14143 from the outset. The development of this new method resulted in the foundation of COSMIC, the COmmon Software Measurement International Consortium. The first public version of the method, COSMIC-FFP v2.0, was published in October 1999. Extensive field trials were carried out in 2000 and 2001 [2]. COSMIC published its latest definition of the method, v2.2, in January 2003.

A next generation functional sizing method


During the 1980s and 1990s, researchers documented a number of theoretical flaws in function point analysis. These studies had little impact on the practical value of the method, but they discredited function point analysis as a valid scientific research topic. COSMIC Full Function Points (or COSMIC-FFP for short) is the first so-called next generation functional sizing method that is specifically designed to meet the generic scientific principles of ISO/IEC 14143 [3]. Its development started from basic established software engineering principles instead of empirical models, and it does not contain the theoretical flaws found in function point analysis. It was designed to be able to meet the constraints of the many new (and complex) types of data-driven and event-driven software, as well as the type of software served by first generation functional sizing methods.

[Figure 42: Functional user requirements. The software to be measured is broken down into functional processes; their sub-process types are data movements and data manipulations.]

For example, COSMIC-FFP is able to recognize the use of different layers in software and is able to measure functional size from different measurement viewpoints, thus helping to overcome the uncertainty about what is meant by 'functional' in the user requirements. It has also been designed to be easy to train, understand and use consistently, without recourse to inter-related rules and exceptions. In the design of COSMIC-FFP some concepts of metrology were also introduced, such as a fairly clearly defined unit of measurement. In addition, all the definitions within COSMIC-FFP are aligned with the international metrology vocabulary, as well as with measurement-related standards defined by ISO.

The basic principles

As prescribed by ISO/IEC 14143, COSMIC-FFP derives the functional size of a piece of software from its functional user requirements. These are the part of the user requirements that represent the user practices and procedures that the software must perform to fulfil the users' needs, and they do not include technical and quality requirements. Functional user requirements are known before the software engineering starts and are therefore a good starting point for estimation. They can be broken down into a number of functional processes: independently executable sets of elementary actions that the software should perform in response to a triggering event. The elementary actions that software can perform are either data movements or data manipulations.

As a reasonable approximation COSMIC-FFP assumes that each data movement has an associated constant average amount of data manipulation. This approximation means that COSMIC-FFP is not suitable for algorithmic software because of the manipulation-rich nature of such software. For the vast majority of the currently developed software this is a valid approximation.

[Figure 43: FUR. Functional user requirements are broken down into functional process types, which consist of data movement types.]

With this approximation, the COSMIC-FFP model of software is that the functional user requirements can be broken down into a number of functional processes, which in turn can be broken down into a number of data movements. The data movements are the base functional components that are used for establishing the size of the software. A data movement moves a unique set of data attributes (a data group), where each included data attribute describes a complementary aspect of the same, single thing or concept (the object of interest) about which the software is required to store and/or process data [4].
COSMIC-FFP distinguishes four different types of data movements:

Entry
An entry is a data movement that moves a data group from a user across the software boundary into the functional process where it is required. An entry does not update the data it moves. An entry is considered to include certain associated data manipulations (for example validation of the entered data).

Write
A write is a data movement that moves a data group lying inside a functional process to persistent storage.

Read
A read is a data movement that moves a data group from persistent storage to within reach of the functional process that requires it.

Exit
An exit is a data movement that moves a data group from a functional process across the software boundary to the user that requires it. An exit does not read the data it moves. An exit is considered to include certain associated data manipulations (for example, formatting and routing associated with the data to be exited).

The value of a functional process is determined by the sum of its constituent data movements. The smallest functional process consists of two data movements: an Entry containing the triggering event and either a Write or an Exit containing the action the process has to perform. Every identified data movement receives the value of 1 cfsu (COSMIC functional sizing unit). The size of the smallest functional process is thus 2 cfsu, and it increases by 1 cfsu per additional data movement to an unlimited number. This is a great advantage over function point analysis, where all base functional components have an upper size limit.

COSMIC-FFP counts in base functional components that are directly related to the size units. This is slightly different from function point analysis, which counts at an abstraction level that can be compared to the level of the functional processes. In function point analysis there is a weighing function between the base functional components and the size units. Figure 44 shows the relation between the base functional components of COSMIC-FFP and those of function point analysis. From this figure it is evident that both methods have a different approach to measuring the functional size of software.
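To make the counting rule concrete, here is a minimal sketch; the example process is invented, but the rule it applies (1 cfsu per data movement, minimum of 2 per functional process) is the one described above.

```python
# Minimal sketch of COSMIC-FFP sizing: every data movement
# (Entry, eXit, Read, Write) contributes exactly 1 cfsu.
# The example process below is invented for illustration.
from enum import Enum

class Movement(Enum):
    ENTRY = "E"
    EXIT = "X"
    READ = "R"
    WRITE = "W"

def size_cfsu(process: list) -> int:
    """Functional size of one functional process in cfsu."""
    assert len(process) >= 2, "smallest functional process is 2 cfsu"
    return len(process)

# 'Retrieve customer data': triggering Entry, one Read, one Exit.
retrieve_customer = [Movement.ENTRY, Movement.READ, Movement.EXIT]
print(size_cfsu(retrieve_customer))  # -> 3 cfsu
```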

[Figure 44: Base functional components from FPA (EI, EO, EQ, ILF, EIF) and COSMIC-FFP (Entry, eXit, Read, Write), positioned between users, software and data.]

Measurement viewpoints
Function point analysis measures software on functionality that can be seen from outside the software: data structures that can be used to store or retrieve data, and functions that can bring data into the data structures, manipulate data that has already been stored, or retrieve data from the data structures. There is no discussion about what should (not) be counted: if it is not visible outside the software, it should not be counted.

COSMIC-FFP counts the user practices and procedures that the software must perform to fulfil the users' needs. Since users can be either human users or software users, the users' needs can be at very different levels of abstraction. A human user will define its needs to word processing software in terms of spell-checking or changing the appearance of the typed words from a normal font to bold. The operating system will define its needs in terms of knowing to what device it should send the bit streams that it receives. Both ways of looking at the actions the software should perform are valid, but they lead to very different sizing values. In COSMIC-FFP it is therefore essential to record the viewpoint with which the software is measured. The viewpoint is a form of abstraction, achieved using a selected set of architectural concepts and structuring rules, in order to focus on particular concerns within the software to be measured. In the measurement manual the two most commonly used viewpoints are defined:
1. The end-user measurement viewpoint only reveals the functionality of application software that has to be developed and/or delivered to meet a particular statement of functional user requirements. It is the viewpoint of users which are either humans, who are aware only of the application functionality they can interact with, or peer application software that is required to exchange or share data with the software being measured, or a clock mechanism that triggers batch application software. It ignores the functionality of all other software needed to enable these users to interact with the application software being measured.
2. The developer measurement viewpoint reveals all the functionality of each separate part of the software that has to be developed and/or delivered to meet a particular statement of functional user requirements. For this definition, the user whose requirements must be met is strictly limited to any person or thing that communicates or interacts with the software at any time.
The effect of both viewpoints can be illustrated with the two message sequence diagrams below, in which the downward arrow represents a functional process and the horizontal arrows represent data movements.
[Figure 45: Functionality revealed by the end-user measurement viewpoint: one functional process in the application layer, with Entry, Read and Exit data movements between the User and the application layer.]

[Figure 46: Additional functionality revealed by the developer measurement viewpoint: the application-layer functional process plus a second functional process in the device driver layer, with Entry, Read and Exit data movements of its own.]
Figure 46: Additional Functionality recealed by the developer measurement viewpoint.

Both diagrams represent a functional process that retrieves data from some kind of data storage. From the end-user measurement viewpoint we only see the functionality of the application software:
- The functional process receives a trigger from the User (E);
- It reads the required data from the data storage (R);
- It displays the retrieved data to the User (X).
The size of this functional process is 3 cfsu from this viewpoint (see note 6 below). From the developer measurement viewpoint there is a second layer of functionality involved: the device driver, which communicates with the data storage. The Read data movement in the application layer corresponds with a functional process for the device driver that is similar to the functional process in the application layer:
- The functional process receives a trigger from the application layer (E);
- It retrieves the required data from the data storage device (R);
- It communicates the retrieved data to the application layer (X).
The size of this functional process is also 3 cfsu. The same functional user requirement is thus 3 cfsu in the end-user measurement viewpoint, but 6 cfsu in the developer measurement viewpoint. This may look confusing at first, but it can be very helpful if we take into account what use both viewpoints have.

The end-user measurement viewpoint will be the designated choice for measuring software from a 'human' perspective, being either business application software or real-time software that can interact with humans. This is the viewpoint from which first generation functional sizing methods, such as the NESMA method, were designed to measure a functional size. This is important to realize if one wants to compare COSMIC-FFP measurements with measurements done with first generation functional sizing methods. The developer measurement viewpoint may see that more than one separate component has to be developed and/or delivered. This can arise if parts of the software have to be developed using different technologies, will execute on different processors, or belong to different layers of a given architecture. This measurement viewpoint will be used for measuring software from a 'technology' perspective.

Since both viewpoints represent a different view on measuring software, they cannot be compared. Although there may be a relation between the measures in a very strictly defined software environment, this relation cannot be translated into a general formula for converting a functional size in the end-user measurement viewpoint into a functional size in the developer measurement viewpoint. Other than the sort of functionality that is revealed by the measurement, the choice of viewpoint has no consequence for the application of the COSMIC-FFP method. Any statement about the COSMIC-FFP method in the rest of this article can apply to either viewpoint.

6. In the end-user measurement viewpoint this function can have a size of 4 cfsu, because by convention all software messages generated without user data are counted as a single additional exit. Introducing this convention in the main text might confuse readers who are not familiar with the details of the COSMIC-FFP method.
Measuring enhancement projects


Most software projects are enhancements to existing software. In the early nineties a working group of NESMA first proposed a method for measuring enhancements [5] using function point analysis. In 1998 this method was published as a professional guide, not as a part of the NESMA standard. This method uses the change in the data element types and file types referenced, rather than their absolute number, to calculate a factor that can be applied to the weight of a function to calculate the enhancement value of this function. For changed functionality this factor ranges from 0,25 to 1,50 (in steps of 0,25). The method also contains rules for deleting and retesting existing functionality. The NESMA method distinguishes between project size (which can have a fractional value) and application size (which is always a whole number). This method has substantial acceptance in the Netherlands, but very little acceptance in the rest of the world, where the IFPUG view on measuring enhancement projects, which does not work with an enhancement factor, is most common.

In COSMIC-FFP, measuring changed functionality is part of the method. Section 4.3b of the measurement manual [4] describes that the size of a changed functional process is an aggregation of the number of modified data movements (added, modified and deleted). As with new functionality this results in a size of a whole number of cfsu, with only one difference: the smallest changed functional process can have a size of 1 cfsu. Dividing the size of the changed functional process by the original size results in a factor. This factor is usually in range with the NESMA factors, but can theoretically be any factor greater than zero.

Measuring changed functionality is not quite the same as measuring enhancement projects. Enhancement projects usually also involve deleting functionality and retesting existing functionality that is linked to the changed and/or deleted functionality. Not everyone will agree that the last aspect should be accounted for in a functional size measurement. COSMIC-FFP has no rules about how to deal with retesting existing functionality that is linked to the changed and/or deleted functionality, because by its definition retesting existing functionality has nothing to do with functional size.

Strict application of the rules for changed functionality means that deleting a functional process has the same impact on the functional size as creating new functionality. For the application size this is obviously true, but for the project size it will overestimate the corresponding work effort. To deal with this problem, for now COSMIC-FFP offers the possibility of using local extensions to the method, but this should be resolved in a general way.
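A small sketch of the changed-size calculation described above; all numbers are invented, and the division by the original size to obtain a NESMA-like factor follows the description in the text.

```python
# Sketch of sizing a changed functional process in COSMIC-FFP:
# the change size is the number of added + modified + deleted
# data movements. Dividing by the original size yields a factor
# comparable to the NESMA enhancement factors.
# All numbers below are invented for illustration.

def change_size_cfsu(added: int, modified: int, deleted: int) -> int:
    size = added + modified + deleted
    assert size >= 1, "smallest changed functional process is 1 cfsu"
    return size

original_size = 4                   # e.g. a process with E + R + R + X
change = change_size_cfsu(added=1, modified=2, deleted=0)

print(change)                       # -> 3 cfsu for the change
print(change / original_size)       # -> 0.75, in range with NESMA's 0,25-1,50
```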

Estimation
Developing software is for most organizations no longer an independent software project, but part of a business case which includes all disciplines involved. This means that the cost of building the software must be balanced by a profit somewhere else in the organization. So organizations want a good estimate of the effort of developing and/or delivering the software as early as possible. The NESMA method [1] contains, in addition to the detailed method, a rough estimation technique and an indicative estimation technique, which can be used when not all detailed data are known yet. These early estimation techniques draw on long years of experience with the detailed method. Next to the official method, NESMA published a handbook for estimation in the very early stages of software development [6].

Since COSMIC-FFP is a fairly new method, there is no early estimation technique that can draw upon long experience with the detailed method. The measurement manual describes two techniques for early COSMIC-FFP measurement: the approximate technique (comparable to NESMA's indicative technique) and the refined approximate technique (comparable to NESMA's rough technique). In the approximate technique the average size of a functional process is multiplied by the number of functional processes the software should provide. In the refined approximate technique the functional processes to be provided are first classified as small, medium, large or very large, each with its own average size. The average numbers have to be established first.

Since functional processes do not have a fixed range for their size, early estimation can lead to different values in different environments. For example: in a banking environment the average size of a functional process can be 7,3 cfsu, while in an avionics environment it can be 8,0 cfsu [7]. This may seem a small difference, but in projects with a large number of functional processes it leads to significantly different estimates. With the refined approximate technique the differences only increase: for the average value of large and very large functional processes, the differences between banking (6,3 cfsu and 14,9 cfsu) and avionics (10,5 cfsu and 23,7 cfsu) are nearly a factor of two.
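A sketch of both early estimation techniques; the per-class averages below are the banking figures quoted above, while the process counts are invented.

```python
# Sketch of the two early-estimation techniques described above.
# The averages are the banking-environment figures quoted in the
# text; the numbers of functional processes are invented.

def approximate(n_processes: int, avg_cfsu: float) -> float:
    """Approximate technique: process count times average size."""
    return n_processes * avg_cfsu

def refined_approximate(counts: dict, avgs: dict) -> float:
    """Refined approximate technique: per-class counts times per-class averages."""
    return sum(counts[c] * avgs[c] for c in counts)

print(approximate(40, 7.3))  # 40 processes, banking average -> 292.0 cfsu

banking_avgs = {"large": 6.3, "very large": 14.9}
print(refined_approximate({"large": 25, "very large": 10}, banking_avgs))  # 306.5
```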
The precision of the COSMIC-FFP approximate technique is good enough, with less than 10% deviation on a portfolio and less than 15% on a project [7] within a specified environment. The drawback is that the values for approximate estimation must be determined anew for each different environment.

Benchmarking
More and more software developers have to prove their value for money. This is an effect of the fact that software development is no longer an isolated project, but part of a business case where the cost of developing and/or delivering software must be justified. One way of proving value for money is comparing productivity with external standards. For projects sized with function point analysis or lines of code there are enough benchmarks available.

Since 2003 the ISBSG repository also accepts data from projects sized with COSMIC-FFP. At the moment the repository is still in its early days for COSMIC-FFP, and the values resulting from it should not yet be relied on as benchmarks to the same extent as those for the first generation functional sizing methods [8]. ISBSG has established that there is an interesting take-up of COSMIC-FFP, with a balance between application domains that can only be sized with next generation functional sizing methods (real-time, message switching, infrastructure) and business application software, which can also be sized with first generation functional sizing methods.

Benchmarking directly against projects sized with COSMIC-FFP will be possible in the near future. At the moment, however, there is no solid benchmark. If there is a need for benchmarking, other sizing methods should be used or COSMIC-FFP size figures must be converted to size figures from a method with a solid benchmark.

Converting function points data


As part of the implementation of COSMIC-FFP at Rabobank, the possibility of converting functional size values between NESMA function point analysis and COSMIC-FFP has been investigated [9] for eleven projects. To ensure that this conversion exercise could lead to useful results, only projects were taken into account that could be counted without 'interpretation' of the counting rules for both methods and that were counted with a comparable view on the functionality, using only the end-user measurement viewpoint for the COSMIC-FFP measurements. From this small number of projects a conversion formula could be derived:

Y (cfsu) = -87 + 1,2 X (fp)

Comparison with a similar study gave similar results, only with a different offset in the formula [10], which may be caused by the effect of ILF and EIF on the function point values. For projects of a certain size (approximately 300-600 points in both methods) size values can thus be converted. In this way existing sizing figures can be reused when adopting COSMIC-FFP as the standard functional sizing method in place of function points. Conversely, COSMIC-FFP size measurements can be converted to function points in order to use benchmarks with a large number of projects. Since there should be only a minimal difference between the NESMA method and the most recent IFPUG method [11], the above-mentioned formula is valid for both methods.
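A worked example of this conversion; the 500 fp input is invented, and the formula is only considered valid in the roughly 300-600 point range mentioned above (the decimal comma of the text becomes a decimal point in code).

```python
# Worked example of the conversion formula quoted above:
#   Y (cfsu) = -87 + 1.2 * X (fp)
# Only meaningful for projects of roughly 300-600 points;
# the 500 fp input is an invented example.

def fp_to_cfsu(fp: float) -> float:
    return -87 + 1.2 * fp

def cfsu_to_fp(cfsu: float) -> float:
    return (cfsu + 87) / 1.2

print(fp_to_cfsu(500))   # -> 513.0 cfsu
print(cfsu_to_fp(513))   # -> 500.0 fp
```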

Future developments
COSMIC-FFP has made a lot of progress in a short time, but there is still work to be done before it is ready to be a mainstream functional sizing method:
- Although the design of the method is very simple, translating the principles into a functional size measurement requires thorough understanding of those principles. COSMIC is working on several guidelines for applying COSMIC-FFP in different domains. The first guideline, for business application software, will be available at the beginning of 2005.
- Some concepts of the method need improved definition to be unambiguous in all domains. COSMIC will release method update bulletins for these concepts. Two method update bulletins are already planned: the first on the concept of layers and the second on the concept of data groups.
- COSMIC-FFP should be integrated within the education infrastructure of software engineers, so that all software engineers will graduate with a working knowledge of measuring functional size with COSMIC-FFP.
- Techniques for early size estimation must be developed further to better equip those responsible for estimating software projects.
- The ISBSG repository should contain considerably more projects to serve as a good benchmark.
One promising advantage of the simplicity of the design of COSMIC-FFP is the possibility of automated sizing. The University of Quebec has already demonstrated a fairly well working prototype plug-in for the Rational suite that can size a design directly.

Will COSMIC-FFP replace function point analysis?


COSMIC-FFP has made enormous progress in a limited amount of time. Is the 25th anniversary of function point analysis the last anniversary to be celebrated? I don't think so. Function point analysis has proven to be a valuable tool for developers of business application software and will remain so for a number of years to come. COSMIC-FFP development started to serve those areas of software engineering that could not be served by function point analysis, such as event-driven and real-time software development. The method was designed to serve the type of software served by function point analysis as well, so that hybrid software could be sized with one functional sizing method. For application software that can be counted completely with function point analysis there is no need to abandon that method. But today we see more and more software being built on principles that cannot be served with function points. For that software COSMIC-FFP can be used.

Today we can see a trend towards information systems that are composed of smaller components, which might in part be of a nature that cannot be served with function point analysis. Since COSMIC-FFP is designed to meet the sizing demands of most current software, I expect a slow migration towards the use of COSMIC-FFP, unless a better option becomes available soon. I'm convinced function point analysis will also celebrate its 30th anniversary as an actively used standard. About the 35th anniversary I'm not so sure.

About the author


Frank Vogelezang (frank.vogelezang@sogeti.nl) has been working as a practitioner and consultant in the area of software metrics for over five years. Within this area he has specialized in estimation and performance measurement within client organizations. He is a consultant for the Expertise Center Metrics of Sogeti Nederland B.V. He is a member of the Measurement Practices Committee of COSMIC and a member of the COSMIC working group of NESMA.

References
1. Barth, M.A., Onvlee, J., Spaan, M.K., Timp, A.W.F., Vliet, E.A.J. van, Definities en telrichtlijnen voor de toepassing van functiepuntanalyse (NESMA functional size measurement method conform ISO/IEC 24570), versie 2.2, NESMA, 2004.
2. Abran, A., Symons, C., Oligny, S., An Overview of COSMIC-FFP Field Trial Results, 12th European Software Control and Metrics Conference (ESCOM 2001), April 2-4, London (England), 2001.
3. Abran, A., Meli, R., Symons, C., COSMIC-FFP (ISO 19761) Software size measurement: State of the art 2004, Software Measurement European Forum (SMEF 2004), January 28-30, Rome (Italy), 2004.
4. Abran, A., Desharnais, J.M., Oligny, S., St-Pierre, D., Symons, C. (eds), COSMIC-FFP Measurement Manual (The COSMIC implementation guide for ISO/IEC 19761:2003), version 2.2, January 2003.
5. Engelhart, J.T., Langbroek, P.L., Dekkers, A.J.E., Peters, H.J.G., Reijnders, P.H.J., Function point analysis for software enhancement, a professional guide of the Netherlands Software Metrics Users Association, NESMA, 2001 (the Dutch version was already published in 1998).
6. Jacobs, M.A.J., Vonk, H., Wiering, A.M., Handboek FPAi: Toepassing van functiepuntanalyse in de eerste fasen van systeemontwikkeling, versie 2.0, NESMA, 2001.
7. Vogelezang, F.W., Dekkers, A.J.E., One year experience with COSMIC-FFP, Software Measurement European Forum (SMEF 2004), January 28-30, Rome (Italy), 2004.
8. International Software Benchmarking Standards Group, An analysis of software projects sized using COSMIC Full Function Points, ISBSG, January 2004.
9. Vogelezang, F.W., Lesterhuis, A., Applicability of COSMIC Full Function Points in an administrative environment: Experiences of an early adopter, Proceedings of the 13th International Workshop on Software Measurement (IWSM 2003), September 23-25, Montréal (Canada), 2003.
10. Fetcke, T., The warehouse software portfolio, a case study in functional size measurement, technical report no. 1999-20, Software Engineering Management Research Laboratory, Université du Québec à Montréal (Canada), 1999.
11. NESMA, FPA volgens NESMA en IFPUG; de actuele stand van zaken, versie 2.0, NESMA, June 2004.

MEASURE!

KNOWLEDGE! ACTION!

The Netherlands Software Metrics Users Association (NESMA)

13

VIEW OF THE FUTURE


LAWRENCE H. PUTNAM AND WARE MYERS

The means now exist to develop software several hundred times more effectively, as measured by the process productivity parameter, than most organizations are actually accomplishing. QSM's more or less normal curve of process productivity shows an enormous range of productivity. The curve indicates that these means, or the accomplishment of them, are not getting across to most organizations. Software has become increasingly important in economic activity and will no doubt become even more so in the next decade. So the question we are pondering becomes: why don't organizations on the lower two-thirds of this curve take advantage of the software development capability that already exists? Can anything be done to encourage them to do so? Let us start out by setting a little background: the four efforts to improve the software process with which we have been working the last few years.

Paul Bassett's frame engineering


As we know from the QSM Associates study, frame engineering improves software development by a factor of 10. Some of Paul's clients, not included in the study, are doing even better. But the company he founded, Netron, reached only a few hundred clients in its first 16 years of existence.

Martin Griss' software reuse

Griss collected some cases indicating that reuse greatly improves the effectiveness of software development. His book on the subject, however, suffers from the deficiency of being excessively technical, detailed, and lengthy. It also implies that reuse is based on object-oriented methods, thus limiting its influence to a small part of the field.

Unified Modeling Language


Based on the work of Ivar Jacobson, Grady Booch, and James Rumbaugh, and on the feedback from 40 or so industrial participants, this language was standardized in December 1997 by the Object Management Group. The idea of something comparable to engineering blueprints, usable throughout the software process, should increase the effectiveness of software development. Of course, many companies already have something of the sort that they use internally. Bill Cave has a batch of drawings that greatly aid the development of large real-time systems. However, a standard set, intelligible worldwide, sounds like a good idea. Again, as in Griss' case, the documents imply that it depends on object-oriented technology.

Unified Software Development Process


A consistent process, recording its findings in a standardized language, certainly promises more effective development. Again, it seems to be based on object-oriented technology. We might add object-oriented technology itself to this list. All these authors seem to believe that it is a substantial advance over previous development methods. Unfortunately, it has a track record of having been around for several decades without penetrating the field very extensively. The Capability Maturity Model is a now nearly 16-year attempt to improve software development. It has some value as a training mechanism. However, only a few hundred organizations have used it, so overall it has had little effect. Computer-Aided Software Engineering and, indeed, tools in general have looked promising, but overall the effect seems small.
There are other methodologies that claim some success, such as SAP, Oracle, middleware, and Microsoft's reusable components. As a matter of fact, all these methodologies and other influences have had some effect. Software productivity has increased over the past two decades, as QSM's records of increasing process productivity establish. However, most of the pack still lag far behind the leaders. QSM's normal curve of the distribution of process productivity continues to demonstrate this fact. It may make the situation more concrete to look at a few examples of what seem to us to be rather obvious better ways.

Establish feasibility first


A small team should develop the beginnings of an architecture, pointed especially at novel aspects of the proposed system. These risky aspects should be sufficiently resolved (probably not totally) before the organization tries to proceed to full-scale development. Yet Capers Jones estimates that 25 to 65 percent of large systems, depending on size, are cancelled before completion. We suspect many of these systems were not feasible in the first place. In other words, a good many organizations do not employ a feasibility-study stage.

Establish a plan
A valid plan is necessarily based on some degree of architecture and some degree of risk reduction. (In our books we called it functional design or high-level design. Some people call it architecture.) The degree has to be sufficient to enable the software organization to estimate the cost and time schedule of the construction phase, and the level of reliability that will be attained. Apparently only about one quarter of projects get under way with much of a plan in place. At least, Capers estimates that only one quarter of software organizations employ automated estimating systems. These estimating systems generally require an estimate of the functionality, often expressed in terms of size, of the proposed project, necessitating a degree of planning.

Have a repeatable process


You can't carry out a plan within an estimate if you don't have an organization that can repeat the process with which it builds a system. About two thirds of organizations rest at CMM level 1; they don't have a repeatable process.

Review and inspect


It has been pretty thoroughly established that reviews and inspections along the way are a good idea. Most software organizations do little of these things.

Plan tests

All the experts agree that test planning should start early. Most organizations start test planning late.

Possible Reasons Why Progress Is Slow


Reason one is psychological. People resist change; executives resist change. Managers don't want to spend time on the messy preliminaries to software development. They are anxious to see code coming out.

Reason two may be that people and managers just don't know about these better ways. A plethora of conferences, short courses, books, journals, magazines, and consultants are standing by to help. The U.S. Department of Defense even puts out directives. These means have some effect, but don't bring the majority of the industry around to better ways.

Reason three may be that a few score of better ways have already inundated the software field and software people have had negative experiences with many of them. They are gun-shy. Sad to say, many of these highly touted better ways were not effective in practice.

Reason four may be that even those of the better ways that are valid, that is, that have demonstrated their worth in some organizations, are not easy to implement.

Reason five may be our competitive system itself. On the one hand, it pushes organizations to improve, while on the other hand, it removes from marginal organizations the resources needed to support long-range improvement efforts. This is not to say that some other system would work better. Those we know about seem not to have. The issue is to find a way to foster more long-range improvement within the competitive system.

So, we conclude that process improvement is always going to be difficult; it is always going to take a period of years; it is always going to replace organizations that lag with those that modernize - Schumpeter's creative destruction. But creative destruction is a painful way for humankind to advance. It is worth searching for a better way.

A Software Development Process That Works


One of the things that the software industry needs is a software process that works. Of course, the present processes do work to a considerable degree. There are billions of lines of working software out there. But in many organizations the process employed does not work very well. For example, if you don't have a feasibility stage, you are going to undertake some projects that are certain to fail. If you don't have a functional design stage, you are not going to define the work to be done with sufficient precision to support a valid plan and its related estimate.

What we mean by a software process is something like the waterfall model that Winston Royce described in 1970. It has since been modified by Barry Boehm's spiral model, rapid prototyping, rapid application development, and other models. Ivar Jacobson, Grady Booch, and James Rumbaugh have the Unified Software Development Process. Most of the models (including the latter) have four stages under various names:
- Feasibility (Inception);
- Functional design (Elaboration);
- Main build (Construction);
- Maintenance or operations (Transition).

Feasibility
This phase should accomplish five goals:
1. Develop an architectural core, covering particularly the novel parts of the proposed system, those parts that the organization does not know how to do.
2. Sort out the major risks that appear in these novel parts and reduce them to manageable risks.
3. Establish feasibility. Provide assurance that the project can be accomplished by the organization at hand.
4. Show with a ballpark estimate that the project falls within the limits of the resources the organization can afford.
5. Make the business case at a general level. The data needed to be precise does not yet exist. Show that the project is economically worth doing.

Functional design
This phase carries preparatory work to the point at which the main build can begin:
1. Extends the architecture or functional design to the entire system.
2. Identifies further risks uncovered in the process of extending the architecture. Reduces them to manageable proportions.
3. Carries preparatory work far enough to plan the main build, something on the order of a first draft of a PERT diagram.
4. Carries work far enough to estimate the effort, cost, and schedule of the main build and the reliability level of the completed system.
5. Similarly, extends the business plan, showing the proposal makes business sense.
In practice, folks generally don't do enough in this phase. Still, they have been getting a little better year by year.

Main build
Constructs the system.

Maintenance
Covers activities after delivery up to beginning of planning for the next generation.

Principles Underlie Process


- Do the hard stuff first: key requirements, core architecture, major risks, ballpark estimate, feasibility, significant risks, mainline architecture, construction estimate.
- Have a method for ascertaining the requirements, sorting out the key requirements, and transposing the requirements into the first stage of analysis (here Ivar Jacobson's use-case methodology plays a role).
- Since there is an order in which to accomplish the hard stuff, base the process on iteration.
- Since successive teams of developers must pick up the iterations, document them adequately, a goal to which the Unified Modeling Language is pointed.
- Since development occurs in an economic framework, collect metrics, maintain a database of them, base estimates on them, and control the execution of the process (actuals against plan) with them.

What We Have To Do
What has to be done seems clear in broad outline. We need valid software development methods; we need to communicate them; they need to be applied; they need to be simplified; we need to measure; we need to implement pertinent metric technology; and the effectiveness of development methods in implementation needs to be measured with effective metrics.

Need valid methods


The methods outlined earlier in this article seem to be generally valid to us. There are no doubt other approaches. However, the fact that they have not reached a large percentage of the industry suggests that they are not accepted as valid by many.

Need to communicate the methods


There is a great deal of communication in the industry: countless magazines, journals, books, short courses, conferences, not to mention a good deal of one-on-one effort. Much of it is effective, as progress in the industry demonstrates. Some of it is not very effective. It seems to us that there are, broadly speaking, three audiences to be reached:
- Those working on the development of new ideas; they need means of interfacing with others working in the same field. Academics have transactions and journals; methodologists work through technical magazines and newsletters. There are also conferences and personal contact. Communication at this level does not work perfectly, but there is a lot of it. We might expect incremental improvement over time, but no great breakthroughs.
- Those applying new ideas: practitioners; they need detailed books, manuals, short courses, personal contact. These means are often available. Again, they might be improved over time, but there is no reason to expect great breakthroughs.
- Those sponsoring and financing the new ideas: executive management; they need communications tailored to their function in adopting and financing new methods. In this case, the level of communication could be greatly improved. In fact, there is little of it. (That is, there is little of this sort. There are many business magazines and hundreds of books at the level of Fortune, but they don't do this job.)
If executive communication can be made more effective, it could lead to a substantial enhancement of the rate of progress in the software industry. One possible approach could be through venture capital firms. They have a strong relationship with companies seeking funds. Of course, that requires getting the word to the venture capitalists in the first place.

Need to apply the methods


New ideas do sometimes get to the point of being applied, only to have the implementation fail. There are factors, such as weak change management, changing business conditions (such as downsizing), and lack of sufficient time and money, that cause these failures, but there is one, we believe, that is often overlooked. It is the fact that many new ideas are unduly complicated.

Need to simplify the methods


The big job ahead of us is to simplify them. For example, complex computations can be relegated to a computer program; the user needs to understand only what to put in and what the output means.

Object-oriented technology is, in our opinion, an example of a technology that is still overly complicated. Where it has been made to work, it has led to gains in development effectiveness. But it has often failed. And even more often executives have been reluctant to try it. The fact that it has been around for about 35 years, and present as a well worked out concept for about 20 years, but has not been widely adopted, suggests that simplified approaches to it are not widely known. People are fearful of object technology. Grady Booch, a long-time advocate of the technology, agrees that object-oriented thinking is hard; only one in a thousand get it at the full-understanding level, from architecture on. That may be why it has been slow to get taken up.

Similarly, the Unified Modeling Language is complicated. To meet the concerns of the many critics, considerable complexity seems to have resulted. Perhaps the current version is as good as we can expect at this juncture. It should be a subject for simplification in the future.

Is it possible to simplify these better ways so that ordinary-level organizations could make them work? Is it possible to have software-development processes that would make it possible for these ordinary organizations to implement difficult ways of working? We think it is! We think so because it has already been going on for several generations. The general approach is to turn more of the work over to the computer. Specifically, it breaks down into two approaches.

The first is to turn over work that a computer can do to the computer. For instance, computers can do statistical calculations and draw statistical curves. Human beings can concentrate on what the results and the curves mean.

The second approach is to enable a computer to supply the background information that a knowledge worker needs at a particular point in his work process, rather than training the worker in advance in every detail he needs to do a complicated job. The Help programs lurking behind computer screens are early examples of this genre. A more advanced approach fits an instructional screen to each step in a standardized process. The developer, then, need not learn that step months ahead in a formal class. He can acquire what he needs when he needs it.

Obviously, each approach requires an enormous amount of work to implement. That work won't be done in a day, or a year! In fact, it will be done only as fast as bold entrepreneurs can develop a market for such products. That brings us back full circle: management has to support this kind of development by buying the products of it. History demonstrates that all this takes time.

Need to measure
Metrics is another area that has often been overcomplicated. Scores of metrics have been suggested, and some have found a place in practice. Again, software organizations should at least begin with a simplified set. With the five core metrics - time, effort, functionality (size), process productivity, and reliability (defects) - developers can control the key functions.
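Purely as an illustration of how compact such a simplified set is, the five core metrics fit in one small record per project. The field names, units, and numbers below are our own assumptions, not values prescribed by the text.

    from dataclasses import dataclass

    @dataclass
    class CoreMetrics:
        """One project's five core metrics; units are an assumption here."""
        time_months: float           # development time
        effort_person_months: float  # total effort
        size_fp: float               # functionality, e.g. in function points
        process_productivity: float  # derived index of process capability
        defects: int                 # reliability, as defects found

    # Hypothetical project figures, for illustration only.
    project = CoreMetrics(time_months=12.0, effort_person_months=96.0,
                          size_fp=850.0, process_productivity=11.2, defects=42)
    print(project)

Keeping the five together in one record also discourages quoting any one of them, such as productivity, out of context.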

Need to implement technology


The next consideration, if some technology is inescapably complicated, is how to implement it. A good deal is known about how to do this. For example, complicated implementation takes investment money up front; the implementation period may extend for years before payback begins. It takes dedicated people (and knowing how to select them and motivate them over the years); it takes a champion; it takes long-term, consistent management (financial) support; it takes training, followed by supportive supervision and mentoring. Department of Defense sponsorship of research and development through the Defense Advanced Research Projects Agency is an example of long-term support that has led to many breakthrough products and processes.

Again, however, we have the situation that a good deal is known, but it does not seem to be known to all the managements that need to utilize it. Or perhaps the forces of break-neck competition overwhelm what knowledge they have. That leads to the question: is it possible to organize new technology in a way that permits it to be implemented in a series of pieces, so that each piece is clearly paying off before an organization initiates the next piece? Of course it is possible. Intel and Hewlett-Packard, for example, have a long record of doing this in the technology area. Software development has been less successful. It takes a lot of discipline built into the development process. It takes the ability to handle complexity on a regular basis. Doing all that day in and day out is not easy.

Need for metrics


Moreover, accomplishment has to be evident to all, as reflected in some kind of metrics. The ultimate driver that influences practitioners and executive management alike to pursue a new course is results that establish that the course is succeeding. For this purpose, in our opinion, no amount of well-meant philosophizing can take the place of pertinent metrics. Pertinent, of course, is a big word. Not everything can be measured. For instance, company-wide outcomes, such as improved morale and better products, are neither easily measurable nor directly traceable to better software. Yet there may be an unmeasurable feeling that some of these favorable outcomes have been influenced by better software. ROI methods are notoriously difficult to quantify accurately.

Similarly, not everything that can be measured is pertinent, as the reams of justification figures that computers can turn out demonstrate all too often. Sometimes these reams simply overwhelm the review/decision process with junk and bulk. For instance, in software development conventional productivity (Size divided by Effort) can be measured, but it is not helpful. It is not a satisfactory measure of process improvement; in fact, it is misleading. In this field the Time that development takes is an important factor. The more satisfactory relationship includes Size, Effort, and Time.

The use of an inadequate measure of development productivity has been misleading. At best the measurement of productivity is elusive. A simple statement of output/input is often inadequate in complex fields. A more thoughtful expression - some quantity of function, in less time, for less effort/cost, at higher reliability - is a better basis for analysis. Note that these units for analysis are also the core metrics. We have come full circle!
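One well-known formulation of a relationship that combines Size, Effort, and Time is Putnam's software equation, Size = PP x (Effort/B)^(1/3) x Time^(4/3), where PP is process productivity and B is a skills factor. The sketch below solves it for PP; the value of B and the sample numbers are assumptions for illustration, and published calibrations vary.

    # Sketch of Putnam's software equation, solved for process
    # productivity PP:  Size = PP * (Effort / B)**(1/3) * Time**(4/3).
    # B is a size-dependent skills factor, fixed at 1.0 here for brevity.

    def process_productivity(size, effort_person_years, time_years, b=1.0):
        """Solve the software equation for process productivity PP."""
        return size / (((effort_person_years / b) ** (1 / 3)) * time_years ** (4 / 3))

    # Hypothetical figures: 50,000 lines of code in 1.5 years for
    # 8 person-years of effort.
    print(round(process_productivity(50_000, 8.0, 1.5)))

Note that, unlike Size divided by Effort, this expression rewards finishing in less time as well as with less effort, which is exactly the distinction drawn above.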
