e-Informatica Software Engineering Journal, Volume 3, Issue 1, 2009

Satisfying Stakeholders’ Needs – Balancing Agile and Formal Usability Test Results
Jeff Winter∗, Kari Rönkkö∗

School of Engineering, Dept. of Interaction & System Design, Blekinge Institute of Technology
jeff.winter@bth.se, kari.ronkko@bth.se

Abstract
This paper deals with a case study of testing with a usability testing package (UTUM), which
is also a tool for quality assurance, developed in cooperation between industry and research. It
shows that within the studied company, there is a need to balance agility and formalism when
producing and presenting results of usability testing to groups who we have called Designers and
Product Owners. We have found that these groups have different needs, which can be placed on
opposite sides of a scale, based on the agile manifesto. This becomes a Designer and a Product
Owner Manifesto. The test package is seen as a successful hybrid method combining agility with
formalism, satisfying organisational needs, and fulfilling the desire to create a closer relation
between industry and research.

1. Introduction

Product quality is becoming the dominant success criterion in the software industry, and Osterweil states that the challenge for research is to provide the industry with the means to deploy quality software, allowing companies to compete effectively [23]. Quality is multi-dimensional, and impossible to show through one simple measure, and research should focus on identifying various dimensions of quality and measures appropriate for it [23]. A more effective collaboration between practitioners and researchers would be of great value [23]. Quality is also important owing to the criticality of software systems (a view supported by Harrold in her roadmap for testing [14]) and even to changes in legislation that make executives responsible for damages caused by faulty software.

One approach to achieving quality has been to rely on complete, testable and consistent requirements, traceability to design, code and test cases, and heavyweight documentation. However, a demand for continuous and rapid results in a world of continuously changing business decisions often makes this approach impractical or impossible, pointing to a need for agility. In a keynote speech at the 5th Workshop on Software Quality, held at ICSE 2007 [45], Boehm stated that both agility and quality are becoming more and more important. Many areas of technology exhibit a tremendous pace of change, due to changes in technology and related infrastructures, the dynamics of the marketplace and competition, and organisational change. This is particularly obvious in mobile phone development, where the pace of development and penetration into the market has exploded over the last 5 years. This kind of situation demands an agile approach [6].

This article is based on two case studies of a usability evaluation framework called UIQ Technology Usability Metrics (UTUM) [39], the result of a long research cooperation between the research group “Use-Oriented Design and Development” (U-ODD) [37] at Blekinge Institute of Technology (BTH), and UIQ Technology (UIQ) [38]. With the help of Martin et al.’s study [21]
and our own case studies, it presents an approach to achieving quality, related to an organizational need for agile and formal usability test results. We use concepts such as “agility understood as good organizational reasons” and “plan driven processes as the formal side in testing”, to identify and exemplify a practical solution to assuring quality through an agile approach. The research question for the first case study was:
– How can we balance demands for agile results with demands for formal results when performing usability testing for quality assurance?

We use the term “formal” as a contrast to the term “agile” not because we see agile processes as being informal or unstructured, but since “formal” is more representative than “plan driven” to characterise the results of testing and how they are presented to certain stakeholders. We examine how the results of the UTUM test are suitable for use in an agile process. eXtreme Programming (XP) is used as an illustrative example in this article, but note that there is no strong connection to any particular agile methodology; rather, there is a philosophical connection between the test and the ideas behind the agile movement. We examine how the test satisfies requirements for formal and informal statements of usability and quality.

In the first study, we identify two groups of stakeholders that we designated as Designers (D) and Product Owners (PO), with an interest in the different elements of the test data. A further case study was performed to discover if these findings could be confirmed. It attempted to answer the following research questions:
– Are there any presentation methods that are generally preferred?
– Is it possible to find factors in the data that allow us to identify differences between the separate groups (D & PO) that were tentatively identified in the case study presented in the previous chapter?
– Are there methods that the respondents think are lacking in the presentation methods currently in use within UTUM?
– Do the information needs and preferred methods change during different phases of a design and development project?
– Can results be presented in a meaningful way without the test leader being present?

The structure of the article is as follows. An overview of two testing paradigms is provided. A description of the test method comes next, followed by a presentation of the methodology, and the material from the case studies, examining the balance between agility and formalism, the information needs of different stakeholders, the relationship between agility, formality and quality, and the need for research/industry cooperation. The article ends with a discussion of the work, and conclusions.

2. Testing – Prevailing Models vs. Agile Testing

Testing is performed to support quality assurance, and an emphasis on software quality requires improved testing methodologies that can be used by practitioners to test their software [14]. Since we regard the test framework as an agile testing methodology, this section presents a discussion of testing from the viewpoints of both the software engineering community and the agile community.

Within software engineering, there are many types of testing, in many process models (e.g. the Waterfall model [30], Boehm’s Spiral model [4]). Testing is often phase based, and the typical stages of testing (see e.g. [33], [25]) are Unit testing, Integration testing, Function testing, Performance testing, Acceptance testing, and Installation testing. The stages from Function testing and onwards are characterised as System Testing, where the system is tested as a whole rather than as individual pieces [25]. Usability testing (otherwise named Human Factors Testing) has been characterised as investigating requirements dealing with the user interface, and has been regarded as a part of Performance testing [25]. The prevailing approach to testing is reliant on formal aspects and best practice.
Agile software development changes how software development organisations work, especially regarding testing [34]. In agile development, exemplified here by XP [1], a key tenet is that testing is performed continuously by developers. Tests should be isolated, i.e. should not interact with the other tests that are written, and should preferably be automatic, although not all companies applying XP automate all tests [21]. Tests come from both programmers and customers, who create tests that serve to increase their confidence in the operation of the program. Customers specify functional tests to show that the system works how they expect it to, and developers write unit tests to ensure that the programs work how they think it does. These are the main testing methods in XP, but can be complemented by other types of tests when necessary. Some XP teams may have dedicated testers, who help customers translate their test needs into tests, who can help customers create tools to write, run and maintain their own tests, and who translate the customer’s testing ideas into automatic, isolated tests [1].

The role of the tester is a matter of debate. It is primarily developers who design and perform testing. However, within industry, there are seen to be fundamental differences between the people who are “good” testers and those who are good developers. In theory, it is often assumed that the tester is also a developer, even when teams use dedicated testers. Within industry, however, it is common that the roles are clearly separated, and that testers are generalists with the kind of knowledge that users have, who complement the perspectives and skills of the testers. A good tester can have traits that are in direct contrast with the traits that good developers need (see e.g. Pettichord [24] for a discussion regarding this). Pettichord claims that good testers think empirically in terms of observed behaviour, and must be encouraged to understand customers’ needs. Thus, although there are similarities, there are substantial differences in testing paradigms, how they treat testing, and the role of the tester and test designer. In our testing, the test leaders are specialists in the area of usability and testing, and generalists in the area of the product and process as a whole.

3. The UTUM Usability Evaluation Framework

UTUM is a usability evaluation framework for mass market mobile devices, and is a tool for quality assurance, measuring usability empirically on the basis of metrics for satisfaction, efficiency and effectiveness, complemented by a test leader’s observations. Its primary aim is to measure usability, based on the definition in ISO 9241-11, where usability is defined as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” [17]. This is similar to the definition of quality in use defined in ISO 9126-1, where usability is instead defined as understandability, learnability and operability [18]. The intention of the test is also to measure “The User eXperience” (UX), which is seen as more encompassing than the view of usability that is contained in e.g. the ISO standards [39], although it is still uncertain how UX differs from the traditional usability perspective [41] and exactly how UX should be defined (for some definitions, see e.g. [15], [16], [42]).

In UTUM testing, one or more test leaders carry out the test according to predefined requirements and procedure. The test itself takes place in a neutral environment rather than a lab, in order to put the test participant at ease. The test is led by a test leader, and it is performed together with one tester at a time. The test leader welcomes the tester, and the process begins with the collection of some data regarding the tester and his or her current phone and typical phone use. Whilst the test leader is preparing the test, the tester has the opportunity to get acquainted with the device to be tested, and after a few minutes is asked to fill in a hardware evaluation, a questionnaire regarding attitudes to the look and feel of the device.

The tester performs a number of use cases on the device, based on the tester’s normal phone
use or organisational testing needs. The test leader observes what happens during the use case performance, and records any observations, the time taken to complete the use cases, and answers to follow-up questions that arise. After the use case is complete, the tester answers questions about how well the telephone lets the user accomplish the use case.

When all of the use cases are completed, the tester completes a questionnaire based on the System Usability Scale (SUS) [7] about his or her subjective impressions of how easy the interface is to use. It expresses the tester’s opinion of the phone as a whole. The tester is finally thanked for their participation in the test, and is usually given a small gift, such as a cinema ticket, to thank them for their help.

The data obtained are transferred to spreadsheets. These contain both quantitative data, such as use case completion times and attitude assessments, and qualitative data, such as comments made by testers and information about problems that arose. The data is used to calculate metrics for performance, efficiency, effectiveness and satisfaction, and the relationships between them, leading to a view of usability for the device as a whole. The test leader is an important source of data and information in this process, as he or she has detailed knowledge of what happened during testing.

Figure 1. Contents of the UTUM testing, a mix of metrics and mental data (diagram labels: Interested parties, Influence, Test, Knowledge, Metrics/graphs, User satisfaction, User dissatisfaction, Appraised efficiency, Appraised inefficiency, UTUM test data, Results)

Figure 1 illustrates the flow of data and knowledge contained in the test and the test results, and how the test is related to different groups of stakeholders. Stakeholders, who can be within the organisation, or licensees, or customers in other organisations, can be seen at the top of the flow, as interested parties. Their requirements influence the design and contents of the test. The data collected is found both as knowledge stored in the mind of the test leader, and as metrics and qualitative data in spreadsheets.

The results of the testing are thereby a combination of metrics and knowledge, where the different types of data confirm one another. Metrics based material is presented in the form of diagrams, graphs and charts, showing comparisons, relations and tendencies. This can be corroborated by the knowledge possessed by the test leader, who has interacted with the testers and who knows most about the process and context of the testing. Knowledge material is often presented verbally, but can if necessary be supported and confirmed by metrics and visual presentations of the data.

UTUM has been found to be a customer driven tool that is quick and efficient, is easily transferable to new environments, and that handles complexity [44]. For more detailed information on the contents and performance of
the UTUM test and the principles behind it, see ([39], and [44]). A brief video presentation of the whole test process (6 minutes) can be found on YouTube [8].

4. The Study Methodology and the Case Studies

This work has been part of a long-term research cooperation between U-ODD and UIQ, which has centred on the development and evaluation of a usability evaluation framework (for more information, see [44], [40]). The case studies in this phase of the research cooperation were based on tests performed together by UIQ in Ronneby, and by Sony Ericsson Mobile Development in Manchester.

The process of research cooperation is action research (AR) according to the research and method development methodology called Cooperative Method Development (CMD), see [11], [10], [12] and ([28], chapter 8) for further details. AR “involves practical problem solving which has theoretical relevance” ([22] p. 12). It involves gaining an understanding of a problem, generating and spreading practical improvement ideas, applying the ideas in a real world situation and spreading the theoretical conclusions within academia [22]. Improvement and involvement are central to AR, and its purpose is to influence or change some aspect of whatever the research has as its focus ([27] p. 215). A central aspect of AR is collaboration between researchers and those who are the focus of the research. It is often called participatory research or participatory action research ([27] p. 216). CMD is built upon guidelines that include the use of ethnomethodological and ethnographically inspired empirical research, combined with other methods if suitable. Ethnography is a research strategy taken from sociology, with foundations in anthropology [29]. It relies upon the first-hand experience of a field worker who is directly involved in the setting that is under investigation [29]. CMD focuses on shop floor development practices, taking the practitioners’ perspective when evaluating the empirical research and deliberating improvements, and involving the practitioners in the improvements. This approach is inspired by a participatory design (PD) perspective. PD is an approach towards system design in which those who are expected to use the system are actively involved and play a critical role in its design. It includes stakeholders in design processes, and demands shared responsibility, participation, and a partnership between users and implementers [32].

These studies have been performed as case studies, defined by Yin as “an empirical enquiry that investigates a contemporary phenomenon within its real-life context, especially when the boundaries between phenomenon and context are not clearly evident” ([46], p. 13). Yin presents a number of criteria that are used to establish the quality of empirical social research and states that they should be applied both in the design and conduct of a case study. They deal with construct validity, internal validity, external validity and reliability ([46], pp. 35–39).

Three tactics are available to increase construct validity, which deals with establishing correct measures for the concepts being studied, and is especially problematic in case study research. These are: using multiple sources of information; ensuring a chain of evidence; and using member checking, i.e. having the key participants review the case study report. In this study, we have used many different sources of information. The data was obtained through observation, through a series of unstructured and semi-structured interviews [27], both face-to-face and via telephone, through participation in meetings between different stakeholders in the process, and from project documents and working material. The interviews have been performed with test leaders, and with staff at management level within the two companies. Interviews have been audio taped, and transcribed, and all material has been stored. The second case study involves the use of a survey. The mix of data and collection methods has given a triangulation of data that serves to validate the results that have been reached.

To ensure a chain of evidence a “study database” or research diary has been maintained. It collects all of the information in the study, allowing for traceability and transparency of the material, and reliability [46]. It is mainly computer based, and is an account of the study recording activities performed in the study, transcriptions of interviews and observation notes, and records of relevant documents and articles. The audio recordings are also stored digitally. The written document contains notations of thoughts concerning themes and concepts that arise when reading or writing material in the account of the study. The chain of evidence is also a part of the writing process.

The most important research collaborators in the industrial organisation have been an integral part of the study, and have been closely involved in many stages of the work. They have been available for testing thoughts and hypotheses during the study, giving opportunities for member checking. They have also been involved as co-authors when writing articles, which also means that member checking has been an integral part of the research.

Internal validity is especially important in exploratory case studies, where an investigator tries to determine whether one event leads to another. It must be possible to show that these events are causal, and that no other events have caused the change. If this is not done, then the study does not deal with threats to internal validity. Some ways of dealing with this are via pattern matching, explanation building, addressing rival explanations, and using logic models. This study has been a mix of exploratory and explanatory studies. To address the issues of internal validity in the case studies, we have used the general repertoire of data analysis as mentioned in the previous paragraph. The material in the research diary has been analysed to find emerging themes, in an editing approach that is consistent with Grounded Theory (see Robson [27] p. 458). The analysis process has affected the further rounds of questioning, narrowing down the focus, and shifting the main area of interest, opening up for the inclusion of new respondents who shed light on new aspects of the study. A further method for ensuring validity has been through discussions together with research colleagues, giving them the chance to react to the analysis and suggest and discuss alternative explanations or approaches.

External validity, knowing whether the results of a case study are generalisable outside the immediate case study, has been seen as a major hindrance to doing case studies, as single case studies have been seen as a poor basis for generalisation. However, this is based on a fallacious analogy, where critics contrast the situation to survey research, where samples readily generalise to a larger population. In a case study, the investigator tries to generalise a set of results to a wider theory, but generalisation is not automatic, and a theory must be tested by replicating the findings, in much the same way as experiments are replicated. Although Yin advises performing multiple-case studies, since the chances of doing a good case study are better than using a single-case design ([46], p. 53), this study has been performed as a single-case study and has been performed to generate theory. The case here represents a unique case ([46], p. 40), since the testing has mainly been performed within UIQ, and it is thereby the only place where it has been possible to evaluate the testing methodology in its actual context. One particular threat in our study is therefore that most of the data comes from UIQ. Due to close proximity to UIQ, the interaction there has been frequent and informal, and everyday contacts and discussions on many topics have influenced the interviews and their analysis. Interaction with Sony Ericsson has been limited to interviews and discussions, but data from Sony Ericsson confirms what was found at UIQ. A further threat is that most of the data in the case study comes from informants who work within the usability/testing area, but once again, they come from two different organisations and corroborate one another, have been complemented by information from other stakeholders, and thus present a valid picture of industrial reality.

A threat in the second case study is the fact that only ten people have participated. This makes it difficult to draw generalisable conclusions from the results. Also, since the company is now disbanded, it is not possible to return
to the field to perform cross checking with the participants in the study. The analysis is therefore based on the knowledge we have of the conditions at the company and the context where they worked, and is supported by discussions with people who were previously employed within the company, whom we are still in contact with. These people can however mainly be characterised as Designers, and therefore may not accurately reflect the views of Product Owners.

Thus, since this research is mainly based on a study of one company in a limited context, it is not possible to make confident claims about the external validity of the study. However, we can say that we have created theory from the study, and that readings appear to suggest that much of what we have found in this study can also be found in other similar contexts. Further work remains to see how applicable the theory is for other organisations in other or wider contexts. Extending the case study and performing a similar study in another organisation is a way of testing this theory, and further analysis may show that the case at UIQ is actually representative of the situation in other organisations.

Reliability deals with the replicability of a study, whereby a later investigator should be able to follow the same procedures as a previous investigator, and arrive at the same findings and conclusions. By ensuring reliability you minimize errors and bias in a study. One prerequisite for this is to document procedures followed in your work, and this can be done by maintaining a case study protocol to deal with the documentation problem, or the development of a case study database. The general way to ensure reliability is to conduct the study so that someone else could repeat the procedures and arrive at the same result ([46], pp. 35–39). The case study protocol is intended to guide the investigator in carrying out the data collection. It contains both the instrument and the procedures and general rules for data collection. It should contain an overview of the project, the field procedures, case study questions, and a guide for the case study report ([46], p. 69). As mentioned previously, a case study database has been maintained, containing the most important details of the data collection and analysis process. This ensures that the study is theoretically replicable. One problem regarding the replicability of this study, however, is that the rapidly changing conditions for the branch that we have studied mean that the context is constantly changing, whereby it is difficult to replicate the exact context of the study.

In the following, we begin by presenting the results of the first case study, and discuss in which way the results are agile or plan-driven/formal, who is interested in the different types of results, and which of the organisational stakeholders needs agile or formal results.

5. Agile or Formal?

The first focus of the study was the fact that testing was distributed, and the effect this had on the testing and the analysis of the results. During the case study, as often happens in case studies [46], the research question changed. Gradually, another area of interest became the elements of agility in the test, and the balance between the formal and informal parts of the testing. The framework has always been regarded as a tool for quality, and verifying this was one purpose of the testing that this case study was based on. Given the need for agility mentioned above, the intention became to see how the test is related to agile processes and whether the items in the agile manifesto can be identified in the results from the test framework. The following is the result of having studied the material from the case study from the perspective of the spectrum of different items that are taken up in the agile manifesto.

The agile movement is based on core values, described in the agile manifesto [35], and explicated in the agile principles [36]. The agile manifesto states that: “We are uncovering better ways of developing software by doing it and by helping others do it. Through this work we have come to value: Individuals and interactions over processes and tools, Working software over comprehensive documentation, Customer collaboration over contract negotiation, and Responding
to change over following a plan. That is, while who lead the test, and who actually perform
there is value in the items on the right, we value the testing on the devices. The central figure
the items on the left more”. Cockburn stresses here is the test leader, who functions as a
that the intention is not to demolish the house of pivot point in the whole process, interacting
software development, represented here by the with the testers, observing and registering
items on the right (e.g. working software over the data, and presenting the results. This
comprehensive documentation), but claims that interaction is clearly important in the long
those who embrace the items on the left rather run from a PO perspective, but it is D who
than those on the right are more likely to suc- has the greatest and immediate benefit of the
ceed in the long run [9]. Even within the ag- interaction, showing how users reacted to de-
ile community there is some disagreement about sign decisions, that is a central part of the
the choices, but it is accepted that discussions testing.
can lead to constructive criticism. Our analysis • Processes and Tools – The test is based
showed that all these elements could be identi- upon a well-defined process that can be re-
fied in the test and its results. peated to collect similar data that can be
In our research we have always been con- compared over a period of time. This is im-
scious of a division of roles within the company, portant for the designers, but in the short
often expressed as “shop floor” and “manage- term they are more concerned with the ev-
ment”, and working with a participatory design eryday activities of design and development
perspective we have worked very much from the that they are involved in. Therefore we see
shop floor point of view. During the study, this this as being of greatest interest to PO, who
viewpoint of separate groups emerged and crys- can get a long-term view of the product,
tallised, and two disparate groups became ap- its development, and e.g. comparisons with
parent. We called these groups Designers, repre- competitors, based on a stable and standard-
sented by e.g. interaction designers and system ised method.
and interaction architects, representing the shop • Working information – The test produces
floor perspective, and Product Owners, includ- working information quickly. Directly after
ing management, product planning, and market- the short period of testing that is the sub-
ing, representing the management perspective. ject of this case study, before the data was
When regarding this in light of the Agile collated in the spreadsheets, the test lead-
manifesto, we began to see how different groups ers met and discussed and agreed upon their
may have an interest in different factors of the findings. They could present the most im-
framework and the results that it can produce, portant qualitative findings to system and
and it became a point of interest to see how these interaction architects within the two organ-
factors related to the manifesto and which of the isations 14 days after the testing began,
groups, Designers (D) or Product Owners (PO), and changes in the implementation were re-
is mainly interested in each particular item in quested soon after that. An advantage of do-
the manifesto. The case study data was anal- ing the testing in-house is having access to
ysed on the basis of these emerging thoughts. the test leaders, who can explain and clarify
Where the groups were found to fit on the scale what has happened and the implications of
is marked in bold text in the paragraphs that it. This is obviously of primary interest to D.
follow. One of the items is changed from “Work- • Comprehensive documentation – The
ing software” to “Working information” as we documentation consists mainly of spread-
see the information resulting from the testing sheets containing metrics and qualitative
process as a metaphor for the software that is data. Metrics back up qualitative data and
produced in software development. open up ways to present test results that
• Individuals and interactions – The test- can be understood without having to include
ing process is dependent on the individuals contextual information. They make test re-
sults accessible for new groups. The quantitative data gives statistical confirmation of the early qualitative findings, but is regarded as most useful for PO, who want figures of the findings that have been reached. There is less pressure of time to get these results compiled, as the critical findings are already being implemented. The metrics can be subject to stringent analysis to show comparisons and correlations between different factors. In both organisations there is beginning to be a demand for Key Performance Indicators for usability, and although it is still unclear what these may consist of, it is an indication of a trend that comes from PO level.
• Customer collaboration – In the testing procedure it is important for the testers to have easy access to individuals, to gain information about customer needs, end user patterns, etc. The whole idea of the test is to collect the information that is needed at the current time regarding the product and its development. How this is done in practice is obviously of concern to PO in the long run, but in the immediate day to day operation it is primarily of interest to D.
• Contract negotiation – On a high level it is up to PO to decide what sort of cooperation should take place between different organisations and customers, and this is not something that involves D, so this is seen as most important for PO.
• Respond to change – The test is easily adapted to changes, and is not particularly resource-intensive. If there is a need to change the format of a test, or a new test requirement turns up suddenly, it is easy to change the test without having expended extensive resources on the testing. It is also easy to do a “Light” version of a test to check a particular feature that arises in the everyday work of design, and this has happened several times at UIQ. This is the sort of thing that is characteristic of the day to day work with interaction design, and is nothing that is of immediate concern for PO, so this is seen as D.
• Following a plan – From a short-term perspective, this is important for D, but since they work in a rapidly changing situation, it is more important for them to be able to respond to change. This is however important for PO, who are responsible for well functioning strategies and long-term operations in the company.

5.1. On Opposite Sides of the Spectrum

In this analysis, we found that “Designers”, as in the agile manifesto, are interested in the items on the left, rather than the items on the right (see Figure 2). We see this as being “A Designer’s Manifesto”. “Product Owners” are more interested in the items on the right. Boehm characterised the items on the right side as being “An Auditor Manifesto” [6]. We see it as being “A Product Owner’s Manifesto”. This is of course a sliding scale; some of the groups may be closer to the middle of the scale. Neither of the two groups is uninterested in what is happening at the opposite end of the spectrum, but as in the agile manifesto, while there is value in the items on one side, they value the items on the other side more. We are conscious of the fact that these two groups are very coarsely drawn, and that some groups and roles will lie between these extremes. We are unsure exactly which roles in the development process belong to which group, but are interested in looking at these extremes to see their information requirements in regard to the results of usability testing. On closer inspection it may be found that none of the groups is on the far side of the spectrum for all of the points in the manifesto. To gain further information regarding this, a case study has been performed, which we present in the next section.

6. Follow-up Study of Preferred Presentation Methods

This study is thus an investigation of attitudes regarding which types of usability findings different stakeholders need to see, and their pre-
D = Designers                                PO = Product Owners

Agile                                        Plan Driven
Individuals and interactions    D    PO      Processes & tools
Working information             D    PO      Comprehensive documentation
Customer collaboration          D    PO      Contract negotiation
Respond to change               D    PO      Following a plan

Figure 2. Groups and their diverging interests


ferred presentation methods. In the previous study we identified two groups of stakeholders with different information needs, ranging from Designers, who appear to want quick results, often qualitative results rather than quantitative results, to Product Owners, who want more detailed information, are more concerned with quantitative results, but are not as concerned with the speediness of the results. To test this theory, we sent a questionnaire to a number of stakeholders within UIQ and their customers, who are participants in the design and development process.
A document was compiled illustrating ten methods for presenting the results of UTUM tests. It contained a brief description of each presentation method and the information contained in it. The methods were chosen together with a usability expert from UIQ who often presents the results of testing to different groups of stakeholders. The methods were chosen on the basis of his experience of presenting test results to different stakeholders and are the most used and most representative ways of presenting results. The methods range from a verbal presentation of early findings, to spreadsheets containing all of the quantitative or the qualitative data from the testing, plus a number of graphical representations of the data. The methods were as follows:
Method 1: The Structured Data Summary (the SDS). A spreadsheet with the qualitative findings of the testing. It shows issues that have been found, on the basis of each tester and each device, for every use case. Comments made by the test participants and observations made by the test leader are stored in the spreadsheet.
Method 2: A spreadsheet containing all “raw” data. All of the quantitative data from a series of tests. Worksheets contain the numerical data collected in a specific series of tests, which are also illustrated in a number of graphs. The data includes times taken to complete use cases, and the results of attitude assessments.
Method 3: A Curve diagram. A graph illustrating a comparison of time taken to complete one particular use case. One curve illustrates the average time for all tested telephones, and the other curves show the time taken for individual phones.
Method 4: Comparison of two factors (basic version). An image showing the results of a series of tests, where three telephones are rated and compared with regard to satisfaction and efficiency. No more information is given in this diagram.
Method 5: Comparison of two factors (brief details). The same image as Method 4, with a very brief explanation of the findings.
Method 6: Comparison of two factors (more in depth details). The same image as
Methods 4 and 5. Here, there is a more extensive explanation of the results, and the findings made by the test leader. The test leader has also written suggestions for short term and long term solutions to issues that have been found.
Method 7: The “Form Factor” – an immediate response. A visual comparison of which telephone was preferred by men and women, where the participants were asked to give an immediate response to the phones, and choose a favourite phone on the basis of “Form Factor” – the “pleasingness” of the design.
Method 8: PowerPoint presentation, no verbal presentation. A PowerPoint presentation, produced by the test leader. A summary of the main results is presented graphically and briefly in writing. This does not give the opportunity to ask follow-up questions in direct connection with the presentation.
Method 9: Verbal presentation supported by PowerPoint. A PowerPoint presentation, given by the test leader. A summary of the main results is presented graphically and briefly in writing, and explained verbally, giving the listener the chance to ask questions about e.g. the findings and suggestions for improvements. This type of presentation takes the longest to prepare and deliver.
Method 10: Verbal presentation of early results. The test leader gives a verbal presentation of the results of a series of tests. These are based mainly on his or her impressions of issues found, rather than an analysis of the metrics, and can be given after having observed a relatively small number of tests. This is the fastest and most informal type of presentation, and can be given early in the testing process.
The participants in the study were chosen together with the usability expert at UIQ. Some of the participants were people who are regularly given presentations of test results, whilst others were people who are not usually recipients of the results, but who in their professional roles could be assumed to have an interest in the results of usability testing. They were asked to read the document and complete the task by filling in their preferences in a spreadsheet. The results of the survey were returned to the researcher via e-mail, and have been summarised in a spreadsheet and then analysed on the basis of a number of criteria to see what general conclusions can be drawn from the answers.
The method whereby the participants were asked to prioritize the presentation methods was based on cumulative voting [19], [43], a well-known voting system in the political and the corporate sphere ([13], [31]), also known as the $100 test or $100 method [20]. Cumulative voting is a method that has previously been used in the software engineering context, e.g. for software requirement prioritization [26] and the prioritization of process improvements [3], and in [2], where it is compared to and found to be superior to the Analytical Hierarchy Process in several respects.
The questionnaire was sent to 29 people, mostly within UIQ but also to some people from UIQ’s licensees. Only six respondents had replied to the questionnaire within the stipulated time, so one day after the first deadline, we sent out a reminder to the respondents who had not answered. This resulted in a further three replies. After one more week, we sent out a final reminder, leading to one more reply. Thus, we received 10 replies to the questionnaire, of which nine were from respondents within UIQ. On further enquiry, the reason given for not replying to the questionnaire was in general the fact that the company was in an intensive working phase for a planned product release, and that the staff at the company could not prioritise allocating the time needed to complete the questionnaire. This makes it impossible to give full answers to the research questions in this study, although it helps us to answer some of the questions, and gives us a better understanding of factors that affect the answers to the other questions. This study helps us formulate hypotheses for further work regarding these questions.
The division of roles amongst the respondents, and the number of respondents in the categories was as follows:
• 2: UI designers
• 2: Product planning
• 4: System design
• 1: Other (Usability)
• 1: Other (CTO Office)
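As an illustration of the cumulative voting ($100 test) procedure described above, the following minimal Python sketch tallies point allocations. It assumes that each respondent distributes exactly 100 points over the methods they choose; the function name, the data layout and the example allocations are hypothetical and are not taken from the study's spreadsheet.

```python
from collections import defaultdict

POINTS_PER_RESPONDENT = 100  # assumed budget of the "$100 test"


def tally_cumulative_votes(allocations):
    """Sum the points per method and count how often each method was chosen.

    allocations maps a respondent id to a dict {method_number: points}.
    Returns (total_points, times_chosen), both dicts keyed by method number.
    """
    total_points = defaultdict(int)
    times_chosen = defaultdict(int)
    for respondent, votes in allocations.items():
        if any(points < 0 for points in votes.values()):
            raise ValueError(f"{respondent}: negative points are not allowed")
        if sum(votes.values()) != POINTS_PER_RESPONDENT:
            raise ValueError(f"{respondent}: points must sum to {POINTS_PER_RESPONDENT}")
        for method, points in votes.items():
            if points > 0:  # a method with zero points counts as not chosen
                total_points[method] += points
                times_chosen[method] += 1
    return dict(total_points), dict(times_chosen)


# Hypothetical allocations for illustration only; not data from the study.
example = {
    "respondent_a": {9: 50, 10: 30, 6: 20},
    "respondent_b": {8: 70, 6: 30},
}
totals, chosen = tally_cumulative_votes(example)
print(totals)  # points per method
print(chosen)  # number of respondents who chose each method
```

Rejecting allocations that do not sum to 100 is a design choice of this sketch; a real analysis might instead normalise such answers.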
[Box and whisker plot: “Distribution of allocated points”. Y-axis: points allocated (0–70), with the mean marked. X-axis: test respondents, labelled by role.]

Figure 3. Distribution of points allocated per respondent


We have divided the respondents according to the tentative schema found in the first case study, between Designers (D) and Product Owners (PO). Some respondents were difficult to place in a particular category. The roles the respondents held in the company were discussed with a member of the management staff at UIQ, with long work experience at the company, who was well versed in the thoughts we had regarding the difference between Designers and Product Owners. Due to turbulence within the company, it was not possible to verify the respondents’ attitudes to their positions, and this would have been difficult in any case, since they were not familiar with the terminology that we used, and the meaning of the roles that we had specified.
Five respondents, the two UI designers, the usability specialist and two of the system designers, belonged to the Designer group. The remaining five respondents, the two members of product planning, the respondent from the CTO office and two of the system designers, were representatives of the group of Product Owners.
Figure 3 is a box and whisker plot that shows the distribution of the points and the mean points allocated per person. As can be seen, the spread of points differs greatly from person to person. Although this reflects the actual needs of the respondent, the way of allocating points could also reflect tactical choices, or even the respondent’s character. Getting more information about how the choices were made would require a further study, where the respondents were interviewed concerning their strategies and choices.
In what follows, we use various ways of summarising the data. To obtain a composite picture of the respondents’ attitudes, the methods are ranked according to a number of criteria. Given the small numbers of respondents in the study, this compilation of results is used to give a more complex picture of the results, rather than simply relying on one aspect of the questionnaire. The methods are ranked according to: the total number of points that were allocated by all respondents; the number of times the method has been chosen; and the average ranking, which is the sum of the rankings given by each respondent, divided by the number of respondents that chose the method (e.g., if one respondent chose a method in first place, whilst another respondent chose it in third place, the average position is
[Line chart: “Comparison of different stakeholder groups”. Y-axis: assigned rank (0–10). X-axis: method number (Methods 1–10). Series: All respondents, Designers, Product Owners.]

Figure 4. Comparison: All, Designers and Product Owners (lowest point is best)

(1+3)/2 = 2). A lower average ranking means a better result in the evaluation, although it gives no information about the number of times the method has been chosen.
Figure 4 shows a summary of the results for all respondents and a comparison with the results for the group of Designers and Product Owners. For all respondents, total rank is very similar to ranking according to points allocated, and only two methods (ranked 5 and 6) have swapped places. Three methods head the list. Two are verbal presentations, one being supported by PowerPoint and the other being purely verbal.
Even within the two groups of Designers and Product Owners, there is little discrepancy between the results for total rank and position according to points awarded. Table 1 illustrates the ranks. The Methods are ordered according to the points allocated by all respondents. The next columns show the composite results, for all respondents and according to the two groups. Cases where the opinions differ significantly between Designers and Product Owners (a difference of 3 places or more) will be the subject of a brief discussion, to see whether we can draw any tentative conclusions about the presentation requirements of the different stakeholder groups. These methods, which are shown in italics in Table 1, are Methods 1, 3, 4 and 8. Since the company has now ceased operations, it is no longer possible to do a follow-up study of the attitudes of the participants, so the analysis is based on the knowledge we have of the operations at the company and the context where they worked. To verify these results, further studies are needed.
Method 3: The Curve Diagram. Designers ranked this presentation highly because if it is interpreted properly, it can give a great deal of information about the use case as it is performed on the device. If the device performs poorly in comparison to the other devices, which can easily be seen by the placement and shape of the curve, this indicates that there are problems that need to be investigated further. Use case performance time indicates the performance of the device, which can be correlated with user satisfaction. The shape of the curve illustrates when problems arose. If problems arise when performing the use case, these will be visible in the diagram and the Designers will know that there are issues that must be attended to.
Product Owners ranked this method poorly because the information is on the level of an individual use case, whilst they need information about the product or device at a coarser level of detail that is easy to interpret, giving an overall view of the product. They trust that problems at this level of detail are dealt with by the Designers, whilst they have responsibility for the product and process as a whole.
Table 1. Comparison of ranks: All, Designers and Product Owners


                                              Rankings
Method                          All respondents   Designers   Product Owners   Difference between groups
9. Verbal & PowerPoint                 1              1              3                     2
6. Comparison (more details)           2              3              2                     1
10. Verbal                             3              4              4                     0
8. PowerPoint                          4              7              1                     6
5. Comparison (brief details)          6              6              5                     1
3. Curve diagram                       5              2              9                     7
1. The SDS                             7              5             10                     5
4. Comparison (no details)             8              8              5                     3
7. “Form factor”                       9              9              7                     2
2. Spreadsheet                        10             10              8                     2
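The average ranking criterion and the “Difference between groups” column of Table 1 can be reproduced with a short Python sketch. The rank pairs below are copied from Table 1; the function and variable names are our own illustrative choices, and the threshold of 3 places is the one stated in the text.

```python
def average_rank(ranks):
    """Mean of the ranks given by the respondents who chose a method."""
    return sum(ranks) / len(ranks)


# Ranks from Table 1: method number -> (Designers, Product Owners)
TABLE_1 = {
    9: (1, 3), 6: (3, 2), 10: (4, 4), 8: (7, 1), 5: (6, 5),
    3: (2, 9), 1: (5, 10), 4: (8, 5), 7: (9, 7), 2: (10, 8),
}


def significant_differences(table, threshold=3):
    """Methods whose Designer and Product Owner ranks differ by at least threshold."""
    return sorted(
        method
        for method, (designers, product_owners) in table.items()
        if abs(designers - product_owners) >= threshold
    )


print(average_rank([1, 3]))              # 2.0, the worked example (1+3)/2
print(significant_differences(TABLE_1))  # [1, 3, 4, 8]
```

The second result matches the four methods singled out for discussion in the text.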

Method 8: PowerPoint presentation, no verbal presentation. This can contain several ways of presenting the results of testing. Designers find this type of presentation of limited use because of the lack of contextual information and the lack of opportunity to pose follow-up questions. It gives a lot of information, but does not contain sufficient details to allow Designers to identify the problems or make decisions about solutions. Without details of the context and what happened in the testing situation, it is hard to interpret differences between devices, to know which problems there are in the device, and thereby difficult to know what to do about the problems. The length of time taken to produce the presentation also means that it is not suitable for Designers, who are concerned with fixing product issues as early in the development process as possible. We also believe that there is a difference in “culture”, where Designers are still unused to being presented with results in this fashion, and cannot translate this easily to fit in with their work practices.
This type of presentation is of primary interest to Product Owners because it provides an overall view of the product in comparison to other devices, without including too much information about the context and test situation. It contains sufficient text, and gives an indication of the status of the product. It is also adapted to viewing without the presence of the test leader, so the recipient can view the presentation and return to it at will. Product Owners are often schooled in an engineering tradition and are used to this way of presenting information.
Method 1: The Structured Data Summary (the SDS). Designers value this method of presentation because of the extent and character of the contextual information it includes, and because of the way the data is visualised. For every device and use case, there is information on issues that were observed, and records of comments made by the testers. It is easy to see which use cases were problematic, due to the number of comments written by the test leader, and the presence of many user comments also suggests that there are issues that need investigation. The contextual information gives clues to problems and issues that must be dealt with and gives hints on possible solutions. The effort required to read and summarise the information contained in the spreadsheet, leading to a degree of cognitive friction, means however that it is rated in the middle of the field rather than higher.
Product Owners rate this method poorly because they are uninterested in products on the level of use cases, which this presentation provides, and it is difficult to interpret for the device as a whole. The information is not adapted to the broad view of the product that the Product Owners need. The contextual information is difficult to summarise and does not give a readily understandable view of the device as a whole. Product Owners find it difficult to make use of the information contained in this spread-
sheet and thereby rank it as least useful for their needs.
Method 4: Comparison of two factors (basic version). The lack of detail and of contextual information makes it difficult for Designers to read any information that allows them to identify problems with the device. It simply provides them with a snapshot of how their product compares to other devices at a given moment.
Product Owners ranked this in the middle of the field. This is a simple way of visualising the state of the product at a given time, which is easy to compare over a period of time, to see whether a device is competitive with the other devices included in the comparison. This is typically one of the elements that are included in the PowerPoint presentation that Product Owners have ranked highest (Method 8). However, this particular method, when taken in isolation, lacks the richness of the overall picture given in Method 8 and is therefore ranked lower.
To summarise these results, we find that the greatest difference between the two groups concerns the level of detail included in the presentation, the ease with which the information can be interpreted, and the presence of contextual information in the presentation. Designers prioritise methods that give specific information about the device and its features. Product Owners prioritise methods that give more overarching information about the product as a whole, and that is not dependent on including contextual information.

6.1. Changing Information Needs

Participants were informed that the survey was mainly focused on the presentation of results that are relevant during ongoing design and development. We pointed out that we believed that different presentation methods may be important in the starting and finishing phases of these processes. We stated that comments regarding this would be appreciated. Three respondents wrote comments about this factor.
One respondent (D) stated that the information needed in their everyday work as a UI designer, in the early stages of projects when the interaction designers are most active, was best satisfied through the verbal presentations of early results and verbal presentation supported by PowerPoint, whilst a non-verbal presentation, in conjunction with the metrics data in the spreadsheet and the SDS, would be more appropriate later in the project, where the project activities were no longer as dependent on the work tasks and activities of the interaction designers.
A second respondent (D) stated that the verbal presentations are most appropriate in the requirements/design processes. Once the problem domain is understood, and the task is to iterate towards the best solution, the metrics data and the SDS would become more appropriate, because the problem is understood and the quantitative answers are more easily interpreted than the qualitative answers.
Another respondent (PO) wrote that it was important to move the focus from methods that were primarily concerned with verification towards methods that could be of assistance in requirements handling, in prioritisation and decision making in the early phases of development. In other words, the methods presented are most appropriate for later stages of a project, and there is a lack of appropriate methods for early stages.
Given the limited number of answers to these questions, it is of course difficult to draw any general conclusions, although it does appear to be the case that the verbal results are most important in the early stages of a project, to those who are involved in the actual work of designing and developing the product, whilst the more quantitative data is more useful as reference material in the later stages of a project, or further projects.

6.2. Attitudes Towards the Role of the Test Leader

The respondents were asked to judge whether or not they would need the help of the test leader in order to understand the presentation method in question. Two of the respondents supplied no answers to this question, and one of the respondents only supplied answers regarding
methods 9 and 10, which presuppose the presence of the test leader and are therefore excluded from the analysis. If we exclude these three respondents from the summary, there were seven respondents, of whom four gave answers for all eight methods, one gave five answers, and two gave three answers. The three respondents who did not answer these questions were all Product Owners, meaning that there were five Designers and two Product Owners who answered these questions.
Analysis of the answers showed that, with the exception of Method 7, the methods that are primarily graphical representations of the data do not appear to require the presence of the test leader to explain the presentation. Method 7 was found to require the presence of the test leader, presumably because it was not directly concerned with the operations of the company. The spreadsheets, however, one containing qualitative and one containing quantitative data, both require the presence of a test leader to explain the contents.
Given the fact that the Designers were in the majority, there were few obvious differences between Designers and Product Owners, although the most consistent findings here regard methods 4, 5, and 6, variations of the same presentation method with different amounts of written information. Here, Product Owners needed the test leader to be present whilst Designers did not.

6.2.1. In Summary

We now summarise the results of the research questions posed in this case study. The answer to the first question, whether any presentation methods are generally preferred, is that the respondents as a whole generally preferred verbal presentations. The primarily verbal methods are found in both first and third place. The most popular form was a PowerPoint presentation that was supported by verbal explanations of the findings. In second place is a non-verbal illustration showing a comparison of two factors, where detailed information is given explaining the diagram and the results it contains. This type of presentation is found in several variants in the study, and those with more explanatory detail are more popular than those with fewer details. Following these is a block of graphical presentation methods that are not designed to be dependent on verbal explanations. Amongst these is a spreadsheet containing qualitative data about the test results. At the bottom of the list is a spreadsheet that contains the quantitative data from the study. This presentation differs in character from the SDS, the spreadsheet containing qualitative data, since the SDS offers a view of the data that allows the identification of problem areas for the tested devices. This illustrates the fact that even a spreadsheet, if it offers a graphical illustration of the data that it contains, can also be found useful for stakeholders, even without an explicit explanation of the data that it contains.
Concerning the second question, we could identify differences between the two groups of stakeholders, and the greatest difference between the groups concerns the level of detail included in the presentation, the ease with which the information can be interpreted, and the presence of contextual information in the presentation. Designers prioritise methods that give specific information about the device and its features. Product Owners prioritise methods that give more overarching information about the product as a whole, and that is not dependent on including contextual information. We also found that both groups chose PowerPoint presentations as their preferred method, but that the Designers chose a presentation that was primarily verbal, whilst Product Owners preferred the purely visual presentation. Another aspect of this second question is the attitude towards the role of the test leader, where there were few obvious differences between Designers and Product Owners. The most consistent findings here concern variations of the same presentation method with different amounts of written information. Here, Product Owners needed the test leader to be present whilst Designers did not.
Regarding the third question, if there are methods that are lacking in the current presen-
tation methods, it was found that taking into for quality assurance in development and design
account and visualising aspects of UX is be- processes.
coming more important, and the results indi-
cate that testing must be adapted to capture
these aspects more implicitly. There is also a 7. Discussion
need for a composite presentation method com-
bining the positive features of all of the cur- We begin by discussing our results in relation
rent methods – however, given the fact that to academic discourses, to answer our first re-
there do appear to be differences between in- search question: How can we balance demands
formation needs, it may be found to be dif- for agile results with demands for formal results
ficult to devise one method that satisfies all when performing usability testing for quality as-
groups. surance? We also comment upon two related dis-
No clear answers can be found for the fourth courses from the introductory chapter, i.e. the
question, whether information needs, and pre- relation between quality and a need for coop-
ferred methods change during different phases eration between industry and research, and the
of a design and development project. How- relationship between quality and agility.
ever, the replies suggest that the required meth- Since we work in a mass-market situation,
ods do change during a project, that more and the system that we are looking at is too
verbally oriented and qualitative presentations large and complex for a single customer to spec-
are important in early stages of a project, in ify, the testing process must be flexible enough
the concrete practice of design and develop- to accommodate the needs of many different
ment, and that quantitative orientated methods stakeholders. The product must appeal to the
are important in later stages and as reference broadest possible group, so it is difficult for
material. customers to operate in dedicated mode with
Regarding the final question, whether results the development team, with sufficient knowledge
can be presented without the presence of the to span the whole range of the application,
test leader, we find that the methods that are which is what an agile approach requires to
primarily graphical representations of the data work best [5]. In this case, test leaders work
do not appear to require the presence of the test as proxies for the user in the mass market.
leader to explain the presentation. The spread- We had a dedicated specialist test leader who
sheets however, containing qualitative and quan- brought in the knowledge that users have, in
titative data, both require the presence of a test accordance with Pettichord [24]. Evidence sug-
leader to explain the contents. gests that drawing and learning from experi-
To verify these results, further studies are of ence may be as important as taking a ratio-
course needed. Despite the small scale of this nal approach to testing [21]. The fact that the
study, the results give a basis for performing a test leaders involved in the testing are usabil-
further study, and allow us to formulate a hy- ity experts working in the field in their every-
pothesis for following up our results. In line with day work activities means that they have con-
the rest of the work performed as part of this siderable experience of their products and their
research, we feel that the this work should be a field. They have specialist knowledge, gained
survey based study in combination with an in- over a period of time through interaction with
terview based study, in order to verify the results end-users, customers, developers, and other par-
from the survey and gain a depth of information ties that have an interest in the testing process
that is difficult to obtain from a purely survey and results. This is in line with the idea that
based study. agile methods get much of their agility from
We continue by discussing the results of the a reliance on tacit knowledge embodied in a
two case studies in relation to the industrial situ- team, rather than from knowledge written down
ation where we have been working, and the need in plans [5].

It would be difficult to gain acceptance of the test results within the whole organisation without the element of formalism. In sectors with large customer bases, companies require both rapid value and high assurance. This cannot be met by pure agility or plan-driven discipline; only a mix of these is sufficient, and organisations must evolve towards the mix that suits them best [5]. In our case this evolution has taken place during the whole period of the research cooperation, and has reached a phase where it has become apparent that this mix is desirable and even necessary.

In relation to the above, Osterweil [23] states that there is a body of knowledge that could do much to improve quality, but that there is “a yawning chasm separating practice from research that blocks needed improvements in both communities”, thereby hindering quality. Practice is not as effective as it must be, and research suffers from a lack of validation of good ideas and redirection that result from serious use in the real world. This case study is part of a successful cooperation between research and industry, where the results enrich the work of both parties. Osterweil [23] also requests the identification of dimensions of quality and measures appropriate for them. The particular understanding of agility discussed in our case study can be an answer to this request. The agility of the test process is in accordance with the “good organisational reasons” for “bad testing” that are argued by Martin et al. [21]. These authors state that testing research has concentrated mainly on improving the formal aspects of testing, such as measuring test coverage and designing tools to support testing. However, despite advances in formal and automated fault discovery and their adoption in industry, the principal approach for validation and verification appears to be demonstrating that the software is “good enough”. Hence, improving formal aspects does not necessarily help to design the testing that most efficiently satisfies organisational needs and minimises the effort needed to perform testing. In the results of their work, the main reason for not adopting “best practice” is precisely to orient testing to meet organisational needs. Our case is a confirmation of [21]. Here, it is based on the dynamics of customer relationships, using limited effort in the most effective way, and the timing of software releases to the needs of customers as to which features to release. The present paper illustrates how this happens in industry, since the agile type of testing studied here is not according to “best practice” but is a complement that meets organisational needs for a mass-market product in a rapidly changing marketplace, with many different customers and end-users.

To summarise our second case study, the findings presented here are the results of a preliminary study that indicates the needs of different actors in the telecom industry. They are a validation of the ways in which UTUM results have been presented. They provide guidelines for improving the ways in which the results can be presented in the future. They are also a confirmation of the fact that there are different groups of stakeholders, the Designers and Product Owners found in our first case study, who have different information requirements. Further studies are obviously needed, but despite the small scale of this study, it is a basis for performing a wider and deeper study, and it lets us formulate a hypothesis regarding the presentation of testing results. We feel that the continuation of this work should be a survey based study in combination with an interview based study.

8. Conclusions and Further Work

In the usability evaluation framework, we have managed to implement a working balance between agility and plan driven formalism to satisfy practitioners in many roles. The industrial reality that has driven the development of this test package confirms the fact that quality and agility are vital for a company that is working in a rapidly changing environment, attempting to develop a product for a mass market. There is also an obvious need for formal data that can support the quick and agile results. The UTUM test package demonstrates one way to balance demands for agile results with demands for formal results when performing usability testing for quality assurance. The test package conforms to both the Designer’s manifesto and the Product Owner’s manifesto, and ensures that there is a mix of agility and formalism in the process.

The case in the present paper confirms the argumentation emphasising ‘good organisational reasons’, since this type of testing is not according to “best practice” but is a complement that meets organisational needs for a mass-market product in a rapidly changing marketplace, with many different customers and end-users. This is partly an illustration of the chasm between industry and research, and partly an illustration of how agile approaches are taken to adjust to industrial reality. In relation to the former, this case study is a successful cooperation between research and industry. It has been ongoing since 2001, the work has an impact in industry, and the results enrich the work of both parties. The inclusion of Sony Ericsson in this case study gave even greater possibilities to spread the benefits of the cooperative research. More and more hybrid methods are emerging, where agile and plan driven methods are combined, and success stories are beginning to emerge. We see the results of this case study and the UTUM test as being one of these success stories. How do we know that the test is successful? By seeing that it is in successful use in everyday practice in an industrial environment. We have found a successful balance between agility and formalism that works in industry and that exhibits qualities that can be of interest to both the agile and the software engineering community.

Acknowledgements. This work was partly funded by The Knowledge Foundation in Sweden under a research grant for the software development project “Blekinge – Engineering Software Qualities”, www.bth.se/besq. Thanks to the participants in the study and to my colleagues in the U-ODD research group for their help in data analysis and structuring my writing. Thanks also to Gary Denman for permission to use the Structured Data Summary.

References

[1] K. Beck. Extreme Programming Explained. Addison Wesley, Reading, MA, 2000.
[2] P. Berander and P. Jönsson. Hierarchical cumulative voting (HCV) – prioritization of requirements in hierarchies. International Journal of Software Engineering & Knowledge Engineering, 16(6):819, 2006.
[3] P. Berander and C. Wohlin. Identification of key factors in software process management – a case study. In 2003 International Symposium on Empirical Software Engineering, ISESE ’03, pages 316–325, Rome, Italy, 2003.
[4] B. W. Boehm. A spiral model of software development and enhancement. Computer, 21(5):61–72, 1988.
[5] B. W. Boehm. Get ready for agile methods, with care. Computer, 35(1):64–69, 2002.
[6] B. W. Boehm. Keynote address, 5th Workshop on Software Quality (WoSQ), 2007.
[7] J. Brooke. System usability scale (SUS): a Quick-and-Dirty method of system evaluation user information, 1986.
[8] BTH. UIQ, usability test. http://www.youtube.com/watch?v=5IjIRlVwgeo, Aug. 2008.
[9] A. Cockburn and J. Highsmith. Agile Software Development. The Agile Software Development Series. Addison-Wesley, Boston, 2002.
[10] Y. Dittrich, C. Floyd, and R. Klischewski. Doing Empirical Research in Software Engineering: finding a path between understanding, intervention and method development, pages 243–262. MIT Press, 2002.
[11] Y. Dittrich, K. Rönkkö, J. Erickson, C. Hansson, and O. Lindeberg. Co-operative method development: Combining qualitative empirical research with method, technique and process improvement. Journal of Empirical Software Engineering, 2007.
[12] Y. Dittrich, K. Rönkkö, O. Lindeberg, J. Erickson, and C. Hansson. Co-operative method development revisited. SIGSOFT Softw. Eng. Notes, 30(4):1–3, 2005.
[13] J. N. Gordon. Institutions as relational investors: A new look at cumulative voting. Columbia Law Review, 94(4):124–193, 1994.
[14] M. J. Harrold. Testing: A roadmap. In Proceedings of the Conference on the Future of Software Engineering, Limerick, Ireland, 2000. ACM Press.
[15] M. Hassenzahl, E. L. Law, and E. T. Hvannberg. User experience – towards a unified view. In UX WS NordiCHI’06, pages 1–3, Oslo, Norway, 2006. cost294.org.
[16] M. Hassenzahl and N. Tractinsky. User experience – a research agenda. Behaviour & Information Technology, 25(2):91–97, 2006.
[17] International Organization for Standardization. ISO 9241-11 (1998): Ergonomic requirements for office work with visual display terminals (VDTs) – part 11: Guidance on usability. Technical report, 1998.
[18] International Organization for Standardization. ISO 9126-1 software engineering – product quality – part 1: Quality model, 2001.
[19] Investopedia.com. Cumulative voting. http://www.investopedia.com/terms/c/cumulativevoting.asp, Apr. 2009.
[20] D. Leffingwell and D. Widrig. Managing Software Requirements: A Use Case Approach, 2nd edition. Addison Wesley, 2003.
[21] D. Martin, J. Rooksby, M. Rouncefield, and I. Sommerville. ‘Good’ organisational reasons for ‘Bad’ software testing: An ethnographic study of testing in a small software company. In ICSE ’07, Minneapolis, MN, 2007. IEEE.
[22] E. Mumford. Advice for an action researcher. Information Technology and People, 14(1):12–27, 2001.
[23] L. Osterweil. Strategic directions in software quality. ACM Computing Surveys (CSUR), 28(4):738–750, 1996.
[24] B. Pettichord. Testers and developers think differently. STGE magazine, Vol. 2(Jan/Feb 2000 (Issue 1)), 2000.
[25] S. L. Pfleeger and J. M. Atlee. Software Engineering, 3rd edition. Prentice Hall, Upper Saddle River, NJ, 2006.
[26] B. Regnell, M. Höst, J. N. och Dag, P. Beremark, and T. Hjelm. An industrial case study on distributed prioritisation in Market-Driven requirements engineering for packaged software. Requirements Engineering, 6(1):51–62, 2001.
[27] C. Robson. Real World Research, 2nd edition. Blackwell Publishing, Oxford, 1993.
[28] K. Rönkkö. Making Methods Work in Software Engineering: Method Deployment as a Social achievement. PhD thesis, Blekinge Institute of Technology, School of Engineering, 2005. Dissertation Series No. 2005:04; Doctoral Thesis.
[29] K. Rönkkö. Ethnography. In P. Laplante, editor, Encyclopedia of Software Engineering. Taylor and Francis Group, New York, 2008.
[30] W. W. Royce. Managing the development of large software systems: concepts and techniques. In 9th international conference on Software Engineering, pages 328–338, Monterey, California, United States, 1987. IEEE Computer Society Press.
[31] J. Sawyer and D. McRae, Jr. Game theory and cumulative voting in Illinois: 1902–1954. The American Political Science Review, 56(4):936–946, 1994.
[32] D. Schuler and A. Namioka. Participatory Design – Principles and Practices, 1st edition. Lawrence Erlbaum Associates, Hillsdale, New Jersey, 1993.
[33] I. Sommerville. Software Engineering, 8th edition. Addison Wesley, 1982.
[34] D. Talby, O. Hazzan, Y. Dubinsky, and A. Keren. Agile software testing in a Large-Scale project. IEEE Software, 23(4):30–37, 2006.
[35] The Agile Alliance. The agile manifesto. http://agilemanifesto.org/, Apr. 2009.
[36] The Agile Alliance. Principles of agile software. http://www.agilemanifesto.org/principles.html, Apr. 2009.
[37] U-ODD. Use-Oriented Design and Development. http://www.bth.se/tek/u-odd, Apr. 2009.
[38] UIQ Technology. Company information. http://uiq.com/aboutus.html, June 2008.
[39] UIQ Technology. UIQ Technology Usability Metrics. http://uiq.com/utum.html, June 2008.
[40] UIQ Technology. UTUM website. http://uiq.com/utum.html, June 2008.
[41] UXEM. User eXperience Evaluation Methods in product development (UXEM). http://www.cs.tut.fi/ihte/CHI08_workshop/slides/Poster_UXEM_CHI08_V1.1.pdf, June 2008.
[42] UXNet: the user experience network. http://uxnet.org/, June 2008.
[43] Wikipedia. Cumulative voting. http://en.wikipedia.org/wiki/Cumulative_voting, Apr. 2009.
[44] J. Winter, K. Rönkkö, M. Ahlberg, M. Hinely, and M. Hellman. Developing quality through measuring usability: The UTUM test package. In ICSE 2007, 5th Workshop on Software Quality, at ICSE 2007, 2007.
[45] WoSQ. Fifth workshop on software quality, at ICSE 07. http://attend.it.uts.edu.au/icse2007/, June 2008.
[46] R. K. Yin and S. Robinson. Case Study Research – Design and Methods, 3rd edition. Applied Social Research Methods Series, volume 5. SAGE Publications, 2003.
