
Information and Software Technology 55 (2013) 651-659
Concept location using program dependencies and information retrieval (DepIR)

Maksym Petrenko a,*, Václav Rajlich b
a Earth System Science Interdisciplinary Center, University of Maryland, College Park, USA
b Department of Computer Science, Wayne State University, Detroit, USA
Article info
Article history:
Received 21 June 2012
Received in revised form 28 September 2012
Accepted 30 September 2012
Available online 13 October 2012
Abstract
Context: The functionality of a software system is most often expressed in terms of concepts from its problem or solution domains. The process of finding where these concepts are implemented in the source code is known as concept location and it is a prerequisite of software change.
Objective: We investigate a static approach to concept location named DepIR that combines program dependency search (DepS) with information retrieval-based search (IR). In this approach, programmers explore the static program dependencies of the source code components retrieved by the IR search engine.
Method: The paper presents an empirical study that compares DepIR with its constituent techniques. The evaluation is based on an empirical method of reenactment that emulates the steps of concept location for 50 past changes mined from software repositories of five software systems.
Results: The results of the study indicate that DepIR significantly outperforms both DepS and IR.
Conclusion: DepIR allows developers to perform concept location efficiently. It allows finding concepts even with queries that do not rank the relevant software components highly. Since formulating a good query is not always easy, this tolerance of lower-quality queries significantly broadens the usability of DepIR compared to the traditional IR.
© 2012 Elsevier B.V. All rights reserved.
Keywords:
IR
Dependency search
Concept location
1. Introduction
Concept (or feature) location is one of the most common program comprehension activities and it identifies parts of the source code that implement particular concepts. Concept location has been addressed in the context of many software engineering tasks [1,2]; it is an important and well defined part of the software change process [3].
In this process, concept location starts with the change request, which can be a new feature request or a bug report, and identifies the location in the code where the core of the change needs to be made. While concept location identifies the specific code snippet where the change starts, the full extent of the software change is established via impact analysis, which uses the results of concept location as an input and identifies all the remaining locations in source code that should be modified. These two activities are recognized to be complementary to each other yet they are methodologically different [4,5].
Developers start concept location by extracting relevant concepts from the change request. Then, they locate them in the source code, which is a search process, and may be difficult and time consuming in large software. Concept location tools and techniques assist developers in this activity. Concept location tools should make use of a variety of available information in assisting the developer during the search.
* Corresponding author. Tel.: +1-301-614-5830.
E-mail addresses: maksym@umd.edu (M. Petrenko), rajlich@wayne.edu (V. Rajlich).
http://dx.doi.org/10.1016/j.infsof.2012.09.013
In this paper, we propose a concept location technique, named Dependency Search combined with Information Retrieval (DepIR), which guides developers in two ways: (1) by supporting searching the textual information in the source code (i.e., identifiers and comments) via the use of an Information Retrieval based search engine (IR); and (2) by supporting dependency search (DepS) to navigate the source code based on automatically inferred static program dependencies.
We evaluate DepIR using the process of automated reenactment, where an algorithm repeats past changes and simulates actions of programmers during concept location, while using a specific concept location technique. The results of the study in five systems indicate that DepIR offers significant advantages when compared to the traditional DepS and IR approaches.
The rest of the paper is organized in the following way. Section 2 summarizes the previous work. Section 3 reviews the relevant background and presents the methodology and the tool support behind DepIR. In Section 4, DepIR is evaluated through an empirical study in open source software, while Section 5 contains conclusions and future work.
2. Related work
Based on types of analyses and underlying information, existing approaches to concept location can be classified as static, dynamic, and hybrid [6]. In this work, we study techniques that belong to the category of static concept location [2,7]. One of the most popular approaches to static concept location is textual pattern matching, an approach where the programmers use a tool (e.g., grep [8]) to search the source code for patterns that match queries, formulated as regular expressions. Petrenko et al. described how ontology fragments can be used to formulate and improve the quality of these queries [9]. IR-based approaches provide an alternative to the pattern matching tools and rely on queries, formulated in terms of natural language; we provide a detailed review and discussion of IR approaches in Section 2.1. Other static techniques are based on using program dependencies that guide the search process; we discuss these techniques in Section 2.2.
2.1. Concept location using information retrieval

Information retrieval-based concept location (IR) is an extension and improvement over the traditional text search, and is based on the analysis of relationships between words and documents in large bodies of text (i.e., statically parsed documents from source code) [10,11]. The following steps summarize the IR:
1. Indexing the software system: The source code of the software system is parsed to extract software components at a specific granularity (e.g., methods or classes). Then, a corpus of the software system is created, where the documents in the corpus correspond to the source code of the extracted components. After this, each document is analyzed to identify meaningful words; this may include splitting composite identifiers, removing language-specific stop-words, and so forth.
2. Formulating a query: The developers identify a set of terms that are relevant to the target concept (e.g., "save document", "add bookmark", etc.). One or more of these terms form the initial query.
3. Ranking retrieved documents: Once the query is formulated and executed, the IR engine identifies relevant documents (i.e., parts of the source code) and presents them to the developers. Each document is assigned an IR relevance score on the scale from 0 (not relevant) to 1 (most relevant) and the results returned by the search engine are ranked in descending order based on this score.
4. Investigating the retrieved documents: The retrieved documents are inspected according to their ranks. If an inspected document is a part of the target concept, the search procedure is over. Otherwise, the investigation continues to a document with the next highest rank.
5. Reformulating the query: While reviewing a document, if the programmers acquired knowledge that can help to formulate a better query, the query is updated and all the documents are re-ranked.
The effectiveness of the search depends on developers' knowledge of the system and its domain, and their ability to formulate good queries, as well as on the quality of the source code comments and identifiers. Framing concept location as a text search problem simplifies the solution, but it ignores much of the inherent structure of the source code. DepIR addresses this issue by utilizing the dependencies between software elements to navigate the results (see Section 3 for details).
2.2. Concept location by searching program dependencies
Source code components (e.g., classes, methods, fields, etc.) relate to one another via data and control flow. These relationships between the software components can be modeled as a graph, where the nodes represent the components and the edges represent dependencies between these components (e.g., method calls, inheritance, declarations of abstract data types, etc.). Program dependency search (DepS) is a concept location technique where the programmers traverse such a graph of a software system in a depth-first manner, starting from a known location [12]. The following steps describe the process of DepS. The first step is performed once, while the second step iterates until the concept is located.
1. Identifying a starting point: DepS starts with an entry point of the system; for example, in a Java system, this can be one of the "main" or "init" methods. Alternatively, the search can start at a node known by the programmer to be relevant to the current task.
2. Browsing program dependencies: From the starting point, programmers inspect the nodes of the dependency graph in a depth-first order and decide whether these nodes implement the target concept. The node that is currently inspected is referred to as the focus node. If the node contains the concept, the search stops. If the programmers decide that the focus node does not implement the concept but it is related to this concept, the search continues to one of the adjacent nodes in the dependency graph. On the other hand, if the focus node is irrelevant to the concept, the search backtracks to an earlier focus node.
One of the advantages of DepS is that it does not require knowledge of vocabulary of the system, as it operates on the structural information contained in the dependency graph alone; thus, it is suitable for locating concepts even in unknown systems with absent comments and poor identifier naming conventions. On the other hand, finding a good starting point for the search and choosing between the neighbors of a node are difficult problems.
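The two DepS steps can be sketched as a depth-first traversal driven by the programmer's judgment of each focus node. The following Python sketch is only an illustration under stated assumptions: the graph is a plain adjacency dictionary, the programmer's decision is modeled by a hypothetical `judge` callback, and the method names in the example are invented.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical verdicts a programmer can give about the focus node.
CONTAINS, RELATED, IRRELEVANT = "contains", "related", "irrelevant"


def dependency_search(
    graph: Dict[str, List[str]],
    start: str,
    judge: Callable[[str], str],
) -> Tuple[Optional[str], List[str]]:
    """Browse the dependency graph depth-first from `start`.

    Returns the node judged to contain the concept (or None) together
    with the list of nodes inspected, in inspection order.
    """
    inspected: List[str] = []
    visited = set()
    stack = [start]
    while stack:
        focus = stack.pop()
        if focus in visited:
            continue  # never inspect the same node twice
        visited.add(focus)
        inspected.append(focus)
        verdict = judge(focus)
        if verdict == CONTAINS:
            return focus, inspected
        if verdict == RELATED:
            # Continue to adjacent nodes; reversed so that the first
            # listed neighbor is inspected first.
            for neighbor in reversed(graph.get(focus, [])):
                if neighbor not in visited:
                    stack.append(neighbor)
        # IRRELEVANT: nothing is pushed, so the next pop backtracks
        # to a neighbor of an earlier focus node.
    return None, inspected


# Invented example: the concept lives in openFile.
graph = {
    "main": ["initGui", "loadConfig"],
    "initGui": ["createMenu", "openFile"],
    "loadConfig": ["readFile"],
}
verdicts = {"main": RELATED, "initGui": RELATED, "openFile": CONTAINS}
location, path = dependency_search(
    graph, "main", lambda node: verdicts.get(node, IRRELEVANT)
)
print(location, path)
```

In this sketch the cost of a search is simply the length of the inspected list, which makes the difficulty noted above visible: a poor starting point or a wrong choice among neighbors adds inspected nodes before the concept is reached.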
2.3. Hybrid concept location techniques

Hybrid static methods combine different types of static information. Zhao et al. described a fully automatic Sniafl concept location approach that combines structural and textual information [13]. In this approach, each feature implemented in a program is assigned a set of potentially relevant methods, based on call relationships of the methods as well as their IR similarity to the description of the feature. Then, methods truly relevant to a specific feature are determined as a set-theoretical difference between the methods in its potentially relevant set and potentially relevant sets of all other features. In this way, Sniafl uses dependency information to identify methods that implement the concept but do not contain IR search terms. As Sniafl requires description of all features implemented in the system, it may be less practical in the systems where such descriptions are not available or are incomplete; this is different from DepIR that needs only a description of a single feature of interest (i.e., the change request).
Shao et al. discussed an approach where IR ranks of relevant
methods are increased if their direct neighbors in dependency
graph are also relevant [14]. A similar approach is discussed by
Hayashi et al., where IR ranks of relevant methods, collected by a
dynamic tracing technique, are adjusted based on the number
and the distance in the dependency graph of their relevant neighbors, both direct and transitive [15].
Unlike DepIR, these approaches do not support inspecting methods that have a low IR rank; in other words, these approaches do not allow programmers to explore the dependency graph, but rather use its dependencies to only adjust IR ranks.
Eaddy et al. explain project CERBERUS, where a limited number of methods returned by dynamic tracing and IR techniques are used as seeds for a static code analysis tool [16,17]. The tool analyzes program dependencies to identify all program elements that would have to be either removed or modified if the seed methods are removed. Together with the seeds, the identified methods comprise a subgraph of the complete dependency graph, representing a potential implementation of the concept. Revelle and Poshyvanyk discuss a similar approach, where the subgraph is computed by identifying methods that have a high textual similarity to the seeds and appear in a dynamically created trace [18].
Ratanotayanon et al. present a tool where the dependency graph is used to expand a set of seed methods that are obtained by ranking CVS commits with IR [19]. Hayashi et al. discuss an approach where a set of seed methods acquired by IR is expanded by neighbors in the dependency graph that are neighbors of the seed methods in a supplied ontology fragment [20]. Shepherd et al. used natural language processing (NLP) techniques to expand user queries with terms gleaned from source code artifacts and dependencies, in order to produce potentially relevant subgraphs of the dependency graph [21]. These approaches use source code dependencies to expand an initial set of relevant methods suggested by IR or a programmer. Nonetheless, to explore the expanded set of methods, the programmers still need to use other concept location techniques where they navigate program dependencies.
Concept slicing techniques decompose a program into a set of modules or executable slices that represent conceptually cohesive fragments of the program's domain [22,23]. The techniques start by defining a domain model and identifying a set of seed statements based on the concept assignment process, where the concepts from the domain model are mapped onto the source code statements. A program dependency graph is then used to construct program slices that originate at the seed statements and are relevant to the concepts in the domain model. As in the previous group of approaches, other techniques have to be used to inspect the slices and locate the concept. Furthermore, in contrast to the IR-based techniques like DepIR, the concept assignment process typically requires that concepts are mapped onto non-overlapping segments of code and this makes concept slicing challenging when overlapping concepts are present [24].
Hill et al. describe a tool, Dora, that uses the call graph to expand an initial set of user-supplied seed methods based on an IR query [25]. First, the tool builds a subgraph of the call graph by recursively including those neighbors of the seed methods that have an IR relevance score exceeding the exploration threshold. In this subgraph, the methods with an IR score exceeding an even higher relevance threshold are presented to the user as relevant to the concept. Dora does not require user input when exploring the dependency graph, but it requires a manual identification of the seed methods and is less suitable for identifying concepts in methods that have a low IR relevance score.
Recent research has studied the information needs of developers during software maintenance. One such study revealed that in most cases, when faced with changes to existing software, "developers began by searching for relevant code both manually and using search tools ... When developers found relevant code, they followed its incoming and outgoing dependencies, often returning to it and navigating its other dependencies" [26]. The findings are consistent with later studies that found that developers spend much of their time on finding focus points and then expanding the focus points during software change [27]. These and other studies [28] strengthen the view that static concept location is an instance of information seeking behavior [29]. Previous studies have also found that during a program investigation task, a methodical investigation of the code of a system is more effective than an opportunistic approach [30].
3. DepIR
DepIR is a hybrid concept location technique that leverages the strengths of each constituent technique against the weaknesses of the other. Compared to the stand-alone DepS and IR, DepIR gives the users more options for conducting the search and hence it allows locating concepts that can be missed by these two techniques alone.
One of the weaknesses of IR is that it constrains the user to
navigate the software in the order of the results returned for a given
user query. This ordering, while semantically relevant, is void of
structural information. Thus, two software components A and B,
which are structurally strongly related, may not appear adjacent
in the ranked list of results. While investigating A, the developer
should be aware of or at least pointed to B. In DepIR, structural
relationships are preserved and presented to the user together
with the text retrieval results.
On the other hand, when searching the dependency graph using
DepS, some dependencies are hard to detect with available program analysis tools, for example database dependencies, reflection,
and so forth. Hence, the dependency graph may not contain a path
to the node implementing the concept, making this node (location)
hard to reach. Moreover in large systems, the path in the graph
from the entry point could be long and developers may need to
backtrack several times. DepIR overcomes these limitations by
using IR to query source code and identify a relevant starting point
for the search, and also by ranking neighbors of nodes in the
dependency graph by their relevance to the user query.
DepIR combines IR with DepS. It guides developers by maintaining a global context and a local context. The global context relates to
the change request and is captured in the user query that is handled by information retrieval. The local context relates to the
source code dependencies and allows browsing to the immediate
neighbors.
Although a different dependency graph can be used, we will describe DepIR based on the Method Dependency Graph (MDG). In MDG, the nodes of the graph are program methods and dependencies are calls between these methods as identified by traversing the Eclipse Abstract Syntax Tree (AST) [31] and extracting method bindings [32].
Fig. 1. Example of DepIR workflow: (1) Indexing methods and extracting their dependencies; circles represent methods and lines represent dependencies. (2) Formulating a query and ranking the methods; numbers indicate the ranks of the methods. (3) Investigating the results of the query; the method with the highest rank is shaded in gray. (4) Exploring the neighborhood of the highly ranked methods using dependency search, as highlighted by the bold edges, and locating the concept in the methods shaded in black.
The following steps summarize DepIR and are illustrated by Fig. 1:
1. Initial indexing of documents and extraction of program dependencies: The source code of the software system is parsed and all methods and their dependencies defined in the code are extracted to build the MDG as explained above. The corpus of the software system is created and indexed with an IR technique as explained in Section 2.1. This part of the process is fully automatic (i.e., no user input is required) and is performed every time major changes are applied to the system.
2. Ranking methods: The developers formulate a text query that describes the target concept based on their knowledge of the code and the description of the change request. During concept location, they may learn new facts about the code and return to this step and reformulate the query. Once the query is formulated, the IR engine ranks all methods in descending order based on their relevance (i.e., textual similarity) to the query.
3. Selecting a promising method: The programmer inspects these ordered results and determines their relevance to the change task. For each method, the following decisions and actions are made:
a. If the method is the place that will change, then concept location ends and the developer continues with the next phase of the software change (not discussed in this paper).
b. If the method is irrelevant to the task at hand, the programmers either inspect the method with the next highest rank or return to step #2 and reformulate the query.
c. If the method is related to the concept's implementation, yet it is not where changes are to be made, then the programmers consider this method to be the focus method and continue with step #4.
4. Navigating program dependencies: The developers explore the MDG by investigating methods in the neighborhood of the focus method. The adjacent methods are ranked based on their similarity to the user query. For each neighbor, the developer has the following options:
a. If the method will be the one that changes, then concept location ends and the developer continues with the next phase of the software change (not discussed here).
b. If the method is not the one that will change, then navigate to an adjacent method or backtrack.
c. Return to step #2 and reformulate the query.
d. Stop the navigation; return to the IR search results from step #3 and continue to investigate the list of retrieved methods.
DepIR keeps track of the entire search and navigation path to guide the developer and to avoid repetitions. Once the programmer examines a method during the concept location and decides that it does not implement the target concept, DepIR marks the method and notifies the programmers if they attempt to visit it a second time. This ensures that the same method is not inspected twice if there are cycles in the MDG.
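The bookkeeping behind steps 2 to 4 can be illustrated with a small sketch: a session object that holds the IR scores and the MDG, orders the neighbors of a focus method by their relevance to the query, and remembers which methods the programmer has already rejected. This is not the JRipples implementation; the class and method names are hypothetical, the adjacency lists are assumed to contain both callers and callees, and the scores in the example are illustrative.

```python
from typing import Dict, List, Set, Tuple


class DepIRSession:
    """Hypothetical state of one DepIR search (not the JRipples API)."""

    def __init__(self, mdg: Dict[str, List[str]], scores: Dict[str, float]) -> None:
        self.mdg = mdg            # method -> adjacent methods (callers and callees)
        self.scores = scores      # method -> IR relevance score for the query
        self.rejected: Set[str] = set()

    def ranked_methods(self) -> List[str]:
        """Step 2: all methods in descending order of IR relevance."""
        return sorted(self.scores, key=lambda m: (-self.scores[m], m))

    def ranked_neighbors(self, focus: str) -> List[Tuple[str, float, bool]]:
        """Step 4: neighbors of the focus method, most relevant first.

        Each entry is (method, score, already_rejected) so that a user
        interface could warn about methods that were inspected before.
        """
        neighbors = self.mdg.get(focus, [])
        ordered = sorted(neighbors, key=lambda m: (-self.scores.get(m, 0.0), m))
        return [(m, self.scores.get(m, 0.0), m in self.rejected) for m in ordered]

    def reject(self, method: str) -> None:
        """Record that the method does not implement the target concept."""
        self.rejected.add(method)


# Illustrative data: one focus method with two callers.
mdg = {
    "openFile": ["run", "importFile"],
    "run": ["openFile"],
    "importFile": ["openFile"],
}
scores = {"openFile": 0.26, "run": 0.26, "importFile": 0.13}

session = DepIRSession(mdg, scores)
print(session.ranked_neighbors("openFile"))
session.reject("run")  # the programmer inspected run and found it irrelevant
print(session.ranked_neighbors("openFile"))
```

Keeping the rejected set separate from the graph means that cycles in the MDG cannot lead the programmer back to a method without a warning, which is the behavior described in the paragraph above.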
3.1. Example of locating a concept using DepIR
A practical example of DepIR is locating the concept "Open a file by dragging and dropping it into the editor" in jEdit. Note that jEdit is also used in the case study of Section 4.
The concept location begins by formulating the following DepIR query: "open file using drag and drop". The top ten methods returned by IR and their relevance scores are isDragInProgress (0.75), isDragEnabled (0.71), startDragAndDrop (0.66), setDragInProgress (0.64), drop (0.59), setDragEnabled (0.58), dragEnter (0.48), two overloaded openFile (0.47) methods in the jEdit class, and the insideSelection (0.46) method in the SelectionManager class. Upon inspecting the top ranked methods in the TexEdit class, the programmer concluded that although these methods are relevant to the "drag-and-drop" concept, they are irrelevant to the "open file" concept.
After inspecting one of the openFile (0.46) methods in the jEdit class, the programmer discovered this method does not implement any functionality but rather calls one of the other five overloaded openFile methods in the same class. Since the name of these methods suggested that one of them could be used to open a file using drag-and-drop, the programmer decided to follow program dependencies between them. After visiting two methods with IR relevance scores of 0.28 and 0.14, the programmer discovered the openFile method (0.26) that had two potentially relevant callers: the importFile (0.13) method in the TextAreaTransferHandler class and the run (0.26) method in the DraggedURLLoader class.
Upon inspecting the higher-ranked run (0.26) method in the DraggedURLLoader class, the programmer decided this method is irrelevant and proceeded to inspect the importFile (0.13) method in the TextAreaTransferHandler class. After the inspection, the programmer concluded the method is the location of the concept.
3.2. Tool support
We implemented tool support for DepIR for Java programs in JRipples, an interactive Eclipse plug-in that supports IR operations based on Lucene (http://lucene.apache.org), an open-source IR engine [33]. JRipples also builds the MDG and supports its navigation, keeping track of parts of the code the developers inspected. JRipples and its source code are freely available at http://jripples.sourceforge.net.
4. Evaluation
We conducted an exploratory case study where we compared
and evaluated the proposed concept location approach DepIR
against the basic IR and DepS approaches. Our hypothesis is that
using DepIR, concept location can be performed faster than by using
IR only or DepS only.
4.1. Design of empirical study
Empirical studies of concept location are impacted by the fact that concept location is just one part of a larger process, i.e., software change. In our empirical studies, we chose to use reenactment, a case study technique where information about past software changes is extracted from software repositories and the process of these changes is reenacted using the new technique under study, for example DepIR.
During reenactment, steps of concept location can be done manually by a human or automatically by an algorithm that simulates actions of a programmer. Automated reenactment was applied in [10], where several concept location tools were used to produce ranked lists of methods relevant to past change requests in open-source software; best ranks of the methods known to be changed for these requests were used as a measure of effectiveness of the tools. Automatic reenactment allows us to collect and analyze many data points, strengthening the validity of our observations. In turn, it introduces some specific threats to validity, which we discuss later.
An example of the other variant of reenactment, manual reenactment, is in [9]. In it, programmers recorded knowledge that they used while manually locating concepts for a set of change requests already implemented in Mozilla. Reenactment is not unique to concept location. Examples of reenactment can also be found in the research on impact analysis [32,34-36] and software development [37]. The word "reenactment" originates from [37]; other papers use a different terminology for the same idea.
Previous work on concept location used the number of methods a developer needs to inspect before locating a concept as a measure of effort [10,38]. We adopt this measure in our study because it is a proven measure and can be used in combination with automatic reenactment.
In our study, we used automated reenactment to compare DepIR to IR and DepS, evaluating the potential performance of each technique in 50 cases (i.e., 10 change requests in each of five systems), based on historical information extracted from software repositories. For each case, we randomly selected a change request (e.g., a bug description or a feature request) such that we can identify (based on the patch, release notes, or commit comments) its corresponding change set, i.e., the exact methods in source code that were changed to satisfy this request.
We used the change requests as the input to a specially designed algorithm that simulates users' actions at each step of the concept location process, while using a particular concept location technique. After this, the change sets were used to verify and assess the results produced by the algorithm for each of the cases and each of the studied techniques. It is important to note that the reenactment algorithm simulated "perfect" users, meaning that at each step in the process it makes the choice that would find the target method fastest. Therefore, the results of the simulation indicate a best case scenario estimate of the effort needed to locate the target method for each change request with each technique.
4.2. Objects of the study

Our goal for selecting objects of the study was to identify systems of considerable size and complexity such that concept location would not be trivial. We selected five open source software systems from the popular repository Sourceforge.com. All of the selected systems use bug trackers, and their commit comments include the IDs of the bugs that were fixed, which allowed us to determine the change sets for each of those bugs.
jEdit (http://sourceforge.net/projects/jEdit/) is a text and source code editor. DrJava (http://sourceforge.net/projects/drjava/) is a lightweight Java IDE. JabRef (http://sourceforge.net/projects/jabref/) is a bibliographical manager. Adempiere (http://sourceforge.net/projects/adempiere/) is an enterprise resource planning application. Megamek (http://sourceforge.net/projects/megamek/) is an interactive game. Table 1 summarizes the main properties of these systems.
For each of the five systems, we selected 10 change requests; see the summary in Table 3.
Table 1
Properties of the systems used in the study.
Software    Classes   Methods   Revisions total   Revisions with bug ID
Adempiere   4116      74,930    5581              954
DrJava      1147      20,193    3197              440
JabRef      632       5768      2300              35
jEdit       536       7296      3796              500
Megamek     1268      12,793    5910              1664
4.3. Collected data
In this study, for each bug in the five systems, we considered the available bug description as the starting point. We extracted the change set for each bug fix from the repository and extracted a snapshot of the system prior to the bug fix. The goal of concept location in each of these cases is to find one of the methods in the change set. Since repositories do not store information regarding which of the modified methods was the one identified during concept location (i.e., the first to change), we repeated our reenactment iteratively, treating each of the modified methods as the target. We assessed the efficiency of the studied techniques for a particular revision based on the smallest number of methods required by these techniques to locate each of the targets. We computed this number based on the reenactment summarized in Fig. 2; the following explains the reenactment in detail.
4.3.1. DepS
The DepS technique calls for a starting point. In the absence of any change-specific information, the obvious start is the main() method. We adopted this strategy and, in order to simulate user choice during DepS, we computed the shortest path in the MDG from main() to the target method. The length (n) of the path is the effort measure we report for this technique. The shortest path was determined using Dijkstra's algorithm [39]. In other words, an ideal user would start the search on the MDG at main() and would have to inspect n methods before finding the target method. In order to assess how different the ideal case is from the average case, we recorded statistics about the number of neighbors of the methods on the shortest path (see Table 3). The idea is that the more neighbors these methods have, the harder concept location is, as it may require inspecting more methods to decide which path to take.
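As a rough illustration of this reenactment, the sketch below runs Dijkstra's algorithm with unit edge weights on a toy MDG and reports the path, its length, and the neighbor counts along it. It is a sketch under assumptions rather than the study's code: the graph is a hypothetical adjacency dictionary of outgoing dependencies, the path length is taken to be the number of edges, and all method names are invented.

```python
import heapq
from typing import Dict, List, Optional


def shortest_path(
    mdg: Dict[str, List[str]], source: str, target: str
) -> Optional[List[str]]:
    """Dijkstra's algorithm with every dependency edge weighted 1."""
    dist = {source: 0}
    prev: Dict[str, str] = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor in mdg.get(node, []):
            candidate = d + 1
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return None  # the MDG contains no path to the target
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path


def neighbor_counts(mdg: Dict[str, List[str]], path: List[str]) -> List[int]:
    """Number of neighbors of each method on the path (choices the user faces)."""
    return [len(mdg.get(method, [])) for method in path]


# Invented MDG in which main() reaches the target through two calls.
mdg = {
    "main": ["init", "parseArgs"],
    "init": ["loadPlugins", "openView"],
    "openView": ["openFile"],
}
path = shortest_path(mdg, "main", "openFile")
print(path, len(path) - 1, neighbor_counts(mdg, path))
```

Because all edges have the same weight, a breadth-first search would return a path of the same length; Dijkstra's algorithm is used here only to mirror the text.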
4.3.2. IR
The IR technique requires the formulation of a textual query. In the absence of any additional information, one can use the change request description as the initial query. We used the bug description as the initial query. Since formulating subsequent queries requires user intervention, we did not reformulate the query during the simulation. We computed the rank of the target methods in the ordered list of results retrieved by the IR engine. This is equal to the number of methods that would need to be investigated by traversing the ranked list of methods to find the target. This is the effort measure we collect in this case.
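The effort measure for IR can be illustrated with a deliberately simple ranker. The sketch below does not reproduce Lucene's scoring; it assumes a plain TF-IDF weighting with cosine similarity, a hypothetical four-method corpus, and no stop-word removal, and it only shows how the rank of the target method in the result list becomes the number of inspected methods.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    """Split text into lowercase words, breaking camelCase identifiers."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", text)
    return [part.lower() for part in parts]


def rank_methods(corpus: Dict[str, str], query: str) -> List[Tuple[str, float]]:
    """Rank methods by TF-IDF cosine similarity to the query (assumed scoring)."""
    term_counts = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    n_docs = len(term_counts)
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}

    def weigh(counts: Counter) -> Dict[str, float]:
        # Terms that never occur in the corpus carry no weight.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
            sum(w * w for w in b.values())
        )
        return dot / norm if norm else 0.0

    query_vec = weigh(Counter(tokenize(query)))
    scored = [(name, cosine(query_vec, weigh(c))) for name, c in term_counts.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


def inspection_effort(ranking: List[Tuple[str, float]], target: str) -> Optional[int]:
    """Rank of the target = number of methods inspected before it is found."""
    for position, (name, _) in enumerate(ranking, start=1):
        if name == target:
            return position
    return None


# Hypothetical corpus: method name -> text extracted from its source code.
corpus = {
    "startDragAndDrop": "start drag and drop of the selected text",
    "openFile": "open file in a new buffer",
    "importFile": "import dropped file and open it in the editor",
    "saveBuffer": "save buffer to disk",
}
ranking = rank_methods(corpus, "open file using drag and drop")
print(ranking)
print(inspection_effort(ranking, "importFile"))
```

The sketch also makes the sensitivity to query wording visible: a target whose text shares few terms with the bug description sinks in the list, and the effort grows with its rank.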
4.3.3. DepIR
In the case of DepIR, we use the same initial query and the
ranked list of retrieved methods as for IR. We identified the 10
methods with the highest ranks, treating them as the possible entry points for DepS. For each of these entry points, we identified
the shortest path [39] in the MDG to the target method. We determined the effort for each entry point by adding its rank to the
number of edges on its shortest path to the target method.
The minimum of the effort measure over the 10 starting points represents the minimum effort needed to locate the concept using DepIR. Once again, we did not reformulate the query. As in the case of
DepS, we recorded statistics about the number of neighbors of the
methods on the shortest path (see Table 3). The assumption and
bias here are the same as in the case of DepS.
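The effort computation for DepIR described above (rank of the entry point plus the number of edges to the target, minimized over the top 10 results) can be sketched as below. The sketch assumes an adjacency-dictionary MDG with unit-cost edges and uses breadth-first search in place of Dijkstra's algorithm; the names edge_distances, depir_effort, and top_k, as well as the toy data, are assumptions made for illustration.

```python
from collections import deque


def edge_distances(mdg, start):
    """Breadth-first search: number of edges from start to each reachable method."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        method = queue.popleft()
        for neighbor in mdg.get(method, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[method] + 1
                queue.append(neighbor)
    return dist


def depir_effort(mdg, ranked_methods, targets, top_k=10):
    """Minimum over the top_k entry points of (rank + edges to the nearest target)."""
    best = None
    for rank, entry in enumerate(ranked_methods[:top_k], start=1):
        dist = edge_distances(mdg, entry)
        reachable = [dist[t] for t in targets if t in dist]
        if not reachable:
            continue                      # no target reachable from this entry point
        effort = rank + min(reachable)
        if best is None or effort < best:
            best = effort
    return best


# Toy data with invented names: the target "t" is ranked 4th by IR,
# but it is one dependency away from the method ranked 2nd.
mdg = {"a": ["t"], "c": ["x"], "x": ["t"]}
ranked = ["b", "a", "c", "t"]
print(depir_effort(mdg, ranked, {"t"}))
```

In this toy case, reading the ranked list alone would take four inspections, while starting dependency search at the second result takes three. When the target itself is ranked first, the effort is 1 for both techniques.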
[Figure 2 panels: (a) Dependency search; (b) Information retrieval; (c) DepIR (dependency search started in the node with rank 2). Numbers in the nodes are IR ranks.]
Fig. 2. Reenactment of concept location techniques. The gray shading indicates methods inspected to locate the concept, black shading represents a method containing the
concept, numbers are IR ranks of the methods, bold edges are dependencies followed while exploring the dependency graph, and M marks the main() method. (a)
Dependency search is reenacted as the shortest path in the dependency graph between the main() method and the method containing the concept. (b) Reenactment of IR
explores methods in the order suggested by their ranks until the concept is located. (c) Reenactment of DepIR inspects methods in the order suggested by their ranks until a
method is selected as the starting point of dependency search; after this, the shortest path is computed from this starting method to the concept.
Table 2
Aggregate results for the five systems.
Inspected methods   DepS   IR    DepIR
Average             4.3    190   2.98
Median              4      10    3
StDev               0.97   647   1.23
4.4. Results and discussion

Table 3 summarizes the results for each technique and change,
while Table 2 presents aggregate results for all 50 changes.
Fig. 3 shows the first four columns of Table 3 in graph form.
We analyzed the average effort for each of the compared approaches. Since we cannot assume that the population of our
measurements follows the normal distribution, we used the nonparametric Wilcoxon matched pairs test [40]. We found support
for the following observations.
On average, DepIR required a significantly (W = 0, p < 0.001)
smaller effort than pure IR (3 vs. 10 median, 2.98 vs. 190 average). Since DepIR and IR used the same set of queries, whenever
the target method received the highest rank (in six cases out of
50), both techniques performed the same. On the other hand, DepIR performed well even in cases where the performance of
IR was less than perfect, indicating that DepIR is a more robust
heuristic.
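To illustrate why W = 0 arises here, the sketch below computes the Wilcoxon matched pairs statistic as the smaller of the two signed rank sums. The paper does not state how zero differences and ties were handled, so the conventions used here (zero differences dropped, tied absolute differences given average ranks) are assumptions, as is the function name wilcoxon_w. The input is only the ten Adempiere rows of Table 3 (IR and DepIR inspected methods), not the full set of 50 changes, and no p-value is computed.

```python
def wilcoxon_w(xs, ys):
    """Smaller of the positive and negative signed rank sums for paired samples.

    Assumed conventions: pairs with zero difference are dropped and tied
    absolute differences receive the average of the ranks they span.
    """
    diffs = sorted((x - y for x, y in zip(xs, ys) if x != y), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        average_rank = (i + 1 + j) / 2      # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus)


# Inspected methods for the ten Adempiere changes (Table 3).
ir_adempiere = [13, 21, 3, 2, 7, 1, 6, 7, 4, 1]
depir_adempiere = [4, 5, 3, 2, 3, 1, 3, 5, 4, 1]
print(wilcoxon_w(ir_adempiere, depir_adempiere))
```

Since DepIR never needs more inspections than IR in these rows, every nonzero difference has the same sign, one of the two rank sums is empty, and the statistic is zero.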
In particular, even though in 26 cases IR required inspecting no
more than 10 methods, there were eight cases where IR necessitated inspecting over 100 methods (Table 3); in contrast,
DepIR never required inspecting more than five methods. A closer examination of these outlier cases showed that their change requests were formulated either in terms that are used extensively
throughout the code, or in terms that are not used in the code at all. For example,
the change request for revision 7074 in jEdit contains terms that are common for this text editor, such as "buffer", "character", "menu", etc.;
as a result, when using this change request, IR ranks highly many
methods that contain these terms but are not relevant to the target concept.
These observations indicate that DepIR is especially useful
when it is difficult to formulate a good initial query,
and also when the vocabulary of the system is not suitable for IR
(e.g., in the presence of synonyms, shared terms, etc.): as long as
the top IR results are not far in the MDG from the target method,
developers will still locate the target method quickly, relying on
the navigation ability of DepIR. In other words, when using DepIR,
developers may spend less effort and attention on formulating perfect queries.
[Figure 3: plot of inspected methods per change for Dep Search, IR, and DepIR; vertical axis ticks at 10, 100, 1000, and 10000; the horizontal axis lists the revisions grouped by system (Adempiere, Megamek, DrJava, JabRef, jEdit).]
Fig. 3. Plot with effort data for the 50 changes and the three techniques.

Table 3
Case study results.
Revision  Inspected methods  Neighbors on DepS search path    Neighbors on DepIR search path
          DepS  IR    DepIR  Min  Max   Mean  Median  StDev   Min  Max   Mean  Median  StDev
Adempiere
7387      5     13    4      2    107   35    17      34.8    1    6650  2218  4       2217.7
7630      6     21    5      2    2793  572   8       571.8   1    6664  2378  1424    2377.8
8052      5     3     3      2    140   58    46      57.8    Located with IR part only
8136      5     2     2      2    1156  306   33      305.8   Located with IR part only
8189      5     7     3      2    54    22    17      21.8    28   2836  1432  1432    1432.0
8228      5     1     1      2    1128  298   32      297.8   Located with IR part only
8275      5     6     3      2    610   169   33      168.8   5    331   168   168     168.0
8739      5     7     5      2    414   112   17      111.8   2    27    10    7       9.5
9289      5     4     4      2    1049  279   33      278.8   39   1049  544   544     544.0
9400      5     1     1      2    97    33    17      32.8    Located with IR part only
DrJava
1294      4     2     2      10   45    24    17      24.0    Located with IR part only
1578      4     44    5      3    111   42    13      42.0    3    111   33    9.5     32.8
4305      4     53    5      7    188   68    9       68.0    2    14    9     11      8.8
4637      3     3     3      8    228   118   118     118.0   Located with IR part only
4744      3     4     3      7    309   158   158     158.0   5    6     5     5.5     5.0
4810      6     585   2      3    116   29    8       28.6    6    6     6     6       6.0
4927      4     4     3      8    248   89    11      89.0    2    4     3     3       3.0
4928      4     2     2      7    237   86    16      86.0    3    3     3     3       3.0
4966      3     11    5      8    14    11    11      11.0    7    123   39    13.5    39.0
838       5     2108  5      3    72    22    7       21.8    3    72    21    5       20.8
JabRef
1002      3     1     1      17   47    32    32      32.0    Located with IR part only
1336      5     3     3      1    82    29    17      28.8    9    88    48    48.5    48.0
1698      5     99    4      1    405   106   10      105.8   6    405   140   10      140.0
1728      4     11    3      1    91    35    14      34.7    5    409   207   207     207.0
1737      4     3     2      1    411   142   14      141.7   22   22    22    22      22.0
1745      4     14    3      1    413   142   14      141.7   10   92    51    51      51.0
1812      5     27    4      1    55    19    10      18.8    5    95    35    7       35.0
1831      4     6     4      1    424   146   15      145.7   1    8     4     5       3.7
1909      4     53    3      1    58    24    15      23.7    9    9     9     9       9.0
2383      3     104   2      1    15    8     8       7.5     15   15    15    15      15.0
jEdit
11176     3     1     1      49   274   161   161     161.0   Located with IR part only
12522     3     1014  5      7    49    28    28      28.0    1    287   77    11.5    76.8
13899     5     10    2      14   102   51    45      51.0    4    4     4     4       4.0
14898     6     17    2      4    107   38    15      37.8    23   23    23    23      23.0
15732     3     253   5      53   107   80    80      80.0    2    107   34    14.5    33.8
5312      3     416   3      42   239   140   140     8.4     3    4     3     3.5     18.0
5571      3     1     1      9    44    26    26      26.0    Located with IR part only
6686      3     14    4      42   237   139   139     139.0   3    237   82    8       82.0
7074      4     3997  3      11   248   100   42      100.0   6    11    8     8.5     8.0
7629      3     485   4      46   252   149   149     149.0   10   77    36    21      36.0
Megamek
5085      5     3     3      13   45    28    27      28.0    Located with IR part only
5573      3     2     2      13   48    30    30      30.0    Located with IR part only
5583      5     4     3      7    100   33    12      33.0    38   268   153   153     153.0
5650      3     2     2      13   48    30    30      30.0    Located with IR part only
5656      5     28    2      13   168   68    46      68.0    61   61    61    61      61.0
5700      5     31    3      13   101   51    46      51.0    15   70    42    42.5    42.0
5912      5     10    2      13   314   98    33      98.0    13   13    13    13      13.0
5934      5     11    3      13   314   98    33      98.0    7    77    42    42      42.0
6459      6     1     1      2    75    31    14      30.8    Located with IR part only
6473      5     20    3      13   359   109   33      109.0   51   359   205   205     205.0

The improvement over DepS is less impressive (3 vs. 4 median,
2.98 vs. 4.3 average); however, it is statistically significant (W = 112,
p < 0.001). DepS outperforms DepIR by at most two methods in
seven out of 50 cases, with three cases in DrJava and four in jEdit.
A closer analysis of the case study logs showed that the main()
method of jEdit resides in the same class as a collection of methods that provide centralized access to the common libraries and
preferences of jEdit. Because of this design, the client methods
that call the library and preference methods are very close in the
MDG to main(). The outlier cases required modification of such
client methods, resulting in the observed performance of DepS.
We also found a similar design in DrJava, where the main()
method is used to load common libraries and initialize program
preferences; this design peculiarity also resulted in the outlier
cases.
Based on our observations, we argue that DepIR has a bigger
advantage over DepS in well-structured systems that have a more
complex (i.e., less shallow) dependency graph.
Furthermore, in 25 out of 50 cases, methods on the search paths
of DepIR had on average fewer neighbors than methods on the search paths
of DepS; additionally, in 13 cases, DepIR efficiently located relevant
methods using IR only and did not require inspecting the dependency
graph. Since programmers exploring the MDG might have to inspect some irrelevant neighbors before finding a neighbor that
leads to the concept, the reduced number of neighbors on
the search paths in DepIR reflects an additional reduction in the effort
of CL offered by this technique.
When developers have good knowledge of the system, they
usually can write good queries, and the textual search gets them
quickly to the desired part of the code. Based on our empirical
observations, we speculate that DepIR pays off when the user query
is not very good and, instead of returning the target methods at the top
of the list, ranks other methods higher. In these situations,
navigating the dependencies provides shortcuts through the search
results and leads to more efficient concept location.
4.5. Threats to validity
In this study, we evaluated the performance of the three concept location approaches based on automated reenactment of past
changes. Similarly to other empirical techniques, reenactment has
its own advantages, disadvantages, and biases that affected results
of our study and may limit interpretation of its results.
Unlike the studies that examine concept location by observing
programmers, automated reenactment does not require human
participants and, in this way, it lowers threats to internal validity
associated with personal skills and preferences of subject programmers. These include familiarity with certain concept location techniques, programming experience, degree of understanding of the
change requests and object software, and so forth [9,41,42]. Further, studies using automated reenactment are not biased by program learning, where knowledge learned by programmers while
working with one change request affects their work with other
change requests [9].
On the other hand, reenactment involves a threat to construct
validity because it always assumes the best choices made
by the simulated users, while real programmers often make
less than perfect choices.
Further, in the case study, most of the change requests modified
more than one method. SVN repositories do not provide information on which of the fixed methods was located during the original
concept location, nor do they provide the order of fixing; also, the
repositories do not record the classes that were inspected by the
programmers but not modified. Therefore, we measured the effort
needed by the studied techniques by modeling the ideal concept
location process, where the simulation algorithm was required to
locate any of the changed methods by inspecting the minimal
number of methods.
Moreover, in reality, programmers might be less accurate in
recognizing relevant neighbors while exploring the dependency graph,
selecting the shortest path to the concept, choosing the starting point of
DepS, and so forth; this may result in an effort of concept location
that is higher than the measured minimum. Additionally, programmers may choose to compose different search queries or to
refine their queries during the search; as these queries can be more
or less accurate than the change request-based queries used in the
study, the effort of concept location using these queries may also
differ.
The JRipples tool used in the study extracts only certain types of
dependencies. Capturing additional dependencies (e.g., dependencies defined in text files, database dependencies, reflection, etc.)
might further improve the effectiveness of DepS and DepIR. The
IR and DepIR techniques relied on the text of change requests,
assuming this text provides a correct and accurate description of
the required updates. Inaccuracies in the text may decrease the relevance of the ranking and, consequently, increase the effort of IR
and DepIR. Nonetheless, as was observed in the study, the quality
of the queries had only a minor, if any, effect on the performance of
DepIR.
Furthermore, for the reenactment of DepIR, we allowed only the
10 highest-ranked methods to be selected as starting points of
DepS. However, since any starting method ranked lower than 10th
would require inspecting more than 10 methods, and since in all
studied cases DepIR required inspecting no more than five methods, we conclude that this assumption had no effect on the results
of the study.
We assessed the effort required to locate the concepts by
recording the number of inspected methods, which may pose a
threat to external validity. Depending on the goals of a particular
maintenance process, other measures of effort, such as the time
that programmers needed to locate the concepts, might provide
more suitable insight into the difference between the compared
techniques. Furthermore, different change requests and software
systems might require a different effort to locate the concepts. A
specific software system might have classes with more or less
descriptive terms, which can impact the effectiveness of both IR
and DepIR.
In the study, all of the selected change requests described bugs,
i.e., program functionalities that can be viewed as unwanted features
[10,38]. Other types of changes, such as adding a new functionality
or modifying an existing functionality, may require a different effort. Furthermore, the selected change requests described explicit
features. We expect that DepS and DepIR will have an advantage
over IR when locating implicit features, since such features are implied but not explicitly expressed in the code, and, therefore, can be
hard or impossible to locate by querying source code for specific
terms [43].
The systems we studied are open source and may not be representative of software developed by other processes. However, it
should be noted that, for example, Megamek was developed by
29 programmers and 75% of its code was contributed by a core
of five developers, where each of these core developers contributed
over 5% of the total code. Adempiere was developed by 53 developers and a core of 6 developers contributed 80% of the total code
base, where each of the core developers contributed over 5% of the
code. In this way, Adempiere and Megamek display characteristics similar to industrial software, where the bulk of the code is
developed by a relatively small and stable team of developers; this
is different from a more typical open-source software process,
where the code is produced by large and spontaneous groups of
independent contributors [44,45].
5. Conclusions and future work
In this paper, we proposed and discussed DepIR, a concept location technique that combines program dependency search and
Information Retrieval-based search approaches. Our case study
indicates that concept location using DepIR requires a significantly
smaller effort than concept location using DepS or IR alone.
Furthermore, the comparative analysis of the results of concept
location using IR and DepIR indicates that DepIR allows finding
concepts even with queries that do not rank the relevant methods
highly. Since formulating a good query is not always trivial, this
tolerance of lower-quality queries significantly broadens the
usability of DepIR.
In the future, we plan to investigate which specific software
characteristics make DepIR a better choice of concept location
technique compared to other concept location techniques. We
would also like to investigate whether the concept location process
can be further improved by combining DepIR with the recent techniques that, based on IR, generate a sub-graph of the dependency
graph as discussed in Section 4, i.e., [17-21].
Acknowledgments
We would like to thank Andrian Marcus and Denys Poshyvanyk
for their extensive input on IR techniques and help with the early
versions of this paper. This work was partially supported by
Grant CCF-0820133 from the National Science Foundation (NSF).
Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.
References
[1] N. Wilde, J.A. Gomez, T. Gust, D. Strasburg, Locating user functionality in old code, in: IEEE International Conference on Software Maintenance (ICSM'92), Orlando, FL, 1992, pp. 200-205.
[2] T.J. Biggerstaff, B.G. Mitbander, D.E. Webster, The concept assignment problem in program understanding, in: 15th IEEE/ACM International Conference on Software Engineering (ICSE'94), 1994, pp. 482-498.
[3] V. Rajlich, Software Engineering: The Current Practice, Chapman and Hall/CRC, 2011.
[4] V. Rajlich, Changing the paradigm of software engineering, in: Communications of ACM, 2006, pp. 67-70.
[5] S. Bohner, R. Arnold, Software Change Impact Analysis, IEEE Computer Society, Los Alamitos, CA, 1996.
[6] B. Dit, M. Revelle, M. Gethers, D. Poshyvanyk, Feature location in source code: a taxonomy and survey, Journal of Software Maintenance and Evolution: Research and Practice, 2011, http://dx.doi.org/10.1002/smr.567.
[7] A. Marcus, V. Rajlich, J. Buchta, M. Petrenko, A. Sergeyev, Static techniques for concept location in object-oriented code, in: 13th IEEE International Workshop on Program Comprehension (IWPC'05), 2005, pp. 33-42.
[8] A.V. Aho, Pattern matching in strings, in: Formal Language Theory: Perspectives and Open Problems, Academic Press, New York, 1980, pp. 325-347.
[9] M. Petrenko, V. Rajlich, R. Vanciu, Partial domain comprehension in software evolution and maintenance, in: IEEE International Conference on Software Comprehension, 2008, pp. 13-22.
[10] D. Poshyvanyk, Y.G. Guéhéneuc, A. Marcus, G. Antoniol, V. Rajlich, Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval, IEEE Transactions on Software Engineering 33 (2007) 420-432.
[11] A. Marcus, A. Sergeyev, V. Rajlich, J. Maletic, An information retrieval approach to concept location in source code, in: 11th IEEE Working Conference on Reverse Engineering (WCRE'04), Delft, The Netherlands, 2004, pp. 214-223.
[12] K. Chen, V. Rajlich, Case study of feature location using dependency graph, in: IEEE International Workshop on Program Comprehension, IEEE Computer Society, 2000, pp. 241-249.
[13] W. Zhao, L. Zhang, Y. Liu, J. Sun, F. Yang, SNIAFL: towards a static non-interactive approach to feature location, ACM Transactions on Software Engineering and Methodologies (TOSEM) 15 (2006) 195-226.
[14] P. Shao, R.K. Smith, Feature location by IR modules and call graph, in: Annual Southeast Regional Conference, 2009.
[15] S. Hayashi, K. Sekine, M. Saeki, iFL: an interactive environment for understanding feature implementations, in: IEEE International Conference on Software Maintenance, 2010, pp. 1-5.
[16] E. Hill, L. Pollock, K. Vijay-Shanker, Investigating how to effectively combine static concern location techniques, in: 3rd International Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation, 2011.
[17] M. Eaddy, A.V. Aho, G. Antoniol, Y.-G. Gueheneuc, CERBERUS: tracing requirements to source code using information retrieval, dynamic analysis, and program analysis, in: IEEE International Conference on Program Comprehension, 2008, pp. 53-62.
[18] M. Revelle, D. Poshyvanyk, An exploratory study on assessing feature location techniques, in: IEEE International Conference on Program Comprehension, 2010, pp. 218-222.
[19] S. Ratanotayanon, H.J. Choi, S.E. Sim, Using transitive changesets to support feature location, in: IEEE/ACM International Conference on Automated Software Engineering, 2010, pp. 341-344.
[20] S. Hayashi, T. Yoshikawa, M. Saeki, Sentence-to-code traceability recovery with domain ontologies, in: Asia Pacific Software Engineering Conference, 2010, pp. 385-394.
[21] D. Shepherd, Z. Fry, E. Gibson, L. Pollock, K. Vijay-Shanker, Using natural language program analysis to locate and understand action-oriented concerns, in: International Conference on Aspect Oriented Software Development (AOSD'07), 2007, pp. 212-224.
[22] N.E. Gold, M. Harman, D. Binkley, R.M. Hierons, Unifying program slicing and concept assignment for higher-level executable source code extraction, Journal of Software Practice and Experience 35 (2005) 977-1006.
[23] R. Al-Ekram, K. Kontogiannis, Source code modularization using lattice of concept slices, in: The Eighth Euromicro Working Conference on Software Maintenance and Reengineering, 2004, pp. 195-203.
[24] N.E. Gold, M. Harman, Z. Li, K. Mahdavi, Allowing overlapping boundaries in source code using a search based approach to concept binding, in: IEEE International Conference on Software Maintenance, 2006, pp. 310-319.
[25] E. Hill, L. Pollock, K. Vijay-Shanker, Exploring the neighborhood with Dora to expedite software maintenance, in: IEEE/ACM International Conference on Automated Software Engineering, 2007, pp. 14-23.
[26] A.J. Ko, B.A. Myers, M.J. Coblenz, H.H. Aung, An exploratory study of how developers seek, relate, and collect relevant information during software maintenance tasks, IEEE Transactions on Software Engineering (TSE) 32 (2006) 971-987.
[27] J. Sillito, G.C. Murphy, K. De Volder, Asking and answering questions during a programming change task, IEEE Transactions on Software Engineering 34 (2008) 1-18.
[28] J. Sillito, K. De Volder, B. Fisher, G.C. Murphy, Managing software change tasks: an exploratory study, in: International Symposium on Empirical Software Engineering (ISESE 2005), Noosa Heads, Australia, 2005, pp. 23-32.
[29] G. Marchionini, Information Seeking in Electronic Environments, Cambridge University Press, Cambridge, United Kingdom, 1997.
[30] M.P. Robillard, W. Coelho, G.C. Murphy, How effective developers investigate source code: an exploratory study, IEEE Transactions on Software Engineering 30 (2004) 889-903.
[31] T. Khun, O. Thomann, Abstract Syntax Tree, in: Eclipse Corner Articles, 2007.
[32] M. Petrenko, V. Rajlich, Variable granularity for improving precision of impact analysis, in: IEEE International Conference on Program Comprehension, 2009, pp. 10-19.
[33] D. Poshyvanyk, A. Marcus, Y. Dong, JIRiSS - an Eclipse plug-in for source code exploration, in: 14th IEEE International Conference on Program Comprehension (ICPC'06), Athens, Greece, 2006, pp. 252-255.
[34] A.E. Hassan, R.C. Holt, Replaying development history to assess the effectiveness of change propagation tools, Empirical Software Engineering 11 (2006) 335-367.
[35] G. Canfora, L. Cerulo, Impact analysis by mining software and change request repositories, in: 11th IEEE International Symposium on Software Metrics, 2005, pp. 9-29.
[36] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, Identifying the starting impact set of a maintenance request: a case study, in: European Conference on Software Maintenance and Reengineering, IEEE Computer Society, 2000, pp. 227-230.
[37] C. Jensen, W. Scacchi, Discovering, modeling, and reenacting open source software development processes, New Trends in Software Process Modeling, Series in Software Engineering and Knowledge Engineering 18 (2006) 1-20.
[38] D. Liu, A. Marcus, D. Poshyvanyk, V. Rajlich, Feature location via information retrieval based filtering of a single scenario execution trace, in: 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE'07), Atlanta, Georgia, 2007, pp. 234-243.
[39] E.W. Dijkstra, A note on two problems in connexion with graphs, Numerische Mathematik 1 (1959) 269-271.
[40] W.J. Conover, Practical Nonparametric Statistics, third ed., Wiley, New York, NY, 1999.
[41] K.B. McKeithen, J.S. Reitman, H.H. Rueter, S.C. Hitle, Knowledge organisation and skill differences in computer programmers, Cognitive Psychology 13 (1981) 307-325.
[42] D.N. Perkins, F. Martin, Fragile knowledge and neglected strategies in novice programmers, in: Empirical Studies of Programmers, 1986, pp. 213-229.
[43] V. Rajlich, Intensions are a key to program comprehension, in: IEEE International Conference on Program Comprehension (ICPC'09), 2009, pp. 1-9.
[44] H.K. Wright, M. Kim, D.E. Perry, Validity concerns in software engineering research, in: FSE/SDP Workshop on Future of Software Engineering Research, 2010, pp. 411-414.
[45] A. Mockus, R.T. Fielding, J.D. Herbsleb, Two case studies of open source software development: Apache and Mozilla, ACM Transactions on Software Engineering and Methodology 11 (2002) 309-346.
