By:
Amit Mishra
i.amitmishra@gmail.com
Department of Mathematics and Computer Science
Faculty of Applied and Natural Sciences
IBB University, Lapai
Abstract:
The demand by all business sectors to adapt their information systems to the Web has
created a tremendous need for methods, tools, and infrastructures to evolve and exploit
existing applications efficiently and cost-effectively. Reverse engineering focuses on
the challenging task of understanding legacy program code without suitable
documentation. It is the process of discovering the design principles of an
application by analyzing its structure, functions, and operation.
This paper describes what reverse engineering is, why it is necessary,
the different types of reverse engineering, and its uses and applications.
Key Words:
Reverse engineering, data reverse engineering, how RE differs from other types of engineering,
uses of RE, stages involved in RE, methods of RE, approaches to RE.
1. Introduction
According to E. Chikofsky and J. Cross [1], reverse engineering is the process of analyzing a
subject system to identify its current components and their dependencies, and to extract and
create system abstractions and design information. Reverse engineering can also be a process of
understanding a system without knowing its internal functions, knowing only how the
system responds to various inputs. In this way one can create a product that is equivalent in
behavior to another product without infringing that product's patents. An advantage of reverse
engineering is that we can change the program structure, which directly affects its consistent
view. Over the past ten years researchers have produced a number of capabilities to explore,
manipulate, analyze, summarize, hyperlink, synthesize, componentize, and visualize software.
It is a misconception that reverse engineering is a cracking technique used only for
stealing code. Reverse engineering is used not only to examine how working programs
operate but also to examine programs that do not work.
Reverse engineering (RE) is the process of discovering the technological principles
of a device, object or system through analysis of its structure, function and operation. It often
involves taking something apart and analyzing its workings in detail, usually to try to make a
new device or program that does the same thing without copying anything from the original.
Reverse engineering is also the process of learning the design of an object by studying its
implementation [4].
The different uses of reverse engineering include the following.
a)
b)
c)
d)
e)
or not.
f) To assess the limitations of a duplicate product.
g) To find loopholes before the product is cracked.
h) To create documentation for the product.
i) To replace the old product with an improved version.
3. Reverse engineering gives us insight into the following:
a) The structure of a program,
b) The logic of the program and
c) The functions of the program.
4. Advantages of Reverse Engineering
The main advantages of reverse engineering are:
a) To understand the functions of a program.
b) To learn which files the target program accesses.
c) To learn the protocols the target software uses to communicate with other parts of the
target network.
As noted, reverse engineering lets us change the program structure, which directly affects its
consistent view. Technically this can be called patching, because it involves adding new patch
code to the original code. Patching allows particular function calls to be added or changed. It
can also be used to add hidden features, remove or disable functions, and fix security problems
without the source code.
5.
source of information about the system. As a result, the process of reverse engineering
has focused on understanding the code.
b) Data Reverse Engineering:
Most software systems for business and industry are information systems, that is, they maintain
and process vast amounts of persistent business data. While the main focus of code reverse
engineering is on improving human understanding about how this information is processed, data
reverse engineering tackles the question of what information is stored and how this information
can be used in a different context. Data reverse engineering (DRE) is a relatively new approach
used to address a general category of data disintegration problems. DRE combines structured
data analysis techniques with rigorous data management practices. The approach is growing in
popularity as an integrative systems re-engineering method because of its ability to address
multiple problem types concurrently.
Research in data reverse engineering has been underrepresented in the software reverse
engineering arena for two main reasons. First, there is a traditional partition between the database
systems and software engineering communities. Second, code reverse engineering appears at
first sight to be more challenging and interesting than data reverse engineering for academic
researchers.
Recently, data reverse engineering concepts and techniques have gained increasing attention in
the reverse engineering arena. This has been driven by requirements for data oriented mass
software changes resulting from needs such as the Y2K problem.
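The question of what information is stored can be illustrated with a minimal sketch that guesses column types and candidate keys from raw records. The records, the column names and the heuristics below are assumptions made for illustration only; they are not taken from any of the DRE tools discussed in this paper.

```python
def infer_type(values):
    """Guess a column type from its string values (illustrative heuristic)."""
    def kind(v):
        if v == "":
            return "null"
        try:
            int(v)
            return "integer"
        except ValueError:
            pass
        try:
            float(v)
            return "decimal"
        except ValueError:
            return "text"

    kinds = {kind(v) for v in values} - {"null"}
    if not kinds:
        return "unknown"
    if kinds == {"integer"}:
        return "integer"
    if kinds <= {"integer", "decimal"}:
        return "decimal"
    return "text"


def infer_schema(rows):
    """Return {column: (type, is_candidate_key)} for a list of dict records."""
    schema = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        # A column is a candidate key if every value is present and distinct.
        unique = len(set(values)) == len(values) and "" not in values
        schema[col] = (infer_type(values), unique)
    return schema


# Hypothetical legacy records, as they might be dumped from a flat file.
LEGACY_ROWS = [
    {"cust_no": "1001", "name": "Adams", "balance": "250.00"},
    {"cust_no": "1002", "name": "Baker", "balance": "13.5"},
    {"cust_no": "1003", "name": "Adams", "balance": "0"},
]

schema = infer_schema(LEGACY_ROWS)
```

A candidate key found this way is only a hypothesis: uniqueness in a sample does not prove that the original designers intended the column as a key.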
6.
Techniques used to aid program understanding can be grouped into three categories: unaided
browsing, leveraging corporate knowledge and experience, and computer-aided techniques like
reverse engineering [5].
In the mid-1990s, tools for RE were highly publicized. Three such tools for translating COBOL
files are described in:
I. AUGUST-II: A Tool for Step-by-Step Data Model Reverse Engineering by Davis [6]
II. DB-MAIN: A programmable CASE Tool for Database Applications Engineering by
Hainaut [7]
III. Deriving a Logical Data Model for a System Using the RECAST Method by Edwards
and Munro [8]
Two tools for translating relational databases were also presented during this period:
I. Reverse-DBMS for Windows by Chen & Associates [9]
II. A Knowledge-based System for Performing Reverse Engineering of Relational Database by
Chiang [10]
7. Application using Reverse engineering
There are several methods of reverse engineering that create a 3D model of an existing
physical part for use in 3D CAD, CAM, CAE, and other software. The term "reverse engineering" as
applied to software means different things to different people, prompting Chikofsky and Cross to
write a paper researching the various uses and defining a taxonomy.
From their paper:
Reverse engineering is the process of analyzing a subject system to create representations of the
system at a higher level of abstraction. It can also be seen as "going backwards through the
development cycle". In this model, the output of the implementation phase (in source code form)
is reverse engineered back to the analysis phase, in an inversion of the traditional waterfall
model. Reverse engineering is a process of examination only: the software system under
wishing to change database vendors are typically forced to rewrite their applications using the
new vendor's 4GL. The anticipated cost of this redevelopment can deter an organisation from
changing vendors, hence denying it the benefits that would otherwise result, for example, the
exploitation of more sophisticated database technology. If tools existed that could reduce the
rewriting effort, the option of changing database vendors would become more economically
feasible.
The diversity of reverse engineering problems has provided and still provides many
interesting research topics. By now this technology has reached an advanced level, and it is being
directly exploited in various environments.
8.
The reverse engineering process is time-consuming and very expensive. Because of this
financial risk, it is generally weighed against purchasing or licensing the information from the
original manufacturer. To reverse engineer a product or system component, engineers use the
following stages.
a) Identify the product or system component that will be reverse engineered.
b) Observe and assess the mechanisms that make the system work.
c) Dissect and study the inner workings of the system: the reverse engineer disassembles and
decompiles the original product, which takes considerable time relative to the rest of the
project. The reverse engineer also attempts to construct a description of the system by
collecting all the technical data and instructions that show how the product works.
d) Compare the actual system to the observations and suggest improvements.
e) Examine the information and documentation obtained from disassembly to check how the
real product works: the reverse engineer verifies that the data generated by disassembly
and decompilation is an accurate reconstruction of the original product. The accuracy and
validity of the design are checked by testing the system, creating prototypes, and
experimenting with the results.
Research on reverse engineering has reached such an advanced stage that a number of methods
can now be used to reverse engineer software. Each method has its own benefits, time
requirements, and resource needs. A typical approach uses a combination of methods when
decompiling and investigating software. The best approach combines all of these goals: for
example, it first checks the code for errors and then observes the program on a number of
user-supplied inputs.
10. Reverse engineering of software
In theory the term reverse engineering, as applied to software, means different things to
different people, but in practice two main types emerge.
a) In the first, source code is available for the software, but higher-level aspects of the
program, which may be poorly documented or documented but no longer valid, are discovered.
b) In the second, no source code is available for the software, and any effort towards
discovering one possible source code for the software is regarded as reverse
engineering.
A. Binary software techniques used in reverse engineering
A number of software techniques can be used in reverse engineering. The three main
techniques are:
1. Observing how information is exchanged. This is most common in protocol reverse
engineering, which uses tools such as bus analyzers and packet sniffers to listen to a
computer bus or a computer network connection. Reverse engineering of embedded systems
is sometimes helped by tools introduced by the manufacturer.
2. Disassembly using a disassembler. This takes more time, especially for someone who is
not used to machine code. [4]
3. Decompilation using a decompiler, a technique that tries, with varying results, to
reconstruct the source code in some high level language which is
shape. Often a patent is no more than a warning sign to a competitor to discourage competition.
If there is merit in an idea, a competitor must do one of the following:
a) Must negotiate a license to use the idea
b) Must claim that the idea is not novel and is an obvious step for anyone experienced in the
particular field
c) Make a subtle change and claim that the changed product is not protected by the patent
Consider the following ethical guidelines for reverse engineering:
a) Do not reverse-engineer parts if the procurement contract of the component prohibits
reverse engineering.
b) Perform reverse engineering using only data that is in the public domain.
c) While performing reverse engineering, be sure that you:
I. Do not have access to proprietary information
II. Have not been recently employed by the OEM, or had access to proprietary
information
III. Do not visit or tour the OEM's place of business
decompile all programs, and data and code are difficult to separate, because both are
represented similarly in most current computer systems. The meaningful names that
programmers give variables and functions (to make them more easily identifiable) are
not usually stored in an executable file, so they are not usually recovered in
decompiling. [20]
11.3 Phases of Decompilation
a) Loader
The first decompilation phase is the loader, which parses the input machine code
or intermediate language program's binary file format. The loader should be able to
discover basic facts about the input program, such as the architecture (Pentium,
PowerPC, etc), and the entry point. In many cases, it should be able to find the equivalent
of the main function of a C program, which is the start of the user written code. This
excludes the runtime initialization code, which should not be decompiled if possible. [21]
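As a rough illustration of the loader phase, the sketch below reads the architecture and entry point from an ELF file header using the field layout of the ELF specification. The sample header is hand-built for this example, the function name is hypothetical, and the sketch does not attempt the harder step of locating the main function.

```python
import struct

# e_machine values from the ELF specification (a small subset).
MACHINES = {3: "x86", 20: "PowerPC", 62: "x86-64"}


def load_elf_header(data):
    """Return (architecture, word size in bits, entry point) from an ELF header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[data[4]]        # EI_CLASS
    endian = {1: "<", 2: ">"}[data[5]]    # EI_DATA
    entry_fmt = "I" if bits == 32 else "Q"
    # After the 16-byte e_ident come e_type, e_machine, e_version, e_entry.
    e_type, e_machine, e_version, e_entry = struct.unpack_from(
        endian + "HHI" + entry_fmt, data, 16)
    arch = MACHINES.get(e_machine, "unknown(%d)" % e_machine)
    return arch, bits, e_entry


# Hand-built header: 32-bit, little-endian, x86, entry point 0x08048000.
SAMPLE = (b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
          + struct.pack("<HHII", 2, 3, 1, 0x08048000))

arch, bits, entry = load_elf_header(SAMPLE)
```

A real loader would go on to read the section and program headers so that later phases know which bytes are code and which are data.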
b) Disassembly
The next logical phase is the disassembly of machine code instructions into a
machine independent intermediate representation (IR). For example, the Pentium
machine instruction
mov eax, [ebx+4]
might be translated to the IR
eax := m[ebx+4];
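The translation of a memory load into the IR form eax := m[ebx+4]; can be sketched on textual assembly. This is a toy that handles only two forms of the mov instruction; the function name and the store syntax are assumptions for illustration, and a real decompiler would work from decoded machine code rather than from text.

```python
import re

OFFSET = r"(0x[0-9a-fA-F]+|\d+)"
# A load such as "mov eax, [ebx+4]" or "mov eax, [ebx+0x04]".
LOAD = re.compile(r"mov\s+(\w+)\s*,\s*\[(\w+)\s*\+\s*" + OFFSET + r"\]$")
# A store such as "mov [ebx+4], eax".
STORE = re.compile(r"mov\s+\[(\w+)\s*\+\s*" + OFFSET + r"\]\s*,\s*(\w+)$")


def parse_offset(text):
    """Accept decimal or 0x-prefixed hexadecimal offsets."""
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def to_ir(instruction):
    """Translate one supported mov instruction into the IR notation."""
    text = instruction.strip()
    m = LOAD.match(text)
    if m:
        dst, base, off = m.groups()
        return "%s := m[%s+%d];" % (dst, base, parse_offset(off))
    m = STORE.match(text)
    if m:
        base, off, src = m.groups()
        return "m[%s+%d] := %s;" % (base, parse_offset(off), src)
    raise ValueError("unsupported instruction: " + instruction)
```

For example, to_ir("mov eax, [ebx+0x04]") returns the IR string "eax := m[ebx+4];".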
c) Idioms
Idiomatic machine code sequences are sequences of code whose combined
semantics is not immediately apparent from the instructions' individual semantics. Either
as part of the disassembly phase, or as part of later analyses, these idiomatic sequences
need to be translated into known equivalent IR. For example, the x86 assembly code:
cdq eax;
xor eax, edx
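The listing above is cut off in this copy. As an illustration only, the sketch below assumes the sequence continues with sub eax, edx (an assumption, not shown in the text above) and emulates the three instructions on 32-bit values. Under that assumption the combined effect is an absolute value, which is not apparent from any single instruction.

```python
MASK = 0xFFFFFFFF  # registers are modelled as unsigned 32-bit integers


def cdq(eax):
    """edx becomes the sign extension of eax: all ones if negative, else 0."""
    return MASK if eax & 0x80000000 else 0


def run_idiom(eax):
    """Emulate cdq; xor eax, edx; sub eax, edx (the sub is assumed)."""
    edx = cdq(eax)
    eax ^= edx                  # flips every bit when eax is negative
    eax = (eax - edx) & MASK    # subtracting all ones adds 1 modulo 2**32
    return eax


def to_unsigned(n):
    """Two's complement encoding of a Python int in 32 bits."""
    return n & MASK
```

An idiom recognizer in a decompiler would match such a sequence and replace it with a single IR expression for the absolute value.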
code) or on pointers. An add instruction results in three constraints, since the operands
may be both integer, or one integer and one pointer (with integer and pointer results
respectively; the third constraint comes from the ordering of the two operands when the
types are different).
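The three constraints for an add instruction can be written down as a small table and filtered by whatever is already known about the operands. This is a minimal sketch; the names a, b and result are hypothetical labels for the two operands and the sum, and a real type analysis would propagate such constraints across the whole program.

```python
INT, PTR = "int", "ptr"

# The three typings of "result = a + b" described above: (a, b, result).
ADD_CONSTRAINTS = [
    (INT, INT, INT),
    (PTR, INT, PTR),
    (INT, PTR, PTR),
]


def solve_add(known):
    """Return the constraints compatible with the known types, e.g. {'a': PTR}."""
    names = ("a", "b", "result")
    return [c for c in ADD_CONSTRAINTS
            if all(known.get(n, t) == t for n, t in zip(names, c))]
```

Knowing that one operand is a pointer leaves a single typing, while two pointer operands leave none, which signals that the instruction is not a plain add of that kind.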
f) Structuring
The penultimate decompilation phase involves structuring of the IR into higher
level constructs such as while loops and if/then/else conditional statements. For example,
the machine code
xor eax, eax
l0002:
or ebx, ebx
jge l0003
add eax,[ebx]
mov ebx,[ebx+0x4]
jmp l0002
l0003:
mov [0x10040000],eax
could be translated into:
eax = 0;
while (ebx < 0) {
eax += ebx->v0000;
ebx = ebx->v0004;
}
v10040000 = eax;
Unstructured code is more difficult to translate into structured code than already structured code.
Solutions include replicating some code, or adding Boolean variables.
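One common first step in structuring is to find loops by looking for back edges in the control flow graph. The sketch below applies a depth-first search to the basic blocks of the listing above. The block names init and body are made up for this example, and this is only one of several possible loop-detection approaches, not necessarily the one used by any particular decompiler.

```python
def find_back_edges(cfg, entry):
    """Depth-first search; an edge to a node still being visited is a back edge."""
    state = {}          # node -> "open" (on the DFS stack) or "done"
    back_edges = []

    def visit(node):
        state[node] = "open"
        for succ in cfg.get(node, []):
            if state.get(succ) == "open":
                back_edges.append((node, succ))
            elif succ not in state:
                visit(succ)
        state[node] = "done"

    visit(entry)
    return back_edges


# Basic blocks of the machine code listing above and their successors.
CFG = {
    "init": ["l0002"],             # xor eax, eax
    "l0002": ["l0003", "body"],    # or ebx, ebx / jge l0003
    "body": ["l0002"],             # add, mov, jmp l0002
    "l0003": [],                   # mov [0x10040000], eax
}

loops = find_back_edges(CFG, "init")
```

The single back edge from body to l0002 marks l0002 as a loop header whose test is evaluated before the body, which is why the code is emitted as a while loop.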
g) Code generation
The final phase is the generation of the high level code in the back end of the
decompiler. Just as a compiler may have several back ends for generating machine code
for different architectures, a decompiler may have several back ends for generating high
level code in different high level languages.
Just before code generation, it may be desirable to allow an interactive editing of the IR, perhaps
using some form of graphical user interface. This would allow the user to enter comments, and
non-generic variable and function names. However, these are almost as easily entered in a post
decompilation edit. The user may want to change structural aspects, such as converting a while
loop to a for loop. These are less readily modified with a simple text editor, although source
code refactoring tools may assist with this process. The user may need to enter information that
failed to be identified during the type analysis phase, e.g. modifying a memory expression to an
array or structure expression. Finally, incorrect IR may need to be corrected, or changes made to
cause the output code to be more readable. [21]
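The idea of several back ends sharing one IR can be sketched with a toy structured IR and two emitters. The IR node shapes and both emitter functions are assumptions invented for this illustration, not the IR of any real decompiler.

```python
# A toy structured IR: ("assign", target, expr) and ("while", cond, body).
IR = [
    ("assign", "total", "0"),
    ("while", "n > 0", [
        ("assign", "total", "total + n"),
        ("assign", "n", "n - 1"),
    ]),
]


def emit_c(ir, indent=0):
    """Back end producing C-like source lines."""
    pad = "    " * indent
    lines = []
    for node in ir:
        if node[0] == "assign":
            lines.append("%s%s = %s;" % (pad, node[1], node[2]))
        elif node[0] == "while":
            lines.append("%swhile (%s) {" % (pad, node[1]))
            lines.extend(emit_c(node[2], indent + 1))
            lines.append(pad + "}")
    return lines


def emit_python(ir, indent=0):
    """Back end producing Python source lines from the same IR."""
    pad = "    " * indent
    lines = []
    for node in ir:
        if node[0] == "assign":
            lines.append("%s%s = %s" % (pad, node[1], node[2]))
        elif node[0] == "while":
            lines.append("%swhile %s:" % (pad, node[1]))
            lines.extend(emit_python(node[2], indent + 1))
    return lines
```

Because both emitters walk the same IR, earlier phases such as type analysis and structuring do not need to know which output language was requested.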
12. Disassemble
To disassemble is to convert a program in its ready-to-run executable form into a
representation in some form of assembly language that a human can read more easily. A program
that performs this task is called a disassembler, because it performs the inverse of the task
that an assembler does. Disassembly is a type of reverse engineering. A related program is the
decompiler; the difference is that a decompiler converts object code back into a high level
language rather than assembly language. Both tasks are difficult because data and instructions
are represented the same way in most current computer systems [4].
It is sometimes very difficult to compare two disassembled programs.
Any interactive debugger will include some way of viewing the disassembly of the
program being debugged. Often, the same disassembly tool will be packaged as a
standalone disassembler distributed along with the debugger. For example, objdump, part
of GNU Binutils, is related to the interactive debugger gdb.
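The core of a disassembler is a mapping from opcode bytes to mnemonics. The following minimal sketch decodes only a handful of one-byte 32-bit x86 opcodes and prints every other byte as data; it is a teaching toy under those assumptions, not a substitute for a tool such as objdump, because real x86 instructions have variable length and operands.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def disassemble(code):
    """Decode a few one-byte x86 opcodes; any other byte is shown as 'db'."""
    out = []
    for offset, byte in enumerate(code):
        if 0x50 <= byte <= 0x57:
            text = "push " + REGS[byte - 0x50]
        elif 0x58 <= byte <= 0x5F:
            text = "pop " + REGS[byte - 0x58]
        elif byte == 0x90:
            text = "nop"
        elif byte == 0xC3:
            text = "ret"
        else:
            text = "db 0x%02x" % byte
        out.append("%04x: %s" % (offset, text))
    return out


listing = disassemble(bytes([0x55, 0x90, 0x5D, 0xC3]))
```

The fallback to db reflects the difficulty noted in this paper: the disassembler cannot always tell whether a byte is an instruction or data.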
13. Debugger
A debugger is a software program that executes another program piecemeal and monitors various
circumstances, enabling the programmer to check whether the operation of the program is correct
or not.
It provides single stepping through code, debug tracing, setting breakpoints, and inspection of
variables and memory state in the target program as it executes step by step. It is very useful
for determining the logical program flow. Debuggers fall into two main categories: user mode and
kernel mode. A user-mode debugger runs like a normal program under the operating system and is
subject to the same rules as a normal program, so it can debug only user-level processes. A
kernel-mode debugger is part of the operating system and can debug device drivers and even the
operating system itself.
Most mainstream debugging engines, such as gdb and dbx provide console-based command line
interfaces. Debugger front-ends are popular extensions to debugger engines that provide IDE
integration, animation, and visualization features. An example of a kernel-mode debugger is
SoftICE [4].
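Single stepping and inspection of variables can be demonstrated in miniature with Python's sys.settrace hook, which calls a function before each source line runs. This is a minimal user-mode sketch; the helper trace_lines and the traced function target are hypothetical names for this example, and native debuggers such as gdb rely on operating system and hardware support instead.

```python
import sys


def trace_lines(func, *args):
    """Run func and record (relative line, locals) before each line executes."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None                  # do not step into other functions
        if event == "line":
            steps.append((frame.f_lineno - code.co_firstlineno,
                          dict(frame.f_locals)))
        return tracer

    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(old)                # always restore the previous hook
    return result, steps


def target(n):
    total = 0
    for i in range(n):
        total += i
    return total


result, steps = trace_lines(target, 3)
```

Each entry in steps is a snapshot of the local variables just before a line runs, which is the information a debugger shows when execution stops at a breakpoint.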
Black box analysis analyzes a running program by probing it with various inputs. This
type of testing requires only a running program and does not make use of source code analysis
[13]. From the security point of view, malicious input is supplied to the program in an attempt
to break it, and if the program breaks during a particular test then a security problem may have
been exposed. Black box testing is possible even without access to binary code; a program can be
tested over the network. All that is required is a running program that accepts input. If the
tester can supply input that the program consumes and can observe the effect of the test, then
black box testing is possible [13]. This is the main reason why real attackers frequently resort
to black box testing to obtain knowledge of the code and its behavior: black box testing is
easier to carry out and typically requires much less expertise than white box testing. During
black box testing, the analyst attempts to exercise as many meaningful internal code paths as can
be directly observed from outside the system.
Black box testing cannot exhaustively search a real program's input space for problems because
of theoretical constraints, but a black box test resembles an actual attack on the target
software in a real operational environment more closely than a white box test usually can,
because black box testing works on a live system. It is an effective way to understand and
evaluate denial-of-service (DoS) problems [13], and it can validate an application within its
runtime environment. It can be used to check whether a potential problem area is actually
vulnerable to attack on a real production system. Sometimes a problem found in white box
analysis may not be exploitable in a real deployed system [13].
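The black box approach described above can be sketched as a probe that feeds inputs to an opaque program and records only what is visible from outside, here whether the call fails. The target parse_record and the probe list are hypothetical examples; the tester is assumed not to read the target's body.

```python
def black_box_probe(program, inputs):
    """Feed each input to an opaque callable and record observable failures."""
    findings = []
    for data in inputs:
        try:
            program(data)
        except Exception as exc:    # a crash is an externally visible effect
            findings.append((data, type(exc).__name__))
    return findings


# Hypothetical target; the tester is assumed to have no access to this code.
def parse_record(data):
    name, _, age = data.partition(",")
    return name.strip(), int(age)


PROBES = ["alice,30", "bob", "", "carol,-1", "eve,thirty"]
findings = black_box_probe(parse_record, PROBES)
```

The probe learns that records without a numeric second field crash the target, without ever inspecting how the target parses its input.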
some of the internal workings of the software under test. In gray box testing, the tester applies a
limited number of test cases to the internal workings of the software under test. In the remaining
part of the gray box testing, one takes a black box approach in applying inputs to the software
under test and observing the outputs. Gray box testing is a powerful idea. The concept is simple;
if one knows something about how the product works on the inside, one can test it better, even
from the outside. Gray box testing is not to be confused with white box testing, i.e. a testing
approach that attempts to cover the internals of the product in detail. Gray box testing is a test
strategy based partly on internals. The testing approach is known as gray box testing when one
does have some knowledge, but not the full knowledge, of the internals of the product one is
testing. A good example of gray box analysis is running a program within a debugger and
supplying a set of inputs to that program. In this way the program is exercised while the
debugger is used to detect any failure or faulty behavior. A tool called Rational Purify can
provide detailed runtime analysis focused on memory use and consumption.
14.3.1 Graybox Software Testing
The Graybox methodology is a ten step process for testing computer software (refer to Table 1). The
methodology starts by identifying all the input and output requirements to a computer system. This
information is captured in the software requirements documentation [24].
Step   Description
1      Identify Inputs
2      Identify Outputs
3      Identify Major Paths
4      Identify Subfunction (SF) X
5      Develop Inputs for SF X
6      Develop Outputs for SF X
7      Execute Test Case for SF X
8      Verify Correct Result for SF X
9      Repeat Steps 4:8 for other SF
10     Repeat Steps 7&8 for Regression
The Graybox methodology utilizes automated software testing tools to facilitate the generation of test
unique software. Module drivers and stubs are created by the toolset to relieve the software test engineer
from having to manually generate this code. The toolset also verifies code coverage by instrumenting the
test code. Instrumentation tools help with the insertion of instrumentation code without incurring the
bugs that would occur from manual instrumentation. [22]
By operating in a debugger or target emulator, the Graybox toolset controlled the operation of the test
software. The Graybox methodology has moved out of a debugger into the real world and into real-time.
The methodology can be applied in real-time by modifying the basic premise that inputs can be sent to the
test software via normal system messages and outputs are then verified using the system output messages.
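Steps 4 to 10 of the Graybox methodology can be sketched as a small test harness. The two subfunctions, their test cases and the function run_graybox are hypothetical examples written for this illustration; they are not part of the Graybox toolset described in [22].

```python
def parse_amount(text):
    """Hypothetical subfunction: convert '12.50' into integer cents."""
    units, _, cents = text.partition(".")
    return int(units) * 100 + int((cents + "00")[:2])


def apply_discount(cents, percent):
    """Hypothetical subfunction: integer percentage discount."""
    return cents - cents * percent // 100


SUBFUNCTIONS = {"parse_amount": parse_amount, "apply_discount": apply_discount}

# Steps 5 and 6: developed inputs and expected outputs for each subfunction.
CASES = {
    "parse_amount": [(("12.50",), 1250), (("3",), 300), (("0.5",), 50)],
    "apply_discount": [((1000, 10), 900), ((999, 50), 500)],
}


def run_graybox(subfunctions, cases):
    """Steps 4 to 9: execute and verify the cases of every subfunction."""
    report = {}
    for name, func in subfunctions.items():        # step 4, repeated (step 9)
        results = []
        for inputs, expected in cases[name]:
            actual = func(*inputs)                  # step 7: execute
            results.append(actual == expected)      # step 8: verify
        report[name] = all(results)
    return report


report = run_graybox(SUBFUNCTIONS, CASES)
regression = run_graybox(SUBFUNCTIONS, CASES)       # step 10: rerun after a change
```

Keeping the cases as data makes the regression run in step 10 a repeat of the same call after the software has been modified.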
Reverse engineering is basically concerned with discovering and learning. The key to applying
computer-aided software engineering to the maintenance and enhancement of existing systems lies in
applying reverse-engineering approaches. However, there is considerable confusion over the terminology
used in both technical and marketplace discussions. The term "reverse engineering" includes any
activity to determine how a product works, or to learn the ideas and technology that were
originally used to develop the product. Reverse engineering is a systematic approach for
analyzing the design of existing devices or systems. It can be used either to study the design
process, or as an initial step in the redesign process.
Reverse engineering may be a slower and more expensive way for information to spread through a
technical community than publication. Software is machine readable, and the code determines
what every program function does: code defines the software and the decisions it will make. This
is how reverse engineering applies to software. Most current decompilers and disassemblers work
on compiled programs. Existing tools require highly trained operators to use them and to
understand the output. Reverse engineering also involves looking for patterns in the code; by
identifying certain code patterns, an attacker can locate possible software vulnerabilities.
Based on this discussion, the reverse engineering community needs to develop tools that provide
more adequate support for human reasoning in an incremental and evolutionary reverse engineering
process that can be customized to different application contexts.
References:
[1] E. Chikofsky and J. Cross. Reverse engineering and design recovery: A taxonomy. IEEE
Software, 7(1):13-17, January 1990.
[2] M. Armstrong and C. Trudeau. Evaluating architectural extractors. In Proceedings of the
5th Working Conference on Reverse Engineering (WCRE-98), Honolulu, Hawaii, USA,
pages 30-39, October 1998.
[3] Umar. Application (Re)Engineering: Building Web-Based Applications and Dealing with
Legacies. Prentice Hall, 1997.
[4] http://en.wikipedia.org/wiki/Reverse_engineering , July 2008
[5] S. R. Tilley. The Canonical Activities of Reverse Engineering. Baltzer Science Publishers,
The Netherlands, February 2000.
[6] Davis, Kathi Hogshead, "August-II: A Software Reverse Engineering Tool that Produces
a Flexible Conceptual Data Model", The Fourth Reengineering Forum, Victoria, B.C.,
Canada, September 1994.
[7] Hainaut, J-L., DB-MAIN: A Programmable CASE Tool for Database Applications
Engineering, Tutorial on Database Reverse Engineering, presented at both IFORSID95
and CAiSE95 Conferences, 1995.
[8] Edwards, Helen M. and Malcolm Munro, Deriving a Logical Data Model for a System
Using the RECAST Method, Proceedings of the Second Working Conference on Reverse
Engineering, July 1995, pages 126-135.
[9] Chen & Associates, Reverse-DBMS (Access 2.0) for Windows Reference Manual
Version 3.0, 1994.
[10] Chiang, Roger H.L., A Knowledge-Based System for Performing Reverse Engineering
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18] In Working Conference on Reverse Engineering, pages 33-43. IEEE, IEEE Comp. Soc.
Press, Oct. 1997.
[19] http://www.debugmode.com/dcompile/ , July 2008
[20] http://whatis.techtarget.com/definition/0,,sid9_gci804135,00.html , Aug 2008
[21] http://en.wikipedia.org/wiki/Decompilation , Aug 2008
[22] Boris Beizer, Software Testing Techniques, 1983 Van Nostrand Reinhold Company Inc,
New York, pg 67
[23] Boris Beizer, Software System Testing and Quality Assurance, 1984 Van Nostrand
Reinhold Company Inc, New York, pgs (238, 258, 263, 309, 312)