
To JNI or not to JNI?

Demetrius L. Davis

Abstract

The combination of computation-intensive algorithms and steep
performance requirements typically implies the use of a native
programming language such as C, C++ or FORTRAN. Although Java
provides a robust API and a wide developer base, the slower execution
times of interpreted Java bytecodes as compared to compiled C++
classes often eliminate Java as a candidate platform for
high-performance applications.

The Java Native Interface (JNI) offers an attractive solution to this
problem – computation-intensive operations may be coded in a native
programming language and then executed within a Java application
framework. In theory, JNI applications provide the speed of native
language implementations in critical areas and the platform
independence of Java for the remaining sections of the application.

This paper first examines the advantages and disadvantages of a JNI
implementation and then shows how the JNI implementation performs
against an equivalent all-Java implementation of a computationally
intensive algorithm. The results of this research effort will assist
software developers in identifying situations for which JNI is a
viable or preferred option.

Background

It is a generally accepted principle that a compiled C++ class
executes faster than an equivalent Java bytecode version. This
hypothesis is well-documented and rarely challenged in situations
where the Java implementation is interpreted by a Java Virtual Machine
(JVM). Consequently, native languages such as C++ are regularly
selected over Java as the preferred programming platform for
high-performance computing (HPC) and scientific computing projects.
The general opinion is that Java is a slow, interpreted language which
is incapable of executing computationally complex and intensive
algorithms within an acceptable period of time. This perception is now
being challenged by the current generation of Java native compilers
and optimization options. Under certain scenarios, Java compilers can
produce native code which performs at or near the level of equivalent
C and C++ routines [2, 3].

Currently, only one option provides the Java developer community with
the potential to combine the performance and legacy of the lower-level
programming languages (C, C++, assembly) with the attractive features
of Java (i.e., robust API, garbage collection and platform
independence): the Java Native Interface (JNI). JNI is a Java API
which permits native code to be “wrapped” in a Java class and executed
within a Java application framework. In essence, JNI serves as a
bridge between Java and lower-level languages such as FORTRAN, C, and
C++.

Like most software architectures, high performance applications can be
designed and developed in a modular fashion. Critical sections where
complex mathematical, search-based, or memory-intensive operations are
performed may be compartmentalized and classified as ordinary software
components. Once in component form, the developer can determine which
components perform CPU-intensive tasks and select those components as
candidates to be developed in a native programming language. By doing
this, the application framework remains intact and the developer can
outsource the execution of components as necessary.

Some studies have shown that, under certain conditions, compiled Java
routines can perform near or at the level of some lower level
languages [2]. These benchmarks are tempered by other research efforts
which expose Java’s performance shortcomings in I/O operations and
parallel computing [1, 6, 9]. Nonetheless, there is a prevailing
consensus that Java will eventually outperform its predecessors [8].

This research effort will assess the suitability of a JNI
implementation as a means of porting an algorithm written and
thoroughly tested in a native language into a Java framework. This
practice is common in situations where an algorithm is customized or
optimized for a specific domain or problem set. In these cases,
developing an equivalent solution in Java may not be possible or
feasible. A JNI solution also permits the algorithm to maintain
connections to native libraries and classes.

For more practical purposes, this research effort will assist Java
developers in identifying situations which are amenable to a JNI
implementation. Of course, this research does not replace the need to
evaluate all of the deciding factors typically associated with a
software development project. The objective of this research effort is
to assess the tradeoffs of JNI and Java implementations to determine
1) if and when JNI should be employed, and 2) at which point a pure
Java implementation is the better solution performance-wise.

At the time of this writing, it is understood that Java is not able to
compete head-to-head with native languages in terms of sheer
performance. Nonetheless, a Java-based solution may be feasible if the
computationally intensive routines are identified, isolated and then
developed in a native programming language. The remaining elements of
the application – the sections with comparable performance benchmarks
– will be developed in Java. Like any software component, the
high-performance native routines can be added to the Java application
by interfacing with the Java framework via JNI.

The information technology community will be best served by clear and
concise research results which are applicable across multiple domains
and implementations. Increased productivity, return-on-investment
(ROI), and reusable software are important to both developers and
management. The results attained in this research effort will place
time-saving and money-saving information at the fingertips of the
reader. Deciding between forgoing a Java solution due to pivotal
legacy software and wrapping the legacy routines in Java will become a
simpler and shorter task.
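To make the idea of “wrapping” native code in a Java class more concrete, the following is a minimal sketch of the Java side of such a wrapper. It is an illustration only, not code from this study: the class name NativeSorter, the method sortNative, and the library name "nativesort" are all hypothetical. The sketch assumes a Java 7 or later compiler (for the multi-catch clause) and falls back to a pure Java sort when the native library cannot be loaded, so it runs even without any native code present.

```java
import java.util.Arrays;

// Hypothetical wrapper: NativeSorter, sortNative and "nativesort" are
// illustrative names, not part of the benchmark code in this paper.
final class NativeSorter {
    // True only if the (assumed) native library could be loaded.
    private static final boolean NATIVE_AVAILABLE = tryLoad("nativesort");

    private static boolean tryLoad(String libName) {
        try {
            // Resolves to nativesort.dll, libnativesort.so, etc. on the library path.
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false; // library missing or loading not permitted
        }
    }

    // Declared in Java, implemented in native code and bound through JNI.
    private static native void sortNative(int[] data);

    static boolean isNativeAvailable() {
        return NATIVE_AVAILABLE;
    }

    // Single entry point for callers: native routine if present, pure Java otherwise.
    static void sort(int[] data) {
        if (NATIVE_AVAILABLE) {
            sortNative(data);
        } else {
            Arrays.sort(data);
        }
    }

    public static void main(String[] args) {
        int[] sample = {5, 3, 9, 1};
        sort(sample);
        System.out.println(Arrays.toString(sample)
                + " (native library loaded: " + isNativeAvailable() + ")");
    }
}
```

The rest of the application calls only sort(), so the native component can be swapped in or out without touching the surrounding Java framework.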

20th Computer Science Seminar


SC3-T2-1
Methodology

The goal of this research effort is to determine if a computationally
intensive native routine, wrapped in Java, would yield comparable
performance to an equivalent all-Java implementation. Complex math
operations, sort and search algorithms are examples of
computation-intensive activities which are typically written and
compiled in a lower-level language such as C, C++, or FORTRAN. For
this study, two robust algorithms were selected and implemented as
self-contained components which are called from a Java program.

Tests. A series of benchmark tests was scheduled to measure the
average execution times of the two test algorithms, each written in
Java and C++. The start and end times were collected from each
benchmark test; results related to memory management, software
quality, I/O performance, level-of-effort, and program size (in SLOCs)
were not recorded. To increase accuracy among all test runs, the tests
were executed 3 times and the average execution time was recorded.

Each of the selected algorithms operates on an array of integer
values. To assess the impact of array sizes on the performance of the
Java and JNI implementations, each program was run using three
different array sizes: 250, 1000, and 10000. This range of array sizes
was selected to reveal any thresholds where performance suddenly
improved or diminished.

In addition to varying the array sizes, the number of iterations was
varied in order to exacerbate the impact of any overhead associated
with the startup or execution of either approach. It is understood
that the startup overhead associated with JNI is a primary deterrent
for developers concerned with performance. Consequently, each program
was executed with iteration counts of 250, 1000, and 3500.

Languages. JNI supports several different lower-level languages. For
this study, C++ was selected as the native programming language due to
its large customer base and reputation for high-speed computations.
The C++ classes were compiled with MS Visual C++ 6.0.

Programmers. JNI programming naturally requires proficiency in more
than a single programming language. To minimize differences in
programmer skill levels, programming methodologies, and programming
language features, implementations of the target algorithms were
obtained from the public domain. The performance of the surrounding
software architecture of each candidate solution was not under
evaluation; only the time spent executing the algorithm was recorded.

Algorithms. Two computation-intensive algorithms were used for this
study: an implementation of the Heapsort algorithm, which encompasses
sorting and large-array operations, and a Discrete Fast Fourier
Transform (FFT) algorithm, which exposes Java’s performance in
numerical analysis and complex mathematical computations. [4] and [7]
provided complete Java and C++ versions of the selected algorithms.

Platform/OS. All benchmark tests were run on the same machine – a 2.6
GHz Pentium 4 computer with 512 MB RAM. The test machine was a Windows
XP platform running the Java SDK version 1.4.1. Previous benchmarking
efforts have shown that JVM and compiler performance across multiple
computing platforms can vary widely [2, 9]. Since a single test
platform and environment was used for this study, accuracy of the
experiment was not jeopardized.

Program architecture. Three Java classes comprise the general program
architecture that was used for the benchmark runs. The first class,
main, contains the top-level main() routine and initializes the
managing class Benchmark (see Code example 1). The main class obtains
the minimum set of algorithm parameters from the user via the array of
string arguments passed in to the main() method, and then passes the
user-supplied input values to the run() method of the Benchmark class.

    public class main {
        public static void main(String[] args) {
            // args: <algorithm index> <run JNI version: true|false> <iteration count>
            int algIndex = Integer.parseInt(args[0]);
            Boolean runJNI = Boolean.valueOf(args[1]);
            int iterCnt = Integer.parseInt(args[2]);
            (new Benchmark()).run(algIndex, runJNI.booleanValue(), iterCnt);
        }
    }

Code example 1: main.java

Inside the Benchmark class (see Code example 2), the start and stop
times for the program execution are stored; the Benchmark class is
also responsible for initializing and executing the appropriate
algorithm. The algIndex input parameter designates which of the
algorithm types will be used for the current operation. The runJNI
parameter states whether or not the JNI version of the algorithm will
be employed for the current run. The iterCount parameter specifies the
number of times the algorithm is to be executed. The execTime value –
the difference between the start time and end time – is calculated and
written to standard output.

    public class Benchmark {
        public void run(int algIndex, boolean runJNI, int iterCount) {
            // Initialize appropriate Algorithm object
            Algorithm alg = AlgorithmFactory.getAlgorithm(algIndex, runJNI);

            // Read the clock before and after the loop; both values are
            // milliseconds since January 1, 1970, 00:00:00 GMT
            long t1 = System.currentTimeMillis(); // start time

            for (int i = 0; i < iterCount; i++) {
                alg.run();
            }

            long t2 = System.currentTimeMillis(); // end time
            long execTime = t2 - t1;              // execution time
            System.out.println(execTime);
        }
    }

Code example 2: Benchmark.java

Measurements

At the start of each execution, the current time was recorded as the
start time. Time is stored as the number of milliseconds since January
1, 1970, 00:00:00 GMT.

Upon completion of each execution, the end time was captured and
recorded. The time difference between the start and end times was
recorded as the algorithm execution time. Each run
was performed three times in succession and the average execution time
was recorded and analyzed.

Evaluation

The results obtained from the algorithm runs were impressive and
conclusive. The following subsections present the results for each set
of benchmark runs. A final summary and analysis concludes the section.

Heapsort

The pure Java implementation of the Heapsort algorithm revealed that
the iteration count has a significantly greater impact on execution
time than the array size. This assertion is evident considering the
near identical results for the pure Java program within a specific
iteration count grouping (see Table 1). In this case, the similarity
of the execution times across such a considerable range of array sizes
likely reflects the efficiency and speed of the sorting algorithm
rather than the JVM.

Another interesting observation is the consistent curvature seen in
each of the three line charts shown in Figures 1, 2, and 3. The intent
of varying the iteration count was to expose any performance
inhibitors such as excessive overhead during start-up and JNI method
invocation. If there were significant overhead issues attributed to
JNI implementations, then a larger number of method invocations would
necessarily impose larger performance delays. The similarities among
the line charts suggest that the impact of start-up and method
invocation overhead is negligible.
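For reference, the Heapsort algorithm discussed here can be sketched in Java as follows. This is a textbook version written for illustration; it is not the implementation obtained from [4] and [7], and the class and method names (HeapsortSketch, heapsort, siftDown) are assumptions. Like the benchmarked routine, it operates in place on an array of integers.

```java
import java.util.Arrays;

// Illustrative textbook heapsort; names are hypothetical and this is not
// the benchmarked implementation.
final class HeapsortSketch {

    // Sorts the array in ascending order, in place.
    static void heapsort(int[] a) {
        int n = a.length;
        // Phase 1: build a max-heap, starting from the last parent node.
        for (int i = n / 2 - 1; i >= 0; i--) {
            siftDown(a, i, n);
        }
        // Phase 2: move the current maximum to the end, then restore the
        // heap property on the shrinking prefix.
        for (int end = n - 1; end > 0; end--) {
            swap(a, 0, end);
            siftDown(a, 0, end);
        }
    }

    // Pushes a[root] down until neither child (within size) is larger.
    private static void siftDown(int[] a, int root, int size) {
        while (true) {
            int child = 2 * root + 1;                // left child
            if (child >= size) {
                return;                              // root is a leaf
            }
            if (child + 1 < size && a[child + 1] > a[child]) {
                child++;                             // right child is larger
            }
            if (a[root] >= a[child]) {
                return;                              // heap property holds
            }
            swap(a, root, child);
            root = child;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        int[] data = {9, 4, 7, 1, 8, 2};
        heapsort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

Each siftDown call walks at most one root-to-leaf path, which is where the O(n log n) bound on the total work comes from.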

Table 1 – Heapsort results (milliseconds)

                250 iterations         1000 iterations        3500 iterations
Array size     250   1000  10000      250   1000  10000      250   1000  10000
Java            40     40     50      140    140    140      400    400    400
JNI             30     30    110       30     60    390       60    150   1310

Table 2 – FFT results (milliseconds)

                250 iterations         1000 iterations        3500 iterations
Array size     250   1000  10000      250   1000  10000      250   1000  10000
Java           741   4577  49290     2924  18286 197154    10165  63942 689381
JNI            742   4576  49661     2904  18196 197805    10234  63912 694048
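The reported totals are easier to compare once each is divided by its iteration count, giving the average time per algorithm call. The following sketch does that arithmetic; the numbers are copied from the Heapsort and FFT rows above, while the class name PerCallTimes and its helper method are hypothetical and exist only for this illustration.

```java
// Illustrative arithmetic on the reported results; PerCallTimes and
// perCallMs are hypothetical names, the numbers come from the results above.
final class PerCallTimes {
    static final int[] ITERATIONS = {250, 1000, 3500};
    static final int[] ARRAY_SIZES = {250, 1000, 10000};

    // Total times in ms, indexed as [iteration group][array size].
    static final long[][] HEAP_JAVA = {{40, 40, 50}, {140, 140, 140}, {400, 400, 400}};
    static final long[][] HEAP_JNI = {{30, 30, 110}, {30, 60, 390}, {60, 150, 1310}};
    static final long[][] FFT_JAVA = {{741, 4577, 49290}, {2924, 18286, 197154}, {10165, 63942, 689381}};
    static final long[][] FFT_JNI = {{742, 4576, 49661}, {2904, 18196, 197805}, {10234, 63912, 694048}};

    // Average time of a single algorithm call in milliseconds.
    static double perCallMs(long[][] totals, int iterIdx, int sizeIdx) {
        return (double) totals[iterIdx][sizeIdx] / ITERATIONS[iterIdx];
    }

    private static void print(String label, long[][] totals) {
        for (int s = 0; s < ARRAY_SIZES.length; s++) {
            StringBuilder line = new StringBuilder(label + ", array size " + ARRAY_SIZES[s] + ":");
            for (int i = 0; i < ITERATIONS.length; i++) {
                line.append(String.format(" %.3f ms/call @ %d iterations;",
                        perCallMs(totals, i, s), ITERATIONS[i]));
            }
            System.out.println(line);
        }
    }

    public static void main(String[] args) {
        print("Heapsort Java", HEAP_JAVA);
        print("Heapsort JNI", HEAP_JNI);
        print("FFT Java", FFT_JAVA);
        print("FFT JNI", FFT_JNI);
    }
}
```

For the pure Java FFT, the per-call time works out to roughly 2.9 ms for 250 elements, 18.3 ms for 1000 elements, and 197 ms for 10000 elements, and it stays nearly constant across the three iteration counts. For Heapsort at 10000 elements, the JNI per-call time is higher than the pure Java per-call time in every iteration group.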

Clearly, the most telling results from the Heapsort runs can be seen
in Figures 4, 5, and 6. Unlike the iteration count-based line charts,
the bar charts view the data according to the array sizes. The nearly
identical results for the pure Java implementation further the
assertion that the iteration count plays a more significant role than
the array size. While the pure Java implementation shows little to no
variation, the JNI implementation yields a pronounced curve as the
array size increases. Up to 1000 integers, the JNI implementation
outperformed the pure Java version; in fact, as the iteration count
increases, the JNI program further widens the performance gap when the
array does not exceed 1000 elements.

At some point beyond 1000 elements, the JNI implementation takes an
enormous performance hit. The cause of the sudden degradation in
execution speed is more likely attributable to memory limitations of
the test computer than to an inherent lack of scalability of JNI.
Nonetheless, the figures in Appendix A expose an obvious shortcoming
of JNI: heavyweight objects or large arrays of primitive data types
have a noticeably adverse effect on the application performance.

[Figures 1-3: line charts of Heapsort execution time (ms) versus
iteration count (250, 1000, 3500) for the Java and JNI
implementations, one chart per sample size.]

Figure 1 – Heapsort (250 elements)
Figure 2 – Heapsort (1000 elements)
Figure 3 – Heapsort (10000 elements)

[Figures 4-6: bar charts of Heapsort execution time (ms) versus array
size (250, 1000, 10000) for the Java and JNI implementations, one
chart per iteration count.]

Figure 4 – Heapsort (250 iterations)
Figure 5 – Heapsort (1000 iterations)
Figure 6 – Heapsort (3500 iterations)

Discrete Fast Fourier Transform (FFT)

The results from the execution of the FFT algorithm offer additional
insight into the differences between a pure Java routine and a C++
routine implemented via the JNI. The FFT routine involves more
mathematical computations and takes a sizeable amount of time to
execute as compared to the Heapsort algorithm. This disparity served
two purposes: the two Java programs executed a more computationally
intensive algorithm and, consequently, the ability of each
implementation to handle extensive math computations was sufficiently
tested.

The difference in execution times between the Heapsort and FFT runs
was significant. Nonetheless, there is profound symmetry between the
execution times of the two algorithms (see Table 2). Moreover, the
execution times of the Java and JNI implementations of the FFT
algorithm are eerily similar. Unlike the highly efficient Heapsort
algorithm, the necessary processing time for the FFT algorithm
outweighed any overhead impact generated by JNI function calls. As
suspected, the recorded times of a less-than-efficient algorithm
reflect more of the algorithm performance and efficiency than the
performance of the programming platform.

In addition to the similarity in the execution times, the rate of
execution is also nearly identical throughout every run and iteration
count grouping. For the 250-element array, every FFT calculation
averaged about 2.9 milliseconds regardless of the number of
iterations, and the time per calculation was similarly stable across
iteration counts at the larger array sizes. This form of consistent
performance indicates a high probability that the implementation,
whether pure Java or part Java, will not adversely affect the
scalability of this type of algorithm.

Conclusion

High performance applications are traditionally developed in native
programming languages such as C, C++ and FORTRAN. The movement towards
Java has been hindered in the high performance computing domains due
to the slower, interpretive nature of the JVM. The need for computing
speed in select domains has resulted in the older, compiled languages
maintaining a foothold on most software development and maintenance
efforts. Although developers of real-time, embedded, simulation,
graphics and scientific applications are attracted to Java’s platform
independence, robust API, supporting technologies, and extensive
developer base, the fear of degraded performance in time-critical
applications has slowed the adoption of Java as a primary programming
language and computing platform.

Moreover, the cloud of uncertainty that hovers over the Java Native
Interface (JNI) deters software developers from considering a JNI
solution for their existing legacy applications.

This research effort has uncovered several interesting findings.
First, JNI is definitely a viable option for handling algorithms that
manipulate a small to moderate-sized set of primitive data types. In
such cases, the time required to “wrap” a native routine within a JNI
framework is minimal compared to transcribing the native routine into
a comparable Java version. The consistency recorded by the pure Java
implementation indicates that, as the need for larger or more numerous
elements arises, a pure Java re-implementation of the native code
routine is recommended.

Secondly, the overall performance of either Java implementation is
greatly influenced by the efficiency and performance of the underlying
algorithm. Performance can be markedly improved if an algorithm is
optimized for performance and a small memory footprint. Such
constructs as nested loops, recursion, and excessive object
instantiations can drastically increase execution times. This
assertion is supported by the data collected from the Heapsort
algorithm runs. The efficiency of the Heapsort algorithm, which
executes in O(n log n) time, is a major contributor towards the
relatively fast times recorded in Table 1.

Lastly, the performance of the pure Java implementations as the
iteration count increases is remarkably linear. In the executions of
the Discrete Fast Fourier Transform algorithm, for an array containing
250 integers, the algorithm maintained a 2.9-millisecond average over
all three iteration count groups. The Heapsort algorithm registered
minimal variations. This conclusion establishes a pure Java
implementation as a lower risk option when the variable parameters of
the application are not readily known.

Future Work

The boundaries of this research effort can be expanded in many ways.
Accuracy can be increased by evaluating a larger cross-section of
algorithms, compilers, platforms and operating systems. Moreover, a
comparative analysis of I/O performance (i.e., file streams, network
and database connectivity) as it relates to JNI implementations would
refine this research effort by identifying more specific scenarios in
which a hybrid JNI solution will outperform a 100% Java
implementation.

Liang [4] states that the overhead associated with the transfer of
heavyweight objects such as large arrays and strings is “not
acceptable” over an extensive sequence of function calls. Similar to
this study, a future research effort can zero in on the specific
scenario under which the use of heavyweight objects negates the
benefit of using a JNI implementation. These types of research
activities will aid the software developer considerably by 1)
clarifying the pros and cons of JNI, and 2) providing a map for when
to employ (and when not to employ) a JNI-based solution.

References

[1] Dickens, P. and R. Thakur, “An Evaluation of Java’s I/O
    Capabilities for High-Performance Computing,” Proceedings of the
    ACM 2000 Conference on Java Grande, 2000, pp. 26-35.
[2] Kazi, I., H. Chen, B. Stanley, and D. Lilja, “Techniques for
    Obtaining High Performance in Java Programs,” ACM Computing
    Surveys, vol. 32, issue 3, September 2000, pp. 213-240.
[3] Kurzyniec, D. and V. Sunderam, “Efficient Cooperation between Java
    and Native Codes – JNI Performance Benchmark,” The 2001
    International Conference on Parallel and Distributed Processing
    Techniques and Applications (PDPTA), 2001.
[4] Liang, S., The Java Native Interface, Sun Microsystems Inc., 1999.
[5] Mathew, J., P. Coddington, and K. Hawick, “Analysis and
    Development of Java Grande Benchmarks,” Proceedings of the ACM
    1999 Conference on Java Grande, 1999, pp. 72-80.
[6] Prechelt, L., “Technical Opinion: Comparing Java vs. C/C++
    Efficiency Differences to Interpersonal Differences,”
    Communications of the ACM, vol. 42, issue 10, October 1999, pp.
    109-112.
[7] Press, W., S. Teukolsky, W. Vetterling, and B. Flannery, Eds.,
    Numerical Recipes in C++: The Art of Scientific Computing,
    Cambridge University Press, 1992.
[8] Reinholtz, K., “Java will be faster than C++,” ACM SIGPLAN
    Notices, pp. 25-28.
[9] Sangappa, S., K. Palaniappan, and R. Tollerton, “Benchmarking Java
    against C/C++ for Interactive Scientific Visualization,”
    Proceedings of the 2002 joint ACM-ISCOPE Conference on Java
    Grande, 2002, p. 236.
