
Development of a Path for the Improvement of Computational

Efficiency

A THESIS

Presented to the Graduate Division

College of Arts & Sciences

New Mexico Highlands University

In Partial Fulfillment

Of the Requirements for the Degree

Master of Science in Media Arts and Computer Science

By

Jared M. Leyba

July 22, 2017


Development of a Path for the Improvement of Computational

Efficiency

A Thesis Presented to the Graduate Division


College of Arts & Sciences
New Mexico Highlands University

In Partial Fulfillment
Of the Requirements for the Degree
Master of Science in Media Arts and Computer Science
By
Jared M. Leyba

Approved by Examining Committee:

Dr. Gil R. Gallegos Dr. Gil R. Gallegos


Department Chair Chair of Committee
Discipline of Computer Science

Dr. Kenneth Stokes Dr. Richard Medina


Dean, College of Arts and Sciences Discipline of Computer Science

Dr. Warren Lail Dr. Tatiana Timofeeva


Dean of Graduate Studies Outside Member
Department of Chemistry

Copyright 2017 Jared M. Leyba

All Rights Reserved


ABSTRACT

Table of Contents

Table of Contents ..................................................................................................................................... ii

List of Figures ............................................................................................................................................. i

List of Code Samples ............................................................................................................................. iii

ACKNOWLEDGMENTS ..........................................................................................................................iv

CHAPTER 1 INTRODUCTION............................................................................................................... 1

1.1 Introduction ..................................................................................................................................

1.2 Research Motivation...................................................................

1.3 Research Objectives...................................................................

1.4 Research Background................................................................

CHAPTER 2 OBJECTIVE 1 .......................................................................................................................

2.1 Objective 1: Computational Holography Introduction ........................................................

2.2 Objective 1: Computational Holography Background .........................................................

2.3 Objective 1: Computational Holography Methods ...............................................................

2.4 Objective 1: Computational Holography Results..................................................................

CHAPTER 3 OBJECTIVE 2 .......................................................................................................................

3.1 Objective 2: Computational Learning Introduction ...............................................................

3.2 Objective 2: Computational Learning Background ................................................................

3.3 Objective 2: Computational Learning Methods .......................................................................

3.4 Objective 2: Computational Learning Results .......................................................................

CHAPTER 4 OBJECTIVE 3 .......................................................................................................................

4.1 Objective 3: Computational Plasmonics Introduction .....................................................

4.2 Objective 3: Computational Plasmonics Background ......................................................

4.3 Objective 3: Computational Plasmonics Methods...........................................................

4.4 Objective 3: Computational Plasmonics Results .............................................................

CHAPTER 5 CONCLUSION & FUTURE RESEARCH........................................................................

5.1 Conclusion .........................................................................................................................................

5.2 Current and Future Research ....................................................................................................

References .....................................................................................................................................................

APPENDICES ................................................................................................................................................
List of Figures

Figure 1: ..........................................................................................................................................

Figure 2: ..........................................................................................................................................

Figure 3:...........................................................................................................................................

Figure 4:...........................................................................................................................................

Figure 5: ..........................................................................................................................................

Figure 6:...........................................................................................................................................

Figure 7: ..........................................................................................................................................

Figure 8: ..........................................................................................................................................

Figure 9: ..........................................................................................................................................

Figure 10: .......................................................................................................................................

Figure 11: .......................................................................................................................................

Figure 12: .......................................................................................................................................

Figure 13: .......................................................................................................................................

Figure 14: .......................................................................................................................................

Figure 15: .......................................................................................................................................

Figure 16: .......................................................................................................................................

Figure 17: .......................................................................................................................................

Figure 18: .......................................................................................................................................

List of Code Samples

Code Sample 1: .............................................................................................................................

Code Sample 2: .............................................................................................................................

Code Sample 3: .....................................................................

Code Sample 4: .....................................................................

Code Sample 5: .....................................................................

Code Sample 6: .....................................................................

Code Sample 7: .....................................................................

Code Sample 8: .....................................................................

Code Sample 9: .....................................................................

ACKNOWLEDGMENTS

I would like to thank everyone involved in helping this thesis come to fruition. Thank you to my committee members, Dr. Gil Gallegos, Dr. Richard Medina, and Dr. Tatiana Timofeeva, for the guidance and assistance. Your advice and support were influential in the completion of my thesis and ensured that my work was so much better than what it would have been. Thank you, Dr. Gallegos, for your unwavering support through both my undergraduate years and my graduate studies. I would also like to thank my girlfriend, Kendra, for her constant love, support, and understanding through this difficult process. To my mother Tina and my father Steve, for their support and instillment of a good work ethic as I was growing up, and to my late grandfather Juan Gallegos for having confidence in my abilities, especially in academics.

1.1 Introduction

Data is growing at an exponential rate, while the computational ability of current computational systems is reaching a plateau. This causes the time it takes to perform a given computation to increase. It is relevant because the application of these computations can predict not just sales for businesses but also health issues, so the ability to increase computational efficiency is of great use.

1.2 Research Motivation

This research focuses on the improvement of computational methods, specifically the efficiency of computational methods. As the amount of data being created by today's data systems grows, there is a growing need to perform complex computation on this data to look for correlations, both positive and negative, to better serve the public's wellbeing. These computations require a lot of time and computing power to get usable results. Is it possible to improve computational efficiency without having to add expensive upgrades to user systems?

1.3 Research Objectives

The predominant goal of this research was to find and develop a path by which computation can be made more feasible to the user by increasing the efficiency of currently available resources. The resource that all these objectives have in common is the use of an embedded system for computation.

Objective 1: Take algorithms and test whether better coding practices and languages can lead to a more efficient computation time at the algorithm level.

To test this approach to computational improvement, computational holography was chosen as a testing ground, as computational holography is both a complex calculation that can be done using multiple methods, and the holographic process is finding uses in fields other than optics and media arts, such as data compression.

Objective 2: Using accepted computational methods, test whether reduction of the dataset will result in a reduction of the computational time needed to get a result, with limited impact on the results' validity when compared to the unmodified dataset.

To test the data reduction side of computation, machine learning was chosen as the accepted computational method to test, specifically data classification. The data reduction was tested by using two methods of classification that do not share the same theoretical basis, and two methods of data reduction.

Objective 3: Test an embedded implementation of software versus a more traditional software implementation on computational systems, to see whether an embedded system's use of a slimmed-down operating system gives a performance improvement for the calculation.

The final objective was tested using open-source computational plasmonics software called MEEP, which was run on multiple systems to measure performance.

1.4 Research Background

With the level of connectivity that exists in today's world, with the advent of smartphones, social media, and a computer in everything for sale on the market, data is being created at an incredible pace (according to Google, on the order of exabytes a day), leading to the buzzword "Big Data". With this growing stockpile of data, the need to perform calculations on it is also growing, and this is not just for use by advertisers to get the latest and greatest products into the customer's home, but also by medical professionals to help safeguard the health of the public, as was attempted by the Google Flu project. With that said, the size of these datasets is getting too large to be computed on regular systems in a reasonable amount of time. That is where computational efficiency comes into play, at both the software level and the hardware level.

A programming language is the way in which a software developer communicates with a computational system to complete a particular task. Programming languages come in two main variants. The compiled language, the older variant, is translated from the source code the user wrote into a separate file of executable machine code; compiled code tends to run faster, as it is run by the computational system itself, and as one of the many steps in its creation it is subjected to an optimizer. The other variant, the scripting language, is a newer development that is run directly from the source using an interpreter; this tends to be slower, as it is common for an additional piece of software to sit between the computational system's operating system and the code being executed, usually called a virtual machine.

The Jetson TK1 is a development board created by NVIDIA; a development board is a printed circuit board and processor with limited logic and user interface, used for prototyping systems. The TK1's prototyping capability mainly revolves around the use of Graphics Processing Units (GPUs), as that is NVIDIA's area of expertise. A GPU is a specialized computer system component traditionally used for the rendering of computer graphics. The GPU utilizes its high number of parallel components to perform simple operations quickly. This parallel operation was then adopted for supercomputing, as there is little difference at the physical level between image data and scientific data.

2.1 Objective 1: Computational Holography Introduction

To test computational improvement through algorithm improvement, computational holography was chosen as a measuring device, as holography is a very complex process that can be performed in many ways, and thus has many ways it can be improved upon, making it well suited to testing the validity of algorithm improvement.

2.2 Objective 1: Computational Holography Background

What is computational holography? To answer that question, one must first know what standard, or analog, holography is. Holography is a visual and spatial form of data storage created using laser light, similar to a photograph but with the spatial data also included in the image. The process was created in the 1940s by the Hungarian-British physicist Dennis Gabor, who is credited with the creation of the holographic method, though the use of lasers came later. The setup to create an analog hologram is as follows. The object that the hologram will capture is chosen and set up by itself. Next comes the laser; for ease of access this is usually a helium-neon laser, which is in the red wavelength. The laser is set up on a stable surface such as an optical bench, then aligned so the laser light flows through an optical device called a beam splitter, which divides the single beam into two separate beams. The first beam is the object beam; this is the beam that will hit the object intended for holographic capture. The object beam must be made large enough to hit the whole object and be reflected to the plate for storage; this is done by using lenses to widen the beam and collimate it back into parallel waves that can reflect off the whole object and hit the emulsion plate. The second beam that was split off is the reference beam; this beam is only widened and collimated before hitting the emulsion plate. At this point a hologram has been recorded on the plate; to view the hologram, another beam of laser light matching the reference beam must pass through the plate. Computational holography, then, is the process of using a computer system to calculate and create the image that would be recorded on the emulsion plate in traditional holography. In this research, two different ways to perform computational holography were examined: DFFT and Ray-Tracing.

What is DFFT? DFFT stands for discrete fast Fourier transform. A fast Fourier transform is a computational algorithm that allows a data set to be converted into its component parts. In the case of holography, this allows us to take an image from an amplitude-domain representation to a frequency-domain representation. It is this frequency-domain representation that is the hologram that would be on the emulsion plate. This method of computational holography is fast when there is enough data involved, but it can lead to problems based on how the algorithm works on the back end: the algorithm uses a divide-and-conquer approach, which in computation means we may need to add data to make the algorithm work properly.
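As a minimal sketch of this idea (in Python with NumPy for brevity; the thesis does not reproduce its DFFT code, so this is illustrative only), a 2-D FFT carries an image from the amplitude domain to the frequency domain, and the inverse FFT recovers it:

```python
import numpy as np

# Toy "image": a small amplitude-domain array standing in for an object image.
image = np.zeros((4, 4))
image[1, 2] = 1.0

# Forward 2-D FFT: the frequency-domain representation, playing the role of
# the hologram that would be recorded on the emulsion plate.
hologram = np.fft.fft2(image)

# The inverse FFT reconstructs the original amplitude-domain image,
# analogous to re-illuminating the plate with the reference beam.
reconstructed = np.fft.ifft2(hologram).real
```

FFT implementations favor power-of-two sizes because of the divide-and-conquer recursion, which is why padding the data may be needed, as noted above.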

What is Ray-Tracing? Ray-Tracing is an algorithm for calculating the path of waves as they propagate through a system. This algorithm can be used to calculate hologram wavefronts by using each individual point on the surface of an object as a single light-emitting point. The algorithm then calculates the light distribution as it propagates to the individual pixels of the plate, through the use of the Euclidean distance formula for three-dimensional space.
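One step of this idea can be sketched as follows (in Python for brevity; the thesis implementations are in MATLAB and C++, so the function name here is illustrative): the Euclidean distance from one object point to one plate pixel, and the complex wave that point contributes after propagating that distance.

```python
import math

def point_contribution(obj, pixel, wavelength):
    """Contribution of a single object point (x, y, z) to one plate pixel
    (x, y) at z = 0, following the ray-tracing formulation."""
    dx = obj[0] - pixel[0]
    dy = obj[1] - pixel[1]
    dz = obj[2] - 0.0
    # Euclidean distance in three-dimensional space.
    distance = math.sqrt(dx**2 + dy**2 + dz**2)
    # Complex wave after propagating `distance` (Euler form of exp(i*phase)).
    phase = 2 * math.pi * distance / wavelength
    return complex(math.cos(phase), math.sin(phase))

# A point exactly one wavelength above a pixel arrives with phase 2*pi,
# i.e. a real part of ~1 and a negligible imaginary part.
c = point_contribution((0.0, 0.0, 633e-9), (0.0, 0.0), 633e-9)
```

Summing such contributions over every object point and every plate pixel builds up the hologram, which is exactly the triple loop shown in Code Sample 1 below.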

2.3 Objective 1: Computational Holography Methods

For this research, Ray-Tracing was chosen as the method to improve upon, as it is an easier algorithm to understand and follow; it also does not hurt that ray casting is quite similar to the traditional analog holography process. The computational system used for this experiment was the Jetson TK1 development board, which has a 2.32 GHz CPU, 2 GB DRAM, and an NVIDIA GPU capable of performing 326 GFLOPS, running the L4T (Linux for Tegra) operating system, a slimmed-down version of Ubuntu 14.04 LTS. The basis for this Ray-Tracing algorithm comes from the Wendt paper, which demonstrates the algorithm in the MATLAB scripting language, run using the GNU Octave software, a free, open-source variant of MATLAB.

The first thing to do to test algorithm efficiency is to develop a baseline reading. To begin, a hologram must be created. The first step is to state the wavelength of light to be used. Second, the resolution of the hologram must also be stated; in this experiment the resolution was 600 dots per inch. The third step is to create the holographic plate and the reference beam plate. Next, the dataset must be imported; in the case of the MATLAB script, the dataset was hard coded using a total of twenty object points. Now that the variables are set up, the calculations can be carried out. Object points are grabbed one at a time: the 'X' component of the coordinate is subtracted from the 'X' in the reference beam plate, the same is true of the 'Y' component of the object point being subtracted from the 'Y' in the reference beam plate, and the coordinate's 'Z' component is unchanged. These new values dx, dy, and dz are then used to calculate a Euclidean distance. The last step in the process is the light contribution calculation:

for o=1:size(objectpoints,1)
  for i=1:size(ipx,2)
    for j=1:size(ipy,2)
      dx = objectpoints(o,1) - ipx(i);
      dy = objectpoints(o,2) - ipy(j);
      dz = objectpoints(o,3) - 0;
      distance = sqrt(dx^2 + dy^2 + dz^2);
      complexwave = exp(2*pi*sqrt(-1)*distance/(wavelength));
      film(i,j) = film(i,j) + complexwave;
    end
  end
end

Code Sample 1

Now that the holographic image has been created, the next step was to write it out as an image. This was done by taking the real component of the complex numbers that make up the plate and writing them to a PNG image using MATLAB's "imwrite" function. To measure the improvement for this part of the research, a benchmarking standard was used, which means the program was run multiple times on the same set of data, in this case one hundred times; the average of the run time was then taken and used for comparison at all stages of computation improvement.
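The benchmarking procedure described above can be sketched as follows (in Python using the standard `timeit` module; `compute_hologram` is a hypothetical stand-in for whichever implementation is being measured):

```python
import timeit

def compute_hologram():
    # Stand-in workload for the real hologram computation being benchmarked.
    return sum(i * i for i in range(1000))

# Run the same computation on the same data many times (the thesis used
# one hundred runs) and take the mean run time for comparison across
# implementations.
RUNS = 100
total = timeit.timeit(compute_hologram, number=RUNS)
average = total / RUNS
```

Averaging over repeated runs smooths out operating-system jitter, which matters when comparing implementations whose run times differ by fractions of a second.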

To start the improvement, one must first understand the algorithm. For this purpose, the implementation used in the research was sampled from the "Computer Generated Holography" PhD dissertation by Wendt, James B., which uses the Ray-Tracing algorithm written in the scripting language MATLAB. As a scripting language, MATLAB is not known for its speed, and this brings us to the first improvement to be made: if complex computation is necessary, a fast language should be used. For this research, C++ was the language chosen, as it is still a high-level language, meaning it is English-like and therefore easier for the programmer to understand; C++ also has very good access to the lower-level parts of a computer system. Most parts of the code were straightforward to convert to C++, such as the variable definitions. The object points for the C++ implementation were not hard coded in the software but read from a PCD file, a file format used by the Point Cloud Library (PCL) that describes a three-dimensional object with XYZ data.

vector<vector<double> > op;
pcd_parser("my_test1.pcd", op);

Code Sample 2

Other parts were not so intuitive. The reference beam plate was being created in the MATLAB script using a simple for loop with decimal-sized increments; a direct equivalent of such a range loop does not exist in C++, so a function was created using a while loop for the implementation.

void float_range(double array[], double start, double stop, double step)
{
    int counter = 0;
    double i = start;

    while (i < stop)
    {
        array[counter] = i;
        counter++;
        i = i + step;
    }
}

Code Sample 3

The last issue for the C++ implementation was the image writing itself. C++ is not usually used to create image files, so an additional library called "EasyBMP" was used to create the image. The application of the C++ language did show a speed improvement for the algorithm.

After deciding on the language, the code had to be made more efficient. The original template from the Wendt paper performed some operations multiple times in a loop; some of these calculations are constant across the execution of the code, allowing them to be removed from the loops and speeding up execution of the calculations. Utilizing known properties of multiplication and division, which state that as long as there are no additional operations involved the order of multiplies and divides is irrelevant, the section of the C++ code that calculated the light contribution was reduced:

double pie2 = 2*M_PI;
double optimize = pie2/wavelength;
// ...
double wave = sin(distance*optimize);

Code Sample 4

Taking the code further to improve efficiency was attempted through the use of a different mathematical trick. Of all the singular operations, it was hypothesized that the sin function calculation done in the light contribution step of the hologram creation process was taking the most time to execute. In an attempt to address this, a Taylor Series Expansion to calculate the sine was implemented, to see if it would offer any speed-up in the process. A separate function was created just for this calculation that allows the programmer to state how many iterations to take out the Taylor Series Expansion, based on the accuracy required for the experiment.

Some functions take longer than others to be computed by a computational system; two such functions in the Ray-Tracing algorithm are the square root function and the sin function. In this experiment, for execution speed-up, the sin function was removed from the efficient coding experiment of the previous step and replaced with a user-defined sin function using the Taylor series expansion, which is a series of terms that represent a function; the more terms used in the series, the more accurate the result. For this experiment, the first three terms of the Taylor Series Expansion for sine were used, based on the theory that multiple less complex operations would outperform a single complex operation such as sin.
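The thesis's C++ sine routine is not reproduced here; a Python sketch of the same idea, a user-defined sine built from the first terms of the Taylor series with the term count as a parameter, might look like:

```python
import math

def taylor_sin(x, terms=3):
    """Approximate sin(x) with the first `terms` terms of its Taylor
    series: x - x^3/3! + x^5/5! - ...  More terms give more accuracy."""
    total = 0.0
    for n in range(terms):
        total += ((-1) ** n) * x ** (2 * n + 1) / math.factorial(2 * n + 1)
    return total

# Near zero, three terms are quite accurate...
err_small = abs(taylor_sin(0.5) - math.sin(0.5))
# ...but for larger arguments the three-term error grows quickly, which is
# consistent with the blacked-out hologram reported in the results section.
err_large = abs(taylor_sin(3.0) - math.sin(3.0))
```

Because the phase arguments in the hologram calculation are large, a truncated series is only usable after range reduction; without it, the low accuracy observed in the results is expected.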

The last code improvement for computation in this research was the parallelizing of the process using CUDA C. CUDA C is a programming language built on top of C for use with NVIDIA's GPUs for parallel programming. It allows the programmer to unroll the loops of the algorithm, dropping the complexity of the code depending on what is happening in the loops. For this experiment, the parallelization was done in both the Euclidean distance calculation and the light contribution calculation, by taking sections of individual rows of the holographic plate and giving them to separate CUDA blocks to be calculated all at once. This means that each block also needs access to the different variables created in the previous steps; this was done using cudaMalloc to allocate memory for each block and cudaMemcpy to populate the previously allocated memory. After the calculations were completed, each block copied its results back into the holographic plate to be written to an image.
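The CUDA C source is not reproduced in this chapter. As an analogy for the same unrolling idea (NumPy broadcasting rather than CUDA blocks, so this is an illustration, not the thesis code), the nested per-pixel loops can be collapsed so that the distance and light contribution for every pixel of the plate are computed at once:

```python
import numpy as np

wavelength = 633e-9
ipx = np.linspace(0.0, 1e-3, 64)   # plate pixel x coordinates (illustrative)
ipy = np.linspace(0.0, 1e-3, 64)   # plate pixel y coordinates (illustrative)
objectpoints = np.array([[2e-4, 3e-4, 1e-2],
                         [6e-4, 5e-4, 1e-2]])

film = np.zeros((ipx.size, ipy.size), dtype=complex)
for x, y, z in objectpoints:
    # All dx, dy for the whole plate in one shot, instead of the nested
    # i/j loops: whole rows and columns are computed "at once", as the
    # CUDA blocks do for row sections of the plate.
    dx = x - ipx[:, None]
    dy = y - ipy[None, :]
    distance = np.sqrt(dx**2 + dy**2 + z**2)
    film += np.exp(2j * np.pi * distance / wavelength)
```

The memory allocation and copy-back steps of the CUDA version have no analog here; on the GPU they are the fixed overhead that separates the 30.935-second total from the 10.556-second compute time reported below.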

2.4 Objective 1: Computational Holography Results

The baseline performed in the research revealed that the simple act of taking the Wendt code template and converting it from the MATLAB scripting language to the C++ language provided a large speed increase. The original MATLAB script took on average 19.08 minutes to calculate the hologram. The C++ implementation, on the other hand, was able to calculate the same hologram in an average of 3.4596 seconds, which is a huge time difference for something as simple as changing the language used for algorithm implementation. The time difference comes out to a 99.69 percent drop in execution time.

Figure 1

Figure 2

The next stage of the experiment was the implementation of efficient coding practices, defining constants that do not change through the experiment; this was done using order of operations to remove repetitive operations. Another important feature implemented at this step was the use of the sin function instead of the exponential function, which in this algorithm would produce a complex number as an answer; since the real component is what is of interest, we can use the sin function to get the same result with less calculation. For this code implementation, execution time went from the 3.4596 seconds seen in the simple C++ version to an average of 3.3776 seconds for execution of the hologram, which gives the efficient coding practices implementation a 2.37 percent drop in execution time.

Figure 3

The third result was the replacement of the standard sin function with the user-defined sin function. The experiment took an average of 2.2706 seconds to execute, which does provide a computational speed increase, but it was flawed: the hologram produced by this method was blacked out due to the user-defined sine function not having a high enough accuracy for the experiment. This could be changed in the future, but it was felt that increasing the function's accuracy would negate the computational speed-up gained in the previous steps.

Figure 4

The last result to discuss is the CUDA C implementation of the holographic experiment, as the user-defined sine implementation was fruitless due to the lack of image accuracy. The execution time used for comparison comes from the efficient coding practices section, which had an average execution time of 3.3776 seconds. The CUDA C implementation showed the greatest speed increase. Counting the initialization of all the variables and the copying of memory to the Jetson TK1's GPU, it took 30.935 seconds; if just the actual algorithm executions are examined, it took 10.556 seconds to execute all one hundred iterations, making the average execution time 0.10556 seconds. At the current resolution, the created hologram gets close to real-time rendering; for comparison purposes, this results in a 96.94 percent drop in execution time, the second largest drop in this experiment.

Figure 5

3.1 Objective 2: Computational Learning Introduction

To test computational improvement through data reduction, two different applications of machine learning were used to test the results: the first approach is a random forest for classification, and the second is a neural network for classification.

3.2 Objective 2: Computational Learning Background

What is data reduction? Data reduction is a process of dropping the complexity of a dataset by reducing the number of features in the dataset. This is a common practice in machine learning, used to reduce what is called the data manifold. It is done because a lot of the data seen in machine learning now is "Big Data", and big data has the issue of usually being wide, meaning it has a large number of features, often more features than there are instances of data. There are three methods of data reduction covered in this research: principal component analysis, autoencoders, and clustering.

Principal component analysis is a statistical process that allows a linear dataset to be broken down into its component parts, the eigenvectors, and the amounts of the original data built out of those component parts, the eigenvalues. For data reduction purposes, these eigenvalues are important: the smallest values contribute the least to the original dataset, so the corresponding eigenvectors can be removed from the data while having the least statistical impact at reconstruction. It is important to remember that this works best in cases of a linear dataset.
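A NumPy sketch of this eigen-decomposition view of PCA (illustrative only; the thesis's own Python code is not shown here, and the synthetic dataset below is an assumption for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
# A nearly linear 3-feature dataset: the third feature is almost a linear
# combination of the first two, so one eigenvalue will be tiny.
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack([a, b, a + b + 0.01 * rng.normal(size=100)])
Xc = X - X.mean(axis=0)

# Eigenvalues/eigenvectors of the covariance matrix (eigh sorts ascending).
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Drop the component with the smallest eigenvalue, since it contributes
# the least; keep the rest and reconstruct.
keep = eigvecs[:, 1:]
reduced = Xc @ keep               # 100 x 2 reduced representation
reconstructed = reduced @ keep.T  # back to 100 x 3
error = np.abs(reconstructed - Xc).max()
```

Because the dropped eigenvalue is tiny, the reconstruction error stays small; on a genuinely nonlinear dataset the discarded direction can carry real structure, which is the limitation noted above.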

What is an autoencoder? An autoencoder is a form of data reduction that relies on an artificial neural network to reduce the dataset size. It works by using a deep neural network that drops the features by some predefined amount at every level until the desired reduction is achieved; then the process is reversed, expanding the reduced dataset back out to the size of the original dataset. This is done multiple times, adjusting the weights in the neural network after each repetition, until the error between the original dataset going into the autoencoder and the reconstructed data coming out of the back end of the autoencoder has been minimized. When this point of error minimization has been reached, the data in the middle of the autoencoder is the reduced representation of the original data, and it is this dataset that will be used for experimentation. Due to the repetitive nature of minimizing the error between the two datasets, this particular algorithm can take time to find an optimal solution.
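The thesis's autoencoder implementation is not reproduced here; the following toy linear autoencoder in NumPy (an illustrative assumption: a single hidden layer, no nonlinearity, plain gradient descent on synthetic data) shows the shrink-then-expand structure and the error-minimization loop:

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated 4-feature data, so 2 hidden units can capture most of it.
base = rng.normal(size=(200, 2))
X = np.column_stack([base, base @ np.array([[1.0, 0.5], [0.5, 1.0]])])

def train_autoencoder(X, hidden=2, lr=0.1, epochs=2000):
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, hidden))  # d -> hidden (reduce)
    W_dec = rng.normal(scale=0.1, size=(hidden, d))  # hidden -> d (expand)
    for _ in range(epochs):
        Z = X @ W_enc          # reduced representation (the "middle")
        X_hat = Z @ W_dec      # reconstruction back out to full size
        err = X_hat - X
        # Gradient descent on the squared reconstruction error.
        W_dec -= lr * (Z.T @ err) / n
        W_enc -= lr * (X.T @ (err @ W_dec.T)) / n
    return W_enc, W_dec

W_enc, W_dec = train_autoencoder(X)
loss = np.mean((X @ W_enc @ W_dec - X) ** 2)
initial = np.mean(X ** 2)  # loss of the untrained, near-zero network
```

The repeated forward/backward passes are what make autoencoder-based reduction slow to converge, as noted above; real autoencoders add nonlinear activations and multiple layers.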

Clustering is a slightly different approach to data reduction. It is based on the K Nearest Neighbor algorithm used for unsupervised machine learning, and it attempts to form distinct clusters in the data. For data reduction, this is usually used in the computational holography experiments to reduce the number of XYZ points that represent the object, based on their proximity to neighboring points. As long as the distance measure between the points in question is not too large, there is very little impact on the user's ability to distinguish the reduced-point object from the original object.
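The thesis's point-reduction code is not shown; as a small pure-Python sketch of the neighbor-merging effect (a simpler greedy thinning rather than a full clustering algorithm, so this is an analogy), groups of nearby XYZ points collapse to one representative each:

```python
import math

def reduce_points(points, radius):
    """Greedily thin XYZ points: any point within `radius` of an
    already-kept point is absorbed, so only spread-out points remain."""
    kept = []
    for p in points:
        if all(math.dist(p, q) > radius for q in kept):
            kept.append(p)
    return kept

points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
          (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
# Two well-separated groups collapse to two representative points.
reduced = reduce_points(points, radius=1.0)
```

As long as `radius` stays small relative to the object's features, the reduced point set remains visually indistinguishable from the original, which is the property the holography experiments rely on.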

3.3 Objective 2: Computational Learning Methods

For this section of research into computational data reduction, the computer system used was the Jetson TK1 development board, which has a 2.32 GHz CPU, 2 GB DRAM, and an NVIDIA GPU capable of performing 326 GFLOPS, running the L4T (Linux for Tegra) operating system, a slimmed-down version of Ubuntu 14.04 LTS. All three reduction forms discussed were tested, though not to the same degree; the forms covered specifically were principal component analysis and autoencoders, using the traditional iris dataset for comparison. All programming for this research was done using the Python 2.7 scripting language.

The first step was to get a baseline to compare against, so the iris data was broken down: groups 1 and 2 of the 3 in the iris dataset were used, and the data was normalized before use.

dataset = numpy.loadtxt("iras.dat", delimiter=",")

X = dataset[:,0:4]
X01 = dataset[0:99,0:4]
X01 = X01/numpy.linalg.norm(X01)
Y = dataset[:,4]
Y01 = dataset[0:99,4]

The newly built dataset was then run through a random forest algorithm that is part of the Python sklearn module. To do this, a number of variables must be set. The first is the random seed that will be used by the algorithm; in this case it is the number 7. Second is the number of trees that will be used to make a decision for the classification; for this experiment, 100 trees were chosen. Next, the number of features to be used for the classification is entered; for the experiment baseline, all 4 features were used. Then came the validation step, which uses k-fold cross validation to confirm the accuracy of the classification; for the purposes of the experiment, 10 was picked as the "K" to be used, as it is an accepted standard for such validation tests. The second-to-last step was to define the model to be used; the sklearn module has many different algorithms that can be used for classification, so the random forest must be explicitly stated and given the variables for the number of trees as well as the number of features, both of which were defined previously. Finally, the results are calculated using the k-fold cross validation, returning the accuracy to the user.

seed = 7
num_trees = 100
max_features = 4
kfold = model_selection.KFold(n_splits=10,random_state=seed)
model = RandomForestClassifier(n_estimators=num_trees,
                               max_features=max_features)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
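The cross-validation call returns one accuracy score per fold, and the reported classification accuracy is their mean. A self-contained sketch of the same setup on synthetic data (not the iris file used in the thesis; newer sklearn requires `shuffle=True` when a random state is given to `KFold`):

```python
import numpy as np
from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
X_demo = rng.normal(size=(100, 4))
Y_demo = (X_demo[:, 0] > 0).astype(int)  # trivially learnable labels

kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=7)
model = RandomForestClassifier(n_estimators=100, max_features=4)
scores = model_selection.cross_val_score(model, X_demo, Y_demo, cv=kfold)
print(scores.shape)   # one accuracy value per fold: (10,)
print(scores.mean())  # the averaged accuracy that gets reported
```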

The Python module Keras was used to create the neural network classifier, which requires the user to define the model to be used; for this experiment the Sequential model was used. Now that the model type has been set, the layers can be added. For the classifier, only one layer was needed. This layer was densely connected, which in this case means every node in the initial layer is connected to every node in the next layer. The first value that must be defined is the number of outputs from the layer; for this experiment the output is one, representing the classification. The next variable to be defined was the number of features to be used, which for this dataset, as with the random forest, was 4, representing all the data in the dataset. Next, the model must be compiled, creating the model for the user. To do this, a loss measurement must be given; for the experiment the mean squared error was used. An optimizer was also chosen for the model, adam in this experiment, and lastly a metric to measure, which was accuracy. The data was then fit to the model; this was done by giving the fit function the variables holding the dataset "X", the expected outcomes for classification "Y", the maximum number of iterations (epochs), which was set to 100, and the batch size, which was set to 10. As a final step, the model is evaluated, which returns the classification accuracy to the user.

model = Sequential()
model.add(Dense(C, input_dim=IP, activation='sigmoid'))
model.compile(loss='mean_squared_error', optimizer='adam',
metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=100, batch_size=10)
# evaluate the model
scores = model.evaluate(X, Y)
return("\n%s: %.2f%%" %(model.metrics_names[1],
scores[1]*100))

After the baseline was calculated, it was time to test the two forms of data reduction. The first method tested was principal component analysis, which was also done using the Python 2.7 scripting language. To perform the analysis, the steps were as follows. First, the data was centered around the mean by taking the mean of the feature sets and subtracting the mean from every data point. Second, the centered data was dot-producted with its transpose and divided by the total number of data points in the dataset, which creates the covariance matrix. The third step is to calculate the eigenvectors and eigenvalues of the covariance matrix, which provides the means needed to reduce the data as well as a measure of vector importance, which can be used to reduce the dataset while impacting the original accuracy as little as possible. The final step is to take the number of eigenvectors required and perform a dot product with the original dataset, which returns a reduced representation of the original data with as many features as eigenvectors used. This new dataset is then sent into the random forest algorithm described previously, using the same number of trees and the same random seed value as in the baseline test, and the same approach is applied for the autoencoder using the same parameters as in the baseline.

def pca(X, col, fullBuild=0):
    n = len(X)
    print n
    mn = np.mean(X,axis=0)
    X0 = X - np.repmat(mn,n,1)
    S = np.divide(np.dot(np.transpose(X0),X0),float(n))
    [mW,mV] = np.linalg.eig(S)
    if fullBuild == 0:
        return np.dot(X0,mV[:,0:col])
    elif fullBuild == 1:
        # reconstruct from the first `col` components (transpose only
        # the kept eigenvectors so the matrix dimensions match)
        return np.dot(np.dot(X0,mV[:,0:col]),np.transpose(mV[:,0:col])) + mn
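For a standalone check of these steps in modern NumPy (synthetic data rather than the iris set; `eigh` is used since the covariance matrix is symmetric, and the eigenvalues are explicitly sorted so the most important vectors come first):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 4))

# Center, build the covariance matrix, and eigendecompose it,
# mirroring the steps of the pca() function above.
X0 = X - X.mean(axis=0)
S = X0.T @ X0 / len(X)
w, V = np.linalg.eigh(S)          # eigh: S is symmetric
order = np.argsort(w)[::-1]       # sort eigenvalues, largest first
X_reduced = X0 @ V[:, order[:2]]  # keep the top-2 eigenvectors
print(X_reduced.shape)  # (100, 2)
```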

Next, the autoencoder reduction is performed using the Python module Keras, the same module that was used for the neural network classifier. This is done by making a new neural network, again a Sequential model like the classifier. Next, a string variable representing the layer that will provide the reduced dataset is defined. Then the model layers must be built using the "add" function to add a layer to the network; as before, all layers are densely connected, as denoted by the "Dense" function. The first layer took in 4 features as input, dropping the size down to 3, using the sigmoid function for activation in all layers. The second layer took the 3 nodes from the previous layer and dropped the features down to 2; at this point the string variable that was defined at the beginning of the code is attached to this layer to allow for data retrieval. The third layer took the 2 features from layer 2 and expanded the features back out to 3, and, as stated before, all this is done using a sigmoid function for activation. The last layer took the 3 features and returned the features to the original size of 4. Then the model was compiled with the same parameters used in the neural network classifier: the mean squared error, the adam optimizer, and the accuracy metric. Next, the model is fit to the data. The first parameter is the original dataset "X", as was the case in the classifier; the second parameter is also "X", as the final output of the neural network is meant to be close to the original dataset; and, as before in the classifier, the epochs and batch size were set to 100 and 10 respectively. Lastly, the string variable name is used to return the reduced dataset to the user.

In the last two phases of the experiment, the stated procedure was carried out one hundred times in succession, matching the baseline setup, to allow an average time of execution to be determined for the corresponding methods of data reduction.

def auto_e(X):
    layer_name1 = 'my_layer1'
    layer_name2 = 'my_layer2'

    model = Sequential()
    model.add(Dense(3, input_dim=4, activation='sigmoid'))
    model.add(Dense(2, activation='sigmoid', name='my_layer1'))
    model.add(Dense(3, activation='sigmoid'))
    model.add(Dense(4, activation='sigmoid', name='my_layer2'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam',
                  metrics=['accuracy'])
    # Fit the model
    model.fit(X, X, epochs=100, batch_size=10)
    # evaluate the model
    intermed_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name1).output)
    intermediate_output = intermed_layer_model.predict(X)

    last_layer_model = Model(inputs=model.input,
                             outputs=model.get_layer(layer_name2).output)
    last_output = last_layer_model.predict(X)
    scores = model.evaluate(X, X)
    return [intermediate_output, last_output]

3.4 Objective 2: Computational Learning Results

The baseline experiment was run first, involving the random forest classifier and the neural network classifier. The total execution time was 732.7328 seconds to compute all one hundred iterations of the experiment for the random forest, which gives an average execution time of 7.32732 seconds for the dataset. The results of the random forest itself showed one hundred percent accuracy in its ability to classify the two iris species. The neural network classifier took 245.9223 seconds to execute all one hundred iterations, giving an average execution time of 2.459223 seconds; the neural network classifier also performed with one hundred percent accuracy in distinguishing the two species.

Figure 6

With the baseline out of the way, the first data reduction tested was the principal component analysis; the dataset was dropped from four total features down to two total features. First, the random forest classifier took 728.5684 seconds to execute, making for an average of 7.2856 seconds to complete the computation, and as before this classification was completed with one hundred percent accuracy. The neural network was done next; it took 244.8235 seconds to complete, giving an average execution time of 2.4482 seconds. The neural network classifier, as before, had one hundred percent accuracy. Again, these times are based on one hundred iterations to get the average execution time. Based on these execution times, the principal component analysis data reduction method resulted in a 0.5693 percent speed increase for the random forest and a 0.4482 percent increase for the neural network classifier.

Figure 7

The last experiment was the application of the autoencoder to perform the required data reduction. The random forest portion of the calculation took 726.0595 seconds to execute the classification, which gives an average execution time of 7.260595 seconds with a classification accuracy of 96.777 percent. The neural network classifier completed the calculation in 241.3531 seconds, making for an average execution time of 2.413531 seconds with a classification accuracy of 50.51 percent, which is a very poor performance compared to the original dataset. For the autoencoder data reduction method, the random forest had a 0.9106 percent speed increase and the neural network received a 1.8579 percent speed increase.
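The speed-increase figures above follow the usual relative-difference formula; as a quick sketch using the random forest and neural network total times reported in this section:

```python
def pct_speedup(baseline_s, reduced_s):
    """Relative speed increase of the reduced run over the baseline."""
    return (baseline_s - reduced_s) / baseline_s * 100

rf = pct_speedup(732.7328, 726.0595)  # random forest, autoencoder data
nn = pct_speedup(245.9223, 241.3531)  # neural network, autoencoder data
print(round(rf, 2), round(nn, 2))  # roughly 0.91 and 1.86
```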

Figure 8

4.1 Objective 3: Computational Plasmonics Introduction

In this section of the research the third objective is examined: whether the use of an embedded system would lead to computational improvements. Embedded systems are small devices capable of performing computation. As technology has improved, these embedded systems now house their own operating systems; to fit an operating system on such small devices, the base operating system must be slimmed down, and it is this lightweight design approach that will be examined for computational improvement using computational plasmonics on three different computer systems.

4.2 Objective 3: Computational Plasmonics Background

What is computational plasmonics? First, plasmonics is the design and study of how metamaterials, made up of graphene and noble metals, interact with electromagnetic energies for novel effects. In computational plasmonics the study is taken from the lab to the computer system. This is done in computation in a number of ways; for these experiments it was done using the MIT Electromagnetic Equation Propagation (MEEP) software, a finite difference time domain (FDTD) simulator that works using Maxwell's equations to compute and simulate the electromagnetic behavior of a sample over time, programmed through either its C++ interface or the CTL scripting language, which at its core is the Scheme programming language developed at MIT.

4.3 Objective 3: Computational Plasmonics Methods

The experiment was done using three different computer systems as test beds for comparison. The first computer system is the Jetson TK1 development board, which has a 2.32 GHz CPU, 2 GB DRAM, and an NVIDIA GPU capable of performing 326 GFLOPS, running the L4T (Linux for Tegra) operating system, a slimmed-down version of Ubuntu 14.04 LTS. The second computer system was a Lenovo G580 with an Intel B960 dual-core CPU operating at 2.2 GHz and 3.7 GB RAM, running Ubuntu 14.04 LTS, and the last computer system was a Lenovo Thinkcenter M90z with an Intel i3 quad-core CPU operating at 3.07 GHz and 5.5 GB RAM, running Ubuntu 15.01. Each of the three test beds ran MEEP using a standardized script going through every aspect of the simulation development process. The process goes as follows: a script is written as a CTL file that first describes the simulation environment, such as the environment dimensions, utilizing the lattice keyword. This is followed by the material description, including the material's shape, its location in the environment, and its dielectric and magnetic properties. Then a description of the excitation source is given, including the waveform of the source (either continuous or Gaussian), the location of the source in relation to the environment, the source frequency, and the source type (electric or magnetic in nature). The final section of the CTL file covers the run time.

(set! geometry-lattice (make lattice (size 16 16 no-size)))

(set! geometry (list
  (make block (center -2 -3.5) (size 12 1 infinity)
    (material (make dielectric (epsilon 12))))
  (make block (center 3.5 2) (size 1 12 infinity)
    (material (make dielectric (epsilon 12))))))

(set! pml-layers (list (make pml (thickness 1.0))))

(set! resolution 10)

(set! sources (list
  (make source
    (src (make continuous-src
      (wavelength (* 2 (sqrt 12))) (width 20)))
    (component Ez)
    (center -7 -3.5) (size 0 1))))

(run-until 200
  (at-beginning output-epsilon)
  (to-appended "ez" (at-every 0.6 output-efield-z)))

The script is then run using the "meep" terminal command, which creates a new text file called a ".out" file that holds the simulation data. This data is then used to create JPEG image files representing individual time slices of the simulation, which are in turn assembled into a GIF file using the convert command from the ImageMagick library. Finally, the JPEG time slice images that were created were removed from the system, as they were no longer necessary to the simulation. This process was carried out one hundred times, utilizing a set of three different bash scripts to handle all the function calls as well as to loop the whole process the one hundred times. The time of execution was then gathered by placing the time function in front of the bash script, which provides a time of execution for the total process; dividing this time by one hundred gives the average time of execution.
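The looping-and-timing procedure can be sketched in Python (the command list below is a placeholder stand-in, not the thesis's actual bash scripts, which invoked meep, convert, and rm):

```python
import subprocess
import time

def average_runtime(command, runs=100):
    """Run `command` `runs` times and return the mean wall-clock
    seconds per iteration, mirroring time(1) divided by the count."""
    start = time.time()
    for _ in range(runs):
        subprocess.run(command, check=True)
    return (time.time() - start) / runs

# Trivial command as a stand-in for the full simulation pipeline.
avg = average_runtime(["true"], runs=5)
print(avg >= 0.0)
```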

4.4 Objective 3: Computational Plasmonics Results

The baseline run, which was conducted using the Lenovo laptop, took 39 minutes 41.588 seconds to complete all one hundred iterations, resulting in a total time of 2381.588 seconds, averaging out to 23.81588 seconds to perform the simulation of the plasmonic behavior described by the CTL file.

Figure 9

The Lenovo Thinkcenter performed the calculations in 69 minutes and 58.358 seconds, which comes out to 4198.358 seconds, averaging out to 41.9835 seconds per execution.
Figure 10

The Jetson did the same calculation in 75 minutes 32.763 seconds for one hundred iterations, giving a total time of 4532.763 seconds, which averages out to 45.32763 seconds to calculate the simulation.

Figure 11

These results are contradictory to what was expected: the Jetson computational system possesses a smaller operating system than the Lenovo laptop used in the baseline test, yet it was outperformed. This underperformance may be due to a number of things. The first and most likely candidate is the difference in DRAM between the two computational systems; the Lenovo had 3.7 GB whereas the Jetson possesses only 2.2 GB of DRAM. The Lenovo laptop has 50.84 percent more RAM than the Jetson, and the execution time showed a 62.23 percent difference. One last point in this experiment to take note of is that the Jetson is meant to be run as a parallel computation system, not as the serial computation system this experiment used, the serial setup being due to the test beds other than the Jetson lacking access to GPUs for a parallel implementation of the experiment.

5.1 Results Discussion

The modification of an algorithm to a more efficient form, both in language and program format, showed the highest impact on the execution speed of the computation of the experiments, with a speed increase of 99.9907 percent. The data reduction experiment yielded a speed increase of 0.5693 percent in the case of the principal component analysis data reduction when using the random forest classifier, though in overall speed the neural network did perform quicker. The third experiment did not lend itself to determining whether the slimmed-down nature of the Jetson's operating system benefited computational improvement.

5.2 Conclusion

The research presented in this thesis demonstrates that computational systems can have their speed efficiency improved through software modification, without having to invest more resources in hardware: through a faster language and streamlined design, and through dataset augmentation that reduces the width of the data instances, thus reducing the number of computations needed to produce a result. Specifically, for this research, the use of PCA for data reduction together with a neural network classifier showed the most promise, as the speed was improved without affecting the accuracy negatively. In the future, the methods stated here should be tested against other real-world datasets for their effectiveness, to determine a set path of operation to get the most efficient speedups for a given computational system.


APPENDICES

DataReductionTest.py

import util2
import numpy
import timeit

dataset = numpy.loadtxt("iras.dat", delimiter=",")

X = dataset[:,0:4]
X01 = dataset[0:99,0:4]
X12 = dataset[50:149,0:4]
X02 = numpy.zeros((100,4))
X02[0:49,0:4] = dataset[0:49,0:4]
X02[50:99,0:4] = dataset[100:149,0:4]
X = X/numpy.linalg.norm(X)

Y = dataset[:,4]
Y01 = dataset[0:99,4]
Y12 = dataset[50:149,4]
Y02 = numpy.zeros((100,1))
Y02[0:49,0] = dataset[0:49,4]
Y02[50:99,0] = dataset[100:149,4]
X_pca = util2.pca(X01, 2)
[iv, l] = util2.auto_e(X01)

####### without data reduction ##########
print 'randomforest\n'
start_time1 = timeit.default_timer()
for i in range(100):
    rf01 = util2.r_forest(X01, Y01, 100, 4, 7)
elapsed1 = timeit.default_timer() - start_time1

print 'NNC\n'
start_time2 = timeit.default_timer()
for i in range(100):
    nnc01 = util2.nnc(X01, 4, 1, Y01)
elapsed2 = timeit.default_timer() - start_time2

print 'rf with class 0, 1 = {}'.format(rf01.mean())
print elapsed1
print 'nnc with class 0, 1 = {}'.format(nnc01)
print elapsed2
########################################

####### with data reduction ################
start_time3 = timeit.default_timer()
for i in range(100):
    rf_pca = util2.r_forest(X_pca, Y01, 100, 2, 7)
elapsed3 = timeit.default_timer() - start_time3

start_time4 = timeit.default_timer()
for i in range(100):
    nnc_pca = util2.nnc(X_pca, 2, 1, Y01)
elapsed4 = timeit.default_timer() - start_time4

print 'rf_pca all data = {}'.format(rf_pca.mean())
print elapsed3
print 'nnc_pca all data = {}'.format(nnc_pca)
print elapsed4

Util2.py

import numpy.matlib as np
import pandas
from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier
from keras.models import Sequential, Model
from keras.layers import Dense

def r_forest(X, Y, numtrees, maxf, sed):
    seed = sed
    num_trees = numtrees
    max_features = maxf
    kfold = model_selection.KFold(n_splits=10, random_state=seed)
    model = RandomForestClassifier(n_estimators=num_trees,
                                   max_features=max_features)
    results = model_selection.cross_val_score(model, X, Y, cv=kfold)
    return results

def auto_e(X):
    layer_name1 = 'my_layer1'
    layer_name2 = 'my_layer2'
    model = Sequential()
    model.add(Dense(3, input_dim=4, activation='sigmoid'))
    model.add(Dense(2, activation='sigmoid', name='my_layer1'))
    model.add(Dense(3, activation='sigmoid'))
    model.add(Dense(4, activation='sigmoid', name='my_layer2'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
    # Fit the model
    model.fit(X, X, epochs=100, batch_size=10)
    # evaluate the model
    intermed_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name1).output)
    intermediate_output = intermed_layer_model.predict(X)

    last_layer_model = Model(inputs=model.input,
                             outputs=model.get_layer(layer_name2).output)
    last_output = last_layer_model.predict(X)
    scores = model.evaluate(X, X)
    return [intermediate_output, last_output]

def nnc(X, IP, C, Y):
    model = Sequential()
    model.add(Dense(C, input_dim=IP, activation='sigmoid'))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
    # Fit the model
    model.fit(X, Y, epochs=100, batch_size=10)
    # evaluate the model
    scores = model.evaluate(X, Y)
    return("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

def pca(X, col, fullBuild=0):
    n = len(X)
    print n
    mn = np.mean(X, axis=0)
    X0 = X - np.repmat(mn, n, 1)
    S = np.divide(np.dot(np.transpose(X0), X0), float(n))
    [mW, mV] = np.linalg.eig(S)
    if fullBuild == 0:
        return np.dot(X0, mV[:, 0:col])
    elif fullBuild == 1:
        # reconstruct from the first `col` components (transpose only
        # the kept eigenvectors so the matrix dimensions match)
        return np.dot(np.dot(X0, mV[:, 0:col]), np.transpose(mV[:, 0:col])) + mn