


Research Paper #1 Guidelines

This research paper will require that you deal with the technology and its potential implications from a managerial, rather than a technical, perspective. Be careful to avoid excessive technical jargon and detail; at the same time, you should describe and assess those aspects of the technology which are of managerial importance. The paper will be limited to a maximum of ten typewritten pages.

Deep learning consists of learning algorithms that discover multiple levels of features that work together to define increasingly more abstract aspects of data.

Deep learning algorithms are machine learning methods based on learning representations. An observation can be represented in many ways, but some representations make it easier to learn tasks of interest from examples. What makes better representations, and how can models be created to learn these representations?

1990s exploration; 2006 Hinton

Where on the S-curve?

Driving forces


Business Challenges


Defining the Technology: Deep Learning as an Advancement in Machine Learning

Deep learning consists of algorithms that use multiple transformations of data to
better represent and thus learn more abstract aspects of data.

The algorithms are part of a larger family of machine learning methods, which all
attempt to extract the most significant features about data for tasks such as
prediction or classification. Deep learning algorithms, however, are the latest
advancement in machine learning in two highly related areas: feature engineering
and data representation.

Feature engineering is the process of using human input to pre-process data to best increase an algorithm's performance and accuracy. A machine learning algorithm that classifies documents, for instance, will not be able to correctly identify topics until an engineer first breaks down the input documents into n-grams or bag-of-words vectors, which are ways to represent and organize language. Thus most machine learning algorithms can only identify the most important features in data after a human thinks about what features best represent a dataset and pre-selects these features for the algorithm to analyze. In contrast, deep learning eliminates the need for human input and pre-processing procedures by letting the software discover the important features by itself. This shift from requiring human input to trusting the computer to learn autonomously is the difference between what is called supervised and unsupervised learning in the machine learning field. Deep learning, however, has not revolutionized machine learning because it uses the concept of unsupervised learning, a concept that has been around for decades, but rather because it enables much more accurate unsupervised learning using an innovative approach to data representation.
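To make the contrast concrete, the bag-of-words representation mentioned above can be sketched in a few lines of Python; the vocabulary and document here are illustrative, not drawn from any real system:

```python
from collections import Counter

def bag_of_words(document, vocabulary):
    """Hand-engineered feature extraction: a human chose the vocabulary
    and decided that raw word counts are the right features."""
    counts = Counter(document.lower().split())
    # Counter returns 0 for words that never appear in the document.
    return [counts[word] for word in vocabulary]

# The engineer pre-selects which words matter before any learning happens.
vocab = ["network", "layer", "market", "profit"]
doc = "A network with one hidden layer feeds another layer"
features = bag_of_words(doc, vocab)
print(features)  # -> [1, 2, 0, 0]
```

Every choice in this sketch, from which words go in the vocabulary to the decision that counts are the right features, is made by a human before any learning happens; a deep learning system would instead discover its own representation from the raw text.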

Deep learning champions the idea of using multiple layers of nodes with
connections between the layers rather than a single layer to fully capture all the
complexity and richness in a large dataset. Unlike most machine learning
algorithms, deep learning can discover more complex and abstract relationships
between nodes using multiple layers and thus achieve higher rates of success in
prediction, classification and other traditional machine learning applications.

Thus in this paper, deep learning technology is defined as machine learning
algorithms that specifically use an unsupervised, multi-layer node approach to
represent and learn from data.

Scientific Foundations

At the most basic level, the technology behind deep learning is based on the premise that the relationship between an input and output can be modeled using a
network. The network model abstracts data by using nodes to represent features
about the data and connections between nodes to represent how those features
interact. The challenge for the algorithm is to use data to identify the features and
relationships between features that lead to a certain output.

Before deep learning, the nodes that represented data features would exist in a
single shallow layer and the algorithm would immediately perform computations
to reach a final result. In deep learning algorithms, however, nodes pass their outputs to other nodes, which can pass their outputs to yet another layer of nodes, and so on. Each layer uses results from the previous layer to make decisions at a more complex and abstract level, which ultimately creates a network that can engage in more sophisticated decision-making than past machine learning algorithms.

A deep learning network that learns how to recognize images, for instance, might have four, five or six layers: the initial layers learn to recognize edges, textures, and parts of objects; higher layers learn to recognize whole objects in particular configurations; and the final layers achieve a human-like ability to recognize whole objects regardless of configuration (CITE PICTURE TOO). This type of modeling is partially inspired by the human brain and, conceptually, by how networks of neurons work together to understand and learn from sensory data.

While all deep learning algorithms use a multi rather than single layer architecture,
there are two separate schools of thought on the type of computations that deep
learning should use: probabilistic graphical models and direct encoding or neural
network models. The next two sections briefly describe the theory and science
behind the two approaches.

Probabilistic Graphical Models

Probabilistic graphical models use the statistical concept of a restricted Boltzmann
machine (RBM), a bipartite graph where each node is a binary random variable
(either 0 or 1). A bipartite graph is a graph where the nodes can be separated into
two groups and the only connections that can exist between nodes are those
between nodes of different groups. In an RBM, the two groups consist of visible
nodes x that make up the input data and hidden nodes h that explain the
dependencies between the visible nodes by measuring their interactions. The RBM
essentially abstracts the input data by representing it as a joint probability
distribution of x and h, and then computes the probabilities of x and h that map to
the probabilities for certain outputs. The most successful deep learning algorithms
use multiple RBMs where the hidden nodes formed by examining the visible nodes
from one layer become the new input data for another RBM (CITE).
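As a rough illustration of the RBM machinery described above, the following Python sketch performs one Gibbs sampling step on a single RBM; the layer sizes and random weights are illustrative assumptions for demonstration, not values from any cited model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 6 visible units x, 3 hidden units h.
W = rng.normal(0, 0.1, size=(6, 3))  # weights on visible-hidden edges
b = np.zeros(6)                       # visible biases
c = np.zeros(3)                       # hidden biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_hidden(x):
    """P(h=1 | x): hidden units measure interactions among visible units.
    Because the graph is bipartite, the hidden units are conditionally
    independent given x and can all be sampled in one step."""
    p = sigmoid(x @ W + c)
    return (rng.random(3) < p).astype(float), p

def sample_visible(h):
    """P(x=1 | h): reconstruct the visible layer from the hidden code."""
    p = sigmoid(W @ h + b)
    return (rng.random(6) < p).astype(float), p

x = np.array([1., 0., 1., 1., 0., 0.])  # one binary observation
h, p_h = sample_hidden(x)          # hidden code for this observation
x_recon, p_x = sample_visible(h)   # one Gibbs step back to visible space
```

Stacking then works as the text describes: the hidden activations p_h computed from one trained RBM become the visible input data for the next RBM in the stack.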

Direct encoding or neural network models

In direct encoding or neural network models, each layer has nodes that act as
computational units. These nodes can take in an input and then typically use two
major mechanisms to determine an output. First, the node may check whether inputs pass a lower or upper threshold, and exclude inputs that do not pass from the decision-making process that leads to an output. Second, the node weights some inputs more heavily than others according to their importance. Thus, using a threshold value and weights, the node can create an output which is essentially a condensed and more abstract representation of the initial input. The next step is what makes a neural network a form of deep learning: the node passes on its output (a squashed version of the first input) as the input for another node, and thus layers of nodes are formed. Again, the extra layers between the first layer of input data and the final output are called the hidden layers which characterize deep learning. The role of the algorithm is to compute the optimal weights and thresholds in all the layers of nodes so that the model can accurately determine the correct output for a given input.

A very simple model of a neural network at the node level, with weights of -2 on the values of input data x1 and x2 and a threshold of 3.
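One common reading of this node-level figure, treating the threshold as a bias term added before comparing to zero, makes the node a NAND gate; a minimal sketch:

```python
def threshold_node(inputs, weights, bias):
    """A single computational node: weight the inputs, add the bias,
    and fire (output 1) only if the total crosses zero."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# Weights of -2 on x1 and x2; a bias of 3 plays the role of the threshold.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, threshold_node((x1, x2), (-2, -2), 3))
# Fires for every input pair except (1, 1): the node computes NAND.
```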

A neural network with one hidden layer. The w's are weights. (cite)
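The one-hidden-layer network in this figure can be sketched as two applications of the same layer computation, with each layer's squashed output feeding the next; the weight values below are illustrative placeholders, not values from the figure:

```python
import numpy as np

def layer(x, W, b):
    """One layer of nodes: weight the inputs, add biases, squash.
    The squashed output becomes the input for the next layer."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))  # logistic squashing

x = np.array([0.5, -1.0, 2.0])               # input layer (3 features)
W1, b1 = np.full((4, 3), 0.2), np.zeros(4)   # hidden layer: 4 nodes
W2, b2 = np.full((1, 4), 0.5), np.zeros(1)   # output layer: 1 node

h = layer(x, W1, b1)   # hidden layer: a condensed representation of x
y = layer(h, W2, b2)   # final output computed from the hidden code
```

Training consists of adjusting W1, b1, W2 and b2 so that y matches the correct output for each input, which is exactly the "optimal weights and thresholds" search described above.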

Key developments/early advances

Both of the deep learning implementations described above are based on the concept of unsupervised training, an advance over supervised training. For years machine learning models were, and still are, trained using labeled data, where the system is shown an input and told what output it should produce, such as handwriting samples and the corresponding alphabet letters. Labeling data, however, requires human input and is labor-intensive and slow. The first key innovation in deep learning was the development of a more powerful type of supervised learning. In the 1980s Geoffrey Hinton, a computer scientist at the University of Toronto, pioneered the creation of a machine learning model that employed learning in multiple layers, combining low-level features into successively higher levels; it was arguably the first effective implementation of deep learning concepts. The system failed, however, to achieve a high level of performance in the image recognition task it was given because of the lack of labeled data and the limitations of computation power at the time.

Starting in 2005, however, fundamental developments in processing power, data
collection and improved algorithms made it possible for deep learning to combine a
multilayer approach with unsupervised learning to more accurately predict results.
Computers were now fast and powerful enough to enable three key features of deep
learning, which made the new deep learning models in the mid 2000s a remarkable
advancement from earlier models.

First, computers could handle the heavy computing involved in a multi-layer
approach, which requires that the computer start with low-level features, such as
the intensities of individual pixels, and learn multiple layers of features all at the
same time (CITE). Second, computers could handle the equally heavy computing in
unsupervised learning, where the computer must learn for itself which features best
represent input data and predict outputs, instead of relying on labeled training data.
Third, computers could process the huge, now unlabeled datasets that were still
needed to properly train a model for a complex operation such as prediction.

Today innovations in the same areas of processing power, data collection and
improved algorithms continue to advance deep learning models and make
applications such as artificial intelligence and image and voice recognition a reality.

Current Status and Applications


Deep Learning on the Technology S-Curve

The darker circle pinpoints deep learning algorithms and the lighter circle pinpoints machine learning algorithms as a whole on the S-curve. Machine learning algorithms are farther up the S-curve than deep learning algorithms because the older technology has been much more developed and is much more widely used. While research in machine learning methods is still flourishing, most recent improvements in machine learning have been only incremental, increasing performance by relatively small amounts. Single-layer machine learning algorithms used in voice recognition, for example, have only been able to achieve

Current Performance

Deep learning algorithms have begun to be used for commercial applications, such as fraud detection and image and voice recognition.
The use of semi-supervised learning and deep neural nets is the basis for some of the more
dramatic results seen recently in pattern recognition. For 20 years, most speech systems have
been based on a learning method that does not use neural nets. In 2011, however, computer
scientists at MSR, building on earlier work with the University of Toronto, used a
combination of labeled and unlabeled data in a deep neural net to lower the error rate of a
speech recognition system on a standard industry benchmark from 24% to about 16%. "Core
speech recognition has been stuck at about 24% for more than a decade," Platt says. "Clever
new ideas typically get a 2% to 5% relative improvement, so a 30% improvement is
astounding. That really made the speech people sit up and take notice."
In last year's ImageNet Large Scale Visual Recognition Challenge, Hinton's team from the
University of Toronto scored first with a supervised, seven-layer convolutional neural
network trained on raw pixel values, utilizing two NVIDIA graphics processing units (GPUs)
for a week. The neural network also used a new method called "dropout" to reduce
overfitting, in which the model finds properties that fit the training data but are not
representative of the real world. Using these methods, the University of Toronto team came
in with a 16% error rate in classifying 1.2 million images, against a 26% error rate by its
closest competitors. "It is a staggeringly impressive improvement," says Andrew Zisserman, a
computer vision expert at the University of Oxford in the U.K. "It will have a big impact in
the vision community."
Also last year, researchers at Google and Stanford University claimed a 70% improvement
over previous best results in a mammoth nine-layer neural network that learned to recognize
faces without recourse to any labeled data at all. The system, with one billion connections,
was trained over three days on 10 million images using a cluster of machines with a total of
16,000 cores.
The different models for learning via neural nets, and their variations and refinements, are
myriad. Moreover, researchers do not always clearly understand why certain techniques
work better than others. Still, the models share at least one thing: the more data available for
training, the better the methods work.
MSR's Platt likens the machine learning problem to one of search, in which the network is
looking for representations in the data. "Now it's much easier, because we have much more
computation and much more data," he says. "The data constrains the search so you can
throw away representations that are not useful."
Image and speech researchers are using GPUs, which can operate at teraflop levels, in many
of their systems. Whether GPUs or traditional supercomputers or something else will come
to dominate in the largest machine learning systems is a matter of debate. In any case, it is
training these systems with large amounts of data, not using them, that is the
computationally intensive task, and it is not one that lends itself readily to parallel,
distributed processing, Platt says. As data availability continues to increase, so will the
demand for compute power; "We don't know how much data we'll need to reach human
performance," he says.
Hinton predicts, "These big, deep neural nets, trained on graphics processor boards or
supercomputers, will take over machine learning. The hand-engineered systems will never
catch up again."
Performance Metrics
The two primary performance metrics that will determine successful commercialization and
application of deep learning algorithms include 1) accuracy in classification or other
prediction operations and 2) training time. Deep learning algorithms should be evaluated
against other state-of-the-art algorithms and human capabilities under these two criteria to
pinpoint the current stage of this technology.
The first performance metric varies by the application of deep learning, as each application usually already has specific, measurable goals. In most classification applications, for example, the goal is to minimize the number of misclassifications. Similarly, in prediction applications, the goal is to minimize the number of wrong predictions. The second performance metric is typically measured as wall-clock training time.
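Both metrics can be recorded with a few helper functions; the "model" below is a trivial stand-in used only to exercise the helpers, not a real deep learning system:

```python
import time

def evaluate(classifier, examples, labels):
    """Metric 1: classification accuracy (fraction of correct predictions)."""
    correct = sum(classifier(x) == y for x, y in zip(examples, labels))
    return correct / len(labels)

def timed_training(train_fn, data):
    """Metric 2: wall-clock training time in seconds."""
    start = time.perf_counter()
    model = train_fn(data)
    return model, time.perf_counter() - start

# Stand-in "training": pick the most common label seen in the data.
model, seconds = timed_training(lambda data: max(set(data), key=data.count),
                                [1, 1, 0])
# Stand-in "classifier": ignore the input and always predict that label.
accuracy = evaluate(lambda x: model, [10, 20, 30], [1, 0, 1])
```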
Research has shown that under the first metric, and sometimes under the second, deep learning algorithms outperform other machine learning algorithms. Jarrett et al. (2009), for instance, used deep learning algorithms to achieve a 0.53% error rate in handwriting recognition, the lowest known rate among all algorithms. Mo (2010) further shows that deeper, multi-layer architectures yield the same error rate as traditional shallow algorithms in image recognition tasks with eight hours less training time. Mo also shows that deep learning algorithms can cut the error rate in half if given more training time. Similarly, Stallkamp (2011) demonstrates that a deep learning methodology (CNN, or Convolutional Neural Networks) achieves a statistically significantly higher rate (99.47%) in correctly classifying traffic signs than both humans and LDA, a simplistic machine learning model that depends on a single layer of feature extraction. He also notes, however, that LDA still achieved a 95.37% accuracy rate and required considerably less computation. Many other papers besides the ones showcased here have shown similar results (see Appendix, add this).
In general, however, deep learning algorithms face challenges in meeting the second performance metric because the multi-layer approach increases the complexity of the model and requires larger datasets, and so requires more training time. Stallkamp's CNN model took 50 hours in total to train, implying a much longer processing time than the simpler LDA model (cite). Similarly, Ciresan et al. (2010) and other recent studies do not give specific figures but note that it is quite time-intensive to train deep learning algorithms to yield state-of-the-art results. As processing power continues to improve, however, and researchers optimize multi-layer architectures, deep learning is likely to see decreases in training time and meet the second performance metric.

Current applications
Deep learning has applications in ___ which affect a wide variety of industries.
Image Recognition, automobile

Recognition is one of the key technologies driving advances in many industrial applications.
For example, image recognition is now an integral part in document processing, inspection of
manufactured parts and surveillance. Moreover such important technologies as image
compression and computer-user interfaces depend more and more on image recognition.
Neural nets have become a widely accepted approach for recognition tasks.
If neural technology is to evolve beyond a tool for pattern recognition, it must incorporate
and exploit advances in related fields. Examples of such emerging multidisciplinary
approaches will be provided for computational neuroscience, mechanics, and material failure
analysis, hydrodynamics, autonomous vehicle control, chemical engineering, and molecular

A third, rougher performance metric is the number of processors used, or more generally the complexity of the hardware required.
Dominant Design
As of now there is currently no dominant design, as there are many ways, including DNN,
CBN, different kinds of graphs, both directed/undirected, number of layers, different ways to
count up probabilities. The design that will emerge as the key standard will be the one that
can match performance metrics with as close to perfect accuracy rates in
classification/prediction tasks and low training time.

The latest estimates for the total global market for artificial intelligence come to
$21.2 billion in 2007, one-third of which is made up of neural networks and belief
networks technologies (probabilistic deep learning models are often also called
belief networks). Belief networks, in particular, saw $2.2 billion in sales in 2007 and
the highest AAGR of 13.1%. Overall the market for deep learning networks has
shown significant growth and has nearly doubled in the years 2002-2007. Since
2007, considerable advances in belief networks have most likely caused the market
for belief networks to catch up to that of neural networks.

Because deep learning has potentially substantial applications in several industries, most notably technology, defense and healthcare, the major drivers of innovation for deep learning are private companies. While there are no exact figures, the big players in technology show increasing commitment to deep learning R&D by pouring money into acquisitions and project launches, and are in many ways competing with each other to snatch up deep learning talent (cite). Google has recruited academics to conduct deep learning research and develop some of its newest initiatives in artificial intelligence. In 2011, Sebastian Thrun left Stanford to head Google's autonomous car project. Andrew Ng has led several projects, including the deep learning study in very large scale computer vision and, most recently, a study that shows how deep learning systems can be created using standard industry hardware, a project focused on process innovation.
The company also recently acquired DeepMind, a startup with one of the biggest concentrations of deep learning researchers anywhere, for a reported $400 to $500 million (cite).

Facebook: At the end of the call, LeCun got a new job: he's now director of AI research at Facebook.
LeCun says deep learning will soon change the way Facebook and other
companies handle your images. It may be possible for Facebook's search engine
to know the face of a friend, or to organize images that are taken in the same
place or with the same people. "You're going to have similar things happening for
video," he adds.


Short-term: Speech recognition, optical recognition
Long-term: language understanding

First, some context. While Geoff is undoubtedly a giant in the field, he is only the latest in a string of departures to Google. Sebastian Thrun left Stanford to head Google's autonomous car project in 2011. Andrew Ng has led several projects at Google, including the recent high-profile deep learning study in very large scale computer vision. And in late 2010, Matt Welsh made news when he left his tenured faculty position at Harvard to join Google. Except for Matt Welsh, this trend has particularly centered on large scale machine learning, and this question was recently put to Andrew Ng during the panel discussion at the BigVision workshop at NIPS. Andrew's answer was that doing machine learning at a large scale requires significant industrial engineering expertise, which does not exist in academic settings including Stanford, and that a place like Google is simply much better equipped to carry out this function.

Driving forces: big tech companies, government funding

The government is another key player, providing funding for research in deep learning technology that drives innovation across multiple fields, such as neuroscience, defense and biotechnology. The most recent example is President Obama's newly announced Brain Research Through Advancing Innovative Neurotechnologies Initiative, or BRAIN. Half of the $100 million in federal funding allotted to this program will come from defense through DARPA, with some contribution from the NIH. The project aims to use deep learning technology to
map out how thousands of neurons are interconnected in the brain and better
understand how information is stored and processed in neural networks, and is a
major step towards creating an artificial brain. The data from the project could
ultimately feed into and help improve other deep learning algorithms used for
language analysis, voice recognition and other applications.

If we map how out how thousands of neurons are interconnected and how information is
stored and processed in neural networks, engineers like Ng and Olshausen will have better
idea of what their artificial brains should look like. The data could ultimately feed and improve
Deep Learning algorithms underlying technologies like computer vision, language analysis,
and the voice recognition tools offered on smartphones from the likes of Apple and Google.

Companies investing money:

Government money: BRAIN project! Also in the above article

Restraining forces: hardware limitations; funding going to simpler machine learning methods (but there is a lot of hype now, with a lot of money going into deep learning algorithms)

HARDWARE = complementary technology

There are two important problems in the hardware support of neural network learning: (a) parallel processing and VLSI designs to support learning and application of neural networks, and (b) hardware/software supports for generating/emulating an environment necessary for learning. The first problem has been studied extensively by many researchers, but the latter is a less glamorous problem that has been largely overlooked. A related question to be addressed is whether the environment generated for learning is realistic or not.

For instance, to design a neural network to control jet engine fire, it may be necessary to create an environment in which a jet engine can catch fire, and the neural network can be tested to put out the fire. In addition, the color of the fire may indicate the type of fire, which may need to be considered in the design. Creating such a testbed, either in hardware or in software, is a nontrivial task. As another application, it may seem easy to design a neural network to load balance jobs in a network of workstations. However, creating an environment in which realistic workload can be repeated in the network of workstations is nontrivial. In general, a lot of the research and applications of neural networks is hampered by the difficulty of creating a realistic environment for learning and testing. Emerging technologies will help in the development of faster and more robust learning algorithms but may not simplify the design of the learning environment.

Recently, Adam Coates and others at Stanford developed a deep learning system with over 11
billion learnable parameters. One of the key drivers to progress in deep learning has been the
ability to scale up these algorithms. Ng's team at Google had previously reported a system that
required 16,000 CPU cores to train a system with 1 billion parameters. This result shows that it is
possible to build massive deep learning systems using only COTS (commercial off-the-shelf)
hardware, thus hopefully making such systems available to significantly more groups.

External factors include data privacy issues/ethics, government funding

A unique development in Google's DeepMind acquisition was the mandatory establishment of an ethics board. According to people close to the situation, Google's willingness to establish an ethics board was a deciding factor in it purchasing DeepMind instead of Facebook. While almost any sci-fi movie of the past 50 years has dealt with ethical questions in some form or other, in the real world there are still relatively few concrete laws (cite) dealing with this part of AI--aside from the usual rules concerning things like privacy and product liability.

1. Development of application-specific standards bodies that allow for enhanced development
of AI-intensive applications and technologies
2. Closer integration with industry (across sectors) to highlight and demonstrate how
advancing AI-intensive applications can increase productivity
3. Enhanced communication (e.g. development of printed and electronic journals, coordination
of global conferences, etc.) within global AI associations (ECCAI, AAAI) and other groups
working within this area

Compile and analyze all the research done in different computation methods, such
as the ones described earlier: RBM and direct encoding/neural networks

To get big datasets, access to heavy duty computing power -> best done in the
context of a company, like Google
Although startups also have innovative approaches
Strategic bonds between academic researchers and companies that have specific
applications in mind -> faster commercialization of deep learning algorithms

3 sentence conclusion:
Ultimately deep learning has huge potential
Transformed from an obscure academic topic into one of tech's most exciting fields
in under a decade