Dimensionality Reduction

t-SNE

Stochastic Neighbor Embedding (SNE)


t-SNE is a tool for visualising high-dimensional data. It converts similarities
between data points into joint probabilities and minimises the Kullback-Leibler
divergence between the joint probabilities of the low-dimensional embedding and
those of the high-dimensional data.
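As a minimal sketch of this in practice (assuming scikit-learn and NumPy are available; X here is only placeholder data), the embedding can be computed as follows:

```python
# Minimal t-SNE sketch with scikit-learn; X is an (n_samples, n_features) array.
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(500, 50)            # placeholder high-dimensional data

tsne = TSNE(n_components=2,            # embed into 2-D for visualisation
            perplexity=30.0,           # rough size of the local neighbourhood
            random_state=0)
X_2d = tsne.fit_transform(X)           # internally minimises KL(P || Q)
print(X_2d.shape)                      # (500, 2)
```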

Problems with dimensionality reduction

The curse of dimensionality


An exponential amount of information is crushed into an approximately linear space, and distance functions become far less meaningful in high-dimensional space: pairwise distances between random points concentrate around the same value, so "near" and "far" neighbours become hard to tell apart.
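This concentration of distances can be illustrated with a short sketch (assuming NumPy; the point counts and dimensions are arbitrary): as the dimension k grows, the nearest and farthest neighbours of a random point end up at almost the same distance.

```python
# Distance concentration: the gap between the nearest and farthest neighbour
# of a random point shrinks as the dimension k grows.
import numpy as np

rng = np.random.default_rng(0)
for k in [2, 10, 100, 1000]:
    X = rng.random((200, k))
    d = np.linalg.norm(X[0] - X[1:], axis=1)   # distances from one point
    print(k, round(d.min() / d.max(), 3))      # ratio creeps towards 1
```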

Problems with t-SNE

Optimising a non-convex loss function: different runs can converge to different local minima.


Large datasets are computationally expensive: computing the pairwise similarities between n points in k dimensions requires O(kn²) operations. It is therefore not feasible to compute exact t-SNE embeddings for large datasets such as ImageNet,
which has roughly 14 million images (n) and 196,608 dimensions (k).
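A back-of-the-envelope sketch (plain Python, using the figures quoted above) makes the scaling concrete:

```python
# Exact pairwise similarities need ~n^2 distance entries, each touching
# k coordinates, i.e. on the order of k * n^2 operations overall.
n, k = 14_000_000, 196_608              # ImageNet-scale figures from above
ops = k * n ** 2
print(f"~{ops:.2e} operations")         # roughly 3.9e+19
```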

The Solution:
Soft dimensionality reduction
We plan to test the hypothesis that continuous or incremental (soft) dimensionality reduction will help preserve more structure in the data when embedding it from a higher dimension.

Potential realisations of soft dimensionality reduction

Penalising data points that lie in the higher dimensions (3D and above) through the objective
function that we are optimising
Continuously transforming the data (through folding and dekinking) into a 2D plane
Iteratively using t-SNE to project down only a small, fixed number of dimensions at a time until we reach 2D (see the sketch after this list)
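A rough sketch of the third idea (assuming scikit-learn; the intermediate target dimensions 50 and 10 are illustrative choices, not part of the proposal) chains t-SNE runs, each projecting down a few dimensions at a time. Note that scikit-learn's Barnes-Hut implementation only supports up to 3 output dimensions, so the intermediate stages fall back to the exact method.

```python
# Hypothetical staged t-SNE: project down a few dimensions at a time
# (e.g. 784 -> 50 -> 10 -> 2) instead of jumping straight to 2-D.
import numpy as np
from sklearn.manifold import TSNE

def staged_tsne(X, stages=(50, 10, 2), random_state=0):
    for d in stages:
        # Barnes-Hut only handles n_components <= 3, so the intermediate
        # (higher-dimensional) stages use the exact method.
        method = "barnes_hut" if d <= 3 else "exact"
        X = TSNE(n_components=d, method=method,
                 random_state=random_state).fit_transform(X)
    return X

X = np.random.rand(300, 784)            # placeholder data
print(staged_tsne(X).shape)             # (300, 2)
```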

Evaluation (actually, it's another problem)

Benchmark datasets

Standard benchmark datasets in machine learning will be used: MNIST, CIFAR 10-100, COIL 20-100.

Computational complexity

It is straightforward to compare the speed of two algorithms, so any improvements in speed can easily be evaluated.

Visualisations are subjective

Changes to the actual visualisation are inherently subjective, which makes evaluation very hard.
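The visual comparison can at least be supplemented with a number. As a sketch (assuming scikit-learn; the trustworthiness score and the small digits dataset are stand-ins, not the project's chosen metric or benchmark), a neighbourhood-preservation score can be reported alongside the pictures:

```python
# Quantitative sanity check on an embedding: trustworthiness measures how
# well local neighbourhoods are preserved (1.0 = perfectly preserved).
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE, trustworthiness

X, _ = load_digits(return_X_y=True)     # small MNIST-like benchmark (64-D)
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)
print(trustworthiness(X, X_2d, n_neighbors=10))
```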

Autoencoders

Autoencoders will also be used to help validate and generalise the soft dimensionality reduction technique, showing that it is not just a property of t-SNE.
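As a minimal sketch of what this could look like (assuming TensorFlow/Keras; the layer widths are illustrative), an autoencoder whose encoder narrows gradually rather than in a single step mirrors the soft-reduction idea:

```python
# Autoencoder whose encoder reduces dimensionality gradually
# (784 -> 256 -> 64 -> 2) rather than in one hard step.
from tensorflow import keras
from tensorflow.keras import layers

encoder = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(2),                     # 2-D code used for visualisation
])
decoder = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(784, activation="sigmoid"),
])
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=20, batch_size=128)  # x_train scaled to [0, 1]
```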

Details
Process

Iterate:

Literature review and ideas
Model design/creation
Evaluation

Write report(s) throughout the year

Timeline

Important dates

Prepare preliminary report from 6/6/16; finish by 13/6/16.

Prepare final report from 19/9/16; finish by 26/9/16.