
Title: Aesthetic Ranking of Images

Problem: Given a set of hotel images, train a deep learning classifier to rank the
images according to the aesthetic appeal of their content.

Users seek out hotel photos before making a decision to book. To ease the decision-
making process we can rank the hotels based on the aesthetic appeal of their
images.

The past decade has seen a resurgence in deep learning, from a niche field of
research to a major part of many industrial applications. One of the significant
impacts has been in the field of Computer Vision using Convolutional Neural
Networks (CNNs). CNNs have become the de facto standard for most image
recognition, classification, and detection tasks, outperforming not only
conventional machine learning techniques but also humans on many of them.

Following an extensive literature review and dataset search, we could not find
a well-labelled dataset for our hotel room image classification task. We therefore
took the hotels-50k dataset of around 250,000 images and manually labelled around
5,000 of them for our task.

There have been many successful neural network architectures for image
classification in recent years. The factors we considered in choosing a model were
the number of parameters, its performance across different datasets, its size, and
the availability of pre-trained weights. We used InceptionV3, which placed second
in the 2015 ImageNet Challenge and made a significant effort to reduce the number
of parameters. The network uses a series of inception modules: mini-networks
composed of different-sized convolution filters operating on the same input
feature map. Its last layer is a softmax layer that outputs a probability for
each category, with the probabilities summing to 1.
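The softmax output described above can be sketched in a few lines of NumPy (an illustrative sketch, not the actual InceptionV3 head; the logits and category count here are made up):

```python
import numpy as np

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical logits for three room categories
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)  # largest logit gets the largest probability
```

Whatever the logits, the outputs are non-negative and sum to exactly 1, which is what lets the last layer be read as per-category probabilities.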

We initialized our model with weights pre-trained on the ImageNet dataset. This
technique is called transfer learning: a model developed for one task is reused
as the starting point for a model on a second task. We also use batch
normalization during training, wherein the activations at every layer of the
network are normalized for each batch. Both techniques speed up training
significantly.
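The batch normalization step can be sketched as follows (a minimal sketch of the core normalization only; the learned scale and shift parameters of a real batch-norm layer are omitted, and the batch values are synthetic):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize a batch of activations to zero mean and unit variance per feature.

    x has shape (batch_size, num_features); eps guards against division by zero.
    A real batch-norm layer would additionally apply a learned scale (gamma)
    and shift (beta) after this step.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

# Activations with an arbitrary scale and shift, as might come out of a layer
batch = np.random.randn(32, 4) * 5.0 + 3.0
normed = batch_norm(batch)
```

After this step, each feature in the batch has mean close to 0 and standard deviation close to 1, which keeps activation scales stable across layers during training.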

During training we perform data augmentation, slightly distorting our images with
random cropping, contrast changes, and color manipulation. This helps the model
generalize better, especially in our case, where we have performed over-sampling.
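Two of these distortions, random cropping and contrast adjustment, can be sketched directly on a NumPy image array (an illustrative sketch; the image here is random noise and the crop size and contrast factor are arbitrary choices, not our actual training settings):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, crop_h, crop_w):
    """Cut a crop_h x crop_w window out of the image at a random position."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]

def adjust_contrast(img, factor):
    """Stretch (factor > 1) or compress (factor < 1) pixels around the mean."""
    mean = img.mean()
    return np.clip((img - mean) * factor + mean, 0.0, 255.0)

# A stand-in 299x299 RGB image (InceptionV3's input size) filled with noise
image = rng.integers(0, 256, size=(299, 299, 3)).astype(np.float32)
augmented = adjust_contrast(random_crop(image, 260, 260), factor=1.2)
```

Because the crop position and the distortion parameters are drawn randomly each epoch, over-sampled images are never seen in exactly the same form twice, which is what makes augmentation a useful companion to over-sampling.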

To do:
There are two challenges with this dataset. Firstly, the dataset is highly
imbalanced: "bedroom" makes up 70% of the total images, while bathrooms account
for only 10%. We are going to experiment with two data sampling techniques: one
that only under-samples the overrepresented categories, and another that both
under-samples the overrepresented categories and over-samples the
underrepresented ones. We based this approach on ideas from the paper "A
systematic study of the class imbalance problem in convolutional neural
networks".
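The combined under- and over-sampling scheme can be sketched as follows (a minimal sketch; the class names, counts, and target size are hypothetical and chosen only to mirror the imbalance described above):

```python
import random

random.seed(42)

def resample(images_by_class, target):
    """Balance every class to `target` items.

    Classes above `target` are under-sampled without replacement;
    classes below it are over-sampled by drawing extra items with replacement.
    """
    balanced = {}
    for label, imgs in images_by_class.items():
        if len(imgs) > target:
            balanced[label] = random.sample(imgs, target)          # under-sample
        else:
            balanced[label] = imgs + random.choices(imgs, k=target - len(imgs))  # over-sample
    return balanced

# Hypothetical imbalanced dataset: bedroom dominates, bathroom is rare
data = {
    "bedroom": list(range(3500)),
    "bathroom": list(range(500)),
    "lobby": list(range(1000)),
}
balanced = resample(data, target=1000)
counts = {k: len(v) for k, v in balanced.items()}  # every class now has 1000 items
```

The under-sampling-only variant is the same loop with the over-sampling branch replaced by keeping the class unchanged.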
