
A Look into the Algorithms of a Movie Recommendation System

Kristen Boscarino Cristobal Forno Kyle Franke

Adam Nieto

Binghamton University
Computer Science Dept.

Abstract

In this paper, we will design and develop three machine learning models: a content-based recommendation model using KNN, a content-based recommendation model using a modified KNN, and a collaborative filtering model. These models will be trained with the MovieLens 1 million dataset provided by the GroupLens Research Group from the University of Minnesota. Each model will then be analyzed to determine which one provides the most accurate recommendations, using metrics such as accuracy and recall as well as observational support.

1. Introduction
1.1. The Problem

With the boom in e-commerce, recommendation systems are aiming to replace the “guy behind the counter.” Big tech companies like Google, Amazon, Netflix, and Spotify use recommendation systems to optimize revenue and increase overall customer satisfaction. A recommendation system’s ultimate goal is to identify relevant content and personalize it for a customer.

1.2. Interest

Movies are a big part of human culture with intricate connections to history, human identity, and emotion. The film industry is massive; the box office grossed over $40 billion in 2017 alone, and it has only grown since the massive growth of the Internet [5].

Movie recommendation systems focus on exposing customers to movies and television shows that they would enjoy but have never heard of before. Often the recommendations fit a person’s movie tastes and personality. In this project, we aim to test several different machine learning methods to see which one provides the best recommendations given a person’s past movie preferences and perceived personality. However, recommendations are not a science. People’s tastes in movies vary, and no recommendation system can predict exactly what a customer might want to watch on a given day. Since recommendations are based on clients’ opinions, quantifying a recommendation system’s results can be difficult. We will use various metrics to assess the quality of our recommendations.

2. Existing Solutions
2.1. Popularity of Items

There are several different ways to design a recommendation system. The simplest solution is to recommend the most popular items. This approach works best for news companies whose top priority is to provide breaking news coverage for their customers. The limitation of this approach is that there is no personalization involved. An approach like this would not work for a movie recommendation system because it would just show the same movies to everyone, rather than taking a user’s movie preferences into account [2].

2.2. Using a Classifier

Another solution to this problem is using a classifier to make a recommendation. To classify, the algorithm would take in features about the user and a movie and output a 1 or 0 depending on the user’s taste. This solution proves better than just recommending the most popular movies because it incorporates personalization. However, this solution is limited because as the number of users and movies in the system increases, the model becomes exponentially complicated and unscalable [2].
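As an illustration of the popularity baseline from Section 2.1, the sketch below ranks movies by how many ratings they received and returns the same list for every user. The tuple layout `(user_id, movie_id, rating)`, the function name `most_popular`, and the toy data are assumptions made for this example, not part of our implementation.

```python
from collections import Counter

def most_popular(ratings, n=3):
    """Return the n movie ids that received the most ratings.

    `ratings` is a list of (user_id, movie_id, rating) tuples. This layout
    is an assumption for illustration only.
    """
    counts = Counter(movie_id for _, movie_id, _ in ratings)
    # Sort by rating count (descending), then by id so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie_id for movie_id, _ in ranked[:n]]

# Hypothetical toy data: movie "A" has three ratings, "B" two, "C" one.
toy_ratings = [
    (1, "A", 5), (2, "A", 4), (3, "A", 3),
    (1, "B", 2), (2, "B", 5),
    (3, "C", 4),
]
top_two = most_popular(toy_ratings, n=2)
```

Because the function never looks at who is asking, every user receives `top_two`, which is exactly the lack of personalization noted above.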

2.3. Content-Based System

Instead of the previous two algorithms, most companies employ either content-based algorithms or collaborative filtering. Content-based algorithms use the characteristics of a product to generate recommendations. This algorithm takes in many features about an item or movie and tries to find lookalikes for that item, meaning items with similar feature values. This algorithm is common among online movie stores and works well for movie recommendation systems because one can easily describe a movie with only a few dimensions such as genre, popularity, or release date [2]. While this algorithm is most commonly used for movie recommendation systems, it tends to be overspecialized, meaning it never recommends items outside of a user’s past movie history. Essentially, it pigeonholes recommendations into a user’s past interests. As this type of recommendation system does not take other user profiles into account, it cannot exploit the quality judgment of one’s peers [1].

2.4. Collaborative Filtering System

Unlike content-based recommendation algorithms, user-to-user collaborative filtering recommendation algorithms exploit another user’s tastes to help pinpoint recommendations for the user at hand. In other words, they find a lookalike customer rather than a similar product to make the recommendation. A limitation of this approach is that it takes a lot of time and resources to compare every user profile with every other [2]. One of the methods used to combat this limitation is clustering. By clustering the user profiles, we are able to simplify the huge dataset into a small list of clusters.

3. Our Research

For our project, we employ three machine learning models and compare and contrast their effectiveness. The first model we deploy is a K-nearest neighbors (KNN) content-based recommendation model. The second model is a modified KNN content-based recommendation model. For our third model we use user-to-user collaborative filtering. We want to compare and contrast these models with each other in order to understand how they work and what differences or similarities they share.

With these three approaches we use both supervised and unsupervised learning. We also explore different similarity algorithms and metrics such as Euclidean distance and cosine similarity to measure the similarity between movies or users.

For the implementations of these models we use the programming language Python. Additionally, we use the machine learning libraries scikit-learn and LightFM. We also use the panel data library pandas to help manipulate, process, and display our data for analysis. For graphing our data, we use the graphing library matplotlib.

4. Dataset

We use the MovieLens 1M dataset for this project, which is data collected by the GroupLens Research Project at the University of Minnesota. It contains 1,000,209 ratings between 1 and 5 from 6,040 different anonymous users on 3,900 movies [4]. Each user record contains age, gender, occupation, and zip code. Each movie record contains genre information. Each rating record contains the user, the movie, the rating, and a timestamp. MovieLens is one of the most common datasets for building recommendation systems due to the amount and the quality of the data it provides. Additionally, we use this dataset because it comes largely preprocessed. Our own preprocessing consists of loading each of the files into pandas DataFrames, adding descriptive columns for the user’s job and age descriptions, and removing unneeded columns like timestamp. We use this dataset to implement the collaborative filtering and content-based recommendation systems.

5. Content-Based Recommendation
5.1. Data Selection

As the dataset we obtained contains 6,040 users and 1,000,209 ratings, a strategic data selection was employed to help provide the best outcome for our model. While we would have liked to use the entire dataset, with the limited amount of computational power, we were forced to reduce our training and testing sets in order to create the model in a time-efficient manner. We simplify our dataset by using a threshold to help select users. The threshold measures how many movies a user has rated. If a user has not rated enough movies, they are not included in the experiment. We tested with various sizes of the threshold, seen in the results later in this paper. However, most of our experiments were performed on users with at least 500 ratings in the system.

5.2. KNN Model

For this model we use binary supervised classification, where we predict a 1 (the user would enjoy the movie) or a 0 (the user would not enjoy the movie).

We use K-nearest neighbors for our classification because it is a non-linear model that takes similarity of items into account, which is exactly what we need for a recommendation system. The KNN algorithm classifies a data point by looking at its K nearest data points. It predicts the class of a data point based upon the majority of its neighbors’ classes. It calculates which data points are its nearest neighbors using a distance or similarity function. Note that KNN is an instance-based learning algorithm. This means that KNN trains by comparing the data with previously known data in memory [3]. In this case, that previously known data is the distance between movies in the dataset.
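The KNN classification described in Section 5.2 can be sketched as follows. This is a minimal illustration rather than our implementation: it assumes each movie is a multi-hot genre vector, and the helper names `cosine_similarity` and `knn_predict`, the three-genre vocabulary, and the toy labels are all hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def knn_predict(test_vec, train_vecs, train_labels, k):
    """Predict 1 if a strict majority of the k most similar training movies are labeled 1."""
    scored = [
        (cosine_similarity(test_vec, vec), label)
        for vec, label in zip(train_vecs, train_labels)
    ]
    # Highest similarity first; these are the nearest neighbors.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top_k = scored[:k]
    votes_for_one = sum(label for _, label in top_k)
    return 1 if votes_for_one > len(top_k) / 2 else 0

# Assumed genre order for the toy vectors: [Action, Comedy, Drama].
train_vecs = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
train_labels = [1, 1, 0, 0]  # 1 = the user rated the movie highly

action_prediction = knn_predict([1, 0, 0], train_vecs, train_labels, k=3)
drama_prediction = knn_predict([0, 0, 1], train_vecs, train_labels, k=3)
```

In this toy setting the user liked the action movies and disliked the dramas, so an unseen action movie is classified as 1 and an unseen drama as 0.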

5.2.1 Training

We split up the dataset into 80% training and 20% testing for every user’s ratings. We provide predictions for 399 users. As we were limited by time and computational resources, we had to use a random subset of our users to conduct our experiment, as explained in Section 5.1. To perform our KNN classification, we classify each movie in the user’s training set that was highly rated as a 1, and every other movie as a 0.

Since KNN is instance-based, we trained by using cosine similarity to measure the distance between every movie in the dataset, using genre as our distinguishing feature. Cosine similarity is a measure between two non-zero vectors of an inner product space: the cosine of the angle between them. The smaller the angle, the higher the similarity. Cosine similarity is commonly applied to vectors holding the term frequency (TF) of words in a document [6].

In our experiment, cosine similarity is applied to the TF of the movie genres. Since movies can have multiple genres, we compare the TF of each genre in one movie with each genre in another. The more genres the two movies have in common, the higher their similarity score will be. For our KNN model, this means that the movies with the highest similarity scores will be nearest to each other in the vector space, making them nearest neighbors.

5.2.2 Testing

We test our KNN content-based model by classifying each movie in the user’s testing set. We compare each movie in the testing set with its K nearest neighbors, which are determined by the similarity scores from the training. If the majority of those neighbors were highly rated in the training set (classified as a 1), we classify this movie as a 1. Otherwise, it is classified as a 0.

The movies in the testing set with high ratings, meaning a rating above 2.5, should be classified as a 1. The movies with low ratings in the testing set should be classified as a 0. We then calculate the accuracy of our predictions based on how many movies in the testing set were correctly classified.

5.2.3 Hyperparameter: K

We try the following values of KNN’s hyperparameter K: 3, 5, 8, 10, 12, 15, and 20, in order to see which value of K gives us the most accurate recommendations. The results of these experiments are shown in Figure 1. We noticed that as K increased, the accuracy of our recommendations decreased.

Figure 1. Hyperparameter K vs Accuracy

This inverse relationship is due to the fact that we only have one feature, genre, for the movies, which makes our model highly biased. Therefore, as our value of K increased, we underfit our data, making our classifications less accurate. When K=3 and K=10, our recommendation system performs the best. While K=3 performs a little better, we choose to do the rest of our experiments with K=10 because it compares our training set with more movies in the vector space. When K=3, we feel we are not comparing with enough data.

In further data exploration and analysis, we saw that the majority of the movies were Drama films, as seen in Figure 2. This is an accurate representation of the majority of movies in the box office throughout time. However, the high number of movies with this one genre makes it hard to predict movies accurately for that genre, as the vast majority of them may not have been seen by the user. The overabundance of drama films in our data also added bias, leading us to conclude again that our model was underfitting our data. We propose that future studies add more features to movies, such as actors, directors, and scripts, to provide better predictions.

5.2.4 Hyperparameter: Ratings Threshold

Many users have few (1-500) ratings in the dataset. With 3,900 movies in the dataset, this sparsity makes our recommendations less accurate for these users. Thus, to make our dataset more dense, we remove any users who have fewer than a certain threshold number of ratings. As seen in Figure 3, as the number of ratings necessary to be included in the dataset increases, the accuracy of our model increases as well. We can create more accurate recommendations for a user the more we know about them, because when they rate more movies, we have more movies to compare against in the training set.
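The user-selection threshold of Sections 5.1 and 5.2.4 and the per-user 80/20 split of Section 5.2.1 could look roughly like the sketch below. The tuple layout, the function names, the fixed random seed, and the tiny threshold of 5 (instead of 500) are assumptions chosen to keep the example small.

```python
import random
from collections import Counter

def filter_users_by_rating_count(ratings, min_ratings):
    """Keep only rows whose user has at least `min_ratings` ratings."""
    counts = Counter(user_id for user_id, _, _ in ratings)
    return [row for row in ratings if counts[row[0]] >= min_ratings]

def split_user_ratings(user_rows, train_fraction=0.8, seed=0):
    """Shuffle one user's rows and split them into (train, test) lists."""
    rows = list(user_rows)
    # A seeded generator keeps the split reproducible (an assumption here).
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

# Hypothetical toy data: user 1 has five ratings, user 2 has only one.
toy_ratings = [
    (1, "m1", 5), (1, "m2", 3), (1, "m3", 4), (1, "m4", 2), (1, "m5", 5),
    (2, "m1", 4),
]
dense = filter_users_by_rating_count(toy_ratings, min_ratings=5)
train, test = split_user_ratings(dense)
```

Raising `min_ratings` shrinks the set of users but leaves each remaining user with more training movies to compare against, which is the trade-off shown in Figure 3.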

Figure 2. The amount of movies of each genre in the MovieLens 1M dataset

Figure 3. Number of Ratings Per User vs Accuracy

5.2.5 Analysis and Results

Our KNN model performed best with K = 10 and a minimum of 500 ratings for each user in the experiment. Our final accuracy was 0.65. We suspect this overall performance could have been higher had there been more movie features in the dataset. Instead, we have a high bias causing us to underfit our data.

Additionally, because content-based recommendation systems only recommend based on past movies watched, we are not able to take advantage of another person’s opinions in the dataset. This limits us to a small scope of movies we can recommend; thus, we try two other models.

5.3. Modified KNN Model

For this supervised model, we modify the original KNN algorithm to increase the number of movies recommended to a user. Rather than classify each movie in the testing set, we recommend movies both inside and outside of the user’s testing set. For each movie in the testing set, we calculate its n-nearest movies from the cosine similarity function used in KNN. We recommend all n movies, given that they are not in the training set.

Like KNN, this model is an instance-based learning algorithm and does not actually need to be trained; instead, it uses the previously computed cosine similarities between movies. This modification makes it harder to quantify results, because we do not know if a user will like a movie outside of their testing set. To measure our results, we calculate recall to see how many of the movies in the testing set we actually recommend. The movies recommended outside of the testing set cannot be measured, as we do not have any results on whether the user likes the movie or not; however, it is interesting to observe what movies the model would suggest to the user and judge based on personal opinion whether we agree with the recommendation.

5.3.1 Training

As in KNN, we split up the dataset into 80% training and 20% testing on a given user’s ratings. Similarly, we provide recommendations for 399 of the users in the dataset due to limited time and resources.

For every movie in the training set, we recommend the n movies that are most similar to it by using the previously computed cosine similarities. In the end, we recommend a total of N movies to the user, some of which are present in the testing set and others which are not.

5.3.2 Testing

We use recall to measure the results of our modified-KNN model. The formula for recall is $\text{recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}$, where in our experiment, relevant documents are the movies in the user’s testing set and retrieved documents are the movies we recommend. Recall allows us to see how many movies our model recommends that we know the user enjoys, because these movies are already highly rated in their testing set.

However, the movies outside of their testing set cannot accurately be quantified. Instead, we observe what other types of movies our model recommends. Figure 4 shows a very simple example (the user has only 20 ratings) of how the modified-KNN model suggests movies outside of a user’s testing set.

The two movies outlined in blue overlap between the testing set and the recommendations. However, one can observe that our model recommends other similar movies as well. For example, the user watched King Kong (1933) and our algorithm suggests they watch King Kong (1976), a movie that was not in their testing set. It is plausible that the user would also enjoy that movie based on their past history.
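The recall measure used in Section 5.3.2 can be computed as in the sketch below, where the relevant set is the user's testing set and the retrieved set is the list of recommended movies. The function name `recall` and the movie ids are placeholders for illustration.

```python
def recall(recommended, relevant):
    """Fraction of the relevant movies that appear among the recommendations."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0  # no relevant movies, so recall is defined as 0 here
    hits = relevant_set & set(recommended)
    return len(hits) / len(relevant_set)

# Hypothetical ids: two of the four testing-set movies are recommended.
recommended_movies = ["m1", "m2", "m7", "m9"]
testing_set_movies = ["m1", "m2", "m3", "m4"]
score = recall(recommended_movies, testing_set_movies)
```

Recommendations such as `m7` and `m9` fall outside the testing set, so they neither raise nor lower this score, which is why those suggestions have to be judged by observation.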

Figure 4. Results of Modified KNN on a user with small number of ratings

Additionally, our model recommends Star Wars: Episode I given that the user has watched Star Wars: Episode IV. While the results of this model cannot all be specifically quantified, it is clear that it can provide relevant recommendations to a user.

5.3.3 Hyperparameter: Number of Movies Recommended

As seen in Figure 5, as the number of movies our model recommends increases, the recall rate increases. This is due to the fact that there is a higher likelihood of the movies we recommend also being in the testing set. While a higher recall rate is good, we found that when more movies are recommended, it can actually become overwhelming to the user. Sorting through 200 recommended movies is similar to having to browse the whole dataset. Thus, we found that recommending about 25 movies was a sufficient amount, as it has a recall of 0.54 and it is an acceptable number of movies for a user to choose from.

Figure 5. Number of Recommended Movies vs Recall

Additionally, note that as our model recommends more movies, these movies can stray from the user’s original training set. Thus, the more movies we recommend, the more the variance of our recommendations will increase.

5.3.4 Analysis and Results

Our final recall rate was an average of 0.49 when recommending 25 movies to the users. While this result is not as high as we would like it to be, we believe the users would still enjoy our recommendations based on observation. While still underfitting due to only having the single feature of genre to recommend from, this change to our original KNN model increases the variance of our model a little, as we can recommend movies outside of a user’s testing set.

5.4. User-to-User Collaborative Filtering

We use unsupervised user-to-user collaborative filtering to provide movie recommendations for a given user based upon the movies that a similar peer also enjoyed. In particular, we use a Euclidean distance function to calculate how similar a user is to another based on the movies they have seen and the ratings they have given them. The user with the smallest distance to a given test user is their most similar user. By calculating these distances, we are clustering users into groups based on how similar their movie ratings are.

Note that since user-to-user collaborative filtering is unsupervised, we rely on observation to help us gauge how well our recommendation system is performing. We also create a similarity metric and analyze demographics to see relationships between users in other areas besides just movie ratings.
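A minimal sketch of the user-to-user approach of Section 5.4 is given below. It makes two assumptions that the text does not fix: a movie a user has not rated counts as a rating of 0 when computing the distance, and "highly rated" means a rating above 2.5 (the cut-off used for the KNN model). The function names and toy ratings are hypothetical.

```python
import math

def euclidean_distance(ratings_a, ratings_b):
    """Distance between two users' {movie_id: rating} dicts.

    Assumption: an unrated movie counts as 0 for that user.
    """
    movies = set(ratings_a) | set(ratings_b)
    return math.sqrt(
        sum((ratings_a.get(m, 0) - ratings_b.get(m, 0)) ** 2 for m in movies)
    )

def most_similar_user(target, all_ratings):
    """Return the id of the other user with the smallest distance to `target`."""
    others = [u for u in all_ratings if u != target]
    return min(
        others,
        key=lambda u: euclidean_distance(all_ratings[target], all_ratings[u]),
    )

def recommend_from_peer(target, all_ratings, high_rating=2.5):
    """Recommend movies the most similar peer rated highly and the target has not seen."""
    peer = most_similar_user(target, all_ratings)
    seen = all_ratings[target]
    return sorted(
        m for m, r in all_ratings[peer].items() if r > high_rating and m not in seen
    )

# Hypothetical toy ratings for three users.
toy_users = {
    "A": {"m1": 5, "m2": 4},
    "B": {"m1": 5, "m2": 4, "m3": 5, "m4": 2},
    "C": {"m1": 1, "m2": 1, "m5": 5},
}
peer_of_a = most_similar_user("A", toy_users)
recs_for_a = recommend_from_peer("A", toy_users)
```

Here user B agrees with user A on both shared movies, so B is A's nearest peer, and the one unseen movie B rated highly is recommended to A.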

5.4.1 Data Selection

With our one million ratings and 6,040 users, we needed to cut down the dataset as we were faced with memory and computational limitations. Instead of generating recommendations for all 6,040 users, we provided recommendations for 399 users. Our training set consists of all users in our experiment besides one, which is the user we are testing.

5.4.2 Training and Testing

This model, like KNN and modified-KNN, is an instance-based model and does not require actual training. Instead, our training consists of calculating the Euclidean distance between all users in the experiment.

Then, to make recommendations to a given user, we simply calculate which user has the highest similarity score to them. Users with high similarity scores have watched and highly rated a lot of the same movies.

For the sake of simplicity, let our test user be User A and our test user’s most similar user be User B. To make our recommendations, we sort through both User A’s and User B’s ratings, and any movies that User B has highly rated that User A has not seen, we recommend to User A.

An example is shown in Figure 6. User A is an 18-24 year old male. User B, with the most similar ratings to User A, is a 35-44 year old male. User A’s past movie ratings can be seen on the left. On the right are the movies our recommendation system suggests to User A based on movies that User B likes. User A likes Toy Story (1995). From User B’s ratings, User A is recommended Toy Story 2 (1999), the sequel to Toy Story. User A mostly watches Disney movies; however, besides Toy Story 2, User B was able to provide User A with movie recommendations outside of Disney. This shows how User-to-User Collaborative Filtering takes advantage of other users’ opinions, rather than just cornering User A into Disney films as a content-based recommender would have done.

Figure 6. Results of User-to-User Collaborative filtering on User

5.4.3 Analyzing Demographics

Since our user-to-user collaborative filtering found a similar peer based upon a user’s movie ratings, we were curious to see how similar or dissimilar demographically a given test user and their most similar peer were to each other. Looking at Figure 7, we found that gender was the most common demographic shared between a given test user and their most similar movie rating peer. The second most shared demographic was age, and the third most common demographic was occupation. We also calculated a similarity metric to help gauge similarity in demographics. We designed this metric to be the ratio of the demographics on which a given test user and their most similar peer matched to the total demographics given. Looking at Figure 8, one can see that the majority of the time a given test user and their most similar peer were not demographically similar. Interestingly, we conclude that a demographically diverse population of movie viewers often shares the same movie tastes.

Figure 7. Most common demographics between a given test user and their most similar peer by movie ratings.

Figure 8. Similarity metric tally between a given test user and their most similar peer by movie ratings.

5.4.4 Analysis and Results

While we could not quantify the results for our unsupervised model, we were able to observe how one’s peers can affect recommendations. We found that with each of our test users, the recommendations they receive are diverse, but relevant to movies they have seen. We believe that our Collaborative Filtering model does have the potential to overfit our data because we recommend based on a large number of varying people. Ratings have high dimensions, so fitting users to one another based on these ratings leads to a high variance.

Additionally, we find it interesting that those with similar movie tastes are not necessarily similar demographically. In future work, we would like to base our Collaborative Filtering model on user features like gender, age, and occupation, rather than the user’s ratings.
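The demographic similarity metric of Section 5.4.3, a ratio of matching demographics to total demographics, can be sketched as follows. The choice of the three fields gender, age, and occupation as the "total demographics", the dictionary layout, and the occupation values are assumptions for illustration; the ages and genders mirror the User A and User B example from Section 5.4.2.

```python
def demographic_similarity(user_a, user_b, fields=("gender", "age", "occupation")):
    """Ratio of demographic fields on which two users match (0.0 to 1.0)."""
    matches = sum(1 for field in fields if user_a[field] == user_b[field])
    return matches / len(fields)

# Gender and age follow the Figure 6 example; occupations are hypothetical.
user_a = {"gender": "M", "age": "18-24", "occupation": "student"}
user_b = {"gender": "M", "age": "35-44", "occupation": "engineer"}
similarity = demographic_similarity(user_a, user_b)
```

In this example only gender matches, so the two users score one out of three even though their movie ratings were the closest in the experiment.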

6. Conclusion and Future Work

[1] Anna B. 2015. Recommending Recommendation Systems Gab41. (December 2015). Retrieved October 16, 2018 from

[2] Aarshay Jain. 2017. Quick Guide to Build a Recommendation Engine in Python. (May 2017). Retrieved October 16, 2018 from

[3] Fürnkranz, J. (n.d.). Instance-based Learning. Retrieved

[4] GroupLens Research Group. 2015. MovieLens 100K Dataset. (September 2015). Retrieved October 16, 2018 from

[5] Chris Ortman. 2018. New Report: Global Entertainment Market Expands on Multiple Fronts. (April 2018). Retrieved October 15, 2018 from

[6] Perone, C. S. (2013, December 9). Machine Learning :: Cosine Similarity for Vector Space Models (Part III). Retrieved from