Case Study Competition (2015)

Nathan Esau, Justin Kwong


July 1, 2015

Contents
Competition Rules
Background
Measurement
Analysis
Training Sets
Dataset Characteristics
Remarks

List of Tables
1 Demographic features in the Uken Games Dataset
2 Metrics in the Uken Games Dataset
3 Event features in the Uken Games Dataset

The website for the case study competition can be found at
http://people.stat.sfu.ca/~dac5/CaseStudy2015/Welcome.html
The purpose of the case study competition is to predict engagement, interactions and revenue
from players in a mobile social game from Uken.
Teams may request to be paired with a graduate student mentor. The case study competition
will culminate in a poster presentation taking place in mid-September at the Department
of Statistics and Actuarial Science fall reception.

Competition Rules
Teams will present posters of their results for judging at the Department of Statistics
and Actuarial Science fall reception.
The data owners would like to see what you do with their data, so you will need to email a
copy of your poster and code to Dr. Dave Campbell to pass along to them prior to judging.
Code sharing also ensures that analyses are reproducible, so it is good statistical practice.
The use of a repository such as GitHub is allowed, but public online repositories must not
contain the dataset.

Background
Video game analytics is an important area for statistics, and this case study has been
developed by Uken Games. Freemium games are free to download and play, but users have the
option of making in-app purchases to enhance or accelerate their performance. After the app
is downloaded, users typically work through a tutorial stage, then move on to open play,
where they earn some in-game currency. When users have enough in-game currency they
can progress to the next stage. In-game currency can be earned by completing tasks or it can
be purchased. Users have the option of connecting their game account to Facebook, which
unlocks the ability to interact with friends in-game by sending and receiving gifts. The case
study is modified, with permission from Uken Games, from a 2014 Statistical Society of
Canada case study.

Measurement
Three target variables are available for each user: revenue, engagement (time played),
and retention (whether the player returns to the game after a number of days).
Times are recorded for events such as when the user makes different in-app purchases, sends
or receives gifts, or unlocks different achievements. Demographic data is also available for
some users. For privacy reasons some of the variables have been masked. For example, the
total length of the observation period during which achievements were recorded is
the same for all users, but the exact length of that period has been removed. Also,
revenue and engagement numbers have been rescaled.

Analysis
Below are some questions to guide the analysis:
(i) Can you come up with a good way to visualize this data?
(ii) What are some of the explanatory insights that you can obtain from this data?
(iii) How do the user demographics and user actions affect the response variables
(engagement, revenue, retention)? Which are the strongest predictors? What interactions
are present? The data is split into training and testing datasets.
(iv) In the mobile gaming industry, the gold standard for evaluating product changes is
randomized control-treatment experiments (often called A/B tests). Common
metrics to test include revenue and retention, which are included in this dataset. Based
on your analysis of the data, what A/B tests would you follow up with if you had access
to our full data stream? What would be the experimental design (including the sample
size, experimental groups, and the statistical insights you would use)?
(v) What other insights can you provide?
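
For question (iv), one common starting point for the sample size is the normal-approximation
formula for comparing two proportions, such as a retention rate under control and under
treatment. The sketch below is only an illustration under stated assumptions: the function
name, the 20% baseline rate and the lifts to 22% and 25% are hypothetical and do not come
from the dataset, and a two-sided z-test with equal group sizes is assumed.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided two-proportion z-test."""
    if p_control == p_treatment:
        raise ValueError("the two rates must differ to define an effect size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical retention rates, chosen only to show how the lift drives the answer.
n_small_lift = sample_size_per_group(0.20, 0.22)
n_large_lift = sample_size_per_group(0.20, 0.25)
print(n_small_lift, n_large_lift)
```

Smaller expected lifts need many more users per group, which is worth keeping in mind when
choosing which product change to test first.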

Training Sets
(Wikipedia) A test set is a set of data used in various areas of information science to assess
the strength and utility of a predictive relationship. Test sets are used in artificial
intelligence, machine learning, genetic programming and statistics. In all these fields, a
test set has much the same role.
Regression analysis was one of the earliest such approaches to be developed. The data used
to construct or discover a predictive relationship are called the training data set. Most
approaches that search through training data for empirical relationships tend to overfit the
data, meaning that they can identify apparent relationships in the training data that do not
hold in general. A test set is a set of data that is independent of the training data, but that
follows the same probability distribution as the training data. If a model fit to the training
set also fits the test set well, minimal overfitting has taken place. A better fit on the
training set than on the test set usually points to overfitting.
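
The gap between training and test error described above can be seen on a small synthetic
example. The sketch below is not part of the competition material: the data are simulated
from an assumed linear relationship with noise, and the two models (a least-squares line and
a one-nearest-neighbour rule that memorises the training points) are chosen only to contrast
a model that generalises with one that overfits.

```python
import random


def mse(model, data):
    """Mean squared error of a prediction function over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def fit_line(data):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest(data):
    """One-nearest-neighbour rule: predicts the y of the closest training x."""
    return lambda x: min(data, key=lambda point: abs(point[0] - x))[1]


# Simulated data (an assumption for illustration): y = 2x + standard normal noise.
rng = random.Random(0)
points = []
for _ in range(1000):
    x = rng.uniform(0, 10)
    points.append((x, 2 * x + rng.gauss(0, 1)))
train, test = points[:600], points[600:]

line, nearest = fit_line(train), fit_nearest(train)
line_train, line_test = mse(line, train), mse(line, test)
nn_train, nn_test = mse(nearest, train), mse(nearest, test)
print(f"line:    train {line_train:.2f}  test {line_test:.2f}")
print(f"nearest: train {nn_train:.2f}  test {nn_test:.2f}")
```

The nearest-neighbour rule reproduces the training set exactly, so its training error is zero,
yet its test error is larger than that of the line. That pattern is the overfitting signal.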

Dataset Characteristics
The distributions of revenue and engagement are very heavy tailed. Most users don't make
in-app purchases. Most of the players who make in-app purchases do not make large purchases.
However, the small number of users who make large in-app purchases account for a large
part of the revenue, in a sense subsidizing the game for the other players.
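
One way to quantify this concentration is the share of total revenue contributed by the top
fraction of users. The sketch below uses simulated values, not the competition data: the
helper name top_share, the 3% share of paying users and the Pareto shape parameter are all
assumptions made purely for illustration.

```python
import random


def top_share(values, fraction):
    """Share of the total contributed by the largest `fraction` of entries."""
    total = sum(values)
    if total == 0:
        return 0.0
    k = max(1, int(fraction * len(values)))  # number of top entries to keep
    return sum(sorted(values, reverse=True)[:k]) / total


# Simulated revenue (assumption): 300 of 10,000 users pay, with Pareto-distributed amounts.
rng = random.Random(1)
revenue = [rng.paretovariate(1.2) if i < 300 else 0.0 for i in range(10_000)]

share_top_1pct = top_share(revenue, 0.01)
print(f"top 1% of users account for {share_top_1pct:.0%} of simulated revenue")
```

Applied to the real revenue column, the same summary gives a single number to report
alongside a histogram or a plot on a log scale.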
The game economy is closed: the conversion price between real currency and in-game currency
is controlled by the game company, as are the number and type of available purchases.
For each user, measurements are taken from the time they install the app until
a certain number of days has passed. For privacy reasons, we cannot reveal the exact
observation period, but note that the length of the observation period is the same for every
user.
The dataset consists of a single table, user stats.csv, with one record for each user. There
is a header containing the variable names listed below. There are 300,000 rows; the
response variables for 50,000 of these rows were withheld as a validation data set.
The data includes the following columns:

user id                   integer uniquely identifying each user
install date              install date, in the format year, month, day
platform                  (ipad, iphone). The platform the user installs on.
platform 2 install date   date when a user installs on a second platform (NA if they
                          only install on one platform throughout the observation period)
fb connect                date when the user connects their game account to Facebook
                          (NA if they don't do so during the observation period)
country                   string specifying the country the user is from (NA if unknown)
gender                    (male, female, NA). Gender is known if and only if the user
                          connects to Facebook. Note that if a user connects to Facebook
                          after the observation period, their gender is known, but
                          fb connect will be NA.

Table 1: Demographic features in the Uken Games Dataset

return player             (0,1): 1 if the player plays a session on the last day of
                          the observation period, 0 otherwise
engagement                number of minutes the game was played during the observation
                          period
revenue                   amount of money the user spent during the observation period

Table 2: Metrics in the Uken Games Dataset

tutorial completed        date when the user completes the tutorial
first game played         date when the user plays their first round of the game (note
                          that some users quit before ever starting a game)
first type 1 game         There are four variations of the game, each with different
first type 2 game         intensity. Each round, a user chooses which variation they
first type 3 game         would like to play. first type 1 game is the date of the first
first type 4 game         time a user played the first variation.
first win                 date of the first round the player won
first bonus               When the user accumulates enough energy, they can exercise a
                          bonus which allows them to win a game faster and accrue more
                          in-game currency. first bonus is the date when this first
                          happens.
first special purchase    date of the first in-app purchase of any kind that the user
                          has made
first purchase A          date of the first in-app purchase of type A that the user
first purchase B          has made
first purchase C
first purchase D
first purchase E
first purchase F
first purchase G
first purchase H
first gift sent           If a user connects their account to Facebook, they can send
first gift 2 sent         and receive gifts with their Facebook friends. There are two
first gift received       types of gifts they can receive (corresponding to different
first gift2 received      in-game currency). The dates on which these events first occur
                          are coded by first gift sent, first gift received, and the
                          corresponding gift 2 columns.
first uken gift received  Our company can also send a gift to the player (for example,
                          during holiday promotions). This feature indicates the date of
                          the first such gift they received from us.
first collection          Users have the option of collecting some artifacts in the
                          game. Once enough artifacts are gathered, a collection is
                          complete, and the user gets a bonus of virtual currency.
                          first collection is the date when this first happens.
first prize A             In each round played, a user may win one of three prizes:
first prize B             prize A, prize B, or prize C
first prize C
stage1                    date when the user first plays stage 1 (and likewise for
stage2                    stage2 through stage7)
stage3
stage4
stage5
stage6
stage7
Training Validation       0 if the row is part of the training data set and 1 if the
                          response variables (return player, engagement, and revenue)
                          were withheld as part of the validation data set

Table 3: Event features in the Uken Games Dataset

Remarks
(i) The revenue and engagement numbers have been rescaled.
(ii) Stage 1 becomes available as soon as the user completes the tutorial. Subsequent stages
become available as a player plays rounds on the stages available to them. A player
may choose whichever unlocked stage they like, so it is possible, for example, that they
unlock and play stage 4 without ever playing stage 3.
(iii) For all event features, NA indicates that the event did not occur in the observation
period.
(iv) In-app purchases provide users with virtual currency that allows them to continue
playing when they run out of currency, or to increase the intensity of the game. They
can also be used to change the game aesthetics.
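
Because NA in an event feature means the event did not occur, each event date can be turned
into two model inputs: an indicator of whether the event happened and the number of days
from install to the event. The sketch below is a minimal illustration on three made-up rows.
The underscore-joined column spellings, the ISO date format and the sample values are
assumptions, so check them against the header of the actual file before reusing the code.

```python
import csv
import io
from datetime import date

# Made-up rows for illustration only; column spellings and date format are assumptions.
SAMPLE = """user_id,install_date,tutorial_completed,first_win,Training_Validation
1,2014-01-01,2014-01-01,2014-01-03,0
2,2014-01-02,NA,NA,0
3,2014-01-05,2014-01-06,NA,1
"""

EVENT_COLUMNS = ["tutorial_completed", "first_win"]


def days_since_install(row, column):
    """Days from install to the event, or None when the event is NA (did not occur)."""
    if row[column] == "NA":
        return None
    installed = date.fromisoformat(row["install_date"])
    occurred = date.fromisoformat(row[column])
    return (occurred - installed).days


features = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    record = {
        "user_id": int(row["user_id"]),
        "validation": row["Training_Validation"] == "1",
    }
    for column in EVENT_COLUMNS:
        delay = days_since_install(row, column)
        record[column + "_occurred"] = delay is not None
        record[column + "_days"] = delay
    features.append(record)

# Rows flagged 1 have their response variables withheld, so models are fit on the rest.
training = [r for r in features if not r["validation"]]
validation = [r for r in features if r["validation"]]
print(len(training), len(validation))
```

Keeping the indicator separate from the delay avoids having to invent a numeric value for
users who never reached an event.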
