Contents
Competition Rules
Background
Measurement
Analysis
Training Sets
Dataset Characteristics
Remarks
Competition Rules
Teams will present posters of their results for judging at the Department of Statistics
and Actuarial Science fall reception.
The data owners would like to see what you do with their data, so you will need to email a
copy of your poster and code for Dr. Dave Campbell to pass along to them prior to judging.
Code sharing also ensures that analyses are reproducible, so it is good statistical practice.
The use of a repository such as GitHub is allowed, but public online repositories must not
contain the dataset.
Background
Video game analytics is an important area for statistics, and this case study has been developed by Uken Games. Freemium games are free to download and play, but users have the
option of making in-app purchases to enhance or accelerate their performance. After the app
is downloaded, users typically work through a tutorial stage, then move on to open play,
where they earn some in-game currency. When users have enough in-game currency they
can progress to the next stage. In-game currency can be earned by completing tasks or it can
be purchased. Users have the option of connecting their game account to Facebook, which
unlocks the ability to interact with friends in-game by sending and receiving gifts. The case
study is modified from a Statistical Society of Canada case study from 2014 with permission
from Uken Games.
Measurement
Three target variables are available for each user: revenue, engagement (time played),
and retention (does the player return to the game after a number of days).
Times are recorded for events such as when the user makes different in-app purchases, sends
or receives gifts, or unlocks different achievements. Demographic data is also available for
some users. For privacy reasons some of the variables have been masked. For example, the
total length of the observation time period recording when achievements were unlocked is
the same for all users but the exact length of that time period has been removed. Also,
revenue and engagement numbers have been rescaled.
Analysis
Below are some questions to guide the analysis:
(i) Can you come up with a good way to visualize this data?
(ii) What are some of the explanatory insights that you can obtain from this data?
(iii) How do the user demographics and user actions affect the response variables (engagement, revenue, retention)? Which are the strongest predictors? What interactions are
present? The data is split into training and testing datasets.
(iv) In the mobile gaming industry, the gold standard for evaluating product changes is
the randomized control-treatment experiment (often called an A/B test). Common
metrics to test include revenue and retention, which are included in this dataset. Based
on your analysis of the data, what A/B tests would you follow up with if you had access
to our full data stream? What would be the experimental design (including the sample
size, experimental groups, and the statistical insights you would use)?
(v) What other insights can you provide?
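For the sample-size part of the A/B test question, one common starting point is the normal-approximation formula for comparing two proportions, which suits a binary metric such as retention. The sketch below is only an illustration under stated assumptions: the function name `sample_size_two_proportions` is hypothetical, the retention rates 0.30 and 0.33 are made-up inputs and not values from the dataset, and the 5% significance level and 80% power are conventional defaults, not requirements of the case study.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided two-proportion z-test."""
    if p_control == p_treatment:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    p_bar = (p_control + p_treatment) / 2          # pooled rate under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control)
                        + p_treatment * (1 - p_treatment))
    ) ** 2
    return ceil(numerator / (p_control - p_treatment) ** 2)


# Hypothetical rates: 30% retention in control, 33% hoped for in treatment.
n_per_group = sample_size_two_proportions(0.30, 0.33)
print("users per group:", n_per_group)
```

Smaller expected differences require many more users per group, so it helps to decide in advance what size of effect would be worth detecting. A heavy-tailed metric such as revenue would likely need a different calculation.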
Training Sets
(Wikipedia) A test set is a set of data used in various areas of information science to assess the
strength and utility of a predictive relationship. Test sets are used in artificial intelligence,
machine learning, genetic programming and statistics. In all these fields, a test set has much
the same role.
Regression analysis was one of the earliest such approaches to be developed. The data used
to construct or discover a predictive relationship are called the training data set. Most
approaches that search through training data for empirical relationships tend to overfit the
data, meaning that they can identify apparent relationships in the training data that do not
hold in general. A test set is a set of data that is independent of the training data, but that
follows the same probability distribution as the training data. If a model fit to the training
set also fits the test set well, minimal overfitting has taken place. A better fitting of the
training set as opposed to the test set usually points to overfitting.
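The comparison of training and test fit described above can be sketched in a few lines. This is a minimal illustration on simulated data, not on the case-study dataset: the linear signal, the noise level, and the two models (a least-squares line and a 1-nearest-neighbour rule that memorises the training points) are all assumptions chosen to make overfitting visible.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest_neighbour(xs, ys):
    """1-nearest-neighbour regression: memorises every training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(predict, xs, ys):
    """Mean squared prediction error on the given data."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]  # simulated: linear signal plus noise

# The points are independent draws, so the first 150 serve as training data
# and the remaining 50 as a test set from the same distribution.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

results = {}
for name, fit in [("line", fit_line), ("1-NN", fit_nearest_neighbour)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(name, "train MSE:", results[name][0], "test MSE:", results[name][1])
```

The nearest-neighbour rule reproduces its own training data exactly, so its training error is zero, yet it still makes errors on the test set. That gap between training and test error is the signature of overfitting.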
Dataset Characteristics
The distributions of revenue and engagement are very heavy tailed. Most users don't make in-app
purchases, and most of the players who do make in-app purchases do not make large ones.
However, the small number of users who make large in-app purchases account for a large
part of the revenue, in a sense subsidizing the game for the other players.
The game economy is closed: the game company controls the conversion price between real
currency and in-game currency, as well as the number and type of available
purchases.
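One way to quantify how concentrated revenue is among a few users is to compute the fraction of users who pay at all and the share of total revenue contributed by the top spenders. The sketch below uses a made-up toy list of revenues, not the case-study data, and the helper names `paying_fraction` and `top_share` are hypothetical.

```python
def paying_fraction(revenues):
    """Fraction of users with strictly positive revenue."""
    return sum(1 for r in revenues if r > 0) / len(revenues)


def top_share(revenues, top_fraction):
    """Share of total revenue contributed by the top `top_fraction` of users."""
    ordered = sorted(revenues, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))  # at least one user
    total = sum(ordered)
    return sum(ordered[:k]) / total if total > 0 else 0.0


# Toy example: 90 non-payers, 9 small spenders, 1 large spender.
toy_revenue = [0] * 90 + [1] * 9 + [91]

print("paying fraction:", paying_fraction(toy_revenue))
print("revenue share of top 1%:", top_share(toy_revenue, 0.01))
```

With data this skewed, means are dominated by a handful of users, so medians, quantiles, or log-scale plots may be more informative summaries than averages.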
For each user, measurements are taken between the time they install the app and until
a certain number of days has passed. For privacy reasons, we cannot reveal the exact
observation period, but note that the length of the observation period is the same for every
user.
The dataset consists of a single table, user stats.csv, with one record for each user. There
is a header containing the variable names listed below. There are 300,000 rows; the
response variables for 50,000 of these rows were withheld as a validation data set.
The data includes the following columns:
user id: integer uniquely identifying each user
install date
platform
platform 2 install date
fb connect
country
gender
return player
engagement
revenue
tutorial completed
first game played
first type 1 game
first type 2 game
first type 3 game
first type 4 game
first win
first bonus
first purchase A
first purchase B
first purchase C
first purchase D
first purchase E
first purchase F
first purchase G
first purchase H
first special purchase
first gift sent
first gift 2 sent
first gift received
first gift2 received
first uken gift received: Our company can also send a gift to the player (for example, during holiday promotions). This feature indicates the date of the first such gift they received from us.
first collection: Users have the option of collecting some artifacts in the game. Once enough artifacts are gathered, a collection is complete, and the user gets a bonus of virtual currency. first collection is the date when this first happens.
first prize A: In each round played, a user may win one of three prizes: prize A, prize B, or prize C.
first prize B
first prize C
stage1
stage2
stage3
stage4
stage5
stage6
stage7
Training Validation
Remarks
(i) The revenue and engagement numbers have been rescaled.
(ii) Stage 1 becomes available as soon as the user completes the tutorial. Subsequent stages
become available as a player plays rounds on the stages available to them. A player
may choose whichever unlocked stage they like, so it is possible, for example, that they
unlock and play stage 4 without ever playing stage 3.
(iii) For all event features, NA indicates that the event did not occur in the observation
period.
(iv) In-app purchases provide users with virtual currency that allows users to continue
playing when they run out of currency, or to increase the intensity of the game. They
can also be used to change the game aesthetics.
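Because NA in an event feature means the event did not occur during the observation period, one simple preprocessing step is to turn each event column into a 0/1 indicator. The sketch below is a minimal illustration with several assumptions: the toy rows and dates are made up, the underscore-separated column names are guesses at how the header is spelled in the file, and missing events are assumed to appear as the literal string "NA".

```python
import csv
import io

# Made-up rows standing in for the real file; names and dates are assumptions.
toy_csv = io.StringIO(
    "user_id,first_purchase_A,first_gift_sent\n"
    "1,NA,2014-01-03\n"
    "2,2014-01-05,NA\n"
    "3,NA,NA\n"
)


def event_indicators(rows, event_columns, missing="NA"):
    """Return one dict per user with 1 if the event occurred and 0 otherwise."""
    out = []
    for row in rows:
        flags = {col: int(row[col] != missing) for col in event_columns}
        flags["user_id"] = row["user_id"]
        out.append(flags)
    return out


rows = list(csv.DictReader(toy_csv))
indicators = event_indicators(rows, ["first_purchase_A", "first_gift_sent"])
for flags in indicators:
    print(flags)
```

Indicators discard the timing information, so an analysis may also want the elapsed time between install and each event for the users who did experience it.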