
Statistical Analysis and Predictive Model for Horde Attack Size in Die2Nite


Iacobus1, The Data Masseuse

Abstract
One of the principal challenges in the browser-based zombie MMO Die2Nite is the uncertainty regarding
future zombie attacks in both the short term (tonight) and the long term (the next several days). We
report a characterization of 403 reported attack sizes from 74 towns and a predictive model for the
attack size. We found that a robust estimate of attack size is given by 5.95*Day^2 + 12.54*Day + 6.60,
with signs of a 4-day cycle.

Introduction
Die2Nite, a browser-based zombie survival MMO (http://www.die2nite.com by MotionTwin), poses a
unique logistical problem for towns, especially during later game stages. Principally, the game requires
the management of water rations, without which the town will die, and of defense points to protect
against impending zombie attacks. The water rations, coming only from known sources, are easily
accounted for by towns when deciding upon construction and daily voting and therefore do not pose
significant uncertainty. The impending attack size, on the other hand, which sets the number of defense
points needed to survive, is responsible for the majority of the uncertainty in planning. Knowing basic
information about the typical size of an impending attack, as given by the descriptive statistics and the
nature of the distribution, allows for more informed decision making and longer-term planning
regarding the balance between water and defense. Additionally, knowing the underlying nature of the
attack size allows for better planning for “spikes” and other adverse events.

Methods
We obtained a dataset of 403 reported attack sizes from 74 different towns from the Die2Nite Wikia
website (http://die2nite.wikia.com/wiki/Zombie_Attack_Strength) as shown in figure 1A. We generated
descriptive statistics and checked for auto-correlation between the size of the attack on day(d) and
day(d-1).

A predictive model was obtained by finding the best estimate for lambda using a Box-Cox
transformation and then performing a simple linear regression on the linearized dataset. Following the
regression, we examined the residuals for any correlation between days with a REML mixed effects
model using a Toeplitz covariance structure.
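
As a minimal sketch of what the transformation does (not the SAS procedure used for the analysis), the standard Box-Cox transform is (y^lambda - 1)/lambda, with log(y) as the limiting case at lambda = 0. At lambda = 0.5 it is simply a shifted and scaled square root, which is why a best-fit lambda near 0.5 points to a square-root linearization. The function name `box_cox` and the choice of sample values (the day 1 to 5 means from table 1) are illustrative assumptions.

```python
import math

def box_cox(y, lam):
    """Standard Box-Cox transform of a single positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)          # limiting case as lambda -> 0
    return (y ** lam - 1.0) / lam

# Illustrative inputs: mean attack sizes for days 1-5 (table 1).
sample_sizes = [25, 55, 97, 149, 223]

for z in sample_sizes:
    transformed = box_cox(z, 0.5)
    # At lambda = 0.5 the transform equals 2 * (sqrt(z) - 1),
    # i.e. a linear function of the square root.
    assert math.isclose(transformed, 2.0 * (math.sqrt(z) - 1.0))
    print(z, round(transformed, 3))
```

Because the lambda = 0.5 transform is a linear function of the square root, regressing sqrt(attack size) on day gives the same fit quality as regressing the Box-Cox transformed values.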

All data analysis was performed using SAS 9.2 (SAS Institute, Cary, NC).

1: Member of Die2Nite Group “Prairie of Terror” found at http://groups.google.com/group/die2nite-pot


Descriptive Data
For this analysis, we sought to characterize the nature of the data and to estimate rough ranges
for the size of the attack, as shown in table 1. Over the first several days (day <= 5) the confidence
intervals are very narrow and suggest a high utility in predicting the size of the attack. However, the
limited sample size at larger day values results in confidence intervals too wide to be meaningful, which
are of marginal value to a town’s planning. Furthermore, regression on the raw data was hampered by
a poor linear fit and a slightly exponential shape.

Also of note, when AttackSize(Day(d)) was plotted against AttackSize(Day(d-1)) to see whether the
attack on a given day tracks the scale of the attack on the preceding day, there was no visible trend
(data not shown).

While these data are very informative for planning during the early stages of the game, their utility
is drastically reduced at the late stages as the result of both the small sample size and the greater
need for information in late-stage planning. In other words, the chief limitation of this approach is
that it is contingent on the existence of the data and is descriptive, not predictive.

Table 1: Estimated Horde Strength. All estimates are for the attack at the end of the day and the CI is obtained using a 2-tailed
t distribution with n-1 degrees of freedom. After day ten, the sample size becomes too small and the error too large for any
informed predictions.

Day       n    Mean   Standard Error      95% CI        Known Extremes
Number                of Mean           Min     Max      Min     Max
 1       71      25         0            24      26        7      39
 2       60      55         3            49      61       24     139
 3       54      97         5            87     107       52     208
 4       48     149         5           138     160       92     240
 5       45     223         8           207     239      143     349
 6       38     311        11           289     333      217     529
 7       31     422        18           385     459      279     685
 8       18     520        22           474     566      411     750
 9       13     596        29           534     658      449     801
10       10     794        41           702     886      611    1003
11        5     900        79           682    1118      662    1113
12        2     986        96             0    2208      890    1082
13        2     949        47           356    1542      902     995
14        2    1189       114             0    2636     1075    1302
15        2    1411        64           602    2220     1347    1474
16        1    1400         .             .       .     1400    1400
17        1    1550         .             .       .     1550    1550
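
The confidence intervals in table 1 can be checked by hand as mean ± t * (standard error), where t is the two-tailed 95% critical value of Student's t with n-1 degrees of freedom. The sketch below is illustrative only: the helper name `mean_ci` is hypothetical, the critical values are hard-coded for three degrees of freedom rather than computed, and flooring the lower limit at zero is an assumption inferred from the zeros shown for days 12 and 14.

```python
# Two-tailed 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 4: 2.776, 9: 2.262}

def mean_ci(mean, std_error, n):
    """95% CI for a mean using t with n-1 df; lower limit floored at 0 (assumption)."""
    half_width = T_CRIT_95[n - 1] * std_error
    return max(0.0, mean - half_width), mean + half_width

# (n, mean, standard error) for days 10, 11 and 12, copied from table 1.
rows = {10: (10, 794, 41), 11: (5, 900, 79), 12: (2, 986, 96)}

for day, (n, mean, se) in rows.items():
    low, high = mean_ci(mean, se, n)
    print(f"day {day}: {low:.0f} to {high:.0f}")
```

The results agree with the table to within the rounding of the published standard errors. The day 12 row shows why the late-game intervals are uninformative: with n = 2 there is a single degree of freedom, and the critical value of 12.706 inflates the interval to more than twice the mean.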

Predictive Model
First, we assumed that there is an underlying structure to the data and that each day's attack is not
randomly calculated per town, as this would be too computationally expensive for the developers to
support in a game of this scale. We assumed that there is an underlying trend that is used to obtain a
value, which is then subjected to varying amounts of “random” noise to obtain the final horde attack
size per town. Secondly, we assumed that the model used to find the value that is modified to create
variation among towns has to be computationally simple at the basic level. Note that a basic equation
can have seemingly complicated outcomes.

In this case, a Box-Cox analysis suggests a best-fit lambda of 0.5, indicating a linear trend that is
squared to obtain the actual horde count. If this holds, the data can be linearized to improve the fit by
plotting the square root of the zombie count against the day number. This relationship holds true for
our data, as shown in figure 1B.
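
The linearization can be tried on the published summary numbers. The sketch below fits a straight line to the square root of the daily means from table 1, weighting each day by its sample size n so that the well-sampled early days dominate as they would in a regression on the raw reports. This is an approximation under stated assumptions, not the original analysis: the raw 403 observations are not reproduced here, the square root of a mean is not the mean of the square roots, and the helper name `weighted_line_fit` is hypothetical.

```python
import math

# Day number, sample size n, and mean attack size, copied from table 1.
DAYS = list(range(1, 18))
N = [71, 60, 54, 48, 45, 38, 31, 18, 13, 10, 5, 2, 2, 2, 2, 1, 1]
MEAN = [25, 55, 97, 149, 223, 311, 422, 520, 596, 794,
        900, 986, 949, 1189, 1411, 1400, 1550]

def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = slope * x + intercept."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

sqrt_means = [math.sqrt(m) for m in MEAN]
slope, intercept = weighted_line_fit(DAYS, sqrt_means, N)
print(f"sqrt(zombies) ~ {slope:.2f} * day + {intercept:.2f}")
print(f"implied Day^2 coefficient: {slope ** 2:.2f}")
```

Squaring the fitted slope gives the implied coefficient on Day^2, which can be compared against the 5.95 reported for the full model.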

After obtaining the linearized data as shown in figure 1B, we applied a simple linear regression to
obtain the square root of the expected zombie count on a given day, with an R^2 of 0.9477
(Equation 1). Squaring this formula yields a direct predictive equation for zombie counts (Equation 2).
The coefficients in Equation 1 are rounded to two decimals and are consistent with the expanded form
in Equation 2.

\[ \sqrt{\text{Zombies}} = 2.44 \cdot \text{Day} + 2.57 \tag{1} \]

\[ \text{Zombies} = (2.44 \cdot \text{Day} + 2.57)^2 = 5.95 \cdot \text{Day}^2 + 12.54 \cdot \text{Day} + 6.60 \tag{2} \]

Furthermore, the regression allows for the creation of 95% tolerance intervals, or the range in which
we would expect 95% of attacks to fall. A plot of the original data with the model fit and 95%
tolerance points is shown in figure 2. Visually, it appears the model is a highly robust estimator of
attack size. Estimates of the 95% tolerance intervals are provided in table 2. It should be noted that
the variance and trend are only linear on the square root plot, and as such the variance widens on the
original data scale.

Table 2: 95% TL For Expected Zombie Counts

Day    Lower Limit    Upper Limit
 1           3             71
 2          16            117
 3          42            176
 4          80            247
 5         130            330
 6         191            424
 7         264            531
 8         350            649
 9         447            780
10         556            922
11         676           1077
12         809           1244
13         953           1422
14        1109           1613
15        1277           1816
16        1457           2030
17        1648           2257
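
For quick use in town planning, the fitted quadratic can be wrapped in a small function. The sketch below evaluates 5.95*Day^2 + 12.54*Day + 6.60 and, as a rough illustration of why the band widens on the original scale, squares a constant band around the square root of the estimate. The function names are hypothetical, and the half-width of 3.4 on the square-root scale is an assumption back-calculated from the limits in table 2, not a value reported by the regression.

```python
import math

def expected_attack(day):
    """Expected horde size from the fitted quadratic (Equation 2)."""
    return 5.95 * day ** 2 + 12.54 * day + 6.60

def approx_tolerance_limits(day, half_width=3.4):
    """Approximate 95% limits: a constant band on the sqrt scale, squared back.

    half_width is an assumed value inferred from table 2.
    """
    center = math.sqrt(expected_attack(day))
    lower = max(0.0, center - half_width) ** 2
    upper = (center + half_width) ** 2
    return lower, upper

for day in (1, 5, 10, 17):
    low, high = approx_tolerance_limits(day)
    print(f"day {day}: expect {expected_attack(day):.0f}, "
          f"roughly {low:.0f} to {high:.0f}")
```

Because (c ± h)^2 = c^2 ± 2ch + h^2, a band of fixed half-width h around the square-root estimate c has a width of 4ch after squaring, so the interval on the zombie-count scale grows in proportion to the square root of the expected attack.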

Cyclic Nature?
Analysis for auto-correlation between the sizes of attacks on given days using REML found that the
estimates in the R matrix were not significantly different from zero, with two important exceptions.
There appears to be a direct relationship between days separated by 4 days (i.e. days 1 and 5, 2 and 6,
3 and 7, etc.), with the covariance parameter estimated at 1.1685 (tested against H0 = 0, p < 0.0001),
and again at 8 days, with an estimate of 1.0026 (p = 0.0081).
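
A simpler diagnostic than the REML mixed model, and a different technique from the one used above, is the sample autocorrelation of one town's regression residuals at each lag. The sketch below uses synthetic residuals invented purely for illustration (a 4-day pattern repeated three times); they are not drawn from the dataset, and the function name `lag_autocorrelation` is hypothetical.

```python
def lag_autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    numer = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return numer / denom

# Synthetic sqrt-scale residuals for one made-up town: a period-4 pattern, 12 days.
residuals = [1.0, -0.5, 0.2, -0.8] * 3

for lag in (1, 2, 3, 4, 8):
    print(lag, round(lag_autocorrelation(residuals, lag), 3))
```

For a series with a true period of 4, the autocorrelation is positive at lags 4 and 8 and shrinks with the lag because fewer overlapping pairs remain, which is one reason a weaker estimate at 8 days does not by itself show that the cycle dampens.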

[Figure 1 plots. Panel A: "Zombie Attacks - All Towns", Zombies (0 to 1600) vs. Attack day (1 to 17).
Panel B: "Zombie Attacks - Transformed", SqrtZombies (10 to 40) vs. Attack day (1 to 17). Markers are
keyed by TownName, 1 to 74.]
Figure 1: Scatter Plot of Reported Horde Sizes (A) and a Square Root Transformation (B). The data shown in (A) represent the
dataset used in the descriptive statistics of attack size and the formation of the model used to predict attack size and variance.
Note the lack of a linear trend and the slight increase in the tangent slope at higher day values. Different towns are coded using
differently colored and shaped markers that are consistent across the days. The data shown in (B) are the resulting linearization
of the data in (A), obtained by using a lambda of 0.5 as estimated by the Box-Cox transformation. Vertical lines represent +/- 2
standard deviations and appear to be constant across the days. Formally, this would suggest the model used in Die2Nite is a
monotonically increasing trend with a constant variance.

When the data were plotted for sample towns (not shown), significant changes were seen in a town's
“zombie burden rank” (i.e. most zombies = 1 to least zombies = n) between any two consecutive days
and between any two days that were not spaced 4c apart (where c is any integer). However, regardless
of mixing during the interim days, on days separated by an interval of 4c there was a remarkable
restoration of the ranking seen on the earlier days of the cycle.

[Figure 2 plot: "Zombie Attacks - Transformed", zombie count (0 to 3000) vs. Attack day (1 to 17).]

Figure 2: Total Number of Zombies vs. Day of Attack. The uppermost blue and lowermost red points represent the limits of
the 95% TL and give the range in which we expect to find 95% of all sample data points. The sample points from the dataset are
given in green, with the standard error given by the green bars.

It should be noted that while the data suggests a cycle with a period of 4, it is not conclusive. Further
time series analysis needs to be done to fully elucidate the presence or lack of a cyclic nature to horde
sizes. Also, the estimate for an interval of 8 is only borderline significant suggesting a possible
dampening of the cycle. However, the data for towns with the number of days needed to explore an
interval of 8 is very limited.

Conclusion
The data analyzed for this report strongly suggest that there is an underlying structure to the horde
size and that the actual count is some slight variation of the value given by equation 2. Having these
data provides for a more informed decision-making process for towns regarding resource and
population management.

We suspect that the reported model is highly robust given that the game likely calculates horde sizes
based on a simple linear equation. The constant variance in the early stage of the game and the
preservation of that trend into later, more uncertain days suggest there is a finite range for the
horde size on any given day. The tolerance intervals can be read as likely near-threshold cutoffs for the
allowed range on any given day. Furthermore, the data suggest this model continues into hitherto
unreported day counts.

Finally, the apparent cyclic nature of the data and the likely covariance of days separated by intervals
of 4c encourage further time series analysis to determine whether the ebb and flow of horde counts is
randomly determined. If a cyclic nature is found and characterized, it may be possible for towns to
further pinpoint their predictions of future horde sizes and to accurately predict and prepare for count
spikes and other abnormalities.
