Theaters  3,130  3,115  3,480  617  924  4,404
Weeks     16     15     30     6    5    35
Mean
The mean is considered one of the most important measures of location for a variable, as it provides a
measure of central location for the data (Anderson et al., 2015). Table 1 gives the mean
values for the 4 variables of interest. The data in Table 1 were produced with common Excel functions; for
example, the mean was found with the =AVERAGE(DATA RANGE) formula. For the 4 variables relevant to
this report, the mean opening gross was about $30 million; the mean total gross
was about $100 million; the mean number of theaters was about 3,130; and the mean number of
weeks was 16. In other words, on average, a movie stays in theaters for about 16 weeks or 4
Standard Deviation
The standard deviation is the positive square root of the variance and is convenient because it has the same
units as the data it describes. It is a measure of the spread of the data. The variance of a sample is computed
by squaring each deviation about the mean, summing those squared deviations, and dividing the sum by n − 1,
where n is the number of data points available. If one knows only the mean or median of a sample and not the
standard deviation or variance, then one can have a very ill-informed understanding of the data.
For example, data sets whose distributions are bell-shaped, bimodal, or multimodal can all have the same
mean. If one knows the standard deviation alongside the mean, then one has a better understanding of how
spread out the data are about the mean.
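As a minimal sketch of the computation described above, the following Python snippet computes the sample variance and standard deviation by hand. The two small data sets are hypothetical and are not taken from the movie sample; they were chosen only because they share the same mean but differ in spread.

```python
import math
import statistics


def sample_variance(data):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_std_dev(data):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(data))


# Hypothetical values (not from the movie sample): same mean, different spread.
tight = [9, 10, 10, 10, 11]
wide = [2, 6, 10, 14, 18]

for name, data in (("tight", tight), ("wide", wide)):
    print(name, statistics.mean(data), round(sample_std_dev(data), 2))
```

Both lists have a mean of 10, yet the second has a much larger standard deviation, which is the point the paragraph above makes about knowing the mean alone.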
For our movies, the standard deviations are as follows: $33 million opening gross, $96 million total gross, 617
theaters, and 6 weeks. Comparing the standard deviations for opening gross and total gross, we see that the
z-Score: 5.34, 3.94, 3.69, 3.34
The movies listed above are each outliers for the listed category. For example, Moonrise Kingdom has a z-Score
of -3.57, which means that it was about 3.6 standard deviations below the sample mean. This is quite
understandable, as it was released in only 924 theaters, whereas the average for our data set is 3,130 theaters.
Despite being shown in the fewest theaters of any movie in our set, it was not the
lowest-ranked movie in terms of the money it made. It actually ranked 74th, with a total gross of about
45 million dollars.
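The z-Score above can be reproduced approximately from the summary figures quoted in the text. The sketch below uses the rounded mean (3,130) and standard deviation (617), so the result comes out near -3.58 instead of the reported -3.57, presumably because the report used unrounded values. The cutoff of 3 standard deviations for flagging an outlier is a common convention and is an assumption here, since the text does not state the threshold it used.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


# Rounded figures reported in the text for the number of theaters.
mean_theaters = 3130
std_theaters = 617
moonrise_theaters = 924

z = z_score(moonrise_theaters, mean_theaters, std_theaters)

# Assumed convention: |z| > 3 marks an outlier.
is_outlier = abs(z) > 3

print(round(z, 2), is_outlier)
```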
Correlation Coefficient
The correlation coefficient is computed by dividing the sample covariance by the product of the sample
standard deviation of x and the sample standard deviation of y (Anderson et al., 2015). As the definition
shows, it compares 2 different variables through their
covariance and sample standard deviations. In essence, it is a tool that helps identify a positive, a negative, or no
linear correlation between the two variables. If the sample correlation coefficient is close to 1, then this implies
a strong positive linear relationship. If the sample correlation coefficient is close to -1, then it implies a strong
The above correlation coefficients show that there is a strong linear relationship between the total gross of a
film and its opening gross, a weaker linear relationship between a film's total gross and the number of theaters
it is shown in, and an even weaker linear relationship between the total gross of a film and the number of weeks
it is in theaters. It is important to remember that we have only found positive linear relationships, which do not
mean that an increase in one of the variables causes an increase in the other. In other words, the two
variables are correlated, but one does not necessarily cause the other. To support these findings,
you will find below 3 scatter diagrams that plot the total gross versus each of the 3 other variables, each with a
trend line that attempts to model the best linear relationship possible for the data. Looking at figure 1, which had
the correlation coefficient closest to 1, we see that many more of the data points fall near the line than in the
other 2 figures, which plot total gross versus theaters and total gross versus weeks. Again, this supports the claim
that there is a strong positive linear relationship between the total and opening gross of a film. In other words, if
a film made a lot of money in the opening weekend, then it was likely to have made more money overall.
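The definition of the sample correlation coefficient given earlier can be sketched in Python as follows. The opening and total gross figures below are hypothetical values in millions of dollars, invented for illustration only; they are not the report's data, so the resulting coefficient is not the one reported for the movie sample.

```python
import math


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance of x and y.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Sample standard deviations of x and y.
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov_xy / (std_x * std_y)


# Hypothetical figures in millions of dollars (not the report's data).
opening = [10, 20, 30, 40, 50]
total = [35, 60, 110, 120, 175]

r = sample_correlation(opening, total)
print(round(r, 2))
```

A value of r near 1 indicates that the points lie close to an upward-sloping line, but, as noted above, it says nothing about whether one variable causes the other.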
[Figure: scatter diagram of total gross versus weeks in theaters, with trend line]
Conclusion
In summary, we have looked at measures of location, measures of variability, measures of relative
location, and measures of association to ascertain some important characteristics of successful movies in the
film industry. We discovered that the average movie (n=100) was in theaters for approximately 16 ± 6 weeks,
appeared in 3,130 ± 617 theaters, had an opening gross of about 30 ± 33 million dollars, and
had a total gross of 100 ± 96 million dollars. We also found that in our sample, our most successful
movie, based on total gross, was The Avengers, with a z-Score of 5.46. Separately, we noticed that
for a movie to be somewhat successful, like Moonrise Kingdom with a z-Score of -3.57 for the
number of theaters, it did not necessarily have to be shown in a lot of theaters. Lastly, we found that when we
compared the total gross with each of the 3 other variables using the correlation coefficient, there existed a
positive linear relationship in each of the comparisons, but the strongest positive linear relationship in our
sample existed between the opening gross and total gross of the film, with a correlation coefficient of 0.93.
Using this sample data, which was derived from 2013, we can make some interesting assumptions about the
larger population of movies that exist. Of course, if one attempts to compare this data with more unrelated data,