
T20 Cricket Score

Predicting 1st innings scores
● The training set consists of 636 T20 international matches played across India from
2008 to 2017. The data was taken from

● Consider the 150,460 1st innings balls in the dataset to be independent of
one another.

● The first model: Linear Regression
● The second model: KNeighborsRegressor
● The third model: RadiusNeighborsRegressor
● The fourth model: RandomForestRegressor
● The fifth model: LogisticRegression
● The sixth model: DecisionTreeRegressor
Linear Regression: In linear regression, the relationships are modeled using linear
predictor functions whose unknown model parameters are estimated from the
data. A simple linear regression line has an equation of the form Y = a + bX, where X is
the explanatory variable, Y is the dependent variable, b is the slope and a is the
intercept.

The features that we took into consideration were present score, wickets fallen
and balls remaining.

The equation for the final score was:

score = 1.080*current_score + 1.16*balls_remaining - 4.04*wickets + 17.1
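
As a minimal sketch, the fitted equation can be applied to a match state in Python. The function name and the example match state are our own illustrations; only the coefficients come from the equation above.

```python
def predict_final_score(current_score, balls_remaining, wickets):
    """Apply the fitted linear equation for the final score."""
    return (1.080 * current_score
            + 1.16 * balls_remaining
            - 4.04 * wickets
            + 17.1)

# Hypothetical match state: 100 runs scored, 60 balls left, 2 wickets fallen.
print(round(predict_final_score(100, 60, 2), 2))  # 186.62
```
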
Accuracy of Linear Regression Model:

Graph of Predicted Vs Final scores:

K Nearest Neighbors - Regression: The K nearest neighbors algorithm stores all
available cases and predicts the numerical target based on a similarity measure
(e.g., distance functions).

One way to implement KNN regression is to calculate the average of the
numerical target of the K nearest neighbors. Another approach uses an inverse
distance weighted average of the K nearest neighbors. KNN regression uses the
same distance functions as KNN classification.
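
The two averaging schemes can be sketched in plain Python. This is an illustrative stand-in, not the KNeighborsRegressor implementation; the function name, the Euclidean distance choice and the toy data are assumptions.

```python
import math

def knn_predict(train_X, train_y, query, k=3, weighted=False):
    """Predict a numeric target from the k nearest training points.

    weighted=False: plain mean of the k neighbours' targets.
    weighted=True:  inverse-distance weighted mean.
    """
    # Euclidean distance from the query to every stored training point.
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]

    if not weighted:
        return sum(y for _, y in nearest) / len(nearest)

    # An exact match has zero distance; return its target directly
    # (a simplification) to avoid dividing by zero.
    for d, y in nearest:
        if d == 0:
            return y
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

# Made-up 1-D data for illustration only.
X = [(0,), (1,), (2,), (10,)]
y = [0, 10, 20, 100]
print(knn_predict(X, y, (1.25,), k=2))                 # plain mean of the 2 nearest targets
print(knn_predict(X, y, (1.25,), k=2, weighted=True))  # the closer neighbour counts more
```

With weighting, the prediction is pulled toward the target of the closer neighbour, so the two calls give different answers for the same query.
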
Accuracy of KNN:
RadiusNeighborsRegressor: RadiusNeighborsRegressor is an algorithm similar
to KNN. Instead of searching for a fixed number of nearest neighbours, this
algorithm finds all neighbours which are within a certain distance.

The principle behind nearest neighbor methods is to find a predefined number of
training samples closest in distance to the new point, and predict the label from
them. In the radius-based variant the number of samples is not fixed; it depends
on how many training points fall within the chosen radius.

Radius-neighbors regression is a type of instance-based learning or
non-generalizing learning: it does not attempt to construct a general internal
model, but simply stores instances of the training data.
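
A minimal plain-Python sketch of the radius-based idea follows. It is not the RadiusNeighborsRegressor implementation; the function name, the unweighted mean, the choice to return None when no neighbour is found, and the toy data are all assumptions.

```python
import math

def radius_predict(train_X, train_y, query, radius):
    """Mean target of all training points within `radius` of the query.

    Returns None when no training point falls inside the radius.
    """
    inside = [y for x, y in zip(train_X, train_y)
              if math.dist(x, query) <= radius]
    if not inside:
        return None
    return sum(inside) / len(inside)

# Made-up 1-D data for illustration only.
X = [(0,), (1,), (2,), (10,)]
y = [0, 10, 20, 100]
print(radius_predict(X, y, (1,), radius=1.0))  # three points fall inside the radius
print(radius_predict(X, y, (5,), radius=1.0))  # no point is close enough
```

Unlike KNN, the number of neighbours used varies from query to query, and a query in a sparse region may have none at all.
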
RandomForestRegressor: Random forests are an ensemble learning method for
classification, regression and other tasks. They operate by constructing a multitude
of decision trees at training time and outputting the class that is the mode of the
classes (classification) or the mean prediction (regression) of the individual
trees. Random decision forests correct for decision trees' habit of overfitting to their
training set.

● Ensembles are a divide-and-conquer approach used to improve performance.
The main principle behind ensemble methods is that a group of “weak
learners” can come together to form a “strong learner”.
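
The "mean prediction of the individual trees" idea can be sketched with bagged decision stumps in plain Python. This is a simplified stand-in, not a full random forest and not the RandomForestRegressor implementation: each weak learner is a one-split tree on 1-D data, and all names and data here are hypothetical.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree (a weak learner) on 1-D data."""
    best = None
    for t in sorted(set(xs))[1:]:            # thresholds that leave both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:                          # all x values identical: constant prediction
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(tree(x) for tree in trees)

# Made-up step-shaped data for illustration only.
xs = list(range(10))
ys = [0 if x < 5 else 10 for x in xs]
model = fit_bagged_stumps(xs, ys)
print(model(0), model(9))
```
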
Logistic Regression: An explanation of logistic regression can begin with the
standard logistic function. The logistic function is useful because it can take any
real input t (t ∈ R), whereas the output always takes values between zero and
one and hence is interpretable as a probability. The logistic function σ(t) is
defined as follows:

σ(t) = 1 / (1 + e^(-t))
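
A minimal Python sketch of the standard logistic function is given here. The function name is our own, and the two-branch form is just one common way to keep the computation from overflowing for large negative inputs.

```python
import math

def logistic(t):
    """Standard logistic function: maps any real t to a value between 0 and 1."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    # For negative t use the equivalent form e^t / (1 + e^t),
    # which avoids overflow in exp(-t) when t is very negative.
    e = math.exp(t)
    return e / (1.0 + e)

for t in (-5, 0, 5):
    print(t, logistic(t))
```
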

In the multiclass case, the training algorithm uses a one-vs.-all (OvA) scheme, rather than the “true”
multinomial LR.

This class implements L1 and L2 regularized logistic regression using the liblinear library. It can handle
both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal
performance; any other input format will be converted.
DecisionTreeRegressor: A decision tree is used to fit a sine curve with
additional noisy observations. As a result, it learns local linear regressions
approximating the sine curve.

● A 1D regression with decision tree.

● We can see that if the maximum depth of the tree (controlled by the max_depth parameter) is set
too high, the decision trees learn too fine details of the training data and learn from the noise, i.e.
they overfit.
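
The effect of the depth limit can be sketched with a small 1-D regression tree in plain Python. This is an illustrative stand-in, not the DecisionTreeRegressor implementation; the function names, the noise level and the sample sizes are assumptions. Training error can only fall as the depth limit grows, so it is the error on held-out points that shows whether the extra depth is fitting noise.

```python
import math
import random
from statistics import mean

def sse(points):
    """Sum of squared errors around the mean target."""
    m = mean(y for _, y in points)
    return sum((y - m) ** 2 for _, y in points)

def build_tree(points, max_depth):
    """Greedily fit a 1-D regression tree to (x, y) points."""
    xs = sorted({x for x, _ in points})
    if max_depth == 0 or len(xs) < 2:
        return mean(y for _, y in points)      # leaf: predict the mean target
    def split(t):
        return ([p for p in points if p[0] < t],
                [p for p in points if p[0] >= t])
    best_t = min(xs[1:], key=lambda t: sum(sse(s) for s in split(t)))
    left, right = split(best_t)
    return (best_t, build_tree(left, max_depth - 1), build_tree(right, max_depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):             # internal node: (threshold, left, right)
        t, left, right = tree
        tree = left if x < t else right
    return tree

def mse(tree, points):
    return mean((predict(tree, x) - y) ** 2 for x, y in points)

# Synthetic noisy sine data (illustrative only).
rng = random.Random(0)
def noisy_sine(n):
    return [(x, math.sin(x) + rng.gauss(0, 0.3))
            for x in (rng.uniform(0, 5) for _ in range(n))]

train, held_out = noisy_sine(80), noisy_sine(80)
for depth in (2, 5, 10):
    tree = build_tree(train, depth)
    print(depth, round(mse(tree, train), 3), round(mse(tree, held_out), 3))
```

Each printed row gives the depth limit, the training error and the held-out error, so the two error columns can be compared as the tree is allowed to grow.
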
When we have very little information to base our predictions on, the best we can do
is give the historical average final score. We become 80% accurate at around the
13th over and 95% accurate with about 2 overs to go. A lot can happen in the last
couple of overs, but over a large number of games any differences tend to
average out.