PremierSoccerStats
Your fix on the facts
head(allGames)
  FIRSTNAME LASTNAME PLAYERID POSITION TEAMID PLAYER_TEAM   TEAMNAME       DATE START ON
1     Steve    Jones  JONESS1        F    WHU        2054 West Ham U 1993-11-01     0  0
The data is largely self-explanatory. POSITION shows that Steve Jones is a forward, and the zeros in START and ON show that for the game in question he neither started nor came on as a substitute. As I am basically trying to show when players were in the team squad, I will still include these rows in the analysis. To obtain a player's career length at a particular club, I need to find the earliest and latest dates. It is probably overkill, but I am used to using the plyr package.
library(plyr)
allGames.summary <- ddply(allGames,.(PLAYERID,TEAMID),function(x) c(start=min(x$DATE),end=max(x$DATE)))
# Here is Steve Jones's line at West Ham
subset(allGames.summary,TEAMID=="WHU"&PLAYERID=="JONESS1")
     PLAYERID TEAMID      start        end
2574  JONESS1    WHU 1993-08-14 1997-02-01
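For readers without plyr to hand, the same split-apply-combine idea can be sketched in base R. This is a minimal illustration, not the code used for the post: `toyGames`, `toySummary` and the `days` column are hypothetical names, the `EXAMPLE1` rows are made up, and only the three Steve Jones dates come from the output shown above.

```r
# Toy stand-in for allGames (hypothetical; only the columns needed here)
toyGames <- data.frame(
  PLAYERID = c("JONESS1", "JONESS1", "JONESS1", "EXAMPLE1", "EXAMPLE1"),
  TEAMID   = c("WHU", "WHU", "WHU", "MNU", "MNU"),
  DATE     = as.Date(c("1993-08-14", "1993-11-01", "1997-02-01",
                       "1992-08-15", "1994-05-08")),
  stringsAsFactors = FALSE
)

# split the rows by player and club, drop combinations that never occur
pieces <- split(toyGames, list(toyGames$PLAYERID, toyGames$TEAMID), drop = TRUE)

# take the earliest and latest date in each piece, then bind the pieces together
toySummary <- do.call(rbind, lapply(pieces, function(x) {
  data.frame(PLAYERID = x$PLAYERID[1], TEAMID = x$TEAMID[1],
             start = min(x$DATE), end = max(x$DATE),
             stringsAsFactors = FALSE)
}))
rownames(toySummary) <- NULL

# career length at each club, in days
toySummary$days <- as.numeric(toySummary$end - toySummary$start)
print(toySummary)
```

Building a one-row data frame per group, rather than a plain vector, keeps the date class intact when the pieces are recombined.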
OK. Now we can get to some graphing. Let's go way back to the beginning of the Premier League and look at the squad of the champions that season, Manchester United (id MNU).
library(ggplot2)
q <- ggplot(subset(allGames.summary,TEAMID=="MNU"&start==as.POSIXct(min(allGames.summary$start)))) +
geom_segment(aes(x=start, xend=end, y=PLAYERID, yend=PLAYERID), size=3)
print(q)
Note the use of the min function again to get the first date, and the geom_segment function of ggplot2, which is perfect for producing the required lines. There are two gotchas to watch out for. The dates are of POSIXct datatype, and unless the value they are compared with is coerced to that type an error arises. Also, if the + is placed at the start of the second line rather than at the end of the first, R treats the first line as a complete statement, so the layer does not get added and no plot appears.
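Both gotchas can be reproduced without the soccer data. The following is a minimal sketch using made-up dates, with plain numbers standing in for ggplot layers since the parsing rule is the same. The variable names are hypothetical, and `tz = "UTC"` is an assumption chosen so that both sides of the comparison share a time zone.

```r
# made-up start dates standing in for allGames.summary$start
starts  <- as.POSIXct(c("1992-08-15", "1993-08-14"), tz = "UTC")
theDate <- "1992-08-15"

# coerce the character date explicitly before comparing
isFirstDay <- starts == as.POSIXct(theDate, tz = "UTC")

# a leading + starts a new statement, because the line above is already complete
a <- 1
+ 2          # evaluated on its own; a is still 1

# a trailing + tells R that the expression continues on the next line
b <- 1 +
  2          # b is 3
```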
As can be seen, the data looks reasonable. All the lines start at one point and show different end points. To those in the know, Giggs's line correctly extends to the current day; he is the only player appearing 20 years ago still to pull on a shirt.
However, it is not that aesthetically pleasing. Aspects that could be improved include
Wrap it in a function
Some of these amendments need more analysis; others just add to the ggplot code.
# create a function which takes the team id and game date as parameters
tlPlot <- function(theTeam, theDate) {
  # to cover all clubs a player appeared for we need to obtain a list of their ids
  squad <- subset(allGames.summary, TEAMID == theTeam & start == as.POSIXct(theDate))$PLAYERID
  # order the data by the number of appearances whilst with the team (and reversed for graph)
  playerOrder <- arrange(subset(allGames.summary, TEAMID == theTeam & PLAYERID %in% squad), desc(apps))$player
  playerOrder <- rev(playerOrder)
  # create the title (full team name and date would be shown with more space)
  theTitle <- paste("Careers for players appearing for", theTeam, "on", theDate, sep = " ")
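The listing above stops after building the title, so the plotting step itself is not shown. What follows is a rough sketch of how such a function might be finished, and should not be read as the code behind the chart. `tlPlotSketch`, its `summaryData` argument and the toy rows are hypothetical, and the `apps` column (squad appearances per player and club) is assumed to exist because the listing orders by it.

```r
library(ggplot2)

# Hypothetical summary table: one row per player and club, with an assumed apps column
toySummary <- data.frame(
  PLAYERID = c("PLAYERA", "PLAYERB", "PLAYERC", "PLAYERA"),
  TEAMID   = c("MNU", "MNU", "MNU", "WHU"),
  start    = as.POSIXct(c("1992-08-15", "1992-08-15", "1992-08-15", "1996-08-17"), tz = "UTC"),
  end      = as.POSIXct(c("1996-05-05", "2012-10-20", "1994-05-08", "1999-05-16"), tz = "UTC"),
  apps     = c(120, 600, 45, 80),
  stringsAsFactors = FALSE
)

tlPlotSketch <- function(summaryData, theTeam, theDate) {
  theDate <- as.POSIXct(theDate, tz = "UTC")
  # players whose spell at theTeam starts on theDate
  squad <- summaryData$PLAYERID[summaryData$TEAMID == theTeam & summaryData$start == theDate]
  # their rows for that club, fewest appearances first
  spells <- summaryData[summaryData$TEAMID == theTeam & summaryData$PLAYERID %in% squad, ]
  spells <- spells[order(spells$apps), ]
  # factor levels fix the y-axis order; the last level is drawn at the top
  spells$PLAYERID <- factor(spells$PLAYERID, levels = spells$PLAYERID)
  theTitle <- paste("Careers for players appearing for", theTeam, "on",
                    format(theDate, "%Y-%m-%d"))
  # linewidth is the current ggplot2 name for line thickness; older releases use size
  ggplot(spells) +
    geom_segment(aes(x = start, xend = end, y = PLAYERID, yend = PLAYERID), linewidth = 3) +
    labs(title = theTitle, x = "Date", y = "Player")
}

p <- tlPlotSketch(toySummary, "MNU", "1992-08-15")
# print(p) draws the chart
```

Passing the summary table in as an argument, rather than reading a global, makes the function easier to try on a small test table like the one here.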
Voila!
Not perfect but certainly more informative and now replicable. The analysis can easily be extended. For instance, one could select the players with top ten appearances for a club or show all those who were on squads whilst a particular player was there. The position factor could be identified by colour whilst using an alpha scale for apps.
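As an illustration of the first and last of those ideas, here is a small sketch that keeps the ten players with the most appearances and maps position to colour and appearances to transparency. Everything in `toyClub` is invented for the example, including the position codes other than F, and the `POSITION` and `apps` columns are assumed to have been carried into the summary table.

```r
library(ggplot2)

# Hypothetical per-player table for one club (all values made up)
toyClub <- data.frame(
  PLAYERID = paste0("PLAYER", 1:12),
  POSITION = rep(c("G", "D", "M", "F"), times = 3),
  start    = as.POSIXct("1992-08-15", tz = "UTC") + (0:11) * 86400 * 200,
  apps     = c(310, 45, 120, 600, 15, 230, 75, 410, 190, 60, 280, 95),
  stringsAsFactors = FALSE
)
# invented end dates: one week per appearance after the start date
toyClub$end <- toyClub$start + toyClub$apps * 86400 * 7

# keep the ten players with the most appearances, highest first
topTen <- head(toyClub[order(-toyClub$apps), ], 10)
# reverse the levels so the player with the most appearances is drawn at the top
topTen$PLAYERID <- factor(topTen$PLAYERID, levels = rev(topTen$PLAYERID))

# colour shows position, transparency shows number of appearances
q <- ggplot(topTen) +
  geom_segment(aes(x = start, xend = end, y = PLAYERID, yend = PLAYERID,
                   colour = POSITION, alpha = apps), linewidth = 3)
# print(q) draws the chart
```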
But that's all for now.
This entry was posted in R, Soccer er Football on October 21, 2012 [http://www.premiersoccerstats.com/wordpress/?p=1124] .