Margins of Victory

This is a repost from a few days ago that I am using as my introduction to the R-bloggers site. Having experimented with R lately, I have decided to add the relevant code to future blogs, mainly in the hope of suggestions for improvement; this code being a case in point. You can view the script and a few notes at the end of the blog. I may go back and add code to some of the previous charts I have produced and will tweet such occurrences at @pssguy.

Alex Ferguson was relieved to get a 1-0 result at Everton. It was the team’s first such scoreline of the year but their 95th all time in the EPL, second only to Chelsea.
I thought it would be interesting to look at the distribution of wins and losses by team over EPL history and, as it ‘appens (RIP Jimmy), there is an R package which makes constructing a graph showing each of these fairly straightforward.

Shown here are the total, home, away and categorization of Man U results, to date

As befits the most successful team in EPL history, wins dwarf losses. Indeed, United have won significantly more games by a three-goal margin (140) than they have lost at all (105).

The drubbing in the recent derby stands out, as does the fact that United have lost only two home games by precisely 2 goals in nearly twenty years.

United have won 42 away games by a margin of three goals or more, but Arsenal pip them by 1.

Chart type: Back to Back Histogram
Inspiration: Patrick Burn
Data: Own data
Tools: MSSQL database, R
Packages: stringr, RODBC, Hmisc
Fix: xaxis should show goal margin as integer perpendicular to axis
Develop: Create function for team, home/away. Add opponents, by season options
Make Interactive with web input of parameters (help required)

#  Inspired by Back to Back histogram http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=136
 
library(RODBC) # for database connection
library(Hmisc) # for the plot
library(stringr) # for text manipulation (package names are case-sensitive)
 
# link to database using the RODBC package and create a df with relevant data
 
channel <- odbcConnect("myConnection")
scores <- sqlQuery(channel,paste(
"
select TEAM as teamID, Ground as ground,RES as result,GF as glsFor,GA as glsAg
from myTable
"
));
odbcClose(channel)
 
# take a look at the data - the first result shows a home game for Blackburn 
head(scores,1)
#teamID ground result glsFor glsAg
 #  BLB      H      D      0     0 # a no-score draw - pretty exciting!
 
# add a column for margin of victory/loss
scores$margin <- abs(scores$glsFor - scores$glsAg)    
 
# Do a little text manipulation to enhance axes labels using the stringr package
scores$result <- str_replace(scores$result,"W","Wins")
scores$result <- str_replace(scores$result,"L","Losses")
 
# now let's look at a plot of Manchester United's away record by setting a variable
# we should exclude the tied results as well
myData <- subset(scores,teamID=="MNU"&ground=="A"&result!="D")
 
# myData is available at  read.csv("http://www.premiersoccerstats.com/mnuadata.csv")
 
# Now plot the data using the histbackback function in the Hmisc package
# The function plots the data and returns a list with 3 components (left, right, breaks):
# the counts on each side at each goal margin, plus the breaks used
png("myPng.png")
out <- histbackback(split(myData$margin, myData$result), probability=FALSE, axes=TRUE,
                    brks=0:max(myData$margin), ylab="Goal Difference", col.main="black",
                    main="Man. Utd. EPL Away Results (exc draws)")
 
# Add some colour to help differentiate
barplot(-out$left, col="red" , horiz=TRUE, space=0, add=TRUE, axes=FALSE)
barplot(out$right, col="blue", horiz=TRUE, space=0, add=TRUE, axes=FALSE)
dev.off()
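
As a cross-check on the plot, the counts behind each bar can be tabulated directly. The following is only a minimal sketch in base R, assuming a data frame with the same column names as scores above; the toy data and the marginCounts helper are made up for illustration and are not part of the original script or of Hmisc.

```r
# Hypothetical toy data with the same columns as 'scores' (not real results)
toy <- data.frame(
  teamID = c("MNU", "MNU", "MNU", "MNU", "BLB"),
  ground = c("A", "A", "A", "H", "H"),
  result = c("W", "L", "W", "W", "D"),
  glsFor = c(3, 0, 2, 1, 0),
  glsAg  = c(1, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count wins and losses at each goal margin
# for one team at one ground ("H" or "A"), excluding draws
marginCounts <- function(scoresDf, team, where) {
  d <- scoresDf[scoresDf$teamID == team &
                scoresDf$ground == where &
                scoresDf$result != "D", ]
  d$margin <- abs(d$glsFor - d$glsAg)
  table(result = d$result, margin = d$margin)
}

# Cross-tabulate the toy away record: rows are results, columns are margins
counts <- marginCounts(toy, "MNU", "A")
print(counts)
```

Wrapping the filter in a function like this is one possible route to the team and home/away options noted under Develop above; the same call with "H" would give the home record.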


