Monthly Archives: November 2012

Shiny is the new Cool

Several of you will probably have tried out the new Shiny package brought to the table by the RStudio guys

This is just what I have been looking for and to my mind could provide a quantum leap in the use of R. There have been other packages addressing the need for web user interactivity but this is the first one I have found that makes it relatively easy for the less technically able of us to get something up and running

Up until now, I have used Flex/Flash to allow interactivity but there are several drawbacks including the platform itself and development time. I also wish to take advantage of the statistical processing and graphical opportunities offered by R

Much of my future work will be sports-related but I am kicking off with an app based on work I have presented in a previous post

The app, which graphically compares subjects’ Wikipedia search rates by day, can be accessed via RStudio’s server option. It is currently under development (so there may be problems), and I am hoping it is financially affordable once it goes fully live!
I have posted the code as a gist and, for those of you with the shiny package installed in R, it can also be run with the shiny::runGist('4171750') command.

Actual run times to produce the graph are pretty acceptable especially if the number of subjects compared is small and only a limited number of months is surveyed. Any graphs are easily saveable, of course.

It’s mostly fun, of course, and quite addictive. My son took particular pleasure in seeing the lack of interest in any Blue Jays player compared with Derek Jeter.

Where does the star of ‘The Magnificent Seven’ stand in interest compared with his fellow heroes?

And what explains this pattern with Shakespeare? Click on image for larger version

I am sure there are more useful aspects to be mined. For instance, if a particular subject has to be evaluated, peaks in the timeline will allow easy access to news for that day. And, as I have alluded to before, perhaps the interest in a particular political candidate foreshadows (and could potentially replace) standard opinion polls.

For now, I will leave the last word to John Lennon.

And the Player of the Year is …

Less than one third of the season gone and the Daily Mail has an article on their player of the season. Experts weigh in with their opinions, with a consensus that Luis Suarez is the front-runner. Some data on his contribution via goals and assists would appear to confirm that view.

Let’s see how he stacks up with all-time EPL leaders at this stage of the season

As you can see, Suarez is up there with the best of them. Le Tissier, like Suarez, was a star on a mediocre team who perennially appeared at or near the top of this category. Defoe’s run occurred whilst he was with Portsmouth and led to his return to Spurs less than 12 months after leaving – at double the price.

It is obviously difficult to maintain an 80% hand in all goals for a whole season. Here is what Suarez has to aim at after the 38 games are complete

1993/1994 saw a battle royal between Sutton and Le Tissier. In those days, teams played 42 games, and a strong finish from the Southampton player saw him pip his rival with a 73% contribution to all the team’s goals. Nevertheless, it is hard not to see Henry’s 2002/03 season as the greatest attacking one in EPL history.
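
The percentages quoted here (an 80% hand in all goals, a 73% contribution) can be read as a simple ratio. The sketch below assumes that a player's contribution is goals plus assists divided by the team's league goals; the post does not spell out its exact formula, and the function name and the numbers are made up for illustration.

```r
# Share of a team's league goals that a player scored or assisted, as a percentage.
# Assumption: contribution = (goals + assists) / team goals.
goal_involvement <- function(goals, assists, team_goals) {
  stopifnot(team_goals > 0, goals + assists <= team_goals)
  round(100 * (goals + assists) / team_goals, 1)
}

# Invented figures: 8 goals and 4 assists out of 15 team goals
goal_involvement(goals = 8, assists = 4, team_goals = 15)  # 80
```

Dividing by team goals rather than games played is what lets a star on a low-scoring side rank alongside players from title contenders.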

Hughes’s last breath?

At the time of writing, Mark Hughes is still the manager of QPR, but today’s defeat at home to fellow strugglers Southampton would seem to seal his fate.
If so, one of his last, and strangest, pronouncements was a quote to the Guardian: “Our aim at the start of the season was to finish in the top half and I still think we can do that.”
If the first half of his statement was optimistic, the second half appears to show a certain detachment from reality.

So I decided to look at how teams at the bottom of the Premier League after 11 games ended the season. Since the start of the 38 game season in the EPL the results are…

In seven of the previous seventeen campaigns, the team bottom after 11 games has also ended the season at the foot of the table; three more have been relegated; and only Spurs have finished higher than fifteenth. In fact, they ended the year in a creditable 8th spot, but it will be of little comfort to Hughes that Spurs’ recovery coincided with a managerial replacement in the form of Harry Redknapp.
QPR also have as bad a points return, at four, as any of their fellow bottom-dwellers.
To add insult to injury, today’s defeat puts them slap bang at the bottom after 12 games.

‘urry up ‘arry, come on!

Revisiting the GOP Race with the Huff Post API and pollstR

Well, one election is over but it is never too soon to start another – or in this case revisit the past four years

One day after the 2008 US Presidential election, a Rasmussen poll of 1000 likely voters asked for their choice for the 2012 Republican Presidential candidate.
The overwhelming favourite was Sarah Palin, who garnered 64% of the preferences, with Huckabee (12) and Romney (11) the only others to reach double digits. And thus started arguably the most topsy-turvy race in election history – ending in ultimate defeat.

The guys at the Huffington Post have kindly produced an API for stacks of opinion polls, and Drew Linzer has produced an R function, pollstR, on GitHub to interact with it.

The first step is to determine which HP poll the data is in

 
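library(XML)
library(ggplot2)
library(plyr)
library(rjson) # assumption: fromJSON() needs a JSON package and the original listing loaded none

url <- "http://elections.huffingtonpost.com/pollster/api/charts"
# read the response and collapse it into a single JSON string
raw.data <- paste(readLines(url, warn=FALSE), collapse="")
rd  <- fromJSON(raw.data)
# collect the slug (short name) of every chart
pollName <- c()
for (i in seq_along(rd)) {
  pollName <- append(pollName,rd[[i]]$slug)
}
# print once, after the loop has finished
print(pollName)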

This provides a list of 345 polls and a quick perusal shows that the required one is named “2012-national-gop-primary” so this can be plugged into the aforementioned function, once it has been sourced, and an analysis of the resulting data performed
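
With a few hundred names, the perusal can also be done with a pattern match. The sketch below uses a small hand-written list in place of the parsed API response, so every entry apart from the GOP primary slug quoted above is invented; it shows one way to pull out the slugs and filter them.

```r
# Toy stand-in for the parsed JSON: one list element per chart, each with a slug.
# Only the GOP primary slug comes from the post; the rest are made up.
rd <- list(
  list(slug = "2012-general-election-example", title = "Example A"),
  list(slug = "2012-national-gop-primary", title = "Example B"),
  list(slug = "example-senate-race", title = "Example C")
)

# vapply() collects one slug per chart without growing a vector in a loop
pollName <- vapply(rd, function(chart) chart$slug, character(1))

# Narrow the list with a pattern instead of reading every name
gopCharts <- grep("gop", pollName, value = TRUE)
print(gopCharts)
```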

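library(reshape2) # melt() comes from reshape2 (or the older reshape); not loaded in the original listing
# extract data to a data.frame (pollstR() must already have been sourced)
polls <- pollstR(chart="2012-national-gop-primary",pages="all")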
# look at the structure
colnames(polls) # 43 columns most of them names of candidates
#[1] "id"         "pollster"   "start.date" "end.date"   "method"     "subpop"     "N"          "Romney"     "Gingrich" ...
# the data needs to be reshaped - for my purpose I just need the end.date and candidates data
polls <- polls[,c(4,8:43)]
polls.melt <- melt(polls,id="end.date")
# set meaningful columns
colnames(polls.melt) <- c("pollDate","candidate","pc")
 
# get a list of candidates that have polled 10% or more at least once
contenders <- ddply(polls.melt,.(candidate),summarize,max=max(pc,na.rm=TRUE) )
contenders <- subset(contenders,max>9)$candidate
 
# eliminate results for undecideds etc.
contenders <- contenders[c(-4,-5,-7,-11,-18)]
 
# I want to plot each poll leader and have their name show on the max value for when they led
polls.melt <- arrange(polls.melt,desc(pc))
# rank candidates within each poll date; after the sort above, order 1 is the leader
polls.melt <- ddply(polls.melt,.(pollDate), transform, order=seq_along(pc))
leaders <- subset(polls.melt,candidate %in% contenders&order==1)
# romney has two pc of 57% so need to hack for a clear graph
leaders[96,3] <- 56
# create highest poll (when leading) for each candidate
# leaders has no max column at this point, so compute it per candidate first
leaders <- ddply(leaders,.(candidate), transform, max=max(pc,na.rm=TRUE))
leaders$best <- ifelse(!is.na(leaders$pc) & leaders$pc==leaders$max,"Y","N")
# now produce graph
q <- ggplot(leaders,aes(as.POSIXct(pollDate),pc))+geom_point(aes(colour=candidate))
q <- q+geom_text(aes(label=candidate,colour=candidate,vjust=-1),size=3,data=leaders[leaders$best=="Y",])
q  <- q+  ggtitle("Leader of GOP polls and Maximum value by Candidate")+ylab("%")+xlab("")+theme_bw()
q
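
The leader-and-best-showing logic can be checked on a few rows before trusting it on the full data. This is a minimal base R sketch on invented figures (the candidates A and B and their percentages are hypothetical), reusing the column names from polls.melt; it is an equivalent of the plyr steps, not the code used for the graph.

```r
# Invented long-format poll data with the same column names as polls.melt
toy <- data.frame(
  pollDate = as.Date(c("2011-06-01", "2011-06-01", "2011-09-01",
                       "2011-09-01", "2011-12-01", "2011-12-01")),
  candidate = c("A", "B", "A", "B", "A", "B"),
  pc = c(30, 25, 20, 35, 28, 40),
  stringsAsFactors = FALSE
)

# Leader of each poll: sort by date, then descending share, and keep the first row per date
toy <- toy[order(toy$pollDate, -toy$pc), ]
toyLeaders <- toy[!duplicated(toy$pollDate), ]

# Flag each candidate's best showing while leading; ave() returns the group max on every row
bestWhileLeading <- ave(toyLeaders$pc, toyLeaders$candidate, FUN = max)
toyLeaders$best <- ifelse(toyLeaders$pc == bestWhileLeading, "Y", "N")
print(toyLeaders)
```

Here A leads once, so that poll is flagged; B leads twice and only the higher figure gets the flag, which is the row that would carry the name label on the graph.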


For the first couple of years, Palin, Huckabee and Romney continued to dominate but when the race commenced for real an amazing eleven participants – even Donald Trump – ended up topping a poll on at least one occasion

It is worthwhile looking at individual candidates’ performance over the final 18 months

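library(scales) # date_breaks() and date_format() come from scales
# keep only the contenders and polls from 2011 onwards
p <- ggplot(subset(polls.melt,candidate %in% contenders & pollDate>as.Date("2010-12-31")),aes(pollDate,pc))
# one smoothed trend line per candidate, with yearly ticks on the date axis
p <- p+ geom_smooth(se=FALSE) +facet_wrap(~candidate) +scale_x_date(breaks = date_breaks("years"),labels = date_format("%Y"))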
p <- p +  ggtitle("Smoothed results of National Polls - GOP Race")+ylab("%")+xlab("")+theme_bw()
p <- p+ theme(strip.text.x = element_text(colour="White", face="bold"),
           strip.background = element_rect( fill="#CB3128"))
p

Once Palin and Huckabee had proved uninspiring, the field narrowed to the cultish Ron Paul, the ‘meh’ candidate, Romney, and a host of short-lived shooting stars

Liverpool’s Transfer Record

A recent article in the Daily Telegraph prompted me to produce a graph of purchases Liverpool have made under their past four managers. Against each player’s name is the percentage of time in Liverpool’s league games that they have spent on the pitch since being acquired.

The data excludes both players whose purchase price was undisclosed, e.g. Sebastian Coates, and those obtained on free transfers (sometimes on high wages), e.g. Joe Cole.

I have split the graphs into two, over and under £10 million. Players still on the team’s books are in black

Since Skrtel was acquired in January 2008, the successes have been few and far between.
As for failures, take your pick.