Craig Bellamy – quite dplyr

This weekend brought a couple of firsts in Cardiff’s winner against Norwich.

After a wretched time at Manchester United, Wilfried Zaha recorded his first Premiership assist, whilst, more interestingly, Craig Bellamy became the first player in history to score for seven different Premier League clubs.

To celebrate, I thought it was worth taking a quick data dip with the new dplyr package for R, a souped-up version of plyr for data.frames.

A main advantage of dplyr is that it is way faster than plyr, but it also offers the option to chain operations using %.%. This encourages the good discipline of planning logically ahead of coding, something I am not naturally inclined to, and should make the code more readable.
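To show what chaining buys you, here is a minimal sketch of the same pipeline written as nested calls and as a chain, using R's built-in mtcars data rather than the football data. It uses %>%, the operator that replaced %.% in later dplyr releases, so it runs on current versions.

```r
library(dplyr)

# Nested calls: must be read from the inside out
nested <- arrange(
  summarise(group_by(filter(mtcars, hp > 100), cyl), mean_mpg = mean(mpg)),
  desc(mean_mpg)
)

# Chained calls: the same steps, read top to bottom, one per line
chained <- mtcars %>%
  filter(hp > 100) %>%            # keep the more powerful cars
  group_by(cyl) %>%               # one group per cylinder count
  summarise(mean_mpg = mean(mpg)) %>%
  arrange(desc(mean_mpg))         # most economical group first

all.equal(nested, chained)
```

Both versions do exactly the same work; the chained one simply lists the steps in the order you planned them.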

I have loaded into R a largish (270,000-row) data.frame, playerGames, of players’ appearances in the English Premier League.

My target is a graph showing, for each of the players who have scored for the most different clubs, how many games it took them to score their first goal for each of those teams.

The process uses several of the dplyr functions. Firstly, I want to tidy up the data, reduce it to the variables of interest and then add some required columns. I then want to find out who these itinerant players are and ascertain when they got off the mark with each club. Finally, I will knock out a ggplot.

# load packages - make sure plyr is not loaded after dplyr, as it masks
# several dplyr functions
# (written for early dplyr, where %.% was the chaining operator; later
# releases replaced it with %>%)
library(dplyr)
library(ggplot2)
library(scales)

# convert the data.frame to a tbl_df, a wrapper around a data frame
# that won't accidentally print a lot of data to the screen
playerGames_df <- tbl_df(playerGames)

# start the munging
allGames <- playerGames_df %.%

# drop own goals and players who neither started nor came on as a sub
filter(playerID!="OWNGOAL"&(START+subOn)>0) %.%
# set to required columns
select(playerID,teamID,goals,gameDate) %.%
 
# sort on game date
arrange(gameDate) %.%
 
# group each player by team
group_by(playerID,teamID) %.%
 
# so that we can set a game order and cumulate goals for each player/team
mutate(
game = seq_along(goals),
cumGoals = cumsum(goals)
)

# example row
tail(allGames,1)
#        playerID teamID goals   gameDate game cumGoals
# 222249    OSCAR    CHL     0 2014-02-03   56       10
 
# now we need to find these players
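topPlayers <- (allGames %.%
# keep the player/team combinations where the player has scored,
# reduced to one row per combination
filter(cumGoals>0) %.%
group_by(playerID,teamID) %.%
summarise(games=n()) %.%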
 
# and count the number of clubs per player
group_by(playerID) %.%
summarise(teams=n()) %.%
 
# now just keep Bellamy (seven clubs) and the others who scored for six
filter(teams==max(teams)|teams==max(teams)-1))$playerID
 
topPlayers
#[1] "BARMBYN" "COLEA1" "BENTD" "BELLAMC" "KEANER2" "CROUCHP" "ANELKAN" "FERDINL"
 
# now for these players calculate the debut goal data
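firstGoal <- allGames %.%
# keep these players' games from their first goal for each club onwards
filter(playerID %in% topPlayers & cumGoals>0) %.%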
 
# and then select first row for each player/club
group_by(playerID,teamID) %.%
summarise(first=min(game))
 
head(firstGoal,1)
#  playerID teamID first
#1    BENTD    ASV     1

 

At this point, my computer, WordPress and the coding wrapper decided to screw up. The rest of the code just replaces playerID with real names and uses ggplot to create a chart.
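As that part of the listing was lost, here is a minimal sketch of what the charting step might look like; it is not the original code. It assumes firstGoal has the playerID, teamID and first columns built above, plus a hypothetical lookup table, playerNames, with playerID and name columns. plot_first_goals is likewise a hypothetical name, and the sketch uses the current %>% pipe.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sketch: one row per player, one point per club, placed at
# the appearance number in which the player first scored for that club.
# 'playerNames' (playerID, name) is an assumed lookup table.
plot_first_goals <- function(firstGoal, playerNames) {
  plotData <- firstGoal %>%
    ungroup() %>%
    left_join(playerNames, by = "playerID")   # swap IDs for real names

  ggplot(plotData, aes(x = first, y = name)) +
    geom_point(size = 3, colour = "grey60") +
    geom_text(aes(label = teamID), size = 3, vjust = -1) +  # club labels
    labs(x = "Appearance in which first goal for club was scored",
         y = NULL,
         title = "Games taken to get off the mark at each club")
}
```

Calling plot_first_goals(firstGoal, playerNames) returns the ggplot object, which prints as the chart.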


A few football points to note:

  • Bellamy took 13 appearances to score his first Premiership goal for Cardiff, although he had scored plenty for them in the division below. This is the longest wait, due in part to many substitute appearances, playing in a weak team and his age
  • Darren Bent scored on his debut on four occasions. Anelka never got off the mark for a club before game 4
  • Out of roughly 4,000 players who have appeared in the Premiership, both players with the surname Bent figure. One of the two A Coles and one of the two R Keanes also appear in the list of nine
  • Liverpool and Tottenham figure most, each with five of these players. Crouch, Keane and Bellamy have each played for both clubs
  • All five Spurs players scored within their first four appearances. By contrast, none of the Liverpool five got off the mark before game 7 (Bellamy), with all the others in the 10-12 range
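Tallies like the Liverpool and Tottenham ones above can be read straight off the firstGoal table. A minimal sketch, assuming firstGoal has the playerID, teamID and first columns shown earlier; club_summary is a hypothetical name and the code uses the current %>% pipe.

```r
library(dplyr)

# For each club: how many of the top players scored for it, and the
# fastest and slowest first goals (in appearances) among them
club_summary <- function(firstGoal) {
  firstGoal %>%
    ungroup() %>%
    group_by(teamID) %>%
    summarise(players = n(),
              fastest = min(first),
              slowest = max(first)) %>%
    arrange(desc(players))          # most-visited clubs first
}
```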

Shiny App for Polling Forums

In 2010, Crystal Palace FC were in administration and had 10 points deducted during the year. They only survived in the Championship on the last day of the season.

A year ago, they started the league with three consecutive losses and were relegation favourites.

Fast forward 12 months and they are again strong tips for the drop, but this time from the Premier League. Last weekend, a player who was playing non-league football not so long ago bagged the winner for their first three points of the new season.

It has been quite a ride for their supporters, including myself, and has led to a significant increase in volume on the forums of the supporters’ website, holmesdale.net.

As a Shiny developer, I thought it would be useful to have a web app which regularly polls the forums, thus saving on many link clicks to find the latest coverage – which admittedly is the usual mixture of interesting comment, lame banter and swear words.

For me, the most interesting aspects as far as development is concerned are:

  • Application of ‘invalidateLater’, which re-runs the function that scrapes the web pages on a prescribed time basis, e.g. every 30 minutes, alterable by the user
  • Implementation of the progress bar, a new feature of the shiny-incubator package
  • Use of the dataTables component of the rCharts package, which allows sorting and searches
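The polling pattern in the first bullet can be sketched as follows. This is a minimal sketch, not the app's code: scrape_forum() is a hypothetical stand-in for the real scraper, the progress bar uses withProgress()/incProgress(), which later moved from shiny-incubator into shiny itself, and a plain tableOutput() stands in for the rCharts dataTables component.

```r
library(shiny)

# Hypothetical stand-in for the real scraper (the app's scraping code is
# not shown): returns a one-row table stamped with the polling time
scrape_forum <- function() {
  data.frame(polled = format(Sys.time(), "%H:%M:%S"),
             note = "replace with real forum scraping",
             stringsAsFactors = FALSE)
}

ui <- fluidPage(
  titlePanel("Forum poller (sketch)"),
  numericInput("mins", "Refresh every (minutes)", value = 30, min = 1),
  tableOutput("latest")
)

server <- function(input, output, session) {
  forumData <- reactive({
    req(input$mins)
    # schedule this reactive to re-run after the user-chosen interval (ms)
    invalidateLater(input$mins * 60 * 1000, session)
    withProgress(message = "Polling forums", value = 0, {
      incProgress(0.5, detail = "fetching")
      res <- scrape_forum()
      incProgress(0.5, detail = "done")
      res
    })
  })
  output$latest <- renderTable(forumData())
}

# shinyApp(ui, server)  # uncomment to launch the app locally
```

Because invalidateLater() is called inside the reactive, changing the interval input also reschedules the next poll.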

I will probably add a couple of extra features, but a working version can be found here.

I would definitely be interested in doing similar work for items of more general use, so any suggestions are very welcome.

More goodies from rCharts

The guys developing rCharts continue to release enhancements by the day, and I have taken advantage of them to update a couple of Shiny apps.


The CRAN download app now sports the new exporter feature, so that any chart a user comes up with can be saved as an SVG vector, PNG or JPEG image, or as a PDF document. In addition, there is now support for the DataTables JavaScript library, so I have taken the opportunity to revamp the tables in the app. The Top 10 sections now page through all packages/countries (so they take a little longer to load) and also have a useful filter option.

The Wikipedia Search app was initially based on ggplot, but I have now added a Highchart from the rCharts library. This has the benefit of providing tooltips and a zoom facility as well as the above-mentioned exporter option. Most importantly, click events are now also available in the app: for any point, the user can click to obtain either the subject’s Wikipedia page or the Google search results for news of the subject on that particular day. So, taking our favourite subject, Justin Bieber, one can see that a particular high point was 1 March 2011. Clicking on the point shows that this was his 17th birthday, when the romance with Selena Gomez was at its most intense. Some of his other peaks are for less salubrious events. It is pretty clear that this could prove a useful tool for journalistic research covering the past five years.
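The click-through links boil down to building a URL from the clicked point's subject and date. A hedged sketch: wiki_url() and news_url() are hypothetical helpers, not the app's code, and the Google query parameters (tbm=nws for news, tbs=cdr:1,cd_min:…,cd_max:… for a date range) reflect Google's commonly seen URL format rather than a documented API, so treat them as an assumption.

```r
# Hypothetical helper: English Wikipedia article URL for a subject
# (spaces become underscores, as in Wikipedia page titles)
wiki_url <- function(subject) {
  paste0("https://en.wikipedia.org/wiki/",
         URLencode(gsub(" ", "_", subject, fixed = TRUE)))
}

# Hypothetical helper: Google News search restricted to a single day.
# ASSUMPTION: the tbm/tbs parameter format is not an official API.
news_url <- function(subject, day) {
  d <- format(as.Date(day), "%m/%d/%Y")   # Google expects month/day/year
  paste0("https://www.google.com/search?tbm=nws&q=",
         URLencode(subject, reserved = TRUE),
         "&tbs=cdr:1,cd_min:", d, ",cd_max:", d)
}

wiki_url("Justin Bieber")
news_url("Justin Bieber", "2011-03-01")
```

In a Highcharts click handler, URLs like these would typically be precomputed per point and opened in a new window.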

Hope you enjoy them. There is plenty more under development.

Not only CRAN downloads and Shiny … but also … rCharts

I have been meaning for some time to get stuck into the rCharts package, which provides an interface to many JavaScript graphics libraries. These offer rich charting capabilities with interactivity and a great deal of customization.


As regular readers will know, I am also interested in improved publicity for CRAN packages, although the Shiny app I developed needs a bit of attention!

So I was delighted when RStudio decided to release daily logs of downloads from their CRAN mirror. It seems I am not alone in this, as there have already been several blog posts, including a Top 100 of 2013 with some nice graphs from Tal Galili and maps from James Cheshire and Ramnath Vaidyanathan. Of course, this is only one of about 90 CRAN mirrors, so there is no way to be sure it reflects total usage, but the usual suspects top the charts.

As it happens, Ramnath is the prime mover behind rCharts, and he and fellow contributor Thomas Reinholdsson have held my hand in the development of a Shiny CRAN Downloads app. The charts included are based on the sophisticated Highcharts library, which offers elegant presentation, zoom capability and interactivity, including customizable tooltips and click events, which I have used to link to the relevant package PDF.

Users can select one or more packages or countries and get a chart showing activity in terms of total downloads, rank or percent. In addition, there are tables showing both the Top Tens by week since November 2012 and record achievements in terms of ranking and downloads. Shiny, for example, achieved its highest ranking position of 70 last week, as it topped 1,000 downloads from this mirror for the first time.
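As a rough illustration of the downloads/rank/percent measures, here is a minimal dplyr sketch; it is not the app's code, which is in the gist. It assumes a log data frame with at least date and package columns, as in RStudio's daily cran-logs CSV files. weekly_stats is a hypothetical name, and the sketch uses the current %>% pipe.

```r
library(dplyr)

# Weekly downloads per package, with each package's rank and percentage
# share of that week's downloads. Assumes 'logs' has 'date' and 'package'.
weekly_stats <- function(logs) {
  logs %>%
    mutate(week = as.Date(cut(as.Date(date), "week"))) %>%  # Monday-based weeks
    count(week, package, name = "downloads") %>%
    group_by(week) %>%
    mutate(rank = min_rank(desc(downloads)),             # 1 = most downloaded
           percent = 100 * downloads / sum(downloads)) %>%
    ungroup() %>%
    arrange(week, rank)
}

# logs <- read.csv("2013-06-03.csv.gz")  # hypothetical local copy of a daily log
# weekly_stats(logs)
```

Filtering the result for a single package then gives the series behind a total, rank or percent chart.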

The code, with a limited set of the data, is available as a gist. Alternatively, load shiny and run runGist("5832902") to view it.

So give it a whirl and let me know what you think.

I’m planning to keep it updated each Monday.