Over the past few days, I have been introduced to a few new-to-me R packages via comments from the Shiny guys and the R-bloggers site. This seems a rather haphazard way of acquiring knowledge, and I cannot be alone in thinking that it is not the most productive way to become aware of new or better packages. Word of mouth and occasional recommendations on sites like StackOverflow are all well and good, but may be insufficient given the multitude of packages and the growing user base.
The CRAN site does a useful job with its summaries by Task View, but several of the packages I use most, e.g. ‘plyr’, ‘stringr’ and ‘XML’, appear to be absent. Perhaps a new view or two devoted to data manipulation and web scraping would help.
It would be great to have a site where experts could review both new packages and revisions of the more popular established ones. Crantastic does allow users to review and rate packages, but the response is pretty low. It also allows people to list the packages they use. This is potentially a good way of showing the most popular, and presumably most useful, packages, but again the total barely averages one entry per CRAN package. Still, it is better than nothing, and I suggest more participation there would be helpful pending any data on actual CRAN downloads by package.
As mentioned in my last post, I have been working on some CRAN stats and decided to produce a Shiny app which may help a trifle in this area. It will be regularly updated and summarizes all packages, by Topic group, as well as providing the detailed description and other details, such as the revision timeline, for each individual package. Nothing that new, but it brings the info together in a different format. Please note that it takes a while to load. I hope some of you find it useful.
I have had some work on CRAN stats on the back-burner, but the recent article “How R Grows” tempted me to push it up the list.
In the interim, I have a couple of comments on Joseph Rickert's article. Although the body of the article refers to packages either created or updated in a time period, the actual graph, headlined “Packages submitted by Year”, shows a dramatic increase in numbers during 2012, with 2013 apparently poised for a further 50% increase. There are a couple of problems here. Firstly, he uses the date of the latest revision as the date of submission. A package might have been revised 12 times in 2011 and only once in 2012, but it would only show up in the latter year's data. From my analysis – which thankfully looks very similar for this graph – I can reproduce the image but with a different title.
The 2012 figure will fall as the year progresses, since any package currently counted under 2012 moves into the 2013 total as soon as it is revised again.
Although I came across a couple of glitches, the archive files for each package give a date of first publication. Here are the data on initial releases by year.
This shows a smoother upward trend. With a third of the year gone, it looks as though the number of new R packages will be broadly similar to last year's.
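To make the difference between the two ways of counting concrete, here is a minimal base R sketch. The package names and dates are made up for illustration and are not the actual CRAN data; the point is only that the same release history gives different yearly totals depending on whether a package is dated by its latest revision or by its first release.

```r
# Hypothetical release history: one row per published version (made-up data).
releases <- data.frame(
  package = c("pkgA", "pkgA", "pkgA", "pkgB", "pkgB", "pkgC"),
  date = as.Date(c("2010-03-01", "2011-06-15", "2012-01-20",
                   "2011-02-10", "2011-09-05", "2012-07-30")),
  stringsAsFactors = FALSE
)
releases$year <- as.integer(format(releases$date, "%Y"))

# Year of each package's first release and of its latest revision.
first_year  <- tapply(releases$year, releases$package, min)
latest_year <- tapply(releases$year, releases$package, max)

# Number of packages per year under each definition.
by_first  <- table(first_year)   # 2010: 1, 2011: 1, 2012: 1
by_latest <- table(latest_year)  # 2011: 1, 2012: 2

print(by_first)
print(by_latest)
```

In this toy example pkgA first appeared in 2010 but is counted under 2012 when dated by latest revision, which is how the most recent years come to look inflated.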
Finally, here is a boxplot of the number of revisions a package has undergone by year.
Unsurprisingly, it is not a normal distribution, with lots of outliers – the Matrix package leads the way with 166 revisions since its introduction in 2000 – and a general tendency for the earlier packages to average more updates.
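A summary of this kind can be sketched in base R as follows. The data are again invented, and I am assuming here that each published version counts as one revision and that packages are grouped by the year of their first release; the real analysis may have counted differently.

```r
# Hypothetical data: one row per published version of a package.
versions <- data.frame(
  package = c("pkgA", "pkgA", "pkgA", "pkgA", "pkgB", "pkgB",
              "pkgC", "pkgD", "pkgD", "pkgD"),
  year    = c(2009, 2010, 2011, 2012, 2011, 2012,
              2012, 2009, 2009, 2010)
)

# One row per package: number of versions and year of first release.
# table() and tapply() both order by package name, so the columns line up.
per_pkg <- data.frame(
  revisions  = as.vector(table(versions$package)),
  first_year = as.vector(tapply(versions$year, versions$package, min))
)

# Median number of revisions by year of first release.
medians <- tapply(per_pkg$revisions, per_pkg$first_year, median)
print(medians)

# Boxplot statistics by year; drop plot = FALSE to draw the chart.
stats <- boxplot(revisions ~ first_year, data = per_pkg, plot = FALSE)
```

Even in this tiny example the packages first released in the earliest year have the most revisions, simply because they have had longer to accumulate them.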
A few days ago there was an interesting R-based article by diffuseprior on the decline and fall in the quality of The Simpsons.
The author scraped results from GEOS, an online survey of TV programs, and applied the R package changepoint to offer an analysis of the show over time.
The analysis seemed a natural candidate for a Shiny App, as there are another gross of shows on GEOS. ‘24’ follows a similar pattern to ‘The Simpsons’, although this well-defined decline is by no means universal.
Although using this app multiplies the quantity of charts available, its automation precludes some of the data munging done in the original post that is difficult to automate, e.g. excluding specials. This will cause some distortion.
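For anyone unfamiliar with the idea behind changepoint analysis, here is a simplified base R sketch. It is not the changepoint package's algorithm and not the code from the original post (the package provides functions such as cpt.mean() for this job). It just looks for the single split that best separates a series into two segments with different means, and the ratings are made up.

```r
# Find the single split point that minimises the total within-segment
# sum of squares, i.e. the best place for one shift in the mean.
single_changepoint <- function(x) {
  n <- length(x)
  sse <- function(v) sum((v - mean(v))^2)
  # Cost of splitting after position k, for k = 1 .. n - 1.
  cost <- vapply(seq_len(n - 1),
                 function(k) sse(x[1:k]) + sse(x[(k + 1):n]),
                 numeric(1))
  which.min(cost)
}

# Hypothetical episode ratings: a high run followed by a lower run.
ratings <- c(8.9, 9.1, 9.0, 8.8, 9.2, 7.1, 6.9, 7.3, 7.0, 6.8)

k <- single_changepoint(ratings)
print(k)                               # last episode of the first segment
print(mean(ratings[1:k]))              # mean rating before the change
print(mean(ratings[(k + 1):length(ratings)]))  # mean rating after the change
```

With these numbers the split falls after the fifth episode. A real analysis would need to handle multiple changepoints and decide whether a shift is large enough to count, which is what a dedicated package is for.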
I have adapted diffuseprior’s code in a few respects:
- I used the readHTMLTable function from the XML package, as the data is held in tabular form
- I used ggplot for the graph rather than base plot. This took a bit more work but enabled me to display visually the relative number of voters for each episode of a show
- It is now available as a Shiny App, covering 145 shows, with any new GEOS votes incorporated in real time
The code is available as gist 5498431.
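As a rough idea of the table-reading step, here is a small self-contained sketch using readHTMLTable on an inline HTML string. The table contents and column names are invented stand-ins, not the actual GEOS page layout, and this is not the code from the gist.

```r
library(XML)

# A small inline HTML table standing in for a scraped page (made-up values).
html <- '<html><body><table>
  <tr><th>Episode</th><th>Rating</th><th>Votes</th></tr>
  <tr><td>1.01</td><td>8.2</td><td>120</td></tr>
  <tr><td>1.02</td><td>7.9</td><td>95</td></tr>
</table></body></html>'

# Parse the string as HTML and pull out every table as a data frame.
doc <- htmlParse(html, asText = TRUE)
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# Take the first table and convert the text columns to numbers.
episodes <- tables[[1]]
episodes$Rating <- as.numeric(episodes$Rating)
episodes$Votes  <- as.integer(episodes$Votes)

print(episodes)
```

Against a live page, the same two calls would be pointed at the downloaded HTML instead of the inline string, and the column clean-up would depend on what the page actually contains.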
British journalists have been waiting a dozen years to be able to use the above pun, and there have recently been a number of comments on Rooney’s perceived decline.
A recent column by Martin Samuel in the Daily Mail quotes several random stats disputing this but then goes on to argue that:
“Yet by all purely intuitive reckonings, something is not right.”
That may well be so, but let us look a bit more closely at the data. Here is a graph showing the creative contribution, in terms of goals and assists, that Rooney has made by EPL game.
The first thing to note is how consistent he has been. Since he turned 20 he has pretty much averaged a point (goal + assist) per game each season and has adapted to being either more of a provider – to the likes of Ronaldo or van Persie – or a striker, e.g. 27 goals last year.
This season, although his appearances are down, has been no different, with 23 points in 23 games. And there is precious little suggestion from the figures that he was much better in the earlier going this year. He has snatched 10 goals in his past 13 appearances (in one of which he was subbed after 8 minutes), including the only goals in two 1-0 victories. And this at a time when van Persie has struggled to find the net.
So, although the argument fits the narrative that the aggressive character is burnt out after starting so young, I’m not buying it yet.
It seems to me that this is, to some extent, journalists supporting the Man U party line. The club is still irritated at being out-smarted by Rooney and his agent on his last contract and is looking to a lower pay-packet if the reports of his demands for a new one are true.
Not the start Blue Jays fans were looking for. With a 6-9 record at the time of publication they already trail Boston (led by reviled ex-BJ manager, John Farrell) by 4.5 games.
The batting has been particularly anemic, but the pitching – particularly the starting rotation – has also been a concern.
I have whipped up a Shiny App comparing Salary and WAR (Wins Above Replacement) for each MLB team over the past couple of years. I will extend the coverage in due course.
Here are the BJ results to date.
Josh Johnson ($13.75m) and Mark Buehrle ($11m) have the highest salaries.