Monthly Archives: May 2013

Shiny App for CRAN packages

Over the past few days, I have been introduced to a few new-to-me R packages, via some comments from the Shiny guys and the R-bloggers site. This seems a rather haphazard way of acquiring knowledge, and I cannot be alone in thinking that it is not the most productive way to become aware of new/better packages. Word of mouth and occasional recommendations on sites like StackOverflow are all well and good, but may be insufficient given the multitude of packages and the growing user base.

The CRAN site does a useful job with its summaries by Task View, but several of the packages I use the most, e.g. ‘plyr’, ‘stringr’, and ‘XML’, appear to be absent. Perhaps a new topic or two devoted to data manipulation and web scraping would help.

It would be great to have a site where experts could review both new packages and revisions of the more popular established ones. Crantastic does allow users to review and rate packages, but the response is pretty low. It also allows people to list the packages they use. This is potentially a good way of showing the most popular, and presumably most useful, packages, but again the total barely averages 1 per CRAN package. Still, it is better than nothing, and I suggest more participation there would be helpful pending any data on actual CRAN downloads by package.

As mentioned in my last post, I have been working on some CRAN stats and decided to produce a Shiny app which may help a trifle in this area. It will be regularly updated and summarizes all packages by Topic group, as well as providing the detailed description and other details, such as the revision timeline, for each individual package. Nothing that new, but it brings info together in a different format. Please note that it takes a while to load. I hope some of you find it useful.

How R Grows – not so fast

I have had some work on CRAN stats on the back-burner, but the recent article “How R Grows” tempted me to push it up the list.

In the interim, I have a couple of comments on Joseph Rickert's article. Although the body of the article refers to packages either created or updated in a time period, the actual graph, headlined “Packages submitted by Year”, shows a dramatic increase in numbers during 2012, with 2013 apparently poised for a further 50% increase. There are a couple of problems here. Firstly, he uses the date of the latest revision as the date of submission. A package might have been revised 12 times in 2011 and only once in 2012, but it would only show up in the latter year's data. From my analysis, which thankfully looks very similar for this graph, I can reproduce the image but with a different title.
[Figure: cranPub1]

The 2012 figure will fall as 2013 progresses, since any package last revised in 2012 moves into the 2013 count as soon as it is updated again.

Although I came across a couple of glitches, the archive files for each package give a date of first publication. Here are the data on initial releases by year.
[Figure: cranPub2]

This shows a smoother upward trend. With a third of the year gone, it looks as though the number of new R packages this year will be broadly similar to last year's.
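To make the difference between the two ways of counting concrete, here is a toy sketch. It is written in Python rather than R, and it is not the code behind the charts above: the package names and revision years are invented purely for illustration.

```python
from collections import Counter

# Hypothetical revision histories: package -> years in which a version appeared.
# These names and years are made up; they are not CRAN data.
revisions = {
    "pkgA": [2011] * 12 + [2012],  # revised 12 times in 2011, once in 2012
    "pkgB": [2010, 2012, 2013],
    "pkgC": [2012],
    "pkgD": [2009, 2011],
}

def count_by_latest(history):
    """Count packages by the year of their most recent revision."""
    return Counter(max(years) for years in history.values())

def count_by_first(history):
    """Count packages by the year of their first publication."""
    return Counter(min(years) for years in history.values())

latest = count_by_latest(revisions)
first = count_by_first(revisions)
print("by latest revision:", sorted(latest.items()))
print("by first release:  ", sorted(first.items()))

# One more revision of pkgA in 2013 moves it out of the 2012 count...
revisions["pkgA"].append(2013)
latest_after = count_by_latest(revisions)
# ...while the first-publication counts are unaffected.
first_after = count_by_first(revisions)
print("by latest revision, after update:", sorted(latest_after.items()))
```

In this toy data, pkgA's twelve revisions in 2011 never register under the latest-revision count, and a single new revision in 2013 lowers the 2012 total. The first-publication counts stay fixed however often a package is updated.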

Finally, here is a boxplot of the number of revisions a package has undergone by year.

[Figure: cranPub3]
Unsurprisingly, the distribution is not normal and has lots of outliers (the Matrix package leads the way with 166 revisions since its introduction in 2000), and there is a general tendency for the earlier packages to average more updates.

TV shows rated by episode as a Shiny App

A few days ago there was an interesting R-based article by diffuseprior on the decline and fall in the quality of The Simpsons.

The author scraped results from GEOS, an online survey of TV programs, and applied the R package changepoint to offer an analysis of the show over time.
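For readers unfamiliar with the idea, a change point is where the behaviour of a series shifts, for instance where the average episode rating drops. The sketch below is a deliberately simple brute-force search for a single change in mean, written in Python. It is not the method used by the changepoint package, which offers considerably more sophisticated approaches, and the ratings are invented rather than taken from GEOS.

```python
def single_mean_changepoint(values):
    """Return the split index k (1 <= k < n) that minimises the total squared
    deviation of values[:k] and values[k:] from their own segment means."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")

    def sse(segment):
        # Sum of squared deviations from the segment mean.
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_k, best_cost = None, float("inf")
    for k in range(1, n):
        cost = sse(values[:k]) + sse(values[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Made-up episode ratings: steady around 8, then steady around 6.
ratings = [8.1, 7.9, 8.3, 8.0, 8.2, 6.1, 5.9, 6.2, 6.0, 5.8]
split = single_mean_changepoint(ratings)
print("ratings shift after episode", split)
```

A search like this always returns some split, even for a series with no real shift, which is one reason a proper analysis relies on penalised methods rather than a bare minimum.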

This seemed a candidate for a Shiny App, as there are another gross of shows on GEOS. ‘24’ follows a similar pattern to ‘The Simpsons’, although this well-defined decline is by no means universal.

[Figure: 24]

Although using this app multiplies the quantity of charts available, its automation precludes some of the difficult-to-accomplish data munging done in the original post, e.g. excluding specials. This will cause some distortion.

I have adapted diffuseprior’s code in a few respects:

  1. I used the readHTMLTable function from the XML package, as the data is contained in tabular form
  2. I used ggplot for the graph rather than base plot. This took a bit more work but enabled me to display visually the relative number of voters for each episode of a show
  3. It is now available as a Shiny App, covering 145 shows, with any new GEOS votes incorporated in real time
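The first point, reading rows straight out of an HTML table, can be illustrated with a short sketch. It uses Python's standard-library HTML parser rather than readHTMLTable, and the markup and column names are invented for the example. The real GEOS page layout is not reproduced here.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in the document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

# Invented markup standing in for a page of episode ratings.
html = """
<table>
  <tr><th>Episode</th><th>Rating</th><th>Votes</th></tr>
  <tr><td>1.01</td><td>8.2</td><td>120</td></tr>
  <tr><td>1.02</td><td>7.9</td><td>95</td></tr>
</table>
"""

parser = TableRows()
parser.feed(html)
header, *body = parser.rows
episodes = [dict(zip(header, row)) for row in body]
print(episodes)
```

Every value comes back as text, so ratings and vote counts would still need converting to numbers before plotting.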

The code is available as gist 5498431.