How R Grows – not so fast

I have had some work on CRAN stats on the back burner, but the recent article How R Grows tempted me to push it up the list.

In the interim, I have a couple of comments on Joseph Rickert's article. Although the body of the article refers to packages either created or updated in a time period, the actual graph, headlined “Packages submitted by Year”, shows a dramatic increase in numbers during 2012, with 2013 apparently poised for a further 50% increase. There are a couple of problems here. Firstly, he uses the date of the latest revision as the date of submission. A package might have been revised 12 times in 2011 and only once in 2012, but it would only show up in the latter year's data. From my analysis, which thankfully looks very similar for this graph, I can reproduce the image but with a different title:
cranPub1

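The 2012 figure will fall as the year progresses, because packages last revised in 2012 drop out of that year's count as soon as they are updated again.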

Although I came across a couple of glitches, the archive files for each package give a date of first publication. Here are the data on initial releases by year:
cranPub2

This shows a smoother upward trend. With a third of the year gone, the number of new R packages looks set to be broadly similar to last year's.
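To make the difference between the two counting methods concrete, here is a minimal Python sketch. The package names and dates are invented for illustration and are not taken from CRAN; the only point is that picking the latest release date per package pushes counts towards recent years, while picking the first release date does not.

```python
from collections import Counter
from datetime import date

# Hypothetical release histories: package name -> dates of every published version.
releases = {
    "pkgA": [date(2010, 3, 1), date(2011, 6, 1), date(2012, 1, 15)],
    "pkgB": [date(2011, 2, 1), date(2011, 9, 1)],
    "pkgC": [date(2012, 5, 1)],
}


def count_by_year(releases, pick):
    """Count packages per year, using `pick` (min or max) to choose one date per package."""
    counts = Counter(pick(dates).year for dates in releases.values())
    return dict(sorted(counts.items()))


by_latest = count_by_year(releases, max)  # year of most recent revision
by_first = count_by_year(releases, min)   # year of first publication

print(by_latest)  # {2011: 1, 2012: 2}
print(by_first)   # {2010: 1, 2011: 1, 2012: 1}
```

In this toy data pkgA was first published in 2010, yet it is counted under 2012 when the latest revision is used, which is the effect that inflates the most recent years in the original graph.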

Finally, here is a boxplot of the number of revisions a package has undergone by year.

cranPub3
Unsurprisingly, it is not a normal distribution: there are lots of outliers (the Matrix package leads the way with 166 revisions since its introduction in 2000) and a general tendency for the earlier packages to average more updates.
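A rough sketch of the grouping behind a plot like this, in Python with made-up numbers. I am assuming here that "by year" means the year a package was first released; the records below are hypothetical, not real CRAN figures. The median is used rather than the mean because a few heavily revised packages would otherwise dominate.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (package, first-release year, number of revisions) records.
packages = [
    ("pkgA", 2000, 40),
    ("pkgB", 2000, 12),
    ("pkgC", 2005, 9),
    ("pkgD", 2005, 3),
    ("pkgE", 2005, 30),
    ("pkgF", 2011, 2),
]

# Collect the revision counts of all packages introduced in the same year.
revisions_by_year = defaultdict(list)
for _name, first_year, n_revisions in packages:
    revisions_by_year[first_year].append(n_revisions)

# One summary value per year; a boxplot would draw the full list for each year.
median_revisions = {
    year: median(counts) for year, counts in sorted(revisions_by_year.items())
}
```

Each list in `revisions_by_year` corresponds to one box in the plot, so older years with more time to accumulate updates would be expected to sit higher.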
