Category Archives: Socio-Economic

Tables and graphs on subjects that catch my fancy. Some might migrate to dashboards and/or apps.
Suggestions are welcome but will only be acted on if they interest me and are within my powers to implement.

TV shows rated by episode as a Shiny App

A few days ago there was an interesting R-based article by diffuseprior on the decline and fall in the quality of The Simpsons.

The author scraped results from GEOS, an online survey of TV programs, and applied the R package changepoint to analyse how the show's ratings have shifted over time.
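For anyone unfamiliar with changepoint, the heart of that analysis amounts to only a few lines. Here is a rough sketch on simulated ratings (the original post obviously uses the scraped GEOS data rather than this made-up vector):

library(changepoint)

# simulated episode ratings: mean quality drops after episode 150
set.seed(42)
ratings <- c(rnorm(150, mean = 7.8, sd = 0.4), rnorm(100, mean = 6.9, sd = 0.4))

cp <- cpt.mean(ratings, method = "PELT")  # detect shifts in the mean rating
cpts(cp)                                  # estimated change point location(s)
plot(cp)                                  # ratings with fitted segment means overlaid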

This seemed a good candidate for a Shiny app, as there is another gross of shows on GEOS. 24 follows a similar pattern to ‘The Simpsons’, although such a well-defined decline is by no means universal.

24

Although using this app multiplies the quantity of charts available, its automation precludes some of the difficult-to-accomplish data munging done in the original post, e.g. excluding specials. This will cause some distortion.

I have adapted diffuseprior’s code in a few respects:

  1. I used readHTMLTable from the XML package, as the data is contained in tabular form (a stripped-down sketch follows below)
  2. I used ggplot2 for the graph rather than base plot. This took a bit more work but enabled me to display visually the relative number of voters for each episode of a show
  3. It is now available as a Shiny app, covering 145 shows, with any new GEOS votes incorporated in real time
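For the first of those points, the scraping and plotting steps boil down to something like the following. This is only a sketch: the URL is a placeholder for the relevant GEOS show page, and the Mean/Count column names assume a little cleaning of the raw table rather than being lifted from the app code.

library(XML)
library(ggplot2)

# placeholder: point this at the GEOS page for the show of interest
url <- "http://www.geos.tv/..."
tabs <- readHTMLTable(url, stringsAsFactors = FALSE)
episodes <- tabs[[1]]  # assume the first table holds the episode ratings

# assume columns named Mean (average rating) and Count (number of voters) after cleaning
episodes$Mean  <- as.numeric(episodes$Mean)
episodes$Count <- as.numeric(episodes$Count)

ggplot(episodes, aes(seq_along(Mean), Mean, size = Count)) +
  geom_point(alpha = 0.6) +
  labs(x = "Episode", y = "Average rating", size = "Votes") +
  theme_bw()

In the app itself, the voter count drives the point size, so thinly-voted episodes are visually discounted.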

The code is available as gist 5498431.

Analyzing Local Data with a Shiny Web App

A great recent enhancement to the Shiny package is the ability to upload local files.
Now, in addition to interacting with data provided on the host (e.g. Soccer Tables)
or pulled from the web (e.g. Wikipedia Search Rates), users can use apps
to view and analyse their own data.

I have knocked up an app based on the 09_upload example provided in the Shiny package. It uploads a small .csv spreadsheet file of school pupils’ scores from a local directory, displays the data and does a couple of analyses.

The ui.R enables a user to upload a csv file, setting various parameters (separator etc.). The example file is downloadable. There are then three tabs showing:

  1. The raw data in a gVis Table, which allows sorting and paging
  2. A density graph with the spread of results by term and year
  3. A statistical test to see if there is a difference in marks by gender
#ui.R
shinyUI(pageWithSidebar(
  headerPanel("Uploaded File Analysis"),
 
  sidebarPanel(
    helpText("This app is shows how a user can update a csv file from their own hard drive for instant analysis.
In the default case, it uses standard format school marks that could be used by many teachers
Any file can be uploaded but analysis is only available
if the data is in same format as the sample file, downloadable below
"),
    a("Pupil Marks", href="http://dl.dropbox.com/u/25945599/scores.csv"),
    tags$hr(),
    fileInput('file1', 'Choose CSV File from local drive, adjusting parameters if necessary',
              accept=c('text/csv', 'text/comma-separated-values,text/plain')),
 
    checkboxInput('header', 'Header', TRUE),
    radioButtons('sep', 'Separator',
                 c(Comma=',',
                   Semicolon=';',
                   Tab='\t'),
                 ','),
    radioButtons('quote', 'Quote',
                 c(None='',
                   'Double Quote'='"',
                   'Single Quote'="'"),
                 '"'),
    tags$head(tags$style(type="text/css",
                         "label.radio { display: inline-block; margin: 0 10px 0 0; }",
                         ".radio input[type=\"radio\"] { float: none; }"))
 
  ),
  mainPanel(
    tabsetPanel(
      tabPanel("Pupil Marks",
               h4(textOutput("caption1")),
               checkboxInput(inputId = "pageable", label = "Pageable"),
               conditionalPanel("input.pageable==true",
                                numericInput(inputId = "pagesize",
                                             label = "Pupils per page",value=13,min=1,max=25)),
 
               htmlOutput("raw"),
                value = 1),
      tabPanel("Term Details",
               h4(textOutput("caption2")),
               plotOutput("density"),
               htmlOutput("notes2"),
               value = 2),
      tabPanel("Gender difference",
               h4(textOutput("caption3")),
               plotOutput("genderDensity", height="250px"),
               verbatimTextOutput("sexDiff"),
               htmlOutput("notes3"),
               value = 3),
      id="tabs1")
 
)
))

The server.R takes the uploaded file, does some processing and provides a reactive data set which can be rendered into plots and tables.

# server.R
shinyServer(function(input, output) {

  # reactive expression holding the uploaded csv file
  Data <- reactive({
    inFile <- input$file1
    if (is.null(inFile)) return(NULL)
    read.csv(inFile$datapath, header=input$header, sep=input$sep, quote=input$quote)
  })
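  # Only the opening of server.R appears above. What follows is a hedged sketch of
  # how the three tabs might be rendered from Data(); the column names Gender, Year,
  # Term and Mark are assumptions, not taken from the original gist.

  output$caption1 <- renderText({
    if (is.null(Data())) "Upload a file to begin" else "Pupil Marks"
  })

  # tab 1: sortable/pageable googleVis table of the raw data
  output$raw <- renderGvis({
    if (is.null(Data())) return(NULL)
    opts <- if (input$pageable) list(page="enable", pageSize=input$pagesize) else list()
    gvisTable(Data(), options=opts)
  })

  # tab 2: density of marks by term, faceted by year
  output$density <- renderPlot({
    if (is.null(Data())) return(NULL)
    print(ggplot(Data(), aes(x=Mark, colour=factor(Term))) +
            geom_density() + facet_wrap(~Year) + theme_bw())
  })

  # tab 3: density of marks by gender plus a t-test on the difference
  output$genderDensity <- renderPlot({
    if (is.null(Data())) return(NULL)
    print(ggplot(Data(), aes(x=Mark, colour=Gender)) + geom_density() + theme_bw())
  })

  output$sexDiff <- renderPrint({
    if (is.null(Data())) return(invisible(NULL))
    t.test(Mark ~ Gender, data=Data())
  })
})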

Finally, global.R loads the requisite libraries and houses the script which generated the sample file

#global.R
# load required libraries
library(shiny)
library(plyr)
library(ggplot2)
library(googleVis)
library(reshape2)
 
####creation of example data on local directory for uploading####
 
# #load a list of common first names
# faveNames
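The generation script itself is mostly elided above. For anyone wanting a comparable file to test with, something along these lines would do; the column names (Name, Gender, Year, Term, Mark) and value ranges are my assumptions rather than the exact format of the downloadable sample:

# create a small illustrative scores file (assumed columns, not the exact original format)
set.seed(1)
n <- 30
scores <- expand.grid(Name = paste0("Pupil", 1:n), Year = 1:2, Term = 1:3)
scores$Gender <- rep(sample(c("F", "M"), n, replace = TRUE), times = 6)
scores$Mark <- pmin(100, round(rnorm(nrow(scores), mean = 65, sd = 12)))
write.csv(scores, "scores.csv", row.names = FALSE)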

The app is viewable on glimmer and the code is available as a gist.

Shiny Server – Earthshattering News

As you probably know, I am one of the strongest proponents of the Shiny package for developing interactive web applications

Amongst the latest news from RStudio is that what was planned to be commercial software will now be free and Open Source (AGPLv3 license)

To celebrate this momentous announcement, I have produced an Earthquake app. It was ‘inspired’ by a recent blog post, which was quite interesting for a user with R installed who wants to learn more about the maps package. My app starts to turn that analysis into something that can be used by anyone on the web, providing options to vary the time period, earthquake magnitude and country of origin.
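As a flavour of how little ui code those options require, the selection controls amount to something like the sketch below. The widget names, ranges and country list are my assumptions, not lifted from the actual gist.

# sketch of the user controls (names, ranges and choices are illustrative)
sidebarPanel(
  sliderInput("years", "Time period", min = 1900, max = 2013, value = c(1960, 2013)),
  sliderInput("magnitude", "Minimum magnitude", min = 4, max = 9, value = 6, step = 0.5),
  selectInput("country", "Country of origin", choices = c("All", "Chile", "USSR"))
)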

If Shiny/R had been invented 50 years ago I could have discovered Plate Tectonics!

I stress the “starts to turn it into an app” as this is what I will call a proof-of-concept app. There is so much out there to develop in this medium – and by far more proficient R developers – that I plan to initialize several Shiny apps that others can pick up and run with if they so desire. The gist for this app can be found here

Other than the fact that the selectable countries exclude Japan but include the USSR, there are several potential enhancements that came to me immediately:

  • Extend to shorter time periods as alternatives to years
  • Add a data source covering lower magnitudes so that smaller geographical areas, e.g. counties, can be plotted
  • Assess other plotting packages, e.g. ggplot2
  • Offer the user the option of splitting magnitudes by count rather than equal-size cuts
  • Add more information to the graph and/or a table on the time and position of some/all quakes

Let me know of any developments you make via the comments

On a side note, the Wiki Search Rates app now has an option to download the graph data as a csv file. Extremely useful for further analyses, so one more hat tip to the guys at RStudio.
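For anyone wanting the same facility in their own app, the pattern is a downloadHandler on the server side paired with a downloadButton in the ui. A minimal sketch, with object names that are illustrative rather than taken from the actual app:

# server side: serve the currently plotted data as a csv (graphData() is an assumed reactive)
output$downloadData <- downloadHandler(
  filename = function() paste0("wiki-search-rates-", Sys.Date(), ".csv"),
  content  = function(file) write.csv(graphData(), file, row.names = FALSE)
)
# ui side: downloadButton("downloadData", "Download csv")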

Shiny is the new Cool

Several of you will probably have tried out the new Shiny package brought to the table by the RStudio guys

This is just what I have been looking for and to my mind could provide a quantum leap in the use of R. There have been other packages addressing the need for web user interactivity but this is the first one I have found that makes it relatively easy for the less technically able of us to get something up and running

Up until now, I have used Flex/Flash to allow interactivity but there are several drawbacks including the platform itself and development time. I also wish to take advantage of the statistical processing and graphical opportunities offered by R

Much of my future work will be sports-related but I am kicking off with an app based on work I have presented in a previous post

The app, which graphically compares subjects’ Wikipedia search rates by day, can be accessed via RStudio’s hosted server option. It is currently under development (so there may be problems), and I am hoping it remains financially affordable once it goes fully live!
I have posted the code as a gist and, for those of you with the shiny package installed in R, it can also be run with the shiny::runGist('4171750') command.

Actual run times to produce the graph are pretty acceptable, especially if the number of subjects compared is small and only a limited number of months is surveyed. Any graphs are easily saveable, of course.

It’s mostly fun, of course, and quite addictive. My son took particular pleasure in seeing the lack of interest in any Blue Jays player compared with Derek Jeter.

Where does the star of ‘The Magnificent Seven’ stand in interest compared with his fellow heroes?

And what explains this pattern with Shakespeare?

I am sure there are more useful aspects to be mined. For instance, if a particular subject has to be evaluated, peaks in the timeline will allow easy access to news for that day. And, as I have alluded to before, perhaps the interest in a particular political candidate foreshadows (and could potentially replace) standard opinion polls.

For now, I will leave the last word to John Lennon.

Revisiting the GOP Race with the Huff Post API and pollstR

Well, one election is over but it is never too soon to start another – or in this case revisit the past four years

One day after the 2008 US Presidential election, there was a Rasmussen poll taken of 1000 likely voters asking for their choice for the 2012 Republican Presidential candidate.
The overwhelming favourite was Sarah Palin, who garnered 64% of the preferences, with Huckabee (12%) and Romney (11%) the only others to reach double digits. And thus started arguably the most topsy-turvy race in election history – ending in ultimate defeat.

The guys at the Huffington Post have kindly produced an API for stacks of opinion polls, and Drew Linzer has written an R function, pollstR, available on GitHub, to interact with it.

The first step is to determine which Huffington Post chart contains the required data.

 
library(XML)
library(RJSONIO)   # provides fromJSON, which was not loaded in the original listing
library(ggplot2)
library(plyr)
library(reshape2)  # for melt(), used below
library(scales)    # for date_breaks()/date_format(), used below

# download the full list of available charts from the Pollster API
url <- "http://elections.huffingtonpost.com/pollster/api/charts"
raw.data <- readLines(url, warn=FALSE)
rd <- fromJSON(raw.data)

# collect the slug (chart identifier) of every chart
pollName <- c()
for (i in seq_along(rd)) {
  pollName <- append(pollName, rd[[i]]$slug)
}
print(pollName)

This provides a list of 345 charts, and a quick perusal shows that the required one is named “2012-national-gop-primary”. This can be plugged into the aforementioned function, once it has been sourced, and an analysis of the resulting data performed.
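The sourcing step is nothing more than pulling Drew Linzer’s script into the session before calling it, e.g. (the file name is illustrative; download the script from his GitHub first):

# make the pollstR function available in the session
source("pollstR.R")  # illustrative file name for Drew Linzer's script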

# extract data to a data.frame
polls <- pollstR(chart="2012-national-gop-primary",pages="all")
# look at the structure
colnames(polls) # 43 columns most of them names of candidates
#[1] "id"         "pollster"   "start.date" "end.date"   "method"     "subpop"     "N"          "Romney"     "Gingrich" ...
# the data needs to be reshaped - for my purpose I just need the end.date and candidates data
polls <- polls[,c(4,8:43)]
polls.melt <- melt(polls,id="end.date")
# set meaningful columns
colnames(polls.melt) <- c("pollDate","candidate","pc")
 
# get a list of candidates that have polled 10% or more at least once
contenders <- ddply(polls.melt,.(candidate),summarize,max=max(pc,na.rm=TRUE) )
contenders <- subset(contenders,max>9)$candidate
 
# eliminate results for undecideds etc.
contenders <- contenders[c(-4,-5,-7,-11,-18)]
 
# I want to plot the each poll leader and have their name show on the max value for when they led
polls.melt <- arrange(polls.melt,desc(pc))
polls.melt <- ddply(polls.melt,).(pollDate), transform, order=1:nrow(piece))
leaders <- subset(polls.melt,candidate %in% contenders&order==1)
# romney has two pc of 57% so need to hack for a clear graph
leaders[96,3] <- 56
# create highest poll (when leading) for each candidate
leaders$best <- "N"
for (i in 1:nrow(leaders)) {
if (leaders$pc[i]==leaders$max[i]) {
  leaders$best[i]<-"Y"
}
}
# now produce graph
q <- ggplot(leaders,aes(as.POSIXct(pollDate),pc))+geom_point(aes(colour=candidate))
q <- q+geom_text(aes(label=candidate,colour=candidate,vjust=-1),size=3,data=leaders[leaders$best=="Y",])
q  <- q+  ggtitle("Leader of GOP polls and Maximum value by Candidate")+ylab("%")+xlab("")+theme_bw()
q


For the first couple of years, Palin, Huckabee and Romney continued to dominate, but when the race commenced for real, an amazing eleven participants – even Donald Trump – ended up topping a poll on at least one occasion.

It is worthwhile looking at individual candidates’ performance over the final two years of the race.

p <- ggplot(subset(polls.melt, candidate %in% contenders & as.Date(pollDate) > as.Date("2010-12-31")),
            aes(as.Date(pollDate), pc))

p <- p + geom_smooth(se=FALSE) + facet_wrap(~candidate) +
  scale_x_date(breaks=date_breaks("years"), labels=date_format("%Y"))
p <- p + ggtitle("Smoothed results of National Polls - GOP Race") + ylab("%") + xlab("") + theme_bw()
p <- p + theme(strip.text.x = element_text(colour="white", face="bold"),
               strip.background = element_rect(fill="#CB3128"))
p

Once Palin and Huckabee had proved uninspiring, the field narrowed to the cultish Ron Paul, the ‘meh’ candidate, Romney, and a host of short-lived shooting stars