Notes on a Scandal – When Jimmy beat Katy

No, the title doesn’t refer to Katy Perry suffering from another of Jimmy Savile’s sexual predilections, although these are two of the participants. I’ll get to the details later.

Just over a year ago, I reflected on the relative wiki searches of leading female singing celebrities, including Ms Perry. In the light of the recent Jimmy Savile scandal, I thought to revisit the area.

For the first post, I relied on code from a now-defunct web site and had not examined the raw data. It now appears that wiki are no longer providing the information in the same way. The good news is that they offer a web page with the daily searches for each month in JSON format, which actually simplifies matters.
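As a rough sketch of what that involves, the snippet below builds the monthly URLs for one person and turns a month of daily counts into a data frame. It uses base R only and makes no network calls. The `daily_views` values are made up for illustration, and I am assuming the parsed JSON yields a named numeric vector keyed by date, which is what the use of rownames in the full code suggests.

```r
# Build one URL per month for a single person (no network access here)
base_url <- "http://stats.grok.se/json/en/"
person   <- "Jimmy Savile"
years    <- 2008:2012
months   <- sprintf("%02d", 1:12)        # "01" ... "12"

# Year varies slowest, so the URLs come out in chronological order
ym   <- paste0(rep(years, each = length(months)), months)
urls <- paste0(base_url, ym, "/", person)

# Hypothetical stand-in for rd$daily_views (invented counts, keyed by date)
daily_views <- c("2012-10-01" = 1200, "2012-10-02" = 3400, "2012-10-03" = 0)

# Dates live in the names, counts in the values
views <- data.frame(
  date  = as.Date(names(daily_views)),
  count = unname(daily_views)
)

# Drop days with no recorded searches, as the full function does
views <- views[views$count > 0, ]
print(views)
```

Building the URLs up front like this is only one option; the full function at the bottom of the page does the same job with nested loops.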

For this exercise, I have produced a function which collects and tabulates data for a set of people, produces graphs of their individual daily count data from the beginning of 2008 onwards and creates a group graph within a specified date range. The code is shown at the bottom of the page

Here is some of the output for some of the people mentioned during the scandal coverage

Savile, naturally, leads the way, with ex-glam rock star Gary Glitter following. Glitter’s position probably reflects his generally greater fame, and the severity of the allegations against him, compared with DJ Dave Lee Travis and dead actor Wilfrid Brambell.

Now for the summary table. The difference between median and mean reflects the situation of steady daily searches punctuated by leaps when publicity occurs
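To illustrate the point with invented numbers (these are not taken from the wiki data), here is a month of steady searches with two publicity spikes. The mean is dragged up by the spikes while the median stays at the everyday level.

```r
# Hypothetical daily counts: 28 quiet days plus two days of heavy publicity
counts <- c(rep(500, 28), 40000, 90000)

mean(counts)    # 4800: pulled upwards by the two spikes
median(counts)  # 500: stays at the everyday baseline
```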

Interestingly, the scandal has not produced the maximum search count for any of the four.

  • Dave Lee Travis peaked when Burmese pro-democracy leader Aung San Suu Kyi said his World Service programme had given her a lifeline
  • Savile’s searches over the timespan of the scandal are significant, but it was his death that sparked his highest single-day count
  • A TV show, detailing a feud between Brambell and his co-star of “Steptoe and Son”, Harry H Corbett, led to the former’s highest search on Wikipedia

Glitter’s graph shows several peaks before this month representing, in chronological order: his release from a Thai jail and attempt to avoid returning to the UK; the mockumentary “The Execution of Gary Glitter”, shown on Channel 4; and incorrect rumours that he was planning a new tour.
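For anyone wanting to check whether a person’s peak falls inside a period of interest, something along these lines should do it. This is a base R sketch on invented data: the names, dates and counts are placeholders, and the window simply reuses the default dates of the function at the bottom of the page.

```r
# Hypothetical daily counts for two made-up people
searches <- data.frame(
  name  = rep(c("Person A", "Person B"), each = 3),
  date  = as.Date(c("2011-11-02", "2012-10-03", "2012-10-12",
                    "2011-06-19", "2012-10-05", "2012-10-20")),
  count = c(900, 400, 650, 120, 80, 95)
)

# Keep the row holding each person's highest count
peaks <- do.call(rbind, lapply(split(searches, searches$name), function(d) {
  d[which.max(d$count), ]
}))

# Flag whether that peak falls inside the window of interest
window_start <- as.Date("2012-09-01")
window_end   <- as.Date("2012-11-01")
peaks$in_window <- peaks$date >= window_start & peaks$date <= window_end
print(peaks)
```

In this made-up data both peaks fall before the window opens, which is the same pattern as the four people above.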

So how did Jimmy beat Katy? With a max search almost double her highest of 101,922

# Packages required
library(RJSONIO) # acquiring and parsing data
library(ggplot2) # graphs
library(plyr) # creation of summary data
 
# create dataframes for all and summary data
allData <- data.frame(count=numeric(),date=character(),name=character())
summaryData <- data.frame(name=character(),mean=numeric(),median=numeric(),max=numeric(),maxdate=character()) #maxdate=date() causes error
 
# create variables for url
month <- c("01","02","03","04","05","06","07","08","09","10","11","12")
year <- c(2008:2012)
 
# function with default dates for comparison graph
wikiFun <- function(person, startDate="2012-09-01",endDate="2012-11-01") {
 
 
  for(k in 1:length(person)) {
    # create dataframe for individual records
    df <- data.frame(count=numeric()) 
 
    for (i in 1:length(year)) {
      for (j in 1:length(month)) {
 
        url <- paste0("http://stats.grok.se/json/en/",year[i],month[j],"/",person[k])
        raw.data <- readLines(url, warn=FALSE) # warn takes a logical, not the string "F"
        rd  <- fromJSON(raw.data)
        rd.views <- rd$daily_views 
 
        df <- rbind(df,as.data.frame(rd.views))
      }
    }
 
    # add dates and name, then append to the df of all people's search counts by day
    df$date <-  as.Date(rownames(df))
    df$name <- person[k]
    colnames(df) <- c("count","date","name")
    df <- arrange(df,date)
    allData <- rbind(allData,df)
 
    # set title display and save individual's graph
    theTitle <- paste0("Daily Wikipedia searches for ",person[k])
    q <- ggplot(subset(df,count>0),aes(x=date,y=count))+geom_point()+xlab("")+ylab("")+ggtitle(theTitle)
    # individual plot prints to screen and is then copied to a png file
    dev.new()
    plot(q)
    fname <- paste0("ws_",gsub(" ","",person[k]),".png")
    dev.copy(png,file=fname)
    dev.off()
 
  }
 
 
  # display and save group graph using log scale for counts
  p <- ggplot(subset(allData,count>0&date>=as.Date(startDate, "%Y-%m-%d")&date<=as.Date(endDate, "%Y-%m-%d")),
              aes(x=date,y=count, colour=name)) +
    geom_line() + xlab("") + ylab("") +
    ggtitle("Comparison of Daily Wikipedia searches") +
    coord_trans(y="log2") #+scale_y_continuous(formatter=comma) caused error
  dev.new()
  plot(p)
  dev.copy(png,file="group_graph.png")
  dev.off()
 
  # calculate summaries , display and save
  summaryData <- ddply(subset(allData,count>0),.(name), summarize,
                       mean=mean(count), median=median(count), max=max(count),
                       max_date=date[which.max(count)])
  print(summaryData)
  write.csv(summaryData,"group_data.csv")
 
 
}
names <- c("Gary Glitter","Jimmy Savile","Dave Lee Travis","Wilfrid Brambell")
wikiFun(names)

2 thoughts on “Notes on a Scandal – When Jimmy beat Katy”

  1. Scott Miller

    Replace
    windows()
    with
    dev.new()
    (2 places)
    to run under Linux (should work for Windows and Mac, too). Otherwise, ran exactly as advertised. Educational and entertaining; I have a new toy.

  2. pssguy Post author

    Thanks for the suggestion which I have now implemented. It is fun to play with. I have already made one more post and probably have another one or two pending. Let me know of any interesting analysis you come up with
