There’s been a recent spat between the heavy metal bands Sepultura and Soulfly. For those unaware of the history, 50% of Sepultura used to be the Cavalera brothers (Max and Igor) until Max (the frontman and guitarist) left the band in 1996 and formed Soulfly. The full story is here. There’s a lot of bad blood even 20 years later, and according to a recent story on metal sucks, Soulfly’s manager (and Max’s wife) Gloria Cavalera recently wrote a fairly pointed post on her Facebook page. This got picked up by my favourite podcast (the metal sucks podcast). What has this got to do with me, or statistics? Well, one of the presenters of the metal sucks podcast asked me this over Twitter:

After a very brief comment about needing to operationalise ‘better’, I decided that rather than reading book proofs I’d do a tongue-in-cheek analysis of which is better, Max or no Max, and here it is.

First we need to operationalise ‘better’. I have done this by accepting subjective opinion as determining ‘better’, specifically ratings of albums on amazon.com (although I am English, metal sucks is US based, so I thought I’d pander to them and take ratings from the US site). Our question then becomes ‘is Max or no Max rated higher by the sorts of people who leave reviews on Amazon?’. We have operationalised our question and turned it into a scientific statement, which we can test with data. [There are all sorts of problems with using these ratings: they tend to be positively biased, they likely reflect a certain type of person who reviews, reviews often reflect things other than the music (e.g., arrived quickly 5*), and so on … but fuck it, this is not serious science, just a bit of a laugh.]

The first question is whether post-Max Sepultura or Soulfly is rated higher. Figure 1 shows that the data are hideously skewed, with people tending to give positive reviews and 4-5* ratings. Figure 2 shows the mean ratings by year after Max’s departure (note the bands released albums in different years so the dots are out of synch, but it’s a useful timeline). Figure 2 seems to suggest that after the first couple of albums, both bands are rated fairly similarly: the Soulfly line is higher but the error bars overlap a lot for all but the first albums.
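To make the idea of a mean rating with an error bar concrete, here is a minimal sketch in R. It uses the star counts for just two albums from the data listing in this post (the two 1998 debuts), and it assumes a normal-theory 95% confidence interval; the helper names `expand_ratings` and `summarise_album` are made up for illustration, and the figures may have used a different kind of interval.

```r
# Star counts (1* to 5*) copied from the data listing in this post;
# only two albums are used to keep the sketch short.
counts <- list(
  Against = c(16, 14, 11, 20, 32),  # Sepultura, 1998
  Soulfly = c(8, 9, 4, 16, 89)      # Soulfly, 1998
)

# Hypothetical helper: expand counts into one rating per review
expand_ratings <- function(n) rep(1:5, times = n)

# Hypothetical helper: mean and normal-theory 95% CI for one album
summarise_album <- function(n) {
  r  <- expand_ratings(n)
  m  <- mean(r)
  se <- sd(r) / sqrt(length(r))
  c(n = length(r), mean = m, lower = m - 1.96 * se, upper = m + 1.96 * se)
}

# One row per album, columns n / mean / lower / upper
album_summary <- t(sapply(counts, summarise_album))
print(round(album_summary, 2))
```

If the two intervals do not overlap, that matches the impression that the first albums are where the bands differ most.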

There are a lot of ways you could look at these data. The first thing is the skew. That messes up estimates of confidence intervals and significance tests … but our sample is likely big enough that we can rely on the central limit theorem to do its magic and let us assume that the sampling distribution is normal (beautifully explained in my new book!)
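If you want to see the central limit theorem doing its magic on ratings like these, a quick simulation helps. The sketch below treats the `Chaos` star counts from the data listing as a skewed 'population', repeatedly draws samples from it, and compares the skew of the raw ratings with the skew of the sample means. The sample size of 100, the 5000 replications, the seed and the `skewness` helper are all my assumptions for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Skewed 'population': the Chaos star counts (1* to 5*) from the data listing
population <- rep(1:5, times = c(4, 2, 9, 20, 120))

# Draw many samples of n_reviews ratings and keep each sample mean
n_reviews <- 100  # assumed sample size, purely illustrative
sample_means <- replicate(
  5000,
  mean(sample(population, n_reviews, replace = TRUE))
)

# Hypothetical helper: a crude moment-based skewness measure
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_raw   <- skewness(population)    # skew of the individual ratings
skew_means <- skewness(sample_means)  # skew of the sampling distribution

print(c(raw = skew_raw, means = skew_means))
```

The raw ratings are heavily skewed, whereas the distribution of sample means should be much closer to symmetric, which is what lets us lean on normal-theory confidence intervals.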

I’m going to fit three models. The first is an intercept-only model (a baseline with no predictors). The second allows intercepts to vary across albums (that is, it lets ratings vary by album, which seems sensible because albums will vary in quality). The third predicts ratings from the band (Sepultura vs Soulfly).
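A rough sketch of what those three models might look like in R is below. It assumes the `nlme` package (a recommended package that ships with R) and, to stay self-contained, uses only six albums from the data listing rather than the full `sepvssoul` data frame, so the numbers will not match a full analysis. The object names (`toy`, `m_intercept`, `m_album`, `m_band`) are made up for illustration.

```r
library(nlme)  # gls() and lme(); recommended package shipped with R

# Star counts (1* to 5*) for a subset of albums from the data listing
albums <- list(
  Against   = list(band = "Sepultura No Max", n = c(16, 14, 11, 20, 32)),
  Nation    = list(band = "Sepultura No Max", n = c(3, 7, 6, 22, 19)),
  Dante     = list(band = "Sepultura No Max", n = c(1, 3, 4, 8, 30)),
  Soulfly   = list(band = "Soulfly",          n = c(8, 9, 4, 16, 89)),
  Primitive = list(band = "Soulfly",          n = c(11, 5, 5, 19, 53)),
  Three     = list(band = "Soulfly",          n = c(1, 10, 12, 7, 19))
)

# One row per review: Album, Band, Rating
toy <- do.call(rbind, lapply(names(albums), function(a) {
  data.frame(Album  = a,
             Band   = albums[[a]]$band,
             Rating = rep(1:5, times = albums[[a]]$n))
}))
toy$Album <- factor(toy$Album)
toy$Band  <- factor(toy$Band)

# Model 1: intercept only (baseline, no predictors)
m_intercept <- gls(Rating ~ 1, data = toy, method = "ML")

# Model 2: intercepts allowed to vary across albums
m_album <- lme(Rating ~ 1, random = ~ 1 | Album, data = toy, method = "ML")

# Model 3: add band as a predictor
m_band <- update(m_album, . ~ . + Band)

# Compare the three models with likelihood ratio tests
model_comparison <- anova(m_intercept, m_album, m_band)
print(model_comparison)
```

All three are fitted with maximum likelihood (`method = "ML"`) so that the likelihood ratio tests between them are comparable.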

Just because this isn’t fun enough, we could also look at whether either Sepultura (post 1996) or Soulfly can compete with the Max-era Sepultura heyday.

You could also write yourself a little bootstrap routine to get some robust confidence intervals around the parameters.
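For the curious, here is one way such a bootstrap routine could be sketched. This is an assumption-laden illustration, not necessarily the routine used for this post: it resamples individual reviews with replacement (simple case resampling), refits a random-intercept model each time with `nlme`, and takes percentile intervals. It again uses a six-album subset of the data listing, and `fit_band`, `boot_once` and the choice of 200 resamples are hypothetical.

```r
library(nlme)  # lme() and fixef(); recommended package shipped with R

set.seed(1996)  # arbitrary seed so the sketch is reproducible

# Star counts (1* to 5*) for a subset of albums from the data listing
albums <- list(
  Against   = list(band = "Sepultura No Max", n = c(16, 14, 11, 20, 32)),
  Nation    = list(band = "Sepultura No Max", n = c(3, 7, 6, 22, 19)),
  Dante     = list(band = "Sepultura No Max", n = c(1, 3, 4, 8, 30)),
  Soulfly   = list(band = "Soulfly",          n = c(8, 9, 4, 16, 89)),
  Primitive = list(band = "Soulfly",          n = c(11, 5, 5, 19, 53)),
  Three     = list(band = "Soulfly",          n = c(1, 10, 12, 7, 19))
)
toy <- do.call(rbind, lapply(names(albums), function(a) {
  data.frame(Album  = factor(a, levels = names(albums)),
             Band   = factor(albums[[a]]$band,
                             levels = c("Sepultura No Max", "Soulfly")),
             Rating = rep(1:5, times = albums[[a]]$n))
}))

# Hypothetical helper: fit the band model and return its fixed effects
fit_band <- function(d) {
  fit <- lme(Rating ~ Band, random = ~ 1 | Album, data = d, method = "ML")
  fixef(fit)
}

# Hypothetical helper: one bootstrap replicate (resample reviews, refit);
# a failed fit returns NAs instead of stopping the whole routine
boot_once <- function() {
  idx <- sample(nrow(toy), replace = TRUE)
  tryCatch(fit_band(toy[idx, ]), error = function(e) c(NA_real_, NA_real_))
}

# 200 resamples keeps this quick; use more for anything serious
boot_est <- t(replicate(200, boot_once()))

# Percentile 95% confidence intervals for the intercept and the band effect
boot_ci <- apply(boot_est, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
print(round(boot_ci, 2))
```

If the interval for the band effect contains zero, the data give no clear reason to prefer one band over the other. Resampling whole albums rather than individual reviews would respect the multilevel structure better, but with this few albums that would be very noisy.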

Look, this is just a bit of fun and an excuse to show you how to use a bootstrap on a multilevel model, and how you can use data to try to answer pointless questions thrown at you on Twitter. Based on this hastily thrown together analysis that makes a lot of assumptions about a lot of things, my 120 character twitter response will be: Sepultura Max better than everything, but post 1996 Max is no better than No Max;-)

morbid<-c(rep(1,2), rep(2, 4), rep(3, 3), rep(4, 8), rep(5, 36))

Schizo<-c(rep(1,1), rep(2, 2), rep(3, 4), rep(4, 10), rep(5, 33))

remains<-c(2, rep(3, 5), rep(4, 9), rep(5, 104))

Arise<-c(rep(2, 2), rep(4, 16), rep(5, 89))

Chaos<-c(rep(1,4), rep(2, 2), rep(3, 9), rep(4, 20), rep(5, 120))

Roots<-c(rep(1,9), rep(2, 8), rep(3, 17), rep(4, 24), rep(5, 94))

Against<-c(rep(1,16), rep(2, 14), rep(3, 11), rep(4, 20), rep(5, 32))

Nation<-c(rep(1,3), rep(2, 7), rep(3, 6), rep(4, 22), rep(5, 19))

Roorback<-c(rep(1,6), rep(2, 6), rep(3, 5), rep(4, 13), rep(5, 20))

Dante<-c(rep(1,1), rep(2, 3), rep(3, 4), rep(4, 8), rep(5, 30))

Alex<-c(rep(1,1), rep(2, 1), rep(3, 3), rep(4, 6), rep(5, 18))

Kairos<-c(rep(1,3), rep(2, 2), rep(3, 2), rep(4, 6), rep(5, 33))

Mediator<- c(rep(1,0), rep(2, 3), rep(3, 4), rep(4, 6), rep(5, 21))

morbid<-data.frame(rep("Morbid", length(morbid)), rep(1986, length(morbid)), morbid)

Schizo<-data.frame(rep("Schizo", length(Schizo)), rep(1987, length(Schizo)), Schizo)

Remains<-data.frame(rep("Remains", length(remains)), rep(1989, length(remains)), remains)

Arise<-data.frame(rep("Arise", length(Arise)), rep(1991, length(Arise)), Arise)

Chaos<-data.frame(rep("Chaos", length(Chaos)), rep(1993, length(Chaos)), Chaos)

Roots<-data.frame(rep("Roots", length(Roots)), rep(1996, length(Roots)), Roots)

Against<-data.frame(rep("Against", length(Against)), rep(1998, length(Against)), Against)

Nation<-data.frame(rep("Nation", length(Nation)), rep(2001, length(Nation)), Nation)

Roorback<-data.frame(rep("Roorback", length(Roorback)), rep(2003, length(Roorback)), Roorback)

Dante<-data.frame(rep("Dante", length(Dante)), rep(2006, length(Dante)), Dante)

Alex<-data.frame(rep("Alex", length(Alex)), rep(2009, length(Alex)), Alex)

Kairos<-data.frame(rep("Kairos", length(Kairos)), rep(2011, length(Kairos)), Kairos)

Mediator<-data.frame(rep("Mediator", length(Mediator)), rep(2013, length(Mediator)), Mediator)

names(morbid)<-c("Album", "Year", "Rating")

names(Schizo)<-c("Album", "Year", "Rating")

names(Remains)<-c("Album", "Year", "Rating")

names(Arise)<-c("Album", "Year", "Rating")

names(Chaos)<-c("Album", "Year", "Rating")

names(Roots)<-c("Album", "Year", "Rating")

names(Against)<-c("Album", "Year", "Rating")

names(Nation)<-c("Album", "Year", "Rating")

names(Dante)<-c("Album", "Year", "Rating")

names(Alex)<-c("Album", "Year", "Rating")

names(Kairos)<-c("Album", "Year", "Rating")

names(Mediator)<-c("Album", "Year", "Rating")

SepMax<-rbind(morbid, Schizo, Remains, Arise, Chaos, Roots)

SepMax$Band<-"Sepultura Max"

SepMax$Max<-"Max"

names(Roorback)<-c("Album", "Year", "Rating")

SepNoMax<-rbind(Against, Nation, Roorback, Dante, Alex, Kairos, Mediator)

SepNoMax$Band<-"Sepultura No Max"

SepNoMax$Max<-"No Max"

soulfly<-c(rep(1,8), rep(2, 9), rep(3, 4), rep(4, 16), rep(5, 89))

primitive<-c(rep(1,11), rep(2, 5), rep(3, 5), rep(4, 19), rep(5, 53))

three<-c(rep(1,1), rep(2, 10), rep(3, 12), rep(4, 7), rep(5, 19))

prophecy<-c(rep(1,2), rep(2, 5), rep(3, 5), rep(4, 25), rep(5, 42))

darkages<-c(rep(1,1), rep(2, 1), rep(3, 5), rep(4, 18), rep(5, 36))

conquer<-c(rep(1,1), rep(2, 0), rep(3, 5), rep(4, 5), rep(5, 31))

omen<-c(rep(1,0), rep(2, 2), rep(3, 1), rep(4, 6), rep(5, 17))

enslaved<-c(rep(1,1), rep(2,1), rep(3, 4), rep(4, 2), rep(5, 30))

savages<-c(rep(1,0), rep(2, 2), rep(3, 3), rep(4, 10), rep(5, 27))

archangel<-c(rep(1,3), rep(2, 2), rep(3, 4), rep(4, 7), rep(5, 21))

soulfly<-data.frame(rep("Soulfly", length(soulfly)), rep(1998, length(soulfly)), soulfly)

primitive<-data.frame(rep("Primitive", length(primitive)), rep(2000, length(primitive)), primitive)

three<-data.frame(rep("Three", length(three)), rep(2002, length(three)), three)

prophecy<-data.frame(rep("Prophecy", length(prophecy)), rep(2004, length(prophecy)), prophecy)

darkages<-data.frame(rep("Darkages", length(darkages)), rep(2005, length(darkages)), darkages)

conquer<-data.frame(rep("Conquer", length(conquer)), rep(2008, length(conquer)), conquer)

omen<-data.frame(rep("Omen", length(omen)), rep(2010, length(omen)), omen)

enslaved<-data.frame(rep("Enslaved", length(enslaved)), rep(2012, length(enslaved)), enslaved)

savages<-data.frame(rep("Savages", length(savages)), rep(2013, length(savages)), savages)

archangel<-data.frame(rep("Archangel", length(archangel)), rep(2015, length(archangel)), archangel)

names(soulfly)<-c("Album", "Year", "Rating")

names(primitive)<-c("Album", "Year", "Rating")

names(three)<-c("Album", "Year", "Rating")

names(prophecy)<-c("Album", "Year", "Rating")

names(darkages)<-c("Album", "Year", "Rating")

names(conquer)<-c("Album", "Year", "Rating")

names(omen)<-c("Album", "Year", "Rating")

names(enslaved)<-c("Album", "Year", "Rating")

names(savages)<-c("Album", "Year", "Rating")

names(archangel)<-c("Album", "Year", "Rating")

Soulfly<-rbind(soulfly, primitive, three, prophecy, darkages, conquer, omen, enslaved, savages, archangel)

Soulfly$Band<-"Soulfly"

Soulfly$Max<-"Max"

maxvsnomax<-rbind(SepMax, SepNoMax, Soulfly)

maxvsnomax$Band<-factor(maxvsnomax$Band)

maxvsnomax$Max<-factor(maxvsnomax$Max)

maxvsnomax$Album<-factor(maxvsnomax$Album)

sepvssoul<-subset(maxvsnomax, Band != "Sepultura Max")

sepvssoul$Band<-factor(sepvssoul$Band)

sepvssoul$Album<-factor(sepvssoul$Album)