I want to incorporate more R into my classes at Dalhousie. Problem is, I am a pretty bad R coder; I spent much of the past decade or so with SPSS and Mplus. But there’s lots of evidence that R is the future of science. I find that the best way to learn is project-based, so I’m going to start blogging on R code. I’m going to pick topics that are inherently interesting to me, with a focus on data visualization. If I keep it fun, I’m more likely to stick with it.
So, to start I’m going to analyze data from the Pathfinder Monster Database, a comprehensive database of all 2812 monsters from Paizo’s tabletop roleplaying game, Pathfinder. I’ve played Pathfinder for years now and there are a lot of crunchy numbers in there. Probably why I like it so much! I’m going to look at the relationship between creature type and two outcome variables: (a) Armor Class (i.e., how hard the creature is to hit) and (b) Challenge Rating (i.e., how tough the monster is overall). The goal is to see what creature type is “toughest” overall.
The data needed a little bit of cleaning (e.g., changing “Dragon” to “dragon” for some entries), but it was in good shape overall. I decided to try out ridge plots as the way to visualize the data, since I’ve never used them before. First thing to do is load the necessary libraries into R.
library(ggplot2)   # plotting
library(ggridges)  # geom_density_ridges() and theme_ridges()
library(dplyr)     # group_by(), summarise() and the %>% pipe
library(ggExtra)
Next, since I want the creature types in each plot ordered from highest to lowest average AC/CR, I need the next bit of code, which requires dplyr. This creates two ordering vectors (ACorder and CRorder) that I can use later to re-order the y-axis. I also created a palette of 13 colors with rainbow(), which returns evenly spaced hues rather than random ones, since there are 13 creature types and I didn’t like the default ggplot2 colors here.
# Order variables by AC
avg <- mydata %>%
  group_by(Type) %>%
  summarise(mean = mean(AC))
ACorder <- avg$Type[order(avg$mean)]

# Order variables by CR
avg2 <- mydata %>%
  group_by(Type) %>%
  summarise(mean2 = mean(CR))
CRorder <- avg2$Type[order(avg2$mean2)]

# Create color palette
pal <- rainbow(13)
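To make the ordering step concrete, here is a minimal base R sketch on made-up data. The `toy` data frame and its values are hypothetical stand-ins for mydata, and `tapply()` is used as a rough base R counterpart of the `group_by()` + `summarise()` pipeline, not as the code behind the plots.

```r
# Hypothetical toy data standing in for mydata (values are made up)
toy <- data.frame(
  Type = c("dragon", "dragon", "vermin", "vermin", "ooze", "ooze"),
  AC   = c(30, 26, 12, 14, 8, 10)
)

# Mean AC per creature type (base R counterpart of group_by + summarise)
type_means <- tapply(toy$AC, toy$Type, mean)

# order() sorts ascending, so the type with the lowest mean comes first
toy_order <- names(type_means)[order(type_means)]
print(toy_order)
```

Because `order()` sorts ascending and ggplot2 draws discrete limits from the bottom of the axis upward, passing a vector like this to `scale_y_discrete(limits = ...)` should put the type with the highest mean at the top of the plot.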
Ok, now I can create the two plots using the geom_density_ridges() function. This needs the ggridges package to function, as base ggplot2 can’t do this.
# Challenge Rating ridges, creature types ordered by mean CR
ggplot(mydata, aes(x = CR, y = Type, fill = Type)) +
  geom_density_ridges() +
  theme_ridges() +
  theme(legend.position = "none") +
  scale_y_discrete(limits = CRorder) +
  scale_x_continuous(limits = c(0, 30), breaks = seq(0, 30, 5)) +
  scale_fill_manual(values = pal) +
  labs(y = "", x = "Challenge Rating")

# Armor Class ridges, creature types ordered by mean AC
ggplot(mydata, aes(x = AC, y = Type, fill = Type)) +
  geom_density_ridges() +
  theme_ridges() +
  theme(legend.position = "none") +
  scale_y_discrete(limits = ACorder) +
  scale_x_continuous(limits = c(0, 50), breaks = seq(0, 50, 5)) +
  scale_fill_manual(values = pal) +
  labs(y = "", x = "Armor Class")
So, the toughest monster types in Pathfinder are dragons, followed by outsiders. The weakest monster types are vermin and animals. The rankings of toughness by CR and by AC are exactly the same, as it turns out. However, the distribution for oozes is way different from everything else: oozes tend to be really easy to hit, but are still tough because of lots of other abilities and immunities. The positive skew in the CR distributions is interesting, since it shows that there are generally a LOT more monsters under CR 10, which makes sense given that very few games get to such high levels.
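A claim like "the rankings are exactly the same" can also be checked directly instead of being read off the plots. The sketch below does this on made-up data; the `toy` data frame and the `rank_types()` helper are hypothetical names introduced only for illustration.

```r
# Hypothetical toy data standing in for mydata (values are made up)
toy <- data.frame(
  Type = c("dragon", "dragon", "ooze", "ooze", "vermin", "vermin"),
  AC   = c(30, 26, 8, 10, 12, 14),
  CR   = c(15, 11, 6, 4, 1, 0.5)
)

# Helper (hypothetical name): creature types sorted by group mean, ascending
rank_types <- function(values, groups) {
  means <- tapply(values, groups, mean)
  names(means)[order(means)]
}

ac_rank <- rank_types(toy$AC, toy$Type)
cr_rank <- rank_types(toy$CR, toy$Type)

# TRUE only if both rankings list the types in exactly the same order
same_order <- identical(ac_rank, cr_rank)
print(same_order)
```

In this toy example the two rankings differ, because the made-up oozes have low AC but a middling CR. On the real data, the equivalent check would be something like `identical(as.character(ACorder), as.character(CRorder))`.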
I like ridge plots. They work a lot better than overlapping histograms when there are lots of groups and lots of cases. There was a bit of difficulty with numbers less than 1 for the CR plot (e.g., some CRs are 1/3). Without the scale_x_continuous(limits = c(0, 30)) call, the graph displayed values less than 0, which is outside the range of the actual data. I believe that the graph is now bunching all the CRs that are less than 1 (~217 data points) as “0” on the graph above. Overall, a fun first attempt, and neat data to work with.
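A likely reason the curve spilled below 0 is that kernel density estimates are smoothed past the ends of the data. The sketch below illustrates this with base R’s `density()` on made-up CR values. It assumes the ridge densities behave like `density()`, and whether ggridges exposes the same `from` control is something to check in its documentation.

```r
# Hypothetical CR values, including fractional ones such as 1/3 (made up)
cr <- c(1/8, 1/4, 1/3, 1/2, 1, 1, 2, 3, 5, 8, 12)

# Default: the estimation grid extends 3 bandwidths (cut = 3) past the data
d_default <- density(cr)

# from = 0 starts the estimation grid at zero instead
d_from0 <- density(cr, from = 0)

print(min(d_default$x))  # negative, although no CR is below zero
print(min(d_from0$x))    # the grid now starts at zero
```

Note that the fractional CRs keep their actual values in both estimates; only the range over which the curve is drawn changes.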
Datafile and syntax available on the blog’s OSF page.