Stat's Cradle: The Lord of the Pentagons: Return of This Thing

Guess what? Today, I missed my first Phillies game of the year! There will be more where that came from, considering I don't have Comcast and don't feel like paying for MLBTV, but I digress. The point is that the 2013 regular season is (relatively) just around the corner, so if it's time for the players to get into shape, it's probably a good idea for me to do so as well. You know, from a writing perspective. Let's be real, there's no aerobic activity going on here.

It's hard to pick a subject to start off a season like this, so my inspiration had to come from an external source. I was recently informed by my friend Matt that as part of a discussion on the merits of spider/radar graphs, he had shown my revolutionary analysis by way of the terribly-named Vigdergon. I had already meant to revisit that means of analysis with a bit more frequency this season, so it really was just a little nudge in this direction that I needed. In case you missed it, check out the first two posts here and here, and once you're done catching up, let's leap into action!

So what are we doing this time, besides, you know, throwing the One Ring back into the fires of Mount Doom? Well, I've made some modifications this time around that allow me to represent more orthogonal (i.e. conceptually distinct) statistical categories, especially for pitchers. Here's what I've adjusted:

Pitchers (2011)	Pitchers (2013)	Hitters (2011)	Hitters (2013)
K/BB	K/BB	BB/K	BB/K
IP/G	IP/G	BA	BA
FIP	Opp BA	ISO	ISO
LOB%	GB%	Spd	BsR
WHIP	SwStr%	UZR/150	Fld

Here's a quick primer for the changes:

Pitchers: I removed FIP (fielding independent pitching) for the reason I mentioned earlier about having distinct categories. If I'm trying to represent a player on five different axes, using a stat that relies heavily on the others isn't the way to go. So, to try to represent the different proficiencies of pitchers, I replaced FIP with GB% (percentage of balls in play that are grounders) and SwStr% (percentage of pitches that cause a swing-and-miss), which can more effectively distinguish between a pitcher who pitches to contact and one who has dominant "stuff". Similarly, instead of using WHIP and K/BB, which are both influenced by walk rate, I went with K/BB and Opponents' Batting Average (Opp BA), which combined provide something similar to WHIP but separately can paint a different picture (think about a guy like Joe Blanton, who throws a lot of strikes, resulting in a high K/BB, but gets hammered because he throws a lot of strikes, raising his Opp BA).

Hitters: While I was satisfied with the orthogonality of the stats for hitters in the first iteration, I discovered some issues with the fielding and speed stats I was using. First of all, I was using FanGraphs' Spd stat, which is ported from a Bill James statistic that only uses standard (read: old) metrics and doesn't really capture the nuances of baserunning value. I replaced it with FanGraphs' newer BsR stat, which sums UBR (Ultimate Base Running) and wSB (Weighted Stolen Base Runs). I also switched my fielding statistic of choice from Ultimate Zone Rating to FanGraphs' Fld stat, pretty much exclusively because UZR doesn't try to measure catcher fielding, and I wanted to put Buster Posey on a graph.

So what can we visualize now that we couldn't before? Most visibly, catcher defense and nuances in pitcher approach. Take a look at this representation of National League MVP Buster Posey and American League MVP frontrunners Miguel Cabrera and Mike Trout.

It's not hard to see how sabermetricians tried to make their point with Trout over Cabrera in the AL race. Trout has a dramatic advantage over Cabrera in both fielding and baserunning and only minor deficits in the other categories, producing a curve that covers 25% more of the total area of the graph, which is actually in line with the WAR difference between the two. Posey's defense gives him a big boost as well, actually helping his graph cover more total area than Cabrera's, and also corroborating his increased WAR relative to Cabrera.
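
For anyone who wants to play along at home, here is a minimal sketch in Python of how the area of one of these pentagons could be computed. It assumes every stat has already been scaled to a common range (say 0 to 1) and that the axes are evenly spaced around the center; the scaling behind the graphs in this post isn't spelled out here, so the function name `radar_area` and the sample values are placeholders, not real player data.

```python
import math

def radar_area(values):
    """Area of a radar-chart polygon with evenly spaced axes.

    Each pair of neighboring axes forms a triangle with the center,
    with area 0.5 * r1 * r2 * sin(angle between the axes).
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three axes to enclose an area")
    wedge = math.sin(2 * math.pi / n)  # sine of the angle between adjacent axes
    return 0.5 * wedge * sum(values[i] * values[(i + 1) % n] for i in range(n))

# Placeholder values on a 0-to-1 scale (assumed, NOT real player data)
player_a = [0.9, 0.8, 0.7, 0.9, 0.8]
player_b = [0.6, 0.9, 0.9, 0.3, 0.4]

ratio = radar_area(player_a) / radar_area(player_b)
print(f"Player A covers {ratio:.2f}x the area of Player B")
```

One thing worth knowing: because each triangle multiplies two neighboring stats, the area depends on the order of the axes around the graph, so comparisons only make sense if every player uses the same axis order.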

How about the AL Cy Young?

Justin Verlander, David Price, and Felix Hernandez were the top three American League pitchers in 2012 by WAR, but you can see from this graph that they got there through some different means. Verlander and Hernandez used dominant stuff and longer starts to get an advantage, but Price's higher groundball rate, despite a below-average swinging strike percentage, allowed him to keep his ERA the lowest of the three, and his 20-5 record against Verlander's and Hernandez's combined 30-17 record probably helped clinch the award.

And finally, a quick stats tangent. Using a quick r-squared calculation, it turns out that the total area of these Vigdergons accounts for 65% of the variability in WAR among players in 2012. That's a pretty impressive number when you consider that there are only five stats used per player, and only two of the 10 stats used (baserunning and defense) actually represent runs scored or prevented, and runs are the computational basis for WAR.
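
As a rough illustration of that kind of calculation, here is a small Python sketch. It assumes the r-squared comes from a simple linear fit of WAR against area, in which case it equals the squared Pearson correlation; the exact method used for the 65% figure isn't stated here, and the sample numbers below are invented purely to show the mechanics.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists.

    For a one-variable linear fit, this is the share of the variance
    in ys that is explained by xs.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Invented numbers for illustration only (NOT real areas or WAR values)
areas = [1.2, 1.9, 0.8, 1.5, 2.1]
wars = [3.0, 6.5, 2.5, 3.8, 7.0]

print(f"r-squared: {r_squared(areas, wars):.2f}")
```

An r-squared of 0.65 on real data would mean that about two thirds of the spread in WAR lines up with the spread in area, with the rest left to everything the five axes don't capture.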

This season I'll be using these graphs to do all sorts of things, including comparisons from year to year and across different splits (e.g. home-away, month-by-month), so get used to it. And you'll be extra prepared for your Dance Dance Revolution sessions! No? Just me? Suit yourself.

Monday, February 25, 2013
