Stat's Cradle: Zipf'rent Strokes

I’m not sure how many people actually realize this, but the NFL preseason is already underway; the Saints and Cardinals played in the Hall of Fame game this past weekend. While there really isn’t much of interest that came out of that game (that means that you’re not good enough to note your injury, Mr. Kolb), it does signify that we need to start getting ourselves prepared for the start of the real season – fantasy football season.

I’ve prepared for my fantasy drafts in a variety of ways over the past few years, although it has mostly involved mock drafts, listening to the ESPN Fantasy Focus podcast, and reading articles from all over the place. However, after looking at the abysmal player landscape this season – I implore you to find three running backs you feel confident about after Foster, Rice, and McCoy – I decided I needed to go at this a different way. After happening upon a piece by my high school classmate Paul Lopreiato in which he discussed applying the baseball sabermetrics concept of Wins Above Replacement to fantasy football, I decided to look into it further. Now, value above replacement is not a new concept in fantasy, but the fact that he put his own spin on it and was able to implement his idea encouraged me to try something of my own.

First, a quick primer on value-based drafting. One of the quirks about fantasy football relative to real football is that historically, running backs have incredible value compared to other positions. This is not because they score the most points, but because the top backs score so much more than the average back. Quarterbacks dominate the top fantasy scorers every year, but because so many of them score so many points, it isn’t so important to get a top one. Value-based drafting leans on that concept and tries to quantify how much more valuable a top player is at one position versus the others. In the past week or so, I’ve tried to calculate this information myself, and here’s what I’ve got.

OK, that didn’t mean anything to anyone, but numbers are just going to complicate the concept. First, the data, all acquired from FantasyPros.com: each point represents an NFL player – 29 quarterbacks, 72 running backs, 81 wide receivers, and 25 tight ends. I found statistical projections averaged over 11 different sources, which I used as a reasonable estimate of what we can expect from these players this year. The players are arranged horizontally according to their average draft position (ADP) this season (a consensus of six websites), with higher draft picks on the left. They are arranged vertically according to what Paul referred to as fantasy points above replacement (PAR), computed as the difference between a player’s projected fantasy point total and the projected points of the highest-scoring player that would not make a starting fantasy lineup at their position (i.e. the 13th QB & TE, 25th WR & RB in 12-team leagues). As an example, these projections expect Aaron Rodgers to score 379 points this season, while Ben Roethlisberger (the 13th-highest-projected QB) would score 251. Therefore, Rodgers’ PAR would be 128, while Roethlisberger’s would be 0.
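For anyone who wants to see the PAR arithmetic spelled out, here is a minimal Python sketch. The function name and the filler quarterbacks are made up for illustration; only the Rodgers (379) and Roethlisberger (251) projections come from the numbers above, and this is not necessarily how the original spreadsheet was built.

```python
def points_above_replacement(projections, n_starters):
    """Return PAR for each player at ONE position.

    projections: dict of player name -> projected fantasy points
    n_starters:  number of starters at this position across the league
                 (e.g. 12 QBs in a 12-team, one-QB league)
    """
    ranked = sorted(projections.values(), reverse=True)
    if len(ranked) <= n_starters:
        raise ValueError("need at least n_starters + 1 players")
    # The replacement player is the best one who would NOT start,
    # i.e. the (n_starters + 1)-th highest projection.
    replacement_points = ranked[n_starters]
    return {name: pts - replacement_points for name, pts in projections.items()}


# Two real projections from the post...
qbs = {"Aaron Rodgers": 379, "Ben Roethlisberger": 251}
# ...plus placeholder QBs with invented point totals, only there to
# fill ranks 2-12 and 14 so that Roethlisberger lands at rank 13.
for i in range(2, 13):
    qbs[f"placeholder QB{i}"] = 379 - 10 * (i - 1)
qbs["placeholder QB14"] = 240

par = points_above_replacement(qbs, n_starters=12)
print(par["Aaron Rodgers"], par["Ben Roethlisberger"])
```

With 12 starting quarterbacks, the 13th-ranked QB sets the baseline, so Rodgers comes out at 379 - 251 = 128 and Roethlisberger at 0, matching the example above.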

Now, it seems pretty reasonable that, once you take a player’s position into consideration, fantasy team owners should draft higher-scoring players earlier than lower-scoring ones. As I’ve pointed out, there is a little bit of value-based drafting already going on – 11 of the top 12 projected scorers overall are quarterbacks, but three of the top four picks, on average, are running backs. If drafters were perfectly efficient with regard to value-based drafting, the ADP rankings would be exactly the same as the PAR rankings, but this is not the case. Almost every player is getting drafted a little bit higher or lower than his relative value would suggest. Just take a look at the blue dot on the graph above that is clearly far away from everything else, and is in fact the lowest-PAR player in this sample. Say hi to Tim Tebow.

This is where my little wrinkle comes in. A few years ago, my STAT 112 professor introduced me to Zipf’s law, which basically says that if you rank some data from largest to smallest and graph each value against its rank, you get a similarly shaped curve for all kinds of data (e.g. word frequency in a book, Amazon top sellers, fantasy point scorers): a few huge values at the top, then a steep drop into a long, flat tail. Confused? Me too. Here are two pictures, one of NFL wide receivers, and one of the play counts of the songs on my iPod in 2009.
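In its classic form, Zipf’s law says a value is roughly proportional to 1 / rank^s, with s close to 1 for things like word frequencies. One rough way to check how Zipf-like a data set is: fit a straight line to log(value) against log(rank) and read off the slope. The sketch below does that with plain Python. It is an illustration of the textbook idea, not something done in this post, and the function name is made up. Note that it only works on positive values such as play counts; PAR can be zero or negative, so it would need shifting first.

```python
import math


def zipf_exponent(values):
    """Estimate s in value ~ C / rank**s via least squares on the log-log plot.

    Non-positive values are dropped because log() is undefined for them.
    Needs at least two positive values.
    """
    ranked = sorted((v for v in values if v > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return -slope  # slope is negative for a decaying curve, so flip the sign


# Synthetic, perfectly Zipfian "play counts": 1000, 500, 333.3, ...
synthetic_counts = [1000 / rank for rank in range(1, 51)]
print(round(zipf_exponent(synthetic_counts), 3))
```

Real data will never sit exactly on the line, so treat the fitted exponent as a rough description of the curve’s shape rather than a precise measurement.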

Compare the iPod graph to the graph at the top – look pretty similar?

Whoomp, there it is.

More importantly, this graph shows ADP among receivers on the x-axis and PAR on the y-axis. The colored points are wide receivers, while the black points represent the Zipf distribution of wide receiver PAR. This means that I sorted the players by PAR and then assigned a rank to each value in order (e.g. 1 for Calvin Johnson, 2 for Larry Fitzgerald, etc.). By graphing this, we can visualize what the draft order should look like if everyone were drafting efficiently. As you can tell, most of the players aren’t being over- or undervalued by much until you get further down the draft, but you can find players earlier in the draft that are being drafted a good half-round or more above or below where they should be. Now I could tell you who all these players are, but come on, I haven’t done any drafts yet! I can’t tell you my favorites so soon. Patience.
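The sort-then-rank step, and the comparison against where players are actually being drafted, can be sketched in a few lines of Python. Everything here is hypothetical: the function names, the three anonymous receivers, and their numbers are invented for illustration and are not taken from the FantasyPros data.

```python
def par_ranks(par_by_player):
    """Rank players within a position by PAR: 1 = highest PAR."""
    ordered = sorted(par_by_player, key=par_by_player.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def draft_gaps(par_by_player, adp_rank_by_player):
    """ADP rank minus PAR rank for each player.

    Positive: drafted later than his PAR rank suggests (a possible bargain).
    Negative: drafted earlier than his PAR rank suggests (a possible reach).
    """
    ranks = par_ranks(par_by_player)
    return {name: adp_rank_by_player[name] - ranks[name] for name in ranks}


# Invented receivers and numbers, for illustration only.
wr_par = {"WR A": 90, "WR B": 70, "WR C": 50}
wr_adp_rank = {"WR A": 1, "WR B": 3, "WR C": 2}

print(draft_gaps(wr_par, wr_adp_rank))
```

Here "WR B" has the second-best PAR but is the third receiver off the board, so his gap is +1, while "WR C" goes one spot earlier than his PAR rank and gets -1. In a perfectly efficient draft every gap would be 0.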

But I digress. It’s also cool to look at the tiers of players that form naturally when you display the graph like this (visualized by different colored dots for the top 30 receivers). Finding clusters with just the naked eye can help determine which players are statistically similar despite different draft prices, which gives you a chance to find some value when players that are undervalued within a tier drop into your lap. As you can see, some players in each tier are being drafted later than players in the tier below, and it’s those types of players that you want to target. Here, I’ll give you two freebies: the green dot that’s being drafted as a blue dot is Marques Colston, and the blue dot that’s being drafted as a red dot is Stevie Johnson.

Yes, there are definitely problems with this approach. Using a value-based approach as a guide for future action is only as good as your projections, and while 11 projections are better than one, there really is a limit on how confident I can be in them. Additionally, a lot of the inefficiencies I’ve found in terms of over- or undervalued players aren’t really that impactful – the wide receiver whose draft position is furthest from his projected rank (by percent difference between the two) is Hakeem Nicks, who is being drafted as the 7th receiver despite being projected as the 11th. Not really that actionable, you know? But the concept is a really interesting one, and I look forward to seeing what I can pull out of it.
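The post doesn’t say exactly how "percent difference" was calculated, so the sketch below assumes the common symmetric definition: the absolute gap between the two ranks divided by their average. That formula and the function name are assumptions; a different definition (say, dividing by the projected rank alone) would give a different number for the same player.

```python
def percent_difference(adp_rank, projected_rank):
    """Symmetric percent difference between two ranks (ASSUMED definition).

    |a - b| divided by the mean of a and b, expressed as a percentage.
    """
    mean_rank = (adp_rank + projected_rank) / 2
    return abs(adp_rank - projected_rank) / mean_rank * 100


# Hakeem Nicks: drafted as the 7th receiver, projected as the 11th.
print(round(percent_difference(7, 11), 1))
```

Under that assumption, Nicks’s four-spot gap works out to 4 / 9, or about 44%. A ratio like this weights a four-spot miss near the top of the board much more heavily than the same miss among the 50th-ranked receivers, which fits the point that the biggest "inefficiency" still isn’t very actionable.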

Wednesday, August 8, 2012
