Even the baseball fan with no more than casual knowledge of advanced statistics is familiar with WAR, what the acronym stands for (Wins Above Replacement) and what it means (how many more wins a certain player is worth above a replacement-level player). Fans a little more familiar with the statistic know there are two basic measurements of WAR, the one used by Baseball Reference, abbreviated as bWAR, and the one from FanGraphs, shortened to fWAR. If you’re a devotee of advanced stats, you may know some of the differences in calculation between the two sites, but unless you work for Baseball Reference or FanGraphs, you ain’t calculating it.
Still, with WAR’s importance in determining how good certain players have been, you’d think the two methodologies would produce similar results throughout Major League Baseball. But one look at the statistics for Twins starter Kyle Gibson, whose season has been to all eyes anywhere from “up and down” to “down,” brings that hypothesis to a halt.
Baseball Reference gives Gibson a 2019 pitching WAR of 0.0 - exactly replacement-level, last in the Twins’ rotation, and 17th among all Twins who have pitched for the team this year.
FanGraphs calculates his WAR this season as 2.4. Not only is that more than two wins above his bWAR, it’s fourth among Twins pitchers - not just the Twins’ rotation, but all Twins pitchers. Taylor Rogers - 2.2 fWAR - is fifth. (Rogers is the third-most valuable Twins pitcher per bWAR, with a 2.6, behind Jake Odorizzi’s 3.2 and José Berríos’ 3.0.)
How? How can two websites, both trying to produce accurate values of how a player has performed, come up with such different results?
Let’s try and find out.
Calculating bWAR vs. fWAR
Fortunately, both Baseball Reference and FanGraphs go into detail in explaining how they calculate WAR, though neither gives every value that goes into its formula. I won’t retype the entire process, but there is one definite and one possible difference that can be pointed to in explaining the two-plus win variance in Gibson’s WAR.
The possible difference is that, while both sites calculate WAR so that cumulative MLB WAR across one season totals 1,000, FanGraphs explicitly divides that total 57 to 43 percent between position players and pitchers; all position players’ combined WAR in a season will be 570, while pitchers’ WAR in the same span will total 430. I could not find whether Baseball Reference makes a similar division, or whether its position-players-to-pitchers division of WAR is consistent from season to season. But even if the split is different, it likely would not create a large variance for any one pitcher, given the sheer number of position players and pitchers in the Major Leagues over a given season.
The definite difference is that, stripped of factors for comparing replacement level and so forth, Baseball Reference uses Runs Allowed to calculate WAR, while FanGraphs uses FIP (Fielding Independent Pitching).
In the case of Gibson, how much difference is there in the measurement of these stats?
But You Promised We Wouldn’t Have to Do Math
First, I made no such promise. Second, you don’t have to do the math because either the sites have already done it, or I’m doing it.
Starting with Baseball Reference’s measurement, Gibson has allowed a career-worst 98 runs (including unearned runs) in only 156.2 innings, which comes out to 5.63 runs every nine innings. Only in his 10-game rookie year of 2013 has Gibson put up a worse RA9.
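For anyone who does want to check the arithmetic, here is a minimal sketch of the RA9 calculation. The one trap is that "156.2" is baseball notation for 156 and two-thirds innings, not a decimal; the function names are my own, not anything from Baseball Reference.

```python
def innings_to_float(ip_notation: str) -> float:
    """Convert baseball innings notation ('156.2' = 156 and 2/3 innings) to a decimal."""
    whole, _, outs = ip_notation.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def ra9(runs_allowed: int, ip_notation: str) -> float:
    """Runs allowed (earned and unearned) per nine innings."""
    return 9 * runs_allowed / innings_to_float(ip_notation)


gibson_ra9 = ra9(98, "156.2")
print(round(gibson_ra9, 2))  # 5.63
```

Treating 156.2 as a plain decimal would give 5.65 instead, which is close but not what the site reports.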
Since we don’t know B-Ref’s exact numbers for converting this figure into a comparison with replacement-level play, we can exhale knowing that the site has done the work for us, with their RAA (Runs Above Average) and WAA (Wins Above Average - not the same as WAR) statistics. In these, Gibson measures out even worse than his rookie year, with a devastating minus-15 RAA and minus-1.4 WAA.
With these numbers in mind, no wonder he calculates out to a 0.0 bWAR. These numbers certainly give a picture of Gibson as a below-average starter, no better than a replacement-level one.
So what do FanGraphs’ numbers show?
If you’ve read Bill James or Moneyball, you’ll know that FIP is based on only those outcomes which the defense cannot control: strikeouts, walks, hit by pitches, and home runs. FanGraphs’ page explaining FIP gives a table converting values of FIP to how good that pitcher is. Gibson’s 2019 FIP is 4.34, landing between the “average” and “below average” values of 4.20 and 4.40 on the table.
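For reference, the commonly published form of FIP weights those four outcomes and then adds a league constant that puts the result on an ERA-like scale. The sketch below assumes that standard form; the stat line and the 3.10 constant are made up for illustration and are not Gibson’s numbers or any season’s actual constant.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float) -> float:
    """Fielding Independent Pitching in its commonly published form.

    `constant` is the season-specific league constant; it has to be looked up
    for the year in question and is passed in here rather than guessed.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical stat line, for illustration only.
example = fip(hr=20, bb=50, hbp=5, k=150, ip=180.0, constant=3.10)
print(round(example, 2))  # 3.79
```

Notice that nothing about balls in play appears anywhere in the formula, which is the whole point of the stat.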
However, that same page alerts the reader that average FIP varies from year to year. Following their link, we can learn that the average FIP among MLB starters in 2019 is - drumroll please - 4.51.
With that value, we can see that Gibson’s FIP has actually been slightly better than league average, explaining how FanGraphs calculates his pitching as better than replacement value. I don’t see how this becomes “better than two wins above replacement value” pitching, though, as Gibson’s FIP is not significantly better than average.
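To put a rough size on "slightly better," here is a back-of-the-envelope sketch that turns the FIP gap into runs over Gibson’s innings. The runs-per-win figure is my own assumption, backed out of B-Ref’s minus-15 RAA and minus-1.4 WAA for Gibson; it is not FanGraphs’ actual conversion, and FIP sits on an earned-run scale, so treat the result as a ballpark only.

```python
innings = 156 + 2 / 3      # 156.2 in baseball notation
gibson_fip = 4.34
league_starter_fip = 4.51

# Runs saved relative to a league-average starter over the same innings.
runs_above_average = (league_starter_fip - gibson_fip) * innings / 9

# Assumption: runs per win implied by B-Ref's -15 RAA and -1.4 WAA for Gibson.
runs_per_win = 15 / 1.4

wins_above_average = runs_above_average / runs_per_win
print(round(runs_above_average, 1), round(wins_above_average, 1))  # 3.0 0.3
```

About three runs, or roughly three-tenths of a win above average. That is wins above average rather than above replacement, so it does not settle the question by itself, but it shows how small the page FIP edge is.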
There is one other clue in the form of xFIP, which adjusts for how many home runs the pitcher should have allowed based on fly balls hit against him, rather than how many home runs he did allow. Gibson’s 2019 xFIP is 3.87 - significantly better than the MLB starter average of 4.49 this year. This suggests that Gibson has given up more home runs than his fly ball percentage would predict. Unfortunately, FanGraphs does not take xFIP into account when calculating WAR, or I would think we had our solution here.
However, FanGraphs warns against using the FIP from a pitcher’s player page, meaning the last several paragraphs have been entirely pointless. To quote:
Unfortunately for those of you playing along at home, you can’t simply take the pitcher’s FIP from their player page because we treat infield fly balls (IFFB) as strikeouts for the purposes of WAR but not for the general FIP calculation found on the player’s page.
The only possibility open right now is that Gibson has induced enough infield pop-ups to better the version of FIP used for WAR, therefore raising his fWAR, and this appears to hold up. Gibson’s 14.2 infield fly ball percentage leads the Twins’ rotation and is significantly better than the league average of 9.6 percent. Additionally, of the other Twins pitchers mentioned above, those with high infield fly rates - Odorizzi (10.3 percent) and Berríos (13.3 percent) - have a higher fWAR (4.0 for both) than bWAR, while Rogers, whose 5.9 percent infield fly rate is lower than the MLB relievers’ average of 10.1 percent, sees his fWAR drop below his bWAR despite having the third-best FIP (2.66) of any Twins pitcher this year.
Unfortunately, this has to remain a hypothesis. I tried calculating ifFIP, the version of FIP used by FanGraphs in their WAR calculations, but could not find one piece of information - total infield fly balls allowed - on their website. I would have approximated using infield fly ball rate times total batters faced, but could not find total batters faced either. Still, I think that though this is a very small sample size, the correlation between infield fly ball rate and fWAR is worth examining.
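For anyone who does track down the missing pieces, here is a sketch of what the calculation would look like. It assumes the ifFIP form given on FanGraphs’ pitcher WAR page, where infield flies are simply added to strikeouts. It also assumes, as I understand FanGraphs’ glossary, that IFFB% is a share of fly balls, so the estimate multiplies the rate by fly balls allowed. The fly ball count, stat line and constant below are placeholders, not Gibson’s real numbers.

```python
def estimate_iffb(iffb_rate: float, fly_balls: int) -> int:
    """Approximate infield flies from IFFB% (assumed to be a share of fly balls)."""
    return round(iffb_rate * fly_balls)


def if_fip(hr: int, bb: int, hbp: int, k: int, iffb: int, ip: float, constant: float) -> float:
    """FIP with infield fly balls counted as strikeouts, as used for fWAR.

    `constant` is a placeholder for the season's ifFIP constant, which is not
    the same number as the regular FIP constant.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * (k + iffb)) / ip + constant


# Hypothetical inputs, for illustration only.
iffb = estimate_iffb(0.142, fly_balls=150)
example = if_fip(hr=20, bb=50, hbp=5, k=150, iffb=iffb, ip=180.0, constant=3.10)
print(iffb, round(example, 2))  # 21 3.56
```

With the same made-up line and no infield flies, the result would be 3.79, so 21 pop-ups are worth about a quarter of a run per nine innings in this example. That is the mechanism by which a high infield fly rate could lift fWAR without showing up in the FIP on a player page.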
The point of this article wasn’t to conclude “Kyle Gibson is a bad pitcher” or “Kyle Gibson is actually a good pitcher,” but to answer the question “why does FanGraphs consider Kyle Gibson as 2.4 wins more valuable than Baseball Reference considers him?” Knowing the differences in how bWAR and fWAR are calculated, I feel comfortable suggesting, even without a confirmed conclusion, that Gibson’s high infield fly ball rate is a major contributing factor.
I hope you will excuse this interjection of math into your weekend. Please return to your regularly scheduled leisure.