Twinkie Town Analytics Fundamentals: Busting the Myths of Pitcher Wins and ERA

Part 2: Who should get the credit? Building the foundation of evaluating pitcher performance

Texas Rangers v Minnesota Twins
In the mid-2000s, Johan Santana was the best pitcher in the game. But traditional metrics undervalue his dominance.

This is the second lesson in the series Twinkie Town Analytics Fundamentals. For more information on this series, what I hope to achieve by doing it, and the topics that will be covered, please take a look at the series Introduction. If there are topics you’d like explored in this series, please leave a comment and let me know!

Previously:

Part 1: The Flaws of Batting Average


In 2004, his first season as a full-time starting pitcher, Twins left-hander Johan Santana won the American League Cy Young award after posting a 20-6 win-loss record in 34 starts with a league-leading 2.61 earned run average (ERA). Two seasons later, in 2006, Santana took home his second AL Cy Young award after pacing the major leagues with 19 wins and a 2.77 ERA in 34 starts.

Sandwiched between the two seasons, Santana finished just third in the AL Cy Young voting in 2005 following a 16-win, 2.87 ERA, 33-start campaign. The Angels’ Bartolo Colon and the Yankees’ Mariano Rivera finished first and second in the 2005 voting. Colon paced the AL with 21 wins and posted a respectable 3.48 ERA. Rivera, the future Hall of Fame closer, posted the best ERA of his career at 1.38.

By the traditional measures of pitcher performance – wins and ERA – the case was made in 2005 that Colon and Rivera had better seasons than the Twins ace. Are wins and ERA really the best way to evaluate an individual pitcher? This question makes Johan Santana and the 2005 AL Cy Young vote a great place to begin our second baseball analytics lesson.


Busting the Myth of Pitcher Wins

Throughout baseball history, perhaps the most commonly accepted measure of pitcher performance has been Wins. The game’s lore is filled with stories and tales that lionize the singular man on the mound defeating the opposing team to earn a win for his team. Winning twenty games in a season is recognized as a gold standard achievement. Leading your league in wins frequently results in winning the Cy Young award. Three hundred wins in a career makes a pitcher a legend and an almost certain Hall of Famer. In broadcasts a starting pitcher’s win-loss record is almost universally the first statistic provided in the game’s introduction. The point is, we’ve been led to believe that wins are a good way to evaluate a pitcher.

But is that really true? What is a pitcher win anyway?

Major League Baseball defines a win like this: “A pitcher receives a win when he is the pitcher of record when his team takes the lead for good”… … “A starting pitcher must pitch at least five innings (in a traditional game of nine innings or longer) to qualify for the win.” So, in plain terms – a pitcher gets a win if he’s pitching when his team scores to take the lead and keeps it until the end of the game. And for starting pitchers, they must complete five innings to qualify for a win.

Right away, this definition should give us pause. A pitcher receives a win… when his team takes the lead. Baseball is a team sport and winning a game requires a successful combination of hitting, pitching, defense, and baserunning. How can a pitcher singularly get credit for a win when so many others contribute in other ways? Why are we giving an individual credit for a team outcome?

The Athletic’s Keith Law, in his book Smart Baseball, has a nice approach for thinking about this, which I’ll paraphrase here. Think of a game of baseball as a pie cut into 10 equal slices (shown in the graphic below). Inherently, half the game is about scoring runs (5 slices) which is almost entirely attributable to players other than the pitcher (especially in the American League). Johan Santana scored exactly one offensive run in 2005. The other half of the game is about stopping your opponent from scoring runs (5 slices). The pitcher clearly plays a role in this run prevention half of the game, but how much is he individually responsible for? Ultimately this half is dependent on many interactions between the pitcher, the opposing hitter, and the fielders behind the pitcher. If the ball is put in play, one or more fielders are going to be involved. So, we have to give the fielders at least a slice. Maybe four slices then remain for the pitcher. However, it is rare in today’s game for one pitcher to complete a full game, so those four slices often must be split across multiple pitchers. The illustration might be simplistic, but the point is a single pitcher is perhaps responsible for only 3 of the 10 slices needed to win a game, yet we give them credit for a win as if they own all ten.

More than that, over time we’ve come to use pitcher wins as an evaluation tool for comparing how “good” individual pitchers are or to settle disputes about who had the better season. But wins say nothing of how well a pitcher pitched. You can get a win in an 11-10 game just as you can get a win in a 1-0 game. Somehow wins became a proxy for pitcher success. Bartolo Colon had more wins than Johan Santana in 2005, therefore he had a better season and is more deserving of the Cy Young award? Is that true?

All the pitcher win tells us is that Pitcher X was on the mound when his teammates scored the runs to take the lead and that he and yet another set of teammates (most likely, given how the game is played today) were able to keep that lead for the remainder of the game. It does not tell us anything about Pitcher X’s performance in that game or relative to other pitchers. So, wins are a poor measure that tells us almost nothing about an individual pitcher’s performance. Winning a game is a team effort, and assigning credit for a win to a single player is illogical. What might be better?

ERA and The Challenge of Measuring Pitcher Performance

Perhaps realizing the flaw with pitcher wins and seeking a better way to determine if a pitcher was good, baseball statistician and writer Henry Chadwick invented earned run average in the late 1800s. ERA is an improvement over wins as a measure of pitcher performance. ERA measures the number of earned runs a pitcher allows per nine innings, and its formula is:

ERA = 9 × Earned Runs ÷ Innings Pitched

Advancing the school of thought beyond giving an individual credit for team outcomes, ERA recognizes that a pitcher’s fundamental job is to prevent runs from scoring. It gives us an easy metric to understand and communicate the average number of earned runs a pitcher allows per nine-inning game. And like pitcher wins, it has become a very mainstream measure, commonly cited immediately after a pitcher’s win total in discussions and broadcasts. Having a metric like this is good and useful and it solves the problem of evaluating pitcher performance, right?
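For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the earned runs per nine innings calculation. The function name and the sample numbers are my own placeholders, not figures from any real season, and the sketch assumes innings are given as a true decimal rather than in box-score notation.

```python
def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed per nine innings.

    Assumes innings_pitched is a true decimal (e.g. 210.0),
    not box-score notation such as 210.1.
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


# Hypothetical pitcher: 70 earned runs over 210 innings.
print(f"{earned_run_average(70, 210.0):.2f}")  # 3.00
```

Scaling to nine innings is what lets us compare a starter who threw 210 innings with one who threw 150.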

The answer is yes… and no. The name and definition of the statistic highlight something that we need to pay closer attention to because it exemplifies the difficulty of evaluating pitcher performance – the term “earned run.” An earned run is defined as any run that scores against a pitcher without the benefit of an error or a passed ball. This implies a run scored with those benefits is unearned by the pitcher. And unearned runs score all the time. From 2004-2006 there were 14 unearned runs scored when Johan Santana was pitching. The Twins as a team allowed 56 unearned runs in 2005 alone, almost 8.5% of their total runs allowed. By excluding runs that score as a result of fielding mishaps, ERA gets us closer to measuring the performance of the pitcher alone. But it’s not quite that cut and dried.

I’ll illustrate with an example from the 2005 season. On July 16, 2005 the Twins faced the Angels in the Metrodome. The starting pitchers were Johan Santana and Bartolo Colon (yep, them again). In the top of the first inning, the Angels play by play went like this:

  • Figgins reached on E3
  • Wild pitch; Figgins to 2B
  • Erstad groundout to first (1 out); Figgins to 3B
  • Guerrero reached on E5; Figgins scored (unearned); Guerrero to 2B
  • Anderson single to RF; Guerrero to 3B
  • Molina strikeout swinging (2 out)
  • Rivera single to CF; Guerrero scored (unearned); Anderson to 2B
  • Cabrera double to LF; Anderson scored (unearned); Rivera to 3B
  • Sorensen groundout, SS-1B (3 out)
  • Summary: 3 runs, 3 hits, 2 errors, 2 left on base; Angels 3, Twins 0

The three runs scored were all unearned against Santana, making his ERA for the inning 0.00. But did he not have any responsibility for the three runs scoring? Yes, the two errors added baserunners that probably should not have been on base. But he threw the wild pitch that moved Figgins into scoring position. He allowed the single to Rivera that scored Guerrero and the double by Cabrera that scored Anderson. When we think of evaluating performance, is it really the best thing to completely absolve Santana of any responsibility for the runs scoring?
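To make the gap concrete, here is a small Python sketch that tallies the three runs from the play-by-play above and compares the inning's ERA with a plain "all runs per nine innings" figure. That second number is my own addition for contrast, not a metric used elsewhere in this post, and the variable names are placeholders.

```python
# Runs from the top of the first inning listed above: (runner, was it earned?)
runs_scored = [
    ("Figgins", False),
    ("Guerrero", False),
    ("Anderson", False),
]
innings = 1.0

earned = sum(1 for _, is_earned in runs_scored if is_earned)
total = len(runs_scored)

era_for_inning = 9 * earned / innings     # counts earned runs only
all_runs_per_nine = 9 * total / innings   # counts every run that scored

print(f"ERA for the inning: {era_for_inning:.2f}")    # 0.00
print(f"All runs per nine: {all_runs_per_nine:.2f}")  # 27.00
```

Neither extreme feels right for this inning, which is exactly the problem with drawing a hard line between earned and unearned runs.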

More than that, bad defense is not only about errors (we’ll get more into defense in a later post in this series). How do we account for defenders not getting to a ball at all? Historically, a ball that falls in for a hit is attributable to the pitcher. But defense is also a skill, and some defenses convert more batted balls into outs than others. Different pitchers play in front of different defenses and some are better than others. Shouldn’t that be a factor in evaluating performance too? With ERA, we’re completely overlooking Santana’s role in preventing unearned runs from scoring and the defense’s role in catching the batted balls.

What Pitchers Control - Defense Independent Pitching

The crux of this whole issue is that a pitcher isn’t solely responsible for the runs he allows (or doesn’t allow). Teasing out who should get the credit and responsibility for outcomes on the run prevention side of the game has always been a tricky proposition because it is the result of numerous interactions between pitchers, hitters, and fielders. The fact is, a pitcher’s performance is impacted by all kinds of events that are outside of the pitcher’s control. At the core, we want to judge pitchers on their ability to prevent runs. But preventing runs is a team effort. Determining how to best measure pitcher performance at the individual level within that context has been a major focus of analytics for the past several decades. Fully understanding who is responsible for outcomes on the run prevention side of the ball remains a challenge, but it’s an area where we have made significant strides in recent history.

The biggest stride made in evaluating pitchers individually was the recognition that there are only some things a pitcher can control – strikeouts, walks, home runs allowed, and hit by pitches – and everything else, including a ball put in play, is outside the control of the pitcher. This breakthrough forms the backbone of an analytic concept known as Defense Independent Pitching. The main idea is that if we’re truly evaluating pitchers as individuals, we should do so with statistics that are independent of the performance of the defenses that play behind them. This theory became better understood in early 2001, when research by Voros McCracken demonstrated how little control pitchers have over the outcomes of balls in play.

Fundamental to DIPS theory is Batting Average on Balls in Play (BABIP), which measures how often a ball put in play goes for a hit. The main takeaway of McCracken’s research was that pitchers have almost no control over their BABIP results, due to its dependence on team defense and random luck. BABIP is known to swing wildly from season to season and even within seasons, which can wreak havoc on a pitcher’s traditional measures like ERA. This lack of consistency over time is a good indication that it has little to do with a pitcher’s skill. McCracken used Hall of Famers Greg Maddux and Pedro Martinez as examples. In 1998, Maddux had one of the best BABIPs in baseball. In 1999, he had one of the worst. The same was true for Pedro in 1999 and 2000. If BABIP were in the pitcher’s control and driven by skill, it would be correlated from season to season, and we’d expect two of the greatest pitchers of all time to have consistently good BABIPs year to year. But it’s not, and pitchers just don’t have much control over what happens when the ball is put into play (precisely how much control is a topic for a future part of this series and the next frontier of sabermetrics).
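This post doesn't spell out how BABIP is calculated, so here is a minimal Python sketch using the commonly published formula: hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies. The function name and the stat line are hypothetical and only there to show the arithmetic.

```python
def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play (commonly published formula).

    Home runs and strikeouts are removed because no fielder
    gets a chance to make a play on them.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical season against a pitcher: 180 H, 20 HR, 800 AB, 200 K, 5 SF.
# Balls in play = 800 - 200 - 20 + 5 = 585; hits in play = 160.
print(f"{babip(180, 20, 800, 200, 5):.3f}")  # 0.274
```

Notice that none of the inputs left in the calculation are outcomes the pitcher settles alone, which is why the number moves with defense and luck.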

Over the past twenty years, this line of thinking has become more mainstream, and has led to a new class of Defense Independent Pitching Statistics (DIPS) that attempt to strip out the effect of defense on pitcher performance. As we’ll see below, when we evaluate the 2005 Cy Young candidates in terms of what they individually could control, it’s clear the wrong player won the award.

The DIPS Family: Three True Outcomes and FIP

When we dig into DIPS, we start with only those statistics that a pitcher actually has control over – strikeouts, walks, home runs allowed, and hit by pitches. That’s it. Everything else – wins, losses, earned runs, hits allowed, and so on – is impacted to some degree by the defense. Home runs, strikeouts, and walks have come to be known colloquially as the “Three True Outcomes.”

This all leads us to the metric that we should rely on when evaluating pitcher performance, Fielding Independent Pitching (FIP). In simple terms, FIP measures what a pitcher’s ERA would be, if the pitcher experienced league average results on balls in play. Essentially, FIP strips out the role of the defense and luck, which makes it a more stable evaluator of pitcher performance. The main inputs of the formula are home runs allowed, walks allowed, strikeouts, hit by pitches, and innings pitched – all things that are in the control of the pitcher. I’ll spare you the math (check the link above if you’re interested) but the whole point is to have a metric that is on the same scale as ERA, that evaluates the pitcher’s individual performance, without the noise of the performance of the defense.
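For anyone who does want the math, here is a hedged Python sketch of FIP in its commonly published form. The weights (13, 3, and 2) follow the version FanGraphs publishes. The constant is recalculated every season to put FIP on the league's ERA scale, so the 3.10 below is only a placeholder, as are the function name and the sample stat line.

```python
def fip(home_runs: int, walks: int, hit_by_pitch: int, strikeouts: int,
        innings_pitched: float, fip_constant: float) -> float:
    """Fielding Independent Pitching, commonly published form.

    fip_constant is league- and season-specific; the caller must supply it.
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    core = (13 * home_runs
            + 3 * (walks + hit_by_pitch)
            - 2 * strikeouts) / innings_pitched
    return core + fip_constant


# Hypothetical line: 20 HR, 45 BB, 5 HBP, 185 K in 200 innings.
# (260 + 150 - 370) / 200 = 0.20, plus the placeholder constant of 3.10.
print(f"{fip(20, 45, 5, 185, 200.0, 3.10):.2f}")  # 3.30
```

Hits allowed and runs allowed never appear in the inputs, which is the whole point: the result depends only on outcomes the defense cannot touch.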

Let’s return to the 2005 AL Cy Young Race.

Did Colon and Rivera really have better seasons than Santana? Looking specifically at the measures that pitchers can control, the picture starts to change. Santana had better home run and strikeout rates per nine innings than Colon, and their walk rates were essentially the same. Santana’s 9.25 K/9 rate led the American League. Rivera was better in HR/9 but worked only about one-third the innings pitched of Santana and Colon. By those measures, it seems Santana had the better year.
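The per-nine rates mentioned above all use the same simple scaling. Here is a short Python sketch, including a helper for the box-score convention in which "200.1" means 200 and one-third innings. Both function names and the stat line are hypothetical, not the actual 2005 numbers for any of these pitchers.

```python
def innings_from_box_score(ip_text: str) -> float:
    """Convert box-score innings such as '200.1' (200 and 1/3) to a decimal."""
    whole, _, outs_text = ip_text.partition(".")
    outs = int(outs_text) if outs_text else 0
    if outs not in (0, 1, 2):
        raise ValueError("fractional part must be 0, 1, or 2 outs")
    return int(whole) + outs / 3


def per_nine(count: int, innings_pitched: float) -> float:
    """Scale a counting stat (K, BB, HR) to a nine-inning rate."""
    return 9 * count / innings_pitched


# Hypothetical line: 180 K, 45 BB, 18 HR in 180.0 innings.
ip = innings_from_box_score("180.0")
print(f"K/9:  {per_nine(180, ip):.2f}")  # 9.00
print(f"BB/9: {per_nine(45, ip):.2f}")   # 2.25
print(f"HR/9: {per_nine(18, ip):.2f}")   # 0.90
```

Putting every pitcher on a per-nine basis is what lets us set a closer's rates next to a starter's, though as noted above the difference in workload still matters.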

As we discussed last week when we broke down batting average, it’s clunky to use multiple numbers. This is where we turn to FIP as the single, convenient number to evaluate pitchers. And by FIP, Santana clearly had a better season than the Cy Young winner Colon. In hindsight, and as ESPN’s Jayson Stark said at the time, Johan Santana should have won the 2005 AL Cy Young. Ultimately, Santana should have won three in a row from 2004 to 2006, a feat that has never been accomplished in the American League. If only we had been as analytically inclined then.

Lesson Takeaways

  • Wins are a team statistic, and awarding an individual pitcher full credit for a team outcome is flawed logic.
  • Fundamentally, baseball is about scoring and preventing runs. Offensive players are almost solely responsible for scoring runs. Pitchers rarely impact their own team’s run scoring.
  • Pitchers play a large role in preventing runs from scoring against their team, but they can’t do so without contributions from their defense.
  • Wins fail to account for these contributions from offensive and defensive players.
  • Measuring a pitcher’s performance in preventing runs from scoring is more informative and a better way to evaluate performance.
  • Earned Run Average (ERA) is a good start to measuring a pitcher’s ability to prevent runs, but it is limited because it ignores the impacts and performance of the defense.
  • Pitchers do not have control over the team’s defensive performance and ERA assumes a pitcher is solely responsible for the runs that do or do not score when he is pitching.
  • Outcomes that pitchers can control are limited to strikeouts, walks, home runs allowed, and hit by pitches.
  • Research has shown that pitchers have very little control over what happens when a ball is put in play. Ball in play outcomes are driven largely by defensive performance and random luck.
  • To truly evaluate a pitcher’s individual performance, we should focus on defense independent pitching statistics that strip out the performance of defense on the pitcher’s ability to prevent runs.
  • Statistics like strikeout rate, walk rate, home run rate are fundamental building blocks of Defense Independent Pitching Statistics (DIPS).
  • Fielding Independent Pitching (FIP) is the most mainstream DIPS metric. It is scaled like ERA and derived from defense independent statistics, making it a much better statistic for valuing and comparing pitching performance.

Test Your Knowledge: Five Quiz Questions

Test your knowledge by analyzing the blind player comparisons below. In each comparison, determine which player season was the better performance. The answers follow:

Answer Key (in each pair, the lower FIP marks the better season):

#1: A = 2004 Brad Radke (FIP: 3.55), B = 2004 Carlos Silva (FIP: 4.36)

#2: C = 2017 Jose Berrios (FIP: 3.84), D = 2017 Ervin Santana (FIP: 4.46)

#3: E = 2010 Carl Pavano (FIP: 4.02), F = 2010 Francisco Liriano (FIP: 2.66)

#4: G = 1991 Scott Erickson (FIP: 3.76), H = 1991 Kevin Tapani (FIP: 3.49)

#5: I = 1997 Brad Radke (FIP: 3.81), J = 1997 Bob Tewksbury (FIP: 3.51)


References:

The data and sources are cited or linked throughout this post. Like others who have tried to write and explain these subjects before, I relied significantly on the following resources:


John is a contributor to Twinkie Town with an emphasis on analytics. He is a lifelong Twins fan and former college pitcher. You can follow him on Twitter @JohnFoley_21.