

Twinkie Town Analytics Fundamentals: The Tyranny of the Save


Part 5: How a bad stat has contributed to bullpen mismanagement and what we should use instead

ALDS Game 4 - Yankees v Twins
Twins manager Ron Gardenhire waited two batters too long to go to his best reliever in the 2004 ALDS
Photo by Jonathan Daniel/Getty Images

This is the fifth lesson in the series Twinkie Town Analytics Fundamentals. For more information on this series, what I hope to achieve by doing it, and the topics that will be covered, please take a look at the series Introduction. If there are topics you’d like explored in this series, please leave a comment and let me know!

Previously:

Introduction

Part 1: The Flaws of Batting Average

Part 2: Busting the Myths of Pitcher Wins and ERA

Part 3: RBIs, A Misleading Statistic

Part 4: Using Linear Weights to Measure Run Production


The Twins trailed the Yankees two games to one entering Game 4 of the 2004 American League Division Series. Ace Johan Santana provided 5 innings allowing only 1 run, reliever Grant Balfour added two more scoreless, and the Twins headed into the 8th with a 5-1 lead, hoping to keep their playoff chances alive. Manager Ron Gardenhire brought in reliever Juan Rincon to face the heart of the Yankee lineup. Rincon would allow the first 3 hitters to reach and a run to score. Now with a 5-2 lead, runners on the corners, and 1 out, Yankees DH Ruben Sierra came to the plate. Twins closer Joe Nathan (44 saves, 1.62 ERA) was in the bullpen, watching. Rincon would face Sierra and allow a game-tying 3-run homerun over the baggy on a hanging breaking ball. Now 5-5, Rincon would face another batter, allowing a double to John Olerud before Nathan was finally brought into the game. Nathan would go on to get five scoreless outs to take the game to extra innings, but the Twins would lose the game and the series in the 11th inning.

This game – and the situations and decision making of it – is the perfect backdrop for this fifth lesson of our analytics education series. In this lesson we’ll explore the fatal flaws of the save: how it has shaped the way we look at bullpen performance, changed how relief pitching is valued, and influenced managers to manage to a statistic instead of maximizing their chances to win games.


The save is one of baseball’s newer traditional statistics, having become an official statistic only in 1969. Baseball writer Jerome Holtzman is widely credited with developing and popularizing the criteria for a player to earn a save. But that’s not quite the full story. For the first eighty years or so of baseball history, the save that we are familiar with today did not exist. Relief pitchers didn’t really exist – relievers did not cover even 15% of the total innings pitched in a season until 1940. It wasn’t until this era that dedicated relief pitchers became commonplace on rosters. In 1951, Dodgers front office statistician Allan Roth (working for Branch Rickey) devised the first definition of the save, with the idea of creating a stat for relievers akin to the pitcher win (debunked in part 2 of this series). This was a noble intent, as relievers were very underappreciated relative to starting pitchers at the time and a stat such as this could help when it came to contract and salary negotiations.

For the next 25 years the criteria for the stat would be tweaked and changed every few years as baseball tinkered to find something reasonable and useful. Holtzman redefined the stat with more stringent criteria in 1959 and the American League tried a tweaked version of Holtzman’s stat on a one-year trial in 1964 before abandoning it. After another revision the save became official across MLB in 1969 as part of an overhaul of the pitching rules following the 1968 “Year of the Pitcher.” The 1969 version was a broader version than Holtzman’s that awarded a save for any pitcher other than the winning pitcher that finished the game with the lead, regardless of the margin of victory. This was changed again in 1974 to be much more stringent. That change was deemed to have gone too far and finally in 1975 the scoring rules committee revised the save rule to be what we recognize today.

A pitcher is credited with a save when:

  • 1) A pitcher is the finishing pitcher in a game won by his club; and
  • 2) He is not the winning pitcher; and
  • 3) He qualifies under one of the following conditions:
  • 3a) He enters the game with a lead of no more than three runs and pitches for at least one inning; or
  • 3b) He enters the game, regardless of the count, with the potential tying run either on base, or at bat, or on deck; that is, the potential tying run is either already on base or is one of the first two batsmen he faces; or
  • 3c) He pitches effectively for at least three innings. No more than one save may be credited in each game.

The main adjustment in 1975 was specifying a reliever would have to face the potential tying or winning runs, or pitch at least three innings to preserve a lead. This improved the definition and eliminated the craziness of a save being awarded in an 8-run game, but it didn’t solve the issues with the stat.
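To make the three-part test concrete, here is a minimal Python sketch of the rule as quoted above. The ReliefAppearance fields and the earns_save function are names invented for this illustration, not an official scorer's tool. The sketch assumes that a "lead" means at least one run and that the scorer's judgment of "effectively" in 3c is passed in as a simple flag.

```python
from dataclasses import dataclass


@dataclass
class ReliefAppearance:
    """One relief outing. All field names are hypothetical, chosen for this sketch."""
    finished_game: bool        # he was his club's last pitcher
    team_won: bool
    is_winning_pitcher: bool
    lead_on_entry: int         # runs his club led by when he entered
    runners_on_entry: int      # runners on base when he entered (0-3)
    outs_recorded: int         # 3 outs = 1 inning
    pitched_effectively: bool = True  # scorer's judgment, only used for 3c


def earns_save(app: ReliefAppearance) -> bool:
    # Conditions 1 and 2: finished a win without being the winning pitcher.
    if not (app.finished_game and app.team_won) or app.is_winning_pitcher:
        return False

    # 3a: lead of one to three runs and at least one full inning.
    rule_3a = 1 <= app.lead_on_entry <= 3 and app.outs_recorded >= 3

    # 3b: tying run on base, at bat, or on deck. Each runner, the batter,
    # and the on-deck hitter count, so the lead can be at most runners + 2.
    rule_3b = 1 <= app.lead_on_entry <= app.runners_on_entry + 2

    # 3c: at least three effective innings, whatever the margin.
    rule_3c = app.pitched_effectively and app.outs_recorded >= 9

    return rule_3a or rule_3b or rule_3c


# A closer protecting a three-run lead for one inning qualifies under 3a.
print(earns_save(ReliefAppearance(True, True, False, 3, 0, 3)))   # True
# Two innings with a five-run lead and the bases empty does not qualify.
print(earns_save(ReliefAppearance(True, True, False, 5, 0, 6)))   # False
# Three effective innings with a ten-run lead qualifies under 3c.
print(earns_save(ReliefAppearance(True, True, False, 10, 0, 9)))  # True
```

Notice that nothing in the function looks at how many runs, hits, or walks the pitcher allowed under 3a or 3b. Only the entry situation and the final result matter.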

The issues with saves

Having a statistic (or suite of statistics) to measure the valuable contributions of relievers makes sense, especially as relievers have become a larger and more impactful part of the game. However, there are a few issues with the stat:

  • The criteria are arbitrary and context dependent
  • It devalues relief appearances that do not result in saves
  • It is not an effective measure of individual performance

Beyond these issues, the save has also altered baseball in two major ways:

  • It has influenced managerial decision making negatively
  • It has created a subjective distinction between closers and non-closers that has economic ramifications for non-closers

Arbitrary criteria and context dependence

The criteria to earn a save make it perhaps the most context dependent basic stat we have in use today. The problem is, that dependence allows for a wide variety of pitcher outings to earn a save. Let’s illustrate with some examples.

On May 28, 2005 against Toronto, Twins closer Joe Nathan entered the game in the top of the 9th with a three-run lead. After retiring the leadoff batter, he would proceed to allow three straight hits and two runs before closing out the inning to secure a 4-3 win for the Twins. He was awarded a save for his efforts but did not pitch well.

On July 27, 2005 against the Yankees, reliever Juan Rincon was melting down in the bottom of the 8th. The Twins had entered the inning with a 6-0 lead, but Rincon had allowed three runs and runners were on the corners with 2 outs. Nathan entered to try to put out the fire. He retired four of five batters faced – Bernie Williams, Derek Jeter, Robinson Cano, Gary Sheffield, and Alex Rodriguez – to earn the save.

Both outings resulted in saves for Nathan and Twins wins. But these were clearly two different appearances. Nathan’s performance was different, and the level of difficulty was different. The save says they were equivalent.

Further convoluting this is criterion 3c – “he pitches effectively for at least three innings.” The other parts of the rule seem to suggest the save is intended to measure “clutch” pitching late in games with close scores. But then 3c is thrown in and muddies that intent. Take, for example, then-rookie Francisco Liriano earning a save for three scoreless innings in a 15-5 Twins win over the Rangers on May 9, 2006. Then compare that with Liriano closing out a 6-1 Twins win with two scoreless innings just a few days earlier on May 3, 2006 against Kansas City.

One outing earned a save, and the other did not. At no point during either was the score margin less than 5 in the Twins’ favor and Liriano was effective in both. Somehow, a third inning in a ten-run game equates Liriano’s contributions to Nathan’s above? In what method of thinking are those appearances worthy of the same statistic? How does the save tell us anything about how the players performed?

De-valuing other relief contributions

The save also says nothing of the contributions of relief pitchers earlier in the game. In the example above against Toronto, Juan Rincon threw a scoreless, perfect 8th before Nathan’s messy save in the 9th. Rincon gets nothing in the line score, but we must wonder which player, Rincon or Nathan, contributed more to the Twins winning that game?

Or take Game 1 of the 1991 World Series. The Twins took a 5-1 lead into the 8th inning when a tiring Jack Morris walked the first two batters and was replaced by reliever Mark Guthrie. Guthrie would coax a double play before walking the next hitter. With runners on the corners, two outs, and a 4-run lead Twins manager Tom Kelly summoned closer Rick Aguilera (42 saves, 2.35 ERA) to put out the fire. Aguilera would allow one of the inherited runners to score before getting the next 4 outs to close out a 5-2 Twins win.

Guthrie objectively faced the most difficult situation in this game, yet he is forgotten while Morris gets the win and Aguilera a save. If the save was intended to track valuable relief pitching contributions, it is leaving out quite a few.

Poor measure of individual performance

We have been conditioned over time to believe high save totals are a mark of a high performing pitcher, but the fact is it just isn’t very informative to that end. In part 2 of this series I explained how the most objective way to evaluate pitching performance is with defense independent pitching stats: strikeouts, walks and home runs allowed, and fielding independent pitching (FIP).

I pulled the data of all pitchers in the past five seasons who have earned ten or more saves in a season (199 pitcher seasons) and ran correlation analyses of their save totals against their strikeout rates (K/9), walk rates (BB/9), homerun rates (HR/9), and FIP. Correlation helps us understand the relationship between two sets of data and is a quick way to assess if one “predicts” the other. The closer the correlation is to positive or negative 1.0, the stronger the relationship of the data. We find in this data that save totals are not correlated well with any of these pitching performance metrics.

The strongest relationship of the four exists between Saves and FIP, but it’s weak at -0.328. The negative relationships indicate the stats move in opposite directions, which makes logical sense as FIP, HR/9, and BB/9 are better when smaller whereas save totals are better when larger.
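For readers who want to run this kind of check themselves, here is a minimal Python sketch of a Pearson correlation, the standard measure that ranges from -1.0 to 1.0. I am assuming that is the type of correlation used above. The pearson_r function name and the six sample seasons are made up for illustration and are not the 199 pitcher seasons from the analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean (unscaled covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations (unscaled std devs).
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (spread_x * spread_y)


# Hypothetical sample: save totals and FIP for six invented pitcher seasons.
saves = [10, 15, 22, 30, 38, 45]
fip = [4.4, 3.1, 4.0, 3.6, 2.9, 3.3]
print(round(pearson_r(saves, fip), 3))
```

A value near zero would say save totals tell us little about FIP, while a value near -1.0 would say that more saves reliably go with a lower (better) FIP.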

If the math doesn’t convince you, let me add that baseball is also littered with examples of poor performing pitchers accumulating large save totals. In 1984, Twins closer Ron Davis tied for 8th in MLB with 29 saves. By objective measures of pitching performance, though, Davis had a below average season – walking 4.4 batters per 9 innings (3.18 MLB average) and posting a 4.26 FIP (3.81 MLB average). By comprehensive measures, like Wins Above Replacement (WAR – which we’ll cover in a future part of this series), Davis performed below replacement level in 1984 (-0.9 bWAR) meaning the Twins would have been better off with a generic AAA call-up filling Davis’ innings.

The disastrous 2011 season gives another example, with Matt Capps (15 saves) and Joe Nathan (14 saves), who was returning from Tommy John surgery, splitting closing duties at different points. Neither pitched well despite those 29 total saves and the Twins cratered to an American League-worst 63-99 record. Capps posted a 4.75 FIP and just 4.66 K/9. Nathan was only slightly better at 4.28 FIP and both players were below replacement level by Fangraphs’ version of WAR.

The lasting impacts of the Save

Beyond not giving us anything useful regarding performance, the save has dramatically impacted managers’ decision making for using relievers. Increasing bullpen usage has been a trend for more than one hundred years. Both the number of innings covered by relievers and the number of relievers used to cover those innings have increased steadily, leading to the average length of a relief outing decreasing.

The 2019 season was the fifth consecutive year in which relievers set a new high for percentage of innings pitched – just over 42%. Analytics has played a role in this trend, but it’s also the result of a biased conventional wisdom that has influenced managers to manage to the save statistic.

Pitching changes are one of the few decisions managers make that provide an observable cause and effect within the game. Along with setting the lineup, they are the most scrutinized and second-guessed decisions a manager makes. As more pitching change decisions needed to be made, managers naturally sought mental frameworks to help them navigate these difficult choices and to explain their rationale to the postgame media. What emerged is the modern “closer” as a 9th-inning, three-out, close-game specialist. In some ways this is due to human instinct for self-preservation. It’s hard to second-guess a manager who has his closer on the mound for the final three outs of a game. Even if the lead is blown in the 9th with the closer on the mound, no one will direct their anger at the manager. That trend then produced the 8th-inning three-out specialist, which led to the 7th-inning three-out specialist. This is the approach the Royals used in the recent past with Kelvin Herrera, Wade Davis, and Greg Holland. In time these defined bullpen roles made reliever decision making formulaic for managers.

A color by numbers approach

The Twins have contributed to the broader trend of shorter relief stints and formulaic decision making too. The franchise leader list for saves includes names like 1960s stopper Al Worthington, Jeff Reardon in the late 1980s, Rick Aguilera in the 1990s, Eddie Guardado and Joe Nathan in the 2000s, and southpaw Glen Perkins in the 2010s. If you go back even further to the 1920s and 1930s, the franchise deployed Firpo Marberry, who might have been the first closer in baseball history. In that different era of gameplay, Marberry led the league in saves in six seasons while also starting an average of 14 games in those seasons.

Below, I’m showing a few data points for the more recent pitchers that give us some insight into how they were utilized – namely, the average length of their appearances (IP/App), the frequency with which they pitched in multiple innings (% Multiple IP), the frequency with which they entered the game with runners already on base (% ROB), the frequency with which they entered the game in the 8th inning or earlier (% <=8th), and how often they were deployed in tied or trailing game situations (% Tie or Trailing). I’ll note I only included full seasons where these players closed for the Twins.

These data paint a clear picture of shorter appearances over time, a growing preference for the closer entering the game in “clean” situations, like with the bases empty to start the 9th inning, and a trend of holding the closer to work mostly in games where the Twins already had the lead. The takeaway is managers have evolved to let their reliever decision making be dictated by the score and inning formula instead of critically examining how to use their best relief arms in the most critical game situations.
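For anyone wanting to build the same usage profile for another reliever, here is a rough Python sketch of how those five measures could be computed from a log of appearances. The Appearance fields and the usage_profile function are hypothetical names, and the four sample outings are invented. I am also assuming a "multiple inning" appearance means more than three outs recorded, which may not match the exact definition behind the numbers above.

```python
from dataclasses import dataclass


@dataclass
class Appearance:
    """One relief outing. Field names are hypothetical, chosen for this sketch."""
    outs_recorded: int        # 3 outs = 1 inning
    inning_entered: int
    runners_on_at_entry: int
    lead_at_entry: int        # positive = ahead, 0 = tied, negative = trailing


def usage_profile(apps):
    n = len(apps)
    if n == 0:
        raise ValueError("need at least one appearance")

    def pct(count):
        return 100.0 * count / n

    return {
        "IP/App": sum(a.outs_recorded for a in apps) / 3 / n,
        # Assumption: "multiple innings" means more than three outs recorded.
        "% Multiple IP": pct(sum(a.outs_recorded > 3 for a in apps)),
        "% ROB": pct(sum(a.runners_on_at_entry > 0 for a in apps)),
        "% <=8th": pct(sum(a.inning_entered <= 8 for a in apps)),
        "% Tie or Trailing": pct(sum(a.lead_at_entry <= 0 for a in apps)),
    }


# Four invented outings, purely for illustration.
sample = [
    Appearance(outs_recorded=3, inning_entered=9, runners_on_at_entry=0, lead_at_entry=1),
    Appearance(outs_recorded=4, inning_entered=8, runners_on_at_entry=2, lead_at_entry=3),
    Appearance(outs_recorded=3, inning_entered=9, runners_on_at_entry=0, lead_at_entry=0),
    Appearance(outs_recorded=6, inning_entered=8, runners_on_at_entry=0, lead_at_entry=2),
]
for name, value in usage_profile(sample).items():
    print(f"{name}: {value:.2f}")
```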

A product of opportunity

A byproduct of this evolution is the now accepted distinction of the 9th inning over previous innings and of closers over non-closers. This distinction has created an economic disparity for relief pitchers where players with high save totals get paid and those without do not. The economic system of the game is very rooted in traditional statistics and a player fortunate enough to accumulate saves is positioned to make significantly more money in free agency or arbitration than a reliever who does not accumulate saves.

Take, for example, Twins fan favorite Eddie Guardado. After several years as a solid, but unheralded, middle reliever / left-handed matchup guy, “Everyday Eddie” was given the chance to close games at the end of the 2001 season. In 2002 and 2003 he went on to accumulate 45 and 41 saves respectively for the Twins. These seasons netted him the only two All-Star appearances of his career and a $13 million free agent contract with Seattle. Through 11 major league seasons with the Minnesota organization to that point Guardado had been paid just shy of $9.5M total.

Guardado should get credit for capitalizing on the opportunity. He performed when given the chance and was rewarded – but it’s a case that illustrates how the game’s collective emphasis on saves obfuscates the relative value of relief contributions and prioritizes the ninth inning as more important than others.

A better approach: Win Probability and Leverage

So, how do we improve from here? The save has failed. What should we use instead?

First, we need to recognize and accept that the ninth inning is not always the most important and impactful set of outs in a game. While the formulaic approach to bullpen management might have made it harder to second-guess decisions, it also deceived managers into making more than a few poor decisions by saving their best relief arm for the 9th. Often, the most critical situations occur in earlier innings, like the 2004 ALDS example I highlighted in this post’s introduction. We’ve long known this to be true. Even Mariners manager Bob Melvin said, back in 2004 after Eddie Guardado signed with Seattle, “He’s a guy who has proven he can get both left-handers and right-handers out at the end of a game and, typically, setup guys get bigger outs than closers.” But managers don’t always act on this knowledge.

Advanced analytics can help us understand game context and identify the criticality of different situations to aid that decision making. We’ve discussed in depth the concept of run expectancy in this series. The foundations of that concept can be extended to the idea of win expectancy (also called win probability), which is the percent chance a team will win based on the score, inning, base/out state, and the overall run environment. Like run expectancy, win expectancy is calculated using real historical data. For any given game situation, we know how often teams in that situation have ultimately won the game. Often the flow of a game and the associated win expectancy for each situation are depicted in a chart like this one of Game 4 of the 2004 ALDS:

Each bar on the chart is a game situation and the height of the bar indicates the probability of each team winning the game given that situation. The situation pointed out by the gold arrow above is the Rincon-Sierra matchup and shows the Twins were 85% to win before Sierra batted.
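The historical lookup behind a win expectancy chart can be sketched in a few lines of Python. This is a simplified illustration, not how any particular site builds its tables: the build_win_expectancy function, the shape of the state tuple, and the 20 sample observations are all assumptions, and a real table would also account for the run environment.

```python
from collections import defaultdict


def build_win_expectancy(observations):
    """Turn (state, home_team_won) pairs into a state -> home win rate table.

    A state here is any hashable description of the situation, for example
    (inning, half, outs, runners, home_lead). That layout is hypothetical.
    """
    tally = defaultdict(lambda: [0, 0])  # state -> [home wins, times seen]
    for state, home_team_won in observations:
        tally[state][1] += 1
        if home_team_won:
            tally[state][0] += 1
    return {state: wins / seen for state, (wins, seen) in tally.items()}


# Invented history: the same situation seen 20 times, home team won 17.
state = (8, "top", 1, "corners", 3)
history = [(state, True)] * 17 + [(state, False)] * 3
table = build_win_expectancy(history)
print(table[state])  # 0.85
```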

From this information we can use another advanced metric called Leverage Index (LI) that tells us how critical a situation is. Simply put, leverage index is a measure of the possible changes in win expectancy that can occur in a plate appearance. The larger the possible swings in win expectancy, the higher the leverage of the situation. It is shown as a single number, with the average being 1.0. Anything above 2.0 is considered high leverage and indicates the potential outcomes of that situation could have twice the impact of an average play on the game’s outcome. When Sierra came to the plate against Rincon, the leverage index was 2.28. This is exactly the thing to evaluate and identify when the game hangs in the balance and it gives us an objective way to assess how well our manager is utilizing his bullpen weapons. Sierra’s home run to tie the game yielded a 30-point swing in win expectancy for the Yankees – increasing their win probability to 45% – the most of any play in that game. Objectively, this was the point at which Gardy should have turned to Joe Nathan (if not earlier still).
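One simplified way to read that description is as a ratio: the expected size of the win expectancy swing in this plate appearance, divided by the expected swing in an average plate appearance. The Python sketch below follows that reading. The leverage_index function, the outcome probabilities, and the average_swing value are all hypothetical stand-ins, so treat it as an illustration of the idea and not as the published formula.

```python
def leverage_index(we_before, outcomes, average_swing):
    """Expected absolute win expectancy swing, relative to an average situation.

    we_before:     win expectancy before the plate appearance (0 to 1)
    outcomes:      list of (probability, win expectancy after) pairs
    average_swing: expected swing in an average situation (assumed input)
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    if average_swing <= 0:
        raise ValueError("average_swing must be positive")
    expected_swing = sum(p * abs(we_after - we_before) for p, we_after in outcomes)
    return expected_swing / average_swing


# Invented example: a coin-flip plate appearance that moves win expectancy
# ten points either way, in a league where the average swing is five points.
li = leverage_index(0.50, [(0.5, 0.60), (0.5, 0.40)], average_swing=0.05)
print(round(li, 2))  # 2.0
```

Under this reading, an average situation scores 1.0 by construction, which matches the scale described above.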

If you’ve followed this series so far you can probably guess that we can also use these concepts at an individual level. For pitchers, several slices of leverage index are commonly tracked – at his entry into the game, when he exits the game, and the average for all events while he is in the game are a few. Over the course of a season or a career this is a good indicator of how a reliever is used. We want to see our best relievers used in high leverage situations regardless of the inning in which that usage occurs. What’s interesting is research about leverage index has shown the ninth inning to be less critical and tense than earlier innings on average.

We can also measure a player’s performance in terms of his contributions to the final outcome (a similar concept to RE24) with the advanced stat, win probability added (WPA). It captures the change in win expectancy from one plate appearance to the next and credits or debits the players based on how much their action increased their team’s odds of winning. In the case of Juan Rincon against Ruben Sierra – Sierra was credited with 0.30 WPA for his homerun, and Rincon was charged with -0.30 WPA.
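The bookkeeping is simple enough to sketch in Python. The wpa_for_play function name is made up for this example. The inputs are the batting team’s win expectancy before and after the plate appearance, using the Yankees’ 15% and 45% from the Sierra at-bat.

```python
def wpa_for_play(batting_team_we_before, batting_team_we_after):
    """Credit the batter and debit the pitcher for one plate appearance."""
    swing = batting_team_we_after - batting_team_we_before
    # Rounded to three decimals to hide floating point noise.
    return {"batter": round(swing, 3), "pitcher": round(-swing, 3)}


# Sierra vs. Rincon: the Yankees go from 15% to 45% to win.
print(wpa_for_play(0.15, 0.45))  # {'batter': 0.3, 'pitcher': -0.3}
```

The two credits always sum to zero, so whatever the batter gains on a play, the pitcher loses, and vice versa.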

WPA is context dependent so it’s helpful to think of it as a story-telling stat and not a prediction stat. It highlights the big plays in a game and the players who contributed most to a win or loss. It’s arguably the best way to evaluate relievers. WPA is also a counting stat meaning players accumulate it over the course of a season and those with more playing time will have more opportunities to accrue WPA, if they perform well. Over the course of a season, an average performing player will accrue around 1.0 WPA. The upper end of the range is around 6.0 in most seasons.

Turning a Corner

To wrap up this lesson, I want to express some optimism for the future of bullpen management. Recent seasons have started to show a shift in thinking across MLB toward using your best relievers in the highest leverage situations. The 2016 stretch run and postseason broke the dam with Terry Francona’s aggressive use of Andrew Miller early in games for multiple innings. The 2019 postseason went a step further with the Nationals’ use of their starting pitchers (their best pitchers, period) out of the bullpen. There are some constraints and barriers to operating this way over the course of 162 games – affording adequate rest for pitchers as an example – but deploying your best relievers in the highest leverage situations as often as possible works as a best practice.

The Twins are part of this shift as evidenced by Rocco Baldelli’s approach with Taylor Rogers in 2019. Yes, Rogers saved 30 games in 60 appearances as the Twins closer – but he also averaged more than an inning per appearance, made 17 multi-inning appearances (28%), and entered the game in the 8th inning or earlier in half of his appearances. Add that all up and Rogers’ average leverage index was 2.15 – the 2nd highest among all relievers and the 2nd highest total for a Twins pitcher ever, behind Rick Aguilera in 1998. He delivered too – his 3.51 WPA was 6th best among relievers in all of baseball and the highest for the Twins since Joe Nathan in 2009.

Lesson Takeaways

  • Saves are very context dependent and awarded based on an arbitrary definition.
  • Saves are not an effective measure of individual performance.
  • The save has created a subjective perception that 9th-inning relief performances are more valuable and more difficult than earlier innings.
  • This perception has created a disparity between “closers” and other relievers that is not based in objective data.
  • Baseball has a century-long trend of using more relievers in shorter stints that has led to a widespread formulaic approach to bullpen decision making.
  • Leverage Index (LI) is an advanced stat that tells us how critical a situation is. High leverage situations occur at different points in the game (not just the 9th inning). We want to deploy our best relievers in the highest leverage situations as often as possible.
  • Win Probability Added (WPA) is an advanced stat that measures a player’s performance in terms of his contributions to the outcome of the game (similar to RE24 for runs).

Test Your Knowledge: Five Quiz Questions

Test your knowledge with the questions below. The answers follow the questions:

#1: The save first became an official statistic in 1969 and went through multiple definition changes before being finalized as we know it today in 1975. True or False?

#2: By today’s rules, a player can earn a save by throwing three scoreless innings in a game that is won by his team with a margin of 9 runs. True or False?

#3: Leverage Index tells us how critical a situation is. Above what threshold is a situation considered “high leverage”?

#4: The ninth inning is always the highest leverage situation. True or False?

#5: Determine the Win Probability Added for the pitcher and the batter in the following scenario:

  • Max Kepler comes to the plate facing Trevor Bauer to lead off the game. The Twins win probability is 50% at the beginning of the plate appearance. Kepler hits a solo homerun. The Twins win probability increases to 59%.

Answers:

#1: True

#2: True

#3: 2.0 and above

#4: False

#5: Kepler: +0.09 WPA; Bauer: -0.09 WPA.


References:

The data sources are cited throughout this post. Like others who have tried to write and explain these subjects before, I relied significantly on the following resources:

  • Book: Smart Baseball by Keith Law
  • Book: The Book – Playing the Percentages in Baseball by Tom Tango, Mitchel Lichtman, and Andrew Dolphin
  • Book: The Hidden Game of Baseball: A Revolutionary Approach to Baseball and Its Statistics by David Reuther, John Thorn and Pete Palmer
  • Fangraphs’ indispensable library: library.fangraphs.com
  • MLB’s glossary: mlb.com/glossary
  • Baseball-reference.com

John is a contributor to Twinkie Town with an emphasis on analytics. He is a lifelong Twins fan and former college pitcher. You can follow him on Twitter @JohnFoley_21.