This is the third lesson in the series Twinkie Town Analytics Fundamentals. For more information on this series, what I hope to achieve by doing it, and the topics that will be covered, please take a look at the series Introduction. If there are topics you’d like explored in this series, please leave a comment and let me know!
Previously:
Part 1: The Flaws of Batting Average
Part 2: Busting the Myths of Pitcher Wins and ERA
The story of the 1987 Twins is one that is well known by readers of this site. Unexpected World Champions. A roster filled with players who would become franchise icons – Puckett, Viola, Blyleven, Hrbek, TK, and more. What many may not realize about this storied season is that the team was led in runs batted in (RBI) for the second season in a row by third baseman and usual fifth-place hitter Gary Gaetti, with 109 (he had 108 RBI in 1986). Gaetti finished seventh overall in the American League in RBIs, in addition to taking home a Gold Glove for his work at third and being named the Most Valuable Player of the American League Championship Series. Glancing quickly at those facts, we might conclude Gaetti had a very good 1987 season. After all, he was the team’s leading “run producer” and knocking in more than 100 runs is no small feat. We might even credit Gaetti for having a knack for driving in runs.
Throughout baseball history we’ve been led to believe 100 RBIs is a mark of a great season. But is that true? This question makes 1987 the perfect place from which to launch into our third analytics fundamentals lesson.
Runs Batted In – A Misleading Statistic
If you’re like many baseball fans, baseball cards were one of the primary ways you became familiar with baseball statistics. On the back of each card were a player’s traditional statistics – primarily batting average, runs scored, home runs, runs batted in (RBI), and stolen bases, among a few others. We most often condensed those to just batting average, home runs, and RBI – collectively the “triple crown statistics”. It was common to view a hitter’s value through the lens of only these three statistics. But there are some significant drawbacks to that approach. We’ve already tackled the issues that come with batting average in Part 1. Today, let’s dig deeper on RBIs.
By Major League Baseball’s definition, an RBI is a stat “a batter is credited with in most cases where the result of his plate appearance is a run being scored.” They go on to explain that there are a few exceptions, however. A player does not receive an RBI when the run scores as a result of an error or when he grounds into a double play. The most common examples of RBIs are run-scoring hits. However, players also receive an RBI for a bases-loaded walk or hit by pitch. Players can earn RBIs when they make outs as well, provided the out results in a run or runs.
Relative to other traditional stats like batting average and home runs, the RBI is a young statistic. It was first introduced by New York Press writer Ernie Lanigan and became an official statistic in 1920. Over the last one hundred years, the statistic has become mainstream and is widely used in valuing players, assessing performance, and determining awards. The use of RBIs as a measure of value is so prevalent that about 32.5% of all Most Valuable Player awards (first awarded in 1931) have been given to the player who led his league in RBIs. One of those was the Twins’ Harmon Killebrew, in 1969. A Twin has led the American League in RBIs five times: Killebrew in 1962, 1969, and 1971; Larry Hisle in 1977; and Kirby Puckett in 1994. Despite RBI being used as a marquee statistic, some baseball thought leaders expressed doubts. Perhaps the most notable early voice was legendary General Manager and serial baseball innovator Branch Rickey, who in a well-known 1954 Life Magazine article titled Goodby to Some Old Baseball Ideas, called runs batted in “a misleading figure” and wrote “as a statistic, RBIs were not only misleading but dishonest. They depended on managerial control, a hitter’s position in the batting order, park dimensions, and the success of his teammates in getting on base ahead of him.” Those are strong points of dissent from one of baseball’s leading thinkers at the time. Are these reservations about RBIs merited?
The Issues with Runs Batted In
On the surface, we might not see much wrong with runs batted in as a measure. After all, baseball is a game about scoring more runs than your opponent, so how could a stat that counts who drives in a run be flawed? In truth, there are four major issues with RBIs.
The first of these is that the RBI treats scoring a run as an individual activity and gives full credit for it to an individual player. Much like we discussed in Part 2 about pitcher wins, RBIs confuse a team outcome with an individual outcome. The fact is, scoring a run (outside of a solo home run) is primarily a team outcome because it most often requires more than one hitter to occur. Let’s look at the hypothetical example below, using the 1987 Twins:
- Kirby Puckett leads off the inning and reaches base safely on a single
- Kent Hrbek singles to right field; Puckett reaches third base safely
- Gary Gaetti hits a chopping ground ball to the shortstop; Hrbek is forced out at second base; Gaetti beats out the relay throw to reach first base safely - fielder’s choice; Puckett scores
It took contributions from each of the three players for that run to score. In the scorebook, Puckett is awarded a run scored for touching home plate and Gaetti is awarded an RBI for being the batter whose outcome resulted in the run scoring. The Twins score a run as a team.
But wait, that illustrates something strange. At an individual level baseball has this weird accounting method that counts “runs” on offense twice. The player who touches home plate is awarded a run scored and the player who drives him in gets a run batted in (except on a double play). That’s two runs counted for the players, but only one for the team. That’s odd.
Further, this approach says nothing of the contributions of players who helped that run score for the team, but who did not score or drive in the run individually. What about Hrbek in the example above? This is the second issue with RBIs. In the traditional accounting of runs and RBIs, Hrbek is awarded nothing more than his single even though Puckett advanced more bases (2) as a result of his hit than on any other play in the sequence. Gaetti gets full credit for the run being scored but he contributed comparatively less than Hrbek to the overall team outcome. How does that make sense? This approach furthers the (wrong) impression that Gaetti alone was responsible for that run scoring and contributes to the misconception that RBIs are a good measure of an individual’s contributions to helping the team score runs. It’s not necessarily that RBIs are bad. After all, a run scored and that’s a good thing. But the way they are counted and (mis)applied makes them an incomplete measure of an individual.
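To make the bookkeeping concrete, here is a small Python sketch of the hypothetical inning above. The event list and its field names are my own illustration, not an official scoring format. It simply tallies runs scored and RBIs the traditional way so the double counting, and Hrbek's empty line, are easy to see.

```python
from collections import Counter

# Hypothetical inning from the example above. Each entry records who batted,
# who (if anyone) touched home plate, and whether the batter earned an RBI.
# The dictionary keys are illustrative, not an official scoring format.
inning = [
    {"batter": "Puckett", "scored": [], "rbi": 0},          # single
    {"batter": "Hrbek", "scored": [], "rbi": 0},            # single, Puckett to third
    {"batter": "Gaetti", "scored": ["Puckett"], "rbi": 1},  # fielder's choice, run scores
]

runs_scored = Counter()
runs_batted_in = Counter()
for play in inning:
    for runner in play["scored"]:
        runs_scored[runner] += 1
    runs_batted_in[play["batter"]] += play["rbi"]

# The team scored once, but the individual ledger shows two "run" credits.
team_runs = sum(runs_scored.values())
individual_credits = sum(runs_scored.values()) + sum(runs_batted_in.values())
hrbek_credits = runs_scored["Hrbek"] + runs_batted_in["Hrbek"]

print("Team runs:", team_runs)
print("Individual run credits (R + RBI):", individual_credits)
print("Hrbek's run credits:", hrbek_credits)
```

One team run turns into two individual credits, and the hitter who moved the runner two bases gets none of them.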
The third and fourth issues with RBIs were identified by Rickey in the quote above and deal with a hitter’s position in the lineup and the success of his teammates in getting on base. There has been a lot of historical, empirically based research completed on optimizing batting orders. The impact of different lineup spots was comprehensively analyzed in The Book, by Tom Tango, Mitchel Lichtman, and Andrew Dolphin. According to their research of the 1999-2002 seasons, there are a couple of key takeaways about batting orders. The first is the raw number of plate appearances from each lineup slot (1 through 9) over the course of a season decreases as you move down the lineup. The second is the likelihood of each lineup spot coming to bat with runners on base peaks in the middle of the lineup. What they found is each subsequent spot in the order will receive about 2.6% fewer plate appearances than the one above it. Over the course of a full season this works out to about 18 to 21 plate appearances from one spot to another (see third column of table below). This can add up to a dramatic difference in plate appearances when comparing, say, the second and seventh spots in the lineup. Further, they found the first (36% of PAs) and second (44% of PAs) spots of the lineup are least likely to come to bat with runners on base, and the third (48%), fourth (51%), and fifth (48%) spots are most likely (see fifth column).
[Table: Batting Order Impacts]
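As a rough illustration of the plate appearance effect, the sketch below applies the "about 2.6% fewer than the spot above" rule down a lineup. The leadoff total of 760 plate appearances is a hypothetical number I picked for the example, not a figure from The Book, and the exact gaps between spots will shift with whatever starting total you assume.

```python
# Assumption: the leadoff spot gets 760 plate appearances in a season.
# That number is hypothetical, chosen only to illustrate the 2.6% rule.
LEADOFF_PA = 760
DROP_PER_SLOT = 0.026  # each spot gets about 2.6% fewer PA than the one above


def plate_appearances_by_slot(leadoff_pa=LEADOFF_PA, drop=DROP_PER_SLOT):
    """Return estimated season plate appearances for lineup slots 1 through 9."""
    return [leadoff_pa * (1 - drop) ** slot for slot in range(9)]


pa = plate_appearances_by_slot()
for slot, value in enumerate(pa, start=1):
    print(f"Slot {slot}: {value:.0f} PA")

# The gap compounds when comparing spots that are far apart.
print(f"Second vs. seventh spot: {pa[1] - pa[6]:.0f} PA")
```

The point is less the specific numbers than the shape: small per-spot differences stack up over five or six lineup positions.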
The data above is aggregated across all teams and all players from 1999 to 2002, which somewhat hides Rickey’s point about the on base skill of the other players in the lineup. Let’s go back to 1987 and Gary Gaetti. Of Gaetti’s 154 games played in 1987, he started 149 of them batting fourth (55) or fifth (94) in the Twins lineup. Kirby Puckett batted third 142 times. Kent Hrbek batted fourth 93 times. Puckett and Hrbek had excellent seasons, both getting on base at high clips – Hrbek (.389 OBP, 11th AL), Puckett (.367, 25th AL). The net result for Gaetti was he led the Twins, and was 16th in the American League, in baserunners aboard when he came to bat. His season total was 434 baserunners, which averages out to 2.82 baserunners per game he played. That total was 36 more than the next closest Twin (Puckett) and 65 more than Tom Brunansky, who finished third on the club.
What this tells us is RBIs are context dependent and they are influenced by opportunity. As Rickey observed, a player’s position in the lineup and the on base ability of the players that bat in front of them drive the number of RBI opportunities a player might get. Over the course of a season, a player who consistently bats fourth or fifth is likely to have quite a few more opportunities with runners on base. RBIs, as a counting stat, are misleading because they don’t account for the opportunity to obtain RBIs. We have lionized the 100 RBI man as a “run producer” without considering this key context.
The takeaway of all this is that, while Gaetti led the Twins in RBI with 109, we must at least recognize that he might have benefited from greater opportunity because of his place in the lineup and the quality of the players who batted in front of him. And we also must recognize RBIs are a misleading measure of individual player productivity and performance because they are measuring a team outcome and fail to account for non-run scoring contributions. Rickey called the RBI a misleading statistic because taking a player’s RBI count at face value as a measure of performance and productivity is just not very informative. So, what should we use instead?
A Crude, but Better Measure
Preferably, if your goal is to evaluate the productivity of the individual player, you should eschew RBIs and stick to the more comprehensive measures we’ve discussed previously in this series – on base percentage (OBP), slugging percentage (SLG), weighted on base average (wOBA), and a general emphasis on avoiding outs. By these measures Gary Gaetti had a decidedly average or slightly below average 1987 season:
[Table: Gary Gaetti, 1987]
Avoiding making an out is arguably the most important thing a batter can do. And Gaetti made a lot of outs in 1987. Those 470 outs made were the most on the Twins and ranked 9th most in the American League. To top it off (and driven in part by his good fortune of batting with runners on base so frequently) Gaetti also led the American League in double plays grounded into with 25.
But let’s say you’re a traditionalist and want to stick in the general genre of runs batted in for your stat of choice. A better way to look at runs batted in would be to make a simple adjustment to normalize for the opportunity to drive in runs and look at the percentage of baserunners who scored as a result of the batter’s play. Baseball-Reference.com maintains a statistic for this, called Base Runner Scored Percentage (BRS%), which divides the number of baserunners who scored (not necessarily only with an RBI) by the total number of baserunners on base when a player batted. This approach helps to correct for the opportunity disparities that plague traditional RBI counts and for the “hidden” runs, like those driven in as a result of double plays. It’s a simple approach that improves on some of the RBI’s flaws and allows us to more appropriately compare players in terms of their performance turning available baserunners into runs scored. When we use BRS%, we find that Gaetti turned 18.4% of the 434 baserunners aboard when he batted into baserunners scored, which was second on the Twins, trailing only Kirby Puckett at 19.4%. The 1987 Twins had a team BRS% of 15.6%, and the American League average was 15.3%. Both figures put Gaetti’s performance in a more favorable light in terms of driving in runs.
So, if you must use a statistic like RBIs, use BRS% instead. But be aware that it is also incomplete. We’re still not accounting for the contributions of a player whose at-bat did not result in a run being scored. Remember Hrbek’s single that moved Puckett from first base to third base in the example above? How do we more completely account for these “hidden” contributions?
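For readers who like to see the arithmetic, here is a minimal Python sketch of the BRS% calculation as described above. The function name is my own, and the count of 80 baserunners scored for Gaetti is an assumption back-calculated from the 18.4% and 434 figures in the text, not a number taken from Baseball-Reference. The two extra hitters are hypothetical.

```python
def brs_pct(runners_scored, runners_on):
    """Base Runner Scored Percentage: share of baserunners who scored."""
    if runners_on == 0:
        raise ValueError("no baserunners, so BRS% is undefined")
    return 100 * runners_scored / runners_on


# Gaetti, 1987: 434 baserunners aboard (from the text). The count of 80
# runners scored is an assumption, back-calculated from the 18.4% figure.
gaetti = brs_pct(80, 434)
print(f"Gaetti BRS%: {gaetti:.1f}")

# Two hypothetical hitters who brought home the same number of runners
# but had very different opportunity.
hitter_x = brs_pct(80, 400)
hitter_y = brs_pct(80, 500)
print(f"Hitter X BRS%: {hitter_x:.1f}")
print(f"Hitter Y BRS%: {hitter_y:.1f}")
```

The two hypothetical hitters would look identical on a raw count of runners driven home, yet one converted a noticeably larger share of his chances.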
Thinking about Runs in Fractions
Ultimately, the best approach to tackle the question of completely accounting for a player’s contributions (positive and negative) is to change our mindset from thinking about runs in integers to thinking about runs in fractions. To wrap up this lesson and set up Part 4, I want to introduce the two most basic elements of thinking about runs in fractions: base/out states and run expectancy. These ideas were born from early baseball analytics treatises – The Hidden Game of Baseball by David Reuther, John Thorn, and Pete Palmer, and The Book.
When we think about game situations in baseball, the first two pieces of data we need to know are 1) how many outs there are and 2) where the baserunners are (if any). These two pieces of data create the context in which the pitcher vs. batter matchup will occur. When we combine all possible arrangements of baserunners and outs, we find that there are 24 possible base/out states (eight arrangements of runners times three out counts). These range from bases empty and no outs to bases loaded and two outs. From the start of an inning to the end of an inning the game transitions between different base/out states as events happen.
Thanks to the nearly ubiquitous availability today of historical play by play data and the hard work of many people, we can leverage those base/out states and the real outcomes of past games to calculate an expectation for runs to be scored from each base/out state. Said more simply, we know today, based on historical data, how many runs score from each base/out state on average. This is called run expectancy and was first introduced by George Lindsey in 1963. The analysis for The Book, based on the 1999 to 2002 seasons, yielded the run expectancies below for each base/out combination:
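If the count of 24 seems arbitrary, a few lines of Python show where it comes from. This is just an enumeration sketch; the variable and function names are mine.

```python
from itertools import product

# Each base is either empty (False) or occupied (True): 2 * 2 * 2 = 8 layouts.
base_layouts = list(product([False, True], repeat=3))  # (first, second, third)
outs = [0, 1, 2]

# A base/out state pairs one layout with an out count: 8 * 3 = 24 states.
states = [(bases, out) for bases in base_layouts for out in outs]


def describe(bases, out):
    """Return a readable label such as 'runners on 1st, 3rd; 1 out'."""
    names = [name for name, occupied in zip(("1st", "2nd", "3rd"), bases) if occupied]
    runners = "runners on " + ", ".join(names) if names else "bases empty"
    return f"{runners}; {out} out"


print(len(states))
print(describe(*states[0]))
print(describe(*states[-1]))
```

The first and last entries are the two extremes mentioned above: bases empty with no outs, and bases loaded with two outs.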
[Table: Run Expectancy]
Ultimately this tells us the number of runs that can be expected to score, on average, from each combination of runners and outs. Runners on the corners with 1 out will result in 1.243 runs scoring by the end of the inning on average. Bases empty and two outs will result in 0.117 runs on average. This becomes very useful for evaluating individual players because we can use these figures to calculate how the result of a player’s plate appearance impacted the run expectation and use it as a measure of their productivity. This is the idea behind the metric RE24, a statistic that credits or debits batters and pitchers for their role in changing their team’s odds of scoring (or preventing runs) in a given inning. It is simply calculated as:
RE24 = RE End - RE Begin + Runs Scored
Let’s return to our example to illustrate. The base/out situation is runner on first (Puckett), no outs. The beginning run expectancy of this situation (for simplicity we’re using the run expectancy table above, not 1987 numbers) is 0.953 runs on average. This is the beginning expectation for Hrbek’s plate appearance. He singles and moves Puckett to third. Now the situation is runners on first and third with no outs, which has a run expectancy of 1.904. No runs have scored. Using this approach, we can calculate the run value of Hrbek’s single using the RE24 formula:
RE24 = RE End 1.904 - RE Begin 0.953 + 0 Runs Scored = 0.951
The resulting value of Hrbek’s single is a positive 0.951 runs above average. Gaetti then steps up into a situation with a run expectancy of 1.904 and grounds into the fielder’s choice, leaving a runner on first with one out and pushing across a run. His RE24 value would be found as follows:
[Figure: Gaetti’s RE24 = RE End (runner on first, one out) - RE Begin 1.904 + 1 Run Scored]
Even though the run scored, Gaetti’s result lowered the overall average run expectancy for the inning, largely because an out was made. This is how to calculate run values for individual plate appearances, and it can be extended to games, seasons, and even careers by simply adding the results for each plate appearance together.
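Here is a minimal Python sketch of that calculation, assuming RE24 takes the same form used in the bonus answer in the quiz (RE End minus RE Begin plus runs scored). The lookup table contains only the run expectancy values quoted in the text of this article, so it is deliberately incomplete, and the strikeout is a hypothetical plate appearance I added to show a negative result. The state labels and function name are my own.

```python
# Run expectancy values quoted in this article (The Book, 1999-2002).
# Keys are (runners, outs); only states mentioned in the text are included,
# so this is not the full 24-state table.
RUN_EXPECTANCY = {
    ("1st", 0): 0.953,
    ("1st and 3rd", 0): 1.904,
    ("1st and 3rd", 1): 1.243,
    ("2nd", 1): 0.725,
    ("empty", 2): 0.117,
}


def re24(begin_state, end_state, runs_scored):
    """RE24 for one plate appearance: RE end - RE begin + runs scored."""
    return RUN_EXPECTANCY[end_state] - RUN_EXPECTANCY[begin_state] + runs_scored


# Hrbek's single from the example: runner on first -> first and third, no outs.
hrbek_single = re24(("1st", 0), ("1st and 3rd", 0), 0)

# Hypothetical strikeout with runners on the corners and nobody out.
strikeout = re24(("1st and 3rd", 0), ("1st and 3rd", 1), 0)

# Totals over a game or season are just the sum of the individual values.
total = sum([hrbek_single, strikeout])

print(f"Hrbek single: {hrbek_single:+.3f}")
print(f"Strikeout:    {strikeout:+.3f}")
print(f"Total:        {total:+.3f}")
```

Running Gaetti's fielder's choice through the same function would need the run expectancy for a runner on first with one out, which comes from the full table rather than the values quoted in the text.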
Ultimately, RE24 is a measure of how well hitters are capitalizing on their opportunities while not over-weighting credit to hitters who happen to come to the plate with men on base frequently. It is a simple way to assess a player’s performance relative to the average expectations of the situation in which they were placed. The 1987 Twins were led by Hrbek with 31.92 RE24 above average – a figure that ranked 13th in the AL. Gary Gaetti finished the season at 0.59 RE24. This was just slightly above average (which is zero), 8th most on the team, and 53rd of 77 qualified AL hitters.
The ideas of runs as fractions, base/out states, and run expectancy are fundamental concepts in sabermetrics. Nearly all the advanced metrics in use today are based upon these ideas and we will build from this foundation next week in Part 4.
Lesson Takeaways
- RBIs are misleading because scoring runs is a team outcome (except for solo home runs)
- RBIs fail to account for opportunity to drive in runs, which is influenced significantly by batting order position and the on base ability of other batters in the lineup
- RBIs fail to account for valuable contributions that do not directly result in runs scoring
- Each spot in the batting order will receive about 2.6% fewer plate appearances than the spot above it over the course of a full season
- Batters who slot in the middle of the order (third through fifth) and who bat behind players with high on base abilities will receive more opportunities to bat with runners on base (and thus more RBI opportunities)
- Base Runner Scored Percentage (BRS%) is a better measure than RBIs because it normalizes for opportunity to drive in runs
- Plate Appearance outcomes (positive and negative) that do not result in runs scoring also have values and need to be accounted for to accurately assess a player’s productivity
- Thinking about runs in terms of fractions using the 24 base/out states and their associated run expectancies is the best tool we have to completely assess a player’s productivity and forms the foundation of sabermetrics
Test Your Knowledge: Five Quiz Questions
Test your knowledge with the questions below. The answer key follows the questions.
#1: There are four main issues with runs batted in. What are they?
#2: An RBI is awarded when a run scores as a result of a double play. True or False?
#3-#5: Analyze the below blind player comparisons. In each comparison, determine which player was more efficient at driving in runs. The data provided are RBIs, baserunners, and baserunners scored.
[Table: Quiz player comparisons]
BONUS QUESTION:
(Using the run expectancy table from The Book above, analyze the below scenario)
With one out and a runner on 2nd, Max Kepler doubles off the right center field wall at Target Field. The runner from second scores on the play. What is Kepler’s RE24 for the play?
Answer Key (Answers in Bold):
#1: 1) Confusing team outcome with individual outcomes; 2) Failure to account for outcomes that do not directly result in runs scoring; 3) RBI opportunities are impacted heavily by lineup positioning; 4) RBI opportunities are impacted heavily by the on base skills of other batters in the lineup
#2: FALSE
#3: A = 1991 Chili Davis (BRS%: 15.2%), B = 1991 Kent Hrbek (BRS%: 19.4%)
#4: C = 2006 Michael Cuddyer (BRS%: 18.0%), D = 2006 Justin Morneau (BRS%: 21.3%)
#5: E = 2017 Brian Dozier (BRS%: 15.1%), F = 2017 Joe Mauer (BRS%: 19.2%)
Bonus: (RE End 0.725 – RE Begin 0.725 + 1 Run Scored) = 1.0 RE24
References:
The data sources are cited throughout this post. Like others who have tried to write and explain these subjects before, I relied significantly on the following resources:
- Book: Smart Baseball by Keith Law
- Book: The Book – Playing the Percentages in Baseball by Tom Tango, Mitchel Lichtman, and Andrew Dolphin
- Book: The Hidden Game of Baseball: A Revolutionary Approach to Baseball and Its Statistics by David Reuther, John Thorn and Pete Palmer
- Goodby to Some Old Baseball Ideas, by Branch Rickey
- Fangraphs’ indispensable library: library.fangraphs.com
- MLB’s glossary: mlb.com/glossary
- Baseball-reference.com
John is a contributor to Twinkie Town with an emphasis on analytics. He is a lifelong Twins fan and former college pitcher. You can follow him on Twitter @JohnFoley_21.