

Twinkie Town Analytics Fundamentals: Measuring Defense


Part 6: Errors, Fielding Percentage, and Gold Gloves are not enough — what should we use instead?

Kirby Puckett’s reputation as a great defender is supported by his six Gold Gloves
Photo by Focus on Sport/Getty Images

You can check out all parts of the Twinkie Town Analytics Fundamentals series by clicking the tag at the top of the post. If you have other topics you’d like to be explored in this series, please leave a comment and let me know!


Kirby Puckett is a Baseball Hall of Famer and one of the most revered legends in Twins history. A career .318 hitter and core piece of the franchise’s World Series championships in 1987 and 1991, Puckett is remembered as a well-rounded ballplayer who was as productive in the field and on the bases as he was at the plate. He is the franchise’s all-time leader in games and innings played in center field, and his reputation as an outstanding fielder is supported by all the traditional measures. In 1,432 games covering 12,245 innings in center (both 25th-most all time), Puckett made only 42 errors while recording 3,853 putouts (22nd-most all time) and 110 outfield assists (9th-most all time). In part for his .990 career fielding percentage in center, Puckett was awarded six Gold Gloves (third most in franchise history behind pitcher Jim Kaat’s 11 and fellow center fielder Torii Hunter’s 7).

By these measures, Puckett stacks up with some of the all-time best on defense. His reputation as a superlative defensive outfielder was established early in his career and cemented by his leaping catch at the wall in Game 6 of the 1991 World Series.

In 1954, Branch Rickey referenced the inherent challenge of measuring defense in baseball, writing “there is nothing on earth anybody can do with fielding.” Despite that sentiment, we have made some progress on that front in the nearly 70 years since. The traditional measures we might have grown up thinking were informative have proven to be inadequate for assessing how well a defender does their primary job — turning batted balls into outs — and the awards we have used to recognize top defenders have often been misleading. The late Puckett makes an excellent case study to illustrate that evolution.

In this lesson we’ll explore what’s flawed about the traditional measures of defense, walk through the evolution to more comprehensive stats, and explain the best we have today. In the end, measuring defense accurately remains one of the most difficult riddles in baseball analysis and it is a very active area of research that will undoubtedly continue to evolve.


The Challenge of Measuring Defense

Measuring defense has always been a challenge in sports. We have developed many effective ways to track how teams score but tracking how they stop their opponents from scoring has proven to be more difficult. This is mostly because it is easier to count things that have happened, as opposed to counting (or estimating) things that were prevented.

In baseball, “defense” includes both pitching and fielding. Pitching has always offered an abundance of events that can be logged with little more than paper and pencil: balls and strikes, swings and misses, batted balls in play, and many others. The ease of recording these events enabled the development of a huge number of statistics intended to inform us about pitching performance.

The fielding of batted balls has proven less fruitful. The events that happen on the field are fewer and farther between and involve more complex interactions between the ball, defenders, and baserunners. As a result, measuring defense began with counting the few things that could be easily logged: putouts, assists, and errors.

Traditional Fielding Stats

A fielder is credited with a putout when he physically completes an out, whether by stepping on the base for a force out, tagging a runner, catching a batted ball, or catching a third strike.

Putouts are sometimes preceded by another defender fielding and throwing the ball (especially on the infield). An assist is awarded to a fielder who touches the ball before a putout is recorded by another fielder. Typically, assists are awarded to fielders when they throw the ball to another player — but a fielder receives an assist as long as he touches the ball, even if the contact was unintentional.

Both assists and putouts are easy to identify and track. They clearly happen or they don’t, and there is not any subjectivity in their definitions. But they are not a complete accounting of defense. Plays not made are also an important piece of understanding defense.

This gap in accounting helped lead to the introduction — and first of many definitions — of an “error” all the way back in 1878. An error is now defined in the official rules of baseball as a statistic charged against a fielder whose action has assisted the team on offense. Today an error is given if, in the judgment of the official scorer, a fielder fails to convert an out on a play that an average fielder should have made (among other things).

For most of baseball history, putouts, assists, and errors have been the backbone of evaluating defense.

From those, it is straightforward to add together a player’s (or team’s) putouts, assists, and errors to calculate total chances. Dividing the sum of putouts and assists by total chances then tells you how often the fielder (or team) converted those chances into outs. This simple formula gives you fielding percentage.
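To make the arithmetic concrete, here is a minimal Python sketch of fielding percentage. The function name is my own for illustration, and the inputs are Puckett's career center-field totals as quoted in the introduction.

```python
def fielding_percentage(putouts: int, assists: int, errors: int) -> float:
    """Share of total chances (putouts + assists + errors) converted into outs."""
    total_chances = putouts + assists + errors
    if total_chances == 0:
        raise ValueError("no chances recorded")
    return (putouts + assists) / total_chances


# Puckett's career center-field totals as quoted earlier in this post.
puckett = fielding_percentage(putouts=3853, assists=110, errors=42)
print(f"{puckett:.3f}")  # prints 0.990
```

Notice that only the errors argument can pull the result below 1.000; a ball the fielder never reaches does not appear anywhere in the calculation.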

Written much like batting average or on base percentage, fielding percentages tend to run between .950 and 1.000 at the Major League level. In theory, a fielding percentage closer to 1.000 is best because it indicates that the fielder (or team) is making all the plays expected of them. But is that really accurate?

The Flaws of Traditional Fielding Stats

If you followed the formula above closely, you might have noticed that errors are the only thing that can decrease fielding percentage.

Since the error rule was introduced more than a hundred years ago, it has been expanded, clarified, refined, and caveated no fewer than eight different times. As a result, the current Official Major League Baseball rulebook section on errors — Section 9.12 — now spans about five pages and more than 1,600 words of definitions and explanations. The rule begins like this:

9.12 Errors An error is a statistic charged against a fielder whose action has assisted the team on offense, as set forth in this Rule 9.12.

(a) The official scorer shall charge an error against any fielder: (1) whose misplay (fumble, muff or wild throw) prolongs the time at bat of a batter, prolongs the presence on the bases of a runner or permits a runner to advance one or more bases, unless, in the judgment of the official scorer, such fielder deliberately permits a foul fly to fall safe with a runner on third base before two are out in order that the runner on third shall not score after the catch;

Right there in the beginning we see the troublesome phrase “in the judgment of the official scorer.” This phrase shows up several times throughout the rest of the rule and makes the application of the rule and charging of errors inherently subjective. In practice, errors are fraught with inconsistency across different official scorers at different ballparks. For example:

Excerpted from Rule 9.12(a)(1) Comment: For example, the official scorer shall charge an infielder with an error when a ground ball passes to either side of such infielder if, in the official scorer’s judgment, a fielder at that position making ordinary effort would have fielded such ground ball and retired a runner.

How can we expect 30 different official scorers to judge “ordinary effort” the same way?

Surely you can recall one of your own experiences seeing a fielder fail to convert a chance and then waiting to see the official scorer’s ruling of hit or error. How often have you judged that play the same way as the scorekeeper? That determination cannot be objective. In the end, the same play in different cities has a good chance of being scored differently, which serves to tell us nothing about the quality of the defense being played across players and teams.

The second issue is that errors are only a subset of bad defensive plays. Given the definition of an error — a statistic charged against a fielder whose action has assisted the team on offense — you would expect errors to cover all manner of bad defensive plays that benefit the offense. But the expansive caveats and comments in the rule have excepted all kinds of misplays. Consider:

The official scorer shall not score mental mistakes or misjudgments as errors unless a specific rule prescribes otherwise ... The official scorer shall not charge an error if the pitcher fails to cover first base on a play, thereby allowing a batter-runner to reach first base safely. The official scorer shall not charge an error to a fielder who incorrectly throws to the wrong base on a play.

Mental mistakes, even if they serve to benefit the offense, do not count as errors? How are those anything but “errors”?

The rule goes on with several more of these kinds of confusing and inconsistent exceptions. The original intent of the error was noble — somehow accounting for plays that should have been made into outs — however, as you might surmise from a rule that requires 1,600 words of explanation, its implementation fails to meet its intent.

Another issue is that the traditional stats treat all positions on the field the same, which can make fielding percentage misleading. Twins and Senators first basemen have a collective .991 fielding percentage, while their shortstops have a collective .955 fielding percentage. By fielding percentage alone we might conclude that the first basemen have been better defenders. But is it really that simple? Different positions handle different types of chances that have different degrees of difficulty. Shouldn’t that be part of our thinking?

And finally, the biggest flaw with traditional defensive stats is their omission of plays not made. Consider the ground ball that sneaks through the infield for a hit because the shortstop was a step too slow. Or the line drive that falls in the right center field gap between the center fielder and right fielder. Putouts, assists, errors, and the resulting fielding percentage do not account for these kinds of plays at all, even though they are opportunities to turn batted balls into outs.

Wouldn’t superior defenders turn more of those kinds of plays into outs? Shouldn’t we want to consider those plays as part of our evaluation of defense? As Keith Law pointed out in his book Smart Baseball, “with fielders, however, we have to consider what didn’t happen, because part of the difference between a good fielder and a bad one is the play the good fielder makes on a ball the bad fielder never even touches.”

The traditional stats are incomplete. What can we use instead?

Early Alternatives to Fielding Percentage

Advancing beyond putouts, assists, and errors was long held back by the lack of more granular data. Today’s abundance of accessible baseball data, much of it captured by machines, is a recent development. Early analysts did not have such luxury. As a result, any improved fielding measures still had to be rooted in assists, putouts, errors, and play by play data.

One of the earliest attempts at an advance was Range Factor, developed by Bill James. It is a rate stat that attempts to estimate a player’s range by measuring how often he makes plays in the field. Much like earned run average for pitchers, Range Factor is calculated by adding together putouts and assists, multiplying the sum by 9, and dividing by innings played (or, in earlier versions, simply dividing the sum by games played).
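Here is a short Python sketch of both flavors of Range Factor. The function names are hypothetical, innings are assumed to be a plain decimal number rather than box-score notation, and the inputs are again Puckett's center-field totals from the introduction.

```python
def range_factor_per_9(putouts: int, assists: int, innings: float) -> float:
    """Plays made (putouts + assists) per nine defensive innings."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9 * (putouts + assists) / innings


def range_factor_per_game(putouts: int, assists: int, games: int) -> float:
    """Older per-game version: plays made divided by games played."""
    if games <= 0:
        raise ValueError("games must be positive")
    return (putouts + assists) / games


# Puckett's career center-field totals as quoted earlier in this post.
rf9 = range_factor_per_9(putouts=3853, assists=110, innings=12245)
rfg = range_factor_per_game(putouts=3853, assists=110, games=1432)
print(f"RF/9 = {rf9:.2f}, RF/G = {rfg:.2f}")
```

Errors do not enter the formula at all. The stat rewards a fielder for getting to the ball and recording the out, which is exactly what fielding percentage leaves out.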

To illustrate Range Factor, let’s compare the Minnesota careers of two well known Twins shortstops, Greg Gagne and Cristian Guzmán.

By raw error count and fielding percentage it looks like Guzmán may have been the better defender. With Range Factor, the numbers lean clearly in favor of Gagne. His Range Factor per 9 innings (RF/9) works out to 4.66 versus Guzmán’s 4.41.

Said more simply, Gagne made 0.25 more plays per nine innings than Guzmán. In terms of turning batted balls into outs — the primary function of defense — Gagne was the superior defender because he got to more batted balls.

Range Factor was a step in the right direction but it still had flaws. Notably, it lacked context. The shortstop playing behind a ground ball generating pitching staff (more balls in play) would probably grade out better by Range Factor than the one behind a strikeout generating pitching staff (fewer balls in play).

Another approach, Total Zone, was created by Sean Smith and is used by Baseball-Reference. It leveraged historical play by play data cataloged by Retrosheet. While play by play data was still limited, it offered a bit more insight because it included some batted ball type information (ground ball, line drive, etc.). With some assumptions and a little math, Total Zone estimated a player’s total defensive contributions. It has proven to be an improvement that rates well with today’s more technology-driven data gathering approaches. Total Zone is presented in runs relative to league average with league average set to zero, meaning positive marks are above average and negative marks are below.

Thanks to the availability of historical play by play data, Total Zone has been retroactively calculated throughout all of recorded baseball history. As a result, it’s the best measure available today for comparing historical defensive performance in different eras.

Range Factor and Total Zone bring me back to Kirby Puckett.

Kirby Puckett’s Defense

As you saw in the introduction, Puckett rates very well by the traditional measures. However, his case becomes less clear-cut when we expand our view with the newer approaches. In Puckett’s 10 seasons as primarily a center fielder (1984-1993), he was above average in RF/9 five times and below average five times. By Total Zone he was above average four times and below average six times.

In total, RF/9 rated him as slightly above average in his career — 2.91 compared to the league average of 2.83. By Total Zone’s estimates he graded out as 12 runs below average.

Those numbers aren’t quite in sync with our understanding of Puckett’s defensive reputation and his six Gold Glove awards. His reputation as a plus defender in center field was established by two outstanding defensive seasons to begin his career. In Puckett’s 1984 rookie season he was fourth among American League outfielders in putouts (443) and led the league’s outfielders in assists (16), RF/9 (3.63), and Total Zone Runs (30) — despite only playing in 128 games. In 1985 Puckett led center fielders in putouts (466) and assists (19), ranked third in Total Zone runs (10), and fifth in RF/9 (3.11).

Despite those two superlative seasons, Puckett did not win his first Gold Glove award until 1986. That recognition came despite just average-ish defensive marks that season (2.82 RF/9 vs. a league average of 2.83, and minus-7 Total Zone runs). The decline makes sense: it coincided with a significant body change that saw him arrive at Spring Training about 30 pounds heavier than before. While the added bulk cost him some speed and range on defense, it and a new leg kick at the plate allowed him to hit for power for the first time. Puckett launched 31 homers in 1986 after hitting only 4 in his career to that point.

For the next seven seasons Puckett would grade out as an average or slightly below average defensive center fielder (by RF/9 and Total Zone) while also ranking as one of the best at the position offensively. He would take home five more Gold Gloves, giving him four straight from 1986-1989 and two in a row in 1991 and 1992. By both newer metrics, Puckett graded out below average in four of the six seasons he won Gold Gloves.

To be clear, Puckett wasn’t a poor defender who was perceived as a good one. But the data makes plain, especially with the benefit of hindsight and newer analytics, that he was an average-ish defender who was mistakenly perceived as one of the best in the game.

My point here is not to denigrate Puckett but instead to show how the traditional defensive metrics and the stories we tell ourselves from them often don’t align with what actually happened on the field. He developed a reputation early in his career for being a great defender. That reputation was deserved by both the traditional and newer measures then, but was no longer accurate as he aged. Our limited data and understanding of defense prevented that narrative from being updated.

Better Data, UZR, and DRS

While Range Factor and Total Zone were improvements over fielding percentage, they remained limited by the lack of granular data about batted balls. That need drew firms like Stats Inc. and Baseball Info Solutions into the market with a focus on providing more detailed and specific data points. These companies employed people to chart games in greater detail, creating far more data useful for analyzing defense. This data, when combined with two major conceptual advancements, created the foundation we have today for defense stats.

John Dewan’s Zone Rating divided the field into zones of responsibility for each fielder. Doing this, with enough data, would allow a fielder to be graded by how often they turned batted balls in their zone into outs, helping to address the balls in play omitted by traditional stats.
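The zone idea can be sketched in a few lines of Python. Everything here is a simplified assumption for illustration: the zone labels, the `BattedBall` record, and the sample data are all hypothetical, and published versions of Zone Rating differ in how they treat plays made outside a fielder's assigned zones, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class BattedBall:
    zone: str        # hypothetical zone label, e.g. "SS-1"
    converted: bool  # True if the fielder turned it into an out


def zone_rating(balls: list[BattedBall], assigned_zones: set[str]) -> float:
    """Fraction of balls hit into a fielder's assigned zones that became outs."""
    in_zone = [ball for ball in balls if ball.zone in assigned_zones]
    if not in_zone:
        raise ValueError("no batted balls in the assigned zones")
    return sum(ball.converted for ball in in_zone) / len(in_zone)


# Made-up sample: four balls in the shortstop's zones, one outside them.
sample = [
    BattedBall("SS-1", True),
    BattedBall("SS-1", True),
    BattedBall("SS-2", False),  # a hit through the zone counts against him
    BattedBall("SS-2", True),
    BattedBall("3B-1", False),  # not his zone, so it is ignored
]
print(zone_rating(sample, {"SS-1", "SS-2"}))  # prints 0.75
```

Unlike fielding percentage, the denominator includes balls the fielder never touched, so a shortstop who is a step slow is charged for the grounder that sneaks through.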

Another key advance came from thinking about the value of different defensive plays. For most of history a batted ball converted to an out was valued the same regardless of the type of batted ball.

That began to change with Pete Palmer and John Thorn’s book, The Hidden Game of Baseball, and their linear weights approach (see Twinkie Town Analytics Fundamentals Part 4) that helped place empirical run values on different plays depending on the type of batted ball. A ground ball for a single has less value than a line drive going for extra bases in the outfield gap. If a defender is able to convert the line drive into an out, avoiding extra bases, that is a more valuable defensive play than preventing a ground ball single.

Two new measures, Mitchel Lichtman’s Ultimate Zone Rating (UZR) and The Fielding Bible’s Defensive Runs Saved (DRS), combined the zone concept with methods similar to linear weights to permanently change the way we evaluate defense.

While the methodologies differ somewhat, UZR and DRS are both based on the probability a defender turns a certain batted ball type and location into an out and how much that play is worth in terms of run value. The FanGraphs UZR primer explains this well:

Let’s say that [of hard-hit line drives hit by a LH batter to a certain area of the outfield, 15% are fielded by the CF’er, 10% by the LF, and 75% fall for a hit]. Let’s say that that same batted ball was caught by the CF’er on the first play of a game. Since typically someone will catch that same ball only 25% of the time (see above), this particular CF’er will get credit for an extra .75 plays – 100% minus 25%. We then convert .75 plays into runs by multiplying .75 by the difference between an average hit in that location and the average value of an air ball out. A typical outfield hit is worth around .56 runs and any batted ball out is worth around -.27 runs, so the difference between a hit and an out is worth around .83 runs. (We don’t vary the value of the hit or out based on the outs or base runners because we want “game situation-neutral” defensive evaluations.) Since our fielder gets credit for .75 extra plays, we give him credit for .75 times .83 runs, or +.6225 runs for that play.
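The arithmetic in that excerpt can be traced with a small Python sketch. This is only the single made-play case the primer describes, not the full UZR system; the function and constant names are my own, and the run values are the approximate ones given in the excerpt.

```python
HIT_RUN_VALUE = 0.56   # typical outfield hit, per the excerpt
OUT_RUN_VALUE = -0.27  # any batted-ball out, per the excerpt


def runs_credited_for_catch(out_rates_by_fielder: dict[str, float]) -> float:
    """Run credit for catching a ball, given how often each fielder usually does."""
    league_out_rate = sum(out_rates_by_fielder.values())  # 15% + 10% = 25%
    extra_plays = 1.0 - league_out_rate                   # credit above a typical fielder
    run_swing = HIT_RUN_VALUE - OUT_RUN_VALUE             # about .83 runs between hit and out
    return extra_plays * run_swing


# The hard-hit line drive from the excerpt: CF catches 15%, LF catches 10%.
credit = runs_credited_for_catch({"CF": 0.15, "LF": 0.10})
print(f"{credit:+.4f}")
```

Multiplying .75 extra plays by the .83 run swing gives about +.62 runs for that one catch. The harder the ball is to catch, the larger the credit for catching it.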

If you add up all the values of plays a defender makes (or doesn’t make) over a period of time, you get an estimated total value of his defense in terms of runs prevented (or allowed). Like Total Zone, UZR and DRS both set league average to zero, meaning positive values are runs above average and negative values are runs below average. As a result, in any given season, an individual player’s performance is likely to fall into the following tiers:

Because of their slight methodology differences, the two systems occasionally disagree but tend to point in the same direction most of the time, especially over longer periods of time with larger sample sizes. Both metrics come with cautions that defensive data can be noisy; therefore, it is recommended to use at least three seasons of defensive data before drawing any definitive conclusions.

While UZR and DRS are significant improvements over anything we had before, they remain imperfect and the people that manage them are regularly attempting to update and improve them.

Statcast and Outs Above Average

In 2015, the data capture limitations that have plagued measuring defense in baseball were dramatically reduced with the implementation of Statcast technology in all 30 MLB parks. That system was upgraded for the 2020 season and now includes 12 cameras around each park for full-field optical pitch, hit and player tracking. Five cameras operating at 100 frames per second are primarily dedicated to pitch tracking, while an additional seven cameras are focused on tracking players and batted balls at 50 frames per second. Now, almost every movement that takes place on a big league diamond is automatically captured and made available to potentially be analyzed.

One of the new measures made possible from this trove of tracking data is Outs Above Average (OAA), a range-based metric that shows how many outs a player has saved. OAA was introduced in 2017 for outfielders and expanded to include infielders in 2020.

The many data points collected on every play allow us to know exactly “how far” and “how much time” a fielder has to make a play (regardless of positioning and shifts). That data is used to estimate the probability of a play on a batted ball being made.

For example, if Byron Buxton has a ball hit to him with a 75 percent catch probability and he catches it, he’ll receive a +.25 credit. If he misses it, he’ll receive -.75, reflecting the likelihood of that ball being caught by other outfielders. The approach is generally similar in the infield, but with a few other factors included, like the speed of the baserunner.

Add up all the plays made (or not) for a season and you get a player’s outs above average. Much like DRS and UZR, OAA is set with zero as average. The best defenders have been accumulating 20 or more OAA in a full season since it was introduced.
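The outfield bookkeeping described above can be sketched in Python as follows. The function names and the sample list of plays are hypothetical, the catch probabilities would in practice come from Statcast's own model (not reproduced here), and the infield version involves additional factors this sketch leaves out.

```python
def oaa_credit(catch_probability: float, caught: bool) -> float:
    """Credit for one play: +(1 - p) if the ball is caught, -p if it is not."""
    if not 0.0 <= catch_probability <= 1.0:
        raise ValueError("catch probability must be between 0 and 1")
    return (1.0 - catch_probability) if caught else -catch_probability


def outs_above_average(plays: list[tuple[float, bool]]) -> float:
    """Sum the per-play credits over a set of (catch_probability, caught) plays."""
    return sum(oaa_credit(prob, caught) for prob, caught in plays)


# The 75 percent example from the text, caught and then missed.
print(oaa_credit(0.75, caught=True))   # prints 0.25
print(oaa_credit(0.75, caught=False))  # prints -0.75

# A made-up handful of plays: two routine balls (one dropped) and two tough catches.
sample_plays = [(0.75, True), (0.75, False), (0.25, True), (0.5, True)]
print(outs_above_average(sample_plays))  # prints 0.75
```

A fielder who catches exactly what the probabilities predict nets out to zero, which is why zero represents average.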

This data and stat are the most objective to date for measuring defensive performance. They enable direct comparison of players at different positions, negate issues concerning positioning and shifts, minimize human measurement errors in data collection, capture every ball put in play, and most importantly, eliminate the subjectivity of trying to determine plays that should be made with things like “ordinary effort”.

Conclusion

UZR, DRS, and OAA are all commonly used and accepted as the most comprehensive measures of defense we have. None of the three is considered “right” or “better” than the others and you can feel confident using any or all of them when discussing defense. All of them are leaps and bounds better than fielding percentage and errors.

With that said, OAA is likely just the beginning of what will become many more major advancements in our understanding of defense. You’ll certainly see lots of discussion about OAA this season and for the foreseeable future and the early returns suggest OAA is an improvement over DRS and UZR. However, it’s important to remember how new this is. We still have a long way to go before we properly understand context, performance baselines, reference points, and predictability of OAA.

It is fair to treat UZR, DRS, and OAA (or any defensive metric) with a healthy dose of skepticism — after all, it seems every statistical advance we’ve made throughout baseball history has mostly served to close one question and bring up two or three more.

But, nearly 70 years after Rickey lamented the challenge of measuring defense, it seems we have made a lot of progress and have the tools needed to continue to do so.

Lesson Takeaways

  • Measuring defense has long been limited by the availability of quality, granular data
  • Traditional measures of defense (putouts, assists, errors, and fielding percentage) focused on things that happened on the field and could be easily logged. This approach ignored more subjective things like plays that could or should have been made.
  • Errors are an attempt to quantify plays that should be made, but the definition is highly subjective, ignores many kinds of bad defensive plays, and is inconsistently applied.
  • The traditional defensive stats largely ignore the concept of range, making them incomplete for evaluating how well a defender turns batted balls into outs.
  • Range Factor and Total Zone make the most of the available box score and play by play data to attempt to account for range. They improve on traditional stats as measures of defensive performance but remain limited by available data.
  • Improved manual data collection helped lead to more comprehensive defensive statistics rooted in fielding zones, probability, and run values of batted ball types.
  • Ultimate Zone Rating (UZR) and Defensive Runs Saved (DRS) utilize similar methodologies to estimate a defender’s total defensive contributions in terms of runs permitted or prevented.
  • Statcast automatic tracking technology provides the most comprehensive data to date and has enabled key components of defense to be objectively measured and analyzed.
  • The newest measure made possible from this tracking data is Outs Above Average (OAA), a range-based metric that shows how many outs a player has saved.
  • UZR, DRS, and OAA are widely accepted as the best measures of defense available today.
  • Defensive data is inherently noisy. It is advised to use at least three seasons of data before drawing definitive conclusions.

Test Your Knowledge: Five Quiz Questions

#1: The definition of errors has remained the same since its inception. True or False?

#2: Official scorers apply errors to plays consistently and objectively. True or False?

#3: Plays not made are an important part of evaluating defense. True or False?

#4: Fielding percentage is an incomplete way to evaluate defensive performance. True or False?

#5: UZR, DRS, and OAA set average to zero and consider positive figures to be above average and negative figures to be below average. True or False?

Answers

#1: False

#2: False

#3: True

#4: True

#5: True


References:

The data and sources are cited and linked throughout this post. Like others who have tried to write and explain these subjects before, I relied significantly on the following resources:


John is a contributor to Twinkie Town with an emphasis on analytics. He is a lifelong Twins fan and former college pitcher. You can follow him on Twitter @JohnFoley_21.