If you search for quotes about baseball umpiring, you are sure to find that a large fraction are negative in tone. Hall of Fame pitcher Christy Mathewson is credited with this one: “Many baseball fans look upon an umpire as a sort of necessary evil to the luxury of baseball, like the odor that follows an automobile.”
Hall of Fame manager Leo Durocher articulated a thought many baseball fans have shared over the years when he said “I’ve never questioned the integrity of an umpire. Their eyesight, yes.”
Humorous as those quotes may be, they are fairly representative of popular feelings toward umpires. Umpires are needed for the games to be played, but we often think they are bad at their jobs.
Until the last couple of decades, our complaints about umpire calls were largely subjective. We did not have the data to definitively prove a call had been missed.
That has changed thanks to the technology installed in Major League Baseball stadiums. Now, with slow-motion video and precise measurements of just about everything happening on the field, our claims that our team was robbed by a bad call have become more objective. We have proof of when our team gets robbed. (And proof of the more common occurrence of our team not being robbed.)
That technology has enabled instant replay rules that have mostly eliminated blown calls in the field and on the bases. The system is not perfect — it certainly could be faster and less intrusive to the flow of the game, missed calls occasionally stand, and some calls will always rely on judgment, no matter how controversial — but it has improved the accuracy of umpire calls in the field.
Adjudicating balls and strikes at home plate has been another matter.
An Impossible Job
Last season, the average game featured about 291 pitches. Nearly every pitch a batter doesn’t swing at requires a ball-strike call, so those calls outnumber all other umpire calls several times over. And those calls are hugely consequential, given the major role the ball-strike count plays in offensive performance.
Despite their importance, ball-strike decisions are neither reviewable nor even arguable. In fact, Rule 9.02 of the MLB rule book explicitly forbids arguing balls and strikes:
Any umpire’s decision which involves judgment, such as, but not limited to, whether a batted ball is fair or foul, whether a pitch is a strike or a ball, or whether a runner is safe or out, is final. No player, manager, coach or substitute shall object to any such judgment decisions.
(But, that doesn’t stop even the most mild-mannered players and coaches from trying.)
Thanks to Statcast and its predecessors, we have broad access to very granular pitch location data, and almost every broadcast and app offering live game tracking includes a strike zone graphic, so we know immediately which ball-strike calls were incorrect.
Simple queries of the Statcast data show that about 2.7% of all pitches thrown since 2015 were located outside the strike zone yet called strikes, and about 2.1% were located inside the strike zone yet called balls. So, at least by the rule book definition of the strike zone, roughly 1 in 20 pitches (about 4.8%) has been miscalled. Since only the pitches batters take receive a ball-strike call, the share of the calls themselves that umpires miss is even higher.
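Whether a miss rate is quoted per pitch thrown or per pitch called changes the answer, because swings never receive a ball-strike call. The minimal Python sketch below computes both rates from a list of pitch records. The `Pitch` record, its `in_zone` flag, and the outcome labels are hypothetical stand-ins, not the actual Statcast schema. It also assumes the in-zone flag has already been computed from the pitch coordinates.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Pitch:
    in_zone: bool   # hypothetical flag: did the pitch cross the rule-book zone?
    outcome: str    # hypothetical label: "called_strike", "ball", or a swing outcome

CALLED = ("called_strike", "ball")

def miss_rates(pitches: List[Pitch]) -> Tuple[float, float]:
    """Return (misses per pitch thrown, misses per pitch called)."""
    called = [p for p in pitches if p.outcome in CALLED]
    # A miss is a called strike outside the zone or a called ball inside it.
    misses = sum(1 for p in called if (p.outcome == "called_strike") != p.in_zone)
    per_pitch = misses / len(pitches) if pitches else 0.0
    per_call = misses / len(called) if called else 0.0
    return per_pitch, per_call

sample = [
    Pitch(True, "called_strike"),    # correct
    Pitch(False, "called_strike"),   # miss: outside the zone, called a strike
    Pitch(True, "ball"),             # miss: inside the zone, called a ball
    Pitch(False, "ball"),            # correct
    Pitch(True, "swinging_strike"),  # swing: no ball-strike call
    Pitch(False, "foul"),            # swing: no ball-strike call
]
print(miss_rates(sample))  # (0.3333333333333333, 0.5)
```

The called-pitch denominator can never be larger than the all-pitch denominator, so the per-call rate is always at least as high as the per-pitch rate.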
Accuracy, Consistency, or Both?
Perfect, 100% accuracy is the holy grail, but as long as human umpires are behind the plate, it will never be achieved in any lasting sense. Pitch speed and movement are too great (and increasing), and human perception and judgment are too fallible. (If you want to get a feel for the challenge of the job, I encourage you to try your hand at the You Be The Ump bot developed by The New York Times.)
Perhaps recognizing that perfect accuracy is an impossible ask, we have tacitly accepted consistency as a reasonable alternative. Over time, we have learned to tolerate the pitch 3 inches off the plate being called a strike or a pitch on the plate at the knees being called a ball, so long as that was done consistently within a game.
But, how can consistency be measured?
David Hunter, a mathematics professor at Westmont College, applied tools from computational geometry to pitch location data to devise a method for objectively evaluating umpire consistency.
Before the mention of computational geometry drives you off, know that the approach is surprisingly simple. The umpire’s “established strike zone” is created by drawing a boundary around the outermost called strikes so that all of them are enclosed.
A consistently called game should have the entire established ball zone lie outside the established strike zone. Any pitch inside the established strike zone that was called a ball is an inconsistent call, and vice versa. Dividing the number of inconsistent calls by the total number of calls gives an inconsistency rate for the umpire.
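The idea of enclosing the called strikes and checking which balls fall inside can be sketched in Python. This sketch assumes the enclosing boundary is the convex hull of the called strikes, computed here with Andrew's monotone-chain algorithm. It also assumes pitch locations are (horizontal, vertical) coordinates at the plate. The sketch counts only one kind of inconsistency: balls strictly inside the hull. It is an illustration of the general technique, not a reproduction of Hunter's method or the Umpire Scorecards formula. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (horizontal, vertical) location at the plate

def _cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o); positive means a counterclockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point; it repeats the other chain's first point.
    return lower[:-1] + upper[:-1]

def strictly_inside(hull: List[Point], p: Point) -> bool:
    """True if p lies strictly inside a counterclockwise convex polygon."""
    if len(hull) < 3:
        return False
    return all(_cross(hull[i], hull[(i + 1) % len(hull)], p) > 0
               for i in range(len(hull)))

def inconsistency_rate(strikes: List[Point], balls: List[Point]) -> float:
    """Share of all called pitches that are balls inside the called-strike hull."""
    hull = convex_hull(strikes)
    inconsistent = sum(1 for b in balls if strictly_inside(hull, b))
    total = len(strikes) + len(balls)
    return inconsistent / total if total else 0.0

# Toy coordinates in feet, for illustration only.
strikes = [(-0.7, 1.6), (0.7, 1.6), (0.7, 3.4), (-0.7, 3.4), (0.0, 2.5)]
balls = [(0.1, 2.0), (1.2, 2.5), (0.0, 4.0)]
print(inconsistency_rate(strikes, balls))  # 0.125
```

One limitation of a bare hull is that a single generous strike call stretches the established zone for the rest of the game. Balls that sit exactly on the boundary are also treated as consistent here. Both are choices a more careful method might handle differently.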
Building on this approach, Ethan Singer, a computer science and statistics major at Boston University, developed Umpire Consistency Scores (UCS) and combined them with accuracy data to publish summary reports on home plate umpire performance for every MLB game through a popular website and Twitter account, Umpire Scorecards.
This has ushered in a new era of transparency in umpire performance. Umpire Scorecards tracks performance in numerous ways, making it easy to see which umpires are doing good or bad jobs and which teams are reaping the benefits or bearing the brunt.
Among umpires who worked 25 or more games behind the plate in 2022, Jeremie Rehak (95.5%) edged out Pat Hoberg (95.3%) for the most accurate. Dan Bellino (94.6%) was the most consistent. CB Bucknor was both the least accurate (92%) and least consistent (92.1%).
Using the familiar change-in-run-expectancy method we apply to other on-field events, we can also estimate the impact of these calls. By this approach, the 2022 Twins were not strongly affected by umpire ball-strike calls, ranking 16th in total run impact (-2.26 runs) and tying for 15th in share of games favored (49.4%).
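The run-value idea can be sketched in Python. This sketch assumes the caller supplies a hypothetical `run_value` table that maps each count state, plus the terminal "walk" and "strikeout" states, to expected runs. No real run-expectancy numbers are assumed here. The sketch keys only on the count; a fuller treatment would presumably also account for the base-out state.

```python
from typing import Dict, Tuple

Count = Tuple[int, int]  # (balls, strikes) before the pitch

def next_state(count: Count, call: str) -> str:
    """State label after a taken pitch is called "ball" or "strike"."""
    balls, strikes = count
    if call == "ball":
        return "walk" if balls == 3 else f"{balls + 1}-{strikes}"
    if call == "strike":
        return "strikeout" if strikes == 2 else f"{balls}-{strikes + 1}"
    raise ValueError(f"unknown call: {call!r}")

def call_run_impact(count: Count, actual: str, correct: str,
                    run_value: Dict[str, float]) -> float:
    """Runs the batting team gained (positive) or lost (negative) because
    `actual` was called instead of `correct`, using a caller-supplied
    (hypothetical) table of expected runs per state."""
    return (run_value[next_state(count, actual)]
            - run_value[next_state(count, correct)])
```

For example, `call_run_impact((3, 2), "strike", "ball", run_value)` returns `run_value["strikeout"] - run_value["walk"]`, which is the cost to the hitter of a full-count ball called strike three. Summing values like this across every call in a team's games gives a season total of the kind quoted for the Twins.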
Contrary to Perception, Umpires Are Getting Better
There is an old management axiom that “what gets measured gets managed,” and that seems to be playing out behind the plate. Analysis of data from the past decade shows that umpires have been getting better at accurately calling balls and strikes.
“Per @UmpScorecards, 2022 set records for lowest percent of games where Overall Accuracy <90%, highest percent of games where OA = 95%+ and lowest percent of games where Overall Favor = 1+.” (Umpire Scorecards Analysis, @UmpAnalysis, October 7, 2022)
Umpires are having fewer bad games and more good games, and the amount of favor toward one team or the other is shrinking over time. That is despite the job getting harder, with pitch velocity, pitch movement, and breaking-ball usage all at all-time highs.
While that data might suggest umpires are learning from experience, analysis has also revealed that more experienced umpires have tended to be less accurate than less experienced umpires:
“Per @UmpScorecards, in every regular season from 2018-2021, the most experienced umps (501+ games of home plate experience prior to start of each season) had the lowest Overall Accuracy. In the '22 regular season the most experienced also had the lowest OA.” (Umpire Scorecards Analysis, @UmpAnalysis, October 7, 2022)
Ultimately, we care about accurately calling balls and strikes because of the potential impact those calls can have on winning and losing. While we most often remember when a specific call seemingly swung the balance of a game, in the aggregate, the data suggests that the impact of ball-strike bias on game outcomes has not been large.
Calls for Change
Despite the data showing that umpires are getting more accurate (yes, even in the playoffs), the shouts for replacing human home plate umpires with automated systems have only grown louder. The umpires might be getting better, but they are still far from perfect.
Those pleas have been fueled by multiple, generally successful pilots of automated ball-strike technology (ABS) in the minor leagues, including across Triple-A this past season. Those pilots have served to calibrate the technology and gather feedback from game situations.
Those lessons have been used to reshape the electronic strike zone from a narrower, taller zone that follows the letter of the rule book to a wider, squatter one that is more consistent with how human umpires have actually been calling it.
The Athletic reported that the zone used in Triple-A was 19 inches wide at the midpoint of home plate, including an inch off either edge (home plate is 17 inches wide). The top and bottom edges of the zone were based on specific percentages of the batter’s height. The size, MLB said, was similar to that of the major league zone.
While the lockout negotiations last winter did not produce an agreement between MLB and the MLBPA on implementing an automated strike zone, Commissioner Rob Manfred said publicly earlier this season that he expects it to be implemented for the 2024 season.
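Here is a rough Python sketch of a zone check based on that description. The 19-inch width comes from the report, but the top and bottom percentages were not given. `bottom_pct` and `top_pct` are therefore caller-supplied placeholders, and the function name and coordinate conventions are hypothetical. The sketch also treats the pitch as a single point and ignores the ball's radius, which a real system would likely account for.

```python
PLATE_WIDTH_IN = 17.0    # home plate width, inches
EDGE_ALLOWANCE_IN = 1.0  # reported extra inch beyond each edge of the plate
ZONE_HALF_WIDTH_IN = PLATE_WIDTH_IN / 2 + EDGE_ALLOWANCE_IN  # 9.5 in, so 19 in total

def in_abs_zone(x_in: float, z_in: float, batter_height_in: float,
                bottom_pct: float, top_pct: float) -> bool:
    """Hypothetical check of a pitch against a height-scaled zone.

    x_in: horizontal offset from the center of the plate, in inches,
          measured at the plate's midpoint (front to back).
    z_in: height of the pitch above the ground, in inches.
    bottom_pct, top_pct: placeholder fractions of batter height for the
          zone's bottom and top edges (the actual values are not assumed).
    """
    if not 0 <= bottom_pct < top_pct <= 1:
        raise ValueError("need 0 <= bottom_pct < top_pct <= 1")
    bottom = bottom_pct * batter_height_in
    top = top_pct * batter_height_in
    return abs(x_in) <= ZONE_HALF_WIDTH_IN and bottom <= z_in <= top
```

Setting `EDGE_ALLOWANCE_IN` to 0 shrinks the zone back to the plate's own 17-inch width. That makes it easy to compare a rule-book-width zone with the wider pilot zone on the same pitches.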
Robot Umpires or Robot Assistants?
If that happens, as seems likely, it could be done in a couple of different ways. One option, known as “Full ABS,” is to have all ball and strike calls made electronically and relayed through an earpiece to the home plate umpire, who then signals the call on the field.
The other option is a challenge system, similar to that used in professional tennis. In the pilots of this approach, the home plate umpire called the pitches as they have for the entirety of baseball history. Each team got three challenges per game, with successful challenges retained for future use in the game. If a team chose to challenge a call, ABS was used as the arbiter of the dispute.
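The challenge rules amount to a small piece of bookkeeping. The minimal Python sketch below implements the pilot rules as described: three challenges per team, with successful challenges retained. The class and method names are hypothetical. Details such as how a challenge is initiated, or what happens in extra innings, are left out.

```python
class ChallengeTracker:
    """Hypothetical bookkeeping for the pilot challenge rules: each team
    starts with a fixed number of challenges (three in the pilots), and a
    challenge is used up only when ABS confirms the umpire's call."""

    def __init__(self, per_team: int = 3) -> None:
        self.remaining = {"home": per_team, "away": per_team}

    def challenge(self, team: str, umpire_call: str, abs_call: str) -> str:
        """Resolve a challenge by `team` and return the call that stands."""
        if self.remaining[team] <= 0:
            raise RuntimeError(f"{team} has no challenges left")
        if abs_call != umpire_call:
            return abs_call        # overturned: the challenge is retained
        self.remaining[team] -= 1  # upheld: the challenge is spent
        return umpire_call

tracker = ChallengeTracker()
print(tracker.challenge("away", "strike", "ball"))    # ball (overturned)
print(tracker.remaining["away"])                      # 3
print(tracker.challenge("away", "strike", "strike"))  # strike (upheld)
print(tracker.remaining["away"])                      # 2
```

Because only failed challenges are spent, a team that challenges well can keep challenging all game. That is where the strategic element of the system comes from.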
“A lot of buzz around video of #Yankees prospect Jasson Domínguez successfully using the ABS challenge system to overturn a called strike in the Fall League. Here's more on how that works: https://t.co/3fooHZKlHr” (MLB Pipeline, @MLBPipeline, October 20, 2022)
Either approach would likely be a significant step toward perfectly called balls and strikes. However, opinions will run strong on which option to implement.
The challenge system is perhaps the most reasonable compromise between preserving the human element in the game and taking advantage of the accuracy technology offers. In the trials, it proved quite popular with hitters and pitchers alike, with challenges used four to five times per game. In 359 Single-A games, player challenges were correct 43.8 percent of the time, while a smaller sample of 91 Triple-A games had a 48.3 percent success rate.
Also in its favor, it adds a new strategic element to the game: deciding when to deploy your challenges. It would also largely preserve the value of catcher pitch framing, a skill many organizations have invested in significantly over the past decade-plus.
On the other hand, Manfred has said that MLB’s data shows full ABS also speeds up the game, and the league credits it with reducing game times by about nine minutes on average. That may be a significant advantage over the challenge system in the league’s eyes, given its interest in quickening the pace of play. Moreover, full ABS would let the league modify the strike zone, giving it another lever in its quest to boost action.
Which option for calling balls and strikes do you prefer?
Give Me The Robots! — Full ABS
Maybe The Robots Can Help? — ABS Challenge
Stop Messing With The Game! — Human Umpires
Into the Unknown
No matter how successful a pilot is, changes like these will always have unintended consequences. Everyone can probably agree that we want the game called as accurately as possible, because we want the players and teams to decide the outcomes and be the stories, not the umpires. But we won’t really know whether ABS gets us there (or even closer) until it’s implemented.
In the end, there’s a chance this is a case of “be careful what you wish for.” For fans, especially in the tortured sports landscape of Minnesota, there has always been some psychological safety in having the “human element” play a prominent role. As Hall of Fame umpire Billy Evans once said: “The public wouldn’t like the perfect umpire in every game. It would kill off baseball’s greatest alibi - ‘We was robbed.’”