Similarity Scores and the Twins: Comparing Apples to Blylevens

A few years ago, professional sabermetrician Bill James came up with the concept of "Similarity Scores," a method of numerically comparing one baseball player to another.  These were introduced in The Politics of Glory; for more on the methodology, see this link.  The scoring is pretty simple: a score of 1000 goes to two players whose numbers are identical, and the lower the score from there, the less similar the players are.
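For the curious, the basic mechanics can be sketched in a few lines of Python.  The start-at-1000-and-subtract shape matches James's approach as I understand it, but the stat names and point steps below are assumptions made up for illustration, not his actual table, and the two players are hypothetical.

```python
# Illustrative steps: one point comes off for each "step" of difference
# in a stat. These values are assumptions for this sketch, not the
# published method.
ILLUSTRATIVE_STEPS = {
    "games": 20,
    "hits": 15,
    "home_runs": 2,
    "walks": 25,
}


def similarity_score(player_a, player_b, steps=ILLUSTRATIVE_STEPS):
    """Start at 1000 and subtract points as two career lines diverge."""
    score = 1000.0
    for stat, step in steps.items():
        score -= abs(player_a[stat] - player_b[stat]) / step
    return score


# Two hypothetical career lines, not real players.
player_a = {"games": 1000, "hits": 1100, "home_runs": 100, "walks": 400}
player_b = {"games": 980, "hits": 1070, "home_runs": 90, "walks": 350}

print(similarity_score(player_a, player_a))  # identical lines score 1000.0
print(similarity_score(player_a, player_b))
```

The point is only that identical careers keep all 1000 points, and every gap between two stat lines chips away at the total.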

The tool is perhaps most useful in comparing players from different eras, which makes it handy in arguments about whether somebody belongs in the Hall of Fame.  For example, Bert Blyleven is most similar to Don Sutton, Gaylord Perry, and Fergie Jenkins, all of whom are in the Hall of Fame.  Bill James defined players with similarity scores of more than 950 as "unusually similar," players with scores of 900 or more as "truly similar," and players with scores greater than 850 as "essentially similar."  Blyleven's score with Sutton is 914; with Perry, 909; and with Jenkins, 890.  Other Hall of Famers who are essentially similar to Blyleven include Robin Roberts and Tom Seaver, meaning that it's absolutely foolish that Blyleven has been excluded from the Hall of Fame... but never mind.  The same source also lists players most similar by age, in effect comparing (for example) Blyleven through age 37 vs. Sutton through age 37.  (If you view the list of comparisons - available here - you'll note that Bert and Don Drysdale were similar through Blyleven's late twenties, and then Bert and Sutton were similar for the rest of the Dutchman's career.  Which would again indicate that Blyleven is a no-brainer for the Hall of Fame, but again, never mind.)
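If you want to apply those labels yourself, here's a small Python sketch of the thresholds quoted above.  The function name is my own, and the fallback label for scores of 850 and under is an assumption, since no name is given for that range.

```python
def similarity_label(score):
    """Map a similarity score to the labels quoted above."""
    if score > 950:
        return "unusually similar"
    if score >= 900:
        return "truly similar"
    if score > 850:
        return "essentially similar"
    # Assumption: no label is defined for lower scores, so this one is made up.
    return "not notably similar"


# Blyleven's scores with his three closest comparisons, from the text.
blyleven_comps = {"Don Sutton": 914, "Gaylord Perry": 909, "Fergie Jenkins": 890}
for name, score in blyleven_comps.items():
    print(f"{name}: {score} ({similarity_label(score)})")
```

By these cutoffs, Sutton and Perry come out "truly similar" to Blyleven, and Jenkins "essentially similar."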

With all this in mind, I thought it might be instructive to take a look at a few of the current Twins, and see who they compare to in the history of baseball.  Of course, similarity scores are skewed for active players, and the only true way to compare players is after their careers are complete, but it's still useful to see the "path" players are taking, given the most comparable players at a certain age.

Joe Mauer
According to the stats, Mauer and Yankees second baseman Robinson Cano are extremely comparable.  Both have hit over .300 in their careers (Mauer .323, Cano .309), with some power but not a lot.  Mauer's on-base percentage is a lot higher - .403 to .333 - but their slugging percentages are identical.  Through the age of 23, except for Mauer walking five times as often as Cano, the two are virtually the same.

Who's more valuable?  There's no real doubt that it's Mauer - he hits for a higher average and is on base much more often, and Cano's power numbers are barely better than the Twins catcher's.  But it's surprising that the Twins batting champ, who I always think of as a once-in-a-generation-type hitter, is so close to New York's second baseman. After all, Joe Torre once took major flak for comparing Cano to Rod Carew, with the general opinion being that Cano would never be anywhere near the player Carew was.

I guess I just expected Mauer to be something more than "Robinson Cano, if Cano ever drew a walk."  Given that Mauer's only 24, however, this comparison is very likely to change over his career.

Torii Hunter
One of the great debates of the last several years has been over Torii Hunter's contract.  You can loosely split the debaters into two groups: one group thinks that Hunter is a special case, an outfielder who is getting better as he gets older, and someone the Twins need to hold onto for as long as possible.  The other camp assumes that Hunter has hit his peak and is on the downhill slope, and that throwing money at him would be a Yankees-esque move that will only backfire.

Looking at the numbers, you can see the similarities: the first group thinks Hunter is Jermaine Dye.  The second thinks he's Kevin McReynolds.

The similarities between Dye and Hunter at the plate are numerous, especially if you look at their numbers through age 30.  Through that point, both hit right around .270.  The two had very similar power numbers in similar numbers of games, both averaging one home run every 23.7 at-bats.  Hunter stole four times as many bases, but also got caught four times as often.  And their slugging percentages, at .463 through age 30, are again a perfect match.

If you look at Dye's career numbers, though, one thing jumps out at you: he hit 30 home runs twice before age 32, and he drove in 100 runs three times, but more typical was his year in 2002 for Oakland: 24 HR, 86 RBI, 103 OPS+.  That held right up until 2006, when he was 32 years old - and hit 44 home runs and drove in 120 runs for the White Sox, for a 152 OPS+.

McReynolds, on the other hand, never hit 30 HR or drove in 100.  He hit .265 for his career, and averaged 25.7 at-bats for every home run, but he declined after age 32, when his OPS+ went from 115 to 95 to 91.  His career ended with the strike-shortened season in 1994, when he was 34 years old.
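To put the two home run rates on the same footing, here's a quick bit of arithmetic in Python that converts at-bats per home run into home runs over a full season.  The 550 at-bat season is my own assumption for illustration; the rates themselves are the ones quoted above.

```python
def homers_per_season(at_bats_per_homer, season_at_bats=550):
    """Convert an at-bats-per-home-run rate into homers over one season.

    Assumption: 550 at-bats stands in for a full season of playing time.
    """
    return season_at_bats / at_bats_per_homer


rates = {
    "Dye and Hunter, through age 30": 23.7,
    "McReynolds, career": 25.7,
}
for label, rate in rates.items():
    print(f"{label}: about {homers_per_season(rate):.1f} HR per 550 at-bats")
```

That works out to roughly 23 homers a year against roughly 21, so the gap in power between the two comparisons is real but not enormous.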

Hunter is scheduled to turn 32 this July.  And according to the stats, through ages 27-29, he was most similar to McReynolds; through age 30, he's most similar to Dye.  Maybe it could go either way for Hunter.  But you look at his stats so far this year, with Torii on pace to hit more than 30 home runs and drive in more than 100 runs, and with Hunter sitting at a 145 OPS+, and you have to think: maybe Torii's more like Dye.

(Of course, the White Sox right fielder is hitting .240 this year, so maybe this isn't completely a good thing.)

Johan Santana
Frankly, when I looked up Santana's Similarity Score, I expected the comparisons to be a little more famous: Sandy Koufax, perhaps, or Bob Gibson, or somebody else with multiple Cy Young awards.  Surefire Hall of Famers, in other words.  The most similar pitcher through age 27, by contrast, is someone I would not have expected: Tim Hudson, of the Atlanta Braves.

After all, Hudson's never won the Cy Young.  He was second in the voting and on the all-star team in 2000, when he won 20 games for Oakland, but that mostly emphasizes the over-importance that voters put on wins (his ERA was worse that year than any other, except for 2006).  And that's about it for awards for Hudson, except for being an all-star in 2004 as well.

You look at the career numbers through age 27, though, and there's no doubt they're similar.  Santana posted a 3.20 ERA, Hudson a 3.26.  Santana's ERA+ was 143, Hudson's was 138.  Johan was 78-31 in that span, while Hudson was 80-33.  Santana allowed 1.1 baserunners per inning; Hudson allowed 1.2.  The main difference: Santana struck out over nine guys per nine innings, whereas Hudson was down around 7.

All fair comparisons, based on career stats.  But, just for fun, let's take a look at Johan's career, beginning with the first season he was strictly a starting pitcher.  He was 25 that year, and over three seasons beginning with that one, he posted the following numbers:

  1. 182 ERA+, 0.921 WHIP, 10.5 K/9
  2. 153 ERA+, 0.971 WHIP, 9.2 K/9
  3. 161 ERA+, 0.997 WHIP, 9.4 K/9

And then, Koufax's numbers starting the year he turned 25:

  1. 124 ERA+, 1.205 WHIP, 9.5 K/9
  2. 143 ERA+, 1.036 WHIP, 10.5 K/9
  3. 161 ERA+, 0.875 WHIP, 8.9 K/9

Over that span, Johan went 55-19; Koufax was 57-25.
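Those records are easier to compare as winning percentages.  A minimal Python sketch, using only the won-lost totals just quoted (the function name is mine):

```python
def winning_pct(wins, losses):
    """Wins divided by total decisions."""
    return wins / (wins + losses)


# Three-season records from the text.
records = {"Santana": (55, 19), "Koufax": (57, 25)}
for name, (wins, losses) in records.items():
    print(f"{name}: {wins}-{losses}, {winning_pct(wins, losses):.3f}")
```

Santana comes out at about .743 and Koufax at about .695 over their respective three-year stretches.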

I point out the comparison between Santana and one of the great pitchers in history to note this: starting in 1964, Koufax posted ERA+ numbers of 187, 160, and 190.  He won two more Cy Youngs.  On the other side, Hudson followed up with ERA+ numbers of 133, 125, and 91, and hasn't been considered dominant for a couple of years.

Which way will Santana go?  Will he continue to be similar to Tim Hudson?  Or will he veer off into Koufax territory, given that the last three years have certainly shown him capable?

A few other guys and their most similar players by age, all through the players' most recent birthday:
Luis Castillo - Dave Cash (three-time All-Star with the Phillies in the 70s)
Michael Cuddyer - Dave Hollins (nooo!)
Carlos Silva - Hipolito Pichardo

Sidney Ponson and Ramon Ortiz, if you compare their career stats, are "truly similar" (944).  This is a bad sign.  Do you have any other comparisons to make?  The comments are open below.