Contest: most dissimilar player

Posted by Andy on November 27, 2009

Most Baseball-Reference.com users are aware of the site's inclusion of Similarity Scores for each player.

By way of example, here are the players to whom Mark Teixeira is currently most similar:

  1. Kevin Mitchell (913)
  2. Miguel Cabrera (905)
  3. Tony Clark (883)
  4. Dick Stuart (868)
  5. Geoff Jenkins (861)
  6. Gus Zernial (856)
  7. Aubrey Huff (855)
  8. Richie Sexson (853)
  9. Richie Zisk (853)
  10. Ripper Collins (853)

This is the similar batter list for career totals. (Each player's page also lists similar players through the current age of the player as well as similar players at past ages for the player.)

So at this point in time, Mark Teixeira's career totals are most similar to Kevin Mitchell's career totals, which is not bad considering that Tex will just be turning 30 around the beginning of the 2010 season. For an explanation of how similarity scores are calculated, see here. I really like the system, although I admit I'd prefer it if it didn't consider the defensive position of each player, so that we could compare based on offensive performance alone.

Anyway, I'd like to try to identify the players who are least similar to any other players.

Here's what I mean. If you look at Teixeira's list above, his top similarity score is 913. However, there are other players whose stats are so unusual that their top similarity score is much lower. Barry Bonds, for example, has Willie Mays as his most similar player but with a score of just 762. By comparison, the guy most similar to Mays himself is Frank Robinson with a score of 830.

I want to find the player with the lowest #1 similarity score. I already know of one star player with such a score much lower than Bonds' but I'll let you, the readers, figure it out.
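For anyone who'd rather script the search than click through player pages, here is a minimal sketch of the idea in Python. It assumes you have already collected each player's similar-batter list into a dictionary by hand; the dictionary layout and the function names are my own invention, not anything the site provides. The scores are just the ones quoted in this post.

```python
# Hypothetical layout: player -> list of (similar player, similarity score).
# Scores are the ones quoted in this post; the structure itself is an assumption.
similar = {
    "Mark Teixeira": [("Kevin Mitchell", 913), ("Miguel Cabrera", 905), ("Tony Clark", 883)],
    "Barry Bonds": [("Willie Mays", 762)],
    "Willie Mays": [("Frank Robinson", 830)],
}


def top_match(matches):
    """Return the (name, score) pair with the highest similarity score."""
    return max(matches, key=lambda pair: pair[1])


def most_dissimilar(similar_lists):
    """Return (player, closest match, score) for the lowest #1 similarity score."""
    tops = {
        player: top_match(matches)
        for player, matches in similar_lists.items()
        if matches  # skip players with no similarity list
    }
    player = min(tops, key=lambda name: tops[name][1])
    closest, score = tops[player]
    return player, closest, score


print(most_dissimilar(similar))  # ('Barry Bonds', 'Willie Mays', 762)
```

With only these three players loaded, Bonds comes out as the most dissimilar; the contest is to find someone lower.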

Let's also create a few categories: lowest similarity score for 1) retired players with at least 1000 games played, 2) retired players with under 1000 games played, 3) active players with at least 1000 games played, and 4) active players with under 1000 games played. I'm talking about position players only here, not pitchers (or pitchers' similarity scores as batters).
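The four categories can be sketched as a small bucketing function. Everything here is an assumption for illustration: the `Entry` record, its field names, and the sample players (deliberately labeled "Player A" and so on, with made-up numbers) are hypothetical, so plug in real lookups yourself.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str        # player name
    active: bool     # True if still playing
    games: int       # career games played
    top_score: int   # the player's #1 similarity score


def category(entry):
    """Map a player to contest category 1-4 as defined in the post."""
    if not entry.active:
        return 1 if entry.games >= 1000 else 2
    return 3 if entry.games >= 1000 else 4


def lowest_by_category(entries):
    """Return {category: entry with the lowest #1 similarity score}."""
    best = {}
    for entry in entries:
        cat = category(entry)
        if cat not in best or entry.top_score < best[cat].top_score:
            best[cat] = entry
    return best


# Made-up placeholder data, NOT real players or real scores.
sample = [
    Entry("Player A", active=False, games=1500, top_score=700),
    Entry("Player B", active=False, games=1200, top_score=650),
    Entry("Player C", active=True, games=400, top_score=870),
    Entry("Player D", active=False, games=300, top_score=880),
]

for cat, entry in sorted(lowest_by_category(sample).items()):
    print(cat, entry.name, entry.top_score)
```

A category with no entries simply does not appear in the result, which is how the placeholder data above behaves for category 3.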

Go ahead and post whatever you find in the comments. I'll check back on this post at the end of the year (Dec 31) and see who posted the earliest comments with the best answers. Comment as many times as you like.

What are the prizes? As of now, there are none beyond bragging rights. However, I am going to add some next week, so stay tuned.

This entry was posted on Friday, November 27th, 2009 at 7:04 am and is filed under Uncategorized.

20 Responses to “Contest: most dissimilar player”

  1. Rickey Henderson's most similar player is Craig Biggio at 713.

  2. Pete Rose's most similar player is Paul Molitor at 678. I'd be stunned if there's a lower score.

  3. Pete Rose, almost certainly, for 1).

  4. See, this is what you get for doing research. I knew it was Pete Rose, but decided to double check and lost out by seconds. From now on, I should do research Fox News style, i.e. not at all.

  5. For 3), I'm going with ARod at 799.

  6. For active players under 1000 games, I'm guessing Hanley Ramirez at 866.

    For retired players under 1000 games, I'm guessing Dave Orr at 878.

  7. BunnyWrangler Says:

    I would just like to say that, in searching for this, I saw that the hitter most similar to Russell Martin is Johnny Estrada (954). I say this not to criticize the system but to bring up one of its strangest results.

    Martin is one of the fastest catchers, at the very least the best basestealing backstop of recent years; Estrada didn't try to steal a base even once in his career. Martin has walked about 65 times per season; Estrada rarely took a base on balls. I thought that Martin would have a low similarity score because his talents are unique for a catcher, but I was wrong. Instead of finding another relatively fast catcher, though, it brought up one of the slowest ones I can remember watching. Then again, Martin's most similar player through his current age (26) is Thurman Munson, who seems like a much better comparison.

  8. Most unique players: http://www.baseball-reference.com/leaders/similarity.shtml

    So I guess Rose was right and Rickey was wrong. It doesn't say active but I'm going with ARod for that.

  9. Yep I was slow too! Found #2 Cy Young and saw the link for unique players.
    BunnyWrangler: The similarities are stats alone. So position, size, quickness do not matter. The first one I thought of was Randy Johnson and the shortest player, then I read "For an explanation of how similarity scores are calculated, see here." (above)

  10. At an individual age, nobody is under 700 except Cy Young at 40 years old: 679.8 with Pete Alexander.

  11. Pete Rose was the guy I'd found with the lowest score, but that doesn't mean he's necessarily the answer...let's see if anybody can find someone even lower.

    As for comments about Martin and Estrada, I totally agree that the sim score system breaks down when the guys haven't played too many games. Sim scores are calculated strictly on a points system, not a rate system, meaning that two players who played 1500 games and have a sim score of 900 are actually much more similar than two players who played 500 games and also have a sim score of 900. In other words, either way there is 100 points of variation, but in the first case it might be spread across 10 years while in the second case it might be spread across just 3-5 years.

  12. BunnyWrangler Says:

    Tiger_Fan:
    I wasn't saying that the system whiffed or anything, and I know that it measures only statistics (although it does factor in position). My comment was more about how odd I found it that the career statistics - at least the ones measured by similarity score - of two very dissimilar catchers, Martin and Estrada, were actually pretty close.

  13. I give Ty Cobb a shot. Not sure how the numbers differ on your list and this one: http://www.baseball-reference.com/leaders/similarity.shtml

  14. wow. waking up at noon has its consequences. when reading this article, i knew it had to be pete rose because i coincidentally was looking at the most dissimilar players leaderboard. also, most unique "active" is barry bonds at 762. http://sports.yahoo.com/mlb/news?slug=ti-uggla111309 - at the bottom of this article it says that bonds is still active. So because of the leaderboard i know that if you want to consider him active, he is the least similar; if he isn't active, it would have to be a-rod with 771 through age 33.

  15. p.s. wouldn't it be cool if there was a thing like the oracle except for similarity scores?

  16. As you might imagine, I did not know the dissimilar leaderboard existed!

  17. This was really cool to mess with, though I missed the boat by a lot.

    Now I'm trying to find some pair with a 1.000 similarity rating. The closest I have so far is a .991 at age 27 by John Foster and Rodrigo Lopez. I know there's gotta be a 1.000 out there somewhere (at least for a young age with little MLB time), but I'm looking for two players with like a .994 over two ten year careers. How wacky would that be?

  18. Also, still messing around here. Has anyone seen Gary Sheffield's similarity by age?
    You've got Gary Clark, Ryan Zimmerman, Scott Rolen, Dale Murphy, Jack Clark, Chipper Jones, Duke Snider, Jeff Bagwell, Fred McGriff, and Reggie Jackson in there. And similarity overall he pulls Mel Ott, Reggie Jackson, Ken Griffey, Fred McGriff, and Mickey Mantle as his top five (three hall of famers and two who probably will be).

    Impressive comparison. But would any of us consider Sheff a HOFer? He did win a world series, and he finished in the top 3 in mvp three times.

  19. I don't think you'll find a 1000 score because they aren't given unless a player has at least a minimum playing time and the odds are stacked quite high against any two players with, say, 3 years of experience having identical totals across all categories. I'd say it's a million-to-1 shot.

    Sheffield compares to those guys because he's played many years and racked up high totals. He's a very good player but falls short of HOF in my eyes mainly because he was not a particularly dominant player for any significant stretch of his career. He put up big numbers alongside a bunch of other guys.

  20. JohnnyTwisto Says:

    I think Sheffield will have difficulty making the HOF soon because of his bouncing around from team to team, multiple injury-shortened seasons earlier in his career, and the various attitude/character questions. But he may very well deserve it as he was a tremendous offensive player for many years. He could be the type of guy who gets in 50 years down the road, when people mostly just have the numbers to go on and can't believe this type of hitter was never inducted. (On the other hand, Dick Allen doesn't seem to be particularly close to getting in...)

    Here's an impressive list of hitters with similar career numbers: http://bbref.com/pi/shareit/S8kYz
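
A footnote on comment 11's point that sim scores are a points system rather than a rate system: the arithmetic can be sketched in a few lines of Python. This assumes a perfect match scores 1000 (as comment 19 implies) and uses a helper name of my own; it illustrates the comparison in that comment and is not the site's actual calculation.

```python
def points_lost_per_100_games(sim_score, games):
    """Points below a perfect 1000, averaged over each 100 games played.

    Assumption: 1000 is a perfect match, so (1000 - sim_score) is the
    total gap between the two players' career totals.
    """
    return (1000 - sim_score) * 100 / games


# The two cases from comment 11: same score of 900, different career lengths.
long_careers = points_lost_per_100_games(900, 1500)
short_careers = points_lost_per_100_games(900, 500)

print(round(long_careers, 2), round(short_careers, 2))
print(round(short_careers / long_careers, 2))
```

The same 100-point gap works out to roughly three times as much difference per game for the 500-game pair as for the 1500-game pair, which is why a 900 between two short careers says less than a 900 between two long ones.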