Should Defense be More Consistent than Offense?

Posted by Sean Forman on September 6, 2011

Is WAR the new RBI? | Its About The Money

I'm not going to speak to the larger discussion around this article because the author loses a lot of credibility/weight (at least for me) when he admits in the comments, "I slightly misrepresent how the stat works here, in favor of making the statement more hyperbolic." That and the article is being discussed elsewhere, but one of the comments points out a particular issue that annoys me.

Here is the Comment by Hank

I'll post this again....

Carl Crawford career LF at the Trop 22.5 UZR/150
Carl Crawford career LF everywhere else: 7.5 UZR/150

This is over 8 years (so each sample size is the rough equivalent of 4 full years).

1 year OF UZR samples are bad, but even the general "3 years is what you need" can also have issues, as UZR can have systematic biases.... input bias, park effects and subjective components (armR, errR for outfielders) which don't even out over a 3 year period.

I like the concept of WAR and the issue I have with it is bad input data (the defensive stats in both WAR models and the baserunning values now put into the fWAR). Until these variables can be measured better (FieldFX?), any difference in WAR between players based on these components should be taken with a huge boulder of salt.


THANK YOU for this post.

UZR may be good for multiple years, but it is a flawed single season stat. Saying Carl Crawford one year in TB went from being an elite LF to being a terrible one the next year makes no sense.

What is the issue here exactly? Why would a difference in home and road UZR be evidence of a flaw in UZR? Or why would the fact that Crawford's UZR is much worse this year mean there is a problem with UZR? Some players hit a lot better at home than on the road. Why wouldn't fielders see the same effect?

For some reason when defensive numbers are inconsistent across splits or years folks gnash their teeth and blame the faulty defensive metrics, but Carl Crawford can go from an OPS+ of 135 to 82 in a year and people won't question the offensive numbers.

Why do we expect greater consistency on defense than on offense? That doesn't seem to me to be a valid expectation.

55 Responses to “Should Defense be More Consistent than Offense?”

  1. Mike Fast Says:

    Sean, I don't know if it's a correct expectation (i.e., I haven't tried to measure it), but that assumption certainly seems reasonable to me.

    Batting relies on millisecond timing and quick visual processing for pitch recognition. If something is wrong in your mechanics that throws off either one of those by a little bit, batting performance can suffer by a lot.

    I don't really see analogous skills for fielding that live on the same razor's edge of success and failure. Getting a good jump on the ball is a matter of tenths of a second, not milliseconds, and tracking a batted ball well with your eyes isn't nearly as tough as tracking the seams on a baseball that's spinning 40 times a second.

    It's possible that all the different skills wash out in actual baseball contexts such that the spread and consistency of performance is similar in fielding and hitting, but I certainly wouldn't guess or expect that it would be so.

  2. Evan Says:

    I think the problem is, as John Dewan has pointed out on numerous occasions, that fielding statistics are in their infancy. When evaluating batting or pitching numbers we now have a lot of secondary numbers to look at when we see a seasonal performance, e.g. a batter might have an abnormally high BAbip to explain better than normal performance. We haven't yet developed as complete an understanding of fielding.

    I don't think it is all that surprising that many people expect defense to be consistent from year-to-year. This is clearly evidenced in the frequent historical practice of repeatedly awarding gold glove awards to the same players year after year. Defensive differences are usually in the plays that one player doesn't even get close enough to make a play on - it is very hard to see a negative or non-play. As a consequence of the difficulty in viewing and evaluating defense, people assume that the player is making plays in a manner that is consistent with his reputation and previous results.

    There's a lot of ignorance about evaluating defense. One example from the article that Sean references is that the author doesn't seem to realize that putting 3 very good outfielders in the same outfield will diminish the contributions of each of those players (because there is a certain percentage of balls in play that couldn't be caught by an average fielder, but the defenders' exceptional abilities allow each of them to catch the ball - but only one of them can catch any particular ball); this doesn't mean that the players have diminished ability, just that they have diminished contributions. I think this difference between evaluating ability and skill is one that a lot of critics of advanced stats fail to recognize - evaluating a player's ability on offense or defense based upon one year's worth of statistics is a flawed way of evaluating his skills because his performance can vary significantly from year-to-year.

  3. Johnny Twisto Says:

    I think Evan's comment is on point. We have no innate familiarity with fielding stats. Fans generally evaluated fielders as great-good-average-bad-terrible. I don't know how many people ever thought of defense in terms of runs saved, and if they did, no one really knew what numbers were reasonable. We used to see comments like "Ozzie Smith saves a run a game." Everyone "knows" Ozzie Smith was a great fielder, so we paint with a broad brush and assume he was a great fielder every season of his career. Well, he probably was, but that doesn't mean he performed at exactly the same level every season. Once a player got a defensive reputation, it was hard to remove that, because we didn't have any hard numbers to give us a reason to. It leads to the assumption that a good fielder is a good fielder is a good fielder, year after year after year.

  4. BSK Says:

    Well, we know that luck is a factor in batting. So, as luck fluctuates, batting stats will fluctuate.

    Is there luck involved in fielding?

  5. Jason461 Says:

    It seems to me that the argument the commenter is really making has to do with the lack of park adjustment and the like. For instance, we understand that Coors is a great hitter's park. It is similarly possible, I would guess, for a stadium to be a great fielder's park in certain contexts (perhaps the turf at the Trop makes it easier for Crawford to use his speed getting to fly balls, etc.). This should be adjusted for. If someone was a career .900 OPS hitter at home and a career .600 OPS hitter on the road, we'd all think something was up.

  6. BSK Says:

    I also think the reason we are less skeptical of a dramatic change in batting stats is because we generally understand the components better. Even if you don't show me the OPS+ gap, I can see all the other numbers that are down for Crawford. What "simple" numbers can I look at to confirm massive changes in UZR or other defensive metrics? That isn't to say they are wrong... but it impacts how we respond to it.

  7. Jason W Says:

    I think part of it is also the visibility of hitting results and the relative invisibility of fielding results. Here's what I mean:

    Suppose you could see every ball hit to Carl Crawford in 2010 and then every ball hit to him in 2011. Chances are, you wouldn't see a major difference. Sure, you might be able to realize that he "seemed a little off" in 2010, but, for the most part, you'd think he was playing pretty much the same.

    Now, take this odd example: Suppose you could watch every Carl Crawford plate appearance -- but only his swings and the contact. You don't get to see where the ball lands. He's got a slightly higher K rate this year, and sure, you could identify some great blasts and weak grounders, but otherwise, what would you see? Could you actually pick out that he's lost 160 points off his OBP? Or would you just see him hitting the ball a bunch and not knowing if they're line drive singles or lazy fly balls?

    In real life, of course, we know where those balls land. And a professional manager or batting coach could observe Crawford's PAs and probably make some observations, but you and I couldn't, most of the time. That might be why we think batting fluctuates more than fielding -- because we see those semi-random results, making each PA appear more different from one another, even though many of them appear very much the same. In fielding, we see the process and the result at pretty much the same time, so we have a harder time disassociating the two.

  8. Doug Says:

    Park factors for defense. Good idea, but doing it right would be quite a chore.

    Parks like Oakland Coliseum (C, 1B, 3B, LF, RF) and Fenway (basically the same positions, but especially LF) come immediately to mind as seeming likely to have measurable effects on probability of recording outs at certain positions.

  9. Dann M Says:

    Maybe part of the issue with home/road splits is knowing how to play certain aspects of the game in one's home park. A butcher like Soriano might be a better LF *at Wrigley Field* than a generally good LF on another visiting team at Wrigley because he [should] know how to play the sidewall, play the ivy, and play the daytime sun, not to mention the wind. That, combined with his contribution to what I call the Manny Rule (terrible outfielders can accrue extra assists because of presumptions about their inadequacy), might add up to his previously positive UZRs as a Cub LF.

    Wrigley's day games and Boston's Green Monster seem similar in that they add very specific and unique park effects that ought to benefit any home player because of the difference from any home park. Similarly, dome players like Crawford at the Trop could look great there compared to open-air home-field visitors who aren't used to the poor view allowed by the roof. Remove him from his great advantage and put him in a far-from-neutral spot, and there is a new learning curve.

    There's no real data I'm going on here, other than my Fonzie experience as a Cubs fan and the related numbers. But the simple fact that we recognize park effects for offense, pitching and defense ought to imply that these conditions likewise affect quality of play as well as degree of difficulty.

  10. Sean Forman Says:

    TZR is all handled by park, so the percentages of fielded balls are separated out by park.

  11. Johnny Twisto Says:

    Well, we know that luck is a factor in batting. So, as luck fluctuates, batting stats will fluctuate. Is there luck involved in fielding?

    I never like when statheads use the word "luck," as it implies something is completely out of the player's control. I prefer "chance," which some might think means exactly the same thing, but to me it has a different connotation.

    Anyway, I'd say "luck" probably can affect defenders more. If an infielder gets a bunch of bad bounces, that probably is something completely out of his control, but it results in balls he can't field. Moreover, a large percentage of fielding opportunities are pretty routine. I think it's a small set of plays which separate great fielders from poor ones. So with a small sample, there's a greater likelihood that a fielder's performance will vary due to random chance.

    Park factors for defense.

    Several years ago, I recall MGL (the creator of UZR) rating Manny Ramirez as an average player overall, because his defensive rating was so atrocious. Ramirez was being debited for balls which were uncatchable because they hit 15+ feet up on the Monster. After MGL finally made some adjustment for the tiny Fenway left field, Ramirez's rating changed dramatically.

  12. Johnny Twisto Says:

    what I call the Manny Rule (terrible outfielders can accrue extra assists because of presumptions about their inadequacy)

    I think Manny had a pretty decent arm (don't forget he was a RF in Cleveland).

  13. Timothy P. Says:

    Speaking of stats, there is a historical milestone getting close that has not received much attention. Juan Pierre is sitting at 1998 career hits and was going for 2000 tonight against the Twins but went 0 for 4. Juan will try again tomorrow night in Minnesota and America will be watching! As far as defense goes, Juan has made just 1 error since May 1st.

  14. Johnny Twisto Says:

    Timmy, did you send Pierre some supportive emails? Some defensive tips? When you arrived here you were lamenting his recent defensive performance. And since then, he's been nearly flawless! I can only assume if you had been posting here a couple years earlier, he'd be zipping towards 2200 hits by now.

  15. Timothy P. Says:

    @15 Well 5 errors in 4 weeks for an above average, speedy outfielder should stick out to anyone familiar with the game. What Juan has to be worried about is his drop off in SB and that he's been caught so many times this year and he just turned 34. But to answer your question, yes I do feel somewhat vindicated by Pierre's hitting and fielding resurgence since early in the season. The same with the Brewers being so successful since Weeks went down. I really called that one.

  16. Pete Says:

    Could luck have as much to do with the variance in fielding as it does in batting?

    Consider the Brewers total zone fielding runs for the last 2 years:
    2010 - 37 BELOW avg (68% Def Eff)
    2011 - 23 ABOVE avg (70% Def Eff)

    Their collective gloves are 60 runs better than last year despite small changes in their roster. The only significant difference is SS (replacing Escobar with Betancourt). The rub is that Betancourt is a below avg SS and Escobar is above avg. We would have expected Milwaukee's fielding performance to actually decline this year. Can this be explained?

  17. Johnny Twisto Says:

    Are you still a Jemile Weeks fan, or have his hair and genetic makeup finally rubbed you the wrong way?

  18. Johnny Twisto Says:

    We would have expected Milwaukee's fielding performance to actually decline this year. Can this be explained?

    I'm too lazy to dig into the numbers right now, but it's an interesting question. How much of that is due to random variation, how much to stuff that Total Zone misses, how much to actual improvement among the holdover Brewers?

    I'd love for someone who watches the Brewers a lot to share their opinion.

  19. Andrew Says:

    I already commented on the article itself but I just want to reiterate that WAR is often a lot less flawed than people's understanding of it. Sean, you bring up a very interesting point regarding defense - we expect offense to fluctuate but rarely make such allowances for defense. The quintessential 'luck' stat for batting is BABIP - though I recently saw a very good article on Fangraphs which combined BABIP with HR/FB% for an ultimate 'luck' rating. We can easily accept that a player's BABIP can be influenced by a few ground balls sneaking through for hits or line drives falling in (or vice versa). Why is it so hard for us to believe that, perhaps, Ichiro Suzuki has simply been very unlucky in the field this year? He certainly has been at the plate.

  20. kds Says:

    @16,18. Since DER is what I might call a primary statistic, coming directly from events on the field, while Total Zone is a secondary statistic, the change from 2010 to 2011 is unlikely to be "stuff that Total Zone misses." The 2% change in DER is exactly what one would expect from a change of 60 runs, and vice versa. DER and Total Zone do not always agree this well. For team defense I would always prefer to start with DER (and make park adjustments).
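The arithmetic behind "2% of DER is about 60 runs" can be roughed out as follows. This is only a back-of-the-envelope sketch: the number of balls in play per team-season and the run value of turning a hit into an out are both assumed round figures, not numbers given anywhere in this thread.

```python
# Rough check: does a 2-point DER change line up with ~60 runs?
# Both constants below are illustrative assumptions, not figures from the thread.
BALLS_IN_PLAY_PER_SEASON = 4400   # assumed team total of balls in play
RUNS_PER_EXTRA_OUT = 0.75         # assumed run value of turning a hit into an out


def runs_from_der_change(der_before: float, der_after: float,
                         balls_in_play: int = BALLS_IN_PLAY_PER_SEASON,
                         runs_per_out: float = RUNS_PER_EXTRA_OUT) -> float:
    """Estimate runs saved from a change in defensive efficiency."""
    extra_outs = (der_after - der_before) * balls_in_play
    return extra_outs * runs_per_out


if __name__ == "__main__":
    # Pete's Brewers figures: 68% in 2010, 70% in 2011.
    print(round(runs_from_der_change(0.68, 0.70), 1))
```

Under these assumptions the estimate lands in the mid-60s, the same ballpark as the 60-run Total Zone swing. Different but still plausible inputs would move it by several runs either way.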

  21. Bip Says:

    Pitchers change but the way the ball leaves the bat doesn't? Maybe being in a road park affects how he sees the ball off the bat and judges its trajectory. Also maybe he doesn't know the best place to position himself in road parks. Maybe batters approach hitting in their home park in a particular way because they know what works in that park which makes it more likely balls will be hit in a way he's unfamiliar with, or they'll be more likely to be hit to gaps.

  22. Timothy P. Says:

    @17 Genetic makeup?? Teasing a player on your favorite team's arch-rival about his goofy hair style is a long way from being in the Klan. Really Twisto, that is beneath you and something that should not be thrown around lightly. A high batting average with 75 walks and less than 50 strikeouts knows no color, religion, or national origin. Shame on you Twisto.

  23. aweb Says:

    I think it's a small set of plays which separate great fielders from poor ones. So with a small sample, there's a greater likelihood that a fielder's performance will vary due to random chance.

    This. Especially in the outfield, sometimes fielders just don't get any realistic tough chances and only get routine plays for long stretches. Hitting and fielding just aren't that comparable. Hitting is a long series of medium probability success (every pitch, every PA) chances. Matchups might change the baseline success rate by 20-30% (platoon splits, favourable matchups). Kind of like flipping slightly biased coins, very predictable over time.

    Fielding is a long series of widely varying chances, often mostly at the high end of success probability (99%+ for receiving a good throw at first base or catching a routine fly ball - for everyone in the league), and sometimes at the very low end (1% for a liner to the gap, or a shot down the line at third). Nowhere near as predictable, due to the variability.

    Defensive ability might be more constant (might be...), but defensive chances are a completely different beast.
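One way to probe the "small set of plays" idea is to treat every chance as an independent coin flip and see where the spread in plays made comes from. The chance mixes below are hypothetical, picked only to mirror the shapes described above (many medium-probability plate appearances versus mostly routine fielding chances with a few toss-ups). They are not measured distributions.

```python
import math


def plays_made_sd(chances):
    """Standard deviation of plays made for independent chances.

    `chances` is a list of (count, success_probability) pairs. Each chance is
    treated as an independent coin flip, so the variance is the sum of
    p * (1 - p) over all chances.
    """
    variance = sum(n * p * (1.0 - p) for n, p in chances)
    return math.sqrt(variance)


# Hypothetical mixes, chosen only to mirror the shapes described above.
HITTING = [(600, 0.33)]                            # many medium-probability chances
FIELDING = [(340, 0.99), (40, 0.50), (20, 0.01)]   # mostly routine, a few toss-ups

if __name__ == "__main__":
    print("hitting sd: ", round(plays_made_sd(HITTING), 2))
    print("fielding sd:", round(plays_made_sd(FIELDING), 2))
    # Share of the fielding variance that comes from the 40 toss-up plays alone.
    toss_up_variance = 40 * 0.5 * 0.5
    print("toss-up share:", round(toss_up_variance / plays_made_sd(FIELDING) ** 2, 2))
```

In a mix like this, most of the fielding variance comes from the handful of toss-up plays, so the effective sample is far smaller than the raw chance count. Whether that makes fielding more or less predictable than hitting depends on how large true talent differences are relative to this noise, which the sketch does not address.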

  24. Matt Says:

    .....and, Nick Swisher is the second best defensive player in the game, Brett Gardner is an all-star, and Ben Zobrist is an occasional MVP candidate. WAR is a good stat that should be taken with a grain of salt, and UZR should be correctly taken with a boulder of salt.

  25. John Autin Says:

    @24, Matt -- I think that Brett Gardner is one of those cases where eyewitness observation really helps gain a true appreciation for his defensive skills. I've just never seen an OF consistently get such quick jumps on balls, and then his acceleration is tremendous; he seems to be at top speed in just a few steps, and rarely takes a bad route.

    Just my opinion, but since his UZR numbers are pretty consistent and jibe with my direct observations, I'm fully sold that Gardner is a fantastic outfielder.

  26. Johnny Twisto Says:

    Timmy/22, I was only referring to the Weekses being brothers. I didn't mean anything more than that. I should have been more clear, since I know you've been accused of more nefarious thoughts on this board before.

  27. Matt Says:

    I never debated whether Gardner was a great defender. I debated his WAR numbers that state he's an all-star. He is not an all-star IMO, but he clearly is a great defender (he single-handedly won Nova's last start for him with his 2 great catches in the field in the first inning, keeping the inning to 2 runs instead of 5, and then his two run homer late in the game), and I'm firmly in his camp. I get tired of Yankee fans giving him crap. They just want great offensive players at each position. Many (not all by any stretch) of the Yank fans don't realize that along with a very deep bullpen (best in years), greater team speed (most in years), and a deeper bench (best in years), a better defense is also to a degree responsible for the Yankees having a decent chance of taking the WS despite the lack of high-end starters. However, going by his overall WAR, he would clearly merit all-star status, and he is not an all-star. Hence, take WAR with a grain of salt, and UZR with a boulder of salt, especially when considering that UZR gives Swisher the second highest defensive metrics in the game. Swisher is an improved outfielder, but he's not the second best defensive player in the game.

  28. Johnny Twisto Says:

    Kds/20, good point about the change in DER directly corresponding to the change in TZ. The question still remains, though, why the big change in DER when we might have expected their defense to get worse? Maybe it's the pitching.

  29. mosc Says:

    I've said it before. Nick Swisher tells you all you need to know about dWAR. He's a decent player, maybe even an above average defender in right, but certainly not worth DEFENSIVELY what an athletic shortstop is worth.

  30. Johnny Twisto Says:

    I've said it before. Nick Swisher tells you all you need to know about dWAR. He's a decent player, maybe even an above average defender in right, but certainly not worth DEFENSIVELY what an athletic shortstop is worth.

    And I've said it before, WAR does not say that he is. dWAR is mislabeled, because it is not the player's defensive value. It is only his performance compared to others at his position.

    Defensive value is Rfield (performance) plus Rpos (position). The average SS is estimated to be worth about 15 runs more than an average RF.

    I doubt Swisher is really a +16, but he's good. Guess what, defensive statistics are imperfect. Everyone knows this. That doesn't make them useless.
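The "Rfield plus Rpos" point can be made concrete with a small sketch. The individual positional adjustments below are placeholders, chosen only so that shortstop minus right field equals the 15-run gap quoted above. They are not the actual values used in any WAR implementation.

```python
# Positional adjustments are placeholders chosen so that SS - RF = 15 runs,
# matching the gap quoted above. The individual values are assumptions.
RPOS = {"SS": 7.5, "RF": -7.5}


def defensive_value(rfield: float, position: str) -> float:
    """Rfield (runs vs. positional average) plus Rpos (positional adjustment)."""
    return rfield + RPOS[position]


if __name__ == "__main__":
    # A +16 right fielder versus an exactly average shortstop.
    print(defensive_value(16, "RF"))   # 8.5
    print(defensive_value(0, "SS"))    # 7.5
```

With these placeholder numbers, a +16 right fielder comes out roughly even with a merely average shortstop, which is the distinction between performance at a position and total defensive value.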

  31. Jason Winter Says:

    And a good-hitting shortstop or second baseman isn't worth offensively what a good-hitting right fielder or first baseman is. But we accept that a guy who hits, say, .300/.375/.500 at a "defensive" position offers about the same value as one who hits .325/.410/.550 at a "hitter's" position. If we didn't, then a SS/2B/CF/C would almost never win the MVP.

  32. Jason Says:


    Defensive statistics are certainly imperfect. Everyone accepts that. And as you point out, this doesn't necessarily make them useless. However, what is the actual evidence that they have any value whatsoever?

    There are a lot of parameters that go into stats like UZR and all of these have an associated measurement error. What makes you think that the measurement error inherent in the defensive statistics isn't actually greater than the slight differences they are trying to measure? What are the confidence intervals associated with the estimates? Why would anyone put any stock into the defensive measurements without knowing what kind of confidence we should have in the estimates?

    I know a homerun is a homerun. I'm very confident that the measurement is correct. I don't know that a ball that Granderson fails to field would have been handled easily by Ellsbury. I want to know how confident I should be in a stat that tells me Ellsbury catches the ball that Granderson doesn't. Until I know how confident I should be in the estimate, I regard the statistic as useless.

    There is no evidence that Ellsbury and Pedroia are MVP candidates because both their candidacies are based entirely upon estimated defensive superiority with no indication of how reliable the estimates are. All we know is that neither is even the best hitter at their respective positions in the AL East, let alone the league.

  33. Lawrence Azrin Says:

    @32/ Jason - I am not quite sure what you are saying here in your comment:

    - we cannot be sure that Ellsbury and Pedroia are MVP candidates, because the evaluation of their defensive value is not reliable
    - Ellsbury and Pedroia cannot possibly be serious MVP candidates, because Granderson and Cano are better offensive players at their respective positions

    If it's the second, you are just flat-out wrong; plenty of writers have called Ellsbury and Pedroia serious MVP candidates. If it's the first, you are also wrong, because many of these writers have called Ellsbury and Pedroia serious MVP candidates without ever referring to advanced defensive statistics.

    Even if Cano and Granderson are indeed having better offensive years than Ellsbury and Pedroia, is that a good reason to not consider them _at all_ for MVP? Even if we cannot agree on the methods, there's a legitimate argument that Ellsbury and Pedroia are better defensive players than Granderson and Cano at their positions, and just as valuable as, or more valuable than, Granderson and Cano.

    Justin Verlander has better pitching stats this year than CC Sabathia in the same division, but that doesn't mean Sabathia won't receive serious CYA consideration.

    You wish that defensive stats had the same degree of accuracy as offensive stats; this is commendable and I agree. However, most offensive stats also have some error built into them. For example, there are a small but distinct number of "hits", that could also be called "errors".

    This is not just limited to the difference between hits and errors: human judgement enters into any labeling/counting system - let's say a very fast batter hits a ball in the gap, the right fielder bobbles it slightly, the batter reaches second a few feet ahead of the throw - is it a single/error or a double?

    No judgement system is absolute; to demand that level of confidence in any evaluation system is unrealistic.

  35. Johnny Twisto Says:

    Defensive statistics are certainly imperfect. Everyone accepts that. And as you point out, this doesn't necessarily make them useless. However, what is the actual evidence that they have any value whatsoever?

    That's a fair question and I'm probably not that well qualified to answer it, but I shall try.

    As long as baseball has been played, defensive statistics have been incomplete and imperfect, yet people have evaluated fielders' skills. At first this was probably just done visually, then by trying to make sense of the traditional stats, and now more and more with advanced numbers. I guess I would ask you, do you think we (fans/management/whoever) have ever been able to evaluate defense? I don't mean perfectly. But do you think we have had any clue as to who the better fielders are, and who the worse ones are? I think we have. Some time back I made the comparison of Ozzie Smith to Howard Johnson. Everyone "knows" Ozzie was a better shortstop than HoJo. How do we know this? We don't have a precise count of Ozzie's "home run" plays, or how many grounders HoJo didn't reach that Ozzie would have put in his back pocket. It's just accepted as a fact. Observers agreed, Gold Glove voters agreed, traditional stats agreed, advanced stats agreed.

    There's no single benchmark by which to confirm that we are right about Ozzie. It's just something that one has to accept. Now, one might say we don't need any statistics to prove Ozzie's greatness, we could tell just by watching him. Perhaps that's true. But there ain't many Ozzies out there. 90% of MLB fielders fall somewhere between spectacular and abysmal, and 90% of plays don't require extreme effort. So you may not be able to differentiate between the pretty good and the subpar defenders unless you have the opportunity to watch them a lot. And even if you watch them a lot, we know about perception bias, which can taint our opinions and recollection of what we've seen. And on top of that, I don't have time to watch all 30 teams regularly. If I can't rely on my eyes to rate San Diego's third baseman, how do I know how good he is? There's anecdotal evidence, but maybe I haven't heard or read any opinions about him.

    So I need to rely on numbers. As I said, there's no way I can "know" if they are "right," but I just have to have a level of faith. Do different defensive numbers tabulated different ways corroborate each other? Do the stats corroborate general perception? Do they corroborate my own personal opinions? Do they rate Gold Glove winners high? As I said in this thread, that list of Total Zone's highest rated players looks pretty close to a subjective list of the best fielders. You might disagree about some, but for the most part it looks "right." And if it works well for the players I "know" are good (or bad), I'm inclined to give it weight for the ones I don't know about. But it's just part of the puzzle. I want the advanced stats, I want the Fans' Scouting Report, I want my own opinion if I've seen the player enough. And even if all those assessments are perfectly aligned, there is still some degree of uncertainty. But I have something, whether I want to give it an exact number, or be as vague as "I'm pretty sure this guy is a very good defender." And I think that's better than nothing, whether it's HOF/MVP debates, trade discussions, or just chewing the fat.

    I don't know if you will find that at all convincing, or responsive.

  36. Jason Says:


    "No judgement system is absolute; to demand that level of confidence in any evaluation system is unrealistic."

    I am not demanding the system be absolute. I just want to know how confident I should be in the measure. This is just common practice in all real statistics. It is a necessary component to interpret the estimates. How confident are you that a UZR of 10 is really greater than a UZR of 5? If the estimates are 10 +/- 0.001 you should be very confident. On the other hand if the estimates are 10 +/- 10, then the estimates are entirely meaningless.

    I want to know what the confidence intervals are for UZR, etc. Until I know it, I will regard the numbers as nothing more than random numbers (i.e. entirely meaningless).

    You correctly point out that there is also measurement error with hitting stats. However, I have a sense of the magnitude here. I am pretty confident that 38 homeruns really is greater than 24.

    It is pretty well accepted on sites like this that batting average is a flawed statistic for measuring hitting ability. People say the reason for this is the luck component. Another way to think of it is measurement error. The estimate of hitting ability (BA) might be quite different from the real ability because of the way we are measuring it.

    We can only hope UZR and related stats are as good as BA. It is likely the case that they are much, much worse. But we just don't know so we ought to ignore it entirely until the data is published.
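The "10 +/- 0.001 versus 10 +/- 10" contrast above can be illustrated with a crude interval-overlap check. This is only a heuristic sketch, not a formal significance test, and the error bars are the hypothetical ones from the comment rather than published UZR uncertainties.

```python
def intervals_overlap(est_a: float, err_a: float,
                      est_b: float, err_b: float) -> bool:
    """True if [est - err, est + err] for A and for B share any value."""
    low_a, high_a = est_a - err_a, est_a + err_a
    low_b, high_b = est_b - err_b, est_b + err_b
    return low_a <= high_b and low_b <= high_a


if __name__ == "__main__":
    # UZR of 10 versus UZR of 5 with tiny error bars: clearly separated.
    print(intervals_overlap(10, 0.001, 5, 0.001))   # False
    # Same estimates with +/- 10: the intervals overlap heavily.
    print(intervals_overlap(10, 10, 5, 10))         # True
```

Overlapping intervals do not prove two fielders are equal, but they do mean the point estimates alone cannot separate them.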

  37. Jason Says:


    I think you very accurately described how people use defensive statistics. The problem is that it masquerades as objective rigor, when in fact, the stats people are really doing the exact same thing the scouts have been doing. The scouts watch the players play and then give their subjective opinion of ability. The stats people measure things and reduce ability to a number. This is good. The problem is they then subjectively compare the numbers to determine ability level because they denude their measurements of all their error. This is wrong.

    We can know how confident we should be in defensive statistics. No one, to my knowledge has ever bothered to check though. One simple way to do it would be to bootstrap the data. Bootstrapping is a resampling technique where you repeatedly resample your data to create a distribution of values. You then compare your measured value to the distribution of resampled values to get a sense of how good your measurement is.
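A minimal sketch of the bootstrap idea described above, assuming one had a per-play run value for every chance a fielder saw in a season. The play values below are invented for illustration, and the `bootstrap_interval` helper is hypothetical. Real UZR inputs are more complicated than a flat list of numbers.

```python
import random


def bootstrap_interval(values, n_resamples=2000, level=0.90, seed=0):
    """Percentile bootstrap interval for the sum of `values`.

    Resamples the plays with replacement, recomputes the season total each
    time, and reads the interval off the sorted resampled totals.
    """
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    n = len(values)
    totals = sorted(sum(rng.choices(values, k=n)) for _ in range(n_resamples))
    tail = (1.0 - level) / 2.0
    low = totals[int(tail * n_resamples)]
    high = totals[int((1.0 - tail) * n_resamples) - 1]
    return low, high


if __name__ == "__main__":
    # Hypothetical per-play run values for one fielder's season:
    # most plays are worth nothing relative to average, a few swing the total.
    plays = [0.0] * 300 + [0.6] * 30 + [-0.6] * 20
    print("season total:", round(sum(plays), 1))
    print("90% interval:", bootstrap_interval(plays))
```

The width of the resulting interval is the kind of "+/-" figure asked for above. Note that it only captures sampling noise in which plays happened to occur, not measurement error in how each play was scored.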

  38. Jason Says:


    As an addendum:

    Howard Johnson was one of my favorite players growing up! I wanted to wear sweatbands up my forearms in little league to be like him. I remember him as a really good defensive 3B! ...but who knows?

  39. Johnny Twisto Says:

    Hmm. I think some of what you refer to has been done. I've often read that one should be as confident in a full season of UZR as one would be in two months of OPS. I've never heard of "bootstrapping" and I'm not sure I understand what you mean -- can you explain that further?

    One thing I didn't write before was that if something is a skill, it is repeatable. So if defensive stats were capturing a skill, there should be some consistency in the results. We can find guys who go from +10 to -10 (and sometimes that might be random variation, and sometimes it might be a change in ability), but generally the good performers continue performing well, and vice versa. I think you would find a positive year-to-year correlation in the results, which should indicate that they are not random.
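
    A rough sketch of the year-to-year check described above, in Python. The ratings are made-up runs-saved numbers for the same eight fielders in consecutive seasons, not real data, and `pearson_r` is a hand-written helper for illustration. A clearly positive result would be what "good performers continue performing well" looks like numerically.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical runs saved for the same eight fielders, season 1 and season 2.
year1 = [12, 8, 5, 1, -2, -4, -9, -11]
year2 = [6, 10, -1, 3, 2, -8, -3, -12]

r = pearson_r(year1, year2)
print(f"year-to-year correlation: {r:.2f}")
```

    Individual players still move around in this invented sample (one goes from +5 to -1), yet the ordering mostly holds. A correlation near zero would instead suggest the ratings are mostly noise.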


    Johnson may have handled 3B ok, but I think he was stretched at SS.

  40. Jason Says:


    This gives a pretty good explanation of bootstrapping:

    My sense is that fielding is a skill that is much easier to repeat than hitting (fielding percentages for even poor fielders approach 100%, while a great hitter makes out 60% of the time). Yet UZR fluctuates wildly from year to year. Sometimes Curtis Granderson is amongst the best center fielders in the game and sometimes he is amongst the worst according to UZR. Sometimes Jeter is the worst shortstop ever and sometimes he's above average. Yet fielding is a skill that is much easier to repeat than hitting. Why would this be? I think it is because the estimates that we get aren't all that close to the true value they are trying to capture.

  41. Sean Forman Says:

    Yet fielding is a skill that is much easier to repeat than hitting.

    Repeating this does not make it so.

  42. Jason Says:

    "Repeating this does not make it so."

    True. The fact that fielding percentages are close to 1 does, however.

  43. Evan Says:

    Jason @36

    BA is considered a flawed statistic for measuring offensive contribution because it treats singles, doubles, triples and home runs as equivalents and treats walks as non-events. The objection is that it measures something that is far less important than other easily measured percentages (OBP, SLG, OPS) evaluate, not that it can be influenced by luck (which can similarly affect the other "slash stats").

  44. Sean Forman Says:

    True. The fact that fielding percentages are close to 1 does, however.

    How? It just means that gloves and fields are better and players catch more of the balls they get to.

  45. Sean Forman Says:

    Circling back to Mike's first comment in the thread.

    I don't really see analogous skills for fielding that live on the same razor's edge of success and failure.

    By this argument shouldn't pitching be even more consistent? No reaction time is required. All of the actions are under the pitcher's control.

  46. Jason Says:


    Isn't catching the balls they get to the very definition of repeatable? They pretty much make the play every time. ....on the other hand, even the best hitters hit foul balls, grounders and pop ups in batting practice (let alone when the pitcher is trying to get them out!). ...hitting is really, really difficult. Fielding, not so much.

  47. Raphy Says:

    Sean - Do people expect "greater consistency on defense than on offense" or an equivalent?

    While offensive numbers fluctuate, generally there is a fairly predictable progression of offensive statistics throughout a player's career. I have not spent a lot of time on defensive stats (I'm still waiting for that PI you promised.), but my impression is that there is a lot less predictability in the overall fluctuations.

  48. Johnny Twisto Says:

    Isn't catching the balls they get to the very definition of repeatable? They pretty much make the play every time.

    Right, except for the ones they don't. See Aweb's #23. And also, fielding % is only based on chances, whereas so much of defense is about balls *not* reached, which don't show up as chances.

  49. Johnny Twisto Says:

    Is the old Zone Rating still tabulated? On maybe? There you'll see your spread. I believe it is a simple division of plays made by all balls in the player's zone. No one's close to 100%, and the results wouldn't be all packed together like fielding %.

    The more modern stats generally take the same approach, they've just converted the results into runs saved above or below average.
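
    To make the contrast concrete, here is a small Python sketch with two invented shortstops. Fielding percentage uses the standard (putouts + assists) / total chances formula. The zone rating function follows the "plays made divided by balls in zone" description above and is an assumption about how the stat is tabulated. All counts are hypothetical.

```python
def fielding_pct(putouts, assists, errors):
    """Share of chances handled cleanly; balls never reached are not counted."""
    return (putouts + assists) / (putouts + assists + errors)

def zone_rating(plays_made, balls_in_zone):
    """Share of all balls hit into the fielder's zone that became outs."""
    return plays_made / balls_in_zone

# Hypothetical season lines for two shortstops.
fp_a = fielding_pct(putouts=250, assists=450, errors=14)
fp_b = fielding_pct(putouts=220, assists=400, errors=12)
zr_a = zone_rating(plays_made=420, balls_in_zone=500)
zr_b = zone_rating(plays_made=380, balls_in_zone=500)

print(f"fielding %:  A {fp_a:.3f}  B {fp_b:.3f}")
print(f"zone rating: A {zr_a:.3f}  B {zr_b:.3f}")
```

    With these numbers the two fielding percentages are nearly identical, while the zone ratings differ by eight points, because the second player reaches fewer balls in the first place.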

  50. Jason Says:


    So there is pretty much no variability in the balls a player gets to. They almost always make the play. All of the variability in the widely variable defensive stats must then come from their ability to get to the ball if we are to believe that the variability is real and not simply a measurement problem. But why should we believe this is the case?

    Sometimes players aren't quick enough and sometimes they are to get the exact same ball? Or maybe the ball that is rated as exactly the same actually isn't. Or maybe it's both. What do you think?

  51. Johnny Twisto Says:

    I read the article on bootstrapping, or rather, I tried to read it. It's mostly Greek to me. I've never taken any courses in statistics. I'll give it another shot later.

    I think Sean Forman was a mathematics professor -- if you're still hanging around this thread, can you bootstrap the TZ numbers and tell us what you find? And what it means?

    Or Jason, can you do it, in Excel or something? Is the data you need available on B-R?


    I'm not sure if I completely understand what you're asking in #50. It sort of reads like you think the differences in defensive ratings are (at least mostly) due to measurement problems. I don't deny that measurement problems exist. But do you think every major leaguer has essentially the same defensive ability? I doubt you do so I'm assuming I'm misunderstanding you.

    Sorry if I'm coming off as a dunce.

  52. Jason Says:


    As far as I can tell, the data is not available on the site for me to do it. If the data were available, it would be pretty straightforward to do with software such as R (

    No, clearly players have different fielding abilities between them. I am talking about the variation within a player. According to FG UZR Curtis Granderson is amongst the worst CF this year. Last year he was one of the best according to them. He looks the same to me. He doesn't seem to be hurt. Why the variation? I think it probable that the same player is playing at the same level and the difference is error in measuring it.

    I really do think much of the difference in defensive statistics is error. I think the error in the measurement is probably on the same order as the differences we are trying to measure, making the enterprise entirely misleading. I want to know if a player with a UZR of 20 is really better than a player with a UZR of 10. If the confidence intervals of both measures largely overlap then we should regard the differences as insignificant.
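
    The 20-versus-10 question can be sketched in Python as follows. The standard errors are pure assumptions (7 runs and 2 runs), since the actual error of UZR is exactly what is unknown here, and the interval is a plain normal approximation of rating plus or minus 1.96 standard errors. The helper names are invented for illustration.

```python
def interval(rating, std_error, z=1.96):
    """Approximate 95% interval: rating plus or minus z standard errors."""
    return rating - z * std_error, rating + z * std_error

def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical standard errors; the real ones are the unknown in question.
wide_a, wide_b = interval(20, 7), interval(10, 7)
narrow_a, narrow_b = interval(20, 2), interval(10, 2)

print(intervals_overlap(wide_a, wide_b))      # large error: intervals overlap
print(intervals_overlap(narrow_a, narrow_b))  # small error: intervals are separate
```

    Under the larger assumed error the two players cannot be told apart by this rough check. Under the smaller one they can. Which case applies depends entirely on the size of the measurement error.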

  53. Thursday Links (8 Sep 11) | Ducksnorts Says:

    [...] Is WAR the new RBI? (It’s About the Money, Stupid). This looks like an interesting read, if a bit long (nearly 2,400 words). Rob Neyer offers his critique, while Sean Forman has questions of his own. [...]

  54. Matt Says:

    While it's a decent comparison, Brooks Robinson did have WAR near 70 (69.1), whereas Jones' is 60 (60.1). In the WAR-world, 60 is borderline HoF, whereas near 70 is basically a lock. The difference in WAR can be mostly attributed to Brooks maintaining for longer than Jones did... of course the eras were completely different as well. Yes, WAR tries to account for differences in era, but Brooks played most of his years in the second dead-ball era, whereas Jones played in an inflated offensive era tainted by steroids.

  55. Johnny Twisto Says:

    No, clearly players have different fielding abilities between them. I am talking about the variation within a player. According to FG UZR Curtis Granderson is amongst the worst CF this year. Last year he was one of the best according to them. He looks the same to me. He doesn't seem to be hurt. Why the variation? I think it probable that the same player is playing at the same level and the difference is error in measuring it.

    (You'll probably never read this, but....)

    ...this is an example of why people reject defensive stats without cause (and should not). *Because* defensive stats are historically so inconclusive, fans assume they mean little and assume players have a defensive ability which changes little from year to year. How did Curtis Granderson go from +31 and +20 runs batting in 2007-8 to -2 runs batting in 2009? I'll guess a lot of Tigers fans thought he looked the same. (Or if they picked out differences, it was likely ex post facto, because they "knew" he was performing worse, because offensive numbers are more detailed and accurate.)

    Without more information, it would be very easy to say a batter is probably "playing at the same level and the difference is error in measuring it" if you saw him go +31, +20, -2, +4, +32. That's Granderson's batting performance (per WAR), but it's not dissimilar from defensive numbers which cause some to dismiss said defensive numbers. We accept the inconsistent offensive numbers because we have more evidence those things actually happened. We shouldn't automatically dismiss the defensive numbers because they are inconsistent.