## The correlation between run-scoring and extra-inning games

Posted by Andy on August 30, 2010

This result isn't too surprising, but it turns out that there is a fairly strong correlation between average run scoring and percentage of games that go to extra innings. This makes intuitive sense--if league-wide run scoring is higher, it's less likely that both teams in a game will end up with the exact same number of runs at the end of 9 innings.

This data was easy to collect. I used two Team Batting Game Finders to find the total number of games each year as well as the total number of games of 9 innings or fewer. The difference between the two, divided by the total, gave me the fraction of extra-inning games. Then I pulled the run-scoring numbers from the MLB Batting Encyclopedia page.
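For anyone who wants to repeat the calculation, here is a minimal sketch of that step in Python. The function name and the game counts in the example are made up for illustration; they are not Game Finder output.

```python
def extra_inning_fraction(total_games: int, games_nine_or_fewer: int) -> float:
    """Share of games that needed more than 9 innings."""
    if total_games <= 0:
        raise ValueError("total_games must be positive")
    if not 0 <= games_nine_or_fewer <= total_games:
        raise ValueError("games_nine_or_fewer must be between 0 and total_games")
    return (total_games - games_nine_or_fewer) / total_games


# Hypothetical counts, for illustration only.
fraction = extra_inning_fraction(total_games=2430, games_nine_or_fewer=2220)
print(f"{fraction:.1%}")
```

If the two searches count team-games rather than games, both totals are doubled and the ratio comes out the same.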

Here's a scatter plot for all that data:

Each data point represents a single year. The average runs per game is on the x-axis and the fraction of games going to extra innings is on the y-axis. For example, this year, teams are averaging 4.42 runs per game and 9.1% of games have gone to extra innings. In 2009, runs were more plentiful at 4.61 per game and only 8.0% of games went to extra innings. In 1988, teams scored only 4.14 runs per game and 9.4% of games went to extra innings. Back in 1965, run scoring was 3.99 per game and 11% of games went to extra innings.

The R-squared value of 0.27 tells us that there is a correlation coefficient of 0.52 between run-scoring and extra innings--that's a pretty strong link between the two.
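The step from R-squared to a correlation coefficient is a square root, and R-squared by itself only gives the magnitude. The sign has to come from the direction of the fitted line. A small sketch, assuming a simple one-variable linear fit and a downward slope (fewer extra-inning games as scoring rises, as described at the top of the post):

```python
import math

r_squared = 0.27  # value quoted for the fitted line

# For a one-variable linear fit, |r| is the square root of R-squared.
r_magnitude = math.sqrt(r_squared)

# Assumption: the fitted line slopes downward, so r takes a negative sign.
slope_is_negative = True
r = -r_magnitude if slope_is_negative else r_magnitude

print(round(r, 2))  # -0.52
```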

### 30 Responses to “The correlation between run-scoring and extra-inning games”

1. Zachary Says:

I love seeing graphs and math-related posts. Cool stuff!

2. Dan Says:

I'd like to see the data if there weren't a data point for every year, but rather the data was binned with say 0.05 runs/game or 0.1 runs/game bins. So if there were two seasons with runs/game between 5.2 and 5.25 you would find an overall percentage of extra inning games for those years and put that as one point on the graph. It looks like that might take away a lot of the scatter. I'm curious how much.
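A rough sketch of that kind of binning in Python, pooling games within each bin so that every bin gets one overall extra-inning percentage. The function name and the season tuples are hypothetical; the real yearly counts are not listed in the post.

```python
import math
from collections import defaultdict


def bin_seasons(seasons, bin_width=0.05):
    """Pool seasons into runs-per-game bins.

    seasons: iterable of (runs_per_game, total_games, extra_inning_games).
    Returns a sorted list of (bin_low, bin_high, pooled_extra_inning_fraction).
    """
    pooled = defaultdict(lambda: [0, 0])  # bin index -> [games, extra-inning games]
    for rpg, games, extra in seasons:
        # The small epsilon guards against float error at exact bin edges.
        index = math.floor(rpg / bin_width + 1e-9)
        pooled[index][0] += games
        pooled[index][1] += extra
    return [
        (round(i * bin_width, 10), round((i + 1) * bin_width, 10), extra / games)
        for i, (games, extra) in sorted(pooled.items())
    ]


# Hypothetical seasons, for illustration only.
example = [(5.21, 1200, 100), (5.24, 1300, 90), (4.42, 2000, 180)]
for low, high, fraction in bin_seasons(example):
    print(f"{low:.2f}-{high:.2f} R/G: {fraction:.1%}")
```

Pooling the raw game counts, rather than averaging the yearly percentages, keeps seasons with more games from being underweighted.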

3. Andy Says:

I can probably do that in the next couple of days, Dan.

One other factor increases extra-inning games when scoring is low: when the trailing team is one run behind, it is more likely to play for that one run to send the game to extra innings.

4. Evan Says:

Several thoughts/questions:

1) Is there any evidence that extra inning games are increasing or decreasing in frequency as the game evolves (independent of RPG environment)? i.e., is there any evidence that different strategies as related to bullpen use and other game management factors may be affecting the numbers of extra inning games?

2) Any thought to splitting out the AL/NL data for 1973-1996, where we would be dealing with different rules sets and different RPG environments? 1997-present gets muddled by the presence of interleague play and it seems not worth the effort to filter down to only intraleague games.

3) It looks as though you have fit a linear correlation to the data. My intuition tells me that we would see far greater incidences of extra-inning games as RPG approaches 0.5 (every game 1-0) than we would get from extending the fitted line and in a 0 RPG environment 100% extra inning games (0-0 ties). In a very high RPG I would expect the frequency to have an asymptote approaching zero as RPG approaches infinity, but at the very least to flatten out a bit as we move to a higher RPG environment than we have thus far experienced in the history of MLB.
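One way to see the shape described in point 3 is a toy model. The sketch below assumes each team's runs through nine innings are independent Poisson draws with the same mean, which is a simplification (real run scoring is lumpier than Poisson), so it illustrates the curve's shape rather than reproducing actual extra-inning percentages. The function name is made up for this example.

```python
import math


def tie_probability(mean_runs: float) -> float:
    """P(both teams have the same score) when each team's runs are
    independent Poisson with the given mean. Toy model only."""
    if mean_runs < 0:
        raise ValueError("mean_runs must be non-negative")
    pmf = math.exp(-mean_runs)  # P(team scores 0 runs)
    total = pmf * pmf           # P(0-0 tie)
    # Sum far enough into the tail for baseball-sized means.
    max_runs = int(mean_runs + 10 * math.sqrt(mean_runs) + 30)
    for k in range(1, max_runs + 1):
        pmf *= mean_runs / k    # Poisson recurrence: P(k) = P(k-1) * mean / k
        total += pmf * pmf      # P(k-k tie)
    return total


for mean in (0.0, 0.5, 2.0, 4.0, 8.0, 16.0):
    print(f"mean {mean:>4}: tie probability {tie_probability(mean):.3f}")
```

Under this assumption the tie probability is 1 at zero scoring and keeps falling toward zero as scoring rises, flattening out as it goes, so a straight line can only approximate it over a narrow range of run environments.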

5. da HOOK Says:

Actually, since extra-inning games decrease as runs increase, r = -.52.

Have you tried non-linear measures for the data?

6. Andy Says:

Right you are, negative .52...

It's definitely true that it can't be a linear relationship, as explained by Evan.

I guess there is another way to plot this data--I could look at each time the winning team scores N runs and what fraction of those games went to extra innings....

7. Neil Says:

Evan, really like your first question. One would think that with the proliferation of set-up men and closers in the modern era that more leads would be held and more games would be decided in 9 innings, even in a low RPG environment. More of the points below the line of best fit should represent "modern" seasons.

8. jeff Says:

This is not quite the same thing, but for this year, most of the media harped about the Cubs having a poor record in one-run games. I suspect this had more to do with the Cubs' failure to score runs. Until at least recently (say end of July, when they still had Lilly and Silva), they had a very poor W-L record when scoring 3 or fewer runs, and a good W-L record when scoring 4 or more runs. So, I think it had more to do with poor offense than with a supposed failure to play well in close games. Because the Cubs' offense was bad and their starting pitching was good, they played a lot of close, low-scoring games, which they lost, and I think that hyped this misperception of "can't win a close one". Any way to run statistical analysis on that?

9. Neil Says:

Jeff, I seem to remember an old Bill James study, mid-eighties, I think, where he analyzed one-run records of teams over a period of time and his conclusion, if memory serves correctly, was..... pure luck! The one-run W-L records of teams even out over a few years and no firm conclusions can be drawn about teams choking offensively under late-inning pressure and not being able to "win the close one".

After all a one-run win could be caused by an opponent's bloop falling in and your own loss by a line drive tagged right at a defender. One-run losses are more psychologically painful to fans and media than blowouts, but may be based more on luck than anything else, at least over the short term.

10. jeff Says:

That makes sense to me in any given game, but are you going to play more close games with poor offense and good starting pitching over the course of a season? What is the average W-L percentage per run scored? What is the expected W-L percentage per run allowed? It seems like you can't expect to win many games when you score 1, 2 or 3 runs, and that if you score 4 or more runs, and don't win, then it is your pitching staff.

11. joe baseball Says:

As a statistician, I always find these kinds of correlations funny because they prove what most see as "common sense". Every game starts out with a score of 0-0. The more runs that are scored, the more likely you are to move away from a tie. The odds of each team scoring the same number of runs decrease with every run they score.

12. Neil Says:

Joe, you are right, of course, about the cold, hard mathematics of what the game starts at and what the probabilities are of going more than 9 innings when fewer runs are scored, but that misses the point. The appeal as a baseball statistical devotee is to unravel the subtle nuances that are superimposed on the math.

And besides, a lot of the "common sense" dogma in baseball, such as that the sacrifice bunt or attempted steal is always a good idea, has been proven wrong by sabermetric analysis.

13. Kahuna Tuna Says:

Bill James . . . analyzed one-run records of teams over a period of time and his conclusion, if memory serves correctly, was..... pure luck!

The Padres' best single-season record in one-run games was 31-16 by the 1974 club, whose overall record was 60-102.

On the other hand, the 1936-39 Yankees' record in games decided by seven or more runs was 100-19. In games decided by ten or more runs, it was 51-2. The good teams win most of the blowouts.

14. Neil Says:

Tuna, not sure what your point is about the '74 Padres season. Sure they cheated Lady Luck big time with a 0.660 winning percentage (31-16) in one-run games and 0.252 WP (29-86) in non-one-run games. That is an amazing aberration. There might not be any other modern-era team with as big a discrepancy. They were still a bad pitching team and an overall "bad" team in the fifth year of their existence.

Food for thought..... they exceeded their Pythagorean win % by 9 wins that year. Doesn't that also suggest an incredibly "lucky" season?
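The one-run versus everything-else split is simple arithmetic on the records quoted in this thread (60-102 overall, 31-16 in one-run games). A quick sketch for checking it, with helper names made up for this example:

```python
def win_pct(wins: int, losses: int) -> float:
    """Winning percentage as a fraction of decisions."""
    return wins / (wins + losses)


def remaining_record(overall, subset):
    """Record in all games not counted in `subset`; both are (wins, losses)."""
    return (overall[0] - subset[0], overall[1] - subset[1])


overall_1974 = (60, 102)   # overall record quoted in the thread
one_run_1974 = (31, 16)    # one-run record quoted in the thread
other_1974 = remaining_record(overall_1974, one_run_1974)

print("one-run games:", one_run_1974, f"{win_pct(*one_run_1974):.3f}")
print("all other games:", other_1974, f"{win_pct(*other_1974):.3f}")
```

The same two helpers work for any team whose overall and one-run records are known.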

15. Curly Gruff Says:

a lot of the "common sense" dogma in baseball, such as that the sacrifice bunt or attempted steal is always a good idea, has been proven wrong by sabermetric analysis.

I'd say that more recent sabermetric analysis has "proven" the common sense dogma was not so wrong after all. Raw run expectancy charts led people to claim that sac bunts were always bad plays or you needed to steal over 75% to have a positive impact. Taking into account the particular times in the game when such plays usually take place, we now realize that the break-even points are different than originally thought. Strategies still might not be optimal according to saber orthodoxy, but it turns out most managers are not complete morons.

16. jeff Says:

The '74 Padres had a team OPS+ of 81, meaning they probably had to keep the score low and close to have any chance to win. They were probably lucky in such games, but the offense was so bad they were not going to be on the winning side of a blowout. The Yankees of the late 30's had historically great pitching staffs meaning they usually were not going to get blown out.

17. Larry R. Says:

Huh?

18. Neil Says:

I think Jeff was saying, Larry, that with an OPS+ of 81 in '74 the Padres absolutely stank offensively, relative to league average, both in terms of power and making contact. This means that they were only going to win low scoring games, if any games at all, certainly not many slugfests. Very few 9 to 7 scores for the Padres and more 2-1 wins.

However, without closer examination, the winning percentage split between 1-run W/L and multi-run W/L for that season, remains a mystery ... other than an act of God.

And to bring the discussion full circle... how many of the Padres 47 one-run games that year were X-inning games?

19. Tuesday Links (31 Aug 10) – Ducksnorts Says:

[...] The correlation between run-scoring and extra-inning games (Baseball-Reference). According to the article, “there is a fairly strong correlation between average run scoring and percentage of games that go to extra innings.” [...]

20. Jason Rennie Says:

I assume you're using least squares regression based on the R^2 comment. Least squares will be overly influenced by "outlier" points, such as the two 13%+ data points. I'd love to see the equivalent plot using an absolute difference loss or Huber loss (http://en.wikipedia.org/wiki/Huber_Loss_Function). Also, the encyclopedia link appears to have runs/game, but not %-age of extra inning games. Did I miss something? Is the data available in CSV form or somesuch?
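A sketch of what a Huber-loss line fit could look like, using iteratively reweighted least squares in plain Python. The only data here are the four seasons quoted in the post, not the full set, so the two 13%+ points are not included and the result says nothing about the real fit. The function names and the `delta` threshold (in percentage points) are assumptions for illustration.

```python
def weighted_line_fit(xs, ys, weights):
    """Weighted least-squares fit of y = slope * x + intercept."""
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def huber_line_fit(xs, ys, delta=1.0, iterations=50):
    """Line fit under Huber loss via iteratively reweighted least squares.

    Points with |residual| <= delta keep full weight; larger residuals
    are down-weighted by delta / |residual|.
    """
    weights = [1.0] * len(xs)
    for _ in range(iterations):
        slope, intercept = weighted_line_fit(xs, ys, weights)
        residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        weights = [1.0 if abs(r) <= delta else delta / abs(r) for r in residuals]
    return slope, intercept


# The four seasons quoted in the post: 2010, 2009, 1988, 1965.
runs_per_game = [4.42, 4.61, 4.14, 3.99]
extra_inning_pct = [9.1, 8.0, 9.4, 11.0]

slope, intercept = huber_line_fit(runs_per_game, extra_inning_pct, delta=1.0)
print(f"slope {slope:.2f} pct points per run, intercept {intercept:.2f}")
```

With a very large `delta` every point keeps full weight and the result matches ordinary least squares, which makes it easy to compare the two fits on the same data.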

21. Kahuna Tuna Says:

the '74 Padres . . . were still a bad pitching team and an overall "bad" team in the fifth year of their existence. . . . they exceeded their Pythagorean win % by 9 wins that year. Doesn't that also suggest an incredibly "lucky" season?

Possibly, Neil. The point I was making is that some (or, well, at least one) of the best single-season records in one-run games was posted by a team that finished last. In other words, success in winning one-run games need not imply some high degree of clutch performance; it might only imply that the team wasn't good enough to regularly win by larger margins. The 2003 Tigers also fit this pattern: 43-119 overall, 19-18 in one-run games. Conversely, the '03 Tigers had one of the worst records ever in two-run games: 7 wins, 30 losses. In my opinion, those Tigers were neither lucky nor unlucky; they just stank, because if the game turned into a rout they were bound to lose. The good teams win most of the blowouts.

And to bring the discussion full circle... how many of the Padres 47 one-run games that year were X-inning games?

Answer: Seven, of the fifteen extra-inning games they played. Which puts them in the fairly unusual category of teams who played more extra-inning games in a season that were decided by two runs or more than extra-inning games decided by exactly one run. The biggest disparity I've been able to find so far is the '87 Red Sox, 10 to 5.

22. Neil Says:

Tuna, we're agreeing on all counts, I think. The jumping-off point for this thread was @8 and @9, where Jeff mentioned the perception that the Cubs this year "can't win a close one" and the reply that it is not an indicator of clutch batting performance but rather of luck.

I totally agree that good one-run W-L records in a single season probably correlate better with poor overall team records than with good ones because the poor teams are offensively challenged. The '03 Tigers, although not as extreme an example, support this perception.

And it's a no-brainer that the good teams will have a better record in blowouts, however defined, than bad teams.

But looking at the other side of the coin.... what was the '36-39 Yankees record in one-run games (mentioned in @13) compared to their overall record? If the "luck hypothesis" prevails, then significantly worse than the one-run record, even though they were great teams.

Thx for answering the question about the '74 Padres and X-inning games that I was too lazy to research. But their record in X-inning games decided by two or more runs is bizarre for a team that bad. Maybe a six-month full moon in 1974 or something.

Jason, the first thing that caught my eye about Andy's graph was the two data points over 13%. Would be interesting to know what years they were..... Does managerial strategy change in a low-run environment? You know the old adage about playing for one run.

23. Neil Says:

Sorry.... in 4th paragraph meant "...significantly worse than the overall record, ..."

24. Andy Says:

On Thursday I will be posting a major follow-up to this post, looking at the same data parsed out by runs scored by the winning team, not by year.

25. joe baseball Says:

Charts and graphs are fun to look at, but everything is dependent on the assumptions used when collecting the data. Average runs per game may not be a very good stat for this model. Average runs per inning might be better. When the home team wins a game, they don't bat as many innings or complete innings as the losing team. Tie games will peak with a score at about the average runs per inning multiplied by 9.
0-0 scores will be above zero, 1-1 scores more frequent, up to the average and then dropping down to zero as the scores get large.
The scores of the games used should only use the score as it was at the end of 9 full innings.
This model should be quite accurate, especially if all the data from every year is used.
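The shape described here (0-0 ties possible, ties more common near the average score, then tailing off) can be sketched with a toy distribution. The code below assumes each team's runs over nine full innings are independent Poisson draws with the same mean, which real scoring only roughly follows, so the actual peak may sit elsewhere. The mean of 4.42 is the 2010 runs-per-game figure from the post, used as a stand-in for nine-inning scoring; the function name is made up.

```python
import math


def tie_score_probabilities(mean_runs: float, max_score: int = 30):
    """P(game is tied k-k after nine innings) for k = 0..max_score,
    assuming independent Poisson scoring for both teams (toy model)."""
    probabilities = []
    pmf = math.exp(-mean_runs)      # P(team scores 0 runs)
    for k in range(max_score + 1):
        if k > 0:
            pmf *= mean_runs / k    # Poisson recurrence
        probabilities.append(pmf * pmf)
    return probabilities


probs = tie_score_probabilities(4.42)
most_likely = max(range(len(probs)), key=probs.__getitem__)

for score in range(9):
    print(f"{score}-{score}: {probs[score]:.4f}")
print("most likely tie score:", most_likely)
```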

26. Kahuna Tuna Says:

what was the '36-39 Yankees record in one-run games . . . compared to their overall record? If the "luck hypothesis" prevails, then significantly worse than the one-run record, even though they were great teams.

The "luck hypothesis" holds here. The 1936-39 Yankees were 83-67 (.553) in one-run games. Their overall record for the four-year period was 409-201 (.670).

27. Neil Says:

@25

However, I can't see that the results would be improved a lot by taking into account 8 1/2 inning home team wins and X-inning games. The two effects cancel each other out a little don't they?

Although runs per inning might be a slightly better independent variable, RPG is still a good analysis and an interesting discussion starter.

@26 I'm surprised that the Yankees' 1-run record is that good over the four-year period. The disparity isn't nearly as great as the '74 Padres or the '03 Tigers, but then it was a larger sample size.

28. Extra innings games vs. runs scored » Baseball-Reference Blog » Blog Archive Says:

[...] is a follow-up to my earlier post about the correlation between run-scoring and extra-inning games. In that earlier piece, I made a [...]

29. Oskar Says:

Extra inning games and stealing second base.
How many extra inning games have there been a situation where a manager has a runner on second base and has to decide whether to steal second? How many times was it successful, how many times was that the winning play? How about just games in the playoffs and world series?
Was just wondering whether this was a high risk strategy or not.

30. Oskar Says:

Oops typo, I meant the runner is on first base