## Site Features: More On Postseason Boxscores

Posted by Neil Paine on October 5, 2010

Yesterday I gave a big-picture overview of all the features we have in our Postseason Section, so today I wanted to talk about a really cool, underrated feature of our playoff boxscores...

As you probably know, our regular-season play-by-play data stretches back to 1950 -- which is amazing, and it is to Retrosheet's credit that anything close to that amount of information is available. But did you know that our postseason boxscores have play-by-play accounts for the entirety of the World Series era?

That's right, we have play-by-play descriptions of baseball games that happened 107 years ago. Not only that, but we have Win Probability statistics and graphs for those games! I don't know about you, but I think that's pretty amazing. In fact, basically anything you can do in a 2010 box score, you can do for postseason games going back to 1903.

So if you ever wondered how much WPA Babe Ruth cost the Yankees when he was caught stealing to end the 1926 World Series, we can answer that and many more questions. Play around for a while in the Postseason Section, and you may find a whole new way to look at games that happened a century ago.

### 14 Responses to “Site Features: More On Postseason Boxscores”

1. pm Says:

What are the win probabilities for games 100 years ago based on? Offensive numbers from this year, or the small-sample-size numbers from 100 years ago?

2. Johnny Twisto Says:

Why do you say there is a small sample size?

3. pm Says:

#2, if it is win probability, how do you determine what the probability is in the 6th inning up 2 runs if all the data you have is 20 postseason games that season?

4. DavidRF Says:

I think he's talking about the base-out-inning win expectancy that requires play-by-play data.

I don't know how much of that stuff is era dependent. You could likely adjust the runs->wins converter by simply using the level of offense. But the deadball era featured a much different style of play with more fielding miscues. How different was the base-out matrix back then?

5. Neil Paine Says:

Right, I can't really speak for how Sean does it, but I believe you can tweak the base-out matrix for the environment, much like Tango did here:

http://www.tangotiger.net/customlwts.html

The various offensive events are worth more or less based on the RPG environment in which they occurred.

6. Neil L Says:

Neil, gotta try it to appreciate it, I guess.

Am I understanding your post correctly to mean that there are more complete data available for 107-year-old postseason games than there are for comparable regular-season games?

I am aware of a lot of gaps in the minor offensive stats around that time for regular-season games.

7. Sean Forman Says:

What Neil said.

8. pm Says:

Isn't that approach a little flawed? They might have scored the same number of runs in 1930 as they do today, but they did it in a different way. Batting average, OBP, and doubles were higher, while today there are more HR, more K's, and more BB, plus better pitchers working as relievers at the end of the game, which changes the run distribution of late innings. Are the numbers for the 1920s taking that into account? Is the model you are basing it on empirical?

9. Gerry Says:

Sean, what which Neil said? There are two of them in this discussion.

11. Sean Forman Says:

The model is not empirical. It is based on simulations given the run scoring environment. I'm not sure the manner in which the runs score will affect the team's win probability all that much.
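As a rough illustration of what a simulation driven only by the run-scoring environment could look like, here is a toy sketch. It is not the site's actual model. The function names and the Poisson runs-per-inning assumption are hypothetical choices made for the example (real run scoring is not exactly Poisson), and the base-out state within an inning is ignored entirely.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def home_win_probability(inning, top_half, home_lead, runs_per_game,
                         trials=20000, seed=0):
    """Estimate P(home team wins) at the start of a half-inning.

    Assumptions (all hypothetical): each team scores Poisson(runs_per_game / 9)
    runs per inning, both teams are equal, and the base-out state is ignored.
    """
    rng = random.Random(seed)
    per_inning = runs_per_game / 9.0
    home_innings = 9 - inning + 1          # home still bats in this inning
    away_innings = home_innings if top_half else home_innings - 1
    wins = 0
    for _ in range(trials):
        # Runs over several innings: a sum of Poissons is itself Poisson.
        lead = (home_lead
                + poisson(rng, per_inning * home_innings)
                - poisson(rng, per_inning * away_innings))
        while lead == 0:                   # extra innings, one at a time
            lead = poisson(rng, per_inning) - poisson(rng, per_inning)
        wins += lead > 0
    return wins / trials


for rpg in (3.5, 4.5, 5.5):
    p = home_win_probability(inning=6, top_half=True, home_lead=2,
                             runs_per_game=rpg)
    print(f"{rpg} R/G: home team up 2 entering the 6th wins about {p:.3f}")
```

In this toy version the same two-run lead comes out somewhat more valuable in the lower-scoring environment, which is the sense in which the run environment alone moves the numbers.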

As for the relievers, that cuts both ways. The team ahead now brings in relievers and the team behind will do so as well, so a lot of those effects cancel out.

I guess here is the question. Construct two teams, one that walks a lot and only hits home runs and the other that only hits singles. The tricky part will be setting the event rates such that their overall run scoring is the same. The question then is whether the two teams have different probabilities of scoring different numbers of runs. It's possible they do, but you'd need to show me a model that demonstrates that before I'm going to spend a lot of time building different event rates into the system. I'm not convinced that the probabilities of one run, two runs, etc. will change all that much.
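The two-team experiment described above can be sketched as an exact calculation over (outs, bases, runs) states rather than a simulation. Everything here is an assumption made for illustration: the event rates (20% walks and 6% home runs per plate appearance), the rule that a single moves every runner up exactly one base, and all function names. The singles rate for the second team is found by bisection so that both teams average the same runs per inning.

```python
from collections import defaultdict

EMPTY = (False, False, False)  # (first, second, third)


def advance(bases, event):
    """Return (new_bases, runs_scored) for one non-out event."""
    first, second, third = bases
    if event == "HR":
        return EMPTY, 1 + first + second + third
    if event == "BB":  # only forced runners move
        if first and second and third:
            return bases, 1
        if first and second:
            return (True, True, True), 0
        if first:
            return (True, True, third), 0
        return (True, second, third), 0
    if event == "1B":  # assumption: every runner moves up exactly one base
        return (True, first, second), int(third)
    raise ValueError(event)


def run_distribution(rates, max_runs=30, tol=1e-12):
    """Exact P(runs in one half-inning); rates maps event -> per-PA probability."""
    p_out = 1.0 - sum(rates.values())
    live = {(0, EMPTY, 0): 1.0}  # (outs, bases, runs) -> probability
    dist = defaultdict(float)
    while sum(live.values()) > tol:
        nxt = defaultdict(float)
        for (outs, bases, runs), p in live.items():
            if outs == 2:
                dist[runs] += p * p_out  # third out ends the inning
            else:
                nxt[(outs + 1, bases, runs)] += p * p_out
            for event, q in rates.items():
                new_bases, scored = advance(bases, event)
                nxt[(outs, new_bases, min(runs + scored, max_runs))] += p * q
        live = nxt
    return dict(dist)


def mean_runs(dist):
    """Expected runs per half-inning for a run distribution."""
    return sum(runs * p for runs, p in dist.items())


def match_singles_rate(target, lo=0.0, hi=0.8, iters=40):
    """Bisect for the singles rate whose mean runs per inning equals target."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_runs(run_distribution({"1B": mid})) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


sluggers = run_distribution({"BB": 0.20, "HR": 0.06})  # hypothetical rates
singles_rate = match_singles_rate(mean_runs(sluggers))
slappers = run_distribution({"1B": singles_rate})

print(f"matched singles rate: {singles_rate:.4f}")
for runs in range(5):
    print(f"P({runs} runs): walks+HR {sluggers.get(runs, 0):.4f}  "
          f"singles {slappers.get(runs, 0):.4f}")
```

The printout puts the two run distributions side by side at equal average scoring. How far apart they land depends heavily on the baserunner-advancement rule, so the one-base-per-single assumption is worth swapping out before drawing any conclusion from it.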

12. Matt Says:

I dutifully went back to the Ruth game... and noticed that, at the end of the Yankee 9th, the play-by-play read "0 runs, 0 hits, 0 errors, 1 LOB. Cardinals 3, Yankees 2."

Does the site do this whenever a caught stealing ends an inning?

13. Matt Says:

Sorry, meant to bold just the 1 LOB

14. Sean Forman Says:

Yes, it does. I'll look into that.