10 for 10: #2 1920-39,52,53 RetroSheet Data, Improved Box Scores, Gamelogs, and Splits

Posted by Sean Forman on March 19, 2010

This is the second of ten features we are adding for our 10th anniversary.

Thanks to the tireless work of the RetroSheet volunteers (otherwise known as heroes), Baseball-Reference.com now has play-by-play (PBP) data for the 1952 and 1953 seasons and box-score-level accounts (BSEF) of all games in the 1920-1939 seasons. This is the first time that I have incorporated seasons with only BSEF data, so it necessitated some changes, which I'll try to highlight below.

Beyond that major change, there have been many, many additional improvements to the box scores, splits, and gamelogs.

Box Scores

Here are a couple of sample box scores.

  • I updated the style of the box scores. The pitching and batting stats are now in sortable tables like all of the other data on the site.
  • I added Win Probability and Run Expectancy data to all of the box scores with PBP. There was WPA in the play-by-play accounts, but I had never sorted out the baserunning issues in assigning it to the offensive players. That is done now, so I am able to add summary info. I have a lot more to say about how I do the WPA numbers, but more on that in a later post.
  • I added WPA charts to all of the box scores with PBP. These charts are interactive. You can mouse over a play to see a summary of what happened. You can click a link to highlight the big plays or the scoring plays. You can highlight all of the plays by pitcher or batter. You can also fix the chart at the top of the screen, scroll through the PBP, and read along with the chart. Again, more on this in a later post.
  • There are now links to the team's next and previous games next to their score in the game.
  • Outfield Assists are now listed below the batting summary.
  • The PBP report is re-ordered for easier reading.
  • When pitches are available, you can see the summary by clicking on "View Pitches" above the PBP table.
  • Clicking on red text will give you good stuff to view (as always on a Sports Reference site).

    Splits

    2009: Derek Jeter/Tim Lincecum, 1930: Hack Wilson/Lefty Grove

    Because the site only has BSEF data for the 1920-1939 seasons, fewer splits can be presented for those years. There are no true platoon splits, by-inning splits, clutch splits, base/out splits, etc., but what can be shown is still pretty good: home/road, by opponent, by month, by stadium, etc.

    A number of new splits (unless stated otherwise, these are for player, team and league) have been added as well:

  • For pitchers and batters, you can look at times through the lineup for both starters and relievers from the pitcher and batter perspective. PAs 1, 2, 3, and 4+ for starters and 1, 2, and 3+ for relievers.
  • For the pitcher inning splits and by catcher splits, columns for IP, ER and ERA to go with the batting against lines have been added.
  • A DP situation split (runner on 1st, fewer than 2 outs).
  • Pitching splits by umpire faced.
  • To get at platoon for non-PBP years, "vs LH Starter" and "vs RH Starter" splits have been added. These tell you how the player, team, or league did FOR THE ENTIRE GAME when the opposing starter was RH or LH.
  • Added pitcher performance by batting order position
  • Added a general split for when the team is ahead or behind at the start of the play.
  • Added a split for whether opponent is at or over .500 for the entire season or below.
  • Added a split by age, for teams and leagues (I split it up 25 and below, 26-30, 31-35, 36+)
  • Added a split by run support for pitchers: 0-2 total runs by the offense in the game, 3-5 runs, or 6+. (I goofed and included relief pitching appearances in this, but I will change that next go-around.)
  • SF and ROE have been added to the pitcher splits
  • WP column for pitchers was null, now fixed
  • Corrected baserunning splits for PBP splits, so that they are properly assigned to the relevant split. For instance, Carlos Beltran's SB/CS numbers for platoon splits actually show his SB/CS numbers against that pitcher handedness regardless of the pitcher handedness when he reached base.
  • For teams and league pitching, added a platoon split that shows how opposing batters did by batter handedness ('as RHB, from gl', 'as LHB, from gl', 'as BHB, from gl')
  • As always, clicking on red text for player, team or league splits will bring up the player's year-by-year, the team's player-by-player, or the league's team-by-team results. Clicking "Permanent Link" in the upper right-hand corner of the popup provides you with a way to share the link.
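
The fixed buckets described in the list above (times through the lineup, age, and run support) can be sketched in a few lines of Python. This is only an illustration of the bucket boundaries stated in this post; the function names are hypothetical, and this is not the code the site runs.

```python
def times_through_lineup(pa_number: int, is_starter: bool) -> str:
    """Label the Nth time a pitcher faces the same batter in a game.

    Starters get buckets 1, 2, 3, 4+; relievers get 1, 2, 3+.
    """
    if pa_number < 1:
        raise ValueError("pa_number must be 1 or greater")
    cap = 4 if is_starter else 3
    return f"{cap}+" if pa_number >= cap else str(pa_number)


def age_bucket(age: int) -> str:
    """Age split: 25 and below, 26-30, 31-35, 36+."""
    if age <= 25:
        return "25 and below"
    if age <= 30:
        return "26-30"
    if age <= 35:
        return "31-35"
    return "36+"


def run_support_bucket(team_runs: int) -> str:
    """Run support split: total runs scored by the pitcher's offense in the game."""
    if team_runs < 0:
        raise ValueError("team_runs cannot be negative")
    if team_runs <= 2:
        return "0-2"
    if team_runs <= 5:
        return "3-5"
    return "6+"
```

For example, `times_through_lineup(5, True)` falls in the "4+" bucket for a starter, while `times_through_lineup(3, False)` falls in the "3+" bucket for a reliever.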

    Gamelogs

    2009: Derek Jeter/Tim Lincecum, 1930: Hack Wilson/Lefty Grove

  • SF was added to the pitching gamelogs
  • A season stat line now appears at the bottom of the player gamelogs.
  • WPA and leverage appear in the pitching gamelogs
  • Home Plate Ump has been added to the team pitching gamelogs
  • Batting gamelogs now have a summary of who the player drove in most often.
  • The player movement is now clearer as full team names are given.
  • Also, all of the other stuff like row summation, career game and season game totals (first two columns, remember RED TEXT) still works.

    Play Index

    The Play Index is fully updated with the new seasons of data. For instance, you can now find:

  • Lou Gehrig's Longest RBI Streak (B-R has his entire career)
  • The most walks drawn in a 1920s game
  • Lefty Grove's top Game Scores (we're missing 1940 and 1941)
  • The 1927 Yankees' longest streak with 5 or more runs scored
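
Game Score is mentioned above but not defined in this post. As a minimal sketch, assuming the standard Bill James formula (which may not match exactly what the Play Index computes, particularly for box-score-only seasons), it can be written as follows. The function and parameter names are hypothetical.

```python
def game_score(outs: int, hits: int, earned_runs: int,
               unearned_runs: int, walks: int, strikeouts: int) -> int:
    """Bill James Game Score for a starting pitcher's outing.

    Start at 50, then:
      +1 per out recorded
      +2 per inning completed after the 4th
      +1 per strikeout
      -2 per hit, -4 per earned run, -2 per unearned run, -1 per walk
    """
    innings_completed = outs // 3  # partial innings do not count as completed
    bonus_innings = max(0, innings_completed - 4)
    return (50
            + outs
            + 2 * bonus_innings
            + strikeouts
            - 2 * hits
            - 4 * earned_runs
            - 2 * unearned_runs
            - walks)
```

For example, `game_score(27, 1, 0, 0, 0, 20)` (a hypothetical nine-inning one-hitter with no walks and 20 strikeouts) returns 105.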

    Other Features

    As always, clicking on "SHARE" above a table allows you to customize an output to include on your site, in a forum, or a link for e-mailing or tweeting. All tables are sortable and can be quickly retrieved in csv or pre-formatted text.

    I hope you enjoy these new features and stats. Let us know what suggestions and comments you have and what errors you find.

    This entry was posted on Friday, March 19th, 2010 at 1:33 pm and is filed under 10 for 10, Announcements, Box Scores, Gamelogs, Site Features, Splits, Stats.

    21 Responses to “10 for 10: #2 1920-39,52,53 RetroSheet Data, Improved Box Scores, Gamelogs, and Splits”

    1. Wow!!!!!!
      Not to sound ungrateful, but more from an informational angle: why the gap between 1940 and 1951? It's just weird that there is one.

      Thanks again for the continued treasure trove that BBREF has become!

    2. Mike,

      It is going to be filled. RetroSheet decided to build from 1920 forward to 1950. They've already released most of the 1940s AL seasons. We'll get those last ones done pretty soon.

    3. Bug Report: The team earned runs appear to have an issue in the 1920's and 1930's and we are missing Losses and Saves for all leagues 1937-1939 and NL 1934-1936. We'll get that updated this afternoon hopefully.

    4. Which means this fun boxscore is now available.

      http://www.baseball-reference.com/boxes/CHN/CHN192208250.shtml

    5. Fixed the bug.

    6. Johnny Twisto Says:

      There seems to be a partial problem with gamelogs for the '20s-'30s. When using the Pitcher Game Finder to search for most Holds in a season during that period, it finds no games. I wondered if it was somehow possible there had really never been a hold during that period -- obviously bullpen usage was very different. But it also finds no games for Blown Saves or even Save Opportunities. It does find Saves.

    7. Holds can only be computed from play-by-play data. You need to know when the player entered or left the game. For those games we do have save data, but nothing like save opps or save situations.

    8. There seems to be a bug with the boxscore Alan linked. Every pitcher has -1 home runs given up. It appears to affect every NL game from that day (and possibly others) but not the AL games.

    9. Chuck Hildebrandt Says:

      Oh. My. God.

      You are so the man, it's ridiculous.

    10. Wow. That's awesome. In 1930 Hack Wilson had 27 games with 3 or more ribbies.

    11. Alan, that game is amazing ... especially if you look at the next day: an 11-inning pitcher's duel. What a nutty couple of days.

    12. Found an error:
      After checking the "Find Number of Team Games Matching Criteria in a Season" bubble and selecting 11 runs, it gives the Yankees in the #1 spot with a 28-1 record in 1936.
      However, clicking the individual games link sends me to the full season's 11+ runs list instead of just the Yankees list.

      Is it supposed to give the full season list or just the Yankees list?

    13. I love the WPA graph.

      I found a bug. The player links in the pitchers section look like this:

      http://www.baseball-reference.com/players/gormato03/gormato03.shtml

    14. [...] now with Win Probability Graphs March 19, 2010 by erik Interactive win probability graphs at [...]

    15. Here's another classic boxscore. The 26 inning Robins/Braves duel.

      http://www.baseball-reference.com/boxes/BSN/BSN192005010.shtml

    16. And Alan, two days after that, those two teams played a 19 inning game. Each team only used one pitcher again.

      http://www.baseball-reference.com/boxes/BSN/BSN192005030.shtml

    17. [...] the new addition of the '20s and '30s to the batting game finder, all-time players who put up unique statistics at a [...]

    18. I think it's cool, since we all obviously take TV games for granted, that the May 1st game took only 3:50 despite its length in innings! That is an average of under 5 mins per half inning - wow!

    19. Charles Saeger Says:

      Aren't the individual putouts and assists known? There are game logs for fielders from that era on Retrosheet.org.

    20. Sean - Any way to add 1920-1940 data to the PI Event Finder Batting/Pitching by team? I know there's no PBP, but would like to be able to get home/road splits, then vs. LH starter/RH starter within each of those, as that data should be within the Retrosheet boxscore-only format.

    21. BunnyWrangler Says:

      I just noticed that game scores - at least those in the Oeschger-Cadore duel to which Alan linked above - are lower than they should be. Both Oeschger's and Cadore's stopped at 127 when the game scores for each man should be in the 140s.