

Pre-1910 Batter Strikeout Data

Posted by Neil Paine on April 13, 2011

A B-R user recently asked about the source of our pre-1910 batter strikeout data, given that batter strikeouts were not officially recorded until 1910 in the NL and 1913 in the AL. I posed the question to Pete Palmer, stat legend and season-data provider to Baseball-Reference, and here was his reply:

"The strikeout data came from Jonathan Frankel, who did a tremendous amount of work with a number of helpers checking box scores in various newspapers. He identified about 90% of NL batters and 80% of AL batters from 1897-1909. The results were then prorated for the remainder of the season. Work is continuing on digging up more boxes and also on 1910-12 AL.

I was surprised that Jonathan was able to find so much data. What happened is that the local papers often carried the strikeouts for their games, so it required volunteers all over the country to check the papers, plus some inter-library loans. It was a terrific undertaking."
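Palmer doesn't spell out the proration formula. One commenter below says the extrapolations are based on strikeouts per game, so here is a minimal Python sketch under that assumption: strikeouts per game in the box scores that were found, scaled up to the player's total games and rounded half up. The function name `prorate_strikeouts` and every number in the example are hypothetical illustrations, not actual player data or the documented Frankel/Palmer method.

```python
def prorate_strikeouts(found_k: int, games_with_data: int, games_played: int) -> int:
    """Estimate a batter's season strikeouts from partial box-score coverage.

    Hypothetical sketch: assumes straight per-game scaling (strikeouts per
    game in the games with data, times total games played), rounded half up.
    This is an illustration, not the documented Frankel/Palmer method.
    """
    if games_with_data <= 0 or games_with_data > games_played:
        raise ValueError("games_with_data must be between 1 and games_played")
    if found_k < 0:
        raise ValueError("found_k must be non-negative")
    # Integer arithmetic for round-half-up of found_k * games_played / games_with_data.
    return (2 * found_k * games_played + games_with_data) // (2 * games_with_data)


# Hypothetical regular: 50 K found in 120 of his 135 games.
print(prorate_strikeouts(50, 120, 135))  # 56

# Hypothetical bench player: 2 K found in 2 of his 4 games, 3 official AB.
estimated_k = prorate_strikeouts(2, 2, 4)
official_ab = 3
print(estimated_k, estimated_k > official_ab)  # 4 True
```

Because the at-bat totals come from the official season records while the strikeouts are scaled by games, a part-time player whose unseen games involved few plate appearances can end up with more estimated strikeouts than at bats, the kind of result Owen spotted in the comments.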

It turns out that Jonathan has a blog where he posts updates about the progress of his batter strikeout research. He says the 1910 AL is 89% complete right now, and that he has begun work on the 1912 AL as well.

This entry was posted on Wednesday, April 13th, 2011 at 11:01 am and is filed under Administration, History, Mailbag, Stats.

8 Responses to “Pre-1910 Batter Strikeout Data”

  1. And I was just going to watch TV tonight.

  2. Johnny Twisto Says:

    The results were then prorated for the remainder of the season.

    I really think this *has* to be clearly noted on the site. Otherwise you are presenting estimated stats as if they were actual. It's similar to the old catcher stats which have SB/CS based on prorated team totals, not actual player totals. It is very misleading.

  3. Not sure how to explain this... 3 at bats and 4 strikeouts?

  4. Johnny Twisto Says:

    Owen, see above. The numbers are estimated from incomplete data, and that can produce screwy results like that.

  5. Retrosheet is updated back to 1918 now, and they've had the isolated 1911 NL season done for a while too. It'll be interesting to see what happens when they get back far enough to encounter this issue. I guess the more eyes looking at things, the better.

  6. This is a great thing (even with sometimes strange results)... the 1900s strikeout machine Billy Maloney!

  7. [...] Neil Paine talks about where the site’s pre-1910 batter strikeouts came from. [...]

  8. Charles Saeger Says:

    After going over his blog, I'll hazard a guess as to why Charlie Gibson struck out four times in three at bats. His extrapolations are based on strikeouts per game. The at-bat total may well be wrong.