Data Coverage
There is a lot of data on the site. Some of it is ours, and much of it comes from Retrosheet.org. This page attempts to give you a complete list of what we do and do not have on the site, including when stats were accumulated and when they are missing. Please let us know if there is an item you would like to see reported here.
Full Season Stats
We consider the start of major league baseball to be 1871. The game has obviously changed a great deal since then, and so has the record-keeping. Below we summarize the seasons for which we have complete data. A "NO" entry means the data is completely missing for that season, "Partial" means we have it for some leagues or some players in that season, and "YES" means everything is known. A "YES" doesn't mean there aren't errors, just that we have a value for it.
Pitching
Fielding
Minor Leagues
The Chadwick Baseball Bureau has a full rundown of what is and isn't in the Minor League Database.
Play-by-Play
Play-by-Play Coverage
Below are the affected teams, with the number of games for which we don't have play-by-play (listed only for seasons in which we have any play-by-play at all).
Missing Play-by-Play
Hit Location Data and Batted Ball Type Data
Hit Location Diagram from Retrosheet
Batted ball types include line drives, fly balls, ground balls, and pop-ups. Bunts are also noted.
Please note that this data is not 100% complete, and that locations and trajectories have been measured differently in different years. We have attempted to merge different sources whenever possible to make the dataset as complete as possible. The table below covers the 50+ years of data we have on hand. For all balls in play, it breaks out the percentage of the time we know the trajectory (along with the type, to show how this has changed), the percentage of the time we know the location and who fielded the ball (this won't reach 100%, since there is no fielder on home runs and on some plays such as ground-rule doubles), and the percentage of plays that result in air_outs or ground_outs. Even when the trajectory and location are not exactly known, we may still know the fielder (even on hits) and whether a ground-ball or fly-ball out was recorded and by whom.
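As a rough illustration of how a per-season breakout like this can be computed, the sketch below counts, for each season, the share of balls in play with a known trajectory, location, and fielder. The `BallInPlay` record, its field names, and the `coverage_by_season` helper are hypothetical and are not the site's actual schema or code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

FIELDS = ("trajectory", "location", "fielder")


@dataclass
class BallInPlay:
    # Hypothetical record layout; None means the value is unknown.
    season: int
    trajectory: Optional[str]  # e.g. "fly_ball", "ground_ball", "line_drive", "pop_up"
    location: Optional[str]    # e.g. a zone label from a hit-location diagram
    fielder: Optional[int]     # fielding position 1-9; None on home runs, etc.


def coverage_by_season(plays: Iterable[BallInPlay]) -> Dict[int, Dict[str, float]]:
    """Return {season: {field: percent of balls in play with a known value}}."""
    totals: Dict[int, int] = defaultdict(int)
    known: Dict[int, Dict[str, int]] = defaultdict(lambda: {f: 0 for f in FIELDS})
    for p in plays:
        totals[p.season] += 1
        for field in FIELDS:
            if getattr(p, field) is not None:
                known[p.season][field] += 1
    return {
        season: {f: 100.0 * known[season][f] / totals[season] for f in FIELDS}
        for season in totals
    }
```

For example, four 1990 balls in play where one has no trajectory, two have no location, and one has no fielder would report 75% trajectory, 50% location, and 75% fielder coverage for that season.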
Note: for 2000-2002, home runs were classified with empty batted ball types in our data source. We have reclassified all of these hits as fly balls. Probably about 20% of these home runs should be line drives, and perhaps one or two per year should be ground balls. We realize this is a simplification, so please adjust your expectations of splits, etc., accordingly.
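The reclassification rule in the note can be stated compactly in code. The sketch below is a minimal sketch assuming a hypothetical `Play` record and event code `"HR"`. It shows only the rule (a 2000-2002 home run with an empty batted ball type becomes a fly ball) and is not the code used to build the site's data.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Play:
    # Hypothetical record layout, not the actual data-source schema.
    season: int
    event: str                        # e.g. "HR", "1B", "out"
    batted_ball_type: Optional[str]   # None or "" when the source left it empty


def reclassify_empty_hr(play: Play) -> Play:
    """Treat 2000-2002 home runs with an empty batted ball type as fly balls."""
    if (2000 <= play.season <= 2002
            and play.event == "HR"
            and not play.batted_ball_type):
        return replace(play, batted_ball_type="fly_ball")
    # Any other play, including home runs that already have a type, is unchanged.
    return play
```

Because every empty home run becomes a fly ball, fly-ball splits for those seasons are slightly inflated and line-drive splits slightly deflated, which is the simplification the note warns about.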
Hit Locations
Pitch Data
Pitch data is given only when we know the values for the entire game and for all plays in the game. It does not include pitch type or velocity. Instead, it records the sequence of balls and strikes, fouls, swinging strikes, pitchouts, etc.
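To show what a recorded pitch sequence carries, the sketch below rebuilds the ball-strike count from a sequence string. It assumes Retrosheet-style single-letter codes (B ball, I intentional ball, P pitchout, C called strike, S swinging strike, K strike of unknown type, T foul tip, F foul, X ball in play) and handles only that subset. The `count_from_sequence` helper is hypothetical.

```python
from typing import Tuple

# Assumed Retrosheet-style pitch codes (a subset only).
BALL_CODES = {"B", "I", "P"}         # ball, intentional ball, pitchout
STRIKE_CODES = {"C", "S", "K", "T"}  # called, swinging, unknown-type strike, foul tip
FOUL_CODES = {"F"}                   # foul ball: counts as a strike only below 2 strikes


def count_from_sequence(seq: str) -> Tuple[int, int]:
    """Return the (balls, strikes) count implied by a pitch sequence string.

    Characters outside the code sets above (e.g. "X" for a ball put in
    play) are ignored.
    """
    balls = strikes = 0
    for code in seq:
        if code in BALL_CODES:
            balls += 1
        elif code in STRIKE_CODES:
            strikes += 1
        elif code in FOUL_CODES and strikes < 2:
            strikes += 1
    return balls, strikes
```

For instance, `"BCFFBX"` gives a 2-2 count when the ball is put in play, because the second foul comes with two strikes already on the batter.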
Please note that this data is not 100% complete, and we have merged several datasets to produce it. Data back to 1998 is essentially complete, and before then there is a great deal of data going back to 1988. Before 1988, only a few seasons have data. For example, Allan Roth of the Dodgers compiled such data for many, many Dodgers games from the '50s and '60s.
Below is the percentage of all plays in each season that are missing pitch sequence data.
Pitch Results
Weather Data
The weather data is based on conditions at the start of the game. Below we show, for each field (temperature, wind speed and direction, etc.), the percentage of values that are not null (i.e., not unknown). This data is included in the Retrosheet data files, is provided as is, and most certainly contains some errors. There is no weather data before 1950.
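A minimal sketch of the "not null" percentage described above, assuming each game's weather is a dictionary with `None` for unknown values. The field names in `WEATHER_FIELDS` are assumptions modeled loosely on Retrosheet's game info items, and `percent_known` is a hypothetical helper.

```python
from typing import Dict, List, Optional

# Assumed field names; the actual Retrosheet columns and unknown markers may differ.
WEATHER_FIELDS = ("temp", "windspeed", "winddir", "sky", "precip")


def percent_known(games: List[Dict[str, Optional[object]]]) -> Dict[str, float]:
    """Percentage of games with a non-null (known) value for each weather field."""
    if not games:
        return {field: 0.0 for field in WEATHER_FIELDS}
    return {
        # A field missing from a game's dict is treated the same as None (unknown).
        field: 100.0 * sum(1 for g in games if g.get(field) is not None) / len(games)
        for field in WEATHER_FIELDS
    }
```

If the raw files encode "unknown" with a sentinel value rather than an empty field, those sentinels would need to be mapped to `None` first, or they would be counted as known.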
This article at The Hardball Times provides some background on how the ballpark weather data was collected.
Batting