Our Sources, Contributors and Collaborators
Core Purpose for Sports-Reference.com
Answer questions as quickly, easily, and accurately as possible.
Collaborators & Sources
Just a note that much of the following was written in 2004, so some of it is out of date, but I think it's worth keeping around, so I have.
Sean Forman, Feb 19, 2017
If you enjoy this site, it is only due to a great amount of work by a large number of people.
Sean Lahman laid the groundwork for a great deal of baseball research today when he produced his baseball database. We owe a great deal here to the work he has put, and continues to put, into his database.
Pete Palmer's work is the basis for the Lahman database, and he has provided data to this site as well, along with his business partner Gary Gillette.
Many of these stats are now available through the Baseball Databank.
Don Malcolm has encouraged us a great deal in this project and has helped us develop many new ideas.
Doug Drinen has developed Pro-Football-Reference.com from scratch and is a fellow mathematician and stathead. He has provided a great deal of feedback on the site.
Justin Kubatko developed the counterpart basketball site and has provided many suggestions as well.
Tom Ruane has contributed both data and advice to this effort. Tom works with retrosheet.org, a very, very worthy cause. The game log and transaction data on the site are available free of charge from Retrosheet.
Doug Pappas contributed salary and payroll information to the site.
Keith Woolner was the original StatHead and his work at Baseball Prospectus and elsewhere has influenced our work here. He currently works in the Indians front office.
Collegiate data appears courtesy of SABR's Collegiate Committee, which welcomes all corrections to its data.
Much of the data comes from the volunteers at SABR.org, and some of us are SABR volunteers ourselves.
The SABR Biographical Committee routinely publishes updates when new information (or even a new ballplayer) is found. These can include updated birth, name and death information.
The SABR Collegiate committee maintains the database of schools attended by players.
Jeffrey Burk provided me with the awards voting data.
Derek Adair has contributed numerous corrections to the player data and a variety of other data sets.
Jay Jaffe designed the Babe Ruth logo.
Greg Spira has likewise contributed advice and support to this effort.
Hundreds of others have e-mailed in corrections or contributed data. We apologize for not properly citing them here.
Much of this data is available through Baseball-DataBank.org.
Current Data Sources
About the Register Data
About the Minor League Data
Historical performance data for professional leagues (affiliated minor leagues, independent minor leagues, fall/winter leagues, and other international leagues) is provided by and licensed from 24-7 Baseball and Chadwick Baseball Bureau. It incorporates the work of many stalwart baseball researchers, including Cliff Blau, Art Cantu, Frank Hamilton, Reed Howard, Kevin Johnson, Bob McConnell, Jack Morris, and Ray Nemec, as well as members of the Minor Leagues Committee of the Society for American Baseball Research.
Perhaps most importantly, it builds upon the seminal work of Ed Washuta, who magnanimously provided the framework to make the whole thing possible.
About the Negro League Data
This data comes from two sources: 1) the Negro Leagues Researchers and Authors Group, assembled by the National Baseball Hall of Fame and Museum with a grant from Major League Baseball; and 2) Gary Ashwill and his collaborators.
The Hall of Fame data covers the years 1920-1948, and the Ashwill data covers 1904-1919. Many statistics are incomplete due to ongoing research and/or limitations in published sources, so please be assured that we are aware of issues with the data and will continue to work with our data providers to improve the data that appears here.
The Hall of Fame dataset
This data is constructed from the best available information as provided by Larry Lester, Wayne Stivers and Dick Clark of the Negro Leagues Researchers and Authors Group. It contains data culled from newspaper box scores covering league-sanctioned games from 1920 to 1948, and it was produced for a study sponsored by Major League Baseball and the National Baseball Hall of Fame and Museum. It reflects totals as compiled by the NLRAG up to 2006. As new credible information continues to be unearthed, these numbers will continue to change.
Baseball-Reference and the National Baseball Hall of Fame and Museum would like to acknowledge Major League Baseball for funding this study, along with the Negro Leagues Researchers and Authors Group for their extensive efforts to collect the raw data and construct the most comprehensive database of Negro Leagues Baseball statistics. Under the direction of Larry Lester, Wayne Stivers and Dick Clark, this database is the largest dataset of its kind ever made publicly available, and we wish to express our gratitude for all their efforts to help rebuild this lost statistical history.
The Ashwill Negro Leagues Database
Playing statistics and biographical data on the Negro leagues (all pre-1920) and early Latin American professional baseball is licensed from and provided by the Negro Leagues Database, a project organized by Gary Ashwill with the participation of many historians of Negro league and Latin American baseball.
The database is a work in progress, and will include more seasons as development continues. It appears in its original form at Seamheads.com.
Playing statistics and biographical data for this portion of our dataset (The Negro Leagues Database) are all copyright 2013 by Gary Ashwill. All rights reserved.
Note that our records are missing thousands of players who played in the Negro Leagues for minor league teams, independent teams, barnstorming teams or even Negro major league teams. We will bring their records to light as soon as suitable records are available for those players.
Perl is a robust scripting language. This whole site is built using Perl.
MySQL is an open source database. All of our data is stored and manipulated in MySQL.
Emacs is a free text editor with phenomenal capabilities if you sit down and learn them.
Red Hat Linux is an open source operating system. If you want the power of the command line on a PC, Linux is a nice option.
I stopped using Red Hat a while back, and now it's all built using Macs.
Philosophy of Baseball-Reference.com (May 1, 2004)
When I first hatched this idea, I had several overriding priorities for the site that had to be met before it would be launched. Four years later, I think the popularity of the site has shown that these are good principles on which to build any product or website.
- Useful - It needs to be comprehensive and the data must be easy to find.
- Fast - My site is fast because it has only ten images repeated across the entire site, and the pages are small. The average player page is 8KB and 95% of the pages are under 20KB. It is also fast because every page is already created and you don't have to wait for me to call a script to call a database to create a page.
- Embraces the medium - There are links everywhere. You visit Ted Williams and want to see who his teammates were in 1950, so you click on the team name, or you wonder who won the MVP in 1941, so you click on the league. The web is built on links, and that is why the Williams page has over 100 links on it: you wonder something and *Click* you find out. That is also why it has to be fast.
- Fun - Writing thousands of lines of code may not sound fun, but it has been a blast putting this all together. I also want the site to have a sense of humor and a personality, so I try to write from a personal viewpoint and there are a few inside jokes sprinkled around.
There are two sites that provided much of the technical knowledge and design philosophy for this site at its genesis. Jakob Nielsen's UseIt.com and Philip Greenspun's Photo.net were very helpful in creating this site and they are recommended to anyone doing web design. Very, very helpful. Both Nielsen and Greenspun have books that are worth checking out. Our library now has about 40 or 50 web design/computer books in it, so these should just be viewed as starting points.
July 1, 2004 (yes, really)