BR.com's Sports Reference Blog

August 29, 2007

Couch linked to steroid use

Filed under: Media — Sean @ 11:50 am

When reached by the Daily News, NFL spokesman Greg Aiello said the league had no comment. According to Yahoo.com, Aiello said if Couch rejoins the NFL, the league could take action “if we have sufficient proof that he used a banned substance without an acceptable medical justification. Merely being prescribed it by a doctor is not enough.”

We now have a #1 draft pick, the Defensive Player of the Year runner-up, a doctor for the Super Bowl champion Pittsburgh Steelers, 3/5ths of the Super Bowl-bound Carolina Panthers offensive line, and their All-Pro punter and tight end, all facing pretty serious steroid allegations. Where is the outrage that baseball has seen?

Also, I wonder if anyone has asked Aiello how many players have taken banned substances with a medical justification.

August 28, 2007

Postseason Index Now with Box Scores and Play-by-Play

Filed under: New Features — Sean @ 10:54 am

Postseason Index - Baseball-Reference.com

We recently acquired some postseason play-by-play data, and over the last two days I fitted it into our database and added it to the site. The site now has the following new features:

*Box Scores and play-by-play for every postseason game from 1903 on.
*On player pages, above the playoff stats, there are now links to postseason gamelogs for both pitching and batting. Here is Yogi Berra. All of the tooltips and row-summing features work on these pages.
*These aren’t yet folded into the Play Index tools, but I will be working on that today.

Anything I’m missing?

One thing I need to do is dramatically expand the postseason batting stats to include things like SH, SF, GS, etc.

I will be updating this during the 2007 postseason as well.

August 17, 2007

How does this site get updated daily?

Filed under: Uncategorized — Sean @ 12:39 pm

There has been some speculation as to how I manage to update the stats daily. Here is the rundown.

I buy the stats from a third party. I get the play-by-play accounts for the games, as well as summary stats for the majors and minors. They deposit them on my server between 6 and 7am each morning. The files are in a format unique to the provider, so I use Perl to parse them, apply my own IDs, and then load them into the site’s database, about 15 tables of data in total.
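A minimal sketch of that load step, for illustration only. The post says this is done in Perl against the site’s own database. This stand-in uses Python and an in-memory `sqlite3` database, and the pipe-delimited record layout (`vendor_id|date|AB|H`), the `VENDOR_TO_SITE_ID` mapping, and the `batting_gamelog` table are all hypothetical.

```python
import sqlite3

# Hypothetical mapping from the vendor's player IDs to the site's own IDs.
VENDOR_TO_SITE_ID = {"V1001": "player01", "V1002": "player02"}


def parse_line(line):
    """Parse one hypothetical pipe-delimited record: vendor_id|date|AB|H."""
    vendor_id, game_date, ab, h = line.strip().split("|")
    return vendor_id, game_date, int(ab), int(h)


def load_batting(conn, lines):
    """Translate vendor IDs to site IDs, insert the rows, return how many were loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batting_gamelog "
        "(player_id TEXT, game_date TEXT, ab INTEGER, h INTEGER)"
    )
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the vendor file
        vendor_id, game_date, ab, h = parse_line(line)
        # A KeyError here flags a player who has no site ID yet.
        rows.append((VENDOR_TO_SITE_ID[vendor_id], game_date, ab, h))
    conn.executemany("INSERT INTO batting_gamelog VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    loaded = load_batting(conn, ["V1001|2007-08-16|4|2", "V1002|2007-08-16|3|0", ""])
    print(loaded)  # 2
```

Failing loudly on an unknown vendor ID, rather than inventing one, keeps the site’s own IDs under manual control.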

This all happens automatically. A cron job checks every five minutes for the stats to arrive. Once they have arrived and the program is sure they have finished uploading (I don’t want to process a partial file), it flags that the site can be updated.
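One common way to decide that an upload has finished, sketched under assumptions: treat the file as complete once its size is unchanged between two consecutive cron runs. The post doesn’t say how its check works, so the size-stability test, the JSON state file, the `check_upload.py` script name, and the `READY` flag file are all hypothetical choices.

```python
import json
import os
import tempfile


def upload_is_complete(data_path, state_path):
    """Return True once data_path exists and its size matches the previous check.

    Meant to run from cron every five minutes, e.g. a crontab line like
    */5 * * * * python3 check_upload.py   (hypothetical script name)
    The size seen on the previous run is kept in a small JSON state file.
    """
    if not os.path.exists(data_path):
        return False
    size = os.path.getsize(data_path)
    previous = None
    if os.path.exists(state_path):
        with open(state_path) as f:
            previous = json.load(f).get("size")
    with open(state_path, "w") as f:
        json.dump({"size": size}, f)
    # Same nonzero size on two consecutive checks: assume the upload is done.
    return size > 0 and size == previous


def mark_ready(flag_path):
    """Write a flag file that tells the rest of the pipeline it may start."""
    with open(flag_path, "w") as f:
        f.write("ready\n")


if __name__ == "__main__":
    # Simulate consecutive cron runs against a temporary directory.
    workdir = tempfile.mkdtemp()
    data = os.path.join(workdir, "stats.dat")
    state = os.path.join(workdir, "upload_state.json")
    with open(data, "w") as f:
        f.write("play-by-play records\n")
    print(upload_is_complete(data, state))  # False: first time the file is seen
    if upload_is_complete(data, state):  # size unchanged, so this is True
        mark_ready(os.path.join(workdir, "READY"))
    print(os.path.exists(os.path.join(workdir, "READY")))  # True
```

A size check is only a heuristic. If the provider could also upload a marker file or checksum, that would be a stronger signal.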

I then have other scripts that work on the database and build new tables for specific things like splits, gamelogs, etc. These are all derived from the play-by-play, so if a play-by-play account can’t be parsed (like a strange runner interference when attempting to advance on a pitch that gets away), it breaks everything downstream, and I have to go back in and fix some stuff by hand. This has happened about 6 times this season (I get a page when the stats don’t run right).
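The fail-fast behavior described above can be sketched like this. Derived tables such as splits are aggregated from play-by-play rows, and any play the parser doesn’t recognize raises an exception instead of being skipped, so nothing downstream is built from bad data. The event codes, the `(batter, venue, code)` row shape, and the home/away split are hypothetical simplifications.

```python
from collections import defaultdict

# Hypothetical event codes mapped to (at-bat increment, hit increment).
EVENT_EFFECTS = {
    "1B": (1, 1), "2B": (1, 1), "3B": (1, 1), "HR": (1, 1),
    "K": (1, 0), "GO": (1, 0), "FO": (1, 0),
    "BB": (0, 0), "HBP": (0, 0),
}


class UnparseablePlay(Exception):
    """Raised for a play the parser can't interpret; stops every downstream build."""


def build_home_away_splits(plays):
    """Aggregate (AB, H) per (batter, venue) from (batter, venue, code) rows."""
    splits = defaultdict(lambda: [0, 0])
    for play_no, (batter, venue, code) in enumerate(plays, start=1):
        if code not in EVENT_EFFECTS:
            # Fail fast rather than skip: a dropped play would quietly skew
            # every split, gamelog, and page built from this table.
            raise UnparseablePlay(f"play {play_no}: unknown event {code!r}")
        ab, h = EVENT_EFFECTS[code]
        splits[(batter, venue)][0] += ab
        splits[(batter, venue)][1] += h
    return {key: tuple(totals) for key, totals in splits.items()}


if __name__ == "__main__":
    plays = [("player01", "home", "1B"), ("player01", "home", "K"),
             ("player01", "away", "BB")]
    print(build_home_away_splits(plays))
    try:
        build_home_away_splits([("player01", "home", "RUNNER-INT")])
    except UnparseablePlay as err:
        print("halting pipeline:", err)
```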

Next, the scripts build the pages. I try to update only the 2007 pages, so the 2006 and 2007 pages may not always have the same stats or layout, since I’m not re-running the entire site every day.
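Here is a rough sketch of rebuilding only the current season’s pages. It assumes pages are keyed by player and that a player needs a new page only if he appeared in the current season. The `appearances` input, the `.html` file names, and the template are hypothetical.

```python
from string import Template

# Hypothetical page template; the real pages are far richer.
PAGE = Template("<h1>$player_id</h1>\n<p>Seasons: $seasons</p>\n")


def pages_to_rebuild(appearances, current_season):
    """Player IDs with at least one appearance in the current season."""
    return sorted({pid for pid, season in appearances if season == current_season})


def rebuild_pages(appearances, current_season):
    """Render pages only for current-season players; other pages keep their old HTML."""
    seasons_by_player = {}
    for pid, season in appearances:
        seasons_by_player.setdefault(pid, set()).add(season)
    pages = {}
    for pid in pages_to_rebuild(appearances, current_season):
        seasons = ", ".join(str(s) for s in sorted(seasons_by_player[pid]))
        pages[f"{pid}.html"] = PAGE.substitute(player_id=pid, seasons=seasons)
    return pages


if __name__ == "__main__":
    appearances = [("player01", 2006), ("player01", 2007), ("player02", 2006)]
    print(sorted(rebuild_pages(appearances, 2007)))  # ['player01.html']
```

`player02` appeared only in 2006, so his page is left untouched. That is how older pages can fall behind a newer layout until the whole site is rebuilt.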

All told, this takes about 90 minutes.

None of this happens live on the main server. It all runs on a second server, against a second copy of the database. If everything worked, I can then transfer it over to the main server.
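The staging-then-promote idea can be sketched on a single machine. Build a fresh copy of the data in a separate database file, sanity-check it, and only then swap it into place. This is an analogue, not the site’s actual setup (which uses separate servers and MySQL). The file names, the row-count check, and the use of `sqlite3` with `os.replace` are assumptions.

```python
import os
import sqlite3
import tempfile


def build_staging_db(path, rows):
    """Build a fresh copy of the stats table in a separate staging database file."""
    if os.path.exists(path):
        os.remove(path)  # always start the staging copy from scratch
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE batting (player_id TEXT, h INTEGER)")
    conn.executemany("INSERT INTO batting VALUES (?, ?)", rows)
    conn.commit()
    conn.close()


def staging_looks_ok(path, min_rows):
    """Hypothetical sanity check before promotion: enough rows were loaded."""
    conn = sqlite3.connect(path)
    (count,) = conn.execute("SELECT COUNT(*) FROM batting").fetchone()
    conn.close()
    return count >= min_rows


def promote(staging_path, live_path):
    """Swap the staging file into place; os.replace is atomic on POSIX (same filesystem)."""
    os.replace(staging_path, live_path)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    staging = os.path.join(workdir, "stats_staging.db")
    live = os.path.join(workdir, "stats_live.db")
    build_staging_db(staging, [("player01", 2), ("player02", 0)])
    if staging_looks_ok(staging, min_rows=1):
        promote(staging, live)
    print(os.path.exists(live), os.path.exists(staging))  # True False
```

If the check fails, the live copy is never touched, so visitors keep seeing the previous day’s stats rather than a half-built update.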

The pages transfer in seconds, but syncing the Play Index and other databases takes almost two hours. The big time sinks are recomputing the career splits and batter vs. pitcher tables to include the new 2007 data.
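Here is a minimal sketch of a from-scratch batter vs. pitcher rebuild, which shows why these tables are expensive. Every matchup total is a `GROUP BY` over all play-by-play rows for all seasons, so adding one day of 2007 data still means rescanning everything. The `plays` schema and column names are hypothetical, and `sqlite3` stands in for the site’s MySQL databases.

```python
import sqlite3

PLAYS_SCHEMA = (
    "CREATE TABLE plays (season INTEGER, batter_id TEXT, pitcher_id TEXT, "
    "is_ab INTEGER, is_hit INTEGER, is_hr INTEGER)"
)


def rebuild_batter_vs_pitcher(conn):
    """Recompute every batter/pitcher matchup from all play-by-play rows, all seasons."""
    conn.execute("DROP TABLE IF EXISTS batter_vs_pitcher")
    conn.execute(
        """
        CREATE TABLE batter_vs_pitcher AS
        SELECT batter_id, pitcher_id,
               COUNT(*)    AS pa,
               SUM(is_ab)  AS ab,
               SUM(is_hit) AS h,
               SUM(is_hr)  AS hr
        FROM plays
        GROUP BY batter_id, pitcher_id
        """
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(PLAYS_SCHEMA)
    conn.executemany(
        "INSERT INTO plays VALUES (?, ?, ?, ?, ?, ?)",
        [
            (2006, "batter01", "pitcher01", 1, 1, 0),  # 2006 single
            (2007, "batter01", "pitcher01", 1, 1, 1),  # 2007 home run
            (2007, "batter01", "pitcher01", 0, 0, 0),  # 2007 walk
        ],
    )
    rebuild_batter_vs_pitcher(conn)
    print(conn.execute("SELECT * FROM batter_vs_pitcher").fetchall())
```

The demo prints `[('batter01', 'pitcher01', 3, 2, 2, 1)]`: three plate appearances, two at-bats, two hits, and one home run across both seasons. Because the query scans every season on each run, adding one day of games costs about as much as building the table from nothing.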

Then, if everything has been updated, I get a page around 9am confirming that it’s all finished. I also re-run the previews at that point so they reflect the previous day’s data.

I’d love to hear how the big guys manage their real-time updates, because things like splits take forever for me to recompute.

As for hardware, we’ve got three servers.

1) the main webserver
2) the main DB server (running three MySQL instances to make full use of its 6GB of RAM), and
3) the backup and image server, a small machine that just serves static files like images, JS, and CSS.

I’m happy to answer questions folks might have.

August 15, 2007

Main Page - BR Bullpen

Filed under: New Features, News — Sean @ 1:11 pm

I just completed a major upgrade to the Bullpen (now at over 41,000 pages!). This involved upgrading a lot of software and troubleshooting a lot of issues, but I think we’ve gotten it sorted out. Feel free to head over there and edit the pages of some of your favorites. You’ll be amazed at the depth of baseball content on the site.

Baseball-Reference.com - Yahoo! Picks Profiles

Filed under: Media — Sean @ 1:10 pm

B-R had the good fortune of being selected for a Yahoo! Picks Profile.

August 13, 2007

The Southpaw

Filed under: Media — Sean @ 10:11 am

Jeff Merron has some good baseball multimedia links on his weblog. There are LOTS of baseball weblogs with LOTS of text, but Jeff’s the first I’ve seen with relevant video and audio included.

August 6, 2007

Sean Forman interview — The Hardball Times

Filed under: Media — Sean @ 1:15 pm

Chris Jaffe did a thorough interview with me last week. The results are up.
