BIG Split Update on Baseball-Reference.com
It took a bit longer than I wanted, but the splits have now been updated on the site. I rewrote the split engine so that it can much more easily handle the addition of new splits (and their inclusion for teams, players, careers, etc.). It’s a big job. For example, I’ve got a database with over 4.5 million split lines for the 52 years of batting stats alone (just for players). Here is a rundown of the improvements. I hope you enjoy them.
Update: Platoon splits for 1999 are messed up; they will be fixed tomorrow morning.
- We now have splits for 1956.
- 2008 splits are now updating.
- Leverage (low, medium and high) clutch hitting and pitching splits have been added. (About WPA)
- For team splits, we now show splits by month and half for starters and relievers.
- Pitchers now have splits for save and non-save situations.
- We now have splits for runner on third and less than 2 outs and runner on third and 2 outs.
- Days’ Rest splits are now divided between Games Started (GS) and Games Relieved (GR). GS splits use the days of rest before a start, and GR splits use the days of rest before a relief appearance. In either case, the previous appearance could be a start or a relief appearance.
- For games from 1988 to the present we have added splits for Ball Location: To the Infield, To the Outfield; Ball In Play, Ball Not In Play (including HR over the fence; think Three True Outcomes); Fair Territory, Foul Territory; Pulled, Up the Middle, or Opposite Field (as RHB or LHB). We have also added splits for Ball Trajectory: Ground Balls, Fly Balls, Line Drives, or Bunts.
- For batters, we added splits in team wins and losses, and as a starter or as a sub.
- We cleaned up the handling of players who PH for a DH.
- We added a split for Leading off the Game.
- For splits where the player’s context can’t change after they reach base (like defensive position, leading off the inning, inning, etc.), runs scored are now shown. The figure is the number of runs that batter eventually scored after that plate appearance. So the leading off an inning split now shows how many times the leadoff batter scored, not just how many home runs they hit. For example, Tim Raines led off the game 1398 times and scored 285 runs (just 16 leadoff home runs). (Runs aren’t shown for things like scoring margin or clutch stats because the score may change from when the batter batted to when they finally scored.)
- SB and CS totals are now given for context splits. These are accurate for the actual player and do not represent the SB and CS totals of others on base. For example, Jose Reyes has stolen 178 bases against RHP and 56 against LHP. He is 34 of 42 when the count is 0-1. He is 26 of 31 when stealing third with no one on first. He is 73 of 88 when there are 2 outs. (Here’s Rickey. Unfortunately, his 1999 stats will be updated tomorrow.)
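The Days’ Rest rule from the list above can be sketched in a few lines. This is only an illustration, not the site's actual split engine: the function name, the `(date, is_start)` input format, and the convention that pitching on consecutive days counts as 0 days of rest are all assumptions made for the example.

```python
from datetime import date


def days_rest_splits(appearances):
    """Label each appearance with (split, days_rest).

    `appearances` is a chronologically sorted list of (date, is_start)
    tuples (a hypothetical input format). The first appearance has no
    previous outing, so it gets no label.
    """
    labeled = []
    for (prev_day, _prev_was_start), (day, is_start) in zip(appearances, appearances[1:]):
        # Assumption: pitching on consecutive days counts as 0 days of rest.
        rest = (day - prev_day).days - 1
        # The split is chosen by the current appearance only; the previous
        # appearance may have been a start or a relief outing.
        labeled.append(("GS" if is_start else "GR", rest))
    return labeled


log = [
    (date(2008, 4, 1), True),   # start
    (date(2008, 4, 4), False),  # relief appearance
    (date(2008, 4, 9), True),   # start
]
print(days_rest_splits(log))  # [('GR', 2), ('GS', 4)]
```

The previous outing's type is deliberately ignored, which matches the note that it could be either a start or a relief appearance.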
There are likely errors; please let me know when you find them.
[…] Sports Reference Blog » BIG Split Update on Baseball-Reference.com […]
Pingback by Stat of the Day » Sports Reference Blog » BIG Split Update on Baseball-Reference.com — April 8, 2008 @ 4:36 pm
Player & team splits for 1956, but no league splits yet.
Otherwise, everything looks GREAT. I might even have to resteal some stuff, blast it. Kudos.
Comment by Chris J. — April 8, 2008 @ 11:14 pm
Sean - it’s awesome. One suggestion that I’ve wanted for a long time. Can you set up these splits pages with a link target for the major headings on the page? The reason is that if, for example, I want to send someone to some split data that’s far down the page, they can have a lot of trouble finding it. But if I could link to the specific section, that would be great.
Comment by Andy — April 9, 2008 @ 8:18 am
Andy: great idea.
Sean: on the LI splits, I’d suggest putting the average LI for that split line, as well as the cumulative WPA.
Comment by tangotiger — April 9, 2008 @ 8:32 am
Sean–
Great work. One suggestion for a label on the splits: –3,lt 2 out you could put as –3 4 runs.
Comment by plim — April 9, 2008 @ 9:13 am
There is now a direct link option for the splits. Here is Maddux, linked directly to pitch counts.
The 1956 splits were in the DB, but the links were not on the pages yet. They are now.
Comment by Sean — April 9, 2008 @ 9:36 am
Plim
I don’t understand.
Comment by Sean — April 9, 2008 @ 9:37 am
Kind of surprised there’s no WPA on the league leaderboard in years for which this is available.
Comment by scareduck — April 9, 2008 @ 10:58 am
Sean–
the label says - - 3, lt 2 out (runner on 3rd, less than 2 out). i was suggesting that you could replace the lt with the less than symbol.
oh…the text got all chopped up. oh…i see why. later in my comment i said it’s similar to how you have run margin (greater than symbol) 4 runs, so the comments engine truncated everything in between the less than character and the greater than character, probably assuming everything in between was an html tag or something.
Comment by plim — April 9, 2008 @ 11:26 am
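For anyone curious, the chopping plim describes is what a naive tag stripper would produce. The sketch below is a guess at the mechanism, not the blog's actual comment code: `naive_strip_tags` is a hypothetical filter, and the sample comment text is made up. Escaping the characters with Python's standard `html.escape` would keep the text intact instead.

```python
import html
import re


def naive_strip_tags(text):
    # Hypothetical filter: drops everything from a "<" up to the next ">",
    # on the assumption that it must be an HTML tag.
    return re.sub(r"<[^>]*>", "", text)


comment = "put it as --3, < 2 out, like the run margin > 4 runs"

# The text between the two symbols disappears along with the symbols.
print(naive_strip_tags(comment))  # put it as --3,  4 runs

# Escaping keeps every word and renders the symbols safely.
print(html.escape(comment))  # put it as --3, &lt; 2 out, like the run margin &gt; 4 runs
```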
Sean-
Is there a way to track Team Win/Loss Streaks given a Statistic? For example, how many times has a team opened the season with 7 losses without scoring more than 5 runs in any of those 7 losses? I’ve been playing around with Play Index for a while trying to find a way to do this and I can’t seem to find it.
Comment by habetw4 — April 9, 2008 @ 1:19 pm
I notice that for some offensive splits (like home/away) the games column gives you all games played by players, rather than team games. I kinda liked it better the old way when it was team games played.
Comment by Chris J. — April 9, 2008 @ 10:32 pm
When can we expect the split data to be available in the PI?
Comment by ubiquitous — April 9, 2008 @ 10:34 pm
The rate stat leaderboards for 2008 are not right. Everyone has a double-asterisk (including league PA leader Chone Figgins).
I understand that the qualified/unqualified calculation is a bit chaotic when the number of team games is low, but about how long before that stabilizes?
Comment by DavidRF — April 10, 2008 @ 12:44 pm
Can you tell us which zones you’re using to determine pulled, opposite field, and up the middle? Thanks.
Comment by Danny — April 10, 2008 @ 1:52 pm
#10, Sorry we can’t do season opening streaks yet, but that would be a good add.
#11, The new games count is likely a better choice in my opinion, because it shows you things like the number of relievers used and the number of player games that went into a split.
#12, this summer
#13, will fix
#14, I used the Retrosheet fielding location codes. The field is divided into seven pie slices; I put the two end ones in pull and opp. field and the rest in up the middle.
http://www.retrosheet.org/location.htm
Comment by Sean — April 10, 2008 @ 3:12 pm
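The pie-slice rule in the answer to #14 can be sketched as follows. This is an illustration only: the 1 to 7 numbering (counted from the left-field line to the right-field line) and the function name are hypothetical and are not Retrosheet's own location codes, which are described at the link above.

```python
def spray_direction(slice_index, bats):
    """Classify a batted ball as 'pull', 'middle' or 'opposite'.

    slice_index: 1..7, counted from the left-field line to the right-field
                 line (hypothetical numbering, not Retrosheet's codes).
    bats: 'R' or 'L', the side the batter hit from.
    """
    if not 1 <= slice_index <= 7:
        raise ValueError("slice_index must be between 1 and 7")
    if bats not in ("R", "L"):
        raise ValueError("bats must be 'R' or 'L'")
    # The five inner slices all count as up the middle.
    if 2 <= slice_index <= 6:
        return "middle"
    # A right-handed batter pulls to left field; a left-handed batter
    # pulls to right field.
    on_left_side = slice_index == 1
    return "pull" if on_left_side == (bats == "R") else "opposite"


print(spray_direction(1, "R"))  # pull
print(spray_direction(1, "L"))  # opposite
print(spray_direction(4, "L"))  # middle
```

Because only the two end slices count as pulled or opposite field, five of the seven slices land in up the middle under this reading.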
Thank you for the link targets!!
Comment by Andy — April 12, 2008 @ 6:56 am
Looking at Maddux’s pitch counts that you linked to, the stolen base numbers by pitch count appear to be off.
Comment by dtoddwin — April 14, 2008 @ 12:20 pm
I agree with Sean regarding his response to #11. It’s easy enough to see the number of games in any of the splits, especially seeing that W and L are in virtually all the splits (ties notwithstanding).
At least, now I can go here:
http://www.baseball-reference.com/pi/psplit.cgi?team=TOT&year=1987&lg=ML
and see how many relief games were pitched in 1987. It used to show how many league games used at least 1 reliever (basically games minus complete games). Since you already know complete games, why do you need to know games with at least 1 reliever? I’d much rather figure out if a league was using 1.5 or 3.0 relievers or whatnot.
Comment by tangotiger — April 14, 2008 @ 4:01 pm
Say, would it be possible to have GDP situations (runner on 1st, none/one out) as a clutch stat?
Comment by Charles Saeger — April 15, 2008 @ 6:39 pm