## Bloops: Shooting Fish in a Barrel

Posted by Sean Forman on September 30, 2009

THE BOOK Blog: Shooting Fish in a Barrel

MGL over at The Book flags a comment by Michael Kay about splits being more likely in double-headers than in back-to-back games as being idiot speak. MGL doesn't back up his comments with numbers, but we'll do the heavy lifting for him here.

All data is from 1979-2008:

For double-headers

| DHs  | home_sweep | split  | vis_sweep |
|------|------------|--------|-----------|
| 1286 | 0.3095     | 0.4883 | 0.2022    |

Now I look at all cases where two teams played on date N and date N+1. A four-game series played on consecutive days therefore counts three separate times, once for each pair of adjacent dates. b2b = back-to-back

| consec_dates | home_b2b | split  | vis_b2b |
|--------------|----------|--------|---------|
| 41046        | 0.2955   | 0.4885 | 0.2160  |

I'm not sure why the home team picks up an advantage in the doubleheader (about 1%). Perhaps it is because they get to dictate the pitching matchups, or maybe fatiguing situations increase the advantage of playing at home. Or it could just be noise.

September 30th, 2009 at 7:33 pm

I've wanted to see these numbers for about 10 years. THANK YOU!

I think another possibility is that the home team is always a bit more comfortable--better clubhouse, slept in their own beds the night before etc. For one game, this might not make a difference but when the visiting team is working on hour 7 or 8 of game day, it might matter. If this were true, then we might also expect double-header splits to show the home team winning the latter game a bit more often than the opposite case. It might also be true that day/night double headers behave more like games on consecutive days whereas double-headers played truly back-to-back (single gate) would show the favor to the home team.

September 30th, 2009 at 9:45 pm

The statistical errors on the doubleheader fractions are roughly 1.5% (.015), so the difference between doubleheaders and consecutive-day games is most likely noise.
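As a rough check on those error sizes, here is a minimal sketch that applies the usual binomial standard error, sqrt(p(1-p)/n), to the fractions in the two tables above. It assumes every pair of games is an independent trial, which is not strictly true for the overlapping consecutive-day pairs, and the function and variable names are made up for illustration.

```python
from math import sqrt

def binomial_se(p, n):
    """Standard error of an observed fraction p out of n trials,
    assuming the trials are independent (an assumption, see above)."""
    return sqrt(p * (1 - p) / n)

# Sample sizes and fractions copied from the two tables in the post.
dh_n, b2b_n = 1286, 41046
dh = {"home_sweep": 0.3095, "split": 0.4883, "vis_sweep": 0.2022}
b2b = {"home_b2b": 0.2955, "split": 0.4885, "vis_b2b": 0.2160}

dh_se = {name: binomial_se(p, dh_n) for name, p in dh.items()}
b2b_se = {name: binomial_se(p, b2b_n) for name, p in b2b.items()}

for name, se in dh_se.items():
    print(f"doubleheader {name}: +/- {se:.4f}")
for name, se in b2b_se.items():
    print(f"consecutive-day {name}: +/- {se:.4f}")
```

Under that assumption the doubleheader errors come out a bit above one percentage point and the consecutive-day errors a little under a quarter of a point, in line with the rough figures quoted in this comment.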

However, the higher statistics for the consecutive-day games give a much smaller uncertainty (roughly .0025). Using them it looks like the win probabilities may not be the same from the first game to the next. If they were, the win probability inferred from the WW probability plus the loss probability inferred from the LL probability should add up to 100%. They actually add up to 100.84% with an uncertainty of around .35% -- not an overwhelming effect, but suggestive that something more complicated is going on.
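The 100.84% figure can be reproduced with a few lines. This is only a sketch of the arithmetic as I read it: if the home team's win probability were the same in both games, the home back-to-back fraction would be w squared and the visitor back-to-back fraction would be (1-w) squared, so their square roots should sum to 1. The variable names are illustrative, and the sketch does not attempt the uncertainty estimate.

```python
from math import sqrt

# Consecutive-day fractions from the second table in the post.
p_ww = 0.2955  # home team wins both games
p_ll = 0.2160  # home team loses both games

# With a constant per-game home win probability w:
#   p_ww = w**2  and  p_ll = (1 - w)**2
w_from_ww = sqrt(p_ww)   # implied home win probability
l_from_ll = sqrt(p_ll)   # implied home loss probability

total = w_from_ww + l_from_ll
print(f"implied win prob  = {w_from_ww:.4f}")
print(f"implied loss prob = {l_from_ll:.4f}")
print(f"sum               = {total:.4%}")
```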

Assuming that the winner of the first game increases their chance of winning the next game by x, and that the home team has a win probability of w in the first game, I estimate w = .539 +/- .009 and x = .008 +/- .002 from the consecutive-game data given. The fact that x is four standard deviations from zero seems to indicate that the probability of winning a game increases by almost 1% if you've won the previous game (within a particular series), whether you're the home or visiting team.
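One way to read that model, sketched below, is additive: the home team wins the first game with probability w, and whoever wins the first game has x added to their chance in the second. That gives P(home wins both) = w(w+x), P(visitor wins both) = (1-w)(1-w+x), and P(split) = 2w(1-w) - x. The additive form, the bisection bracket, and the function name `fit_w_x` are my assumptions, not necessarily what was done in this comment, and the sketch does not reproduce the quoted uncertainties.

```python
def fit_w_x(p_hh, p_split, p_vv, lo=0.5, hi=0.75, iters=60):
    """Solve the additive carry-over model for (w, x) by bisection.

    Assumed model (hypothetical reading of the comment):
      p_hh    = w * (w + x)
      p_vv    = (1 - w) * (1 - w + x)
      p_split = 2 * w * (1 - w) - x
    Subtracting the first two gives p_hh - p_vv = (2w - 1) * (1 + x).
    The bracket [lo, hi] assumes a home win probability between .500 and .750.
    """
    d = p_hh - p_vv

    def f(w):
        x = 2 * w * (1 - w) - p_split      # x implied by the split fraction
        return (2 * w - 1) * (1 + x) - d   # zero when the sweep gap also matches

    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    w = (lo + hi) / 2
    x = 2 * w * (1 - w) - p_split
    return w, x

# Consecutive-day fractions from the post.
w, x = fit_w_x(p_hh=0.2955, p_split=0.4885, p_vv=0.2160)
print(f"w = {w:.3f}, x = {x:.3f}")
```

With the consecutive-day numbers this lands on roughly the same central values quoted above, w near .539 and x near .008.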

This analysis is crude, and win probabilities obviously change from game to game due to different pitching matchups, but it is interesting.

October 1st, 2009 at 12:11 pm

Whiz: you are presuming the ".539" is fixed. Suppose you actually had three types of teams:

.580

.540

.500

So, the chance of a home-sweep for each is .58^2, .54^2, .50^2. The average of that is .541^2. Suppose that you had 5 teams, with these win%: .620, .580, .540, .500, .460. Now, the average following the same logic is: .543^2.
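The arithmetic here is easy to check. The sketch below averages the squared win percentages and takes the square root to get the single "effective" win% that would produce the same sweep rate. The function name is made up for illustration, and the equal weighting of team types follows the example in this comment.

```python
from math import sqrt

def effective_win_pct(win_pcts):
    """Single win% whose square equals the average sweep chance
    across equally weighted team types."""
    mean_sweep = sum(p * p for p in win_pcts) / len(win_pcts)
    return sqrt(mean_sweep)

three_types = [0.580, 0.540, 0.500]
five_types = [0.620, 0.580, 0.540, 0.500, 0.460]

print(f"{effective_win_pct(three_types):.3f}")  # 0.541
print(f"{effective_win_pct(five_types):.3f}")   # 0.543
```

Both lists have a plain average of .540, so the gap between .540 and the printed values comes entirely from the spread in team quality.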

Granted, it will not be a uniform distribution. But the distribution would certainly be wider than what I'm suggesting here.

My point is that you can't just take the average. I can take a league mean of .540, and I just showed how the home-sweep rate is akin to having a league mean of .543.

Regardless, you can look at the overall year-to-year home win%, and see big jumps/drops in win% every year. The difference here is well within the noise levels.