nkurz comments on “What We Found When X-Raying Some MLB Baseballs”
The cores of the new balls weighed, on average, about 0.5 grams less than the cores from the old group. This difference was statistically significant, which means it’s highly unlikely that it was due to sampling error.
I’m doubtful about their use of statistics here, and whether they actually show what is being claimed. They have 4 “old” balls, and 4 “new” balls, and are making claims about the differences between the pools of (let’s guess) 100,000 balls from which they were respectively taken. They then go from here to hypothesize that the measured differences in the samples can explain the number of home runs hit per year.
Assume that instead of baseballs they were trying to talk about mortality rates in small American cities, and their procedure was to sample 4 Americans from each of two cities with different mortality rates, and then run tests on them to determine what’s causing the different death rates. While not impossible, it would seem surprising if such a small sample would provide firm evidence toward any conclusion. It would be less surprising if by doing a multitude of tests one could come up with something that is “statistically significant” about the differences in the samples.
The first thing that bothers me about the article is that they don’t give the actual measured values, they just say “this difference was statistically significant”. I mean, there are only eight numbers, let’s see them! Also, they don’t clearly say whether there was overlap in the distributions: that is, were any of the “old” balls lighter than any of the “new” balls? If there was no overlap, I’d assume they would have said so, so their silence makes me presume overlap. By how much? What assumptions are being made about the distributions?
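To make the overlap question concrete, here is a minimal sketch of an exact permutation test on two groups of four. The article does not say which test was used, so this is only one assumption-free way to look at it, and the function name `exact_permutation_p` is made up for illustration. The inputs below are ranks (1 = lightest core, 8 = heaviest), not measured weights, since the article gives none.

```python
from itertools import combinations


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Enumerates every way of splitting the pooled values into groups of
    the original sizes and counts the splits whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_diff(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        n_splits += 1
        split_sum = sum(pooled[i] for i in idx)
        # Small tolerance guards against floating-point noise.
        if abs(mean_diff(split_sum)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits


# Illustrative ranks only, not data from the article.
no_overlap = exact_permutation_p([1, 2, 3, 4], [5, 6, 7, 8])
one_overlap = exact_permutation_p([1, 2, 3, 5], [4, 6, 7, 8])
print(no_overlap, one_overlap)
```

With 4 versus 4 there are only 70 possible splits, so complete separation gives a two-sided p-value of 2/70 (about 0.029), the smallest this test can produce. Swapping a single adjacent pair of ranks raises it to 4/70 (about 0.057). A parametric test such as a t-test could behave differently, which is why the distributional assumptions matter.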
Second, I wonder what other tests they ran. Presumably, if they were X-raying the balls, they expected to see something different about the interiors, but was this the difference they were planning to test? If you run twenty tests (and only report one of them), you’ll probably find a statistically significant result even if it’s just by chance. So did they preregister this hypothesis? And what else did they test?
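The arithmetic behind the twenty-tests point can be sketched as follows. This assumes the tests are independent and each uses the conventional 0.05 significance threshold; neither assumption comes from the article, and the function names are just for illustration.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests
    when every null hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the overall false-positive rate
    at or below alpha (Bonferroni correction)."""
    return alpha / n_tests


print(familywise_error_rate(20))  # roughly 0.64
print(bonferroni_threshold(20))
```

Under those assumptions, twenty tests give about a 64% chance of at least one “significant” result by chance alone, and a Bonferroni correction would require each individual test to clear 0.0025 rather than 0.05.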
Third, I wonder about the sampling. Are these balls (which were bought off eBay) really randomly sampled from the game balls used at each time? If they were bought from the same sellers, might they have been from the same batch and thus not randomly sampled? If these were actual game balls, were they sampled from “home run balls” rather than all balls? If so, how does sampling from the extremes change the sample distribution? And might the balls have changed over time? That is, might the older balls have heavier cores simply because they are older, perhaps because they’ve been stored somewhere humid and are absorbing water?
Fourth, rather than hypothesizing about how this tiny observed difference might cause a large difference in number of home runs per season, how about some tests? Do the lighter balls actually fly farther? One might take measurements of a full set of balls, set up a batting practice with a real pitcher and a real batter, and measure how far they fly. No, one doesn’t have to do this for every article, but one also doesn’t have to claim “statistical significance” about a hypothesis.
Lastly, has anything else changed? A few different pitchers and a few different batters might also cause a large difference in home runs per season. They mention in the article that “The remainder could be reasonably chalked up to a philosophical shift among MLB hitters, who are likely swinging upward to maximize the number of balls they hit in the air and are not shy about the increase in strikeouts that may come with that approach.” Rather than accounting for the “remainder”, is there any reason to assume that this doesn’t fully account for the difference?
Go to Link: https://news.ycombinator.com/item?id=16510693