We’re less than two weeks into the 2014 baseball season, and most people would say that it’s too early to make any forecasts about the rest of the year.
Still, as others have noted, though ten games represent only about 6% of an MLB season, surely these early games provide some indication of how a team will finish. Does Milwaukee’s 7-2 start mean that they may not be the sub-.500 team they were predicted to be? Are the 4-8 Diamondbacks likely to be even worse than expected?
The graphic below explores this question. It plots the full season for every team over the last 10 years, or 300 team-seasons in total. By filtering on a team's record, you can see how teams with similar starts fared over the rest of the season, and how this compares to an average season. (Because the graphic is loading nearly 50,000 games, it takes a moment to first display.)
Note that for records with fewer than 5 teams, the graphic expands the win filter to include at least 5 teams. For example, because only one team started 6-0, the graphic shows teams that started both 6-0 and 5-1.
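For anyone curious how a filter like this might be widened, here is a minimal Python sketch. It is not the code behind the graphic. The function name, its parameters, and the rule of widening by one win in each direction at a time are all assumptions; the post only says the filter grows until it covers at least 5 teams.

```python
from collections import Counter

def expand_win_filter(wins_after_n, target_wins, games, min_teams=5):
    """Return (low, high) win bounds covering at least min_teams team-seasons.

    wins_after_n: one win count per team-season, all after `games` games.
    The symmetric widening rule is an assumption, not the graphic's logic.
    """
    counts = Counter(wins_after_n)
    low = high = target_wins

    def matched():
        # Team-seasons whose win count falls inside the current bounds.
        return sum(counts[w] for w in range(low, high + 1))

    # Widen until enough teams match or the bounds cover every possible record.
    while matched() < min_teams and (low > 0 or high < games):
        low = max(low - 1, 0)
        high = min(high + 1, games)
    return low, high
```

With a single 6-0 team in the data, `expand_win_filter(wins, target_wins=6, games=6)` cannot widen upward, so it returns `(5, 6)`, which matches the 6-0 and 5-1 example above.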
As the graphic shows, there’s generally a lot of noise early in the season. A small bit of history, however, is on the Brewers' side. Based on the 15 teams that also started 7-2, the Brewers have around an 80% chance to finish above .500. In Arizona, things don’t look too desperate yet: Though teams that start 4-8 average 6 fewer wins than the overall mean, the distribution is still quite wide and many teams finish above .500.
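To get a feel for how much weight a sample of 15 teams can bear, the sketch below computes the share of similar starters that finished above .500 and puts a rough Wilson interval around it. Both function names are hypothetical, and the interval is my addition rather than something the graphic reports; it also treats past seasons as independent draws, which is a simplification.

```python
from math import sqrt

def share_above_500(final_wins, season_games=162):
    """Fraction of team-seasons that finished with more wins than losses."""
    half = season_games / 2
    return sum(1 for w in final_wins if w > half) / len(final_wins)

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width
```

As an illustration only (the 12-of-15 split is assumed because it is consistent with "around 80%", not read from the data), `wilson_interval(12, 15)` runs from roughly 55% to 93%. That width is another way of seeing how noisy early-season records are.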
Of course, forecasting (in the loosest sense of the word) future records based on teams' current records is a very simple way to approach this problem. There are a number of other factors, such as runs scored, runs allowed, run differentials, results against other strong or weak teams, and results in home and away games, that could be indicative of future success.
As an example, run differentials could provide more information than just win totals. The table below shows the relationship between run differentials in wins and season win totals. As it shows, teams that win by more runs in their first 30 games tend to have better seasons. While that’s unsurprising, it suggests that win totals alone are probably a crude metric, especially early in a season when the sample is so small.
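A comparison like this could be built along the following lines. This is a sketch under stated assumptions, not the query behind the table: the input shapes (a list of runs scored and runs allowed per game, and a list of margin and season-win pairs), the function names, and the half-run bucket width are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def avg_margin_in_wins(games, first_n=30):
    """Average margin of victory over a team's first `first_n` games.

    games: (runs_scored, runs_allowed) tuples in chronological order.
    Returns None if the team won none of those games.
    """
    margins = [rs - ra for rs, ra in games[:first_n] if rs > ra]
    return mean(margins) if margins else None

def wins_by_margin_bucket(teams, bucket_width=0.5):
    """Mean season wins per bucket of early-season margin of victory.

    teams: (avg_margin, season_wins) pairs, one per team-season.
    """
    buckets = defaultdict(list)
    for margin, season_wins in teams:
        if margin is None:
            continue  # no early wins, so no margin to bucket
        floor = (margin // bucket_width) * bucket_width
        buckets[floor].append(season_wins)
    return {b: mean(w) for b, w in sorted(buckets.items())}
```

If the pattern described above holds, the mean season wins should rise from the lowest bucket to the highest.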
For anyone interested in exploring this data further, Retrosheet data is available in Mode for every season since 1980. This data can be analyzed, visualized, and shared directly through Mode, and I can provide access to anyone who is interested. If you’d like to modify or double-check any of my analysis above, you can click through the embed links to access the work and data directly.