Generating Realistic Baseball Line Scores

A "line score" of a baseball game lists the runs scored by each team in each inning, along with various embellishments such as the total number of runs scored by each team. This Demonstration lets you produce synthetic line scores for baseball games. Use the sliders to set the mean and the standard deviation of runs scored per inning, and choose the length of a standard game. The Demonstration responds by producing 18 sample line scores. To produce a new sample without changing any of the other parameters, move the bottom slider.
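As a rough illustration of the layout described above, the following Python sketch prints a two-row line score with per-inning runs and a total for each team. The function name `format_line_score`, the team labels, and the sample innings are hypothetical and are not taken from the Demonstration; an "x" marks a bottom half that was not played.

```python
def format_line_score(away, home):
    """Return a two-line text line score from per-inning run lists.

    `away` and `home` are lists of runs per inning. `home` may be one
    inning shorter than `away` when the bottom of the last inning was
    not played; that cell is shown as "x".
    """
    innings = len(away)  # the visiting team bats in every inning played

    def row(name, runs):
        cells = [str(r) for r in runs] + ["x"] * (innings - len(runs))
        body = " ".join(f"{c:>2}" for c in cells)
        return f"{name:<5}{body}  | {sum(runs):>2}"

    return "\n".join([row("Away", away), row("Home", home)])


# Made-up sample game: the home team leads after the top of the ninth,
# so the bottom of the ninth is not played.
print(format_line_score([0, 1, 0, 0, 2, 0, 0, 0, 0],
                        [1, 0, 0, 3, 0, 0, 0, 0]))
```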

SNAPSHOTS

  • [Snapshot]
  • [Snapshot]
  • [Snapshot]

DETAILS

To permit a more compressed presentation, the games are sorted so that longer games appear in the rightmost column.
The default values for the sliders are calibrated to the actual results of the 2007 Major League Baseball season, using data made available with the consent of Retrosheet, provided that the following notice appears: "The data used was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at www.retrosheet.org."
The Demonstration assumes that the runs scored per inning are essentially gamma distributed. Experiments conducted by the author suggest that this is a reasonable assumption for Major League Baseball games. The Demonstration modifies the gamma distribution in one situation: if the home team is trailing in the bottom of the last regular inning or at the start of any "extra inning", the maximum number of runs the home team can score in the bottom of that inning is the difference in score at the start of the inning plus four. This maximum is reached when the home team ties the score and then, with the bases loaded, a home-team batter hits a home run, which ends the game.
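The sampling rule described above can be sketched in Python. This is not the Demonstration's own code: the function names are hypothetical, and converting a continuous gamma draw to a whole number of runs by truncation is an assumption (the text does not say how the Demonstration does it, and truncation pulls the realized mean slightly below the slider value). The mean and standard deviation used in the example are illustrative placeholders, not the calibrated 2007 defaults.

```python
import random


def gamma_params(mean, sd):
    """Convert a mean and standard deviation to gamma (shape, scale).

    For a gamma distribution, mean = shape * scale and
    variance = shape * scale**2.
    """
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


def sample_runs(mean, sd, rng):
    """Draw a whole number of runs for one half-inning.

    Assumption: the continuous gamma draw is truncated to an integer.
    """
    shape, scale = gamma_params(mean, sd)
    return int(rng.gammavariate(shape, scale))


def cap_home_runs(raw_runs, deficit):
    """Apply the cap from the text: at most deficit + 4 runs.

    `deficit` is how many runs the home team trails by (0 if tied)
    when it comes to bat in the last regular inning or an extra inning.
    """
    return min(raw_runs, deficit + 4)


rng = random.Random(2007)          # fixed seed for a repeatable sample
raw = sample_runs(0.5, 1.0, rng)   # placeholder mean and sd
print(cap_home_runs(raw, deficit=2))
```

Treating a tied score as a deficit of zero (so the cap is four runs) is a reading of the rule rather than something the text states explicitly.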
The code used in this Demonstration can be used to rapidly simulate the results of thousands of baseball games, and the simulated data can then be used to conduct various other sabermetric explorations. By way of example, one could use the code in this Demonstration to compute the probability that a game will be "interesting", that is, that the score difference will be less than some threshold, given that the home team is winning by six runs in the bottom of the fifth inning.
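A minimal Monte Carlo sketch of that example, written in Python rather than in the Demonstration's own code, might look like the following. Several choices here are assumptions: the function names are hypothetical, the game is picked up at the start of the sixth inning with the home team ahead by six, "interesting" is taken to mean that the final margin is below the threshold, gamma draws are truncated to whole runs, and the mean and standard deviation are illustrative placeholders rather than the calibrated defaults.

```python
import random


def sample_runs(mean, sd, rng):
    """One half-inning of runs: a gamma draw truncated to an integer."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return int(rng.gammavariate(shape, scale))


def finish_game(away, home, next_inning, mean, sd, rng, regulation=9):
    """Play out a game from the top of `next_inning`; return final scores."""
    inning = next_inning
    while True:
        away += sample_runs(mean, sd, rng)
        last_or_extra = inning >= regulation
        if last_or_extra and home > away:
            return away, home            # bottom half is not played
        runs = sample_runs(mean, sd, rng)
        if last_or_extra:
            # Home team is tied or trailing: cap at deficit plus four.
            runs = min(runs, away - home + 4)
        home += runs
        if last_or_extra and home != away:
            return away, home            # game decided
        inning += 1                      # otherwise keep playing


def prob_interesting(threshold, trials, mean, sd, seed=0):
    """Estimate P(final margin < threshold) from a 0-6 score after five."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        away, home = finish_game(0, 6, 6, mean, sd, rng)
        if abs(home - away) < threshold:
            hits += 1
    return hits / trials


print(prob_interesting(threshold=3, trials=10_000, mean=0.5, sd=1.0))
```

The estimate depends on the seed and on the number of trials, so it should be read as approximate.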
A yet more realistic line score generator would take into account the fact that baseball runs are not equally distributed over the innings. In Major League Baseball, for example, considerably more runs are scored in the first inning (when the best batters tend to hit) than in the second inning. Fewer runs also tend to be scored in the later innings, perhaps because teams often use their most effective pitchers ("closers") at that time.
Snapshot 1: line scores if baseball games lasted six innings
Snapshot 2: line scores if baseball games lasted twelve innings
Snapshot 3: line scores with different inning score parameters