Simulate Game
Tabs: Team 1, Team 2, Set For Team, About Simulation

Input fields (two identical sets, one per team):
Batters 1 through 9: Plate Appearances, Hits, Doubles, Triples, Home Runs, Walks
Pitcher: ERA (Not in Use), WHIP

Team-wide fields (Team 1, Team 2): Plate Appearances, Hits, Doubles, Triples, Home Runs, Walks, ERA (Not in Use), WHIP

Results: Team 1 Score, Team 2 Score, Record
Games to Sim:
www.zwmiller.com
Copyright © 2016 Zachariah Miller
How the Simulation is Done
This simulation takes in the stats for all players on a team (one starting pitcher, nine batters), then simulates each at bat using a Monte Carlo approach. The stats set the percentage chance that an event occurs, and the program then rolls a die to decide whether it does. For example, each at bat's result (hit, walk, or out) is determined by a combination of the pitcher's "Walks and Hits per Inning Pitched" (WHIP) and the batter's "On Base Percentage" (OBP). If it's a hit, the batter's distribution of extra-base-hit probabilities is used to decide whether it is a single, double, triple, or home run. The way these values are used is tuned so that, when fed the average Major League stats for every batter and for the pitcher, the output value for runs per game matches the league average (2015 stats used for tuning).
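The die roll described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual code: the function name, the `pitcher_factor` multiplier, and the way walks and hits share the on-base probability are all assumptions made for the example. The inputs mirror the form fields (Plate Appearances, Hits, Doubles, Triples, Home Runs, Walks).

```python
import random

HIT_TYPES = ("single", "double", "triple", "home_run")


def simulate_plate_appearance(pa, hits, doubles, triples, home_runs, walks,
                              pitcher_factor=1.0, rng=random):
    """Return 'out', 'walk', 'single', 'double', 'triple', or 'home_run'.

    pitcher_factor is a hypothetical multiplier on the batter's chance of
    reaching base (1.0 means a league-average pitcher).
    """
    p_hit = hits / pa
    p_walk = walks / pa

    # Apply the (assumed) pitcher scaling, keeping the total at or below 1.
    scale = pitcher_factor
    if (p_hit + p_walk) * scale > 1.0:
        scale = 1.0 / (p_hit + p_walk)
    p_hit *= scale
    p_walk *= scale

    # One roll decides between walk, hit, and out.
    roll = rng.random()
    if roll < p_walk:
        return "walk"
    if roll < p_walk + p_hit:
        # A second roll picks the hit type from the batter's own distribution.
        singles = hits - doubles - triples - home_runs
        weights = [singles, doubles, triples, home_runs]
        return rng.choices(HIT_TYPES, weights=weights)[0]
    return "out"
```

Calling it many times for one batter, for example `simulate_plate_appearance(600, 150, 30, 3, 20, 60)`, should give an on-base rate close to (150 + 60) / 600.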
The baserunning is very simplistic and follows "ghost man" rules from childhood games. A runner advances only by the number of bases equivalent to the hit: a single moves each runner one base, a double two bases, and so on. This also means there is no such thing as a sacrifice fly in this simulation. However, both of these simplifications apply to both teams equally and should have only a small effect compared to the statistical fluctuation.
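As an illustration of the "ghost man" rule, here is a small sketch of how runners might be moved on a hit. The function name and the tuple representation of the bases are assumptions for the example. Walks are left out because the description above does not say how they move runners.

```python
def advance_runners(bases, n_bases):
    """Move the batter and every runner forward by n_bases.

    bases   -- tuple of three booleans: (first, second, third) occupied
    n_bases -- 1 for a single, 2 for a double, 3 for a triple, 4 for a home run
    Returns (new_bases, runs_scored).
    """
    # The batter starts at position 0; runners sit at positions 1, 2, 3.
    positions = [0] + [i + 1 for i, occupied in enumerate(bases) if occupied]
    new_positions = [p + n_bases for p in positions]

    # Anyone who reaches position 4 or beyond has crossed home plate.
    runs = sum(1 for p in new_positions if p >= 4)
    new_bases = tuple(base in new_positions for base in (1, 2, 3))
    return new_bases, runs
```

With runners on first and third, a single gives `advance_runners((True, False, True), 1)`, which scores the runner from third and leaves runners on first and second.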
Stats in Use
I've chosen not to rely on inputting batting average and slugging percentage and calling it a day. I tried this initially, as it's the simplest method; however, the predictive power was weak at best. Fed the lineups from the 2015 Cubs and Reds, this simple form found the Cubs to be only marginally better than the Reds. This was simply not true. Thus, I moved to requiring more detailed information for each batter, at the cost of an ugly form and a large commitment of time from the user. However, the accuracy of the simulation is much better now: it comes within 7% of the Cubs vs. Reds (2015) win percentage over the course of 10,000 simulations. Given the lack of baserunning, sacrifice flies, etc., I'm content with that result.
Improvements to Come
Currently, the pitching stats are limited. Part of that is due to the simulation being under development; part of it is because accounting for pitching is really quite difficult. Sample sizes are small in terms of game-long performance, finding predictive inputs is difficult, and it is unclear how best to merge the pitcher and the batter to produce reliable outputs for their head-to-head matchups. Currently, I use WHIP to scale the batter's chance of a successful at bat, with the scaling set against the league-average WHIP. Improvements to this system are underway. However, fed Clayton Kershaw's WHIP and an average hitting lineup, it reproduces his ERA within tolerance. Same for Alfredo Simon. I haven't done much testing in the middle of the WHIP spectrum, but it seems to do a pretty good job on the tests I have run.
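One simple way to picture the WHIP scaling is a straight ratio against the league average. This is only a guess at the form; the tuned scaling the simulation actually uses is not spelled out here. The function name is hypothetical, and the default `league_whip=1.30` is an illustrative placeholder, not a value taken from this page.

```python
def scaled_on_base_probability(obp, pitcher_whip, league_whip=1.30):
    """Scale a batter's on-base chance by the pitcher's WHIP.

    Assumes a linear ratio: a league-average pitcher leaves the batter's
    OBP unchanged, a lower WHIP lowers it, a higher WHIP raises it.
    league_whip is a placeholder value for illustration only.
    """
    scaled = obp * (pitcher_whip / league_whip)
    # Keep the result a valid probability.
    return min(1.0, max(0.0, scaled))
```

Under this assumption, `scaled_on_base_probability(0.320, 0.90)` comes out lower than 0.320, while a pitcher with a WHIP above the league figure pushes it higher.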
And with that, I release this to you to play with. Enjoy tinkering with lineups to see how the runs per game can change. I also think you'll find the random fluctuations to be much larger than you expect. Try simulating a 162-game season with two equivalent teams: quite frequently you'll see seasons where one team wins more than 90 games (the accepted threshold of an excellent team), which is strange considering both teams are identical. But them's the breaks, and it's also why most experts agree that in order to always select the true best team, we'd need a 400+ game season. Statistics are fun.
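The size of those fluctuations can be checked without running the simulator at all. The sketch below treats each game between two identical teams as a fair coin flip (an assumption that ignores ties and everything else about real baseball) and uses the exact binomial distribution rather than Monte Carlo. The function names are made up for the example.

```python
from math import comb


def prob_wins_exceed(threshold, games=162, p_win=0.5):
    """Exact probability that a team wins more than `threshold` games."""
    return sum(
        comb(games, k) * p_win**k * (1 - p_win) ** (games - k)
        for k in range(threshold + 1, games + 1)
    )


def prob_either_team_exceeds(threshold, games=162):
    """Probability that at least one of two evenly matched teams does so.

    Valid when both teams cannot exceed the threshold in the same season
    (true for 90 wins out of 162), so the two cases simply add.
    """
    assert 2 * (threshold + 1) > games
    return 2 * prob_wins_exceed(threshold, games, 0.5)
```

A normal approximation puts `prob_wins_exceed(90)` at very roughly 7 percent for a given team, and about double that for either of the two, which is often enough to show up regularly when simulating many seasons.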
I hope to add more features, such as tracking individual batter performance, the ability to look up already completed teams, etc., but those are still far away. In the meantime, happy simulating. If you find bugs or strange behavior, please email zaglamir@gmail.com.
-Zach