Introduction to Dodgeball statistics

Dodgeball analysis through data science and statistics

Following the European Dodgeball Championships in Italy last year, I started statistically analysing the games I played with the Mix team. The goals were to gain insights into team tactics that could help us develop our game, and to identify individual strengths and weaknesses that could guide player and line-up selection, inform training, and allow for tailored training programs. In essence, it is a data-science approach to dodgeball, familiar to anyone who has watched the movie “Moneyball”. I have been planning for some time to write a series of articles about these analyses, but have never found the time. In this post, I outline the types of analyses I have done and the data I collected; future posts will cover each type of analysis and its findings in more detail.

All of the data is recorded manually from video recordings of the games (some of which are available on YouTube: EuroDodge2018, CentralEuroDodge2019). I record every throw along with meta-data that may be useful for further analysis, such as: the time; the thrower; the outcome of the throw; the position, sex and distance of both thrower and defender; whether the throw was planned (called by the playmaker) or improvised; whether it was stationary or a counter; whether one ball or several were thrown and, if several, whether they were synchronised; the number of players on court; the other players in the line-up; and the team scores.

So far, I have manually collected 99’813 raw data points from three championships (European 2018, Central European 2019 and National 2018). Some of these are combined to synthesize additional meta-data, giving a total of 221’726 data points. The analysis code, which handles input/output, calculations and plotting, runs to 7’954 lines (332’941 characters, or 138 pages).

The analysis is organised into four layers of increasing complexity:

Layer 1) Raw statistics, such as the frequency of specific events (throws, dodges, blocks, eliminations) and playing time. These stats give a rough overview of individual playing styles (for example, Male5 in this post) and reveal the more obvious patterns in team tactics.
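A minimal sketch of how layer-1 counts could be tallied from manually recorded events. The `Event` record, its fields and the player labels are hypothetical placeholders, not the actual data schema used here, and playing time is assumed to be recorded as (player, enter, leave) stints in seconds.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass
class Event:
    """One manually recorded event (hypothetical schema)."""
    time_s: float  # game clock, in seconds
    player: str    # placeholder player label
    kind: str      # "throw", "dodge", "block" or "elimination"

def event_counts(events):
    """Count each event type per player: {player: Counter({kind: n})}."""
    counts = defaultdict(Counter)
    for e in events:
        counts[e.player][e.kind] += 1
    return counts

def playing_time(stints):
    """Total on-court seconds per player from (player, enter_s, leave_s) stints."""
    totals = defaultdict(float)
    for player, enter_s, leave_s in stints:
        totals[player] += leave_s - enter_s
    return dict(totals)

events = [
    Event(12.0, "PlayerA", "throw"),
    Event(15.5, "PlayerA", "dodge"),
    Event(31.0, "PlayerA", "throw"),
    Event(33.2, "PlayerB", "block"),
]
print(event_counts(events)["PlayerA"]["throw"])  # 2
print(playing_time([("PlayerA", 0, 180), ("PlayerA", 240, 300)]))  # {'PlayerA': 240.0}
```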

Layer 2) Compound statistics, such as hit and defense percentages, player efficiency, and hit percentage for different ball-possession situations and for planned versus improvised plays. These are generated by combining the layer-1 stats with each other and with the meta-data. They start to show which players are skilled at what, and offer a basic analysis of decision making on court. They also allow more advanced statistical methods, such as linear regression, to be used to identify game-specific patterns (patterns that hold for dodgeball in general rather than only for the players or teams analysed), which can then guide further analysis. One example is estimating the relative contribution of attacking skill (hard, accurate throwing) and defensive skill (blocking, dodging) to scoring.
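A minimal sketch of how layer-2 percentages can be derived from layer-1 throw records. The `Throw` record and its fields are hypothetical (the real schema is not described here), `balls` is assumed to mean the number of balls the throwing team holds, and any outcome other than an elimination (dodge, block, catch) is counted as a successful defense.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Throw:
    """One recorded throw (hypothetical schema)."""
    thrower: str
    defender: str
    planned: bool  # called by the playmaker (True) or improvised (False)
    balls: int     # assumed: balls held by the throwing team
    hit: bool      # True if the throw eliminated the defender

def rate(successes, attempts):
    """Success percentage, or None when there were no attempts."""
    return 100.0 * successes / attempts if attempts else None

def hit_percentage(throws, key=lambda t: t.thrower):
    """Hit percentage grouped by any key: thrower, planned, balls, ..."""
    hits, attempts = defaultdict(int), defaultdict(int)
    for t in throws:
        k = key(t)
        attempts[k] += 1
        hits[k] += t.hit
    return {k: rate(hits[k], attempts[k]) for k in attempts}

def defense_percentage(throws):
    """Percentage of throws aimed at each defender that did not eliminate them."""
    survived, faced = defaultdict(int), defaultdict(int)
    for t in throws:
        faced[t.defender] += 1
        survived[t.defender] += not t.hit
    return {p: rate(survived[p], faced[p]) for p in faced}

throws = [
    Throw("PlayerA", "OppA", True, 3, True),
    Throw("PlayerA", "OppB", False, 1, False),
    Throw("PlayerA", "OppA", True, 3, True),
    Throw("PlayerB", "OppA", False, 2, True),
]
print(hit_percentage(throws, key=lambda t: t.planned))  # {True: 100.0, False: 50.0}
print(defense_percentage(throws))  # {'OppA': 0.0, 'OppB': 100.0}
```

Passing a different `key` (for example `lambda t: t.balls`) gives the same percentage split by ball possession without writing a new function.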

Layer 3) Complex statistics guided by the findings of layer 2, such as hit and defense percentages corrected for relative importance, technical/tactical skill scores and individual skill profiles, individual player importance, changes in focus and stamina across a game or over long tournaments with many games, and player decision making and its net effect on the game outcome. These stats give valuable insights for tailoring individual training programs and optimizing player development: for example, whether someone should focus more on improving their throwing or their dodging, or on improvised situations, or should become more aware of their own strengths and limitations depending on the opponent.
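One possible way to estimate "relative importance" and use it to correct the percentages, not necessarily the method used in this analysis: regress points scored on standardised hit and defense percentages with ordinary least squares, normalise the two slopes into weights, and score each player by the weighted sum of their z-scored percentages. The data below is synthetic, the player names are placeholders, the function names are hypothetical, and the sketch assumes both fitted slopes come out positive.

```python
import numpy as np

def zscore(x):
    """Standardise values to mean 0 and standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def relative_importance(hit_pct, def_pct, points):
    """OLS fit of points ~ b0 + b_hit*z(hit) + b_def*z(def).
    Returns (w_hit, w_def): the standardised slopes normalised to sum to 1.
    Assumes both slopes are positive."""
    X = np.column_stack([np.ones(len(points)), zscore(hit_pct), zscore(def_pct)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(points, dtype=float), rcond=None)
    b_hit, b_def = coef[1], coef[2]
    return b_hit / (b_hit + b_def), b_def / (b_hit + b_def)

def weighted_skill_scores(players, w_hit, w_def):
    """players: {name: (hit_pct, def_pct)}.
    Returns {name: w_hit*z(hit) + w_def*z(def)}, z-scored within the squad."""
    names = list(players)
    z_hit = zscore([players[n][0] for n in names])
    z_def = zscore([players[n][1] for n in names])
    return {n: float(w_hit * h + w_def * d) for n, h, d in zip(names, z_hit, z_def)}

# Synthetic (player, set) samples, NOT real data: points are generated so that
# defense matters more than attack, and the fit should recover that.
rng = np.random.default_rng(0)
n = 200
hit = rng.uniform(20, 60, n)
dfn = rng.uniform(30, 80, n)
points = 0.5 + 0.03 * hit + 0.06 * dfn + rng.normal(0, 0.1, n)

w_hit, w_def = relative_importance(hit, dfn, points)
squad = {"PlayerA": (55.0, 40.0), "PlayerB": (35.0, 70.0), "PlayerC": (45.0, 55.0)}
scores = weighted_skill_scores(squad, w_hit, w_def)
print(round(w_hit, 2), round(w_def, 2))
print(max(scores, key=scores.get))
```

Standardising first means the weights compare the effect of a one-standard-deviation improvement in each skill, rather than depending on how spread out each raw percentage happens to be.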

Layer 4) Exploratory statistics that are extrapolative or still in early development, as well as team-level statistics built on findings from the lower layers. Examples include measuring metacognitive decision making and individual shot-taking sensitivity, or identifying the shot sensitivity/frequency that maximises each player's point-scoring efficiency. For this layer, I have segmented all the collected data into 5-minute epochs for each player and calculated the stats from the first three layers within each of these short time windows. This yields a much larger dataset than using stats from whole sets or matches, while remaining relevant to the game. These stats can then be used to identify further game-specific patterns across teams, or to determine which line-ups are more successful.
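A sketch of the 5-minute epoch segmentation, assuming each throw carries a game identifier and a game-clock timestamp in seconds (the text does not say whether epochs follow the game clock or each player's on-court time). It also shows one swapped-in way to locate an "optimal shot frequency": fitting a quadratic to per-epoch efficiency against shots per epoch and taking its maximum. The function names, the tuple layout and the `min_throws` filter are hypothetical.

```python
from collections import defaultdict
import numpy as np

EPOCH_S = 5 * 60  # 5-minute epochs

def epoch_hit_percentages(throws, epoch_s=EPOCH_S, min_throws=1):
    """Split throws into fixed-length epochs per game and player.

    throws: iterable of (game_id, player, time_s, hit) tuples, where time_s is
    assumed to be the game clock in seconds. Epochs with fewer than
    min_throws throws are dropped. Returns {(game_id, player, epoch): hit %}.
    """
    hits, attempts = defaultdict(int), defaultdict(int)
    for game_id, player, time_s, hit in throws:
        key = (game_id, player, int(time_s // epoch_s))
        attempts[key] += 1
        hits[key] += hit
    return {k: 100.0 * hits[k] / n for k, n in attempts.items() if n >= min_throws}

def optimal_frequency(shots_per_epoch, efficiency_per_epoch):
    """Fit efficiency ~ a*x**2 + b*x + c over epochs and return the vertex
    -b / (2a), or None if the fitted curve has no maximum (a >= 0)."""
    a, b, _ = np.polyfit(shots_per_epoch, efficiency_per_epoch, 2)
    return -b / (2 * a) if a < 0 else None

throws = [
    ("game1", "PlayerA", 30, True),
    ("game1", "PlayerA", 250, False),
    ("game1", "PlayerA", 320, True),
    ("game1", "PlayerA", 500, True),
]
print(epoch_hit_percentages(throws))
# {('game1', 'PlayerA', 0): 50.0, ('game1', 'PlayerA', 1): 100.0}

x = np.arange(1, 8)  # synthetic shots per epoch
print(round(optimal_frequency(x, 10 - (x - 4) ** 2), 6))  # 4.0
```

Raising `min_throws` trades dataset size for less noisy per-epoch percentages.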

I encourage anyone interested in data science, sports analysis and/or dodgeball to contact me with comments on the posts and analyses, and with ideas for additional analyses that could interest the dodgeball community.