Overwatch is a 6v6 action game that emphasizes strategy and teamwork. While individual skill is still important, its effect is significantly smaller than in games like Counter-Strike: Global Offensive. Because it is a primarily objective-based game, individual performance metrics like damage done, eliminations, accuracy, or kill/death ratio often cannot be directly compared and do not reflect the actual outcomes of matches.
I pursued this project with the motivation of analytically comparing how different teams and players perform, and of seeing whether certain trends or insights could be extracted from the game.
The Overwatch League provides a few Tableau dashboards to visualize data such as team, player, and teamfight information. Blizzard, the company behind Overwatch, actively limits the availability of these metrics both in-game (minimal comparison tools) and out of game (only a few aggregated metrics from a few selected players are shown post-game) to discourage players from using them for performance evaluation (and to tone down teammate blaming). This trend is no different for Overwatch League games, where only some aggregated data from individual matches is available.
For reference, the dashboards and data are available at https://overwatchleague.com/en-us/statslab; however, none of it is used for this project.
Although fine-grained play-by-play data is unavailable, the Overwatch League does have a replay viewer, allowing games to be replayed in the game client. This is what will be used to extract precisely timed data from the games. The replay viewer allows free-roam, first-person, and third-person views.
Many games and maps are played in Overwatch, and because of the video recording and post-processing time required, a 10-minute game takes around 6-7 hours of processing. Ultimately, the following criteria were chosen:
The end result is a total of 6 games analyzed, each between 10 and 15 minutes long. However, one specific game, the OWL 2020 Playoffs Losers' Finals between Seoul Dynasty and Shanghai Dragons, had multiple processing failures that remained unresolved at the time of writing, so the rest of this report is based on the remaining 5 maps:
Here's a short extract showing what the captured video looks like. The playback speed is 2x, captured at 60 frames per second, without sound, and in third-person perspective. Graphics settings were also dialed down to the lowest possible while maintaining 1080p resolution. OBS Studio and a series of keyboard macros were used to automatically capture and loop the replays between all 12 players, resulting in a 1-hour video for a 10-minute match.
Although this verification was not strictly required, the videos of all players were automatically synchronized, and a sample freeze-frame image is provided to demonstrate this.
With the video captured, frame-by-frame analysis can be done.
Each King of the Hill map uses a first-to-two scoring system. Each round consists of the following states:
| State | Description | Image |
|---|---|---|
| Countdown to round start | At the beginning of a round, there is a 30-second countdown before players can leave their spawning area. | ![]() |
| Countdown to objective unlock | Once the round starts, there is another 30-second countdown until the objective can be captured. | ![]() |
| Objective unlocked | Once the objective is unlocked, the teams fight for control, trying to reach 100% progress. In this snapshot, the white team is currently in control of the objective. | ![]() |
| Overtime | When the team in control reaches 99% progress and the opposing team is actively contesting the objective, the round enters the overtime state. No further progress is made until either team stops contesting. | ![]() |
| End of round | The round ends when one of the teams achieves 100% progress. | ![]() |
To determine the current state of a round from each image frame, OpenCV's `matchTemplate()` was used against these templates. If none of the templates matches, the round is currently in progress with the objective unlocked.
Initially, `PyTesseract` was used to attempt to recognize the team progress percentages; however, the accuracy was very low. As an alternative, the time each team spent in control was measured instead. This was achieved by the following steps:
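One plausible sketch of such a control-time counter, assuming the objective UI recolors to the controlling team's color (the team colors, tolerance, and dominance threshold below are placeholders, not the project's actual values):

```python
import numpy as np

# Hypothetical team UI colors (RGB); "white" matches the controlling team
# shown in the earlier snapshot, the other value is a placeholder.
TEAM_A_RGB = np.array([230, 230, 230])
TEAM_B_RGB = np.array([80, 160, 255])
COLOR_TOLERANCE = 40      # assumed per-channel distance cutoff
DOMINANCE_FRACTION = 0.3  # assumed fraction of pixels needed to call control

def control_state(ui_patch):
    """Return 'team_a', 'team_b', or None based on the dominant UI color."""
    pixels = ui_patch.reshape(-1, 3).astype(int)
    a_hits = (np.abs(pixels - TEAM_A_RGB).max(axis=1) < COLOR_TOLERANCE).sum()
    b_hits = (np.abs(pixels - TEAM_B_RGB).max(axis=1) < COLOR_TOLERANCE).sum()
    if max(a_hits, b_hits) < DOMINANCE_FRACTION * len(pixels):
        return None  # neither color dominates: objective is neutral
    return "team_a" if a_hits > b_hits else "team_b"

def control_seconds(patches, fps=60):
    """Accumulate control time by counting frames per state at a known fps."""
    counts = {"team_a": 0, "team_b": 0}
    for patch in patches:
        state = control_state(patch)
        if state:
            counts[state] += 1
    return {team: frames / fps for team, frames in counts.items()}
```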
OpenCV's `matchTemplate()` was used against these templates. With only a limited number of heroes played during the playoffs, these partial templates were enough to determine each player's hero selection and when they swapped heroes. Hero detection is necessary because each hero has a different health pool and different abilities.
Health and ultimate charge values are shown at fixed positions in the UI. While the health value is always represented as a number, when a player's ultimate is fully charged, the number turns into an icon. OpenCV's `matchTemplate()` was used to detect whether this checkmark icon is present in the portraits bar, indicating a charged ultimate.
All health and ultimate charge numbers were extracted and stored for further decoding.
Initially, OCR with `PyTesseract` was attempted to recognize the digits; however, even when restricting recognition to a numeric character subset, the results were poor. With the noisy backgrounds and a font containing several very similar-looking digits, OCR accuracy was below 60%.
Improvements to the OCR performance were attempted through better image pre-processing. Binarizing the background and foreground did help, but the results were still unsatisfactory (below 80% accuracy), with digits like 3 / 5 / 6 / 9 and 8 / 0 still getting mixed up. A combination of thresholding, deskewing, dilation, erosion, noise removal, and flood filling was used to achieve better digit separation.
Given OCR's poor performance, OpenCV's template matching was explored and showed promise. With some image pre-processing, individual digits were segmented from the numbers, and a template was generated for each digit.
The digit matching gave decent results, as it was the first time that individual digits were matched with an accuracy above 95%. However, problems still persisted with certain sets of numbers.
A `Keras` digit classifier model was attempted next. Building on the decent template-matching results, a model was trained on 12,000 digit samples labeled from those earlier results. The model performed perfectly, achieving 100% accuracy on digit classification. However, one problem still prevented it from being the final solution: the digits were not always extracted properly from the images.
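A minimal Keras classifier along these lines (the architecture and the 28x28 input size are assumptions; the report does not specify the actual network):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_digit_classifier():
    """Small CNN over grayscale digit crops, 10 output classes (digits 0-9)."""
    model = keras.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_digit_classifier()
# model.fit(train_images, train_labels, epochs=5)  # ~12,000 labeled samples
```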
Due to the high variance of colors in the background, the threshold and flood-fill transformations were problematic.
![]()
![]()
![]()
Examining the histograms of these problematic examples revealed that dynamic thresholding was needed in order to binarize the input images better.
![]()
![]()
![]()
![]()
With the improved results, the model was applied to other videos. Unfortunately, the results were still inconsistent; the image pre-processing proved too brittle given the variance in the input data.
Since one video had already produced good results, a large number (50,000+) of correctly labeled samples was available. With little to lose, two neural-network number classifiers, one for health (up to 3 digits) and another for ultimate charge (up to 2 digits), were trained to see how they performed; the samples were pre-processed only by converting to grayscale. Although some errors still persisted, the end result was excellent after a two-frame-confirmation outlier-removal pass. These models were used in the following analysis.
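The two-frame confirmation pass can be sketched as follows. This is a simplified interpretation of the idea: a new reading is accepted only if it persists for two consecutive frames; otherwise the last confirmed value is carried forward, which drops one-frame misclassifications.

```python
def two_frame_confirm(values):
    """Suppress single-frame outliers in a sequence of per-frame readings.

    A value is confirmed when it repeats the last confirmed value or is
    also seen on the immediately following frame; unconfirmed readings
    are replaced with the last confirmed value.
    """
    confirmed = []
    last = None
    for i, v in enumerate(values):
        if v == last or (i + 1 < len(values) and values[i + 1] == v):
            last = v
        confirmed.append(last)
    return confirmed
```

For example, a one-frame glitch such as a health reading of 37 between stable readings of 100 is smoothed away, while a genuine change that persists for two frames is kept.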
Different stages of teamfights (What is a Teamfight?) can now be identified from the data. A teamfight usually includes the following stages:
A team-health-based threshold could provide a good baseline for detecting teamfights. A simple algorithm combining the percentage of health remaining per player, eliminated players, and a gradual ease-in factor after respawn resulted in the following model.
The simple model exceeded expectations and matched up very well to manual identification. This model will be used as is for the following analysis.
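A sketch of such a model (the ease-in duration, the fight threshold, and the exact weighting are placeholder assumptions; the project's actual parameters are not given):

```python
RESPAWN_EASE_SECONDS = 3.0  # assumed ease-in duration after respawn
FIGHT_THRESHOLD = 0.65      # assumed fraction of full team health

def player_weight(health_fraction, seconds_since_respawn):
    """Eliminated players contribute 0; recently respawned players ease
    back in linearly so a fresh spawn does not instantly 'heal' the team."""
    if health_fraction <= 0:
        return 0.0
    ease = min(1.0, seconds_since_respawn / RESPAWN_EASE_SECONDS)
    return health_fraction * ease

def in_teamfight(team):
    """team: list of (health_fraction, seconds_since_respawn) per player.
    Flags a teamfight while average weighted health drops below threshold."""
    total = sum(player_weight(h, t) for h, t in team) / len(team)
    return total < FIGHT_THRESHOLD
```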
With the data collected and teamfights identified, an overview for each round was generated.
Unfortunately, there were fewer than 100 teamfights in total across these 5 matches. The grounds for drawing conclusions from the analysis below are therefore weak, but it should serve as a demonstration of what is possible with the extracted data.
For the first teamfight, teams are on even ground. Let's see if certain teams do better than others.
Often, the team that gets the first kill is considered more likely to win the teamfight. Let's see if this is the case.
There is a strong belief that the team with the ultimate advantage has a significantly higher chance of winning the next teamfight; let's see if that is the case.
Let's explore whether there is a relationship between teamfight length and team win rates.
Improvements can be made to the current state of the project, namely:
Using computer vision to extract data and insights from Overwatch replays is technically feasible. However, some limitations exist; for example, it is currently not possible to track damage sources, and certain animations (such as D.Va calling down her mech) block out the UI completely for a split second.