Taking a look at the history of VCT¶
This project analyzes Valorant Champions Tour (VCT) esports tournament data, spanning map bans, round results, player stats, and more. We take a look at the league's history and answer questions across several categories.
Throughout this project there is plenty of text to read, which also introduces the core gameplay concepts needed for a deeper understanding of the data being described. We look at several different questions.
What questions are we asking?¶
- Which years had the most overtime matches played?
- Which team has the most overtime wins within the last 2 years?
- Which map has stayed consistently banned throughout the league's history during the draft pick phase?
- Which player has been the best throughout the league's history and within individual seasons?
Introduction & Overview¶
Link to Dataset Here
%pip install numpy pandas matplotlib seaborn
Importing the libraries and Setup¶
# First, load up all the libraries needed.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pandas import DataFrame
class VCTAgentData:
    def __init__(self, year):
        self.agents_pr = pd.read_csv(f"./data/vct_{year}/agents/agents_pick_rates.csv")
        self.map_stats = pd.read_csv(f"./data/vct_{year}/agents/maps_stats.csv")

class VCTMatchData:
    def __init__(self, year):
        self.WinLossRounds = pd.read_csv(f"./data/vct_{year}/matches/win_loss_methods_round_number.csv")
        self.MapScores = pd.read_csv(f"./data/vct_{year}/matches/maps_scores.csv")
        self.DraftPhase = pd.read_csv(f"./data/vct_{year}/matches/draft_phase.csv")

class VCTPlayerData:
    def __init__(self, year):
        self.PlayerStats = pd.read_csv(f"./data/vct_{year}/players_stats/players_stats.csv")

class VCTData:
    def __init__(self, year):
        self.Agents = VCTAgentData(year)
        self.Matches = VCTMatchData(year)
        self.Player = VCTPlayerData(year)

vct2022 = VCTData("2022")  # needed for the 2022 comparisons below
vct2023 = VCTData("2023")
vct2024 = VCTData("2024")
Pre-Processing¶
First, I wanted to check through the data. Since we are working with many different cases at a large scale, there is a lot to verify. Within our dataset, we will be viewing data from 2022 onwards. We first want to check for nulls to avoid errors during analysis and to improve the quality of the data.
Due to the amount of data presented, many pre-processing steps will happen at the point the data is used rather than all up front. The helper methods throughout the notebook showcase these pre-processing steps.
# Check and Print the null data
# Here is the amount of data that we will be going through.
print("2023 Data: \n")
print(vct2023.Agents.agents_pr.shape)
print(vct2023.Agents.map_stats.shape)
print(vct2023.Matches.DraftPhase.shape)
print(vct2023.Matches.MapScores.shape)
print(vct2023.Matches.WinLossRounds.shape)
print(vct2023.Player.PlayerStats.shape)
print("2024 Data: \n")
print(vct2024.Agents.agents_pr.shape)
print(vct2024.Agents.map_stats.shape)
print(vct2024.Matches.DraftPhase.shape)
print(vct2024.Matches.MapScores.shape)
print(vct2024.Matches.WinLossRounds.shape)
print(vct2024.Player.PlayerStats.shape)
2023 Data:
(17712, 6)
(738, 7)
(1898, 7)
(830, 16)
(35514, 10)
(11249, 25)
2024 Data:
(24840, 6)
(1035, 7)
(2604, 7)
(1104, 16)
(46754, 9)
(15030, 25)
print("2023 WinLoss Data: \n", vct2023.Matches.WinLossRounds.isnull().sum())
print("2024 WinLoss Data: \n", vct2024.Matches.WinLossRounds.isnull().sum())
2023 WinLoss Data:
Tournament      0
Stage           0
Match Type      0
Match Name      0
Map             0
Round Number    0
Team            0
Method          0
Outcome         0
Identity        0
dtype: int64
2024 WinLoss Data:
Tournament      0
Stage           0
Match Type      0
Match Name      0
Map             0
Round Number    0
Team            0
Method          0
Outcome         0
dtype: int64
As we can see, the data that comes with the dataset is very clean and filled with the proper information. For different questions we will dive into different datasets, filtering and reshaping them in various ways. There is, however, something we need to take care of.
The 2022 data is substantially different because the VCT league had a different structure that year. For our purposes, we will remove Challengers tournaments from the data; these are part of a separate league, which is not what we want to look at. We will focus on the higher-level tournaments while comparing against the rest.
# We need to filter out all rows that contain data from a Challengers tournament. This is easy to do.
def filter_challengers(df: DataFrame):
    df = df[~df['Tournament'].str.contains("Challengers")]
    return df

# We also want to give these matches an identifier so we can sort through the same game fast.
def createMatchIdentifier(df: DataFrame):
    df['Identity'] = df['Tournament'] + '_' + df['Stage'] + '_' + df['Match Type'] + "_" + df['Map']
    return df
vct2022.Matches.WinLossRounds = filter_challengers(vct2022.Matches.WinLossRounds)
createMatchIdentifier(vct2023.Matches.WinLossRounds).head(1)
| | Tournament | Stage | Match Type | Match Name | Map | Round Number | Team | Method | Outcome | Identity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Valorant Champions 2023 | Group Stage | Opening (D) | Team Liquid vs Natus Vincere | Fracture | 1 | Team Liquid | Elimination | Win | Valorant Champions 2023_Group Stage_Opening (D... |
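To show why the identifier is useful, here is a small illustrative sketch (the rows are made up; the column names mirror the WinLossRounds CSV): once every round of the same map in the same match shares one Identity key, per-map questions become a single groupby.

```python
import pandas as pd

# Made-up example rows with the real column names.
rounds = pd.DataFrame({
    "Identity": ["Champions 2023_Group Stage_Opening (D)_Fracture"] * 3,
    "Round Number": [1, 2, 3],
    "Outcome": ["Win", "Loss", "Win"],
})

# Highest round number per Identity tells us how long each map went.
rounds_per_map = rounds.groupby("Identity")["Round Number"].max()
print(rounds_per_map)
```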
Data Understanding/Visualization¶
Which team plays best under pressure?¶
Within VALORANT, the core game mode is a round-based spike plant/defuse mode. What does that mean?
Teams play 12 rounds on either Attack or Defense, then swap sides for another 12 rounds. The first team to reach 13 round wins takes the map. If the game reaches a 12-12 tie, it goes into overtime, where the first team to gain a 2-round lead wins.
Throughout the years, many overtimes have occurred in both preliminary and final matches. I wanted to take a closer look at those games.
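The win condition described above can be sketched as a tiny helper (hypothetical, not part of the dataset's code): regulation ends at 13 rounds, and from 12-12 onward a team must lead by 2.

```python
# Minimal sketch of VALORANT's win condition (hypothetical helper).
def match_decided(score_a: int, score_b: int) -> bool:
    leader, trailer = max(score_a, score_b), min(score_a, score_b)
    if trailer < 12:
        return leader >= 13          # regulation win at 13 rounds
    return leader - trailer >= 2     # overtime: win by 2

print(match_decided(13, 7))    # True: regulation win
print(match_decided(12, 12))   # False: tied, overtime begins
print(match_decided(14, 12))   # True: overtime win by 2
```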
So, how many blood-pumping overtimes did we have?¶
Overtime is not as common as you might think. First, let's look at how many tournaments and matches we had.
def numOfTournaments(df: DataFrame):
    return len(df['Tournament'].unique())

tour = pd.Series([
    numOfTournaments(vct2022.Matches.WinLossRounds),
    numOfTournaments(vct2023.Matches.WinLossRounds),
    numOfTournaments(vct2024.Matches.WinLossRounds),
])
tpy = pd.DataFrame({'Years': pd.Series([2022, 2023, 2024]), 'Tournaments': tour})
tpy.plot(x='Years', y='Tournaments', kind='bar')
<Axes: xlabel='Years'>
20 in 2022, 10 in 2023, 15 in 2024¶
As we can see, it's a fluctuating number. Over the course of the league's history there have been format changes, as it has only existed for a few years, and it continues to improve and evolve. This is why we will mainly focus on 2023 and 2024: these two years formalized the league's format, which makes the data more consistent to look at. 2022 was the league's first year, and in 2023 it transitioned to a partner-based system where teams were selected by Riot Games and invited to play within the league.
Which Years had the most Overtime Matches Played?¶
As mentioned before, when a match reaches round 25 it is in overtime. In VALORANT, if the teams keep trading rounds, overtime continues until there is a victor. We will take a look at how many matches went into overtime and how many overtime rounds were played.
# First, let's work out the pattern for one year, then replicate it for the others.
def getFirstOverTime(df: DataFrame):
    cp = df.copy()
    # First, keep only round 25 -- the first round of overtime.
    cp = cp.loc[cp['Round Number'] == 25]
    # Second, keep only the wins; this also filters out the duplicate row per round.
    cp = cp[cp['Outcome'] == "Win"]
    return cp
firstOt2022 = getFirstOverTime(vct2022.Matches.WinLossRounds)
# Round 25 marks the first round of overtime, so counting these rows gives the number of unique overtimes.
# Now we can showcase how many first overtimes there were throughout each year.
tpy = pd.DataFrame({
    'Years': pd.Series([2022, 2023, 2024]),
    'Overtimes': pd.Series([
        len(firstOt2022),
        len(getFirstOverTime(vct2023.Matches.WinLossRounds)),
        len(getFirstOverTime(vct2024.Matches.WinLossRounds)),
    ]),
})
tpy.plot(x='Years', y='Overtimes', kind='bar', title="Overtime Matches")
<Axes: title={'center': 'Overtime Matches'}, xlabel='Years'>
So what are we looking at here¶
Viewing the number of overtime matches, 2022 had the most, followed by 2024. While 2022 is the highest, we can't consider it on par with the other years: it had more tournaments and more matches, which allowed more overtimes to take place. Aside from 2022, 2024 produced more overtime matches than 2023, coming in at 112 compared to 103.
What we should take a look at is the ratio of overtime per match to get a more accurate representation of where overtime felt the most overwhelming.
# Let's get the number of matches we had.
def getNumberOfUniqueMatches(df: DataFrame):
    cp = df.copy()
    # First, keep only round 1; every map played has a round 1, so this gives us all the matches.
    cp = cp.loc[cp['Round Number'] == 1]
    # Second, keep only the wins; this also filters out the duplicate row per round.
    cp = cp[cp['Outcome'] == "Win"]
    return cp
tpy['Matches'] = [
    len(getNumberOfUniqueMatches(vct2022.Matches.WinLossRounds)),
    len(getNumberOfUniqueMatches(vct2023.Matches.WinLossRounds)),
    len(getNumberOfUniqueMatches(vct2024.Matches.WinLossRounds)),
]
tpy.plot(x="Years", y=["Overtimes", "Matches"], kind="bar", title="Overtimes Compared to Total Matches")
<Axes: title={'center': 'Overtimes Compared to Total Matches'}, xlabel='Years'>
def determinePercentageOf(a, b):
    # Returns the raw fraction; formatting to a percentage happens at display time.
    return a / b

tpy["PoOT"] = determinePercentageOf(tpy['Overtimes'], tpy['Matches'])
tpy['PoOT'] = tpy["PoOT"].map('{:.2%}'.format)
print(tpy)
   Years  Overtimes  Matches    PoOT
0   2022        135     1230  10.98%
1   2023        103      830  12.41%
2   2024        112     1104  10.14%
We can see that overtime is not common.¶
In 2022 we had 1230 matches, of which only 135 went to overtime, meaning almost an 11% chance that any given map reached overtime. In 2023 there were 830 matches with only 103 of them resulting in overtime being played. Last but not least, 2024 had 1104 matches played and 112 overtimes.
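As a quick sanity check, the rates quoted above can be recomputed directly from the overtime and match counts in the table:

```python
# Recompute the overtime rate per year from the counts reported above.
counts = {2022: (135, 1230), 2023: (103, 830), 2024: (112, 1104)}
for year, (overtimes, matches) in counts.items():
    print(f"{year}: {overtimes / matches:.2%}")
```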
So which team showed up the most during Overtime?¶
Now that we have counted the unique overtimes, who came out on top? To answer that, we need to look at the teams. First, let's see how many overtimes each team participated in.
# First we want all of the overtime rounds that are wins. Similar to before, but now we keep every overtime round.
def getAllOverTimeRounds(df: DataFrame):
    cp = df.copy()
    # First, keep only overtime rounds (round 25 onwards).
    cp = cp.loc[cp['Round Number'] >= 25]
    # Second, keep only the wins; this also filters out the duplicate row per round.
    cp = cp[cp['Outcome'] == "Win"]
    # Keep only the first round of each overtime pair; flip the parity to keep the second round instead if wanted.
    cp = cp.loc[cp['Round Number'] % 2 == 1]
    cp = cp['Team'].value_counts().head(12)
    return cp
cOT = pd.concat([vct2023.Matches.WinLossRounds, vct2024.Matches.WinLossRounds], ignore_index=True)
cOT = getAllOverTimeRounds(cOT)
OT_2023_Rounds = getAllOverTimeRounds(vct2023.Matches.WinLossRounds)
OT_2024_Rounds = getAllOverTimeRounds(vct2024.Matches.WinLossRounds)

fig, axes = plt.subplots(1, 3, figsize=(25, 8))
allDataSetsforOT = [
    (cOT, "VCT History Number of Overtime Showings"),
    (OT_2023_Rounds, "VCT 2023 Number of Overtime Showings"),
    (OT_2024_Rounds, "VCT 2024 Number of Overtime Showings"),
]
for i, (dataplot, title) in enumerate(allDataSetsforOT):
    dataplot.plot(kind="bar", title=title, ax=axes[i])
plt.tight_layout()
As we can see, "LEVIATAN" and "Team Heretics" are historically the teams with the most series maps ending in overtime.
We have looked at how many overtimes each team appeared in, but who came out on top of them? For that, we turn to the map scores.
Which team has the most overtime wins?¶
# To get the teams that won the most overtimes, we look at the overview of all the matches.
# First we filter for just the matches that contain overtime. Looking at the data this is easy:
# the overtime score columns are null when no overtime happened.
vct2024.Matches.MapScores.isnull().sum()

def determineOvertimeWins(df: DataFrame):
    # .copy() avoids pandas' SettingWithCopyWarning when adding the Winner column below.
    df = df.dropna(subset=['Team A Overtime Score', 'Team B Overtime Score']).copy()
    df['Winner'] = np.where(
        df['Team A Overtime Score'] > df['Team B Overtime Score'],
        df['Team A'],
        df['Team B'],
    )
    df = df['Winner'].value_counts().head(12)
    return df
cMS = pd.concat([vct2023.Matches.MapScores, vct2024.Matches.MapScores], ignore_index=True)
cMS = determineOvertimeWins(cMS)
ms23 = determineOvertimeWins(vct2023.Matches.MapScores)
ms24 = determineOvertimeWins(vct2024.Matches.MapScores)

fig, axes = plt.subplots(1, 3, figsize=(25, 8))
allDataSetsforDF = [
    (cMS, "VCT History Overtime Map Series Wins"),
    (ms23, "VCT 2023 Overtime Map Series Wins"),
    (ms24, "VCT 2024 Overtime Map Series Wins"),
]
for i, (dataplot, title) in enumerate(allDataSetsforDF):
    dataplot.plot(kind="bar", title=title, ax=axes[i])
plt.tight_layout()
What map has stayed consistently banned throughout the history of the league?¶
Maps within VALORANT are the areas of play within the game. Each map has a unique layout and a unique mechanic that sets it apart. For example, Haven was introduced at the game's launch with 3 spike sites instead of the usual 2 found in traditional tactical FPS shooters, Icebox was the first map to contain horizontal ziplines, and Bind introduced teleporters that let you rotate across the map quickly. No two maps are alike; each offers unique gameplay that teams and players must adapt to, choosing the right composition to succeed.
Throughout the VCT league there have been different map pools. A map pool is the group of maps available for selection in competitive matches. There is no set pattern to which maps get rotated out, and as the pool has changed, different maps have been highly favored or looked down upon.
Due to this, I want to look at which maps have been consistently banned. Banning a map removes the opponent's option to select it during the Pick/Ban stage. In a best of 3, 4 maps are banned and 3 are selected; in a best of 5, only 2 maps are banned. Teams tend to favor maps they perform better on, which their opponents may then choose to ban.
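As an illustration of how a best-of-3 draft ends up with 4 bans and 3 maps played, here is a tiny sketch. The map names are just examples, and the exact alternating veto sequence is an assumption for illustration, not taken from the dataset:

```python
# Hypothetical Bo3 veto over a 7-map pool.
# Assumed sequence: ban, ban, pick, pick, ban, ban, with the last map as the decider.
pool = ["Ascent", "Bind", "Haven", "Split", "Lotus", "Sunset", "Icebox"]
actions = ["ban", "ban", "pick", "pick", "ban", "ban"]

bans, picks = [], []
for action, map_name in zip(actions, pool):
    (bans if action == "ban" else picks).append(map_name)

# Whatever map is left over becomes the decider.
decider = [m for m in pool if m not in bans + picks][0]
print(bans)               # 4 banned maps
print(picks + [decider])  # 3 maps played
```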
def DetermineMapPickRate(df: DataFrame):
    test = df.copy()
    # Grouping by Map and counting Actions, then unstacking, sorts 'ban' and 'pick' into separate columns.
    test = test.groupby('Map')['Action'].value_counts().unstack(fill_value=0)
    # A map's overall draft mentions; some maps barely get picked at all.
    test["Overall"] = test["ban"] + test["pick"]
    test["Ban Rate"] = determinePercentageOf(test['ban'], test['Overall'])
    test['Ban Rate Label'] = test["Ban Rate"].map('{:.2%}'.format)
    test = test.rename(columns={'ban': 'Ban Count', 'pick': 'Pick Count'})
    test = test.reset_index()  # Turns the Map index back into a regular column. Stuck here for a while!
    test = test.sort_values(by="Ban Rate", ascending=False)
    return test
# We want the ban rate: the share of a map's overall draft mentions that were bans.
df_23 = vct2023.Matches.DraftPhase
df_24 = vct2024.Matches.DraftPhase
combination = pd.concat([df_23, df_24], ignore_index=True)
allTime = DetermineMapPickRate(combination)
df23 = DetermineMapPickRate(df_23)
df24 = DetermineMapPickRate(df_24)
fig, axes = plt.subplots(2, 3, figsize=(22, 11))
allDataSetsforDF = [
    (allTime, "VCT History Pick/Ban Draft Rates"),
    (df23, "VCT 2023 Pick/Ban Draft Rates"),
    (df24, "VCT 2024 Pick/Ban Draft Rates"),
]
for i, (dataplot, title) in enumerate(allDataSetsforDF):
    ax = axes[0, i]  # First row: raw counts
    dataplot.plot(x='Map', y=["Ban Count", "Pick Count"], kind="bar", title=title, ax=ax)
for i, (dpa, title) in enumerate(allDataSetsforDF):
    ax = axes[1, i]  # Second row: ban rates
    dpa.plot(x='Map', y='Ban Rate', kind="bar", title=f"{title} - Ban Rate", ax=ax)
    for j, bar in enumerate(ax.patches):
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width() / 2, height + 0.01,
                f"{dpa['Ban Rate Label'].iloc[j]}", ha='center', fontsize=11, color='black')
plt.tight_layout()
Which Player is the best within the last 2 years?¶
Players come and go throughout the VCT league, and teams are always looking for upcoming talent to improve their roster. With that comes a plethora of different players throughout the league's history, and I wanted to see which player has ranked the best over the years.
To do this we need a collective score of how a player performs. Within VALORANT, many factors could influence this, including kills, deaths, assists, first kills, ability usage, and more. Our dataset is limited to a few of these, so we will create an imaginary score that mixes different metrics. This is a common approach on esports statistics websites, given how many factors can influence a match, and the dataset even includes one such score from the site "VLR". We will not use that score, to try something different and have a more generic rating across the board; however, the "Average Combat Score" value, which already folds several of these factors into one number, will be part of our calculation, with different weights applied to each metric.
def determineScore(df: DataFrame):
    cs = 0
    # KDA carries a 40% weight.
    cs += ((df['Kills'] + df['Assists']) / df['Deaths']) * .4
    # First Kills / First Deaths are bonus points in this scenario, weighted at 20%.
    if df['First Deaths'] != 0:
        cs += (df['First Kills'] / df['First Deaths']) * .2
    else:
        cs += df['First Kills']
    cs += convertPercent(df['Headshot %']) * .15  # Headshot percentage: 15%
    cs += df['Average Combat Score'] * .25  # Combat Score: 25%; it blends many aspects, so it serves as a baseline.
    cs += convertPercent(df['Clutch Success %']) * .20  # Clutch success: a 20% bonus on top.
    return cs

def cleanPlayerStats(df: DataFrame):
    # Instead of removing rows, just select the columns we want; .copy() avoids SettingWithCopyWarning below.
    df = df[['Player', 'Agents', 'Average Combat Score', 'Headshot %', 'Clutch Success %',
             'Kills', 'Deaths', 'Assists', 'First Kills', 'First Deaths']].copy()
    # Remove rows that cover more than one agent; those are easy to spot by the comma in the Agents value.
    df = df[~df['Agents'].str.contains(", ")]
    df["PlayCount"] = df.groupby('Player')['Player'].transform('size')
    df['Headshot %'] = df['Headshot %'].fillna("0%")
    df['Clutch Success %'] = df['Clutch Success %'].fillna("0%")
    df['Average Combat Score'] = df['Average Combat Score'].fillna(0)
    return df

def convertPercent(pString: str):
    if type(pString) == float:  # Missing values arrive as NaN floats, hence the type check.
        return 0
    return float(pString.replace('%', ''))
def getPlayerScores(df: DataFrame, minimumMatches=20):
    df = cleanPlayerStats(df)
    df['PScore'] = df.apply(determineScore, axis=1)
    # Require a minimum number of matches played to get rid of small-sample outliers.
    df = df[df['PlayCount'] >= minimumMatches]
    # Normalize the score by the number of entries a player has.
    df["NPS"] = df['PScore'] / df["PlayCount"]
    df = df.groupby('Player')['NPS'].sum().reset_index()
    df = df.sort_values(by="NPS", ascending=False).head(12)  # Show only the top 12 players on the graph.
    df['NPS Label'] = df["NPS"].map(lambda value: f'NPS: {value:.2f}')
    return df
cPS = pd.concat([vct2023.Player.PlayerStats, vct2024.Player.PlayerStats], ignore_index=True)
cPS = getPlayerScores(cPS, 40)
ps23 = getPlayerScores(vct2023.Player.PlayerStats)
ps24 = getPlayerScores(vct2024.Player.PlayerStats)

fig, axes = plt.subplots(1, 3, figsize=(25, 8))
allDataSetsforDF = [
    (cPS, "VCT History Top Performing Player on Average"),
    (ps23, "VCT 2023 Top Performing Player on Average"),
    (ps24, "VCT 2024 Top Performing Player on Average"),
]
for i, (dataplot, title) in enumerate(allDataSetsforDF):
    ax = axes[i]
    dataplot.plot(x='Player', y="NPS", kind="bar", title=title, ax=ax)
    for j, bar in enumerate(ax.patches):
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width() / 2, height + 2.01,
                f"{dataplot['NPS Label'].iloc[j]}", ha='center', fontsize=8, color='black')
plt.tight_layout()
Impact¶
So you might be wondering what impact this data could have. Analyzing the dataset provides insight into how the game has evolved over time: with four years of data, and VALORANT being a "live-service" game, many different influences have played a role, even though we only look at the last two years here. Players can use this data to study the league's history and how things have changed, and others can use it to place teams within the league statistically. A possible harm is that the data could expose certain strategies or patterns, which could lead to changes within the league.
Looking at which teams reach overtime, and how, can help identify matches likely to be more difficult than others. Map trends reveal team preferences and strategies. Player statistics can surface players worth scouting depending on their contracts.
Many people within the esports space work as analysts who sit down and determine optimized team compositions to play against their opponents. They can also anticipate which maps an enemy team is likely to pick or ban. We have only scratched the surface of what this could all lead to.