# **Set Analyzer**

The goal of this script is to statistically analyze the card sets of the game *Magic: The Gathering* in order to:


1.   Identify the most relevant cards in each set
2.   Compare card metadata across sets

The aim of the analysis is to quantify the intrinsic value of a card both *intra*-set and *inter*-set.

The intended application is to use this data to perform better in limited formats (*sealed, draft*).




***draft***

some useful metadata

**intraset**
- number of cards + ratios
- card type: color, creature / non-creature, rarity
- cost: mana value
- board: power, toughness, evasion (based on keywords: flying, trample, menace, ...)
- interaction: (based on keywords: counter, return, deal damage, ...)
- mana fixing: type of fixing (based on keywords: dork, rock, treasures, ...)

**interset**
- place each of the metrics above in the context of previous sets
- evaluate: set speed, creature fragility, prevalence of bombs, prevalence of interaction, fixing potential, etc.





# **Initialisation**

In [137]:
# Import libraries to use in the code

import os
import json
import re
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# import seaborn as sns
from tqdm import tqdm

# Import widget libraries
# import ipywidgets as widgets
from IPython.display import display

Load data from JSON (dataset from https://mtgjson.com/)

In [2]:
# Set path of the folder containing dataset
dataset_FolderPath = Path.cwd().parent / 'datasets' / 'MTG_datasets' # @dev TBC before each use

# Set path of the File
dataset_FileName = 'AllPrintings.json'
dataset_FilePath = dataset_FolderPath / dataset_FileName

In [5]:
# Load the whole dataset
data = pd.read_json(dataset_FilePath)
allSets = data.iloc[2:]['data'] # the first 2 rows of the JSON file are metadata
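
The positional slice `iloc[2:]` only works as long as the metadata rows come first. A minimal alternative sketch, assuming the file has top-level `meta` and `data` keys (which is what the comment above suggests), reads the JSON with the standard `json` module and indexes the sets by code. The function name `loadAllSets` is hypothetical.

```python
import json
from pathlib import Path

import pandas as pd


def loadAllSets(file_path):
    """Return a Series of set dictionaries indexed by set code.

    Assumption: the JSON file has a top-level 'data' key that maps
    set codes to set dictionaries.
    """
    with Path(file_path).open(encoding='utf-8') as f:
        raw = json.load(f)
    # Only the 'data' block is kept, so no metadata rows need to be skipped
    return pd.Series(raw['data'], name='data')
```

With this variant, `allSets = loadAllSets(dataset_FilePath)` could stand in for the two lines above, provided the file layout matches the assumption.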

In [21]:
setCompare = allSets.apply(pd.Series)[['baseSetSize', 'code', 'totalSetSize', 'type', 'name', 'releaseDate']]

## Load Limited dataset function

In [23]:
def loadLimitedSet(set_code):
    """
    Loads and processes a subset of Magic: The Gathering cards from a given set code.
    
    Parameters:
    set_code (str): The unique identifier of the card set to be loaded.
    
    Returns:
    pd.DataFrame: A DataFrame containing the cleaned and filtered card data.
    
    Functionality:
    1. Defines relevant card features to be analyzed, including name, keywords, mana cost, color identity, 
       power, toughness, rarity, types, and text.
    2. Loads the card data from the `allSets` dataset based on the given `set_code`.
    3. Converts the extracted data into a Pandas DataFrame and selects only the first `baseSetSize` cards.
    4. Cleans numeric fields ('manaValue', 'power', 'toughness') by converting them to numeric values, 
       coercing invalid entries to NaN.
    5. Filters the dataset to include only cards with rarities 'common' or 'uncommon'.
    6. Returns the processed DataFrame for further analysis.
    """
    
    # Define cards features to be analyzed
    features_analyzed = [
        'name',
        'keywords',
        'manaValue',
        'manaCost',
        'colorIdentity',
        'power',
        'toughness',
        'rarity',
        'types',
        'text']

    # Load cards
    df = pd.DataFrame.from_dict(allSets.loc[set_code]['cards'])
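    # .copy() makes `cards` independent of `df`, so the assignments below do not trigger SettingWithCopyWarning
    cards = df.loc[:allSets.loc[set_code]['baseSetSize']-1, features_analyzed].copy()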
    
    # Clean numeric data
    cards[['manaValue', 'power', 'toughness']] = cards[['manaValue', 'power', 'toughness']].apply(pd.to_numeric, errors='coerce')
    
    # Keep only common and uncommon cards
    cards = cards[cards['rarity'].isin(['common', 'uncommon'])]
    
    return cards

In [298]:
set_code = 'OTJ'
cards = loadLimitedSet(set_code)
cards

| | name | keywords | manaValue | manaCost | colorIdentity | power | toughness | rarity | types | text |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Armored Armadillo | [Ward] | 1.0 | {W} | [W] | 0.0 | 4.0 | common | [Creature] | Ward {1} (Whenever this creature becomes the t... |
| 4 | Bounding Felidar | [Saddle] | 6.0 | {5}{W} | [W] | 4.0 | 7.0 | uncommon | [Creature] | Whenever Bounding Felidar attacks while saddle... |
| 5 | Bovine Intervention | | 2.0 | {1}{W} | [W] | | | uncommon | [Instant] | Destroy target artifact or creature. Its contr... |
| 6 | Bridled Bighorn | [Saddle, Vigilance] | 4.0 | {3}{W} | [W] | 3.0 | 4.0 | common | [Creature] | Vigilance\nWhenever Bridled Bighorn attacks wh... |
| 9 | Eriette's Lullaby | | 2.0 | {1}{W} | [W] | | | common | [Sorcery] | Destroy target tapped creature. You gain 2 life. |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 281 | Swamp | | 0.0 | | [B] | | | common | [Land] | ({T}: Add {B}.) |
| 282 | Mountain | | 0.0 | | [R] | | | common | [Land] | ({T}: Add {R}.) |
| 283 | Mountain | | 0.0 | | [R] | | | common | [Land] | ({T}: Add {R}.) |
| 284 | Forest | | 0.0 | | [G] | | | common | [Land] | ({T}: Add {G}.) |


## 1) SPEED

Format speed can be characterized by:
- the ratio of creatures, graded as follows: 100-90% == S ; 90-70% == A ; 70-50% == B ; 50-30% == C ; 30-10% == D ; 10-0% == E
- the median creature `manaValue`
- the median `powerToManaValue`: above 1, creatures hit hard and fast
- the board state (see section 2)
- the number of interactions (see section 4)
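
The grade scale above can be sketched as a small lookup. This is only one possible reading: the list does not say which grade a boundary value such as 90% gets, so the sketch assumes each lower bound is inclusive. The function name `gradeCreatureRatio` is hypothetical.

```python
def gradeCreatureRatio(ratio_percent):
    """Map a creature ratio (in percent) to a letter grade from S to E."""
    if not 0 <= ratio_percent <= 100:
        raise ValueError('ratio_percent must be between 0 and 100')
    # (lower bound, grade) pairs, checked from the highest bracket down.
    # Assumption: the lower bound of each bracket is inclusive.
    brackets = [(90, 'S'), (70, 'A'), (50, 'B'), (30, 'C'), (10, 'D'), (0, 'E')]
    for lower_bound, grade in brackets:
        if ratio_percent >= lower_bound:
            return grade
```

It could be called on `limitedCreatureRatio` once that value is computed, and the result stored next to it in `setCompare`.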

In [269]:
# Filter for 'Creature' only
cardsCreatureFiltered = cards[cards['types'].apply(lambda x: 'Creature' in x)]
cardsCreatureFiltered = cardsCreatureFiltered.copy()

# Ratio of creatures
nTot = len(cards)
nCreature = len(cardsCreatureFiltered)
limitedCreatureRatio = (nCreature / nTot) * 100 # in percentage
# add a grade (@dev TBD)

# Creature Manavalue
medianCreatureMV = cardsCreatureFiltered['manaValue'].median()
cardsCreatureFiltered['normalizedCreatureManaValue'] = cardsCreatureFiltered['manaValue'] - medianCreatureMV # normalized columns

# Creature Power to ManaValue
cardsCreatureFiltered['powerToManaValue'] = cardsCreatureFiltered['power'] / cardsCreatureFiltered['manaValue']
medianPowerToMV = cardsCreatureFiltered['powerToManaValue'].median()
cardsCreatureFiltered['normalizedPowerToManaValue'] = cardsCreatureFiltered['powerToManaValue'] - medianPowerToMV # normalized columns
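
A creature with a `manaValue` of 0 makes the division above return `inf` (or `NaN` when its power is also 0). A minimal sketch of a guarded ratio, assuming such cards should simply be left out of the median; the helper name `powerToManaValueRatio` is hypothetical.

```python
import numpy as np
import pandas as pd


def powerToManaValueRatio(creatures):
    """Return power / manaValue, with NaN wherever manaValue is 0 or missing."""
    # Replacing 0 by NaN turns the division result into NaN instead of inf
    mana_value = creatures['manaValue'].replace(0, np.nan)
    return creatures['power'] / mana_value
```

Since `median()` skips `NaN` by default, zero-cost creatures then no longer influence `medianPowerToMV`.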

In [271]:
# Add values to setCompare
setCompare = setCompare.copy()

setCompare.at[set_code, 'limited_CreatureRatio'] = limitedCreatureRatio
setCompare.at[set_code, 'limited_medianCreatureManaValue'] = medianCreatureMV
setCompare.at[set_code, 'limited_medianCreaturePowerToManaValue'] = medianPowerToMV

## 2) BOARD STATE

- the median creature `power`
- the median creature `toughness`
- the median `powerToToughness` ratio: above 1, creatures tend to hit hard and defend poorly (and vice versa)
- ratio of evasive creatures (e.g. 'Flying', 'Trample', 'Menace')

In [274]:
# Creature Power
medianCreaturePower = cardsCreatureFiltered['power'].median()
cardsCreatureFiltered['normalizedCreaturePower'] = cardsCreatureFiltered['power'] - medianCreaturePower # normalized columns

# Creature Toughness
medianCreatureToughness = cardsCreatureFiltered['toughness'].median()
cardsCreatureFiltered['normalizedCreatureToughness'] = cardsCreatureFiltered['toughness'] - medianCreatureToughness # normalized columns

# Creature Power to Toughness
cardsCreatureFiltered['powerToToughness'] = cardsCreatureFiltered['power'] / cardsCreatureFiltered['toughness']
medianPowerToToughness = cardsCreatureFiltered['powerToToughness'].median()
cardsCreatureFiltered['normalizedPowerToToughness'] = cardsCreatureFiltered['powerToToughness'] - medianPowerToToughness # normalized columns

# Evasion
def countKeywords(data):
    # Count how many cards carry each keyword; value_counts() skips cards without keywords (NaN)
    return data['keywords'].explode().value_counts().to_dict()

def countEvasiveKeywords(keyword_dict, keyword_list):
    # Use 0 for keywords that do not appear in the set instead of raising a KeyError
    evasiveCount = [keyword_dict.get(key, 0) for key in keyword_list]
    return evasiveCount

KWCount = countKeywords(cardsCreatureFiltered)

# evasiveKW = ['Flying', 'Trample', 'Menace']
# evasiveCount = countEvasiveKeywords(KWCount, evasiveKW)
# print(evasiveCount)
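
Keyword counts alone do not give the ratio of evasive creatures, because a creature with both 'Flying' and 'Trample' would be counted twice. A minimal sketch that counts each creature once, assuming the `keywords` column holds a list of strings or a missing value; the names `hasEvasion` and `evasiveCreatureRatio` are hypothetical.

```python
import pandas as pd

EVASIVE_KEYWORDS = ('Flying', 'Trample', 'Menace')


def hasEvasion(keywords, evasive=EVASIVE_KEYWORDS):
    """Return True if the keyword list contains at least one evasive keyword."""
    if not isinstance(keywords, list):  # NaN / None for cards without keywords
        return False
    return any(keyword in evasive for keyword in keywords)


def evasiveCreatureRatio(creatures, evasive=EVASIVE_KEYWORDS):
    """Return the share (0 to 1) of creatures with at least one evasive keyword."""
    if len(creatures) == 0:
        return 0.0
    return creatures['keywords'].apply(hasEvasion, evasive=evasive).mean()
```

The result for `cardsCreatureFiltered` could be stored in `setCompare` alongside the other board-state values.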

In [276]:
# Add values to setCompare
setCompare = setCompare.copy()
setCompare.at[set_code, 'limited_medianCreaturePower'] = medianCreaturePower
setCompare.at[set_code, 'limited_medianCreatureToughness'] = medianCreatureToughness
setCompare.at[set_code, 'limited_medianCreaturePowerToToughness'] = medianPowerToToughness
setCompare.at[set_code, 'limited_KWCount'] = [KWCount]

## 3) FIXING

- monocolor-to-multicolor ratio (lands excluded), computed here as the share of multicolor cards among non-land cards
- multi-pip ratio: cards with more than one colored pip in their mana cost
- ratio of mana producers: lands, mana rocks, mana dorks, treasures
- type of mana produced
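
For the multi-pip metric, one way to count colored pips is to look at each `{...}` symbol of the mana cost separately, so that a hybrid symbol such as `{W/U}` counts as a single pip. The sketch below is an assumption about how pips should be counted, not the notebook's current rule; the name `countColoredPips` is hypothetical.

```python
import re

COLOR_LETTERS = set('WUBRG')


def countColoredPips(mana_cost):
    """Count the mana symbols in a cost string that contain at least one color letter."""
    if not isinstance(mana_cost, str):  # NaN for lands and other cards without a cost
        return 0
    # Each {...} group is one mana symbol, e.g. '{1}', '{W}' or '{W/U}'
    symbols = re.findall(r'\{([^}]*)\}', mana_cost)
    return sum(1 for symbol in symbols if COLOR_LETTERS & set(symbol))
```

A card would then be multi-pip when `countColoredPips(manaCost) > 1`.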

In [292]:
# Monocolor to multicolor ratio (computed as multicolor non-land cards / all non-land cards)
non_land_cards_total = len(cards[(cards['types'].apply(lambda x: 'Land' not in x))])
multicolor_nonland_cards = len(
    cards[
        (cards['types'].apply(lambda x: 'Land' not in x)) 
        & (cards['colorIdentity'].apply(len) > 1)
    ])
monocolorToMulticolorRatio = multicolor_nonland_cards / non_land_cards_total

# Multi-pip ratio
def isMultiPip(s, letters_to_remove=None):
    if letters_to_remove is None:
        letters_to_remove = ['{', '}', 'C', 'X']  # Assign default list safely

    if not isinstance(s, str):  # Handle NaN or non-string values
        return False
    
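    # Strip braces, colorless/X symbols and digits; one letter then remains per colored pip.
    # Caveat: a hybrid symbol such as {W/U} keeps three characters, so it is counted as multi-pip.
    s = ''.join(c for c in s if c not in letters_to_remove and not c.isdigit())
    return len(s) > 1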

multiPipRatio = len(cards[cards['manaCost'].apply(isMultiPip)]) / non_land_cards_total

# Mana producers
# lands + manarocks + manadorks = text contains 'Add {'
# treasures = text contains 'treasure.s token'
# fetch / tutor = text contains 'search ... land'

# Type of mana produced
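
The text rules listed in the comments above can be sketched with regular expressions. The patterns are a direct and deliberately naive transcription of those comments; for instance, `Add {` misses cards whose text reads "Add one mana of any color". The names `MANA_PRODUCER_PATTERNS` and `classifyManaProducer` are hypothetical.

```python
import re

# One pattern per rule from the notebook comments (assumed wording of the rules text)
MANA_PRODUCER_PATTERNS = {
    'adds_mana': re.compile(r'Add \{'),
    'treasure': re.compile(r'treasure tokens?', re.IGNORECASE),
    'land_search': re.compile(r'search\b.*\bland', re.IGNORECASE),
}


def classifyManaProducer(text):
    """Return the set of mana-producer labels whose pattern matches the rules text."""
    if not isinstance(text, str):  # NaN for cards without rules text
        return set()
    return {label for label, pattern in MANA_PRODUCER_PATTERNS.items()
            if pattern.search(text)}
```

Applying it with `cards['text'].apply(classifyManaProducer)` would give one label set per card, from which the ratio of mana producers could be derived.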

In [294]:
setCompare.at[set_code, 'limited_MonoToMulticolorRatio'] = monocolorToMulticolorRatio
setCompare.at[set_code, 'limited_MultiPipRatio'] = multiPipRatio

## 4) Interactions (TBD)

How interactive is the set?
- step 0: define what counts as an interaction
- ratio of permanents
- percentage of interaction
- the "speed" of interaction = mana value distribution of interactive spells
- type of interaction: single-target removal + combat trick
- color pie

In [None]:
"""
# ratio of permanents

def checklist(items_wanted, items_tbc):
  return any(item in items_wanted for item in items_tbc)

# NOTE: type_index is not defined in this notebook; it is assumed to list every card type present in the set
permanent_index = [item for item in type_index if item not in ('Instant', 'Sorcery')]
is_permanent = cards['types'].apply(lambda x: checklist(x, permanent_index))
cards['types'][~is_permanent]  # non-permanents
cards['types'][is_permanent]   # permanents

print('permanent ratio = ' + str(is_permanent.mean() * 100) + ' %')
"""

In [None]:
"""
# get interactive cards

interaction_list = [
    'destroy',
    'exile',
    'counter',
    'target'
]

def interactive_card(text):
  # Case-insensitive match; cards without rules text (NaN) are treated as non-interactive
  return isinstance(text, str) and any(word in text.lower() for word in interaction_list)

cards[cards['text'].apply(interactive_card)]
"""

# **Interset**

In [None]:
# all interset stats

# take as input the number and time span of the sets to compare (all of Modern, last 4 sets, etc.)
# write an input line

# call the previous functions
# store the results in lists / DataFrames to compute the statistics afterwards
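
The plan above could be sketched as two small helpers: one that picks the sets to compare, one that applies a collection of metric functions to each set. This assumes the set table has `code`, `type` and `releaseDate` columns like `setCompare`, and that release dates are ISO strings (so sorting them as text follows chronological order). The names `selectRecentSets` and `compareSets` are hypothetical.

```python
import pandas as pd


def selectRecentSets(set_table, n, set_types=('expansion',)):
    """Return the codes of the n most recently released sets of the given types."""
    subset = set_table[set_table['type'].isin(set_types)]
    return subset.sort_values('releaseDate').tail(n)['code'].tolist()


def compareSets(set_codes, load_cards, metrics):
    """Apply each metric function to each set and gather the results in one DataFrame.

    load_cards: function taking a set code and returning a cards DataFrame
    metrics: dict mapping a column name to a function of the cards DataFrame
    """
    rows = {}
    for code in set_codes:
        cards = load_cards(code)
        rows[code] = {name: metric(cards) for name, metric in metrics.items()}
    return pd.DataFrame.from_dict(rows, orient='index')
```

In this notebook, `load_cards` would be `loadLimitedSet`, and each statistic from the previous sections would be wrapped in a function of `cards`.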

In [296]:
setCompare.loc[['OTJ', 'DSK', 'MH3']]

| | baseSetSize | code | totalSetSize | type | name | releaseDate | limited_CreatureRatio | limited_medianCreatureManaValue | limited_medianCreaturePowerToManaValue | limited_medianCreaturePower | limited_medianCreatureToughness | limited_medianCreaturePowerToToughness | limited_KWCount | limited_MonoToMulticolorRatio | limited_MultiPipRatio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OTJ | 286 | OTJ | 374 | expansion | Outlaws of Thunder Junction | 2024-04-19 | 51.456311 | 3.0 | 1.0 | 2.0 | 2.5 | 1.0 | [{'Plot': 17, 'Flying': 16, 'Saddle': 10, 'Rea... | 0.113636 | 0.181818 |
| DSK | 286 | DSK | 451 | expansion | Duskmourn: House of Horror | 2024-09-27 | 51.269036 | 3.0 | 1.0 | 2.0 | 3.0 | 1.0 | [{'Flying': 13, 'Eerie': 12, 'Survival': 10, '... | 0.11399 | 0.212435 |
| MH3 | 303 | MH3 | 560 | draft_innovation | Modern Horizons 3 | 2024-06-14 | 45.714286 | 3.0 | 0.75 | 2.0 | 2.0 | 1.0 | [{'Devoid': 17, 'Flying': 15, 'Adapt': 9, 'Bes... | 0.179775 | 0.303371 |
