# Introduction

## Schema from Exports

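The raw export stores most columns as `object`, so the first step is to slim the columns down and reduce the DataFrame's memory footprint. The schema of the export, as reported by `info()`: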

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12291 entries, 0 to 12290
Data columns (total 23 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Unnamed: 0              12291 non-null  int64  
 1   index                   12291 non-null  int64  
 2   Cube Name               12291 non-null  object 
 3   Cubecon Type            12291 non-null  object 
 4   cardID                  12291 non-null  object 
 5   addedTmsp               12291 non-null  object 
 6   details_name            12291 non-null  object 
 7   details_full_name       12291 non-null  object 
 8   details_artist          12291 non-null  object 
 9   details_rarity          12291 non-null  object 
 10  details_color_identity  12291 non-null  object 
 11  details_colors          12291 non-null  object 
 12  details_set             12291 non-null  object 
 13  details_released_at     12291 non-null  object 
 14  details_cmc             12291 non-null  int64  
 15  details_parsed_cost     12291 non-null  object 
 16  details_type            12291 non-null  object 
 17  details_elo             12291 non-null  float64
 18  details_popularity      12291 non-null  float64
 19  details_cubeCount       12291 non-null  int64  
 20  details_loyalty         268 non-null    object 
 21  details_power           5286 non-null   object 
 22  details_toughness       5286 non-null   object 
dtypes: float64(2), int64(4), object(17)
memory usage: 2.2+ MB
```
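As a rough illustration of why the dtype changes matter, the sketch below measures memory before and after downcasting. The `toy` frame and its values are made up for this example (only the column names come from the export), so the byte counts will not match the real data.

```python
import pandas as pd

# Hypothetical stand-in for the export; the values are invented for illustration.
toy = pd.DataFrame({
    "Cubecon Type": ["Pre-Accepted", "Poll Winner"] * 500,
    "details_rarity": ["common", "uncommon", "rare", "mythic"] * 250,
    "details_cmc": [1, 3, 2, 4] * 250,
})

# deep=True also counts the Python string objects behind object columns.
before = toy.memory_usage(deep=True).sum()

# Repeated labels become categories; small integers fit in int8.
slim = toy.astype({
    "Cubecon Type": "category",
    "details_rarity": "category",
    "details_cmc": "int8",
})

after = slim.memory_usage(deep=True).sum()
print(f"before: {before} bytes, after: {after} bytes")
```

Category columns store each distinct label once plus a small integer code per row, which is why they pay off for low-cardinality columns such as rarity.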

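Count of cubes per card (the number of distinct cubes each card name appears in):

```python
# Number of distinct cubes per card name, sorted from most to least common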
cube_munity = cubes_main_event.groupby('details_name')['Cube Name'].nunique().sort_values(ascending=False)
cube_munity.to_csv('../Data Files/cards_per_unique_cubes.csv')
```
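To make the `groupby(...).nunique()` step concrete, here is a minimal sketch on made-up rows. The cube names and card list are hypothetical; the point is that a card listed twice in the same cube is still counted once.

```python
import pandas as pd

# Hypothetical rows: "Lightning Bolt" appears twice in Cube A.
toy = pd.DataFrame({
    "Cube Name": ["Cube A", "Cube A", "Cube A", "Cube B", "Cube B", "Cube C"],
    "details_name": [
        "Lightning Bolt", "Lightning Bolt", "Duress",
        "Lightning Bolt", "Duress", "Lightning Bolt",
    ],
})

# nunique() counts distinct cubes, so duplicates within a cube do not inflate the count.
cubes_per_card = (
    toy.groupby("details_name")["Cube Name"]
    .nunique()
    .sort_values(ascending=False)
)

# size() counts rows instead, which would report 4 for Lightning Bolt.
rows_per_card = toy.groupby("details_name").size()

print(cubes_per_card)  # Lightning Bolt -> 3, Duress -> 2
```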

Creating a function to clean up the data:

```python
import pandas as pd

def tweak_cubes(cubes_main_event):
    """Add card-type flags, parse timestamps, and downcast columns to save memory."""
    return (cubes_main_event
     .assign(details_cmc=cubes_main_event.details_cmc.fillna(0).astype('int8'),
             is_creature=cubes_main_event.details_type.str.contains('Creature'),
             is_land=cubes_main_event.details_type.str.contains('Land'),
             is_planeswalker=cubes_main_event.details_type.str.contains('Planeswalker'),
             # Epoch-millisecond timestamps; values that fail to parse become NaT
             added_to_cube_on=pd.to_datetime(cubes_main_event.addedTmsp, errors='coerce', unit='ms'),
             # One identifier per (cube, card) pair
             composite_id=cubes_main_event['Cube Name'] + '-' + cubes_main_event['cardID']
        )
     .astype({'Cubecon Type': 'category', 'details_rarity': 'category'})
     .drop(columns=['Unnamed: 0', 'index'])
    )

cleaned_data = tweak_cubes(cubes_main_event)
```

Checking for missing values in the added timestamp:

```python
# Rows whose addedTmsp could not be parsed as epoch milliseconds
missing = cleaned_data['added_to_cube_on'].isnull()
cleaned_data.loc[missing, ['added_to_cube_on', 'addedTmsp']]

# Re-parse those rows as ordinary date strings, then normalize everything to UTC
cleaned_data.loc[missing, 'added_to_cube_on'] = pd.to_datetime(cleaned_data.loc[missing, 'addedTmsp'])
cleaned_data['added_to_cube_on'] = pd.to_datetime(cleaned_data['added_to_cube_on'], utc=True)

# details_released_at holds plain dates, so parse it as UTC (tz_convert fails on tz-naive values)
released = pd.to_datetime(cleaned_data['details_released_at'], utc=True)
cleaned_data['days_until_added_to_cube'] = (cleaned_data['added_to_cube_on'] - released).dt.days
```
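The two-pass timestamp parsing can be sketched on a tiny frame. This is a variant under stated assumptions, not the notebook's exact code: it assumes `addedTmsp` mixes epoch-millisecond strings with ISO date strings (the second sample value is invented), and it uses `pd.to_numeric` for the first pass so the example does not depend on how `to_datetime` treats numeric strings.

```python
import pandas as pd

# Hypothetical rows: one epoch-millisecond string, one ISO date string.
toy = pd.DataFrame({
    "addedTmsp": ["1572901810806", "2021-03-01T12:00:00Z"],
    "details_released_at": ["2013-09-27", "2019-07-12"],
})

# First pass: treat values as epoch milliseconds; non-numeric strings become NaN, then NaT.
as_ms = pd.to_numeric(toy["addedTmsp"], errors="coerce")
added = pd.to_datetime(as_ms, unit="ms", utc=True)

# Second pass: parse only the rows that are still missing as date strings.
missing = added.isna()
fallback = pd.to_datetime(toy.loc[missing, "addedTmsp"], utc=True)
added = added.fillna(fallback)

# Both sides are timezone-aware UTC, so the subtraction is well defined.
released = pd.to_datetime(toy["details_released_at"], utc=True)
toy["days_until_added_to_cube"] = (added - released).dt.days

print(toy["days_until_added_to_cube"].tolist())  # [2229, 598]
```

Parsing both columns with `utc=True` avoids mixing timezone-naive and timezone-aware values, which pandas refuses to subtract.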




In [408]:
import numpy as np
import pandas as pd

In [409]:
pre_accepted_cubes = pd.read_csv('../Data Files/pre_accepted_cubes_extract.csv')
poll_winner_cubes = pd.read_csv('../Data Files/poll_winner_cubes_extract.csv')

In [410]:
pre_accepted_cubes.info()
poll_winner_cubes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12291 entries, 0 to 12290
Data columns (total 23 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Unnamed: 0              12291 non-null  int64  
 1   index                   12291 non-null  int64  
 2   Cube Name               12291 non-null  object 
 3   Cubecon Type            12291 non-null  object 
 4   cardID                  12291 non-null  object 
 5   addedTmsp               12291 non-null  object 
 6   details_name            12291 non-null  object 
 7   details_full_name       12291 non-null  object 
 8   details_artist          12291 non-null  object 
 9   details_rarity          12291 non-null  object 
 10  details_color_identity  12291 non-null  object 
 11  details_colors          12291 non-null  object 
 12  details_set             12291 non-null  object 
 13  details_released_at     12291 non-null  object 
 14  details_cmc             12291 non-null

In [411]:
cubes_main_event = pd.concat([pre_accepted_cubes, poll_winner_cubes], ignore_index=True)
cubes_main_event.head()

Unnamed: 0.1,Unnamed: 0,index,Cube Name,Cubecon Type,cardID,addedTmsp,details_name,details_full_name,details_artist,details_rarity,...,details_released_at,details_cmc,details_parsed_cost,details_type,details_elo,details_popularity,details_cubeCount,details_loyalty,details_power,details_toughness
0,0,0,Regular Cube,Pre-Accepted,251015ed-9408-4941-894a-158551ed2613,1572901810806,Favored Hoplite,Favored Hoplite [ths-13],Winona Nelson,uncommon,...,2013-09-27,1,['w'],Creature — Human Soldier,1226.2,1.630827,2282,,1,2
1,1,1,Regular Cube,Pre-Accepted,70e3a90c-1e5c-4646-b3d5-ff46d3fa7b35,1572901810808,Trusted Pegasus,Trusted Pegasus [m20-314],Chris Rahn,common,...,2019-07-12,3,"['w', '2']",Creature — Pegasus,1155.3,1.550786,2170,,2,2
2,2,2,Regular Cube,Pre-Accepted,27394079-924a-4fdb-8be2-f853193eca80,1572901810808,Whitemane Lion,Whitemane Lion [a25-39],Zoltan Boros & Gabor Szikszai,common,...,2018-03-16,2,"['w', '1']",Creature — Cat,1200.3,3.741898,5236,,2,2
3,3,3,Regular Cube,Pre-Accepted,b958bcdd-d0ea-4ae0-9dd0-e6de5cf74128,1572901810809,Restoration Angel,Restoration Angel [mm3-20],Johannes Voss,rare,...,2017-03-17,4,"['w', '3']",Creature — Angel,1297.8,18.047724,25254,,3,4
4,4,4,Regular Cube,Pre-Accepted,c47ba1fa-3ace-488b-97e6-d9f3b389c602,1572901810809,Emeria Angel,Emeria Angel [ima-20],Jim Murray,rare,...,2017-11-17,4,"['w', 'w', '2']",Creature — Angel,1253.7,4.351493,6089,,3,3


In [412]:
# Update the date in the filename each time this is run
cubes_main_event.to_csv('../Data Files/cubecon_card_list_2023_06_13.csv')

In [413]:
# Clean the combined card list: add card-type flags, parse timestamps,
# and downcast columns to save memory.

def tweak_cubes(cubes_main_event):
    return (cubes_main_event
     .assign(details_cmc=cubes_main_event.details_cmc.fillna(0).astype('int8'),
             is_creature=cubes_main_event.details_type.str.contains('Creature'),
             is_land = cubes_main_event.details_type.str.contains('Land'),
             is_planeswalker = cubes_main_event.details_type.str.contains('Planeswalker'),
             is_gold = cubes_main_event.details_colors.str.contains(','),
             added_to_cube_on = pd.to_datetime(cubes_main_event.addedTmsp, errors='coerce', unit='ms'),
             composite_id = cubes_main_event['Cube Name'] + '-' + cubes_main_event['cardID']
        )
     .astype(
         {'Cubecon Type': 'category', 
          'details_cmc': 'int8',
          'details_rarity': 'category', 
          'details_released_at':'datetime64[ns, UTC]'}
     )
     .drop(columns=['Unnamed: 0', 'index'])
)

cleaned_data = tweak_cubes(cubes_main_event)

In [414]:
# Re-parse rows that failed epoch-millisecond parsing as date strings, then normalize to UTC
missing = cleaned_data['added_to_cube_on'].isnull()
cleaned_data.loc[missing, 'added_to_cube_on'] = pd.to_datetime(cleaned_data.loc[missing, 'addedTmsp'])
cleaned_data['added_to_cube_on'] = pd.to_datetime(cleaned_data['added_to_cube_on'], utc=True)
released = pd.to_datetime(cleaned_data['details_released_at'], utc=True)
cleaned_data['days_until_added_to_cube'] = (cleaned_data['added_to_cube_on'] - released).dt.days


In [415]:
cleaned_data.info()
##cubes_main_event.to_csv('../Data Files/cubecon_card_list_2023_06_08.csv')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20611 entries, 0 to 20610
Data columns (total 28 columns):
 #   Column                    Non-Null Count  Dtype              
---  ------                    --------------  -----              
 0   Cube Name                 20611 non-null  object             
 1   Cubecon Type              20611 non-null  category           
 2   cardID                    20611 non-null  object             
 3   addedTmsp                 20611 non-null  object             
 4   details_name              20611 non-null  object             
 5   details_full_name         20611 non-null  object             
 6   details_artist            20611 non-null  object             
 7   details_rarity            20611 non-null  category           
 8   details_color_identity    20611 non-null  object             
 9   details_colors            20611 non-null  object             
 10  details_set               20611 non-null  object             
 11  details_release

In [416]:
# Select the columns before .mean(): averaging text columns raises a TypeError in pandas 2.x
avg_cubing = (cleaned_data
                 .groupby('Cube Name')[['details_popularity', 'is_land', 'is_gold']]
                 .mean()
                 .sort_values('details_popularity', ascending=False)
)

avg_cubing

                             details_popularity   is_land   is_gold
Cube Name
Data Generated Vintage Cube           10.975211  0.155556  0.080556
The Museum of Modern                  10.066587  0.235417    0.2125
The Bun Magic Cube                     9.982378  0.216667  0.097222
Dekkaru Cube                           9.228532  0.151852  0.085185
The Modern Darlings Cube               8.802486  0.208333  0.116667
Eleusis                                8.563567  0.146667  0.057778
The Chicago Cube                       8.171992  0.138889  0.108333
Derek’s Cube                           7.907902   0.20625    0.1875
The Creative Cube                      7.741129    0.1825     0.105
Tiny Leaders                           7.224855  0.189189  0.147609
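
The `is_land` and `is_gold` columns in this output are fractions because the mean of a boolean flag is the share of rows where it is `True`. A minimal sketch on made-up data (the cube names and values are hypothetical):

```python
import pandas as pd

# Hypothetical cubes: Cube A has 1 land out of 4 cards, Cube B has 2 out of 2.
toy = pd.DataFrame({
    "Cube Name": ["Cube A"] * 4 + ["Cube B"] * 2,
    "details_popularity": [10.0, 20.0, 30.0, 40.0, 5.0, 15.0],
    "is_land": [True, False, False, False, True, True],
})

# True counts as 1 and False as 0, so the per-cube mean is the land share.
summary = (
    toy.groupby("Cube Name")[["details_popularity", "is_land"]]
    .mean()
    .sort_values("details_popularity", ascending=False)
)

print(summary)  # Cube A: popularity 25.0, is_land 0.25; Cube B: 10.0, 1.0
```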


In [424]:
cleaned_data.groupby('details_name')['Cube Name'].nunique().sort_values(ascending=False).head(20)

details_name
Stomping Ground      31
Temple Garden        31
Overgrown Tomb       30
Blood Crypt          30
Sacred Foundry       30
Watery Grave         29
Hallowed Fountain    29
Godless Shrine       29
Breeding Pool        29
Steam Vents          29
Lightning Bolt       27
Faithless Looting    26
Bloodstained Mire    25
Windswept Heath      25
Wooded Foothills     25
Path to Exile        25
Flooded Strand       24
Polluted Delta       24
Duress               24
Young Pyromancer     24
Name: Cube Name, dtype: int64

In [429]:
peasant = cleaned_data[cleaned_data['details_rarity'].isin(['uncommon', 'common'])]

In [431]:
peasant.groupby('details_name')['Cube Name'].nunique().sort_values(ascending=False)

details_name
Faithless Looting         25
Duress                    22
Abrade                    22
Unearth                   21
Lightning Bolt            21
                          ..
Hotshot Mechanic           1
House Guildmage            1
Howl of the Night Pack     1
Howling Giant              1
Lucid Dreams               1
Name: Cube Name, Length: 3942, dtype: int64