# Data Collection

This Jupyter Notebook collects Magic: The Gathering card data using the [Scryfall API](https://scryfall.com/docs/api).  

The Scryfall API provides [bulk data](https://scryfall.com/docs/api/bulk-data) downloads covering all Magic: The Gathering cards.  
We're interested in the **Default Cards** file, as it contains all Magic: The Gathering cards, including reprints.  
The download URI for the Default Cards JSON file can be retrieved from the bulk data endpoint. 

In [1]:
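import requests

# Request the list of available bulk data files.
r = requests.get('https://api.scryfall.com/bulk-data', timeout=30)
r.raise_for_status()
response_dict = r.json()

# Find the Default Cards entry by its type rather than by its position in the list.
default_cards = next(entry for entry in response_dict['data'] if entry['type'] == 'default_cards')
default_cards_download_uri = default_cards['download_uri']

# Download the (large) JSON file in chunks and save it to disk.
with requests.get(default_cards_download_uri, stream=True, timeout=60) as cards_r:
    cards_r.raise_for_status()
    with open('bulk_data_default_cards.json', 'wb') as f:
        for chunk in cards_r.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)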

# Data Preprocessing

In [2]:
# First, load the card data into a pandas DataFrame.
import pandas

magic_df = pandas.read_json('bulk_data_default_cards.json')

In [3]:
# Take a look at the head of the dataframe to make sure everything went smoothly.
magic_df.head()

| | object | id | oracle_id | multiverse_ids | mtgo_id | mtgo_foil_id | tcgplayer_id | cardmarket_id | name | lang | ... | tcgplayer_etched_id | attraction_lights | color_indicator | life_modifier | hand_modifier | printed_type_line | printed_text | content_warning | flavor_name | variation_of |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | card | 0000579f-7b35-4ed3-b44c-db2a538066fe | 44623693-51d6-49ad-8cd7-140505caf02f | [109722] | 25527.0 | 25528.0 | 14240.0 | 13850.0 | Fury Sliver | en | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | card | 00006596-1166-4a79-8443-ca9f82e6db4e | 8ae3562f-28b7-4462-96ed-be0cf7052ccc | [189637] | 34586.0 | 34587.0 | 33347.0 | 21851.0 | Kor Outfitter | en | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | card | 0000a54c-a511-4925-92dc-01b937f9afad | dc4e2134-f0c2-49aa-9ea3-ebf83af1445c | [] |  |  | 98659.0 |  | Spirit | en | ... |  |  |  |  |  |  |  |  |  |  |
| 3 | card | 0000cd57-91fe-411f-b798-646e965eec37 | 9f0d82ae-38bf-45d8-8cda-982b6ead1d72 | [435231] | 65170.0 | 65171.0 | 145764.0 | 301766.0 | Siren Lookout | en | ... |  |  |  |  |  |  |  |  |  |  |
| 4 | card | 00012bd8-ed68-4978-a22d-f450c8a6e048 | 5aa12aff-db3c-4be5-822b-3afdf536b33e | [1278] |  |  | 1623.0 | 5664.0 | Web | en | ... |  |  |  |  |  |  |  |  |  |  |
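Because the Default Cards file includes reprints, a card printed in several sets shows up once per printing. In Scryfall's card objects, `id` identifies a single printing while `oracle_id` is typically shared by all printings of the same card, so counting distinct `oracle_id` values gives the number of distinct cards. The small frame below is invented for illustration (its ids, names and set codes are placeholders); on the real data the same calls would be made on `magic_df` before `oracle_id` is dropped.

```python
import pandas

# Hypothetical rows: two printings of 'Card A' share an oracle_id but have distinct ids.
printings = pandas.DataFrame({
    'id': ['p1', 'p2', 'p3'],
    'oracle_id': ['o1', 'o1', 'o2'],
    'name': ['Card A', 'Card A', 'Card B'],
    'set': ['aaa', 'bbb', 'aaa'],
})

print(len(printings))                    # 3 printings (rows)
print(printings['oracle_id'].nunique())  # 2 distinct cards

# Keep one row per distinct card (the first printing encountered).
unique_cards = printings.drop_duplicates(subset='oracle_id')
print(unique_cards['name'].tolist())     # ['Card A', 'Card B']
```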


In [4]:
magic_df.columns

Index(['object', 'id', 'oracle_id', 'multiverse_ids', 'mtgo_id',
       'mtgo_foil_id', 'tcgplayer_id', 'cardmarket_id', 'name', 'lang',
       'released_at', 'uri', 'scryfall_uri', 'layout', 'highres_image',
       'image_status', 'image_uris', 'mana_cost', 'cmc', 'type_line',
       'oracle_text', 'power', 'toughness', 'colors', 'color_identity',
       'keywords', 'legalities', 'games', 'reserved', 'foil', 'nonfoil',
       'finishes', 'oversized', 'promo', 'reprint', 'variation', 'set_id',
       'set', 'set_name', 'set_type', 'set_uri', 'set_search_uri',
       'scryfall_set_uri', 'rulings_uri', 'prints_search_uri',
       'collector_number', 'digital', 'rarity', 'flavor_text', 'card_back_id',
       'artist', 'artist_ids', 'illustration_id', 'border_color', 'frame',
       'full_art', 'textless', 'booster', 'story_spotlight', 'edhrec_rank',
       'penny_rank', 'prices', 'related_uris', 'all_parts', 'promo_types',
       'arena_id', 'preview', 'security_stamp', 'produced_mana', '

Every row in the dataframe corresponds to a [Magic: The Gathering card](https://mtg.fandom.com/wiki/Card) which is in turn represented by a [card object](https://scryfall.com/docs/api/cards) in the Scryfall API.  
As can be seen, cards have a lot of properties: more than 80 of them! But don't worry, we won't need all of them,  
so let's get familiar with the properties and cut them down to a more manageable number  
(for a better overview, we separate the attributes into gameplay-related and non-gameplay-related ones):
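Under the hood, `pandas.read_json` turns the file's top-level JSON array into one row per card object and one column per key. A key that only some cards carry (for example `power` on creatures or `loyalty` on planeswalkers) still gets its own column, and cards without that key get `NaN`, which is why the frame has so many columns and the preview shows so many empty cells. The three trimmed card objects below are made up for illustration and use only a handful of real Scryfall field names.

```python
import io
import json

import pandas

# Three hypothetical, heavily trimmed card objects; real ones carry many more keys.
mini_cards = [
    {'name': 'Card A', 'type_line': 'Creature', 'power': '3', 'toughness': '3'},
    {'name': 'Card B', 'type_line': 'Enchantment'},
    {'name': 'Card C', 'type_line': 'Planeswalker', 'loyalty': '4'},
]

# read_json expects JSON text in a file-like object (or a path), so serialize the list first.
mini_df = pandas.read_json(io.StringIO(json.dumps(mini_cards)))

print(mini_df.shape)  # (3, 5): one row per card, one column per distinct key
print(mini_df.columns.tolist())
print(mini_df.isna().sum().to_dict())  # missing keys show up as NaN counts per column
```

On the real data, `len(magic_df.columns)` gives the exact number of properties in your download.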

## Attributes / properties to keep:

#### Non-gameplay-related attributes:

+ **lang**: The language the card was printed in
+ **artist**: The name of the card's illustrator

#### Gameplay-related attributes:

+ **name**: The name of the card
+ **mana_cost**: The mana symbols that have to be paid to cast the card, i.e. how much mana of which color (e.g. {5}{R})
+ **cmc**: Short for "converted mana cost" (now officially called mana value), the total amount of mana in the card's mana cost (e.g. 6 for {5}{R})
+ **type_line**: The card's type line, e.g. "Creature", "Land", "Instant", etc.
+ **oracle_text**: The card's Oracle text, i.e. its official, up-to-date rules text
+ **power**: The card's power, if it is a creature card
+ **toughness**: The card's toughness, if it is a creature card
+ **colors**: The card's color(s), as determined by its mana cost (or color indicator)
+ **color_identity**: The color(s) this card can be attributed to, i.e. the colors of all mana symbols in its mana cost and rules text (relevant for deck building in formats such as Commander)
+ **keywords**: Any keywords the card may have, e.g. Flying
+ **legalities**: The card's legality status (e.g. legal, not_legal, banned) in each Magic: The Gathering format
+ **set**: The code (abbreviation) of the set the card was printed in, e.g. tsp
+ **set_name**: Long-form set name, e.g. Time Spiral
+ **rarity**: The card's rarity, e.g. common, uncommon, rare or mythic
+ **loyalty**: The starting loyalty, only present on planeswalker cards

(For more information on the parts of a card, have a look at the [MTG Wiki](https://mtg.fandom.com/wiki/Parts_of_a_card).)
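The relationship between mana_cost and cmc can be made concrete with a simplified calculator: each generic number adds its value, each colored or colorless symbol adds 1, and {X} counts as 0. This is only a sketch. It assumes costs are written as brace-delimited symbols like Scryfall's {5}{R}, counts hybrid symbols such as {2/W} by their largest numeric part, and ignores rarer cases (for example half-mana symbols) that the actual rules cover. The function name `mana_value` is made up for this example.

```python
import re

# Matches each brace-delimited mana symbol and captures its contents, e.g. '5', 'R', '2/W'.
SYMBOL_PATTERN = re.compile(r'\{([^}]+)\}')


def mana_value(mana_cost) -> int:
    """Simplified converted mana cost of a Scryfall-style mana cost string such as '{5}{R}'.

    Non-string values (None or NaN for missing costs) count as 0.
    """
    if not isinstance(mana_cost, str):
        return 0
    total = 0
    for symbol in SYMBOL_PATTERN.findall(mana_cost):
        if symbol.isdigit():
            total += int(symbol)      # generic mana, e.g. {5}
        elif symbol in ('X', 'Y', 'Z'):
            continue                  # variable amounts count as 0 (outside the stack)
        elif '/' in symbol:
            # Hybrid or Phyrexian symbol: take the largest numeric part, otherwise 1.
            numbers = [int(part) for part in symbol.split('/') if part.isdigit()]
            total += max(numbers, default=1)
        else:
            total += 1                # one colored or colorless symbol, e.g. {R}, {C}
    return total


# Costs of Fury Sliver, Kor Outfitter, Siren Lookout, Web and the Spirit token
# previewed in this notebook; their cmc values are 6, 2, 3, 1 and 0.
for cost in ['{5}{R}', '{W}{W}', '{2}{U}', '{G}', '']:
    print(repr(cost), mana_value(cost))
```

On the notebook's data, comparing `magic_df['mana_cost'].map(mana_value)` against `magic_df['cmc']` would show where this simplification falls short.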

## Attributes / properties to discard:

In [5]:
properties_to_drop = [
    'multiverse_ids', 'mtgo_id',
    'mtgo_foil_id', 'tcgplayer_id', 'cardmarket_id',
    'uri', 'scryfall_uri', 'highres_image',
    'image_status', 'image_uris',
    'set_uri', 'set_search_uri',
    'scryfall_set_uri', 'rulings_uri', 'prints_search_uri',
    'collector_number', 'digital', 'card_back_id',
    'artist_ids', 'illustration_id',
    'story_spotlight', 'edhrec_rank',
    'penny_rank','related_uris', 'all_parts', 'promo_types',
    'arena_id', 'preview', 'security_stamp', 'produced_mana', 'watermark',
    'frame_effects', 'printed_name', 'card_faces',
    'tcgplayer_etched_id', 'color_indicator', 'life_modifier',
    'hand_modifier', 'printed_type_line', 'printed_text', 'content_warning',
    'flavor_name', 'variation_of',  # We won't need any of the above properties
    'layout', 'flavor_text', 'oversized',
    'frame', 'full_art', 'object', 'set_id', 'textless',
    'booster', 'prices', 'oracle_id', 'id',
    'released_at', 'promo', 'reprint', 'variation', 'set_type',
    'games', 'reserved', 'border_color',
    'foil', 'nonfoil', 'finishes',
]

In [6]:
magic_basic_df = magic_df.drop(columns=properties_to_drop)

magic_basic_df.head()

| | name | lang | mana_cost | cmc | type_line | oracle_text | power | toughness | colors | color_identity | keywords | legalities | set | set_name | rarity | artist | loyalty | attraction_lights |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Fury Sliver | en | {5}{R} | 6.0 | Creature — Sliver | All Sliver creatures have double strike. | 3.0 | 3.0 | [R] | [R] | [] | {'standard': 'not_legal', 'future': 'not_legal... | tsp | Time Spiral | uncommon | Paolo Parente |  |  |
| 1 | Kor Outfitter | en | {W}{W} | 2.0 | Creature — Kor Soldier | When Kor Outfitter enters the battlefield, you... | 2.0 | 2.0 | [W] | [W] | [] | {'standard': 'not_legal', 'future': 'not_legal... | zen | Zendikar | common | Kieran Yanner |  |  |
| 2 | Spirit | en |  | 0.0 | Token Creature — Spirit | Flying | 1.0 | 1.0 | [W] | [W] | [Flying] | {'standard': 'not_legal', 'future': 'not_legal... | tmm2 | Modern Masters 2015 Tokens | common | Mike Sass |  |  |
| 3 | Siren Lookout | en | {2}{U} | 3.0 | Creature — Siren Pirate | Flying\nWhen Siren Lookout enters the battlefi... | 1.0 | 2.0 | [U] | [U] | [Flying, Explore] | {'standard': 'not_legal', 'future': 'not_legal... | xln | Ixalan | common | Chris Rallis |  |  |
| 4 | Web | en | {G} | 1.0 | Enchantment — Aura | Enchant creature (Target a creature as you cas... |  |  | [G] | [G] | [Enchant] | {'standard': 'not_legal', 'future': 'not_legal... | 3ed | Revised Edition | rare | Rob Alexander |  |  |
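In the preview, legalities still holds one dictionary per card, mapping each format to a status such as 'not_legal'. To filter by format later, one option is to expand that column into one column per format. The sketch below uses two hand-made rows (the card names, formats and statuses are illustrative, not copied from the data) and assumes every value in the column is a dictionary.

```python
import pandas

# Hypothetical stand-in for magic_basic_df with a dict-valued 'legalities' column.
legal_df = pandas.DataFrame({
    'name': ['Card A', 'Card B'],
    'legalities': [
        {'standard': 'not_legal', 'modern': 'legal', 'commander': 'legal'},
        {'standard': 'legal', 'modern': 'legal', 'commander': 'banned'},
    ],
})

# One column per format; reuse the original index so the rows line up when joining.
format_columns = pandas.DataFrame(legal_df['legalities'].tolist(), index=legal_df.index)
expanded = legal_df.drop(columns='legalities').join(format_columns)

print(expanded)
# Example query: names of the cards that are legal in Commander.
print(expanded.loc[expanded['commander'] == 'legal', 'name'].tolist())  # ['Card A']
```

If some cards lacked a key for a format, their entries in that format's column would come out as NaN.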


With the properties cleaned up and reduced to the essentials, we now have a much clearer overview of what we are working with.  
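A drop list has to name every unwanted column: `DataFrame.drop` raises a `KeyError` for labels it cannot find, and any field not on the list stays in the result (as attraction_lights, which is not among the attributes to keep, does in the preview). Selecting the wanted columns instead keeps unlisted fields out, and filtering that list against the columns that actually exist avoids a `KeyError` if one is absent from a particular download. The sketch below assumes a list `columns_to_keep` mirroring the attributes chosen above (using the actual column name colors); `sample_df` is a made-up stand-in for `magic_df`.

```python
import pandas

# Column names for the attributes chosen above.
columns_to_keep = [
    'name', 'lang', 'mana_cost', 'cmc', 'type_line', 'oracle_text', 'power',
    'toughness', 'colors', 'color_identity', 'keywords', 'legalities',
    'set', 'set_name', 'rarity', 'artist', 'loyalty',
]

# Hypothetical stand-in for magic_df: two wanted columns plus two unwanted ones.
sample_df = pandas.DataFrame({
    'id': ['a1', 'b2'],
    'name': ['Card A', 'Card B'],
    'cmc': [2.0, 3.0],
    'attraction_lights': [None, None],
})

# Select the wanted columns that exist, in keep-list order; everything else is left out.
present = [column for column in columns_to_keep if column in sample_df.columns]
missing = [column for column in columns_to_keep if column not in sample_df.columns]
slim_df = sample_df[present]

print(slim_df.columns.tolist())  # ['name', 'cmc']
print(len(missing), 'wanted columns are absent from this sample')
```

With `magic_df` in place of `sample_df`, this would give a frame like `magic_basic_df` without attraction_lights. If you prefer to keep the discard list, passing `errors='ignore'` to `DataFrame.drop` skips labels that are not present.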