## Load data

Because the data is released under the Open Game License (OGL), it is free to use, provided we follow the license terms. Although the data was originally published in PDF format, we can take advantage of the work of two others:
* Redditor [droiddruid](https://www.reddit.com/user/droiddruid), who published a list of [monsters from the 5e SRD](https://dl.dropboxusercontent.com/s/iwz112i0bxp2n4a/5e-SRD-Monsters.json).
* GitHub user [vorpalhex](https://github.com/vorpalhex), who extracted a list of [spells from the 5e SRD](https://github.com/vorpalhex/srd_spells).

As both of the above are JSON files, these secondary sources are considerably easier to work with.

**Remark:** This notebook is essentially a heavily commented version of the module `etl`.

In [49]:
import json
import numpy as np
import os
import pandas as pd
import re
import requests

%matplotlib inline

We'll use the following helper function to simplify loading our data sources.

In [2]:
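def load_data(filepath, url):
    # Load JSON data from a local file if it exists; otherwise download it
    # from `url` and cache a local copy at `filepath`.
    if os.path.exists(filepath):
        with open(filepath, 'r') as f:
            data = json.load(f)
    else:
        response = requests.get(url)
        response.raise_for_status()  # fail loudly on HTTP errors
        data = json.loads(response.text)
        # Make sure the target directory exists before caching the data.
        os.makedirs(os.path.dirname(filepath) or '.', exist_ok=True)
        with open(filepath, 'w') as f:
            json.dump(data, f)
    return data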

### Load monster data

In [3]:
def load_monsters():
    # Load monster data. The final entry of the JSON list is the OGL text,
    # not a stat block, so we split it off.
    filepath = 'data/5e-SRD-Monsters.json'
    url = 'https://dl.dropboxusercontent.com/s/iwz112i0bxp2n4a/5e-SRD-Monsters.json'
    data = load_data(filepath, url)
    monsters = data[:-1]
    ogl = data[-1]
    return monsters, ogl

In [28]:
monsters, ogl = load_monsters()

The JSON file contains stat blocks for 325 creatures, as well as the text of the OGL under which they are released. Each stat block is a dictionary; we'll work with these dictionaries to tidy the data.

### Load spell data

In [5]:
def load_spells():
    # Load spell data.
    filepath = 'data/5e-SRD-spells.json'
    url = 'https://raw.githubusercontent.com/vorpalhex/srd_spells/master/spells.json'
    return load_data(filepath, url)

In [6]:
spells = load_spells()

## Cleaning and tidying data

### Cleaning the spell list
We first consider the list of `spells`. Introspection shows that `spells` is a list of dictionaries, which we'll load into a Pandas dataframe.

In [74]:
spells_df = pd.DataFrame(spells)
spells_df.set_index('name', inplace=True)
spells_df.head()

name,casting_time,classes,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Acid Splash,1 action,"[sorcerer, wizard]","{'material': False, 'raw': 'V, S', 'somatic': ...",You hurl a bubble of acid. Choose one creature...,Instantaneous,,cantrip,60 feet,False,Conjuration,"[sorcerer, wizard, cantrip]",Conjuration cantrip
Alarm,1 action,"[ranger, wizard]","{'material': True, 'materials_needed': ['a tin...",You set an alarm against unwanted intrusion. C...,8 hours,,1,30 feet,True,abjuration,"[ranger, wizard, level1]",1st-level abjuration (ritual)
Animal Friendship,1 action,"[bard, druid, ranger]","{'material': True, 'materials_needed': ['a mor...",This spell lets you convince a beast that you ...,24 hours,When you cast this spell using a spell slot of...,1,30 feet,False,enchantment,"[bard, druid, ranger, level1]",1st-level enchantment
Bane,1 action,"[bard, cleric]","{'material': True, 'materials_needed': ['a dro...",Up to three creatures of your choice that you ...,"Concentration, up to 1 minute",When you cast this spell using a spell slot of...,1,30 feet,False,enchantment,"[bard, cleric, level1]",1st-level enchantment
Blade Ward,1 action,"[bard, sorcerer, warlock, wizard]","{'material': False, 'raw': 'V, S', 'somatic': ...",You extend your hand and trace a sigil of ward...,1 Round,,cantrip,Self,False,Abjuration,"[bard, sorcerer, warlock, wizard, cantrip]",Abjuration cantrip


In order to clean and tidy this dataframe, we'll work column by column.

#### `casting_time`

In [75]:
spells_df.groupby('casting_time').count()['classes']

casting_time
1 action                                                                                                         291
1 action or 8 hours                                                                                                1
1 bonus action                                                                                                    27
1 hour                                                                                                            10
1 minue                                                                                                            1
1 minute                                                                                                          26
1 minutes                                                                                                          1
1 reaction, which you take in response to being damaged by a creature within 60 feet of you that you can see.      1
1 reaction, which you take when you are hit by an a

Notice that there are some minor typos, which we'll now correct.

In [76]:
spells_df[spells_df.casting_time.str.match('1 minue') | spells_df.casting_time.str.match('1 minutes')]

name,casting_time,classes,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Augury,1 minue,[cleric],"{'material': True, 'materials_needed': ['speci...","* *Weal*, for good results\n\n* *Woe*, for bad...",Instantaneous,,2,Self,True,divination,"[cleric, level2]",2nd-level divination (ritual)
Find the Path,1 minutes,"[bard, cleric, druid]","{'material': True, 'materials_needed': ['a set...","This spell allows you to find the shortest, mo...","Concentration, up to 1 day",,6,Self,False,divination,"[bard, cleric, druid, level6]",6th-level divination


In [77]:
spells_df.loc[['Augury', 'Find the Path'], 'casting_time'] = '1 minute'

In [78]:
spells_df.loc[['Augury', 'Find the Path'], 'casting_time']

name
Augury           1 minute
Find the Path    1 minute
Name: casting_time, dtype: object

#### `classes`

Each entry is a list of the player classes that can cast the spell. We could encode these lists as a collection of columns, but since we'll be ignoring this column, we'll leave it as it is.

In [97]:
classes = set()
for classlist in spells_df.classes:
    classes.update(classlist)
print(classes)

{'cleric', 'ranger', 'warlock', 'druid', 'sorcerer', 'wizard', 'bard', 'paladin'}


In [100]:
assert all(spells_df.classes.apply(type) == list) # every entry is a list

In [102]:
assert len(spells_df.classes[spells_df.classes.apply(len) == 0]) == 0 # every spell can be cast by some class
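
Although we leave the column untouched, the column encoding mentioned above could look roughly like the following. This is only a sketch on a toy frame: `toy` and `encode_classes` are hypothetical names that do not appear in the `etl` module, and the three rows merely mirror the first few spells shown earlier.

```python
import pandas as pd

# Toy stand-in for spells_df.classes (three rows copied from the head() output).
toy = pd.DataFrame(
    {'classes': [['sorcerer', 'wizard'], ['ranger', 'wizard'], ['bard', 'cleric']]},
    index=pd.Index(['Acid Splash', 'Alarm', 'Bane'], name='name'),
)


def encode_classes(classes):
    # Build one boolean column per class: True if that class can cast the spell.
    all_classes = sorted({c for classlist in classes for c in classlist})
    return pd.DataFrame(
        {c: classes.apply(lambda classlist, c=c: c in classlist) for c in all_classes},
        index=classes.index,
    )


class_columns = encode_classes(toy.classes)
print(class_columns)
```

Each spell keeps its row, and membership in a class becomes a simple boolean lookup such as `class_columns.loc['Alarm', 'ranger']`.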

#### `components`

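Each entry should be a dictionary with either four or five keys: `material`, `raw`, `somatic` and `verbal`, plus an optional `materials_needed`. One spell (Clone) has no entry at all. We can normalize this without too much work.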

In [111]:
spells_df[spells_df.components.isnull()]

name,casting_time,classes,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Clone,1 hour,[wizard],,This spell grows an inert duplicate of a livin...,Instantaneous,,8,Touch,False,necromancy,"[wizard, level8]",8th-level necromancy


In [124]:
spells_df[spells_df.components.notnull()]['components'].apply(lambda x: frozenset(x.keys())).unique()

array([frozenset({'material', 'raw', 'somatic', 'verbal'}),
       frozenset({'raw', 'material', 'verbal', 'materials_needed', 'somatic'})], dtype=object)
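
One way to carry out that normalization is to give every entry the same five keys and expand them into columns. The sketch below runs on toy data: `toy_components`, `normalize_components` and the spell names `spell_a` to `spell_c` are all hypothetical, and the default values chosen for missing keys are an assumption rather than something the source data prescribes.

```python
import pandas as pd

# Toy stand-in for spells_df.components: a four-key entry, a five-key entry,
# and a missing entry (as for Clone). All values here are made up.
toy_components = pd.Series(
    [
        {'material': False, 'raw': 'V, S', 'somatic': True, 'verbal': True},
        {'material': True, 'materials_needed': ['a pinch of salt'],
         'raw': 'V, S, M', 'somatic': True, 'verbal': True},
        None,
    ],
    index=pd.Index(['spell_a', 'spell_b', 'spell_c'], name='name'),
)


def normalize_components(entry):
    # Return a dict with all five keys, using defaults for anything missing.
    entry = entry if isinstance(entry, dict) else {}
    return {
        'verbal': entry.get('verbal', False),
        'somatic': entry.get('somatic', False),
        'material': entry.get('material', False),
        'materials_needed': entry.get('materials_needed', []),
        'raw': entry.get('raw', ''),
    }


components_df = pd.DataFrame(
    [normalize_components(entry) for entry in toy_components],
    index=toy_components.index,
)
print(components_df)
```

Treating a missing entry as "no components" is a judgment call; for a spell like Clone it may be better to fill in the real values by hand.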

#### `description`

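This field contains the raw text description of a spell's effects as a `str`. A few spells have no description at all, so we check for missing values before validating the rest.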

In [126]:
spells_df[spells_df.description.isnull()]

name,casting_time,classes,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Aura of Purity,1 action,[paladin],"{'material': False, 'raw': 'V', 'somatic': Fal...",,"Concentration, up to 10 minutes",,4,Self (30-foot radius),False,abjuration,"[paladin, level4]",4th-level abjuration
Elemental Weapon,1 action,[paladin],"{'material': False, 'raw': 'V, S', 'somatic': ...",,"Concentration, up to 1 hour",When you cast this spell using a spell slot of...,3,Touch,False,transmutation,"[paladin, level3]",3rd-level transmutation
Protection from Energy,1 action,"[cleric, druid, ranger, sorcerer, wizard]","{'material': False, 'raw': 'V, S', 'somatic': ...",,"Concentration, up to 1 minute",,3,Touch,False,abjuration,"[cleric, druid, ranger, sorcerer, wizard, level3]",3rd-level abjuration
Enhance Ability,1 action,"[bard, cleric, druid, sorcerer]","{'material': True, 'materials_needed': ['fur o...",,"Concentration, up to 1 hour",When you cast this spell using a spell slot of...,2,Touch,False,transmutation,"[bard, cleric, druid, sorcerer, level2]",2nd-level transmutation
Elemental Bane,1 action,"[druid, warlock, wizard]","{'material': False, 'raw': 'V, S', 'somatic': ...",,"Concentration, up to 1 minute",When you cast this spell using a spell slot of...,4,90 feet,False,transmutation,"[druid, warlock, wizard, level4]",4th-level transmutation


In [116]:
assert all(spells_df[spells_df.description.notnull()].description.apply(type)==str)

In [117]:
assert all(spells_df[spells_df.description.notnull()].description.apply(len) > 0)

#### `duration`

There are some inconsistencies in formatting here (capitalization, stray plurals, and trailing periods), which we should correct.

In [130]:
spells_df.groupby('duration').count().type

duration
1 Minute                                 1
1 Round                                  1
1 day                                    1
1 hour                                  23
1 hours                                  1
1 minute                                15
1 round                                 10
10 days                                  5
10 minutes                               7
24 hours                                 9
30 days                                  1
7 days                                   1
8 hours                                 13
Concentration, up to 1 day               2
Concentration, up to 1 hour             33
Concentration, up to 1 minute           83
Concentration, up to 1 minute.           1
Concentration, up to 1 round             1
Concentration, up to 10 minutes         41
Concentration, up to 10 minutes.         1
Concentration, up to 2 hours             1
Concentration, up to 24 hours            1
Concentration, up to 6 rounds            1
Co
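
The notebook does not yet correct these durations. A minimal sketch of one possible normalization follows, covering only the three kinds of inconsistency visible in the listing above. `normalize_duration` and `toy_durations` are hypothetical names, and the rules are assumptions that should be checked against the full list of values before being applied to `spells_df`.

```python
import re

import pandas as pd

# A few representative values taken from the listing above.
toy_durations = pd.Series([
    '1 Minute', '1 Round', '1 hours', '1 minute',
    'Concentration, up to 1 minute.', 'Concentration, up to 10 minutes',
    'Instantaneous',
])


def normalize_duration(duration):
    # Strip surrounding whitespace and any trailing period.
    duration = duration.strip().rstrip('.')
    # Lowercase capitalized time units ("1 Minute" -> "1 minute").
    duration = re.sub(r'\b(Round|Minute|Hour|Day)(s?)\b',
                      lambda m: m.group(0).lower(), duration)
    # Use the singular unit after a count of exactly 1 ("1 hours" -> "1 hour").
    duration = re.sub(r'\b1 (round|minute|hour|day)s\b', r'1 \1', duration)
    return duration


cleaned = toy_durations.apply(normalize_duration)
print(cleaned.tolist())
```

On the real data this would be applied as `spells_df['duration'].apply(normalize_duration)`, after which the `groupby` count can be rerun to confirm that the duplicate categories have merged.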

#### `higher_levels`

This column describes how a spell improves when cast using a higher-level spell slot. It will require some extra processing, since a good number of spells have such an entry.

In [134]:
spells_df[spells_df.higher_levels.notnull()].shape

(107, 12)

#### `level`

The level of the spell. We'll use some of this data in our training.

In [135]:
spells_df.groupby('level').count().type

level
1          63
2          62
3          53
4          34
5          44
6          35
7          19
8          17
9          16
cantrip    35
wind        1
Name: type, dtype: int64

One spell has the level `wind`, which is clearly wrong. Let's take a look.

In [136]:
spells_df[spells_df.level=='wind']

name,casting_time,classes,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Control Weather,10 minutes,"[cleric, druid, wizard]","{'material': True, 'materials_needed': ['burni...",You take control of the weather within 5 miles...,"Concentration, up to 8 hours",,wind,Self (5-mile radius),False,,"[cleric, druid, wizard, level8]",Wind


In [137]:
spells_df.loc['Control Weather', 'level'] = '8'

The remaining non-numeric entries are all cantrips, so we'll cast the `level` column to a numeric type and then replace the resulting `NaN`s with 0.

In [138]:
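# Assign the result back rather than calling fillna(inplace=True) on a column
# selection, which may not update the dataframe in newer versions of pandas.
spells_df['level'] = pd.to_numeric(spells_df['level'], errors='coerce').fillna(0)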

#### `range`

This column contains a few typos (inconsistent hyphenation and a missing closing parenthesis) which need to be cleaned.

In [145]:
spells_df.groupby('range').count().type

range
1 mile                    2
10 feet                  12
100 feet                  2
120 feet                 36
150 feet                 15
30 feet                  48
300 feet                  7
5 feet                    2
500 feet                  3
500 miles                 1
60 feet                  67
90 feet                  17
Self                     67
Self (10-foot radius)     2
Self (10-foot-radius)     1
Self (100-foot line)      1
Self (15-foot cone)       2
Self (15-foot cube)       1
Self (15-foot radius)     1
Self (30-foot cone)       1
Self (30-foot radius      1
Self (30-foot radius)     5
Self (5-foot radius)      1
Self (5-mile radius)      1
Self (60-foot cone)       3
Self (60-foot line)       2
Sight                     4
Special                   1
Touch                    71
Unlimited                 2
Name: type, dtype: int64

In [148]:
spells_df[spells_df.range.str.match('self') |
          spells_df.range.str.match('touch') |
          spells_df.range.str.match(r'Self \(30-foot radius$') |
          spells_df.range.str.contains('sphere') |
          spells_df.range.str.contains('1OO')]

name,casting_time,classes,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Speak with Plants,1 action,"[bard, druid, ranger]","{'material': False, 'raw': 'V, S', 'somatic': ...",You imbue plants within 30 feet of you with li...,10 minutes,,3.0,Self (30-foot radius,False,transmutation,"[bard, druid, ranger, level3]",3rd-level transmutation


In [149]:
spells_df.loc[['Blur', 'Branding Smite', 'Detect Thoughts'], 'range'] = 'Self'
spells_df.loc['Beast Sense', 'range'] = 'Touch'
spells_df.loc['Speak with Plants', 'range'] = 'Self (30-foot radius)'
spells_df.loc['Antimagic Field', 'range'] = 'Self (10-foot radius)'
spells_df.loc['Lightning Bolt', 'range'] = 'Self (100-foot line)'

#### `ritual`

Looks clean. Everything is a bool.

In [153]:
assert not spells_df.ritual.isnull().any()  # no missing values

#### `school`

Again, there is some inconsistent formatting, and one particular oddity.

In [173]:
spells_df[spells_df.school.isnull()]

name,casting_time,classes,components,description,duration,higher_levels,level,range,ritual,school,tags,type


In [167]:
spells_df.loc['Control Weather', 'school'] = 'transmutation'

In [168]:
spells_df.school = spells_df.school.str.lower()

In [172]:
spells_df.groupby('school').count().type

school
abjuration       45
conjuration      59
divination       31
enchantment      31
evocation        76
illusion         27
level             1
necromancy       26
transmutation    83
Name: type, dtype: int64

In [170]:
spells_df[spells_df.school.str.contains('transmuation')]

name,casting_time,classes,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Barkskin,1 action,"[druid, ranger]","{'material': True, 'materials_needed': ['a han...",You touch a willing creature. Until the spell ...,"Concentration, up to 1 hour",,2.0,Touch,False,transmuation,"[druid, ranger, level2]",2nd-level transmuation


In [171]:
spells_df.loc['Barkskin', 'school'] = 'transmutation'

In [176]:
spells_df[spells_df.school.str.contains('level')] # 'level' is not a school; this looks like a parsing error

name,casting_time,classes,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Warding Wind,1 action,"[bard, druid, sorcerer]","{'material': False, 'raw': 'V', 'somatic': Fal...",A strong wind (20 miles per hour) blows around...,"Concentration, up to 10 minutes",,2.0,Self,False,level,"[bard, druid, sorcerer, level2]",2nd level evocation


In [177]:
spells_df.loc['Warding Wind', 'school'] = 'evocation'

#### `tags`

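This column mostly repeats information found in other columns (`classes` and `level`). It is not yet clear whether it also carries anything extra, such as cleric domain information, so the drop is left commented out for now.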

In [183]:
# spells_df.drop('tags', inplace=True, axis=1)

#### `type`

Another column which duplicates information found elsewhere.

In [184]:
spells_df.drop('type', inplace=True, axis=1)

Finally, we replace the remaining `NaN` values (such as missing descriptions and empty `higher_levels` entries) with empty strings.

In [8]:
spells_df = spells_df.fillna('')

In [12]:
spells_df.head()

Unnamed: 0,casting_time,classes,components,description,duration,higher_levels,level,name,range,ritual,school,tags,type
0,1 action,"[sorcerer, wizard]","{'material': False, 'raw': 'V, S', 'somatic': ...",You hurl a bubble of acid. Choose one creature...,Instantaneous,,0.0,Acid Splash,60 feet,False,Conjuration,"[sorcerer, wizard, cantrip]",Conjuration cantrip
1,1 action,"[ranger, wizard]","{'material': True, 'materials_needed': ['a tin...",You set an alarm against unwanted intrusion. C...,8 hours,,1.0,Alarm,30 feet,True,abjuration,"[ranger, wizard, level1]",1st-level abjuration (ritual)
2,1 action,"[bard, druid, ranger]","{'material': True, 'materials_needed': ['a mor...",This spell lets you convince a beast that you ...,24 hours,When you cast this spell using a spell slot of...,1.0,Animal Friendship,30 feet,False,enchantment,"[bard, druid, ranger, level1]",1st-level enchantment
3,1 action,"[bard, cleric]","{'material': True, 'materials_needed': ['a dro...",Up to three creatures of your choice that you ...,"Concentration, up to 1 minute",When you cast this spell using a spell slot of...,1.0,Bane,30 feet,False,enchantment,"[bard, cleric, level1]",1st-level enchantment
4,1 action,"[bard, sorcerer, warlock, wizard]","{'material': False, 'raw': 'V, S', 'somatic': ...",You extend your hand and trace a sigil of ward...,1 Round,,0.0,Blade Ward,Self,False,Abjuration,"[bard, sorcerer, warlock, wizard, cantrip]",Abjuration cantrip


### Cleaning the monster list

We next focus on the `monsters`.

In [13]:
type(monsters), len(monsters)

(list, 325)

In [14]:
def get_monster_df(monsters):
    # Build a dataframe of monster stat blocks, indexed by monster name.
    df = pd.DataFrame(monsters)
    df = df.set_index(['name'])
    df = fix_saves(df)
    df = fix_skills(df)
    df.challenge_rating = df.challenge_rating.apply(fix_challenge_rating)
    df = df.reindex(columns=_column_order)
    # Replace any remaining NaN entries with empty lists.
    columns_with_nan = df.columns[df.isnull().any(axis=0)]
    for column in columns_with_nan:
        replace_nan(df, column, list)
    return df


def fix_saves(df):
    # Compute ability modifiers, and use them for any missing saving throws.
    mods = [stat + '_mod' for stat in _stats]
    saves = [stat + '_save' for stat in _stats]
    for stat, mod in zip(_stats, mods):
        df[mod] = np.floor((df[stat] - 10) / 2)
    for mod, save in zip(mods, saves):
        df[save] = df[save].fillna(df[mod])
    return df


def fix_skills(df):
    # A missing skill bonus defaults to the modifier of its governing ability.
    for skill, stat in _skills_stats.items():
        df[skill] = df[skill].fillna(df[stat + '_mod'])
    return df


def fix_challenge_rating(cr):
    # Convert a challenge rating string such as '1/4' or '10' to a number.
    pattern = re.compile(r'(?P<p>\d)/(?P<q>\d)$|(?P<n>\d+)')
    g = pattern.match(cr)
    if g.group('p') is not None:
        return int(g.group('p')) / int(g.group('q'))
    return int(g.group('n'))


def replace_nan(df, column, func):
    # Replace each NaN in `column` with a fresh value produced by func().
    for x in df.loc[df[column].isnull(), column].index:
        df.at[x, column] = func()
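
To make the modifier arithmetic in `fix_saves` concrete, here is a small worked example on a toy frame. The monsters `a`, `b` and `c` and their scores are made up for illustration; only the formula `floor((score - 10) / 2)` and the fill-in rule come from the code above.

```python
import numpy as np
import pandas as pd

# Toy stat blocks: two monsters have no explicit strength saving throw.
toy = pd.DataFrame(
    {'strength': [21, 10, 7], 'strength_save': [np.nan, 4.0, np.nan]},
    index=pd.Index(['a', 'b', 'c'], name='name'),
)

# Ability modifier: floor((score - 10) / 2). Flooring (rather than truncating)
# matters for scores below 10: a score of 7 gives -2, not -1.
toy['strength_mod'] = np.floor((toy['strength'] - 10) / 2)

# A missing saving throw falls back to the plain ability modifier.
toy['strength_save'] = toy['strength_save'].fillna(toy['strength_mod'])

print(toy)
```

Monster `b` keeps its explicit save of 4, while `a` and `c` receive their modifiers (5 and -2) as saving throws.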

We'll also need the following constants, which were extracted directly from the aggregated stat blocks.

In [15]:
_stats = ['strength', 'dexterity', 'constitution', 'intelligence', 'wisdom',
          'charisma']

_mechanics = ['challenge_rating', 'armor_class', 'hit_dice', 'hit_points',
              'condition_immunities', 'damage_immunities',
              'damage_resistances', 'damage_vulnerabilities', 'actions',
              'reactions', 'legendary_actions', 'special_abilities', 'size',
              'speed', 'senses']

_flavor = ['languages', 'subtype', 'type', 'alignment']

_stat_scores = ['strength', 'strength_mod', 'strength_save', 'dexterity',
                'dexterity_mod', 'dexterity_save', 'constitution',
                'constitution_mod', 'constitution_save', 'intelligence',
                'intelligence_mod', 'intelligence_save', 'wisdom',
                'wisdom_mod', 'wisdom_save', 'charisma', 'charisma_mod',
                'charisma_save']

_skills = ['acrobatics', 'arcana', 'athletics', 'deception', 'history',
           'insight', 'intimidation', 'investigation', 'medicine', 'nature',
           'perception', 'performance',  'persuasion', 'religion', 'stealth',
           'survival']

_column_order = _mechanics + _flavor + _stat_scores + _skills

_skills_stats = {'acrobatics': 'dexterity',
                 'arcana': 'intelligence',
                 'athletics': 'strength',
                 'deception': 'charisma',
                 'history': 'intelligence',
                 'insight': 'wisdom',
                 'intimidation': 'charisma',
                 'investigation': 'intelligence',
                 'medicine': 'wisdom',
                 'nature': 'intelligence',
                 'perception': 'wisdom',
                 'performance': 'charisma',
                 'persuasion': 'charisma',
                 'religion': 'intelligence',
                 'stealth': 'dexterity',
                 'survival': 'wisdom'}


In [16]:
monster_df = get_monster_df(monsters)

In [17]:
monster_df.head()

name,challenge_rating,armor_class,hit_dice,hit_points,condition_immunities,damage_immunities,damage_resistances,damage_vulnerabilities,actions,reactions,...,intimidation,investigation,medicine,nature,perception,performance,persuasion,religion,stealth,survival
Aboleth,10.0,17,18d10,135,,,,,"[{'name': 'Multiattack', 'desc': 'The aboleth ...",[],...,4.0,4.0,2.0,4.0,10.0,4.0,4.0,4.0,-1.0,2.0
Acolyte,0.25,10,2d8,9,,,,,"[{'name': 'Club', 'desc': 'Melee Weapon Attack...",[],...,0.0,0.0,4.0,0.0,2.0,0.0,0.0,2.0,0.0,2.0
Adult Black Dragon,14.0,19,17d12,195,,acid,,,"[{'name': 'Multiattack', 'desc': 'The dragon c...",[],...,3.0,2.0,1.0,2.0,11.0,3.0,3.0,2.0,7.0,1.0
Adult Blue Dracolich,17.0,19,18d12,225,"charmed, exhaustion, frightened, paralyzed, po...","lightning, poison",necrotic,,"[{'name': 'Multiattack', 'desc': 'The dracolic...",[],...,4.0,3.0,2.0,3.0,12.0,4.0,4.0,3.0,0.0,2.0
Adult Blue Dragon,16.0,19,18d12,225,,lightning,,,"[{'name': 'Multiattack', 'desc': 'The dragon c...",[],...,4.0,3.0,2.0,3.0,12.0,4.0,4.0,3.0,5.0,2.0


## Dealing with nested data

A number of the columns (`actions`, `reactions`, et cetera) in the above data frame contain lists of values. This data will be easier to analyze once it is reformatted into separate data frames.

### Actions

In [18]:
actions_keys = {tuple(action.keys()) for actions in monster_df.actions for action in actions}
actions_keys

{('name', 'desc', 'attack_bonus'),
 ('name', 'desc', 'attack_bonus', 'damage_bonus'),
 ('name', 'desc', 'attack_bonus', 'damage_dice'),
 ('name', 'desc', 'attack_bonus', 'damage_dice', 'damage_bonus')}

In [19]:
def make_action_df(x):
    df = pd.DataFrame(x.iloc[0], columns=['name', 'desc', 'attack_bonus', 'damage_dice', 'damage_bonus'])
    df = df.rename(index=str, columns={'name':'action'})
    df = df.set_index(['action'])
    return df

actions_df = monster_df.actions.groupby('name').apply(make_action_df)

In [20]:
actions_df.head(10)

name,action,desc,attack_bonus,damage_dice,damage_bonus
Aboleth,Multiattack,The aboleth makes three tentacle attacks.,0,,
Aboleth,Tentacle,"Melee Weapon Attack: +9 to hit, reach 10 ft., ...",9,2d6,5.0
Aboleth,Tail,"Melee Weapon Attack: +9 to hit, reach 10 ft. o...",9,3d6,5.0
Aboleth,Enslave (3/day),The aboleth targets one creature it can see wi...,0,,
Acolyte,Club,"Melee Weapon Attack: +2 to hit, reach 5 ft., o...",2,1d4,
Adult Black Dragon,Multiattack,The dragon can use its Frightful Presence. It ...,0,,
Adult Black Dragon,Bite,"Melee Weapon Attack: +11 to hit, reach 10 ft.,...",11,2d10 + 1d8,6.0
Adult Black Dragon,Claw,"Melee Weapon Attack: +11 to hit, reach 5 ft., ...",11,2d6,6.0
Adult Black Dragon,Tail,"Melee Weapon Attack: +11 to hit, reach 15 ft.,...",11,2d8,6.0
Adult Black Dragon,Frightful Presence,Each creature of the dragon's choice that is w...,0,,
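
The same reshaping can also be written as an explicit loop, which some readers may find easier to follow than `groupby(...).apply(...)`. The following is a sketch on toy data, not the approach used in `etl`: `toy_actions`, `flatten_actions` and the two monsters are hypothetical.

```python
import pandas as pd

# Toy stand-in for monster_df.actions: a list of action dicts per monster.
toy_actions = pd.Series(
    {
        'monster_a': [
            {'name': 'Multiattack', 'desc': 'Makes two attacks.', 'attack_bonus': 0},
            {'name': 'Bite', 'desc': 'Melee Weapon Attack.', 'attack_bonus': 5,
             'damage_dice': '1d8', 'damage_bonus': 3},
        ],
        'monster_b': [],  # a monster without actions contributes no rows
    },
    name='actions',
)
toy_actions.index.name = 'name'


def flatten_actions(actions):
    # Emit one record per (monster, action) pair.
    records = []
    for monster, action_list in actions.items():
        for action in action_list:
            record = {k: v for k, v in action.items() if k != 'name'}
            record['name'] = monster
            record['action'] = action['name']
            records.append(record)
    columns = ['name', 'action', 'desc', 'attack_bonus', 'damage_dice', 'damage_bonus']
    # Keys absent from an action (e.g. damage_dice) become NaN.
    return pd.DataFrame(records, columns=columns).set_index(['name', 'action'])


actions_flat = flatten_actions(toy_actions)
print(actions_flat)
```

The result has the same two-level index (`name`, `action`) as `actions_df`.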


### Special abilities

In [21]:
special_keys = {tuple(ability.keys()) for abilities in monster_df.special_abilities for ability in abilities}
special_keys

{('name', 'desc', 'attack_bonus'),
 ('name', 'desc', 'attack_bonus', 'damage_dice')}

In [22]:
def make_special_abilities_df(x):
    df = pd.DataFrame(x.iloc[0], columns=['name', 'desc', 'attack_bonus', 'damage_dice'])
    df = df.rename(index=str, columns={'name':'special_ability'})
    df = df.set_index(['special_ability'])
    return df

special_abilities_df = monster_df.special_abilities.groupby('name').apply(make_special_abilities_df)

In [23]:
special_abilities_df.head(10)

name,special_ability,desc,attack_bonus,damage_dice
Aboleth,Amphibious,The aboleth can breathe air and water.,0,
Aboleth,Mucous Cloud,"While underwater, the aboleth is surrounded by...",0,
Aboleth,Probing Telepathy,If a creature communicates telepathically with...,0,
Acolyte,Spellcasting,The acolyte is a 1st-level spellcaster. Its sp...,0,
Adult Black Dragon,Amphibious,The dragon can breathe air and water.,0,
Adult Black Dragon,Legendary Resistance (3/Day),"If the dragon fails a saving throw, it can cho...",0,
Adult Blue Dracolich,Legendary Resistance (3/Day),"If the dracolich fails a saving throw, it can ...",0,
Adult Blue Dracolich,Magic Resistance,The dracolich has advantage on saving throws a...,0,
Adult Blue Dragon,Legendary Resistance (3/Day),"If the dragon fails a saving throw, it can cho...",0,
Adult Brass Dragon,Legendary Resistance (3/Day),"If the dragon fails a saving throw, it can cho...",0,


In [24]:
from collections import Counter

abilities = Counter(special_abilities_df.reset_index().special_ability)
abilities.most_common(15)

[('Magic Resistance', 32),
 ('Amphibious', 30),
 ('Legendary Resistance (3/Day)', 24),
 ('Innate Spellcasting', 20),
 ('Keen Smell', 19),
 ('Pack Tactics', 16),
 ('False Appearance', 15),
 ('Spider Climb', 13),
 ('Keen Hearing and Smell', 13),
 ('Spellcasting', 12),
 ('Magic Weapons', 12),
 ('Charge', 12),
 ('Shapechanger', 11),
 ('Swarm', 10),
 ('Water Breathing', 9)]

In [25]:
spellcasting_df = special_abilities_df[['Spellcasting' in x for x in special_abilities_df.index]]
spellcasting_df.head()

name,special_ability,desc,attack_bonus,damage_dice
Acolyte,Spellcasting,The acolyte is a 1st-level spellcaster. Its sp...,0,
Androsphinx,Spellcasting,The sphinx is a 12th-level spellcaster. Its sp...,0,
Archmage,Spellcasting,The archmage is an 18th-level spellcaster. Its...,0,
Cult Fanatic,Spellcasting,The fanatic is a 4th-level spellcaster. Its sp...,0,
Druid,Spellcasting,The druid is a 4th-level spellcaster. Its spel...,0,


In [26]:
innate_spellcasting_df = special_abilities_df[['Innate Spellcasting' in x for x in special_abilities_df.index]]
innate_spellcasting_df.head()

name,special_ability,desc,attack_bonus,damage_dice
Cloud Giant,Innate Spellcasting,The giant's innate spellcasting ability is Cha...,0,
Couatl,Innate Spellcasting,The couatl's spellcasting ability is Charisma ...,0,
Deep Gnome (Svirfneblin),Innate Spellcasting,The gnome's innate spellcasting ability is Int...,0,
Deva,Innate Spellcasting,The deva's spellcasting ability is Charisma (s...,0,
Djinni,Innate Spellcasting,The djinni's innate spellcasting ability is Ch...,0,


In [27]:
from itertools import islice

def parse_spellcasting(desc):
    header, *levels = desc.splitlines()
    print(header)
    print()
    for level in islice(levels, 1, None):
        print(level)
    print()
    return pd.DataFrame([header, levels[1:]]) # TODO: need to fix this

spellcasting_df.desc.apply(parse_spellcasting)

The acolyte is a 1st-level spellcaster. Its spellcasting ability is Wisdom (spell save DC 12, +4 to hit with spell attacks). The acolyte has following cleric spells prepared:

• Cantrips (at will): light, sacred flame, thaumaturgy
• 1st level (3 slots): bless, cure wounds, sanctuary

The sphinx is a 12th-level spellcaster. Its spellcasting ability is Wisdom (spell save DC 18, +10 to hit with spell attacks). It requires no material components to cast its spells. The sphinx has the following cleric spells prepared:

• Cantrips (at will): sacred flame, spare the dying, thaumaturgy
• 1st level (4 slots): command, detect evil and good, detect magic
• 2nd level (3 slots): lesser restoration, zone of truth
• 3rd level (3 slots): dispel magic, tongues
• 4th level (3 slots): banishment, freedom of movement
• 5th level (2 slots): flame strike, greater restoration
• 6th level (1 slot): heroes' feast

The archmage is an 18th-level spellcaster. Its spellcasting ability is Intelligence (spell save D

name           special_ability
Acolyte        Spellcasting                                                     ...
Androsphinx    Spellcasting                                                     ...
Archmage       Spellcasting                                                     ...
Cult Fanatic   Spellcasting                                                     ...
Druid          Spellcasting                                                     ...
Guardian Naga  Spellcasting                                                     ...
Gynosphinx     Spellcasting                                                     ...
Lich           Spellcasting                                                     ...
Mage           Spellcasting                                                     ...
Mummy Lord     Spellcasting                                                     ...
Priest         Spellcasting                                                     ...
Spirit Naga    Spellcasting                  
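
Since `parse_spellcasting` is still marked as a TODO, here is one possible direction for it: split each bulleted line into a level, a usage note and the individual spells. This is a sketch only. `parse_spell_lines` and `LINE` are hypothetical names, the sample text is the Acolyte description printed above, and the assumed layout (a header line, a blank line, then one `•` line per spell level) has only been checked against the few descriptions shown here.

```python
import re

import pandas as pd

# Sample description, copied from the Acolyte output above.
sample_desc = (
    "The acolyte is a 1st-level spellcaster. Its spellcasting ability is "
    "Wisdom (spell save DC 12, +4 to hit with spell attacks). The acolyte "
    "has following cleric spells prepared:\n"
    "\n"
    "• Cantrips (at will): light, sacred flame, thaumaturgy\n"
    "• 1st level (3 slots): bless, cure wounds, sanctuary"
)

# Matches lines such as "• 1st level (3 slots): bless, cure wounds".
LINE = re.compile(r'•\s*(?P<label>[^(]+?)\s*\((?P<usage>[^)]*)\):\s*(?P<spells>.+)')


def parse_spell_lines(desc):
    # Return the header text and a dataframe with one row per spell.
    header, *lines = desc.splitlines()
    rows = []
    for line in lines:
        m = LINE.match(line.strip())
        if m is None:
            continue  # skip blank or unrecognized lines
        label = m.group('label')
        digits = re.match(r'\d+', label)
        if label.lower().startswith('cantrip'):
            level = 0
        elif digits:
            level = int(digits.group())
        else:
            continue  # unknown label; leave it for manual inspection
        for spell in m.group('spells').split(','):
            rows.append({'level': level, 'usage': m.group('usage'),
                         'spell': spell.strip()})
    return header, pd.DataFrame(rows, columns=['level', 'usage', 'spell'])


header, spell_rows = parse_spell_lines(sample_desc)
print(spell_rows)
```

Innate spellcasting descriptions may be laid out differently, so they would likely need their own pattern.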