## Load data

Due to the OGL, the data we're considering is free to use (provided we follow the license). Although the data was originally published in [PDF format](http://media.wizards.com/2016/downloads/DND/SRD-OGL_V5.1.pdf), we can take advantage of the work of two others:
* Redditor [droiddruid](https://www.reddit.com/user/droiddruid), who published a list of [monsters from the 5e SRD](https://dl.dropboxusercontent.com/s/iwz112i0bxp2n4a/5e-SRD-Monsters.json).
* GitHub user [vorpalhex](https://github.com/vorpalhex), who extracted a list of [spells from the 5e SRD](https://github.com/vorpalhex/srd_spells).

As both of the above are JSON files, these secondary sources are considerably easier to work with.

**Remark:** This notebook is essentially a heavily commented version of the module `etl`.

In [1]:
import json
import numpy as np
import os
import pandas as pd
import re
import requests

%matplotlib inline

We'll use the following helper function to simplify loading our data sources.

In [2]:
def load_data(filepath, url):
    """Load JSON data from a local file, downloading and caching it if needed."""
    if os.path.exists(filepath):
        with open(filepath, 'r') as f:
            data = json.load(f)
    else:
        # No local copy yet: fetch the remote file and fail loudly on HTTP errors.
        response = requests.get(url)
        response.raise_for_status()
        data = json.loads(response.text)
        # Save a local copy so later runs don't need the network.
        with open(filepath, 'w') as f:
            json.dump(data, f)
    return data
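
The load-or-download pattern can be exercised without a network connection. The following is a minimal sketch, not part of the `etl` module: `load_or_fetch` and `fake_fetch` are hypothetical names, and the fetch step is passed in as a function so that a stand-in can replace `requests.get`. It also creates the parent directory, on the assumption that `data/` may not exist yet.

```python
import json
import os
import tempfile


def load_or_fetch(filepath, fetch):
    """Return cached JSON from `filepath`, calling `fetch()` only on a cache miss."""
    if os.path.exists(filepath):
        with open(filepath, 'r') as f:
            return json.load(f)
    data = fetch()
    # Create the parent directory (e.g. `data/`) if it is missing.
    os.makedirs(os.path.dirname(filepath) or '.', exist_ok=True)
    with open(filepath, 'w') as f:
        json.dump(data, f)
    return data


calls = []


def fake_fetch():
    # Hypothetical stand-in for downloading and parsing the remote file.
    calls.append(1)
    return [{'name': 'Aboleth'}]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data', 'monsters.json')
    first = load_or_fetch(path, fake_fetch)   # cache miss: calls fake_fetch
    second = load_or_fetch(path, fake_fetch)  # cache hit: reads the local file
```

Only the first call reaches the fetch function; the second is served from the local copy.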

### Load monster data

In [3]:
def load_monsters():
    # Load monster data.
    filepath = 'data/5e-SRD-Monsters.json'
    url = 'https://dl.dropboxusercontent.com/s/iwz112i0bxp2n4a/5e-SRD-Monsters.json'
    data = load_data(filepath, url)
    monsters = data[:-1]
    ogl = data[-1]
    return monsters, ogl

In [4]:
monsters, ogl = load_monsters()

### Load spell data

In [5]:
def load_spells():
    # Load spell data.
    filepath = 'data/5e-SRD-spells.json'
    url = 'https://raw.githubusercontent.com/vorpalhex/srd_spells/master/spells.json'
    return load_data(filepath, url)

In [6]:
spells = load_spells()

## Cleaning and tidying data

[TODO]: Some remarks on Hadley Wickham's paper? Goal of this notebook.

### Cleaning the spell list
We first consider the list of `spells`. Introspection shows that `spells` is a list of dictionaries, which we'll load into a pandas dataframe.

In [7]:
spells_df = pd.DataFrame(spells)
spells_df = spells_df.set_index('name')
spells_df.head()

name,casting_time,classes,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Acid Splash,1 action,"[sorcerer, wizard]","{'material': False, 'raw': 'V, S', 'somatic': ...",You hurl a bubble of acid. Choose one creature...,Instantaneous,,cantrip,60 feet,False,Conjuration,"[sorcerer, wizard, cantrip]",Conjuration cantrip
Alarm,1 action,"[ranger, wizard]","{'material': True, 'materials_needed': ['a tin...",You set an alarm against unwanted intrusion. C...,8 hours,,1,30 feet,True,abjuration,"[ranger, wizard, level1]",1st-level abjuration (ritual)
Animal Friendship,1 action,"[bard, druid, ranger]","{'material': True, 'materials_needed': ['a mor...",This spell lets you convince a beast that you ...,24 hours,When you cast this spell using a spell slot of...,1,30 feet,False,enchantment,"[bard, druid, ranger, level1]",1st-level enchantment
Bane,1 action,"[bard, cleric]","{'material': True, 'materials_needed': ['a dro...",Up to three creatures of your choice that you ...,"Concentration, up to 1 minute",When you cast this spell using a spell slot of...,1,30 feet,False,enchantment,"[bard, cleric, level1]",1st-level enchantment
Blade Ward,1 action,"[bard, sorcerer, warlock, wizard]","{'material': False, 'raw': 'V, S', 'somatic': ...",You extend your hand and trace a sigil of ward...,1 Round,,cantrip,Self,False,Abjuration,"[bard, sorcerer, warlock, wizard, cantrip]",Abjuration cantrip


Unfortunately, the spell list is incomplete and also contains material not covered by the OGL. We'll first find the missing rows of the table.

In [8]:
from reference import (BARD_SPELLS, CLERIC_SPELLS, DRUID_SPELLS,
                       PALADIN_SPELLS, RANGER_SPELLS, SORCERER_SPELLS,
                       WARLOCK_SPELLS, WIZARD_SPELLS)

srd_spells = set()
srd_spells.update(BARD_SPELLS, CLERIC_SPELLS, DRUID_SPELLS,
                  PALADIN_SPELLS, RANGER_SPELLS, SORCERER_SPELLS,
                  WARLOCK_SPELLS, WIZARD_SPELLS)

One simple way to find our missing values (which might be misnamed) is to sort our list of spells and do a bisection search.

In [9]:
from bisect import bisect

missing = srd_spells - set(spells_df.index)

# Use a new name so the raw `spells` list loaded earlier is not overwritten.
known_spells = sorted(spells_df.index)
for spell in missing:
    i = bisect(known_spells, spell)
    print(spell, known_spells[i-1:i+1])

Arcane Hand ['Arcane Gate', 'Arcane Lock']
Freezing Sphere ['Freedom of Movement', 'Friends']
Tiny Hut ['Time Stop', 'Tongues']
Private Sanctum ['Prismatic Wall', 'Produce Flame']
Hideous Laughter ['Hex', 'Hold Monster']
Floating Disk ['Flesh to Stone', 'Fly']
Resilient Sphere ['Remove Curse', 'Resistance']
Irresistible Dance ['Invisibility', 'Jump']
Black Tentacles ['Bestow Curse', 'Blade Barrier']
Pass without Trace ['Pass Without Trace', 'Passwall']
Faithful Hound ['Faerie Fire', 'False Life']
Magnificent Mansion ['Magic Weapon', 'Major Image']
Arcane Sword ['Arcane Lock', 'Astral Projection']
Instant Summons ['Insect Plague', 'Investiture of Flame']
Secret Chest ['Searing Smite', 'See Invisibility']
Detect Poison and Disease ['Detect Magic', 'Detect Poison or Disease']
Meld into Stone ['Meld Into Stone', 'Mending']
Telepathic Bond ['Telekinesis', 'Telepathy']
Acid Arrow ['Absorb Elements', 'Acid Splash']
Arcanist's Magic Aura ['Arcane Lock', 'Astral Projection']
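
As a small illustration of the lookup above, the sketch below wraps the bisection in a hypothetical helper, `neighbours`, and runs it on a toy list. It assumes plain string comparison, so uppercase letters sort before lowercase ones, which is why `'Meld into Stone'` lands directly after `'Meld Into Stone'`. The `max(i - 1, 0)` guard is an addition: with the bare slice `[i-1:i+1]`, a name that sorts before every entry gives an empty result.

```python
from bisect import bisect


def neighbours(sorted_names, name):
    """Return the entries of `sorted_names` on either side of where `name` would go."""
    i = bisect(sorted_names, name)
    return sorted_names[max(i - 1, 0):i + 1]


# Toy list; the real notebook uses the full index of `spells_df`.
known = sorted(['Mending', 'Message', 'Meld Into Stone', 'Acid Splash'])

near_meld = neighbours(known, 'Meld into Stone')
near_acid = neighbours(known, 'Acid Arrow')
print(near_meld)
print(near_acid)
```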


In [10]:
rename = {'Pass Without Trace': 'Pass without Trace',
          'Detect Poison or Disease': 'Detect Poison and Disease',
          'Meld Into Stone': 'Meld into Stone'}

spells_df = spells_df.rename(index=rename)
missing -= {'Pass without Trace', 'Detect Poison and Disease', 'Meld into Stone'}

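To add the remaining missing spells, we'll enter their data manually. The records are stored in the `reference` module; note that importing `missing` rebinds the name previously used for the set of missing spell names.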

In [11]:
from reference import missing

missing_df = pd.DataFrame.from_dict(missing).transpose()
spells_df = pd.concat([spells_df, missing_df])
spells_df = spells_df.sort_index()

We now remove the non-SRD data, which was pulled from a variety of sources. 

In [12]:
not_in_srd = set(spells_df.index) - srd_spells
spells_df.drop(not_in_srd, inplace=True)

Now we'll work column by column to clean and tidy this data frame.

#### `casting_time`

We can summarize the various values of this column by using the following:

    spells_df.groupby('casting_time').count()['level']
    
Notice that there are some minor formatting inconsistencies and typos, which are easily corrected.

Furthermore, the data in this column form a partially ordered set.

In [13]:
# spells_df.groupby('casting_time').count()['level']

In [14]:
spells_df.casting_time = spells_df.casting_time.str.rstrip('.')

In [15]:
spells_df[spells_df.casting_time.str.match('1 minue') | spells_df.casting_time.str.match('1 minutes')]

Unnamed: 0,casting_time,classes,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Augury,1 minue,[cleric],"{'material': True, 'materials_needed': ['speci...","* *Weal*, for good results\n\n* *Woe*, for bad...",Instantaneous,,2,Self,True,divination,"[cleric, level2]",2nd-level divination (ritual)
Find the Path,1 minutes,"[bard, cleric, druid]","{'material': True, 'materials_needed': ['a set...","This spell allows you to find the shortest, mo...","Concentration, up to 1 day",,6,Self,False,divination,"[bard, cleric, druid, level6]",6th-level divination


In [16]:
spells_df.loc[['Augury', 'Find the Path'], 'casting_time'] = '1 minute'

In [17]:
spells_df['casting_time'] = spells_df.casting_time.astype('category')
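
The summarize, fix, and convert steps for this column can be sketched on a toy Series. The values below are illustrative only (the two misspellings mirror the ones found above), and `value_counts` is used here as a shorthand for the `groupby(...).count()` summary.

```python
import pandas as pd

# Toy data; not taken from the real spell list.
casting_time = pd.Series(
    ['1 action', '1 action.', '1 minue', '1 minute', '1 minutes', '1 action'])

# Strip trailing periods, then count how often each value occurs.
stripped = casting_time.str.rstrip('.')
counts = stripped.value_counts()

# Values that occur only once are good candidates for typos.
rare = sorted(counts[counts == 1].index)

# Correct the typos and store the column as a categorical.
fixed = stripped.replace({'1 minue': '1 minute', '1 minutes': '1 minute'})
fixed = fixed.astype('category')
```

Rare values are only candidates: here the correctly spelled `'1 minute'` is also rare, so the list still needs a human eye.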

#### `classes`

Each entry is a list of "player classes", and indicates whether a spell can be cast by a member of that class. We'll remove this data from the `spells_df` and collect it in a separate data frame.

In [18]:
classes = set()
for classlist in spells_df.classes:
    classes.update(classlist)
print(sorted(classes))

['bard', 'cleric', 'druid', 'paladin', 'ranger', 'sorcerer', 'warlock', 'wizard']


In [19]:
assert all(spells_df.classes.apply(type) == list) # every entry is a list

In [20]:
assert len(spells_df.classes[spells_df.classes.apply(len) == 0]) == 0 # every spell can be cast by some class

In [21]:
class_spells_df = pd.DataFrame({class_: spells_df['classes'].apply(lambda x: class_ in x) for class_ in classes})
spells_df.drop('classes', axis=1, inplace=True)

class_spells_df.head()

Unnamed: 0,bard,cleric,druid,paladin,ranger,sorcerer,warlock,wizard
Acid Arrow,False,False,False,False,False,False,False,True
Acid Splash,False,False,False,False,False,True,False,True
Aid,False,True,False,True,False,False,False,False
Alarm,False,False,False,False,True,False,False,True
Alter Self,False,False,False,False,False,True,False,True
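
For comparison, the same kind of membership table can be built by exploding the lists and cross-tabulating. This is an alternative sketch rather than the approach used in this notebook; it assumes a pandas version that provides `Series.explode`, and the three rows are copied from the output above.

```python
import pandas as pd

classes = pd.Series({
    'Acid Splash': ['sorcerer', 'wizard'],
    'Aid': ['cleric', 'paladin'],
    'Alarm': ['ranger', 'wizard'],
})

# One row per (spell, class) pair.
exploded = classes.explode()

# Count the pairs, then turn the counts into booleans.
membership = pd.crosstab(exploded.index.to_numpy(), exploded.to_numpy()).gt(0)
print(membership)
```

Unlike the dictionary comprehension, this version only produces columns for classes that actually occur in the data.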


[TODO] we can verify this data, and will do so in the `etl` module.

#### `components`

Each entry should be a dictionary, with either three or four keys, one of which is the raw text value. We'll normalize this by only keeping the raw value (from which the other values are derived).

In [22]:
spells_df[spells_df.components.isnull()]

Unnamed: 0,casting_time,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Clone,1 hour,,This spell grows an inert duplicate of a livin...,Instantaneous,,8,Touch,False,necromancy,"[wizard, level8]",8th-level necromancy


Note that one of the values is null. We'll replace that value with the text from the primary source document.

In [23]:
spells_df.loc['Clone', 'components'] = {'raw': 'V, S, M (a diamond worth at least 1,000 gp and at least 1 cubic inch of flesh of the creature that is to be cloned, which the spell consumes, and a vessel worth at least 2,000 gp that has a sealable lid and is large enough to hold a Medium creature, such as a huge urn, coffin, mud filled cyst in the ground, or crystal container filled with salt water)'}

In [24]:
spells_df['components'] = spells_df.components.apply(lambda x: x['raw'])
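
Since the other values are derived from the raw text, they can be recovered later if needed. The following is a rough sketch of such a parser; `parse_components` and its output keys are hypothetical and do not reproduce the exact structure of the original dictionaries.

```python
def parse_components(raw):
    """Split a raw components string into flags plus the material description."""
    # Everything before the first '(' is the list of component letters.
    head, _, tail = raw.partition('(')
    letters = {part.strip() for part in head.split(',') if part.strip()}
    # Whatever remains is the material description, minus the closing parenthesis.
    tail = tail.strip()
    if tail.endswith(')'):
        tail = tail[:-1]
    return {
        'verbal': 'V' in letters,
        'somatic': 'S' in letters,
        'material': 'M' in letters,
        'materials': tail or None,
    }


simple = parse_components('V, S')
with_material = parse_components('V, S, M (a copper piece)')
```

Splitting at the first parenthesis before splitting on commas matters, because material descriptions often contain commas themselves.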

#### `description`

This field contains the raw text description of a spell's effects as a str. There are two spells which are lacking a description field.

In [25]:
spells_df[spells_df.description.isnull()]

Unnamed: 0,casting_time,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Enhance Ability,1 action,"V, S, M (fur or a feather from a beast)",,"Concentration, up to 1 hour",When you cast this spell using a spell slot of...,2,Touch,False,transmutation,"[bard, cleric, druid, sorcerer, level2]",2nd-level transmutation
Protection from Energy,1 action,"V, S",,"Concentration, up to 1 minute",,3,Touch,False,abjuration,"[cleric, druid, ranger, sorcerer, wizard, level3]",3rd-level abjuration


In [26]:
spells_df.loc['Protection from Energy', 'description'] = 'For the duration, the willing creature you touch has resistance to one damage type of your choice: acid, cold, fire, lightning, or thunder.'

In [27]:
spells_df.loc['Enhance Ability', 'description'] = """You touch a creature and bestow upon it a magical enhancement. Choose one of the following effects; the target gains that effect until the spell ends.
Bear’s Endurance. The target has advantage on Constitution checks. It also gains 2d6 temporary hit points, which are lost when the spell ends.
Bull’s Strength. The target has advantage on Strength checks, and his or her carrying capacity doubles.
Cat’s Grace. The target has advantage on Dexterity checks. It also doesn’t take damage from falling 20 feet or less if it isn’t incapacitated.
Eagle’s Splendor. The target has advantage on Charisma checks.
Fox’s Cunning. The target has advantage on Intelligence checks.
Owl’s Wisdom. The target has advantage on Wisdom checks."""

In [28]:
assert all(spells_df.description.apply(type)==str)

In [29]:
assert all(spells_df.description.apply(len) > 0)

#### `duration`

There are some formatting inconsistencies here, which we should correct.

In [30]:
# spells_df.groupby('duration').count()['level']

In [31]:
spells_df.duration = spells_df.duration.str.rstrip('.')

In [32]:
spells_df.duration.loc[spells_df.duration.str.contains('1 Minute|1 Round|1 hours|one')]

Mirror Image                           1 Minute
Water Walk                              1 hours
Weird           Concentration, up to one minute
Name: duration, dtype: object

In [33]:
spells_df.loc['Weird', 'duration'] = 'Concentration, up to 1 minute'
spells_df.loc['Mirror Image', 'duration'] = '1 minute'
spells_df.loc['Water Walk', 'duration'] = '1 hour'

#### `higher_levels`

[TODO] This will require some extra processing, since a good number of spells have stronger versions available. However, for those that do not, we'll simply fill the `NaN` values with empty strings.

In [34]:
spells_df[spells_df.higher_levels.notnull()].shape

(88, 11)

In [35]:
spells_df['higher_levels'] = spells_df['higher_levels'].fillna('')

#### `level`

The level of the spell. We'll use some of this data in our training. Cantrips are the lowest level of spells; the rest range from level 1 to level 9.

In [36]:
spells_df.groupby('level').count()[spells_df.columns[0]]

level
1           2
2           2
3           1
4           5
5           2
6           3
7           2
1          47
2          52
3          41
4          26
5          35
6          28
7          18
8          15
9          15
cantrip    24
wind        1
Name: casting_time, dtype: int64

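What. How is `wind` a level? (Levels 1 through 7 also appear twice, presumably because some rows store `level` as an integer and others as a string; the numeric cast below merges them.)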

In [37]:
spells_df[spells_df.level=='wind']

Unnamed: 0,casting_time,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Control Weather,10 minutes,"V, S, M (burning incense and bits of earth and...",You take control of the weather within 5 miles...,"Concentration, up to 8 hours",,wind,Self (5-mile radius),False,,"[cleric, druid, wizard, level8]",Wind


In [38]:
spells_df.loc['Control Weather', 'level'] = '8'

For the rest, which are all cantrips, we'll cast the `level` column to a numeric type, then replace the resulting `NaN`s with 0.

In [39]:
spells_df['level'] = pd.to_numeric(spells_df['level'], errors='coerce')
spells_df['level'] = spells_df['level'].fillna(0)
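
The effect of the coercion is easiest to see on a toy column that mixes integers, numeric strings, and the label `cantrip`. This is a sketch on made-up values; the final `astype(int)` is an optional extra step that the notebook itself does not take.

```python
import pandas as pd

# Toy column with mixed types, similar in spirit to the raw `level` data.
level = pd.Series([1, '2', 'cantrip', '8'], dtype=object)

# Anything that cannot be parsed as a number becomes NaN.
numeric = pd.to_numeric(level, errors='coerce')

# Cantrips are level 0; after filling, the column can be stored as integers.
clean = numeric.fillna(0).astype(int)
```

Because `errors='coerce'` silently turns any unparseable value into `NaN`, oddities such as `wind` need to be fixed before this step, otherwise they would be counted as cantrips.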

#### `range`

This column contains some minor typos and formatting issues, which are easily cleaned.

In [40]:
# spells_df.groupby('range').count()['level']

In [41]:
spells_df[spells_df.range.str.match('self') |
          spells_df.range.str.match('touch') |
          spells_df.range.str.match(r'Self \(30-foot radius$') |
          spells_df.range.str.contains('sphere') |
          spells_df.range.str.contains('1OO')]

Unnamed: 0,casting_time,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Antimagic Field,1 action,"V, S, M (a pinch of powdered iron or iron fili...",A 10-foot-radius invisible sphere of antimagic...,"Concentration, up to 1 hour",,8.0,Self (10-foot-radius sphere),False,abjuration,"[cleric, wizard, level8]",8th-level abjuration
Blur,1 action,V,"Your body becomes blurred, shifting and waveri...","Concentration, up to 1 minute",,2.0,self,False,illusion,"[sorcerer, wizard, level2]",2nd-Level illusion
Branding Smite,1 action,V,The next time you hit a creature with a weapon...,"Concentration, up to 1 minute",When you cast this spell using a spell slot of...,2.0,self,False,evocation,"[paladin, level2]",2nd-Level evocation
Detect Thoughts,1 action,"V, S, M (a copper piece)","For the duration, you can read the thoughts of...","Concentration, up to 1 minute",,2.0,self,False,divination,"[bard, sorcerer, wizard, level2]",2nd-Level divination
Lightning Bolt,1 action,"V, S, M (a bit of fur and a rod of amber, crys...",A stroke of lightning forming a line 100 feet ...,Instantaneous,When you cast this spell using a spell slot of...,3.0,Self (1OO-foot line),False,evocation,"[sorcerer, wizard, level3]",3rd-level evocation
Speak with Plants,1 action,"V, S",You imbue plants within 30 feet of you with li...,10 minutes,,3.0,Self (30-foot radius,False,transmutation,"[bard, druid, ranger, level3]",3rd-level transmutation
Tiny Hut,1 minute,"V, S, M (a small crystal bead)",A 10 foot radius immobile dome of force spring...,8 hours,,3.0,Self (10-foot radius hemisphere),True,evocation,,


In [42]:
spells_df.loc[['Blur', 'Branding Smite', 'Detect Thoughts'], 'range'] = 'Self'
spells_df.loc['Speak with Plants', 'range'] = 'Self (30-foot radius)'
spells_df.loc['Antimagic Field', 'range'] = 'Self (10-foot radius)'
spells_df.loc['Lightning Bolt', 'range'] = 'Self (100-foot line)'

In [43]:
spells_df['range'] = spells_df.range.astype('category')

#### `ritual`

Looks clean. Everything is a bool.

In [44]:
assert all(spells_df.ritual.notnull())
assert all(spells_df.ritual.apply(type) == bool)

#### `school`

Again, there is some inconsistent formatting, and one null value that needs to be corrected.

In [45]:
# spells_df.groupby('school').count()['level']

In [46]:
spells_df[spells_df.school.isnull()]

Unnamed: 0,casting_time,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Control Weather,10 minutes,"V, S, M (burning incense and bits of earth and...",You take control of the weather within 5 miles...,"Concentration, up to 8 hours",,8.0,Self (5-mile radius),False,,"[cleric, druid, wizard, level8]",Wind


In [47]:
spells_df.loc['Control Weather', 'school'] = 'transmutation'

In [48]:
spells_df.school = spells_df.school.str.lower()

In [49]:
spells_df[spells_df.school.str.contains('transmuation')]

Unnamed: 0,casting_time,components,description,duration,higher_levels,level,range,ritual,school,tags,type
Barkskin,1 action,"V, S, M (a handful of oak bark)",You touch a willing creature. Until the spell ...,"Concentration, up to 1 hour",,2.0,Touch,False,transmuation,"[druid, ranger, level2]",2nd-level transmuation


In [50]:
spells_df.loc['Barkskin', 'school'] = 'transmutation'

In [51]:
spells_df['school'] = spells_df.school.astype('category')

#### `tags` and `type`

Both of these columns mostly repeat information found in other columns.

In [52]:
spells_df.drop(['tags', 'type'], inplace=True, axis=1)

### Finalizing the `spells_df` and related data frames

We can join the `spells_df` and `class_spells_df` data frames, if we're interested.

In [53]:
spells_df = pd.concat([spells_df, class_spells_df], keys=['spell_data', 'class_can_cast'], axis=1)
spells_df.head()

Unnamed: 0_level_0,spell_data,spell_data,spell_data,spell_data,spell_data,spell_data,spell_data,spell_data,spell_data,class_can_cast,class_can_cast,class_can_cast,class_can_cast,class_can_cast,class_can_cast,class_can_cast,class_can_cast
Unnamed: 0_level_1,casting_time,components,description,duration,higher_levels,level,range,ritual,school,bard,cleric,druid,paladin,ranger,sorcerer,warlock,wizard
Acid Arrow,1 action,"V, S, M (powdered rhubarb leaf and an adder's ...",A shimmering green arrow streaks toward a targ...,Instantaneous,At Higher Levels. When you cast this spell usi...,2.0,90 feet,False,evocation,False,False,False,False,False,False,False,True
Acid Splash,1 action,"V, S",You hurl a bubble of acid. Choose one creature...,Instantaneous,,0.0,60 feet,False,conjuration,False,False,False,False,False,True,False,True
Aid,1 action,"V, S, M (a tiny strip of white cloth)",Your spell bolsters your allies with toughness...,8 hours,When you cast this spell using a spell slot of...,2.0,30 feet,False,abjuration,False,True,False,True,False,False,False,False
Alarm,1 action,"V, S, M (a tiny bell and a piece of fine silve...",You set an alarm against unwanted intrusion. C...,8 hours,,1.0,30 feet,True,abjuration,False,False,False,False,True,False,False,True
Alter Self,1 action,"V, S",You assume a different form. When you cast the...,"Concentration, up to 1 hour",,2.0,Self,False,transmutation,False,False,False,False,False,True,False,True


### Cleaning the monster list

We next focus on the `monsters` dataset. The JSON file contains data for 325 creatures, as well as the OGL license under which they are released. The data for each creature is called a _stat block_. Each stat block is a dictionary; in order to tidy our data we'll work with these dictionaries.

In [54]:
type(monsters), len(monsters)

(list, 325)

In [55]:
def get_monster_df(monsters):
    df = pd.DataFrame(monsters)
    df = df.set_index(['name'])
    df = fix_saves(df)
    df = fix_skills(df)
    df.challenge_rating = df.challenge_rating.apply(fix_challenge_rating)
    df = df.reindex(columns=_column_order)
    columns_with_nan = df.columns[df.isnull().any(axis=0)]
    for column in columns_with_nan:
        replace_nan(df, column, list)
    return df


def fix_saves(df):
    mods = [stat + '_mod' for stat in _stats]
    saves = [stat + '_save' for stat in _stats]
    for stat, mod in zip(_stats, mods):
        df[mod] = np.floor((df[stat] - 10) / 2)
    for mod, save in zip(mods, saves):
        df[save] = df[save].fillna(df[mod])
    return df


def fix_skills(df):
    for skill, stat in _skills_stats.items():
        df[skill] = df[skill].fillna(df[stat+'_mod'])
    return df


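def fix_challenge_rating(cr):
    # Convert a challenge rating such as '1/4' or '10' to a number.
    pattern = re.compile(r'(?P<p>\d)/(?P<q>\d)$|(?P<n>\d+)')
    g = pattern.match(cr)
    if g is None:
        raise ValueError('unrecognized challenge rating: {!r}'.format(cr))
    if g.group('n') is not None:
        # Whole-number rating.
        return int(g.group('n'))
    # Fractional rating: divide numerator by denominator.
    return int(g.group('p')) / int(g.group('q'))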


def replace_nan(df, column, func):
    for x in df.loc[df[column].isnull(), column].index:
        df.at[x, column] = func()
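
Two of the conversions above are simple enough to check by hand. The sketch below restates them in plain Python under hypothetical names (`ability_modifier`, `challenge_rating`): the modifier is the ability score minus 10, halved and rounded down, and a challenge rating is either a whole number or a fraction. Using `fractions.Fraction` is a swapped-in alternative to the regular expression, and unlike `fix_challenge_rating` it always returns a float.

```python
from fractions import Fraction


def ability_modifier(score):
    """Ability modifier: (score - 10) / 2, rounded down."""
    return (score - 10) // 2


def challenge_rating(text):
    """Convert a challenge rating such as '1/4' or '10' to a float."""
    return float(Fraction(text))


modifiers = {score: ability_modifier(score) for score in (8, 10, 15, 20)}
ratings = [challenge_rating(cr) for cr in ('1/8', '1/4', '1/2', '10')]
print(modifiers)
print(ratings)
```

Floor division matters for scores below 10: a score of 8 or 9 gives a modifier of -1, not 0.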

We'll also need the following constants, which were (originally) extracted directly from the aggregated stat blocks, or created using domain knowledge.

In [56]:
_stats = ['strength', 'dexterity', 'constitution', 'intelligence', 'wisdom', 'charisma']

_mechanics = ['challenge_rating', 'armor_class', 'hit_dice', 'hit_points',
              'condition_immunities', 'damage_immunities',
              'damage_resistances', 'damage_vulnerabilities', 'actions',
              'reactions', 'legendary_actions', 'special_abilities', 'size',
              'speed', 'senses']

_flavor = ['languages', 'subtype', 'type', 'alignment']

_stat_scores = ['strength', 'strength_mod', 'strength_save', 'dexterity',
                'dexterity_mod', 'dexterity_save', 'constitution',
                'constitution_mod', 'constitution_save', 'intelligence',
                'intelligence_mod', 'intelligence_save', 'wisdom',
                'wisdom_mod', 'wisdom_save', 'charisma', 'charisma_mod',
                'charisma_save']

_skills = ['acrobatics', 'arcana', 'athletics', 'deception', 'history',
           'insight', 'intimidation', 'investigation', 'medicine', 'nature',
           'perception', 'performance',  'persuasion', 'religion', 'stealth',
           'survival']

_column_order = _mechanics + _flavor + _stat_scores + _skills

_skills_stats = {'acrobatics': 'dexterity',
                 'arcana': 'intelligence',
                 'athletics': 'strength',
                 'deception': 'charisma',
                 'history': 'intelligence',
                 'insight': 'wisdom',
                 'intimidation': 'charisma',
                 'investigation': 'intelligence',
                 'medicine': 'wisdom',
                 'nature': 'intelligence',
                 'perception': 'wisdom',
                 'performance': 'charisma',
                 'persuasion': 'charisma',
                 'religion': 'intelligence',
                 'stealth': 'dexterity',
                 'survival': 'wisdom'}


With that preparatory work completed, we can now load the `monster_df`, which contains the stat blocks for each monster.

In [57]:
monster_df = get_monster_df(monsters)

In [58]:
monster_df.head()

name,challenge_rating,armor_class,hit_dice,hit_points,condition_immunities,damage_immunities,damage_resistances,damage_vulnerabilities,actions,reactions,...,intimidation,investigation,medicine,nature,perception,performance,persuasion,religion,stealth,survival
Aboleth,10.0,17,18d10,135,,,,,"[{'name': 'Multiattack', 'desc': 'The aboleth ...",[],...,4.0,4.0,2.0,4.0,10.0,4.0,4.0,4.0,-1.0,2.0
Acolyte,0.25,10,2d8,9,,,,,"[{'name': 'Club', 'desc': 'Melee Weapon Attack...",[],...,0.0,0.0,4.0,0.0,2.0,0.0,0.0,2.0,0.0,2.0
Adult Black Dragon,14.0,19,17d12,195,,acid,,,"[{'name': 'Multiattack', 'desc': 'The dragon c...",[],...,3.0,2.0,1.0,2.0,11.0,3.0,3.0,2.0,7.0,1.0
Adult Blue Dracolich,17.0,19,18d12,225,"charmed, exhaustion, frightened, paralyzed, po...","lightning, poison",necrotic,,"[{'name': 'Multiattack', 'desc': 'The dracolic...",[],...,4.0,3.0,2.0,3.0,12.0,4.0,4.0,3.0,0.0,2.0
Adult Blue Dragon,16.0,19,18d12,225,,lightning,,,"[{'name': 'Multiattack', 'desc': 'The dragon c...",[],...,4.0,3.0,2.0,3.0,12.0,4.0,4.0,3.0,5.0,2.0


## Dealing with nested data

A number of the columns (`actions`, `reactions`, et cetera) in the above data frame contain lists of values; this data would be easier to analyze if it were reformatted. To do so, we'll use a couple of different patterns:

In [59]:
def make_sub_df(x, columns=None, rename=None, index=None):
    df = pd.DataFrame(x.iloc[0], columns=columns)
    df = df.rename(index=str, columns=rename)
    df = df.set_index(index)
    return df

In [60]:
def get_index(df, column):
    index = set()
    for y in df[column]:
        index.update(y.split(', '))
    index.discard('')
    return index

def split_categorical(x, column='', index=None, splitter=None):
    a = x.iloc[0]
    if splitter:
        categories = splitter(a[column])
    else:
        categories = a[column].split(', ')
    return pd.DataFrame({category: category in categories for category in index},
                        dtype='bool',
                        index=[x.name])

def clean_index(df, fill=None):
    new_df = df.reset_index().set_index('name')
    new_df = new_df.drop('level_1', axis=1)
    return new_df.fillna(value=fill)
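
To make the target shape concrete, the sketch below turns a comma-separated column into one boolean column per category. It uses pandas' built-in `Series.str.get_dummies` rather than the helpers above, so treat it as an illustration of the output and not of how `split_categorical` works; the three values are toy data.

```python
import pandas as pd

# Toy data: an empty string means the monster has no condition immunities.
condition_immunities = pd.Series({
    'Aboleth': '',
    'Doppelganger': 'charmed',
    'Adult Blue Dracolich': 'charmed, exhaustion, frightened',
})

# One indicator column per category, split on the same ', ' separator.
indicator_df = condition_immunities.str.get_dummies(sep=', ').astype(bool)
print(indicator_df)
```

Rows with an empty string come out as all `False`, which corresponds to discarding `''` in `get_index`.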

### Condition immunities

We first correct one minor issue:

In [61]:
monster_df[monster_df['damage_immunities'] == 'charmed']

name,challenge_rating,armor_class,hit_dice,hit_points,condition_immunities,damage_immunities,damage_resistances,damage_vulnerabilities,actions,reactions,...,intimidation,investigation,medicine,nature,perception,performance,persuasion,religion,stealth,survival
Doppelganger,3.0,14,8d8,52,,charmed,,,"[{'name': 'Multiattack', 'desc': 'The doppelga...",[],...,2.0,0.0,1.0,0.0,1.0,2.0,2.0,0.0,4.0,1.0


In [62]:
monster_df.loc['Doppelganger', 'condition_immunities'] = 'charmed'
monster_df.loc['Doppelganger', 'damage_immunities'] = ''

In [63]:
monster_df[monster_df['damage_resistances'] == 'damage from spells; non magical bludgeoning, piercing, and slashing (from stoneskin)']

name,challenge_rating,armor_class,hit_dice,hit_points,condition_immunities,damage_immunities,damage_resistances,damage_vulnerabilities,actions,reactions,...,intimidation,investigation,medicine,nature,perception,performance,persuasion,religion,stealth,survival
Archmage,12.0,12,18d8,99,,,"damage from spells; non magical bludgeoning, p...",,"[{'name': 'Dagger', 'desc': 'Melee or Ranged W...",[],...,3.0,5.0,2.0,5.0,2.0,3.0,3.0,5.0,2.0,2.0


We can now split the `condition_immunities` column.

In [64]:
index = get_index(monster_df, 'condition_immunities')
condition_immunities_df = monster_df.groupby('name').apply(split_categorical,
                                                           column='condition_immunities',
                                                           index=index)
condition_immunities_df = clean_index(condition_immunities_df, fill=False)

### Damage immunities, damage resistances, and damage vulnerabilities

Unfortunately, the previous pattern doesn't work as nicely for these columns. In order to deal with this, we'll first clean things up.

In [65]:
monster_df[monster_df['damage_resistances'].str.contains('damage')]

name,challenge_rating,armor_class,hit_dice,hit_points,condition_immunities,damage_immunities,damage_resistances,damage_vulnerabilities,actions,reactions,...,intimidation,investigation,medicine,nature,perception,performance,persuasion,religion,stealth,survival
Archmage,12.0,12,18d8,99,,,"damage from spells; non magical bludgeoning, p...",,"[{'name': 'Dagger', 'desc': 'Melee or Ranged W...",[],...,3.0,5.0,2.0,5.0,2.0,3.0,3.0,5.0,2.0,2.0
Grick,2.0,14,6d8,27,,,"bludgeoning, piercing, and slashing damage fro...",,"[{'name': 'Multiattack', 'desc': 'The grick ma...",[],...,-3.0,-4.0,2.0,-4.0,2.0,-3.0,-3.0,-4.0,2.0,2.0


In [66]:
monster_df[monster_df['damage_immunities'].str.contains('damage')]

name,challenge_rating,armor_class,hit_dice,hit_points,condition_immunities,damage_immunities,damage_resistances,damage_vulnerabilities,actions,reactions,...,intimidation,investigation,medicine,nature,perception,performance,persuasion,religion,stealth,survival
Werebear,5.0,10,18d8,135,,"bludgeoning, piercing, and slashing damage fro...",,,"[{'name': 'Multiattack', 'desc': 'In bear form...",[],...,1.0,0.0,1.0,0.0,7.0,1.0,1.0,0.0,0.0,1.0
Wereboar,4.0,10,12d8,78,,"bludgeoning, piercing, and slashing damage fro...",,,[{'name': 'Multiattack (Humanoid or Hybrid For...,[],...,-1.0,0.0,0.0,0.0,2.0,-1.0,-1.0,0.0,0.0,0.0
Wererat,2.0,12,6d8,33,,"bludgeoning, piercing, and slashing damage fro...",,,[{'name': 'Multiattack (Humanoid or Hybrid For...,[],...,-1.0,0.0,0.0,0.0,2.0,-1.0,-1.0,0.0,4.0,0.0
Weretiger,4.0,12,16d8,120,,"bludgeoning, piercing, and slashing damage fro...",,,[{'name': 'Multiattack (Humanoid or Hybrid For...,[],...,0.0,0.0,1.0,0.0,5.0,0.0,0.0,0.0,4.0,1.0
Werewolf,3.0,11,9d8,58,,"bludgeoning, piercing, and slashing damage fro...",,,[{'name': 'Multiattack (Humanoid or Hybrid For...,[],...,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,1.0,0.0


In [67]:
monster_df[monster_df.damage_resistances.str.contains("attacks that aren't silvered")]

name,challenge_rating,armor_class,hit_dice,hit_points,condition_immunities,damage_immunities,damage_resistances,damage_vulnerabilities,actions,reactions,...,intimidation,investigation,medicine,nature,perception,performance,persuasion,religion,stealth,survival


In [68]:
monster_df['damage_resistances'] = monster_df.damage_resistances.str.replace('slashing damage', 'slashing')
monster_df['damage_immunities'] = monster_df.damage_immunities.str.replace('slashing damage', 'slashing')
monster_df.loc['Archmage', 'damage_resistances'] = \
    'damage from spells; bludgeoning, piercing, and slashing from nonmagical weapons'
monster_df.loc['Imp', 'damage_resistances'] = \
    "cold; bludgeoning, piercing, and slashing from nonmagical weapons that aren't silvered"

Now that our data is cleaned, we'll create a custom indexer and splitter.

In [69]:
damage_xs = set()
for column in ['damage_immunities', 'damage_resistances', 'damage_vulnerabilities']:
    damage_xs.update(monster_df[column])

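def get_damage_types(x):
    # Split a damage description such as
    # 'cold; bludgeoning, piercing, and slashing from nonmagical weapons'
    # into a set of individual damage types, each keeping its qualifier.
    damage_types = set()
    queue = [x]
    while queue:
        y = queue.pop()
        if '; ' in y:
            # Semicolons separate independent clauses.
            queue.extend(y.split('; '))
        elif ' and ' in y:
            # 'a, b, and c <qualifier>': attach the qualifier to every type.
            if ', ' in y:
                z = y.split(', ')
            else:
                z = y.split(' ', maxsplit=1)
            w = z[-1].split()
            z[-1] = w[1]
            end = ' '.join(w[2:])
            queue.extend(' '.join([word, end]) for word in z)
        elif ', ' in y:
            # A plain comma-separated list of types.
            queue.extend(y.split(', '))
        else:
            damage_types.add(y)
    return damage_types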

damage_types = set()
for x in damage_xs:
    damage_types.update(get_damage_types(x))
damage_types.discard('')
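
To make the splitter's behavior concrete, here is a minimal standalone sketch of the same idea. The helper name `split_damage_types` is hypothetical and not part of the `etl` module. It assumes the cleaned format above, where `; ` separates clauses and a trailing qualifier such as `from nonmagical weapons` applies to every type in an `and` list.

```python
def split_damage_types(text):
    """Hypothetical stand-in for the splitter above, for illustration only."""
    results = set()
    for clause in text.split('; '):
        if ' and ' in clause:
            # 'a, b, and c <qualifier>': the qualifier applies to a, b, and c.
            head, tail = clause.rsplit(' and ', maxsplit=1)
            last_type, _, qualifier = tail.partition(' ')
            types = [t for t in head.rstrip(',').split(', ') if t] + [last_type]
            results.update(' '.join([t, qualifier]).strip() for t in types)
        else:
            # A plain comma-separated list (or a single type).
            results.update(clause.split(', '))
    return results


# The Imp's cleaned resistance string from the cell above.
imp = "cold; bludgeoning, piercing, and slashing from nonmagical weapons that aren't silvered"
for damage_type in sorted(split_damage_types(imp)):
    print(damage_type)
```

Each physical damage type keeps its qualifier, so a qualified resistance stays distinct from an unqualified one when the strings are later used as an index.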

We now create three data frames: `damage_resistances_df`, `damage_immunities_df`, and `damage_vulnerabilities_df`.

In [70]:
index = sorted(damage_types)

damage_resistances_df = monster_df.groupby('name').apply(split_categorical,
                                                         column='damage_resistances',
                                                         index=index,
                                                         splitter = get_damage_types)
damage_resistances_df = clean_index(damage_resistances_df, fill='False')

damage_immunities_df = monster_df.groupby('name').apply(split_categorical,
                                                        column='damage_immunities',
                                                        index=index,
                                                        splitter = get_damage_types)
damage_immunities_df = clean_index(damage_immunities_df, fill='False')

damage_vulnerabilities_df = monster_df.groupby('name').apply(split_categorical,
                                                             column='damage_vulnerabilities',
                                                             index=index,
                                                             splitter = get_damage_types)
damage_vulnerabilities_df = clean_index(damage_vulnerabilities_df, fill='False')
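
As a rough illustration of the idea behind these frames, the sketch below builds a boolean indicator table from a few toy strings. It does not use `split_categorical` or `clean_index`, and the layout those helpers actually produce may differ. The toy data and the `simple_splitter` helper are assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data; the notebook reads these strings from `monster_df`.
resistances = pd.Series(
    {'Imp': 'cold; fire', 'Acolyte': '', 'Ghost': 'acid, fire'},
    name='damage_resistances',
)


def simple_splitter(text):
    # Simplified splitter: treats '; ' and ', ' alike and drops empty strings.
    parts = text.replace('; ', ', ').split(', ')
    return {part for part in parts if part}


# Sorted list of every damage type that occurs, used as the column labels.
index = sorted(set().union(*resistances.map(simple_splitter)))

# One boolean column per damage type, one row per monster.
indicator_df = pd.DataFrame(
    {dtype: resistances.map(lambda s, d=dtype: d in simple_splitter(s)) for dtype in index}
)
print(indicator_df)
```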

### Actions

First, a little cleanup: we keep every other entry of the Bone Devil's actions and drop the last entry of the Vrock's.

In [71]:
a = monster_df.loc['Bone Devil', 'actions'][::2]
# `DataFrame.set_value` was removed in pandas 1.0; `.at` sets a single cell in place.
monster_df.at['Bone Devil', 'actions'] = a

In [72]:
a = monster_df.loc['Vrock', 'actions'][:-1]
# `DataFrame.set_value` was removed in pandas 1.0; `.at` sets a single cell in place.
monster_df.at['Vrock', 'actions'] = a

In [73]:
actions_keys = {tuple(action.keys()) for actions in monster_df.actions for action in actions}
actions_keys

{('name', 'desc', 'attack_bonus'),
 ('name', 'desc', 'attack_bonus', 'damage_bonus'),
 ('name', 'desc', 'attack_bonus', 'damage_dice'),
 ('name', 'desc', 'attack_bonus', 'damage_dice', 'damage_bonus')}
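
The key check above can be tried on a small hand-made sample. The records below are hypothetical and only shaped like the entries in the `actions` column. Taking the union of the observed key tuples shows which columns a sub-dataframe would need.

```python
# Hypothetical records shaped like the entries in the `actions` column.
actions_by_monster = [
    [{'name': 'Club', 'desc': 'placeholder', 'attack_bonus': 2, 'damage_dice': '1d4'}],
    [{'name': 'Multiattack', 'desc': 'placeholder', 'attack_bonus': 0},
     {'name': 'Tail', 'desc': 'placeholder', 'attack_bonus': 9,
      'damage_dice': '3d6', 'damage_bonus': 5}],
]

# Distinct key layouts found across all actions of all monsters.
key_sets = {tuple(action.keys()) for actions in actions_by_monster for action in actions}

# Union of every key that appears at least once.
all_keys = set().union(*key_sets)
print(sorted(all_keys))
```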

In [74]:
columns = ['name', 'desc', 'attack_bonus', 'damage_dice', 'damage_bonus']
rename = {'name':'action'}
index = ['action']

actions_df = monster_df.actions.groupby('name').apply(make_sub_df, columns=columns, rename=rename, index=index)

In [75]:
# drop some non-OGL material

variant = [x[0] for x in actions_df.filter(like='Variant', axis='index').iterrows()]
variant_rows = [x[0] for x in actions_df.loc[variant].iterrows()]
actions_df.drop(variant_rows, inplace=True)

In [76]:
actions_df.head(10)

name,action,desc,attack_bonus,damage_dice,damage_bonus
Aboleth,Multiattack,The aboleth makes three tentacle attacks.,0,,
Aboleth,Tentacle,"Melee Weapon Attack: +9 to hit, reach 10 ft., ...",9,2d6,5.0
Aboleth,Tail,"Melee Weapon Attack: +9 to hit, reach 10 ft. o...",9,3d6,5.0
Aboleth,Enslave (3/day),The aboleth targets one creature it can see wi...,0,,
Acolyte,Club,"Melee Weapon Attack: +2 to hit, reach 5 ft., o...",2,1d4,
Adult Black Dragon,Multiattack,The dragon can use its Frightful Presence. It ...,0,,
Adult Black Dragon,Bite,"Melee Weapon Attack: +11 to hit, reach 10 ft.,...",11,2d10 + 1d8,6.0
Adult Black Dragon,Claw,"Melee Weapon Attack: +11 to hit, reach 5 ft., ...",11,2d6,6.0
Adult Black Dragon,Tail,"Melee Weapon Attack: +11 to hit, reach 15 ft.,...",11,2d8,6.0
Adult Black Dragon,Frightful Presence,Each creature of the dragon's choice that is w...,0,,


In [77]:
actions_df.attack_bonus = actions_df.attack_bonus.astype('int32')
actions_df = actions_df.fillna(value={'damage_dice': '', 'damage_bonus': 0.})
actions_df.damage_bonus = actions_df.damage_bonus.astype('int32')
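
The order of operations matters here: a column containing `NaN` cannot be cast to an integer type, so missing values are filled first. A minimal sketch on a toy frame (the data is made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value per column.
toy = pd.DataFrame({'damage_dice': ['2d6', np.nan], 'damage_bonus': [5.0, np.nan]})

try:
    toy['damage_bonus'].astype('int32')
except (ValueError, TypeError) as error:
    # pandas refuses to cast NaN to an integer dtype.
    print('cast failed:', type(error).__name__)

# Fill the gaps first, then the integer cast succeeds.
filled = toy.fillna(value={'damage_dice': '', 'damage_bonus': 0.})
filled['damage_bonus'] = filled['damage_bonus'].astype('int32')
print(filled.dtypes)
```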

In [78]:
actions_df.head()

name,action,desc,attack_bonus,damage_dice,damage_bonus
Aboleth,Multiattack,The aboleth makes three tentacle attacks.,0,,0
Aboleth,Tentacle,"Melee Weapon Attack: +9 to hit, reach 10 ft., ...",9,2d6,5
Aboleth,Tail,"Melee Weapon Attack: +9 to hit, reach 10 ft. o...",9,3d6,5
Aboleth,Enslave (3/day),The aboleth targets one creature it can see wi...,0,,0
Acolyte,Club,"Melee Weapon Attack: +2 to hit, reach 5 ft., o...",2,1d4,0


We also normalize a few entries.

In [79]:
actions_df.loc[('Purple Worm', 'Multiattack'), 'desc'] = \
    'The worm makes two attacks: one with its bite and one with its tail stinger.'
actions_df.loc[('Tarrasque', 'Multiattack'), 'desc'] = \
    'The tarrasque can use its Frightful Presence. It then makes five attacks: one with its bite, two with its claws, one with its horns, and one with its tail. It can use its Swallow instead of its bite.'

### Special abilities

We begin with three corrections.

In [80]:
desc = """The giant's innate spellcasting ability is Charisma (spell save DC 15). It can innately cast the following spells, requiring no material components:

At will: detect magic, fog cloud, light
3/day each: feather fall, fly, misty step, telekinesis
1/day each: control weather, gaseous form"""

monster_df.loc['Cloud Giant', 'special_abilities'][1]['desc'] = desc

In [81]:
desc = """The lamia's innate spellcasting ability is Charisma (spell save DC 13). It can innately cast the following spells, requiring no material components.

At will: disguise self (any humanoid form), major image
3/day each: charm person, mirror image, scrying, suggestion
1/day: geas"""

monster_df.loc['Lamia', 'special_abilities'][0]['desc'] = desc

In [82]:
desc = """The djinni's innate spellcasting ability is Charisma (spell save DC 17, +9 to hit with spell attacks). It can innately cast the following spells, requiring no material components:

At will: detect evil and good, detect magic, thunderwave
3/day each: create food and water (can create wine instead of water), tongues, wind walk
1/day each: conjure elemental (air elemental only), creation, gaseous form, invisibility, major image, plane shift"""

monster_df.loc['Djinni', 'special_abilities'][1]['desc'] = desc

Now that those corrections are out of the way, we apply a bit of introspection before making our sub-dataframe.

In [83]:
special_keys = {tuple(ability.keys()) for abilities in monster_df.special_abilities for ability in abilities}
special_keys

{('name', 'desc', 'attack_bonus'),
 ('name', 'desc', 'attack_bonus', 'damage_dice')}

In [84]:
columns = ['name', 'desc', 'attack_bonus', 'damage_dice']
rename = {'name':'special_ability'}
index = ['special_ability']

special_abilities_df = monster_df.special_abilities.groupby('name').apply(make_sub_df,
                                                                         columns=columns,
                                                                         rename=rename,
                                                                         index=index)

In [85]:
special_abilities_df.head(5)

name,special_ability,desc,attack_bonus,damage_dice
Aboleth,Amphibious,The aboleth can breathe air and water.,0,
Aboleth,Mucous Cloud,"While underwater, the aboleth is surrounded by...",0,
Aboleth,Probing Telepathy,If a creature communicates telepathically with...,0,
Acolyte,Spellcasting,The acolyte is a 1st-level spellcaster. Its sp...,0,
Adult Black Dragon,Amphibious,The dragon can breathe air and water.,0,


We analyze the most common ability names:

In [86]:
from collections import Counter

abilities = Counter(special_abilities_df.reset_index().special_ability)
abilities.most_common(15)

# special_abilities_df.xs('Magic Resistance', level='special_ability')

[('Magic Resistance', 32),
 ('Amphibious', 30),
 ('Legendary Resistance (3/Day)', 24),
 ('Innate Spellcasting', 20),
 ('Keen Smell', 19),
 ('Pack Tactics', 16),
 ('False Appearance', 15),
 ('Spider Climb', 13),
 ('Keen Hearing and Smell', 13),
 ('Spellcasting', 12),
 ('Magic Weapons', 12),
 ('Charge', 12),
 ('Shapechanger', 11),
 ('Swarm', 10),
 ('Water Breathing', 9)]

We can use

    special_abilities_df.xs(special_ability, level='special_ability')

to further investigate specific special abilities.
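
For instance, on a toy frame with the same two index levels (the rows mirror the first few shown above; the frame itself is made up for illustration), `xs` selects one ability and drops that level from the index:

```python
import pandas as pd

# Toy frame indexed by (name, special_ability), like `special_abilities_df`.
index = pd.MultiIndex.from_tuples(
    [('Aboleth', 'Amphibious'),
     ('Acolyte', 'Spellcasting'),
     ('Adult Black Dragon', 'Amphibious')],
    names=['name', 'special_ability'],
)
toy_abilities = pd.DataFrame({'attack_bonus': [0, 0, 0]}, index=index)

# Cross-section: every monster that has the 'Amphibious' ability.
amphibious = toy_abilities.xs('Amphibious', level='special_ability')
print(amphibious.index.tolist())
```

The result is indexed by `name` alone, which is convenient for joining back onto `monster_df`.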

#### Spellcasting

In [87]:
spellcasting_df = special_abilities_df.xs('Spellcasting', level='special_ability')

In [88]:
from itertools import islice
from functools import reduce

columns = reduce(lambda x, y: x + y,
                 (['level_{}_spells'.format(level), 'level_{}_slots'.format(level)] for level in range(10)),
                 ['caster_level', 'save_dc', 'to_hit'])
dtypes = {x: y for x, y in zip(columns, ['float64']*3 + ['object', 'float64'] * 10)}
fill = {x: y for x, y in zip(columns, [0] * 3 + ['', 0] * 10)}


def parse_spellcasting(x):
    # Parse one monster's Spellcasting description into a single-row data frame.
    desc, *_ = x.iloc[0]
    header, *levels = desc.splitlines()

    # The header sentence carries the caster level, spell save DC, and spell attack bonus.
    caster_level = int(re.findall(r'(\d+)\w{2}-level', header)[0])
    save_dc = int(re.findall(r'DC (\d+)', header)[0])
    to_hit = int(re.findall(r'\+(\d+) to hit', header)[0])
    spellbook = {'caster_level': caster_level, 'save_dc': save_dc, 'to_hit': to_hit}

    for line in levels:
        line = line.strip()
        if line.startswith('•') or line.startswith('Cantrips'):
            # One line per spell level: record its slot count and spell list.
            level, slots, *spells = parse_spells_by_level(line)
            spellbook['level_{}_spells'.format(level)] = spells
            spellbook['level_{}_slots'.format(level)] = slots
        elif line.startswith('*'):
            # Footnote lines are recognized but not stored.
            note = line
    return pd.DataFrame(spellbook, columns=columns, index=[x.name])


def parse_spells_by_level(x):
    # Split a line such as '1st level (3 slots): bless, cure wounds' into
    # (level, slots, spells); cantrips are level 0 with unlimited slots.
    level_and_slots, spells = x.split(':')
    spells = spells.strip()
    if 'Cantrips' in level_and_slots:
        level, slots = '0', 'inf'
    else:
        g = re.search(r'(\d)\w{2} level \((\d+) slots?\)', level_and_slots)
        level, slots = g.groups()
    return int(level), float(slots), spells

In [89]:
spellbook_df = spellcasting_df.groupby('name').apply(parse_spellcasting)
spellbook_df = clean_index(spellbook_df, fill=fill)
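
To see what the regular expressions in the parser pick out, the sketch below applies them to two sample lines. The text is illustrative, written in the format the parser expects rather than read from the data set.

```python
import re

# Illustrative text in the expected format; not read from the data set.
header = ('The acolyte is a 1st-level spellcaster. Its spellcasting ability is Wisdom '
          '(spell save DC 12, +4 to hit with spell attacks).')
level_line = '• 1st level (3 slots): bless, cure wounds, sanctuary'

# Header: caster level, spell save DC, and spell attack bonus.
caster_level = int(re.findall(r'(\d+)\w{2}-level', header)[0])
save_dc = int(re.findall(r'DC (\d+)', header)[0])
to_hit = int(re.findall(r'\+(\d+) to hit', header)[0])

# Level line: spell level and slot count before the colon, spell list after it.
level_and_slots, spells = level_line.split(':')
level, slots = re.search(r'(\d)\w{2} level \((\d+) slots?\)', level_and_slots).groups()

print(caster_level, save_dc, to_hit)
print(level, slots, spells.strip())
```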

#### Innate spellcasting

In [90]:
innate_spellcasting_df = special_abilities_df.xs('Innate Spellcasting', level='special_ability')
innate_spellcasting_df.head()

name,desc,attack_bonus,damage_dice
Cloud Giant,The giant's innate spellcasting ability is Cha...,0,
Couatl,The couatl's spellcasting ability is Charisma ...,0,
Deep Gnome (Svirfneblin),The gnome's innate spellcasting ability is Int...,0,
Deva,The deva's spellcasting ability is Charisma (s...,0,
Djinni,The djinni's innate spellcasting ability is Ch...,0,


In [91]:
from collections import defaultdict

def parse_desc(x):
    # Parse an innate spellcasting description into one row per known spell.
    x = x.iloc[0]['desc']
    lines = [y.strip() for y in x.split('\n') if y]
    desc = lines[0]
    dc = re.findall(r'DC (\d+)', desc)[0]
    spellbook = defaultdict(list)
    for line in islice(lines, 1, None):
        # Each remaining line looks like '3/day each: spell one, spell two'.
        freq, spells = line.split(': ')
        for spell in spells.split(', '):
            if '(' in spell:
                # Drop parenthetical notes such as '(air elemental only)'.
                spell, _ = spell.split(' (', maxsplit=1)
            pattern = spell.strip('*').strip() + r'$'
            # Every spell must match exactly one entry in `spells_df`.
            matches = spells_df[spells_df.index.str.match(pattern, case=False)]
            assert matches.shape[0] == 1
            for spell_name, row in matches.iterrows():
                spell_level = row.loc['spell_data', 'level']
                spellbook['freq'].append(freq)
                spellbook['dc'].append(dc)
                spellbook['spell_name'].append(spell_name)
                spellbook['spell_level'].append(spell_level)
    spells_known = pd.DataFrame(spellbook, columns=['spell_name', 'spell_level', 'dc', 'freq'])
    spells_known = spells_known.set_index('spell_name')
    spells_known.freq = spells_known.freq.astype('category')
    return spells_known

innate_spellbook_df = innate_spellcasting_df.groupby('name').apply(parse_desc)
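
The line-splitting step can be tried in isolation. The sketch below is a simplified version that skips the lookup against `spells_df` and only groups spell names by frequency. The two input lines are copied from the Lamia correction earlier in this section.

```python
from collections import defaultdict

# Frequency lines copied from the Lamia description above.
lines = [
    'At will: disguise self (any humanoid form), major image',
    '1/day: geas',
]

spells_by_freq = defaultdict(list)
for line in lines:
    freq, spells = line.split(': ')
    for spell in spells.split(', '):
        # Drop a trailing parenthetical note such as '(any humanoid form)'.
        spell = spell.split(' (', maxsplit=1)[0]
        spells_by_freq[freq].append(spell.strip('*').strip())

print(dict(spells_by_freq))
```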

### Reactions

In [92]:
{tuple(action.keys()) for actions in monster_df.reactions for action in actions}

{('name', 'desc', 'attack_bonus')}

In [93]:
columns = ['name', 'desc', 'attack_bonus']
rename = {'name':'reaction'}
index = ['reaction']

reactions_df = monster_df.reactions.groupby('name').apply(make_sub_df,
                                                          columns=columns,
                                                          rename=rename,
                                                          index=index)

In [94]:
reactions_df.head()

name,reaction,desc,attack_bonus
Bandit Captain,Parry,The captain adds 2 to its AC against one melee...,0
Black Pudding,Split,When a pudding that is Medium or larger is sub...,0
Chain Devil,Unnerving Mask,When a creature the devil can see starts its t...,0
Erinyes,Parry,The erinyes adds 4 to its AC against one melee...,0
Gladiator,Parry,The gladiator adds 3 to its AC against one mel...,0


### Legendary actions

In [95]:
legendary_actions_keys = {tuple(action.keys()) for actions in monster_df.legendary_actions for action in actions}
legendary_actions_keys

{('name', 'desc', 'attack_bonus'),
 ('name', 'desc', 'attack_bonus', 'damage_dice')}

In [96]:
columns = ['name', 'desc', 'attack_bonus', 'damage_dice']
rename = {'name':'legendary_action'}
index = ['legendary_action']

legendary_actions_df = monster_df.legendary_actions.groupby('name').apply(make_sub_df,
                                                                          columns=columns,
                                                                          rename=rename,
                                                                          index=index)

In [97]:
legendary_actions_df.head()

name,legendary_action,desc,attack_bonus,damage_dice
Aboleth,Detect,The aboleth makes a Wisdom (Perception) check.,0,
Aboleth,Tail Swipe,The aboleth makes one tail attack.,0,
Aboleth,Psychic Drain (Costs 2 Actions),One creature charmed by the aboleth takes 10 (...,0,
Adult Black Dragon,Detect,The dragon makes a Wisdom (Perception) check.,0,
Adult Black Dragon,Tail Attack,The dragon makes a tail attack.,0,


### Finalizing the `monster_df` and related data frames

In [98]:
monster_df.drop(['condition_immunities',
                 'damage_immunities', 
                 'damage_resistances', 
                 'damage_vulnerabilities',
                 'actions',
                 'reactions',
                 'legendary_actions',
                 'special_abilities'],
                axis=1, inplace=True)

## Export data
Now that we've loaded and cleaned our data, we'll save it to a collection of files in the `data` directory.

In [99]:
spells_df.to_pickle('data/spells_df')
monster_df.to_pickle('data/monster_df')
condition_immunities_df.to_pickle('data/condition_immunities_df')
damage_immunities_df.to_pickle('data/damage_immunities_df')
damage_resistances_df.to_pickle('data/damage_resistances_df')
damage_vulnerabilities_df.to_pickle('data/damage_vulnerabilities_df')
actions_df.to_pickle('data/actions_df')
special_abilities_df.to_pickle('data/special_abilities_df')
spellbook_df.to_pickle('data/spellbook_df')
innate_spellbook_df.to_pickle('data/innate_spellbook_df')

# legendary_actions_df
# reactions_df
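
A pickled frame can be read back with `pd.read_pickle`, which restores the index and dtypes. The sketch below shows the round trip on a toy frame in a temporary directory, so it does not touch the `data` directory. The frame and file name are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for one of the exported data frames.
toy_df = pd.DataFrame({'attack_bonus': [9, 2]},
                      index=pd.Index(['Aboleth', 'Acolyte'], name='name'))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_df')  # hypothetical file name
    toy_df.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored.equals(toy_df))
```

Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.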