# Network Analysis of the Harry Potter Book Series

## Setup

In [1]:
!pip install -r requirements.txt
from sentiment import *

Collecting beautifulsoup4
  Downloading beautifulsoup4-4.9.3-py3-none-any.whl (115 kB)
Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
Collecting soupsieve>1.2
  Downloading soupsieve-2.2.1-py3-none-any.whl (33 kB)
Installing collected packages: soupsieve, vaderSentiment, beautifulsoup4
Successfully installed beautifulsoup4-4.9.3 soupsieve-2.2.1 vaderSentiment-3.3.2


## Function Definitions for Text Analysis

`happiness(doc)` takes a list of words and computes the average happiness score using the Hedonometer.

`emotion_score(doc)` takes a list of words and computes a dictionary of average emotion scores for the emotions _Anger, Anticipation, Disgust, Fear, Joy, Sadness, Surprise_ and _Trust_.

`vader_sentiment(doc)` takes a list of words and/or sentences and computes the average VADER compound score.

`TF_IDF(docs_to_analyse, all_docs)` computes the TF and TF-IDF scores of the terms in `docs_to_analyse`. `all_docs` is used to compute the IDF scores.
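
The TF-IDF computation described above can be illustrated with a minimal sketch. It is not the implementation in `sentiment.py`, which is not shown here: the name `tf_idf_sketch`, the relative-frequency TF, and the unsmoothed `log(N / df)` IDF are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_sketch(docs_to_analyse, all_docs):
    """Return one {term: (tf, tf_idf)} dict per document in docs_to_analyse.

    Each document is a list of word tokens. IDF is computed over all_docs.
    Hypothetical helper: the weighting scheme is an assumption.
    """
    n_docs = len(all_docs)
    # Document frequency: in how many documents of all_docs each term occurs.
    doc_freq = Counter(term for doc in all_docs for term in set(doc))

    results = []
    for doc in docs_to_analyse:
        counts = Counter(doc)
        total = len(doc)
        scores = {}
        for term, count in counts.items():
            tf = count / total  # relative term frequency within the document
            # Terms that never occur in all_docs get an IDF of 0 here.
            idf = math.log(n_docs / doc_freq[term]) if doc_freq[term] else 0.0
            scores[term] = (tf, tf * idf)
        results.append(scores)
    return results


# Toy corpus, made up for this example.
corpus = [["harry", "wand", "owl"], ["harry", "broom"], ["harry", "owl", "owl"]]
scores = tf_idf_sketch(corpus[:1], corpus)
```

In this toy corpus `"harry"` occurs in every document, so its IDF is zero and it drops out of the ranking, while `"wand"` occurs in only one document and receives the highest TF-IDF score.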


In [2]:
import numpy as np
import pandas as pd
import pickle as pkl
from clean_books import clean_book

In [3]:
chapter_info = pd.read_pickle('data/chapter_dataframe.pkl')
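# Add book number to chapters: a new book starts wherever "Local Chapter" resets to 1
idx = list(np.where(chapter_info["Local Chapter"].values == 1)[0])
for i in range(len(idx)):
    # .loc slices include the end label, so the first row of the next book is
    # also written here and then overwritten on the next iteration
    chapter_info.loc[idx[i]:(idx[i+1] if i+1 < len(idx) else None), "Book"] = i+1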

chapter_info

Unnamed: 0,Global Chapter,Local Chapter,Title,Approx Story Time,Book
0,1,1,The Boy Who Lived,1981-11-01,1.0
1,2,2,The Vanishing Glass,1991-06-23,1.0
2,3,3,The Letters from No One,1991-07-23,1.0
3,4,4,The Keeper of Keys,1991-07-31,1.0
4,5,5,Diagon Alley,1991-07-31,1.0
...,...,...,...,...,...
194,195,33,The Prince’s Tale,1998-05-02,7.0
195,196,34,The Forest Again,1998-05-02,7.0
196,197,35,King’s Cross,1998-05-02,7.0
197,198,36,The Flaw in the Plan,1998-05-02,7.0
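
The book numbering above amounts to counting how many times the local chapter number has reset to 1. A minimal standard-library sketch of that idea follows; `book_numbers` is a hypothetical helper, and it assumes the first row is chapter 1 of the first book. With pandas, the same result should be obtainable as `(chapter_info["Local Chapter"] == 1).cumsum()`, which would also give integer rather than float book numbers.

```python
from itertools import accumulate


def book_numbers(local_chapters):
    """Number books 1, 2, 3, ... by counting resets of the local chapter to 1."""
    # Each chapter 1 contributes 1 to the running total, every other chapter 0.
    return list(accumulate(int(ch == 1) for ch in local_chapters))


# Made-up local chapter numbers for three short "books".
local = [1, 2, 3, 1, 2, 1, 2, 3, 4]
print(book_numbers(local))  # [1, 1, 1, 2, 2, 3, 3, 3, 3]
```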


In [4]:
with open('data/characters_by_house.pkl', 'rb') as file:
    characters_by_communities = pkl.load(file)

characters_by_communities

{'Gryffindor': ['Albus Dumbledore',
  'Alicia Spinnet',
  'Andrew Kirke',
  'Angelina Johnson',
  'Bill Weasley',
  'Celestina Warbeck',
  'Charlie Weasley',
  'Cormac McLaggen',
  'Dean Thomas',
  'Demelza Robins',
  'Dennis Creevey',
  'Euan Abercrombie',
  'Fred Weasley',
  'Geoffrey Hooper',
  'George Weasley',
  'Ginny Weasley',
  'Godric Gryffindor',
  'Harry Potter',
  'Jack Sloper',
  'James Potter',
  'James Sirius Potter',
  'Jimmy Peakes',
  'Katie Bell',
  'Kenneth Towler',
  'Lavender Brown',
  'Lee Jordan',
  'Lily Potter',
  'Minerva McGonagall',
  'Natalie McDonald',
  'Nearly-Headless Nick',
  'Neville Longbottom',
  'Oliver Wood',
  'Panju Weasley',
  'Parvati Patil',
  'Patricia Stimpson',
  'Percy Weasley',
  'Peter Pettigrew',
  'Remus Lupin',
  'Ritchie Coote',
  'Romilda Vane',
  'Ron Weasley',
  'Rose Granger-Weasley',
  'Rubeus Hagrid',
  'Seamus Finnigan',
  'Sir Cadogan',
  'Vicky Frobisher',
  'Wormtail',
  'Yann Fredericks'],
 'Hufflepuff': ['Artemisia Lufk

In [7]:
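import os
import re

path = "data/books/"
books = sorted(os.listdir(path))
# Only the first book is processed for now; widen the range to cover more books
for i in range(1):
    book = clean_book(os.path.join(path, books[i]))
    # Upper-case chapter titles of book i+1, to match the headings in the cleaned text
    chapters = [chapter.upper() for chapter in chapter_info.loc[chapter_info["Book"] == i+1, "Title"]]
    # Escape each title so it is matched literally, then OR the titles together
    regexPattern = '|'.join(re.escape(chapter) for chapter in chapters)
    book_ = re.split(regexPattern, book)  # text between chapter headings
    # Compare the headings actually found with the expected titles
    print(re.findall(regexPattern, book))
    print(chapters)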


['THE BOY WHO LIVED', 'THE LETTERS FROM NO ONE', 'THE SORTING HAT', 'THE POTIONS MASTER', 'THE MIDNIGHT DUEL', 'QUIDDITCH', 'THE MIRROR OF ERISED', 'THE FORBIDDEN FOREST', 'THROUGH THE TRAPDOOR', 'THE MAN WITH TWO FACES']
['THE BOY WHO LIVED', 'THE VANISHING GLASS', 'THE LETTERS FROM NO ONE', 'THE KEEPER OF KEYS', 'DIAGON ALLEY', 'THE JOURNEY FROM PLATFORM NINE AND THREE-QUARTERS', 'THE SORTING HAT', 'THE POTIONS MASTER', 'THE MIDNIGHT DUEL', "HALLOWE'EN", 'QUIDDITCH', 'THE MIRROR OF ERISED', 'NICOLAS FLAMEL', 'NORBERT THE NORWEGIAN RIDGEBACK', 'THE FORBIDDEN FOREST', 'THROUGH THE TRAPDOOR', 'THE MAN WITH TWO FACES']
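
Only 10 of the 17 expected titles are found in the text. One possible cause, which is an assumption and not confirmed by the output, is that some headings are wrapped across lines or use a typographic apostrophe in the book text. The sketch below builds a more tolerant pattern per title and reports which titles are still missing. `title_pattern`, `find_titles` and the sample text are hypothetical and only illustrate the idea.

```python
import re


def title_pattern(title):
    """Regex for a title that tolerates line wraps and straight or curly apostrophes."""
    words = []
    for word in title.split():
        # Escape the literal parts and allow either apostrophe style between them.
        parts = [re.escape(part) for part in word.split("'")]
        words.append("['’]".join(parts))
    # Any run of whitespace (including newlines) may separate the words.
    return r"\s+".join(words)


def find_titles(titles, text):
    """Split titles into those found in text and those still missing."""
    found, missing = [], []
    for title in titles:
        if re.search(title_pattern(title), text):
            found.append(title)
        else:
            missing.append(title)
    return found, missing


# Made-up sample text with a wrapped heading and a curly apostrophe.
sample = "THE BOY WHO LIVED\n...\nTHE VANISHING \nGLASS\n...\nHALLOWE’EN\n"
titles = ["THE BOY WHO LIVED", "THE VANISHING GLASS", "HALLOWE'EN", "QUIDDITCH"]
found, missing = find_titles(titles, sample)
```

Printing `missing` for the real book text would show which headings need closer inspection before the text is split into chapters.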


In [None]:
print(book)

THE BOY WHO LIVED 

Mr. and Mrs. Dursley, of number four, Privet Drive, 
were proud to say that they were perfectly normal, 
thank you very much. They were the last people you’d 
expect to be involved in anything strange or 
mysterious, because they just didn’t hold with such 
nonsense. 

Mr. Dursley was the director of a firm called 
Grunnings, which made drills. He was a big, beefy 
man with hardly any neck, although he did have a 
very large mustache. Mrs. Dursley was thin and 
blonde and had nearly twice the usual amount of 
neck, which came in very useful as she spent so 
much of her time craning over garden fences, spying 
on the neighbors. The Dursley s had a small son 
called Dudley and in their opinion there was no finer 
boy anywhere. 

The Dursleys had everything they wanted, but they 
also had a secret, and their greatest fear was that 
somebody would discover it. They didn’t think they 
could bear it if anyone found out about the Potters. 
Mrs. Potter was Mrs. Dursley’s si
