#  AntiPoed Logbook

#### Computational Creativity - Assignment 2

##### Titus Oosting (s2683466)

### Overview

In this logbook I describe the process of building an application that writes novel poems. The final output can be seen in the Little AntiPoed Poetry Collection. 

AntiPoed is a combinational-creative application that fuses the work of a base poet with that of the poets they inspired. Inspiration, with a non-linear twist. To do so, it downloads poems from a website then processes the text. AntiPoed employs the spaCy NLP toolset to produce new works of creative art. 

This logbook contains the following sections:
* [Introduction](#Introduction)
* [Technical Tools and Implementation](#Technical-Tools-&-Implementation)
* [Creative Reflection](#Creative-Reflection)
* [Future Development](#Future-Development)


### Introduction

What does creativity mean when talking about poetry? Is it form, theme or vocabulary? How does independent creative inspiration measure against external, time-bound influence?

The skill, dedication and vision of great poets have driven creativity in language and art. In doing so, these poets always stand on the linguistic, stylistic and thematic shoulders of their predecessors. Poetic influence is hard to measure but fundamental to the creative process.

The nature of time dictates that this process of inspiration operates in one direction. Themes are debased and transfigured; linguistic liberty establishes new norms; words are reused and reimagined.

Can we imagine this linear flow of poetic creation in a more circular form? What linguistic and thematic inspiration would a budding Edgar Allan Poe, who died five years before Oscar Wilde was born, take from the very Wilde that he himself explicitly inspired? How might his gloomy themes, expounded through his mastery of language and punctuation, be transformed through the modern lexicon of a poet much closer to our own time whom he inspired, such as Maya Angelou?

AntiPoed reconfigures these micro-antipodes, subsets of poetic cause and effect, and creates works of poetry that transcend the vagaries of linear time and loss. 

The poems in the Little AntiPoed Poetry Collection are an exploration of what Poe's poems may have been like were he to be influenced by his successors.

## Technical Tools & Implementation


#### Origin and Credits
[Benjamín Durán's scraper and replacer](https://towardsdatascience.com/creating-a-poems-generator-using-word-embeddings-bcc43248de4f) inspired the use of spaCy's [Word Vectors and Semantic Similarity feature](https://spacy.io/usage/vectors-similarity) and offered easily implementable code for scraping the poems of specific poets from a specific site.
I used and adapted parts of this code to download and save an inspiring set of poems (by the 'inspirers', which in the context of AntiPoed are, of course, poets that were inspired by the original poet) and the base poems for the poetry collection (written by the 'inspiree').

### Technical Overview

#### Base poems

The inspiree's name is given in the variable `inspiree`. A single run of the application outputs a number of poems set by `len_poetry_collection`; for each of them, a base poem is chosen at random from this poet's existing poems. Each base poem is then split into sentences (in practice, lines), which are subsequently modified with inspirations from the inspiring set using spaCy's tools.
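
The cleaning and splitting step can be sketched without any NLP dependency. The snippet below uses a subset of the replacement table from the notebook code; the function name `split_poem` and the sample text are illustrative assumptions rather than part of AntiPoed.

```python
import re

# Subset of the replacement table used in the notebook code.
REPLACEMENTS = {'(': '', ')': '', ':': ',', '.': ',', ',,,': ',', '"': '', '\r': ''}


def split_poem(text, split=r"\n"):
    """Normalise punctuation, lowercase the text, and split it into lines."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return re.split(split, text.lower())


# Invented sample text, not a quotation.
lines = split_poem("The night is long:\r\n(And so it goes.)")
print(lines)
```

Because the split pattern is a newline, each "sentence" handled later is really one line of the poem.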

#### Selection of poets

For the supplied poetry collection, I used two inspirers: poets who explicitly acknowledged Edgar Allan Poe's influence on their work. The application allows any number of inspirers to be included in the set (at a cost in performance), and the inspiree can also be changed. The inspirers' names are entered into the `inspirers` variable. The [website used here](http://mypoeticside.com) conveniently lists the inspirations and inspirees for each author.

#### Inspiring Set

The full set of poems of each of the inspirers is similarly split into sentences. These sentences are then scanned using spaCy's [part-of-speech tagging](https://spacy.io/usage/linguistic-features#pos-tagging) to fill three lists of strings: one for nouns, one for adjectives, and one for proper nouns. This could be expanded to include other entities or additional structural elements.
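
As a rough sketch of this bucketing step, the function below groups already-tagged words by part of speech. It is tagger-agnostic: in AntiPoed the (word, tag) pairs would come from spaCy's `token.text` and `token.pos_`, whereas the tags here are hand-written for illustration, and `collect_inspiring_words` is a hypothetical name.

```python
from collections import defaultdict

# Coarse-grained part-of-speech labels that AntiPoed collects.
WANTED_TAGS = ("NOUN", "ADJ", "PROPN")


def collect_inspiring_words(tagged_tokens):
    """Group words by tag, keeping only the tags listed in WANTED_TAGS."""
    buckets = defaultdict(list)
    for word, tag in tagged_tokens:
        if tag in WANTED_TAGS:
            buckets[tag].append(word)
    return dict(buckets)


# Hand-written tags for "the selfish giant slept in london" (not real tagger output).
tagged = [("the", "DET"), ("selfish", "ADJ"), ("giant", "NOUN"),
          ("slept", "VERB"), ("in", "ADP"), ("london", "PROPN")]
inspiring = collect_inspiring_words(tagged)
print(inspiring)
```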

#### antipoed_poem_maker 

For each base poem, each of its sentences is scanned for the same grammar types that we collected for the inspiring set: nouns, adjectives and proper nouns. These are replaced by the best-scoring option from the Inspiring Set, using spaCy's [Word Vectors and Semantic Similarity](https://spacy.io/usage/vectors-similarity). 
For reasons of both randomness and performance, a sample of the Inspiring Set's words is used, rather than the entire list. The sample sizes can be controlled using `size_inspiring_nouns_set`, `size_inspiring_adjs_set` and `size_inspiring_proper_nouns_set`.
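
The selection logic (sample, score, keep the best) can be illustrated with plain cosine similarity, which is what spaCy's vector similarity computes by default. This is a minimal sketch under stated assumptions: the two-dimensional vectors are invented, and `cosine` and `best_replacement` are hypothetical helpers, not AntiPoed functions.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_replacement(word_vec, candidates, sample_size, rng):
    """Sample candidate words, then return the one most similar to word_vec."""
    names = rng.sample(sorted(candidates), min(sample_size, len(candidates)))
    return max(names, key=lambda name: cosine(word_vec, candidates[name]))


# Toy 2-d vectors, invented for illustration; real word vectors have hundreds of dimensions.
toy_vectors = {"sorrow": [0.9, 0.1], "garden": [0.1, 0.9], "grief": [0.8, 0.3]}
choice = best_replacement([1.0, 0.2], toy_vectors, sample_size=3, rng=random.Random(0))
print(choice)
```

With a sample size smaller than the candidate list, the winner depends on which words happen to be drawn, which is where the randomness in the generated poems comes from.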


### Outputting

#### Poem Title
AntiPoed goes beyond mere words to [capture the emotions](https://pypi.org/project/text2emotion/) of its digital products. These emotions are outputted in a non-traditional but visually eloquent image-title. `'Happy', 'Angry', 'Surprise', 'Sad', 'Fear'` are respectively mapped to `'orange', 'firebrick', 'cyan', 'slateblue', 'thistle'`. The width of the bars represents the proportion of a particular emotion.
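
The mapping from emotion scores to colored bar segments can be sketched as follows. The scores are invented but follow the dictionary shape that the notebook passes to its plotting code; `title_segments` is a hypothetical helper, and the normalization is a precaution in case the scores do not sum to one.

```python
# Emotion -> color mapping as described above.
EMOTION_COLORS = {"Happy": "orange", "Angry": "firebrick", "Surprise": "cyan",
                  "Sad": "slateblue", "Fear": "thistle"}


def title_segments(scores):
    """Return (color, width) pairs; widths are each emotion's share of the total."""
    total = sum(scores.values())
    if total == 0:
        return []
    return [(EMOTION_COLORS[emotion], score / total)
            for emotion, score in scores.items() if score > 0]


# Invented scores for illustration.
segments = title_segments({"Happy": 0.0, "Angry": 0.0, "Surprise": 0.0,
                           "Sad": 0.25, "Fear": 0.75})
print(segments)
```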

#### Title Page  -> Network Graph
I created a similarity matrix that scores a sample of the adjectives in `inspiring_adjs` against each other. I then edited the resulting CSV file in [Gephi](https://gephi.org/) to create a network graph, which gives a visual-thematic impression of the word library that the collection's poems are based on, and a feel for the AntiPoed method.
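
A possible alternative to exporting the full matrix is an edge list that keeps only the stronger links, which can make the graph easier to lay out. The sketch below is not what the notebook does: the threshold, the invented similarity values, and the `Source`/`Target`/`Weight` header (assumed here to match what Gephi expects for an edge table) are all assumptions.

```python
import csv
import io


def matrix_to_edges(words, matrix, threshold=0.5):
    """Yield (source, target, weight) for each unordered pair scoring above threshold."""
    for i, source in enumerate(words):
        for j in range(i + 1, len(words)):
            if matrix[i][j] > threshold:
                yield source, words[j], matrix[i][j]


# Invented similarity values, for illustration only.
words = ["gloomy", "dreary", "golden"]
matrix = [[1.0, 0.8, 0.2],
          [0.8, 1.0, 0.3],
          [0.2, 0.3, 1.0]]

# Write to an in-memory buffer; a real run would write to a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Source", "Target", "Weight"])
writer.writerows(matrix_to_edges(words, matrix))
edge_csv = buffer.getvalue()
print(edge_csv)
```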

#### Poem Display

The list of sentences for each poem is joined back together into one text; the very first letter of the poem is capitalized and a period is added to close it. IPython's `display(Markdown(...))` is used for formatting. See [Future Development](#Future-Development) for a further note on punctuation in particular.


### spaCy features

* [Word Vectors and Semantic Similarity](https://spacy.io/usage/vectors-similarity)
* [Part-of-speech Tagging](https://spacy.io/usage/linguistic-features#pos-tagging)


### spaCy model

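* spaCy is loaded into the variable `nlp_model`. Its model can be `en_core_web_sm`, `en_core_web_md` or `en_core_web_lg`; the last of these is the most accurate and about 787 MB in size. Note that the small model ships without real word vectors, so its similarity scores are less reliable.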




## Creative Reflection

Imagine a scholar of Oscar Wilde's poetry. 

Ask them to capture the writer's style and the distinctiveness of his poetry, and to find ways of transposing these onto the poems of one of his explicit inspirers: Edgar Allan Poe.

Can we call the resulting work, be it an elegant blend of styles or Frankenstein's poetic monster, a creative work of art? Is there novelty beyond the combinational if the inspiring set is limited to the templates and words of just two previous creators?

The efforts of our individual scholar would no doubt be considered creative in their own right. Even though they construct their work from such specific building blocks, they have applied their own interpretation and inspiration into producing a novel product: a poem that has never been written before. 

A computer program has, perhaps, a higher threshold to cross to 'feel safe' in its perceived creative credentials. If our scholar were to change only one word from an existing poem, we can imagine some deep reason for and implication of such a change. A piece of code providing a similar output is more likely to be accused of devolution and arbitrariness.

But AntiPoed is no less a creator than our esteemed scholar. We would not label any such attempt by a human as non-creative (provided they make a fair effort), so should we not hold non-human entities to the same standard?



## Future Development

### Automation
#### Improved Poet Crawler
* Poets that are inspired by the base poet ('inspirers' for this application) are currently set manually through the `inspirers` field. With a little additional code, these could be collected automatically from the website's list of inspirees. In that case, a variable should cap the number of poets to download (for performance reasons).

### Poem Generation - Improving the Current Replacement Method
#### Fixes
* Capitalize single 'i' in the poems
* Avoid plural / singular combination of nouns (they tend to score high on similarity)
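
Both fixes could be approached with small helpers along these lines. This is a sketch only: the function names are hypothetical, and the plural check is a crude spelling heuristic rather than real lemmatization.

```python
import re


def capitalize_lone_i(line):
    """Uppercase the pronoun 'i' (also in contractions such as i'm); leave other words alone."""
    return re.sub(r"\bi\b", "I", line)


def is_number_variant(word, candidate):
    """Crude check for singular/plural pairs such as 'raven' and 'ravens'."""
    shorter, longer = sorted((word.lower(), candidate.lower()), key=len)
    return shorter != longer and longer in (shorter + "s", shorter + "es")


print(capitalize_lone_i("and i said i'm lost in it"))
print(is_number_variant("raven", "ravens"))
```

Candidates flagged by `is_number_variant` could simply be skipped before the similarity scores are compared.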

#### Rhyming
* Edgar Allan Poe employed rhyming schemes in most of his poetry. AntiPoed ignores these rhymes when it re-inspires words. By using a tool such as [pronouncing](https://pypi.org/project/pronouncing/), replacement words could be scored and chosen not only on vector-similarity but also on rhyming quality.
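
One way to combine the two criteria is a weighted sum of similarity and a rhyme score. In the sketch below the rhyme score is a spelling-based stand-in (shared trailing letters), not a pronunciation lookup; a phonetic tool would replace `rhyme_score`. The similarity values, the weight, and both function names are assumptions for illustration.

```python
def rhyme_score(word, target, max_len=3):
    """Fraction of up to max_len trailing letters shared by both words (spelling proxy)."""
    shared = 0
    for a, b in zip(reversed(word.lower()), reversed(target.lower())):
        if a != b or shared == max_len:
            break
        shared += 1
    return shared / max_len


def pick_rhyming_replacement(target, similarities, rhyme_weight=0.5):
    """Choose the candidate that maximizes similarity + rhyme_weight * rhyme_score."""
    return max(similarities,
               key=lambda word: similarities[word] + rhyme_weight * rhyme_score(word, target))


# Invented similarity scores for replacing the line-ending word "dreary".
candidates = {"gloomy": 0.70, "weary": 0.60, "bright": 0.20}
best = pick_rhyming_replacement("dreary", candidates)
print(best)
```

Here "gloomy" wins on similarity alone, but the rhyme bonus tips the choice towards "weary".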

####  Named Entity Recognition
* Proper noun replacements are based on similarity score, but could be more accurately replaced using spaCy's [Named Entity Recognition](https://spacy.io/usage/linguistic-features#named-entities) feature. This tool recognizes [various types](https://spacy.io/api/annotation#named-entities) of proper nouns, and would allow replacement within those types instead of across the entire spectrum.  
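
Restricting replacements to the same entity type could look roughly like this. The labels are written in the style of spaCy's entity types, but the labels and scores below are invented, and `replace_within_type` is a hypothetical helper.

```python
def replace_within_type(entity_label, scored_candidates):
    """Return the best-scoring candidate that shares entity_label, or None if there is none.

    scored_candidates maps a candidate word to a (entity label, similarity score) pair.
    """
    same_type = {word: score
                 for word, (label, score) in scored_candidates.items()
                 if label == entity_label}
    return max(same_type, key=same_type.get) if same_type else None


# Invented labels and scores, for illustration only.
candidates = {"london": ("GPE", 0.55), "oscar": ("PERSON", 0.80), "arkansas": ("GPE", 0.40)}
place = replace_within_type("GPE", candidates)
print(place)
```

Without the type filter, "oscar" would replace a place name simply because it has the highest overall score.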

####  Punctuation
* Edgar Allan Poe was particular about his use of punctuation, and had [strong opinions](https://www.newsandtimes.com/2016/12/poe-and-the-all-important-dash/) on its role (contemporary and in general) in the English language. Our sentence-splitter makes rough work of these efforts, and does not read them in any significant way, nor does it have much logic for outputting punctuation. Searching for something like 'nlp punctuation' mainly returns results that help get rid of / clean up punctuation, but there are plenty of ways to read and filter the relevant symbols, including in spaCy, and some logic could be written to deduce and recreate punctuation styles. 
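
A first step towards reading punctuation rather than discarding it might be a simple frequency profile per poem or per poet. The sketch below is an assumption about how that could start, using only the standard library; `punctuation_profile` is a hypothetical name, and runs of hyphens are counted together with the long dash as a single "dash" symbol.

```python
import re
from collections import Counter


def punctuation_profile(text):
    """Count punctuation marks, treating a long dash or a run of hyphens as 'dash'."""
    marks = re.findall(r"\u2014|-{2,}|[^\w\s]", text)
    return Counter("dash" if mark == "\u2014" or mark.startswith("--") else mark
                   for mark in marks)


# Invented sample line.
profile = punctuation_profile("wait--what; ah, yes!")
print(profile)
```

Comparing such profiles between the base poem and the generated poem would show how much of the original punctuation style survives.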

### Poem Generation - Additional Functionalities to Consider
####  Verbs and Grammar
* A primary critique of AntiPoed might be its lack of ability to create poems that are structurally 'brand new', as it doesn't modify the base poems' structure beyond the meter. It might be desirable for a future version to do some derivation of structure from the inspirers. While verbs could easily be replaced by applying the same trick as AntiPoed does for nouns / adjectives / proper nouns, we can imagine an even bigger impact on the poem's coherence, so some additional logic would be desirable. 

####  Mutations
* Additional variation both in structure and meaning could be made through random specific events, comparable to mutations in a genetic algorithm. Verses could be shuffled around, deleted, or entirely new verses could be added. Specific words could be changed through separate logic. Each of these steps would ideally have some logic or meaning behind it, where tools for sentiment analysis, image recognition (for structure) and grammar rules could be useful.
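
A minimal sketch of such mutations, assuming a poem is held as a list of verses: neighboring verses are occasionally swapped and verses are occasionally dropped. The rates and the function name are illustrative assumptions, and the random choices here carry none of the "logic or meaning" that a fuller version would add.

```python
import random


def mutate_verses(verses, rng, swap_rate=0.2, delete_rate=0.1):
    """Return a mutated copy: sometimes swap neighboring verses, sometimes drop one."""
    mutated = list(verses)
    for i in range(len(mutated) - 1):
        if rng.random() < swap_rate:
            mutated[i], mutated[i + 1] = mutated[i + 1], mutated[i]
    kept = [verse for verse in mutated if rng.random() >= delete_rate]
    # Fall back to the first verse so the poem is never empty.
    return kept or mutated[:1]


verses = ["a", "b", "c"]
# A seeded generator keeps a given mutation reproducible.
mutated_poem = mutate_verses(verses, random.Random(42))
print(mutated_poem)
```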


# Todo

* fix titles placement
* fix poem text

* Make Network graph if possible
* Presentation (title page has network graph of words): centering, font? PDF or webpage?



# Imports

In [1]:
import warnings
warnings.filterwarnings('ignore')

import random
import re
import urllib.request
from dataclasses import dataclass, field
from typing import List

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import spacy
import text2emotion as te
from bs4 import BeautifulSoup
from IPython.display import Image, display, Markdown

plt.style.use('dark_background')


### Helper Functions

In [None]:
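# Note: FancyURLopener is deprecated since Python 3.3; it is used here to send a browser-like User-Agent.
class AppURLopener(urllib.request.FancyURLopener):
    """URL opener that identifies itself with a browser-like User-Agent."""
    version = "Mozilla/82.0.2"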


## AntiPoed Class

In [None]:
@dataclass
class AntiPoed:
    poetry_collection_name: str = "Little AntiPoed Poetry Collection"
    len_poetry_collection: int = 4
    inspiree: str =  "edgar-allan-poe-poems" # inspiree
    inspirers: List =  field(default_factory=lambda: ["maya-angelou-poems", "oscar-wilde-poems"] )
    source_website: str = 'https://mypoeticside.com/poets/'
    nlp_model =  spacy.load("en_core_web_lg")
    load_inspiring_set:bool = False
    size_inspiring_nouns_set: int = 60
    size_inspiring_adjs_set: int = 60
    size_inspiring_proper_nouns_set: int = 60
    inspiring_nouns: list = field(default_factory=list)
    inspiring_adjs: list = field(default_factory=list)
    inspiring_proper_nouns: list = field(default_factory=list)
    base_poems: list = field(default_factory=list)
    new_poems: list = field(default_factory=list)   
        
    
    
    def read_poems_to_csv(self, poet: str)->None:
        """Take a poet's name & read poems to csv."""
        data = self.opener.open(self.source_website + poet).read().decode()
        soup = BeautifulSoup(data, 'html.parser')
        poem_list = soup.find(class_="list-poems")
        links = poem_list.find_all('a')
        # prepending the scheme assumes the hrefs are protocol-relative ('//...')
        results = ["https:" + link.get('href') for link in links]
        titles = []
        corpus = []
        for page in results:
            data = self.opener.open(page).read().decode()
            soup = BeautifulSoup(data, 'html.parser')
            title = soup.find(class_='title-poem')
            poem = soup.find(class_='poem-entry')
            titles.append(title.get_text())
            corpus.append(poem.find('p').get_text())
        # write the csv once, after all poems have been collected
        poems = pd.DataFrame({'title': titles, 'text': corpus})
        poems.to_csv('poems_' + poet.replace('-poems', '') + '.csv')

        
    def inspiring_nouns_adjs(self, file_path:str, split=r"\n") ->None:
        """Gathers the nouns, adjectives & proper nouns for the inspiring set."""
        df_poems = pd.read_csv(file_path)
        number_poems = df_poems.shape[0]

        for i in range(number_poems):
            text = df_poems.text[i]
            # dictionary to replace unwanted elements
            replace_dict = {'?«' :  '«', '(' :  '', ')' : '', ':' : ',', '.' : ',', 
                ',,,' : ',', '"': '', '\r': ''}
            for x, y in replace_dict.items():
                text = text.replace(x, y)
            text = text.lower()
            # split into sentences
            sentences = re.split(split, text)
            for sentence in sentences:
                doc = self.nlp_model(sentence)
                for token in doc:
                    if token.pos_ == 'NOUN':
                        self.inspiring_nouns.append(token.text)
                    elif token.pos_ == 'ADJ':
                        self.inspiring_adjs.append(token.text)
                    elif token.pos_ == 'PROPN':
                        self.inspiring_proper_nouns.append(token.text)
                            
    
    
    def prepare_inspiring_set(self) ->None:
        """Gets & prepares the inspiring set for poem generation."""
        
        list_poets = self.inspirers + [self.inspiree]
        
        # download data for the inspirers and the inspiree
        if self.load_inspiring_set:
            self.opener = AppURLopener()
            for poet in list_poets:
                self.read_poems_to_csv(poet)
        
        # only the inspirers' poems feed the inspiring set, as described in the logbook
        for poet in self.inspirers:
            poet_file = f"poems_{poet.replace('-poems', '')}.csv"
            self.inspiring_nouns_adjs(file_path=poet_file, split=r"\n")
    
    def base_poem_prepare(self, poem, split=r"\n") ->dict:
        """Select base poem from inspiree set & prepare template"""
        text = poem['text'].iloc[0]
        #dictionary to replace unwanted elements
        replace_dict = {'?«' :  '«', '(' :  '', ')' : '', ':' : ',', '.' : ',', 
            ',,,' : ',', '"': '', '\r': ''}
        for x,y in replace_dict.items():
            text = text.replace(x, y)
        text = text.lower()   
        #split into sentences
        base_sentences = re.split(split, text)
        return {"title": poem['title'].iloc[0], "poem": base_sentences }
    
    def antipoed_poem_maker(self, base_poem:dict)-> dict:
        """Makes a new poem from a base poem by re-inspiring its nouns, adjectives & proper nouns."""
        # part-of-speech tag -> (candidate words from the inspiring set, sample size)
        inspiring_words = {
            'NOUN': (self.inspiring_nouns, self.size_inspiring_nouns_set),
            'ADJ': (self.inspiring_adjs, self.size_inspiring_adjs_set),
            'PROPN': (self.inspiring_proper_nouns, self.size_inspiring_proper_nouns_set),
        }
        sentences = base_poem["poem"].copy()
        for index, sentence in enumerate(sentences):
            doc = self.nlp_model(sentence)
            pieces = []
            for token in doc:
                words, size = inspiring_words.get(token.pos_, ([], 0))
                # sample for randomness & performance, never asking for more words than exist
                candidates = random.sample(words, min(size, len(words)))
                if candidates:
                    token_doc = self.nlp_model(token.text)
                    replacement = max(candidates, key=lambda word: token_doc.similarity(self.nlp_model(word)))
                    # swap this token only (not substrings of other words), keeping its trailing whitespace
                    pieces.append(replacement + token.whitespace_)
                else:
                    pieces.append(token.text_with_ws)
            sentences[index] = "".join(pieces)
        sentences[0] = '      ' + sentences[0]
        str_poem = ("\n".join(sentences))
        new_poem = {"title": base_poem["title"], "poem": str_poem}
        return new_poem
        

    
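    @staticmethod
    def format_poem(text:str)->str:
        """Capitalizes the first letter of the poem & closes it with a period."""
        # skip the leading indent so the first letter, not a space, is capitalized
        body = text.lstrip()
        indent = text[:len(text) - len(body)]
        body = body[:1].upper() + body[1:]
        # drop trailing commas/whitespace instead of blindly cutting the last character
        return indent + body.rstrip(' ,\n') + '.'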
    
    @staticmethod
    def make_poem_title(str_poem:str)-> None:
        """Makes a title visualisation based on prevalent emotions."""
        df = pd.DataFrame(te.get_emotion(str_poem),index=[0])
        ax = df.plot.barh(stacked=True, color =["orange", "firebrick",  "cyan",  "slateblue", "thistle"])
        ax.set_axis_off()
        ax.patch.set_visible(False)
        ax.set_frame_on(False)
        ax.get_legend().remove()
        plt.show()

    
    def antipoed_poem(self, poem: dict) ->None:
        """Present individual poem."""
    
        # gets the title (the prevalent emotions) & displays it
        self.make_poem_title(poem.get('poem'))
        display(Markdown(self.format_poem(poem.get('poem'))))
    
    def antipoed_poetry_collection(self):
        """Displays anti-poed poetry collection."""
        
        # generate & prepare inspiring set
        self.prepare_inspiring_set()
        
        # choose base poems from the inspiree
        df = pd.read_csv('poems_' + self.inspiree.replace('-poems','') + '.csv')
        for i in range(self.len_poetry_collection):
            poem = df.sample()
            self.base_poems.append(self.base_poem_prepare(poem))
        
        # generate new poems
        for i in range(self.len_poetry_collection):
            self.new_poems.append(self.antipoed_poem_maker(self.base_poems[i]))
        
        # display 
        display(Markdown("# " + self.poetry_collection_name))
        image_name = 'wordweb'
        display(Image(filename=f'pics/{image_name}.png'))
        for new_poem in self.new_poems:
            self.antipoed_poem(new_poem)
        

In [None]:
AP = AntiPoed()
AP.antipoed_poetry_collection()

#### Network Graph

The code below generates the network data (in CSV format) used to create the title-page image.

In [None]:
# sample from the unique adjectives so that every row/column label is distinct
unique_adjs = sorted(set(AP.inspiring_adjs))
adjs_sample = random.sample(unique_adjs, min(300, len(unique_adjs)))

# parse each word once, rather than once per pair
docs = [AP.nlp_model(adj) for adj in adjs_sample]
similarity_matrix = np.array([[doc1.similarity(doc2) for doc2 in docs] for doc1 in docs])

df = pd.DataFrame(data=similarity_matrix, columns=adjs_sample, index=adjs_sample)
df.to_csv('gephi.csv', sep=',')