# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

# Tim Gormly

### 3/18/2024:

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public API, searches for the lyrics of a song, and stores them in a dictionary object.  Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

In [18]:
import requests
import json
import pickle

result = json.loads(requests.get('https://api.lyrics.ovh/v1/The White Stripes/My Doorbell').text)

## pickle dictionary
serialized_dict = pickle.dumps(result)

## write serialized dictionary to a file
try:
    with open("lyrics.txt", "wb") as file:
        file.write(serialized_dict)
    print("Serialized data has been written to 'lyrics.txt'.")
except IOError:
    print("Error occurred while writing to the file.")

Serialized data has been written to 'lyrics.txt'.
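
The question allows either a JSON file or a pickle file, and the cell above uses pickle. For comparison, here is a minimal sketch of the JSON alternative. The helper names `save_lyrics_json` and `load_lyrics_json`, the file name `lyrics_example.json`, and the short `sample` dictionary are illustrative assumptions that stand in for the real API response.

```python
import json
import os
import tempfile


def save_lyrics_json(result, path):
    """Write a response dictionary to a UTF-8 encoded JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps accented characters readable in the file
        json.dump(result, f, ensure_ascii=False, indent=2)


def load_lyrics_json(path):
    """Read a JSON file back into a dictionary."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Placeholder data standing in for the API response (assumption, not real output)
sample = {"lyrics": "I'm thinkin' about my doorbell\nWhen ya gonna ring it"}

# Hypothetical file name, written to the system temp directory
path = os.path.join(tempfile.gettempdir(), "lyrics_example.json")

save_lyrics_json(sample, path)
restored = load_lyrics_json(path)
print(restored["lyrics"])
```

A JSON file can be opened and inspected in any text editor, while a pickle file is binary and should only be loaded from sources you trust.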


2. Read in the contents of your file.  Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics.  Print the polarity score of the sentiment analysis.  Given that the range of the polarity score is `[-1.0,1.0]` which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation?  Answer this question in a comment in your code cell.

In [19]:
import pickle
import spacy

from spacy.tokens import Doc, Span
from spacytextblob.spacytextblob import SpacyTextBlob

# read in file
try:
    with open("lyrics.txt", "rb") as file:
        lyrics = pickle.load(file)

        # print("Deserialized dictionary:")
        # print(lyrics)

except IOError:
    print("Error occurred while reading from the file.")
except pickle.UnpicklingError:
    print("Error occurred while unpickling the data.")
              
# print lyrics only

print("Lyrics of loaded file:")
print(lyrics['lyrics'])
print('--------------------')

# use nlp with spacytextblob to find sentiment polarity score of lyrics

# start spacy pipeline by loading pre-trained english model
nlp = spacy.load("en_core_web_sm")

# add spacytextblob to the pipeline
nlp.add_pipe("spacytextblob")

# Process lyrics
doc = nlp(lyrics['lyrics'])

# show polarity score
print(f"Polarity: {doc._.blob.polarity}")

# Other items from spaCy documentation:
# print(f"Subjectivity: {doc._.blob.subjectivity}")                        # Subjectivity: 0.9
# print(f"Assessments: {doc._.blob.sentiment_assessments.assessments}") # Assessments: [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]
# print(f"NGrams: {doc._.blob.ngrams()}")



# Does the song have a positive or negative connotation?

# The polarity is close to 0 at 0.13, and I think that matches the song pretty well.  The song 
# is largely about someone wanting more, but it isn't particularly sad or happy.


Lyrics of loaded file:
Paroles de la chanson My Doorbell par The White Stripes
I'm thinkin' about my doorbell
When ya gonna ring it, when ya gonna ring it
I'm thinkin' about my doorbell
When ya gonna ring it, when ya gonna ring it
I'm thinkin' about my doorbell
When ya gonna ring it, when ya gonna ring it
Yeah, I been thinkin' about my doorbell

Oh, well

Well women and children need kisses
Not the man in my life I know
And I been goin' to Mister and Miss
I respect the art at the show
Take back what you said little girl
And while you're at it take yourself back too

I'm tired of sittin' and waitin'
Woman, whatcha gonna do now, whatcha gonna do about it

I'm thinkin' about my doorbell
When ya gonna ring it, when ya gonna ring it
Yeah, I'm thinkin' about my doorbell
When ya gonna ring it, when ya gonna ring it oh
I'm thinkin' about my doorbell
When ya gonna ring it, when ya gonna ring it
Yeah, I been thinkin' about my doorbell

Oh, well

You don't seem to come around
Point your finger an

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh API to get the song lyrics, and writes the results to the specified filename.  Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

In [20]:
def lyrics_from_api_request(artist, song, filename):

    # make api requests with parameters from function call
    result = json.loads(requests.get(f'https://api.lyrics.ovh/v1/{artist}/{song}').text)

    # serialize the result
    serialized_result = pickle.dumps(result)

    ## write serialized dictionary to a file
    try:
        with open(f"{filename}.txt", "wb") as file:
            file.write(serialized_result)
        # report the name of the file that was actually written
        print(f"Serialized data has been written to '{filename}.txt'.")
    except IOError:
        print("Error occurred while writing to the file.")

lyrics_from_api_request("The White Stripes", "Blue Orchid", "Blue_Orchid")
lyrics_from_api_request("The White Stripes", "The Nurse", "The_Nurse")
lyrics_from_api_request("The White Stripes", "Little Ghost", "Little_Ghost")
lyrics_from_api_request("The White Stripes", "Forever For Her (Is Over For Me)", "Forever_For_Her")

Serialized data has been written to 'Blue_Orchid.txt'.
Serialized data has been written to 'The_Nurse.txt'.
Serialized data has been written to 'Little_Ghost.txt'.
Serialized data has been written to 'Forever_For_Her.txt'.
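
The function above interpolates the artist and song names straight into the request URL. Names containing characters such as `/` or `?` would change the structure of that URL, so one option is to percent-encode each path segment first. The sketch below assumes a hypothetical helper named `build_lyrics_url`. It only builds the URL string and makes no request, and whether the API resolves every encoded name is not verified here.

```python
from urllib.parse import quote

# Base endpoint taken from the request URL used in the cells above
BASE_URL = "https://api.lyrics.ovh/v1"


def build_lyrics_url(artist, song):
    """Build a lyrics URL with each path segment percent-encoded."""
    # safe="" also encodes "/" so a name cannot add extra path segments
    return f"{BASE_URL}/{quote(artist, safe='')}/{quote(song, safe='')}"


print(build_lyrics_url("The White Stripes", "Blue Orchid"))
# prints https://api.lyrics.ovh/v1/The%20White%20Stripes/Blue%20Orchid
```

The returned string could then be passed to `requests.get` in place of the f-string URL.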


4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  Use this function to print the polarity scores (with the name of the song) of the four files you created in question 3.  Does the reported polarity match your understanding of the song's lyrics? Why or why not?  Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [21]:
import pickle
import spacy

from spacy.tokens import Doc, Span
from spacytextblob.spacytextblob import SpacyTextBlob    

def analyze_lyrics_from_file(filename):
    # read in the serialized file
    try:
        with open(f"{filename}.txt", "rb") as file:
            lyrics = pickle.load(file)
    except IOError:
        print("Error occurred while reading from the file.")
        return None  # nothing to analyze if the file could not be read
    except pickle.UnpicklingError:
        print("Error occurred while unpickling the data.")
        return None  # nothing to analyze if the data could not be unpickled

    # start spacy pipeline by loading pre-trained english model
    nlp = spacy.load("en_core_web_sm")

    # add spacytextblob to the pipeline
    nlp.add_pipe("spacytextblob")

    # Process lyrics
    doc = nlp(lyrics['lyrics'])

    # return the file name together with its polarity score
    return f"{filename} - Polarity: {doc._.blob.polarity}"

print(analyze_lyrics_from_file("Blue_Orchid"))
print(analyze_lyrics_from_file("Forever_For_Her"))
print(analyze_lyrics_from_file("Little_Ghost"))
print(analyze_lyrics_from_file("The_Nurse"))    

Blue_Orchid - Polarity: -0.05434782608695651
Forever_For_Her - Polarity: 0.19145299145299147
Little_Ghost - Polarity: 0.09387254901960786
The_Nurse - Polarity: -0.0762962962962963
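
Questions 2 and 4 both open a pickled file and then index the `'lyrics'` key. That shared step could be factored into one helper that returns `None` when anything goes wrong, so the caller never touches an undefined variable. This is a minimal sketch: the name `load_lyrics`, the demo file names, and the sample dictionary are assumptions for illustration.

```python
import os
import pickle
import tempfile


def load_lyrics(path):
    """Return the 'lyrics' string from a pickled dict, or None on failure."""
    try:
        with open(path, "rb") as f:
            data = pickle.load(f)
    except (OSError, pickle.UnpicklingError, EOFError) as err:
        print(f"Could not load {path}: {err}")
        return None
    if not isinstance(data, dict):
        return None
    # .get returns None if the stored response has no 'lyrics' key
    return data.get("lyrics")


# Demo with placeholder data written to the system temp directory
demo_path = os.path.join(tempfile.gettempdir(), "demo_lyrics.pkl")
with open(demo_path, "wb") as f:
    pickle.dump({"lyrics": "Little ghost, little ghost"}, f)

missing_path = os.path.join(tempfile.gettempdir(), "no_such_lyrics_file.pkl")

print(load_lyrics(demo_path))     # the stored lyrics string
print(load_lyrics(missing_path))  # None, after printing an error message
```

A caller can then check for `None` before passing the text to the spaCy pipeline.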


### Does the reported polarity match your understanding of the song's lyrics? Why or why not do you think that might be?

All four scores are close to zero.  For Blue Orchid and The Nurse, I expected a neutral score, since I don't think either song conveys a strong positive or negative message.  Forever For Her is a fairly sad song, yet it had the strongest polarity of the four at 0.191.  I would have expected it to be the most clearly negative of the four (the opening lyrics are "I blew it // And if I knew what to do, then I'd do it").  The score is still fairly neutral, but I'm surprised it is as positive as it is.  Similarly, Little Ghost repeats the following lines 4 times: 

Little ghost, little ghost <br>
One I'm scared of the most <br>
Can you scare me up a little bit of love? <br>
I'm the only one that sees you <br>
And I can't do much to please you <br>
And it's not yet time to meet the Lord above

Many of these words feel negative to me, and I don't see much that is positive, yet the score for Little Ghost is slightly positive at 0.094.
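
One possible reason for scores this close to zero, offered as an assumption and not as a description of how TextBlob is implemented: lexicon-based scorers generally assign a score to individual words and combine them, so positive and negative words in the same text can cancel out. The toy sketch below illustrates that effect. The `TOY_LEXICON` values and the `toy_polarity` function are hypothetical and are not TextBlob's lexicon or algorithm.

```python
# Hypothetical word scores chosen for illustration; NOT TextBlob's real lexicon
TOY_LEXICON = {"love": 0.5, "scared": -0.6, "please": 0.4, "tired": -0.4}


def toy_polarity(text, lexicon=TOY_LEXICON):
    """Average the scores of words found in the lexicon (0.0 if none match)."""
    # lowercase each word and strip simple trailing/leading punctuation
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


lines = "One I'm scared of the most\nCan you scare me up a little bit of love?"
print(toy_polarity(lines))
```

In this toy example, "scared" and "love" pull in opposite directions, so the average lands near zero even though a listener might read the lines as clearly negative.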