#####**Important Note:** To run all cells, use **Google Colab** and upload all five poems to the `sample_data` folder. Be careful with the file names, especially "Juliet's Soliloquy": remove the apostrophe (') from the name, because it can cause problems in file paths.

#  NLTK VADER
# Sentiment Analysis with VADER in NLTK for Shakespeare’s Poems:

####1.1 (2.0 points) Calculate the overall sentiment score for each file and list the filename with corresponding scores.


In [1]:
!pip install vaderSentiment
!pip install nltk

Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
Installing collected packages: vaderSentiment
Successfully installed vaderSentiment-3.3.2


In [10]:
import nltk
from nltk import sent_tokenize
from nltk.sentiment import SentimentIntensityAnalyzer

# Download the VADER lexicon and the Punkt sentence tokenizer models
nltk.download('vader_lexicon')
nltk.download('punkt')

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [12]:
# Define the sentiment analysis function
def calculate_sentiment(filename):
    with open(filename, 'r', encoding='utf-8') as file:
        text = file.read()

    # Tokenize text into sentences
    sentences = sent_tokenize(text)

    # Create SentimentIntensityAnalyzer object
    sia = SentimentIntensityAnalyzer()

    # Compute sentiment scores
    compound_scores = [sia.polarity_scores(sentence)['compound'] for sentence in sentences]

    # Calculate average compound score (0 for an empty file, to avoid division by zero)
    overall_score = sum(compound_scores) / len(compound_scores) if compound_scores else 0

    return overall_score

# List of files
files = [
    '/content/sample_data/Shakespeare_A Fair Song.txt',
    '/content/sample_data/Shakespeare_Blow, Blow, Thou Winter Winda.txt',
    '/content/sample_data/Shakespeare_Fear No More.txt',
    '/content/sample_data/Shakespeare_Juliets Soliloquy.txt',
    '/content/sample_data/Shakespeare_Sonnet 130.txt'
]

# Compute sentiment scores for each file
sentiment_scores = {file.split('/')[-1]: calculate_sentiment(file) for file in files}
sentiment_scores

{'Shakespeare_A Fair Song.txt': 0.1604,
 'Shakespeare_Blow, Blow, Thou Winter Winda.txt': 0.27687999999999996,
 'Shakespeare_Fear No More.txt': -0.18086249999999998,
 'Shakespeare_Juliets Soliloquy.txt': -0.4098538461538462,
 'Shakespeare_Sonnet 130.txt': 0.46145}

####1.2 (0.5 points) Rank the overall sentiment score for each file from the positive score to the negative score. Print all the filenames with corresponding scores.

In [13]:
# Sort the files by sentiment score, from most positive to most negative
sorted_scores = sorted(sentiment_scores.items(), key=lambda item: item[1], reverse=True)

# Print the filenames with their corresponding sentiment scores
for filename, score in sorted_scores:
    print(f"{filename}: {score:.2f}")

Shakespeare_Sonnet 130.txt: 0.46
Shakespeare_Blow, Blow, Thou Winter Winda.txt: 0.28
Shakespeare_A Fair Song.txt: 0.16
Shakespeare_Fear No More.txt: -0.18
Shakespeare_Juliets Soliloquy.txt: -0.41


####1.3 (0.5 points) Your program should be robust and able to dynamically process any number of files as inputs to generate sentiment values for each file and then automatically rank all of them.

In [14]:
def calculate_sentiment(filename):
    with open(filename, 'r', encoding='utf-8') as file:
        text = file.read()

    # Tokenize text into sentences
    sentences = sent_tokenize(text)

    # Create SentimentIntensityAnalyzer object
    sia = SentimentIntensityAnalyzer()

    # Compute sentiment scores
    compound_scores = [sia.polarity_scores(sentence)['compound'] for sentence in sentences]

    # Calculate average compound score
    return sum(compound_scores) / len(compound_scores) if compound_scores else 0

import os

def process_directory(directory):
    # Retrieve all text files in the directory
    files = [os.path.join(directory, f) for f in os.listdir(directory) if f.endswith('.txt')]

    # Compute sentiment scores for each file
    sentiment_scores = {os.path.basename(file): calculate_sentiment(file) for file in files}

    # Sort the files by sentiment score, from most positive to most negative
    sorted_scores = sorted(sentiment_scores.items(), key=lambda item: item[1], reverse=True)

    # Print the filenames with their corresponding sentiment scores
    for filename, score in sorted_scores:
        print(f"{filename}: {score:.2f}")

# Specify the directory containing the text files
directory_path = '/content/sample_data'
process_directory(directory_path)

Shakespeare_Sonnet 130.txt: 0.46
Shakespeare_Blow, Blow, Thou Winter Winda.txt: 0.28
Shakespeare_A Fair Song.txt: 0.16
Shakespeare_Fear No More.txt: -0.18
Shakespeare_Juliets Soliloquy.txt: -0.41
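
The ranking step does not depend on how each file is scored, so it can be separated from the scoring step. The following is a minimal sketch of that idea, not the notebook's own implementation: `rank_text_files` and its `scorer` parameter are hypothetical names, and the scorer is assumed to be any function that maps a text to a number. In this notebook it would wrap the sentence tokenizer and `SentimentIntensityAnalyzer`.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def rank_text_files(directory: str, scorer: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Score every .txt file in `directory`; return (name, score) pairs, highest first."""
    folder = Path(directory)
    if not folder.is_dir():
        raise NotADirectoryError(f"Not a directory: {directory}")

    scores = []
    # Iterate in alphabetical order so that ties are ranked deterministically
    for path in sorted(folder.glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        scores.append((path.name, scorer(text)))

    # Python's sort is stable, so files with equal scores stay in alphabetical order
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Passing the scorer as an argument makes it possible to test the ranking logic with a trivial stand-in (for example, text length) before plugging in the sentiment function.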


####1.4 (1.0 points) Briefly describe what you can find from the results.


####1.4.1 (0.3 points) Double-check whether each overall score makes sense for its poem, based on your understanding of the poem.

Analysis of Scores:

- Shakespeare_Juliets Soliloquy: -0.41
  The poem reflects sadness, fear, and worry as Juliet contemplates her fate. The negative score aligns well with the somber theme.
  
- Shakespeare_Sonnet 130: 0.46
  This poem uses humor and irony to describe love. The positive score is consistent with the poem's playful, affectionate tone, although VADER scores words from a lexicon and does not detect irony itself.
  
- Shakespeare_A Fair Song: 0.16
  The poem portrays the fairy queen with admiration. Although positive, the score indicates a less intense sentiment compared to other poems.

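- Shakespeare_Blow, Blow, Thou Winter Wind: 0.28
  The poem conveys a negative outlook, highlighting the bitterness of winter and the insincerity of love. The positive score therefore does not match the poem's theme. A likely cause is that VADER scores individual words from its lexicon, so positive words such as "loving" and "jolly" raise the score even where the poem uses them bitterly or ironically.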

- Shakespeare_Fear No More: -0.18
  The poem reflects on mortality and suffering, mixed with a sense of acceptance. The negative score reflects the contemplation of life's hardships and inevitable end.
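
An average can hide how mixed a poem is, so it helps to look at the spread of the sentence-level scores when double-checking an overall score. The sketch below assumes a list of per-sentence compound scores is already available; `summarize_scores` is a hypothetical helper, and the example values are made up for illustration rather than taken from the poems.

```python
from statistics import mean


def summarize_scores(compound_scores):
    """Summarize per-sentence compound scores (each expected to lie in [-1, 1])."""
    if not compound_scores:
        return {"mean": 0.0, "min": 0.0, "max": 0.0, "negative_share": 0.0}
    negatives = sum(1 for s in compound_scores if s < 0)
    return {
        "mean": mean(compound_scores),
        "min": min(compound_scores),
        "max": max(compound_scores),
        # Fraction of sentences with a negative compound score
        "negative_share": negatives / len(compound_scores),
    }


# Hypothetical per-sentence scores, not taken from any of the poems
example = [0.8, -0.6, 0.7, -0.5]
print(summarize_scores(example))
```

In this made-up example the mean is close to zero even though every sentence is strongly positive or strongly negative, which is the kind of case where the overall score alone would be misleading.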



####1.4.2 (0.3 points) Briefly explain why the statistical methods you chose to get an overall sentiment score are feasible.

The statistical method chosen involves computing the compound sentiment score using NLTK's `SentimentIntensityAnalyzer` for each sentence in the text. The average compound score across all sentences in a file provides the overall sentiment score for the text.

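This approach is feasible because VADER's compound score is already normalized to the range -1 to 1, so scores from different sentences are directly comparable and their mean stays in the same range. The compound score also accounts for the valence (positive or negative intensity) of words and phrases. Averaging gives every sentence equal weight regardless of its length, so the result summarizes the balance of positive and negative sentences rather than the intensity of any single one.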
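
To illustrate why averaging compound scores is well behaved, the sketch below reproduces the normalization that VADER applies to a sentence's summed word valences, as I understand the reference implementation: the sum `x` is mapped to `x / sqrt(x^2 + alpha)` with `alpha = 15`. The function here is a standalone re-implementation for illustration, and the raw valence sums are made-up inputs.

```python
import math


def normalize(raw_score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into [-1, 1] (assumed VADER-style normalization)."""
    norm = raw_score / math.sqrt(raw_score * raw_score + alpha)
    # Clamp to guard against floating-point results just outside the range
    return max(-1.0, min(1.0, norm))


# Made-up raw valence sums for three sentences
raw_sums = [-6.0, 0.5, 3.0]
compound_scores = [normalize(x) for x in raw_sums]

# Because each compound score lies in [-1, 1], their mean does too
overall = sum(compound_scores) / len(compound_scores)
print(compound_scores, overall)
```

Since every sentence score is bounded, one very long or very emotional sentence cannot push the file-level mean outside the -1 to 1 range.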



####1.4.3 (0.4 points) Briefly describe more possible extended applications based on this assignment.

Sentiment analysis techniques explored in this assignment have diverse applications:

- **Content Recommendation Systems:** Utilize sentiment analysis to tailor content recommendations based on user preferences and emotional states.
  
- **Customer Feedback Analysis:** Gain insights into customer sentiments from reviews and feedback to enhance product/service quality and customer satisfaction.

- **Market Research:** Analyze public sentiment towards products, services, or brands to inform marketing strategies and campaign planning.

- **Literary and Cultural Studies:** Automate sentiment analysis of literary works to identify themes, trends, and shifts in societal emotions over time.

These applications highlight the versatility and impact of sentiment analysis across various domains, enabling data-driven decision-making and enhancing user experiences.
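
As a small illustration of the customer feedback case, compound scores can be turned into labels and counted. This is a sketch under assumptions: the cut-offs of 0.05 and -0.05 are the thresholds conventionally suggested for VADER compound scores and may need tuning for a given dataset, and `label_from_compound` and `summarize_feedback` are hypothetical helper names.

```python
from collections import Counter


def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Convert a compound score into a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def summarize_feedback(compound_scores):
    """Count how many scores fall into each sentiment label."""
    return Counter(label_from_compound(s) for s in compound_scores)


# Made-up compound scores standing in for scored customer reviews
print(summarize_feedback([0.62, -0.48, 0.01, 0.30]))
```

The same labeling could be applied to the poem scores from section 1.2, for example to report how many poems fall into each category.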