# Sentiment-Based Book Recommendation Logic (Non-Interactive)

This notebook demonstrates the core logic of a sentiment aware book recommender system.
It uses TextBlob to analyze review sentiment and suggests alternative books that the user may enjoy.

### Importing Needed Libraries

In [19]:

import pandas as pd

from transformers import pipeline

from textwrap import wrap

from collections import Counter

# For language detection
from langdetect import detect

### Step 1: Load the Data

In [2]:
df = pd.read_csv(r"C:\book-sentiment-project\data\Book Reviews.csv")
df.head()

Unnamed: 0.1,Unnamed: 0,Book,Review,Review Date
0,0,To Kill a Mockingbird,/// gentle reminder that this is not the time ...,"March 24, 2022"
1,1,To Kill a Mockingbird,\n|\n|6.0 stars. I know I am risking a serious...,"May 24, 2011"
2,2,To Kill a Mockingbird,\n|\n|Looking for a new book but don't want to...,"December 10, 2020"
3,3,To Kill a Mockingbird,"To Kill a Mockingbird, Harper Lee|To Kill a Mo...","July 1, 2022"
4,4,To Kill a Mockingbird,Why is it when I pick up | To Kill A Mockingbi...,"October 25, 2009"


### Step 2: Clean the Data

In [3]:
# Remove unwanted characters
df["Review"] = df["Review"].str.replace("\n", "", regex=True)
df["Review"] = df["Review"].str.replace("[/|]", "", regex=True)
df["Review"] = df["Review"].str.strip()
df.head()

Unnamed: 0.1,Unnamed: 0,Book,Review,Review Date
0,0,To Kill a Mockingbird,gentle reminder that this is not the time to r...,"March 24, 2022"
1,1,To Kill a Mockingbird,6.0 stars. I know I am risking a serious “FILM...,"May 24, 2011"
2,2,To Kill a Mockingbird,Looking for a new book but don't want to commi...,"December 10, 2020"
3,3,To Kill a Mockingbird,"To Kill a Mockingbird, Harper LeeTo Kill a Moc...","July 1, 2022"
4,4,To Kill a Mockingbird,Why is it when I pick up To Kill A Mockingbir...,"October 25, 2009"


In [4]:
# Checking if there are any nulls
df.isnull().sum()

# There are 309 rows missing in the "Review" column

Unnamed: 0       0
Book             0
Review         309
Review Date      0
dtype: int64

In [5]:
# Remove rows where "Review" column is null and update df
df = df.dropna(subset=["Review"])

In [6]:
# Checking if all nulls were dropped
df.isnull().sum()

Unnamed: 0     0
Book           0
Review         0
Review Date    0
dtype: int64

In [7]:
def detect_language(text):
    try:
        # Try to detect language of the input text
        return detect(text)
    except:
        # If an error occurs, return 'xx'
        return 'xx'

# Create a full copy of the original df to safely work with it
df_with_lan = df.copy()

# Apply the detect_language function to each review
# and store the result in a new column called 'language'
df_with_lan['language'] = df_with_lan['Review'].apply(detect_language)

df_with_lan.head()

Unnamed: 0.1,Unnamed: 0,Book,Review,Review Date,language
0,0,To Kill a Mockingbird,gentle reminder that this is not the time to r...,"March 24, 2022",en
1,1,To Kill a Mockingbird,6.0 stars. I know I am risking a serious “FILM...,"May 24, 2011",en
2,2,To Kill a Mockingbird,Looking for a new book but don't want to commi...,"December 10, 2020",en
3,3,To Kill a Mockingbird,"To Kill a Mockingbird, Harper LeeTo Kill a Moc...","July 1, 2022",fa
4,4,To Kill a Mockingbird,Why is it when I pick up To Kill A Mockingbir...,"October 25, 2009",en


In [13]:
# pd.set_option('display.max_colwidth', None) 
# df_with_lan.head()

In [8]:
# Keep only the English reviews
df_with_lan = df_with_lan[df_with_lan['language'] == 'en']

# Reset Index
df_with_lan.reset_index(drop=True, inplace=True)

df_with_lan.head()

Unnamed: 0.1,Unnamed: 0,Book,Review,Review Date,language
0,0,To Kill a Mockingbird,gentle reminder that this is not the time to r...,"March 24, 2022",en
1,1,To Kill a Mockingbird,6.0 stars. I know I am risking a serious “FILM...,"May 24, 2011",en
2,2,To Kill a Mockingbird,Looking for a new book but don't want to commi...,"December 10, 2020",en
3,4,To Kill a Mockingbird,Why is it when I pick up To Kill A Mockingbir...,"October 25, 2009",en
4,5,To Kill a Mockingbird,I had a much longer review written for this bo...,"December 17, 2020",en


In [9]:
new_df = df_with_lan.drop(columns=['language'])
new_df.head(10)

Unnamed: 0.1,Unnamed: 0,Book,Review,Review Date
0,0,To Kill a Mockingbird,gentle reminder that this is not the time to r...,"March 24, 2022"
1,1,To Kill a Mockingbird,6.0 stars. I know I am risking a serious “FILM...,"May 24, 2011"
2,2,To Kill a Mockingbird,Looking for a new book but don't want to commi...,"December 10, 2020"
3,4,To Kill a Mockingbird,Why is it when I pick up To Kill A Mockingbir...,"October 25, 2009"
4,5,To Kill a Mockingbird,I had a much longer review written for this bo...,"December 17, 2020"
5,7,To Kill a Mockingbird,With endless books and infinitely more to be w...,"March 11, 2019"
6,8,To Kill a Mockingbird,While the plot was very gripping and well-writ...,"April 18, 2012"
7,9,To Kill a Mockingbird,"In the course of 5 years, I’ve read this book ...","May 4, 2015"
8,10,To Kill a Mockingbird,So... I don't really know what to say.I think ...,"November 12, 2015"
9,11,To Kill a Mockingbird,Beautiful book.,"October 20, 2016"


### Step 3: Analyze Sentiment Using Transformers

In [13]:
# Load pre-trained sentiment analysis model 
sentiment_analyzer = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment")

Device set to use cpu


In [None]:
# Analyzing if we can drop rows above 512

# Create a new column with the character length of each review
new_df ['Review_Length'] = new_df['Review'].astype(str).apply(len)

# Checking reviews that are below, equal, or above 512
below_512 = (new_df['Review_Length'] < 512).sum()
equal_512 = (new_df['Review_Length'] == 512).sum()
above_512 = (new_df['Review_Length'] > 512).sum()

print(below_512)
print(equal_512)
print(above_512)

4626
10
18420


In [None]:
# Function to split and analyze each review in chunks
def analyze_full_review(review, chunk_size=512):
    chunks =  wrap(review, chunk_size)                  # split into chunks
    
    labels=[]
    for chunk in chunks:
        result = sentiment_analyzer(chunk, truncation=True)
        labels.append(result[0]['label'])
        print(labels)

    most_common = Counter(labels).most_common(1)[0][0]
    return most_common



['LABEL_1']
['LABEL_1', 'LABEL_1']
['LABEL_1', 'LABEL_1', 'LABEL_0']
['LABEL_1', 'LABEL_1', 'LABEL_0', 'LABEL_1']
['LABEL_1', 'LABEL_1', 'LABEL_0', 'LABEL_1', 'LABEL_0']
['LABEL_1', 'LABEL_1', 'LABEL_0', 'LABEL_1', 'LABEL_0', 'LABEL_0']
['LABEL_1', 'LABEL_1', 'LABEL_0', 'LABEL_1', 'LABEL_0', 'LABEL_0', 'LABEL_1']
['LABEL_1', 'LABEL_1', 'LABEL_0', 'LABEL_1', 'LABEL_0', 'LABEL_0', 'LABEL_1', 'LABEL_1']
['LABEL_1', 'LABEL_1', 'LABEL_0', 'LABEL_1', 'LABEL_0', 'LABEL_0', 'LABEL_1', 'LABEL_1', 'LABEL_0']
['LABEL_1', 'LABEL_1', 'LABEL_0', 'LABEL_1', 'LABEL_0', 'LABEL_0', 'LABEL_1', 'LABEL_1', 'LABEL_0', 'LABEL_0']
['LABEL_1', 'LABEL_1', 'LABEL_0', 'LABEL_1', 'LABEL_0', 'LABEL_0', 'LABEL_1', 'LABEL_1', 'LABEL_0', 'LABEL_0', 'LABEL_1']
['LABEL_1', 'LABEL_1', 'LABEL_0', 'LABEL_1', 'LABEL_0', 'LABEL_0', 'LABEL_1', 'LABEL_1', 'LABEL_0', 'LABEL_0', 'LABEL_1', 'LABEL_1']
['LABEL_1', 'LABEL_1', 'LABEL_0', 'LABEL_1', 'LABEL_0', 'LABEL_0', 'LABEL_1', 'LABEL_1', 'LABEL_0', 'LABEL_0', 'LABEL_1', 'LABEL_1

In [11]:
polarities = []
for review in new_df['Review']:
    score = sentiment_analyzer(review)    
    polarities.append(score)  # gets the polarity score

# Stores the polarity score of each review in a new column
new_df['Polarity of Review'] = polarities

new_df.head()

RuntimeError: The expanded size of the tensor (1743) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [1, 1743].  Tensor sizes: [1, 514]

In [None]:
pd.set_option('display.max_colwidth', None) 
new_df.head()

Unnamed: 0.1,Unnamed: 0,Book,Review,Review Date,Polarity of Review
23050,32070,In the Heart of the Country,"2.5 Stars. A dark, brutal story about the daughter of a South African farmer during colonial times. Magda, a spinster, has only ever had contact with her cruel Father and the African workers on their farm. Over time she being to lose her mind. I found it hard to distinguish between reality and fantasy in this story and found it really depressing. Thanks to Text Publishing for my paperback copy.","January 4, 2020",-0.8937
23051,32071,In the Heart of the Country,"4.5Really exceptional. Language that feels bereft of time. It is as if the nineteenth century stylist merged with the modernist technician and birthed a gory but very much alive insane baby. Coetzee luxuriates in the mind of a colonial woman on the brink of madness. Magda is on the verge of a nervous breakdown but, unlike the Almodovar film, in a deeply uncomfortable and noncomic fashion. She imagines murder, imagines the barren landscape of the South African countryside as a hellish space of epiphanies and pillagings, truths and deceptions, to the point that she becomes the voice of the endlessly unstable reality of white South Africa in the 70s. How does a group respond its forefathers' frightening dominations? Murder them? Imagine them dead? Take care of them to their dying day? How do people growing up in the colonizer's homestead relate to black South Africans? Bring in some psychosexual dynamics, and the book attempts to complexify it all. Thrilling, intense stuff.My only qualm is that there is a section during which Magda hears voices, and what they say to her are quotes from Robespierre, Simone Weil, Hegel, and Rousseau, but I really do not understand the necessity of such a section. It made Coetzee's pitch-perfect balance of abstract and concrete description tip more to the former side to a degree of opaqueness I could not begin to ascertain. I could understand the quotes, whether I had come across them before or not, but the meaning remains elusive.","February 21, 2022",-0.9861
23052,32075,In the Heart of the Country,"In the Heart of the Country is a staggeringly goddamn powerful novel. An espresso: short and dark and intense. And it'll keep you awake once you've finished it.I can't fault the quality of the writing. (Of course I can't: Coetzee is a brilliant writer.) But I would say: this is not his most ambitious novel. Why? Because it's all couched in the first person — in the (extreme, vibrant, crackling) voice of a character who is deeply troubled, mentally unstable. This has been done before (albeit not in this context, imbued with the racial tensions of colonial South Africa). And, as voices go, it is perhaps *slightly* easy. Because it is so extreme.Subtlety is harder. Normality — mundane, humdrum normality — is harder. And what I *really* admire (and what Coetzee gives us, incidentally, in a novel like Disgrace) is literature that illuminates — and I really mean *illuminates*: literature that sets a halo around the stuff of everyday humanity. Without ever having to resort to extreme subject matter.Because a great artist can make beauty and drama out of the most humble constituent parts.That said, Coetzee inhabits his narrator's hysterical voice with outstanding skill. He is very convincing indeed.Which means that this is an horrific novel — in its bleakness, its darkness. Sad, harrowing, terrifying.","September 4, 2011",0.9192
23053,32076,In the Heart of the Country,"I didn't review this one at the time but it's certainly stuck with me since. Having recently tackled a lot of McCarthy I've found myself coming back to Coetzee's desolate veld as a comparison to McCarthy's bleak landscapes. In another life I'd love to do a thesis on the textual violence in these two authors' environments. Coetzee presents such a sparse style here, really all interiority as far as narrative, giving the environment, by contrast, more ominous weight. Of course brutality has often been explored in terms of externalities - the elements, raw landscapes, forces of nature - so it's quite a feat to give so much space to an interior monologue, especially through the muddied voice of fantasy, in exploring malice and cruelty. It's not always clear where our narrator is in her own narrative, whether we are being asked to understand something that really happened or whether we are abetting a deceit. And of course the whole thing is fiction so does it matter anyway? Coetzee's great skill here and in much of his work is that he goes beyond the unreliable narrator and makes the text itself problematic (something he gets into more deeply in Foe). Definitely one of his strongest works, but also one of the most enjoyable to read.","February 15, 2012",-0.4376
23054,32077,In the Heart of the Country,One of the weirdest books I’ve ever read,"June 3, 2021",-0.2263
