# Sentiment Analysis & Text Classification

Polarity - The emotional tone of the text, a score in the range [-1, 1]

    1. [-1, 0) - Negative Sentiment

    2. 0 - Neutral Sentiment

    3. (0, 1] - Positive Sentiment

Subjectivity - How opinionated or personalised the text is, a score in the range [0, 1]

    1. 1 - Very subjective - highly personalised and opinionated

    2. 0 - Very factual - not personalised
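
The ranges above can be turned into a small labelling helper. This is only a sketch: the function name `label_sentiment` and the 0.5 subjectivity cut-off are assumptions made for illustration and are not part of TextBlob.

```python
def label_sentiment(polarity, subjectivity, subjectivity_cutoff=0.5):
    """Map polarity/subjectivity scores to readable labels.

    polarity is expected in [-1, 1] and subjectivity in [0, 1].
    The 0.5 cut-off is an arbitrary, illustrative choice.
    """
    if polarity > 0:
        tone = 'positive'
    elif polarity < 0:
        tone = 'negative'
    else:
        tone = 'neutral'
    style = 'subjective' if subjectivity >= subjectivity_cutoff else 'factual'
    return tone, style


print(label_sentiment(1.0, 1.0))
print(label_sentiment(-0.17, 0.3))
print(label_sentiment(0.0, 0.0))
```

Because `TextBlob(...).sentiment` is a named tuple of `(polarity, subjectivity)`, the helper can be called as `label_sentiment(*blob.sentiment)` for any `TextBlob` object `blob`.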

In [1]:
# Import the TextBlob class

from textblob import TextBlob

In [2]:
text_1 = 'Vinod is very happy today'

In [3]:
blob_1 = TextBlob(text_1)

In [4]:
blob_1.sentiment

Sentiment(polarity=1.0, subjectivity=1.0)

In [5]:
text_2 = 'The Movie was not on expected line. I did not enjoy the film at all. It was a waste of time'

In [6]:
blob_2 = TextBlob(text_2)

In [7]:
blob_2.sentiment

Sentiment(polarity=-0.16666666666666666, subjectivity=0.3)

In [8]:
text_3 = 'The Sun is going to set at 6pm'

In [9]:
blob_3 = TextBlob(text_3)

In [10]:
blob_3.sentiment

Sentiment(polarity=0.0, subjectivity=0.0)

In [11]:
# text_3 scores 0.0 on both measures: neutral polarity and a fully factual (non-subjective) statement

In [15]:
text = '''Ever since Finance Minister Nirmala Sitharaman presented her fifth and last full budget of the Modi government’s second term on February 1, 2023, every aspect of the budget has been analysed thread-bare by stakeholders, experts, and members of the commentariat. This author, himself, starting with the analysis of the economic survey, analysed the budget story through his framework of a five-part series.
Make no mistake- along with the finance minister’s budget speech and demands for grants, the outcome budget (OB) forms the trinity of the most important budget document and still, only a few have decided to deep dive into it. And the reasons are as follows-
The outcome budget, in its current avatar, is of recent origin and has not yet received the proper attention of the commentariat.
It is a complex, confusing and lengthy document that is difficult to decode. The FY24 outcome budget is 280 pages long, while FY23 one was 297 pages long.
Unlike demands of grants which provides actuals of previous, revised estimate of current year and the budget estimate for next year, the outcome budget is a standalone document that just enumerates the budgeted output and outcome targets for selected schemes under ministries and departments. It throws no light on what was the target of the previous year, nor does it throw light on past years’ achievements against the target.
Advertisement. It is natural then that the outcome budget does not get the attention of the analysts it deserves. And this in itself was reason enough for this member of the commentariat to deep dive into the outcome budget of FY24 and to arrive at meaningful insight to also examine the outcome budget of FY23 critically. And here goes my analysis. As two years’ outcome budget documents are so voluminous, this analysis covers only outcome budget of two Ministries with an attempt to grasp whether the budgetary outlay for FY24 is properly aligned with monitorable outputs and outcomes or whether the outcome budget has turned into a standalone document falling short on its basic premise, losing its rigour and seriousness.
The key aspect that I seek to address is whether output, outcome and key milestones to be achieved in the financial year are synced seamlessly and explained synchronously because, in its absence, the budget outlays remain what they are— simple annual expenditure targets defeating the very purpose of having an outcome budget.
THE OUTCOME BUDGET
For the uninitiated, I begin with a primer on- what is the outcome budget.
Till recently, before FY2017-18, as part of the Union budget, only the financial outlays of schemes of various ministries were part of the budget document while the expected outputs and outcomes of schemes were prepared and presented separately by each ministry (initiated by P. Chidambaram as Finance Minister in FY2008.)'''

In [16]:
blob_4 = TextBlob(text)

In [17]:
blob_4.sentiment

Sentiment(polarity=0.009873188405797106, subjectivity=0.4061335403726708)

# Text Classification

In [18]:
import numpy as np
import pandas as pd 
from sklearn.datasets import fetch_20newsgroups

In [19]:
train = fetch_20newsgroups(subset='train')

In [20]:
test = fetch_20newsgroups(subset='test')

In [21]:
train

{'data': ["From: lerxst@wam.umd.edu (where's my thing)\nSubject: WHAT car is this!?\nNntp-Posting-Host: rac3.wam.umd.edu\nOrganization: University of Maryland, College Park\nLines: 15\n\n I was wondering if anyone out there could enlighten me on this car I saw\nthe other day. It was a 2-door sports car, looked to be from the late 60s/\nearly 70s. It was called a Bricklin. The doors were really small. In addition,\nthe front bumper was separate from the rest of the body. This is \nall I know. If anyone can tellme a model name, engine specs, years\nof production, where this car is made, history, or whatever info you\nhave on this funky looking car, please e-mail.\n\nThanks,\n- IL\n   ---- brought to you by your neighborhood Lerxst ----\n\n\n\n\n",
  "From: guykuo@carson.u.washington.edu (Guy Kuo)\nSubject: SI Clock Poll - Final Call\nSummary: Final call for SI clock reports\nKeywords: SI,acceleration,clock,upgrade\nArticle-I.D.: shelley.1qvfo9INNc3s\nOrganization: University of Washingto

In [22]:
test

{'data': ['From: v064mb9k@ubvmsd.cc.buffalo.edu (NEIL B. GANDLER)\nSubject: Need info on 88-89 Bonneville\nOrganization: University at Buffalo\nLines: 10\nNews-Software: VAX/VMS VNEWS 1.41\nNntp-Posting-Host: ubvmsd.cc.buffalo.edu\n\n\n I am a little confused on all of the models of the 88-89 bonnevilles.\nI have heard of the LE SE LSE SSE SSEI. Could someone tell me the\ndifferences are far as features or performance. I am also curious to\nknow what the book value is for prefereably the 89 model. And how much\nless than book value can you usually get them for. In other words how\nmuch are they in demand this time of year. I have heard that the mid-spring\nearly summer is the best time to buy.\n\n\t\t\tNeil Gandler\n',
  'From: Rick Miller <rick@ee.uwm.edu>\nSubject: X-Face?\nOrganization: Just me.\nLines: 17\nDistribution: world\nNNTP-Posting-Host: 129.89.2.33\nSummary: Go ahead... swamp me.  <EEP!>\n\nI\'m not familiar at all with the format of these "X-Face:" thingies, but\nafter se

In [23]:
train.keys()

dict_keys(['data', 'filenames', 'target_names', 'target', 'DESCR'])

In [24]:
test.keys()

dict_keys(['data', 'filenames', 'target_names', 'target', 'DESCR'])

In [25]:
train['target_names']

['alt.atheism',
 'comp.graphics',
 'comp.os.ms-windows.misc',
 'comp.sys.ibm.pc.hardware',
 'comp.sys.mac.hardware',
 'comp.windows.x',
 'misc.forsale',
 'rec.autos',
 'rec.motorcycles',
 'rec.sport.baseball',
 'rec.sport.hockey',
 'sci.crypt',
 'sci.electronics',
 'sci.med',
 'sci.space',
 'soc.religion.christian',
 'talk.politics.guns',
 'talk.politics.mideast',
 'talk.politics.misc',
 'talk.religion.misc']

In [26]:
np.unique(train['target'])

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])
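
Each entry of `target` is an index into `target_names`. The sketch below shows that mapping and a per-class count on a tiny hand-made `Bunch`, so it runs without downloading the dataset. The three posts and their labels are made up for illustration.

```python
import numpy as np
from sklearn.utils import Bunch

# Hypothetical stand-in for the object returned by fetch_20newsgroups
toy = Bunch(
    data=['post about cars', 'post about space', 'another car post'],
    target=np.array([0, 1, 0]),
    target_names=['rec.autos', 'sci.space'],
)

# A Bunch supports both key access and attribute access
assert toy['target_names'] is toy.target_names

# Map every integer label to its newsgroup name
names = np.array(toy.target_names)[toy.target]
print(names)

# Count how many posts fall into each class
counts = np.bincount(toy.target, minlength=len(toy.target_names))
print(dict(zip(toy.target_names, counts.tolist())))
```

The same two lines (`np.array(train['target_names'])[train['target']]` and `np.bincount(train['target'])`) should apply unchanged to the real `train` object.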

In [27]:
print(train['data'][1])

From: guykuo@carson.u.washington.edu (Guy Kuo)
Subject: SI Clock Poll - Final Call
Summary: Final call for SI clock reports
Keywords: SI,acceleration,clock,upgrade
Article-I.D.: shelley.1qvfo9INNc3s
Organization: University of Washington
Lines: 11
NNTP-Posting-Host: carson.u.washington.edu

A fair number of brave souls who upgraded their SI clock oscillator have
shared their experiences for this poll. Please send a brief message detailing
your experiences with the procedure. Top speed attained, CPU rated speed,
add on cards and adapters, heat sinks, hour of usage per day, floppy disk
functionality with 800 and 1.4 m floppies are especially requested.

I will be summarizing in the next two days, so please add to the network
knowledge base if you have done the clock upgrade and haven't answered this
poll. Thanks.

Guy Kuo <guykuo@u.washington.edu>



In [28]:
print(train['data'][10])

From: irwin@cmptrc.lonestar.org (Irwin Arnstein)
Subject: Re: Recommendation on Duc
Summary: What's it worth?
Distribution: usa
Expires: Sat, 1 May 1993 05:00:00 GMT
Organization: CompuTrac Inc., Richardson TX
Keywords: Ducati, GTS, How much? 
Lines: 13

I have a line on a Ducati 900GTS 1978 model with 17k on the clock.  Runs
very well, paint is the bronze/brown/orange faded out, leaks a bit of oil
and pops out of 1st with hard accel.  The shop will fix trans and oil 
leak.  They sold the bike to the 1 and only owner.  They want $3495, and
I am thinking more like $3K.  Any opinions out there?  Please email me.
Thanks.  It would be a nice stable mate to the Beemer.  Then I'll get
a jap bike and call myself Axis Motors!

-- 
-----------------------------------------------------------------------
"Tuba" (Irwin)      "I honk therefore I am"     CompuTrac-Richardson,Tx
irwin@cmptrc.lonestar.org    DoD #0826          (R75/6)
-------------------------------------------------------------------

In [29]:
print(train['data'][1000])

From: dabl2@nlm.nih.gov (Don A.B. Lindbergh)
Subject: Diamond SS24X, Win 3.1, Mouse cursor
Organization: National Library of Medicine
Lines: 10


Anybody seen mouse cursor distortion running the Diamond 1024x768x256 driver?
Sorry, don't know the version of the driver (no indication in the menus) but it's a recently
delivered Gateway system.  Am going to try the latest drivers from Diamond BBS but wondered
if anyone else had seen this.

post or email

--Don Lindbergh
dabl2@lhc.nlm.nih.gov



# Building a Text Classification Model 

In [31]:
# Feature set: the raw text of each post

train['data']

["From: lerxst@wam.umd.edu (where's my thing)\nSubject: WHAT car is this!?\nNntp-Posting-Host: rac3.wam.umd.edu\nOrganization: University of Maryland, College Park\nLines: 15\n\n I was wondering if anyone out there could enlighten me on this car I saw\nthe other day. It was a 2-door sports car, looked to be from the late 60s/\nearly 70s. It was called a Bricklin. The doors were really small. In addition,\nthe front bumper was separate from the rest of the body. This is \nall I know. If anyone can tellme a model name, engine specs, years\nof production, where this car is made, history, or whatever info you\nhave on this funky looking car, please e-mail.\n\nThanks,\n- IL\n   ---- brought to you by your neighborhood Lerxst ----\n\n\n\n\n",
 "From: guykuo@carson.u.washington.edu (Guy Kuo)\nSubject: SI Clock Poll - Final Call\nSummary: Final call for SI clock reports\nKeywords: SI,acceleration,clock,upgrade\nArticle-I.D.: shelley.1qvfo9INNc3s\nOrganization: University of Washington\nLines: 

In [32]:
# Target: the integer class index of each post

train['target']

array([7, 4, 4, ..., 3, 1, 8])

In [33]:
from sklearn.feature_extraction.text import TfidfVectorizer

from sklearn.naive_bayes import MultinomialNB

In [34]:
from sklearn.pipeline import make_pipeline

In [35]:
mnb = make_pipeline(TfidfVectorizer(), MultinomialNB())

In [36]:
## Training

mnb.fit(train['data'], train['target'])

Pipeline(steps=[('tfidfvectorizer', TfidfVectorizer()),
                ('multinomialnb', MultinomialNB())])

In [37]:
pred = mnb.predict(test['data'])

In [38]:
pred

array([ 7, 11,  0, ...,  9,  3, 15])
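
To see what the two pipeline steps do, here is a minimal sketch on a made-up four-document corpus (the documents and the 0/1 labels are illustrative assumptions, not taken from the newsgroups data). The vectorizer turns text into a sparse TF-IDF matrix, and the Naive Bayes classifier is trained on those features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up miniature corpus: label 0 = cars, label 1 = space
docs = [
    'the car engine is loud',
    'new tires for my car',
    'the rocket reached orbit',
    'nasa launched a satellite into orbit',
]
labels = [0, 0, 1, 1]

toy_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
toy_model.fit(docs, labels)

# Step 1 of the pipeline: text -> sparse TF-IDF matrix (documents x vocabulary)
tfidf = toy_model.named_steps['tfidfvectorizer']
matrix = tfidf.transform(docs)
print(matrix.shape)

# Step 2: the classifier predicts from those TF-IDF features
toy_pred = toy_model.predict(['car engine', 'satellite orbit'])
print(toy_pred)
```

Wrapping both steps in one pipeline means `predict` can be called on raw strings, and the vocabulary learned during `fit` is reused automatically at prediction time.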

# Evaluating the Performance

In [40]:
from sklearn.metrics import classification_report, confusion_matrix

In [41]:
report = classification_report(test['target'], pred)
cm = confusion_matrix(test['target'], pred)

print('The report:\n', report)
print('\n\n')
print('The confusion matrix:\n', cm)

The report:
               precision    recall  f1-score   support

           0       0.80      0.52      0.63       319
           1       0.81      0.65      0.72       389
           2       0.82      0.65      0.73       394
           3       0.67      0.78      0.72       392
           4       0.86      0.77      0.81       385
           5       0.89      0.75      0.82       395
           6       0.93      0.69      0.80       390
           7       0.85      0.92      0.88       396
           8       0.94      0.93      0.93       398
           9       0.92      0.90      0.91       397
          10       0.89      0.97      0.93       399
          11       0.59      0.97      0.74       396
          12       0.84      0.60      0.70       393
          13       0.92      0.74      0.82       396
          14       0.84      0.89      0.87       394
          15       0.44      0.98      0.61       398
          16       0.64      0.94      0.76       364
          17  
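
In the report above, class 15 has precision 0.44 but recall 0.98: the model finds almost every true class-15 post, yet fewer than half of the posts it assigns to class 15 actually belong there. The sketch below shows how both figures come out of a confusion matrix, using made-up labels for three classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for three classes, for illustration only
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

# Rows are true classes, columns are predicted classes
cm_toy = confusion_matrix(y_true, y_pred)
print(cm_toy)

correct = np.diag(cm_toy)
precision = correct / cm_toy.sum(axis=0)  # correct / everything predicted as the class
recall = correct / cm_toy.sum(axis=1)     # correct / everything truly in the class
print(precision.round(2))
print(recall.round(2))
```

A column with many off-diagonal entries signals low precision for that class, while a row with many off-diagonal entries signals low recall.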

# Predicting the Topic of New Text

In [49]:
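def predict_news_group(doc):
    """Return the predicted newsgroup name for a single document string."""
    # predict() expects an iterable of documents, so wrap the string in a list
    group_pred = mnb.predict([doc])
    # Map the predicted class index back to its newsgroup name
    return train['target_names'][group_pred[0]]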

In [50]:
text = 'Nowadays, there is a lot of mixing of politics and religion. We need to relook at the impact of this. Will it help the politics in the long run?'

In [51]:
predict_news_group(text)

'soc.religion.christian'

In [52]:
text_2 = 'Sports is more of an entertainment these days'

In [53]:
predict_news_group(text_2)

'rec.autos'

In [54]:
text_3 = 'Windows came early was difficult to understand. As there was release of Windows, it became more user friendly'

In [55]:
predict_news_group(text_3)

'comp.os.ms-windows.misc'
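
A single predicted label hides how confident the model is, which matters for short inputs such as the sports sentence that was assigned to 'rec.autos'. A minimal sketch, assuming a helper named `top_groups` (hypothetical, not part of scikit-learn) and a made-up toy corpus, that ranks classes by `predict_proba`:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up miniature corpus standing in for the newsgroup posts
docs = [
    'the car engine is loud',
    'new tires for my car',
    'the rocket reached orbit',
    'nasa launched a satellite into orbit',
]
labels = [0, 0, 1, 1]
label_names = ['rec.autos', 'sci.space']

toy_model = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(docs, labels)


def top_groups(model, names, doc, k=2):
    """Return the k most probable (name, probability) pairs for one document."""
    # Columns of predict_proba follow model.classes_, here the indices 0..n-1
    proba = model.predict_proba([doc])[0]
    order = np.argsort(proba)[::-1][:k]
    return [(names[i], float(proba[i])) for i in order]


ranked = top_groups(toy_model, label_names, 'my car needs a new engine')
print(ranked)
```

With the pipeline trained earlier, a call such as `top_groups(mnb, train['target_names'], text_2, k=3)` would show how close the runner-up newsgroups are to the top prediction.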