# AI has many sub-fields
## Natural Language Processing (NLP) deals with sentiment analysis
- We will look at `VADER`

## Machine Learning (ML) deals with predictive analysis 
- We will look at `Linear Regression`

In [3]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [5]:
sa = SentimentIntensityAnalyzer()

In [7]:
sa.polarity_scores("I don't like Monday.")

{'neg': 0.413, 'neu': 0.587, 'pos': 0.0, 'compound': -0.2755}

## When we run the sentiment analyzer, we get 4 values
1. `neg` is the proportion of the text rated negative
2. `neu` is the proportion of the text rated neutral
3. `pos` is the proportion of the text rated positive
4. `compound` is a single normalized score that summarizes the overall sentiment of the text. Its values range from -1 to 1, where values close to -1 indicate a negative tone and values close to 1 indicate a positive tone.
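
To turn a `compound` score into a label, a common convention is to treat scores at or above 0.05 as positive, at or below -0.05 as negative, and anything in between as neutral. The sketch below assumes that convention; the function name `label_from_compound` and the `threshold` parameter are illustrative and not part of `VADER` itself.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label.

    The 0.05 cutoff is a commonly used convention, not a fixed rule.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound score reported above for "I don't like Monday."
print(label_from_compound(-0.2755))
```

The cutoff can be widened if more borderline texts should count as neutral.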

In [12]:
import pandas as pd

In [22]:
df = pd.read_csv("Cleaned_Inaugural_Speeches copy.csv")

In [25]:
df

Unnamed: 0,Name,Inaugural Address,Date,text
0,George Washington,First Inaugural Address,"Thursday, April 30, 1789",Fellow-Citizens of the Senate and of the House...
1,George Washington,Second Inaugural Address,"Monday, March 4, 1793",Fellow Citizens: I AM again called upon by the...
2,John Adams,Inaugural Address,"Saturday, March 4, 1797","WHEN it was first perceived, in early times, t..."
3,Thomas Jefferson,First Inaugural Address,"Wednesday, March 4, 1801",Friends and Fellow-Citizens: CALLED upon to un...
4,Thomas Jefferson,Second Inaugural Address,"Monday, March 4, 1805","PROCEEDING, fellow-citizens, to that qualifica..."
5,James Madison,First Inaugural Address,"Saturday, March 4, 1809",UNWILLING to depart from examples of the most ...
6,James Madison,Second Inaugural Address,"Thursday, March 4, 1813",ABOUT to add the solemnity of an oath to the o...
7,James Monroe,First Inaugural Address,"Tuesday, March 4, 1817",I SHOULD be destitute of feeling if I was not ...
8,James Monroe,Second Inaugural Address,"Monday, March 5, 1821",Fellow-Citizens: I SHALL not attempt to descri...
9,John Quincy Adams,Inaugural Address,"Friday, March 4, 1825",IN compliance with an usage coeval with the ex...


In [27]:
# Note: index 1 selects the second row (Washington's Second Inaugural Address), not the first
text_of_first_row = df['text'][1]

In [29]:
text_of_first_row

'Fellow Citizens: I AM again called upon by the voice of my country to execute the functions of its Chief Magistrate. When the occasion proper for it shall arrive, I shall endeavor to express the high sense I entertain of this distinguished honor, and of the confidence which has been reposed in me by the people of united America. Previous to the execution of any official act of the President the Constitution requires an oath of office. This oath I am now about to take, and in your presence: That if it shall be found during my administration of the Government I have in any instance violated willingly or knowingly the injunctions thereof, I may (besides incurring constitutional punishment) be subject to the upbraidings of all who are now witnesses of the present solemn ceremony.'

In [31]:
sa.polarity_scores(text_of_first_row)

{'neg': 0.054, 'neu': 0.868, 'pos': 0.079, 'compound': 0.5719}

In [33]:
sa.polarity_scores(df['text'][25])

{'neg': 0.077, 'neu': 0.712, 'pos': 0.211, 'compound': 1.0}

In [45]:
# Loop over every speech in the dataset and print its sentiment scores
for index, row in df.iterrows():
    sa_for_row = sa.polarity_scores(row['text'])
    print(f"The Sentiment analysis for row{index + 1} is {sa_for_row}")

The Sentiment analysis for row1 is {'neg': 0.046, 'neu': 0.719, 'pos': 0.235, 'compound': 0.9999}
The Sentiment analysis for row2 is {'neg': 0.054, 'neu': 0.868, 'pos': 0.079, 'compound': 0.5719}
The Sentiment analysis for row3 is {'neg': 0.044, 'neu': 0.697, 'pos': 0.259, 'compound': 1.0}
The Sentiment analysis for row4 is {'neg': 0.076, 'neu': 0.701, 'pos': 0.223, 'compound': 0.9999}
The Sentiment analysis for row5 is {'neg': 0.059, 'neu': 0.771, 'pos': 0.169, 'compound': 0.9998}
The Sentiment analysis for row6 is {'neg': 0.048, 'neu': 0.779, 'pos': 0.173, 'compound': 0.9991}
The Sentiment analysis for row7 is {'neg': 0.127, 'neu': 0.726, 'pos': 0.147, 'compound': 0.9737}
The Sentiment analysis for row8 is {'neg': 0.054, 'neu': 0.731, 'pos': 0.215, 'compound': 1.0}
The Sentiment analysis for row9 is {'neg': 0.061, 'neu': 0.749, 'pos': 0.19, 'compound': 1.0}
The Sentiment analysis for row10 is {'neg': 0.055, 'neu': 0.745, 'pos': 0.2, 'compound': 0.9999}
The Sentiment analysis for row1

In [47]:
len(df)

58

In [49]:
df['text'][19]

'Fellow-Countrymen: AT this second appearing to take the oath of the Presidential office there is less occasion for an extended address than there was at the first. Then a statement somewhat in detail of a course to be pursued seemed fitting and proper. Now, at the expiration of four years, during which public declarations have been constantly called forth on every point and phase of the great contest which still absorbs the attention and engrosses the energies of the nation, little that is new could be presented. The progress of our arms, upon which all else chiefly depends, is as well known to the public as to myself, and it is, I trust, reasonably satisfactory and encouraging to all. With high hope for the future, no prediction in regard to it is ventured. On the occasion corresponding to this four years ago all thoughts were anxiously directed to an impending civil war. All dreaded it, all sought to avert it. While the inaugural address was being delivered from this place, devoted 

In [51]:
sa.polarity_scores('war')

{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.5994}
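
The `compound` value for a single word hints at how the score is built. As far as I understand `VADER`, it sums the ratings of the words in the text and squashes the sum into the range -1 to 1 with x / sqrt(x² + 15). The sketch below assumes that formula and assumes the lexicon rates "war" at about -2.9, which is consistent with the output above. The name `normalize` and the value `alpha=15` are taken on that assumption.

```python
import math


def normalize(raw_score, alpha=15):
    """Squash an unbounded summed rating into the range (-1, 1)."""
    return raw_score / math.sqrt(raw_score * raw_score + alpha)


# Assumed lexicon rating for "war" (about -2.9), inferred from the output above
war_compound = round(normalize(-2.9), 4)
print(war_compound)

# A long text with many positive words has a large raw sum, so it saturates near 1
long_text_compound = normalize(50.0)
print(long_text_compound)
```

If this reading is right, it would also explain why most of the long speeches score very close to 1.0: with hundreds of rated words, the raw sum is large and the squashed value saturates.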

### We just demonstrated how to run sentiment analysis on a complete dataset using a form of AI.
The compound scores give us more insight into the dataset.

`VADER` is tuned for social media text, so it is especially good at finding the tone of short, informal posts.
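
One simple way to get insight from the compound scores is to rank them and look at the speeches that stand out. The sketch below uses only the ten compound scores printed in the loop output above; the variable names are illustrative.

```python
# Compound scores copied from the first ten rows of the printed output above
compound_scores = [0.9999, 0.5719, 1.0, 0.9999, 0.9998,
                   0.9991, 0.9737, 1.0, 1.0, 0.9999]

# Pair each score with its 1-based row number, then sort from least to most positive
ranked = sorted(enumerate(compound_scores, start=1), key=lambda pair: pair[1])

least_positive_row, least_positive_score = ranked[0]
print(least_positive_row, least_positive_score)
```

Among these ten rows, row 2 (Washington's short Second Inaugural Address) is the clear outlier, followed by row 7.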

## Linear Regression (LR)
It's a simple machine learning algorithm used for predictive analysis on data.
There are two types:
- Simple LR: y_pred = ax + b
- Multiple LR: y_pred = a1x1 + a2x2 + ... + b

### Simple Linear Regression
- 1 independent variable (what informs the value of the dependent variable)
- 1 dependent variable (what we are trying to predict)

### Multiple Linear Regression
- More than 1 independent variable
- 1 dependent variable
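
For the simple case, the values of a and b in y_pred = ax + b can be computed directly with the least-squares formulas. The sketch below is a plain-Python illustration on hypothetical points, not the code used later in this notebook; `fit_simple_lr` is a made-up helper name.

```python
def fit_simple_lr(xs, ys):
    """Return (a, b) for the least-squares line y_pred = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    b = mean_y - a * mean_x
    return a, b


# Hypothetical points that lie exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_simple_lr(xs, ys)
print(a, b)
```

Because the points sit exactly on a line, the fit recovers a slope of 2 and an intercept of 1.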

In [63]:
from sklearn.linear_model import LinearRegression

In [65]:
lm = LinearRegression()

In [68]:
df_2 = pd.read_csv("Boston House Prices copy.csv")

In [70]:
df_2.sample(10)

Unnamed: 0,Rooms,Distance,Value
429,6.38,1.9682,9.5
271,6.24,4.429,25.2
32,5.95,3.99,13.2
383,5.52,1.5331,12.3
19,5.727,3.7965,18.2
7,6.172,5.9505,27.1
276,7.267,4.7872,33.2
291,7.148,5.1167,37.3
263,7.327,2.0788,31.0
443,6.485,1.9784,15.4


### Describing the data...
1. `Rooms` is the average number of rooms in a house
2. `Distance` is the average distance of the house from the top 3 employment centers
3. `Value` is the value of the house in $1,000s

In [73]:
df_2.shape

(506, 3)

In [75]:
# Independent variable
# Use double square brackets so that x is a 2-D DataFrame, as scikit-learn expects
x = df_2[['Rooms']]

# Dependent variable
y = df_2['Value']

In [77]:
lm.fit(x, y)

In [81]:
# Get a (the slope) in y_pred = ax + b; coef_ is an array with one entry per feature
a = lm.coef_

In [83]:
# Get b (the intercept) in y_pred = ax + b
b = lm.intercept_

In [85]:
a

array([9.10210898])

In [87]:
b

-34.67062077643851

## My equation is y_pred = 9.1x - 34.67 (coefficients rounded)

In [92]:
y_pred = 9.1 * df_2['Rooms'] - 34.67

In [98]:
lm.predict([[5.790]])

array([18.03059022])
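
The hand-written equation uses rounded coefficients, so it gives a slightly different answer from `lm.predict`. The sketch below redoes the prediction for 5.790 rooms with both sets of numbers, all copied from the outputs above; the variable names are illustrative.

```python
# Coefficients reported by the fitted model above
a_full, b_full = 9.10210898, -34.67062077643851
# Rounded coefficients used in the hand-written equation
a_rounded, b_rounded = 9.1, -34.67

rooms = 5.790
pred_full = a_full * rooms + b_full
pred_rounded = a_rounded * rooms + b_rounded

print(pred_full, pred_rounded)
```

The full-precision result matches the model output of about 18.03 (that is, roughly $18,030), while the rounded equation lands near 18.02.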

### The approximation is not exact. How can we measure how well the line fits?
We use `R squared` (the coefficient of determination).

In [101]:
lm.score(x, y)

0.48352545599133423
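
The value returned by `lm.score` is R squared, which compares the model's squared errors with the squared errors of always predicting the mean. A minimal plain-Python sketch of that calculation, on hypothetical numbers chosen only for illustration (`r_squared` is a made-up helper name):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Squared error of always predicting the mean
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical values, chosen only to illustrate the calculation
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
score = r_squared(y_true, y_pred)
print(score)
```

A score of 1 means the predictions match the data exactly, and a score near 0 means the model does no better than predicting the mean.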

In this case, `Rooms` explains about 48.35% of the variability in `Value`. Let's look at a case of multiple linear regression. Does adding a second variable improve the fit?

In [105]:
# Independent variables
x_2 = df_2[['Rooms', 'Distance']]

# Dependent variable
y_2 = df_2['Value']

In [107]:
lm_2 = LinearRegression()

In [109]:
lm_2.fit(x_2, y_2)
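
To see whether the second variable helps, the two models can be compared by their R squared. In this notebook that would mean comparing `lm.score(x, y)` with `lm_2.score(x_2, y_2)`. The sketch below shows the same comparison on synthetic stand-in data, not the Boston file, so the numbers it prints say nothing about house prices; every name in it is illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (not the Boston file): two features and a noisy target
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
target = 3.0 * features[:, 0] + 1.5 * features[:, 1] + rng.normal(scale=0.5, size=200)

# Fit one model on the first feature only and one on both features
one_feature = LinearRegression().fit(features[:, [0]], target)
two_features = LinearRegression().fit(features, target)

# R squared on the same data each model was fitted on
r2_one = one_feature.score(features[:, [0]], target)
r2_two = two_features.score(features, target)
print(r2_one, r2_two)
```

On the data used for fitting, adding a variable cannot lower R squared, so a higher score alone does not prove the extra variable is useful. Checking the score on held-out data is a more reliable test.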