# Model Inference

# Introduction
___
___
Name    : Vicky Eldora Wuisan

---
---

> Problem Statement :   

Social media makes it easy for people to express their opinions publicly. As a company operating in the social media sector, specifically Twitter, we want to classify tweets posted by users into three categories: neutral, positive, and negative. The motivation is that some people misuse the platform to tweet hateful content __[source](https://news.detik.com/berita/d-2830824/duel-di-depan-istora-senayan-karena-twitwar-ini-klarifikasi-redinparis)__, a problem Twitter is trying to tackle. We will therefore attempt to build a robust NLP-based classification model that identifies negative tweets so that they can be blocked and, if necessary, so that the accounts posting them can be blocked as well.

> Objective :   

The main objective of this project is to develop an NLP-based classification model that predicts whether a tweet is neutral, negative, or positive, based on the given dataset. The model is a Recurrent Neural Network (RNN). It is evaluated with the accuracy metric to determine whether it is a good fit, underfit, or overfit before it is used for prediction.
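As a rough illustration of how accuracy can indicate fit, the sketch below compares training and validation accuracy. The helper `diagnose_fit` and its thresholds (`min_acc`, `max_gap`) are hypothetical and chosen only for illustration; they are not values used in this project.

```python
def diagnose_fit(train_acc, val_acc, min_acc=0.70, max_gap=0.05):
    """Label a model 'underfit', 'overfit', or 'good fit' from two accuracies.

    min_acc and max_gap are illustrative assumptions, not project settings.
    """
    # Both scores low: the model has not learned the task well.
    if train_acc < min_acc and val_acc < min_acc:
        return "underfit"
    # Training score far above validation score: the model memorized the training data.
    if train_acc - val_acc > max_gap:
        return "overfit"
    return "good fit"


print(diagnose_fit(0.95, 0.70))
print(diagnose_fit(0.60, 0.58))
print(diagnose_fit(0.80, 0.78))
```

In practice the two accuracies would come from the training history of the model, and the thresholds would be tuned to the dataset.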

# Import Libraries

In [1]:
# Libraries for loading the model
import pandas as pd
import numpy as np

# import pickle
from tensorflow.keras.models import load_model
import tensorflow as tf
import tensorflow_hub as tf_hub

# Libraries for pre-processing
import nltk
import re
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import string
from nltk.stem import WordNetLemmatizer
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from gensim.models import Word2Vec

# Suppress warnings
import warnings
warnings.filterwarnings("ignore")




# Load model

In [2]:
# Load the Models
model = load_model('model_improve')





# Define preprocess function

In [3]:
# Additional Stopwords
additional_stopwords = ['to', 'I','the','a','my','and','i', 'you', 'is', 'for', 'in', 'of',
 'it', 'on', 'have', 'that', 'me', 'so', 'with', 'be', 'but',
 'at', 'was', 'just', 'I`m', 'not', 'get', 'all', 'this', 'are',
 'out', 'like', 'day', '-', 'up', 'go', 'your', 'good', 'got', 'from',
 'do', 'going', 'no', 'now', 'love', 'work', '****', 'will', 'about',
 'one', 'really', 'it`s', 'u', 'don`t', 'some', 'know', 'see', 'can',
 'too', 'had', 'am', 'back', '&', 'time', 'what', 'its', 'want', 'we',
 'new', 'as', 'im', 'think', 'can`t', '2', 'if', 'when', 'an', 'more',
 'still', 'today', 'miss', 'has', 'they', 'much', 'there', 'last',
 'need', 'My', 'how', 'been', 'home', 'lol', 'off', 'Just', 'feel',
 'night', 'i`m', 'her', 'would', 'The']

# Combine NLTK's English stopwords with the additional stopwords
stpwds_eng = list(set(stopwords.words('english')))
stpwds_eng.extend(additional_stopwords)
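To make the stopword step concrete, here is a minimal sketch of merging extra stopwords into a base set and filtering tokens. The tiny `base_stopwords` set is a hypothetical stand-in for NLTK's English list, and `remove_stopwords` is an illustrative helper, not a function from this notebook.

```python
# Hypothetical stand-in for NLTK's English stopword list, kept tiny for illustration.
base_stopwords = {"the", "is", "a", "to"}
extra_stopwords = ["today", "lol", "im"]

# A set removes duplicates and gives fast membership checks.
all_stopwords = base_stopwords | set(extra_stopwords)


def remove_stopwords(tokens, stopword_set):
    """Return the tokens that are not in stopword_set, preserving order."""
    return [tok for tok in tokens if tok not in stopword_set]


tokens = ["today", "is", "a", "great", "match", "lol"]
filtered = remove_stopwords(tokens, all_stopwords)
print(filtered)
```

Using a set rather than a list mainly matters when many tweets are filtered, since each membership check no longer scans the whole list.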

In [4]:
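# Set up the cleaning pattern, lemmatizer, and stopwords
cleaning_pattern = r"@\S+|https?:\S+|http?:\S|[^A-Za-z0-9]+"
lemmatizer = WordNetLemmatizer()
# NOTE: this reassignment resets the list to NLTK's defaults, so the
# additional stopwords defined earlier are not applied in text_proses.
stpwds_eng = list(set(stopwords.words('english')))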

# Build the text cleaning function
def text_proses(text):

    # Convert text to lowercase
    text = text.lower()

    # Remove mentions, links, and non-alphanumeric characters
    text = re.sub(cleaning_pattern, ' ', text)

    # Remove mentions
    text = re.sub(r"@[A-Za-z0-9_]+", " ", text)

    # Remove hashtags
    text = re.sub(r"#[A-Za-z0-9_]+", " ", text)

    # Remove literal "\n" sequences (a backslash followed by n)
    text = re.sub(r"\\n", " ", text)

    # Remove words of three characters or fewer
    text = re.sub(r'\b\w{1,3}\b', " ", text)

    # Remove URLs
    text = re.sub(r"http\S+", " ", text)
    text = re.sub(r"www.\S+", " ", text)

    # Strip leading and trailing whitespace
    text = text.strip()

    # Remove non-letter characters (emoticons and symbols such as μ, $, 兀)
    text = re.sub(r"[^A-Za-z\s']", " ", text)

    # Collapse repeated whitespace
    text = re.sub(r"\s\s+", " ", text)

    # Tokenize
    tokens = word_tokenize(text)

    # Remove stopwords
    text = ' '.join([word for word in tokens if word not in stpwds_eng])

    # Lemmatize; the whole string is passed as a single word, so
    # multi-word text is typically returned unchanged
    text = lemmatizer.lemmatize(text)

    return text
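The regex-based part of the cleaning can be tried in isolation, without NLTK. The sketch below reuses the same `cleaning_pattern` and the short-word rule from the cell above; `strip_noise` and the sample tweet are hypothetical and exist only for illustration.

```python
import re

# Same pattern as cleaning_pattern in the cell above, written as a raw string.
cleaning_pattern = r"@\S+|https?:\S+|http?:\S|[^A-Za-z0-9]+"


def strip_noise(text):
    """Lowercase, blank out mentions/links/symbols, drop 1-3 character words."""
    text = text.lower()
    text = re.sub(cleaning_pattern, " ", text)
    text = re.sub(r"\b\w{1,3}\b", " ", text)
    # Collapse the runs of spaces left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()


sample = "@user check https://t.co/abc this is AWESOME!!!"
cleaned = strip_noise(sample)
print(cleaned)
```

Running the regex steps on a few sample tweets like this is a quick way to confirm that the inference-time cleaning behaves the same as the cleaning used during training.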

# Inference data

In [5]:
# Create New Data 

data_inf = {
    'text' : '''
hello, what's up brother? today i go to supermarket.
    '''}

data_inf = pd.DataFrame([data_inf])

# show new data
data_inf

Unnamed: 0,text
0,"\nhello, what's up brother? today i go to supe..."


# Preprocess inference

In [6]:
data_inf['text'] = data_inf['text'].apply(lambda x: text_proses(x))
data_inf

Unnamed: 0,text
0,hello brother today supermarket


# Making predictions

In [7]:
# Predict using the loaded model
y_pred_inf = model.predict(data_inf)
y_pred_inf = np.argmax(y_pred_inf)
if y_pred_inf == 0:
    print('That is a negative tweet')
elif y_pred_inf == 1:
    print('That is a neutral tweet')
elif y_pred_inf == 2:
    print('That is a positive tweet')

That is a positive tweet
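For more than one tweet at a time, the class index can be decoded per row instead of over the whole array. The sketch below assumes the model returns one row of three scores per tweet in the order negative, neutral, positive, as the `if`/`elif` chain above implies. The `LABELS` mapping, the `decode_predictions` helper, and the `fake_probs` values are hypothetical; the numbers are made up and are not output from the model.

```python
import numpy as np

# Class order taken from the if/elif chain above.
LABELS = {0: "negative", 1: "neutral", 2: "positive"}


def decode_predictions(probs):
    """Map each row of class scores to its label using a row-wise argmax."""
    probs = np.atleast_2d(np.asarray(probs))
    return [LABELS[int(i)] for i in np.argmax(probs, axis=1)]


# Made-up scores for two tweets, for illustration only.
fake_probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
decoded = decode_predictions(fake_probs)
print(decoded)
```

Passing `axis=1` matters for batches: without it, `np.argmax` returns a single index into the flattened array rather than one class per tweet.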
