<a href="https://colab.research.google.com/github/tosindaykeay/Data-Augmentation-with-Python/blob/main/Text_Moderator_AppV1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Step 1: Install Required Libraries

We'll use the following libraries for our project:

nltk for natural language processing
scikit-learn for machine learning
pandas for data manipulation

In [21]:
!pip install -U nltk scikit-learn pandas




Step 2: Download NLTK Data

We need to download some NLTK data for text processing. Run the following code:

In [6]:
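import nltk

# Tokenizer models used by word_tokenize in Step 4
nltk.download('punkt_tab')
# Resources for POS tagging and named-entity chunking (not used in the steps below)
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')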


[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Package words is already up-to-date!


True

Step 3: Create a Sample Dataset

In [3]:
# Positive reviews
positive_reviews = [
    "I loved the product!",
    "This is amazing!",
    "The customer service was great!"
]

# Negative reviews
negative_reviews = [
    "I'm disappointed with the product.",
    "It's okay, I guess.",
    "The customer service was terrible."
]


Step 4: Preprocess Text Data

In [7]:
import nltk
from nltk.tokenize import word_tokenize

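# Tokenize the text and keep only purely alphabetic tokens.
# This drops punctuation, numbers, and contraction pieces such as "'m".
def preprocess_text(text):
    tokens = word_tokenize(text)
    tokens = [t for t in tokens if t.isalpha()]
    return ' '.join(tokens)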

# Preprocess positive reviews
positive_preprocessed = [preprocess_text(review) for review in positive_reviews]

# Preprocess negative reviews
negative_preprocessed = [preprocess_text(review) for review in negative_reviews]
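
The alphabetic filter in preprocess_text has a side effect worth seeing on contractions. The sketch below applies the same filter to token lists written out by hand, on the assumption that word_tokenize splits contractions the Penn Treebank way ("I'm" into "I" and "'m", "don't" into "do" and "n't"). The second sentence and the helper name keep_alpha are hypothetical and not part of the dataset above.

```python
# Token lists written out by hand, assuming word_tokenize follows
# Penn Treebank conventions for contractions (not re-run here).
examples = {
    "I'm disappointed with the product.":
        ["I", "'m", "disappointed", "with", "the", "product", "."],
    # Hypothetical sentence, used only to show what happens to a negation
    "I don't like it.":
        ["I", "do", "n't", "like", "it", "."],
}

def keep_alpha(tokens):
    """Apply the same filter as preprocess_text: keep purely alphabetic tokens."""
    return ' '.join(t for t in tokens if t.isalpha())

results = {text: keep_alpha(tokens) for text, tokens in examples.items()}
for text, cleaned in results.items():
    print(f"{text!r} -> {cleaned!r}")
```

Under that assumption the negation piece "n't" is discarded along with the punctuation, so a negated sentence can reach the sentiment step without its negation. Keeping tokens that contain an apostrophe would be one way to avoid this if it matters for your data.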


Step 5: Create a Text Moderation Model, e.g. a sentiment analysis model that classifies text as either positive or negative

In [9]:
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

# Create the analyzer once so the lexicon is not reloaded on every call
sia = SentimentIntensityAnalyzer()

# Return VADER's score dict for a text: 'neg', 'neu', 'pos', and 'compound'
def calculate_sentiment(text):
    return sia.polarity_scores(text)

# Calculate sentiment scores for positive reviews
positive_sentiments = [calculate_sentiment(review) for review in positive_preprocessed]

# Calculate sentiment scores for negative reviews
negative_sentiments = [calculate_sentiment(review) for review in negative_preprocessed]


[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
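
The 'compound' entry in VADER's score dict is a normalised value in [-1, 1]. The sketch below shows, as far as I understand the VADER implementation, how a raw valence sum is squashed into that range (with a constant alpha of 15) and how a compound score is commonly turned into a label using the conventional cutoffs of +0.05 and -0.05. The function names are hypothetical, and the valence sum of -2.7 is an illustrative assumption chosen because it reproduces the compound value printed in Step 7.

```python
import math

def normalize_compound(valence_sum, alpha=15):
    """Squash an unbounded valence sum into (-1, 1), VADER-style."""
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)

def label_from_compound(compound, threshold=0.05):
    """Map a compound score to a label using the conventional cutoffs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Assumed valence sum for a short, clearly negative sentence
compound = round(normalize_compound(-2.7), 4)
print(compound, label_from_compound(compound))  # -0.5719 negative
```

The threshold is a parameter here because a moderation use case may want a stricter cutoff than a general sentiment dashboard.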


Step 6: Create a Classification Model, using sklearn to predict whether a review is positive or negative

In [10]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Create a TF-IDF vectorizer
vectorizer = TfidfVectorizer()

# Fit the vectorizer on all reviews and build the label vector
# (0 = positive, 1 = negative)
X = vectorizer.fit_transform(positive_preprocessed + negative_preprocessed)
y = [0] * len(positive_reviews) + [1] * len(negative_reviews)

# Split into training and testing sets (with 6 reviews, 2 are held out)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Naive Bayes classifier
clf = MultinomialNB()

# Train the classifier on the training split only
clf.fit(X_train, y_train)
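
To make the TF-IDF step less opaque, here is a dependency-free sketch of the weighting that TfidfVectorizer is understood to apply with its default settings: lowercasing, tokens of two or more word characters, raw term counts, a smoothed idf of ln((1 + n) / (1 + df)) + 1, and L2 normalisation. The helper names tokenize and tfidf are hypothetical, and the sketch is an illustration rather than a replacement for the library.

```python
import math
import re
from collections import Counter

# The six reviews from Step 3
docs = [
    "I loved the product!",
    "This is amazing!",
    "The customer service was great!",
    "I'm disappointed with the product.",
    "It's okay, I guess.",
    "The customer service was terrible.",
]

def tokenize(text):
    """Lowercase and keep tokens of two or more word characters."""
    return re.findall(r"\b\w\w+\b", text.lower())

tokenized = [tokenize(d) for d in docs]
n_docs = len(docs)
# Document frequency: in how many reviews each term appears
df = Counter(term for tokens in tokenized for term in set(tokens))

def tfidf(tokens):
    """Return an L2-normalised {term: weight} mapping for one document."""
    counts = Counter(tokens)
    weights = {
        term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in counts.items()
    }
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}

vec = tfidf(tokenized[0])
for term, weight in sorted(vec.items()):
    print(f"{term}: {weight:.3f}")
```

In the first review, "loved" appears in only one document and so receives a larger weight than "the", which appears in four. The single-character token "I" is dropped by the token pattern before any weighting happens.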


Step 7: Test Our Model

In [22]:
# New text sample to classify
new_text = "I hate this product!"

# Preprocess the new text
new_text_preprocessed = preprocess_text(new_text)

# Calculate the sentiment score for the new text
new_sentiment_score = calculate_sentiment(new_text_preprocessed)

# Predict the class of the new text (labels from Step 6: 0 = positive, 1 = negative)
prediction = clf.predict(vectorizer.transform([new_text_preprocessed]))[0]

print("Prediction:", "Negative" if prediction == 1 else "Positive")
print("Sentiment score:", new_sentiment_score)


Prediction: Negative
Sentiment score: {'neg': 0.649, 'neu': 0.351, 'pos': 0.0, 'compound': -0.5719}
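
A single prediction says little about how well the classifier generalises, and the held-out split from Step 6 is never scored. The sketch below is one possible way to do that, not the notebook's own method: it splits the raw reviews first, wraps the vectorizer and classifier in a scikit-learn pipeline so the TF-IDF vocabulary is learned from the training reviews only, and reports accuracy on the held-out reviews. The names texts, labels, and model are hypothetical, and the stratified split is an added assumption so that both classes appear in the test set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "I loved the product!",
    "This is amazing!",
    "The customer service was great!",
    "I'm disappointed with the product.",
    "It's okay, I guess.",
    "The customer service was terrible.",
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = positive, 1 = negative, as in Step 6

# Split the raw texts first so the vectorizer never sees the test reviews
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=2, random_state=42, stratify=labels
)

# The pipeline fits TF-IDF and Naive Bayes on the training reviews only
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, y_train)

accuracy = model.score(test_texts, y_test)
print(f"Held-out accuracy on {len(test_texts)} reviews: {accuracy:.2f}")
```

With only six reviews the accuracy figure is not meaningful in itself. The point of the sketch is the order of operations, which carries over unchanged to a larger dataset.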
