### Domain Introduction:
Natural language processing (NLP) on social media analyzes user-generated content to extract insights into user behavior, sentiment, and trends, which inform marketing strategy and brand management.

### Problem Introduction:
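Social media customer support means addressing customer concerns and issues on platforms such as Twitter, Facebook, Instagram, and LinkedIn. Support teams monitor these channels, engage with customers, and offer timely solutions, which fosters loyalty and strengthens brand reputation. In this notebook the task is framed as binary classification of tweets about Samsung products: judging from the samples below, `Output` is 1 for complaints and support requests (typically addressed to @SamsungSupport) and 0 for promotional or purchase-related posts.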

### Importing data

In [None]:
import pandas as pd
df = pd.read_csv('/content/tweets.csv')


In [None]:
df.sample(10)

| | Tweet | Output |
|---|---|---|
| 163 | Why is my Samsung TV 📺 remote not responding? ... | 1 |
| 389 | My Samsung phone 📱 keeps restarting! @SamsungS... | 1 |
| 211 | Why is my Samsung TV 📺 screen black? @SamsungS... | 1 |
| 284 | Experience the thrill of Samsung Odyssey G5 mo... | 0 |
| 210 | Thinking of upgrading to the new Samsung Galax... | 0 |
| 338 | My Samsung phone 📱 keeps overheating! @Samsung... | 1 |
| 80 | My Samsung phone 📱 is not receiving calls! @Sa... | 1 |
| 114 | My Samsung tablet 📱 is not charging! @SamsungS... | 1 |
| 317 | Why is my Samsung microwave 🍔 not heating up? ... | 1 |
| 213 | My Samsung tablet 📱 is not turning on! @Samsun... | 1 |


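### Challenges in data handling and processing:
- *Volume*: popular brands receive a continuous stream of posts, so processing has to scale.
- *Variety*: posts mix plain text with mentions, hashtags, emojis, and product names.
- *Noise*: slang, abbreviations, typos, and stray punctuation make raw text inconsistent.
- *Imbalance*: classes are rarely equal in size; this dataset has 268 tweets labelled 0 and 201 labelled 1.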
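Noise in particular can be reduced before the pre-processing steps. The sketch below shows one common normalization, assuming tweets may contain URLs and elongated words ("soooo"); `normalize_tweet` is a hypothetical helper and is not part of this notebook's pipeline.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Three or more repeats of the same letter; applied after lowercasing so digits ("1000") are untouched
ELONGATED_RE = re.compile(r"([a-z])\1{2,}")


def normalize_tweet(text: str) -> str:
    """Hypothetical helper: lowercase, mask URLs, shorten elongated words, collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub("<url>", text)        # replace links with a placeholder token
    text = ELONGATED_RE.sub(r"\1\1", text)  # "sooooo" -> "soo"
    return " ".join(text.split())           # collapse runs of whitespace


print(normalize_tweet("Sooooo frustrated!!! See https://example.com/x   now"))
```

Shortening repeats to two letters rather than one keeps legitimate double letters ("see", "good") intact.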


### Pre-processing Steps:
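- *Cleaning*: reducing each tweet to the text and label needed for modelling (the `clean_msg` and `Output` columns).
- *Punctuation removal*: stripping every character in `string.punctuation`, which also turns `@SamsungSupport` into `SamsungSupport`.
- *Stop word removal*: dropping NLTK's English stop words plus a few chat abbreviations (`u`, `ur`, `im`, ...).
- *Emoji removal*: deleting emojis and other pictographic characters with a regular expression.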

### Text cleaning

In [None]:
import string

import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')

# English stop words plus a few chat abbreviations; built once as a set for fast lookups
STOPWORDS = set(stopwords.words('english')) | {'u', 'ü', 'ur', '4', '2', 'im', 'dont', 'doin', 'ure'}

def text_process(mess):
    """Remove punctuation, then drop stop words (matched case-insensitively)."""
    # Keep only non-punctuation characters; this also strips '@' and '#'
    nopunc = ''.join(char for char in mess if char not in string.punctuation)

    # Drop stop words, comparing the lowercased token but keeping its original case
    return ' '.join(word for word in nopunc.split() if word.lower() not in STOPWORDS)

# Apply to every tweet and store the result in a new column
df['clean_msg'] = df.Tweet.apply(text_process)

df

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


| | Tweet | Output | clean_msg |
|---|---|---|---|
| 0 | Why is my Samsung refrigerator 🍎 not cooling p... | 1 | Samsung refrigerator 🍎 cooling properly Samsun... |
| 1 | Just pre-ordered the new Samsung Galaxy Book! ... | 0 | preordered new Samsung Galaxy Book 📚🚀 Samsung |
| 2 | Hey @SamsungSupport, my Samsung tablet 📱 keeps... | 1 | Hey SamsungSupport Samsung tablet 📱 keeps cras... |
| 3 | Experience the thrill of Samsung Gear VR! 🕶️🎮 ... | 0 | Experience thrill Samsung Gear VR 🕶️🎮 Samsung |
| 4 | My Samsung dishwasher 🍽️ is leaking! @SamsungS... | 1 | Samsung dishwasher 🍽️ leaking SamsungSupport 😞 |
| ... | ... | ... | ... |
| 464 | Experience the thrill of Samsung Odyssey G5 mo... | 0 | Experience thrill Samsung Odyssey G5 monitor 🎮... |
| 465 | My Samsung tablet 📱 is not updating! @SamsungS... | 1 | Samsung tablet 📱 updating SamsungSupport 😞 |
| 466 | Just bought the new Samsung Galaxy S21 Ultra! ... | 0 | bought new Samsung Galaxy S21 Ultra 📱🚀 Samsung |
| 467 | Why is my Samsung microwave 🍔 not heating up? ... | 1 | Samsung microwave 🍔 heating SamsungSupport 😣 |
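Removing NLTK's full stop-word list also drops negations: in row 0 above, "not cooling properly" becomes "cooling properly". The sketch below shows a variant that keeps negation words. To run offline it assumes a small hand-written stop-word subset (`DEMO_STOPWORDS`) in place of NLTK's list; `NEGATIONS` and `text_process_keep_negations` are hypothetical names.

```python
import string

# Hypothetical stand-in for stopwords.words('english'), trimmed so the example runs offline
DEMO_STOPWORDS = {"why", "is", "my", "not", "no", "nor", "the", "a"}
NEGATIONS = {"not", "no", "nor"}


def text_process_keep_negations(mess, stop_words=DEMO_STOPWORDS, keep=NEGATIONS):
    """Same steps as text_process, but stop words listed in `keep` survive."""
    nopunc = "".join(ch for ch in mess if ch not in string.punctuation)
    drop = stop_words - keep
    return " ".join(w for w in nopunc.split() if w.lower() not in drop)


tweet = "Why is my Samsung refrigerator not cooling properly?"
print(text_process_keep_negations(tweet))              # negation kept
print(text_process_keep_negations(tweet, keep=set()))  # behaves like text_process
```

In the notebook, passing `set(stopwords.words('english'))` as `stop_words` would apply this to the real list; whether keeping negations changes accuracy on this dataset has not been tested.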


### Handling emojis

In [None]:
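import re

# Compile the pattern once. Several ranges are much broader than emoji: U+24C2-U+1F251 and
# U+10000-U+10FFFF also match CJK text and other non-Latin scripts, which is harmless for
# these English tweets but would damage multilingual data.
EMOJI_PATTERN = re.compile("["
                           u"\U0001F600-\U0001F64F"  # emoticons
                           u"\U0001F300-\U0001F5FF"  # misc. symbols & pictographs
                           u"\U0001F680-\U0001F6FF"  # transport & map symbols
                           u"\U0001F1E0-\U0001F1FF"  # regional indicators (flags)
                           u"\U00002500-\U00002BEF"  # box drawing through misc. symbols & arrows
                           u"\U00002702-\U000027B0"  # dingbats
                           u"\U000024C2-\U0001F251"  # very broad: enclosed alphanumerics onward
                           u"\U0001f926-\U0001f937"  # gesture emoji (face palm ... shrug)
                           u"\U00010000-\U0010ffff"  # all supplementary-plane characters
                           u"\u2640-\u2642"          # female/male signs
                           u"\u2600-\u2B55"          # misc. symbols, dingbats, arrows
                           u"\u200d"                 # zero-width joiner
                           u"\u23cf"                 # eject symbol
                           u"\u23e9"                 # fast-forward symbol
                           u"\u231a"                 # watch
                           u"\ufe0f"                 # variation selector-16 (emoji presentation)
                           u"\u3030"                 # wavy dash
                           "]+", flags=re.UNICODE)

def remove_emojis(text):
    """Delete every character matched by EMOJI_PATTERN."""
    return EMOJI_PATTERN.sub('', text)

# Remove emojis from the 'clean_msg' column
df['clean_msg'] = df['clean_msg'].apply(remove_emojis)

df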

| | Tweet | Output | clean_msg |
|---|---|---|---|
| 0 | Why is my Samsung refrigerator 🍎 not cooling p... | 1 | Samsung refrigerator cooling properly Samsung... |
| 1 | Just pre-ordered the new Samsung Galaxy Book! ... | 0 | preordered new Samsung Galaxy Book Samsung |
| 2 | Hey @SamsungSupport, my Samsung tablet 📱 keeps... | 1 | Hey SamsungSupport Samsung tablet keeps crash... |
| 3 | Experience the thrill of Samsung Gear VR! 🕶️🎮 ... | 0 | Experience thrill Samsung Gear VR Samsung |
| 4 | My Samsung dishwasher 🍽️ is leaking! @SamsungS... | 1 | Samsung dishwasher leaking SamsungSupport |
| ... | ... | ... | ... |
| 464 | Experience the thrill of Samsung Odyssey G5 mo... | 0 | Experience thrill Samsung Odyssey G5 monitor ... |
| 465 | My Samsung tablet 📱 is not updating! @SamsungS... | 1 | Samsung tablet updating SamsungSupport |
| 466 | Just bought the new Samsung Galaxy S21 Ultra! ... | 0 | bought new Samsung Galaxy S21 Ultra Samsung |
| 467 | Why is my Samsung microwave 🍔 not heating up? ... | 1 | Samsung microwave heating SamsungSupport |
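Deleting emojis also discards signal: the complaints above often end with 😞 or 😣, while promotional tweets use 🚀 or 🎮. An alternative, sketched here with only the standard library, replaces each emoji with its Unicode name so that it survives as an ordinary token. `emojis_to_text` is a hypothetical helper, and treating every character in Unicode category `So` (Symbol, other) as an emoji is an approximation that also catches symbols such as ©.

```python
import unicodedata

# Invisible characters that glue multi-codepoint emoji together
EMOJI_JOINERS = {"\u200d", "\ufe0f"}  # zero-width joiner, variation selector-16


def emojis_to_text(text: str) -> str:
    """Hypothetical helper: replace symbol characters (category 'So') with their names."""
    out = []
    for ch in text:
        if ch in EMOJI_JOINERS:
            continue  # drop invisible joiners/selectors
        if unicodedata.category(ch) == "So":
            # e.g. U+1F61E -> " disappointed_face "
            out.append(" " + unicodedata.name(ch, "symbol").lower().replace(" ", "_") + " ")
        else:
            out.append(ch)
    return " ".join("".join(out).split())  # tidy up spacing


print(emojis_to_text("Samsung dishwasher leaking SamsungSupport 😞"))
```

If tried in the notebook, it would replace `remove_emojis` and run after `text_process`, because `string.punctuation` contains the underscore and would otherwise split `disappointed_face`. `TfidfVectorizer`'s default token pattern treats the underscore as a word character, so the name stays a single feature.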


In [None]:
df=df[['clean_msg','Output']]
df

| | clean_msg | Output |
|---|---|---|
| 0 | Samsung refrigerator cooling properly Samsung... | 1 |
| 1 | preordered new Samsung Galaxy Book Samsung | 0 |
| 2 | Hey SamsungSupport Samsung tablet keeps crash... | 1 |
| 3 | Experience thrill Samsung Gear VR Samsung | 0 |
| 4 | Samsung dishwasher leaking SamsungSupport | 1 |
| ... | ... | ... |
| 464 | Experience thrill Samsung Odyssey G5 monitor ... | 0 |
| 465 | Samsung tablet updating SamsungSupport | 1 |
| 466 | bought new Samsung Galaxy S21 Ultra Samsung | 0 |
| 467 | Samsung microwave heating SamsungSupport | 1 |


In [None]:
df['Output'].value_counts()

Output
0    268
1    201
Name: count, dtype: int64
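The labels are mildly imbalanced (268 vs 201). One common remedy, sketched here, is to weight each class inversely to its frequency. scikit-learn's `compute_class_weight` in `'balanced'` mode uses `n_samples / (n_classes * count)`, which is the same weighting `LogisticRegression(class_weight='balanced')` applies. The label array is rebuilt from the counts above purely for illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Rebuild a label vector with the counts reported by value_counts() above
y = np.array([0] * 268 + [1] * 201)

# n_samples / (n_classes * count) for each class
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))

# The same weighting can be requested directly from a classifier, e.g.
# LogisticRegression(class_weight="balanced")
```

Here the minority class 1 gets 469 / 402 ≈ 1.17 and class 0 gets 469 / 536 = 0.875. With an imbalance this mild, the effect on the scores below is likely small.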

### TF-IDF: representing the importance of words in documents

In [None]:
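import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize TfidfVectorizer (defaults: lowercasing, tokens of 2+ word characters, L2-normalized rows)
tfidf_vectorizer = TfidfVectorizer()

# Fit on the 'clean_msg' column and transform it into a sparse TF-IDF matrix.
# Note: fitting on all rows before the train/test split lets test tweets shape the vocabulary and idf values.
tfidf_matrix = tfidf_vectorizer.fit_transform(df['clean_msg'])

# Convert the TF-IDF matrix to a DataFrame with one column per vocabulary term
tfidf_df = pd.DataFrame(tfidf_matrix.toarray(), columns=tfidf_vectorizer.get_feature_names_out())

# Attach the labels (both frames share the same 0..n-1 index) and display the result
tfidf_df['Output'] = df['Output']
tfidf_df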


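| | 8k | a52 | a7 | a71 | active | battery | black | blurry | book | boot | ... | us | vr | washing | watch | webinar | week | wifi | wont | working | Output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.584776 | 0.0 | ... | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.595802 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 464 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 |
| 465 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 466 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 |
| 467 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |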


In [None]:
tfidf_df.shape

(469, 142)

In [None]:
X=tfidf_df.drop('Output',axis=1)
Y=tfidf_df['Output']

In [None]:
X

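| | 8k | a52 | a7 | a71 | active | battery | black | blurry | book | boot | ... | upgrading | us | vr | washing | watch | webinar | week | wifi | wont | working |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.584776 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.595802 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 464 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 465 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 466 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 467 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |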


In [None]:
Y

0      1
1      0
2      1
3      0
4      1
      ..
464    0
465    1
466    0
467    1
468    0
Name: Output, Length: 469, dtype: int64

### Modelling and Evaluation:
This notebook compares four models on the binary `Output` label:
- **Logistic Regression**: a linear model for binary classification and a strong baseline on sparse TF-IDF features.
- **Naive Bayes**: a probabilistic classifier that is effective for text classification, especially with a large number of features.
- **Recurrent Neural Networks (RNN)**: designed for sequential data, RNNs read tokens in order and carry context forward, which suits tasks such as language modeling and time-series analysis; the model below uses an LSTM, a gated RNN variant.
- **Convolutional Neural Networks (CNN)**: 1D convolutions over word embeddings capture local patterns in text, such as short phrases.
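Because the TF-IDF matrix above was fitted on all 469 tweets before splitting, vocabulary and document frequencies from the test rows leak into training. A minimal sketch of a leakage-free comparison uses scikit-learn's `make_pipeline` and `cross_val_score`, which refit the vectorizer inside every fold. The toy texts and labels below are hypothetical stand-ins for `df['clean_msg']` and `df['Output']`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins; in the notebook use df['clean_msg'] and df['Output']
texts = [
    "tv remote responding SamsungSupport", "phone keeps restarting SamsungSupport",
    "tablet charging SamsungSupport", "dishwasher leaking SamsungSupport",
    "microwave heating SamsungSupport", "phone overheating SamsungSupport",
    "preordered new Galaxy Book Samsung", "Experience thrill Gear VR Samsung",
    "bought new Galaxy S21 Ultra Samsung", "upgrading new Galaxy watch Samsung",
    "join webinar new monitor Samsung", "Experience thrill Odyssey G5 monitor Samsung",
]
labels = [1] * 6 + [0] * 6

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
models = {"Logistic Regression": LogisticRegression(), "Naive Bayes": MultinomialNB()}

scores = {}
for name, clf in models.items():
    # The vectorizer is fitted only on the training part of each fold
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores[name] = cross_val_score(pipe, texts, labels, cv=cv, scoring="accuracy")
    print(f"{name}: mean accuracy {scores[name].mean():.3f}")
```

Cross-validation also gives several test scores instead of one, which matters when a single 94-tweet split can only move in steps of about 1%.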

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Split data into train and test sets (80/20, fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Initialize and train Logistic Regression model
logreg_model = LogisticRegression()
logreg_model.fit(X_train, y_train)

# Predictions
y_pred_logreg = logreg_model.predict(X_test)

# Evaluation: accuracy on the held-out test set
print("Logistic Regression Results:")
print(accuracy_score(y_test, y_pred_logreg))


Logistic Regression Results:
1.0
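A perfect score on 94 test tweets is worth checking for duplicates: the samples above include tweets that begin identically (rows 317 and 467, rows 284 and 464). The sketch below, built around a hypothetical `exact_overlap` helper, measures what fraction of test messages also occur verbatim in the training set.

```python
def exact_overlap(train_texts, test_texts):
    """Hypothetical helper: share of test texts that also appear verbatim in the training texts."""
    # Normalize case and whitespace so trivial differences do not hide duplicates
    seen = {" ".join(t.lower().split()) for t in train_texts}
    test_norm = [" ".join(t.lower().split()) for t in test_texts]
    if not test_norm:
        return 0.0
    return sum(t in seen for t in test_norm) / len(test_norm)


train = ["Samsung microwave heating SamsungSupport", "bought new Galaxy S21 Ultra Samsung"]
test = ["Samsung  microwave heating SamsungSupport", "Samsung tablet updating SamsungSupport"]
print(exact_overlap(train, test))  # 0.5
```

Because `train_test_split` keeps the DataFrame index, running `exact_overlap(df.loc[X_train.index, 'clean_msg'], df.loc[X_test.index, 'clean_msg'])` right after the logistic regression cell (before the CNN cell reassigns `X_train`) would apply the check to the actual split.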


In [None]:
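# Read a new message and apply the same preprocessing used for training
print("Enter text:")
user_sentence = input()
clean_sentence = remove_emojis(text_process(user_sentence))

# Vectorize with the TF-IDF vectorizer fitted earlier; wrapping the result in a DataFrame
# with the training column names avoids scikit-learn's feature-name mismatch warning
sentence_tfidf = pd.DataFrame(
    tfidf_vectorizer.transform([clean_sentence]).toarray(),
    columns=tfidf_vectorizer.get_feature_names_out(),
)

# Predict with the Logistic Regression model
prediction_logreg = logreg_model.predict(sentence_tfidf)
print("Logistic Regression Prediction:", prediction_logreg)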



Enter text:
join telegram for ipl predicitons
Logistic Regression Prediction: [0]
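A message like the one above shares few or no words with the Samsung tweets, so its TF-IDF vector may be almost or entirely zero. In that case the prediction mostly reflects the model's intercept rather than the message content. This sketch lists which tokens of a new message the fitted vectorizer actually knows, using scikit-learn's `build_analyzer()` and `vocabulary_`. The toy training texts and the helper `known_tokens` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy training texts standing in for df['clean_msg']
vectorizer = TfidfVectorizer().fit(["samsung tv remote responding", "join webinar new monitor"])


def known_tokens(vectorizer, text):
    """Return the tokens of `text` that exist in the fitted vectorizer's vocabulary."""
    analyzer = vectorizer.build_analyzer()  # same lowercasing and tokenization as during fit
    return [tok for tok in analyzer(text) if tok in vectorizer.vocabulary_]


msg = "join telegram for ipl predicitons"
print(known_tokens(vectorizer, msg))                # ['join']
print(vectorizer.transform([msg]).nnz)              # 1 non-zero feature
print(vectorizer.transform(["telegram ipl"]).nnz)   # 0: an all-zero vector
```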


In [None]:
from sklearn.naive_bayes import MultinomialNB

# Initialize and train Naive Bayes model
nb_model = MultinomialNB()
nb_model.fit(X_train, y_train)

# Predictions
y_pred_nb = nb_model.predict(X_test)

# Evaluation
print("Naive Bayes Results:")
print(accuracy_score(y_test, y_pred_nb))


Naive Bayes Results:
0.9893617021276596
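Multinomial Naive Bayes scores each class by adding its log prior to a feature-weighted sum of per-term log probabilities, smoothed with `alpha` (1.0 by default). It accepts any non-negative features, which is why it runs on the TF-IDF matrix here. The sketch below reproduces `MultinomialNB` predictions from its fitted `feature_count_` and `class_count_` attributes on a small hypothetical non-negative matrix.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical non-negative feature matrix (e.g. TF-IDF weights) and labels
X = np.array([[0.9, 0.1, 0.0], [0.8, 0.0, 0.3], [0.0, 0.7, 0.6], [0.1, 0.9, 0.4]])
y = np.array([1, 1, 0, 0])

nb = MultinomialNB(alpha=1.0).fit(X, y)

# Smoothed per-class term probabilities: (count + alpha) / (class total + alpha * n_features)
smoothed = nb.feature_count_ + nb.alpha
feature_log_prob = np.log(smoothed) - np.log(smoothed.sum(axis=1, keepdims=True))
class_log_prior = np.log(nb.class_count_ / nb.class_count_.sum())

# Joint log-likelihood per class, then pick the highest-scoring class
jll = X @ feature_log_prob.T + class_log_prior
manual_pred = nb.classes_[jll.argmax(axis=1)]

print(np.allclose(feature_log_prob, nb.feature_log_prob_), manual_pred, nb.predict(X))
```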


In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Tokenize text
tokenizer = Tokenizer()
tokenizer.fit_on_texts(df['clean_msg'])
X_seq = tokenizer.texts_to_sequences(df['clean_msg'])

# Pad sequences
maxlen = 100  # Adjust according to your data
X_pad = pad_sequences(X_seq, maxlen=maxlen)

# Define model
cnn_model = Sequential([
    Embedding(input_dim=len(tokenizer.word_index)+1, output_dim=100, input_length=maxlen),
    Conv1D(128, 5, activation='relu'),
    GlobalMaxPooling1D(),
    Dense(1, activation='sigmoid')
])

# Compile model
cnn_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

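# Split the padded sequences with the same seed as before. Note: this overwrites the TF-IDF
# X_train/X_test; the LSTM cell below reuses these padded splits.
X_train, X_test, y_train, y_test = train_test_split(X_pad, df['Output'], test_size=0.2, random_state=42)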

# Train model
cnn_model.fit(X_train, y_train, epochs=5, batch_size=32)

# Evaluate model
print("CNN Results:")
cnn_loss, cnn_accuracy = cnn_model.evaluate(X_test, y_test)
print(f"Loss: {cnn_loss}, Accuracy: {cnn_accuracy}")


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
CNN Results:
Loss: 0.008285418152809143, Accuracy: 1.0
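The phrase "capturing local patterns" can be made concrete. With Keras' default `padding='valid'`, `Conv1D(filters, k)` slides a window of k consecutive word embeddings along the sequence and, for each filter, computes a ReLU of a dot product with that window. `GlobalMaxPooling1D` then keeps each filter's strongest response anywhere in the tweet. The NumPy sketch below reproduces that computation with hypothetical small shapes and random weights; it illustrates the operation, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, k, filters = 7, 4, 3, 2          # sequence length, embedding size, kernel size, filters

emb = rng.normal(size=(T, d))          # one tweet as T embedded tokens
W = rng.normal(size=(k, d, filters))   # Keras Conv1D kernel layout: (kernel_size, input_dim, filters)
b = np.zeros(filters)


def conv1d_relu(x, W, b):
    """Valid 1D convolution (cross-correlation, as in Keras) followed by ReLU."""
    width = W.shape[0]
    steps = x.shape[0] - width + 1
    out = np.empty((steps, W.shape[2]))
    for t in range(steps):
        window = x[t:t + width]        # `width` consecutive tokens: a local n-gram
        out[t] = np.tensordot(window, W, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)


feature_map = conv1d_relu(emb, W, b)   # shape (T - k + 1, filters)
pooled = feature_map.max(axis=0)       # GlobalMaxPooling1D: strongest match per filter
print(feature_map.shape, pooled.shape)
```

In the notebook's model, the same operation yields 100 - 5 + 1 = 96 positions for each of the 128 filters before pooling.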


In [None]:
from tensorflow.keras.layers import LSTM

# Define model: embedding followed by an LSTM layer (a gated RNN)
rnn_model = Sequential([
    Embedding(input_dim=len(tokenizer.word_index)+1, output_dim=100, input_length=maxlen),
    LSTM(128),
    Dense(1, activation='sigmoid')
])

# Compile model
rnn_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train model
rnn_model.fit(X_train, y_train, epochs=5, batch_size=32)

# Evaluate model
print("RNN Results:")
rnn_loss, rnn_accuracy = rnn_model.evaluate(X_test, y_test)
print(f"Loss: {rnn_loss}, Accuracy: {rnn_accuracy}")


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
RNN Results:
Loss: 0.01830017752945423, Accuracy: 0.9893617033958435
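The LSTM used above is a gated RNN. At each token it updates a cell state c and a hidden state h through input (i), forget (f), and output (o) gates plus a candidate update (g), which lets it carry context across a tweet. Below is a NumPy sketch of the standard LSTM step, with hypothetical sizes, random weights, and the four gate blocks stacked in i, f, g, o order.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step; W: (input_dim, 4H), U: (H, 4H), b: (4H,), blocks stacked as i, f, g, o."""
    z = x @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates in (0, 1)
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g                        # forget part of the old memory, add new content
    h = o * np.tanh(c)                            # exposed hidden state
    return h, c


rng = np.random.default_rng(0)
d, H, T = 4, 3, 5                                 # embedding size, hidden units, tokens
W, U, b = rng.normal(size=(d, 4 * H)), rng.normal(size=(H, 4 * H)), np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(T, d)):                 # run over a 5-token "tweet"
    h, c = lstm_step(x, h, c, W, U, b)
print(h)  # final hidden state, which a Dense(1, sigmoid) layer would read
```

By default `LSTM(128)` returns only this final hidden state, so the notebook's `Dense` layer sees one 128-dimensional summary per tweet.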


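**Conclusion**:
All four models scored very well on the held-out 20% split (94 tweets). Logistic Regression and the CNN reached 100% accuracy, while Naive Bayes and the LSTM each misclassified one tweet (about 98.9%). These scores should be read with caution. The dataset is small and highly templated (rows 284 and 464, and rows 317 and 467, begin with identical text), and the TF-IDF vectorizer and the Keras tokenizer were fitted on all tweets before splitting. In addition, most complaints shown mention @SamsungSupport while the promotional tweets do not, so a single token can nearly decide the label. Performance on real, noisier social media traffic is therefore likely to be lower. With that caveat, combining these models, for example by ensembling a linear baseline with a neural model, remains a reasonable starting point for efficiently triaging large volumes of social media messages so that support requests receive timely responses.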

### Applications:
The applications of social media customer support extend across various industries, including:
- *E-commerce*: Handling customer inquiries, complaints, and product support.
- *Finance*: Addressing customer queries related to banking, transactions, and account management.
- *Telecommunications*: Providing support for service disruptions, billing inquiries, and account management.
- *Hospitality*: Managing guest feedback, reservations, and service-related issues.
