
Manufacturing companies accumulate text from many sources, such as maintenance logs, quality control reports, customer feedback, and incident reports. Automatically sorting these documents into predefined categories can streamline operations, inform maintenance planning, support product quality work, and surface customer concerns sooner.

**Goal:**

Develop a text classifier to automatically categorize manufacturing-related documents into categories such as "Maintenance", "Quality Control", "Customer Feedback", and "Incident Report".

**Data Collection**

In [None]:
import pandas as pd

# Example dataset
data = {
    'document': [
        "Routine maintenance completed on assembly line.",
        "Quality control check identified defective units.",
        "Customer feedback: The new model has improved performance.",
        "Incident report: Minor injury due to equipment malfunction.",
        "Scheduled maintenance for conveyor belt system.",
        "Quality control: Issues found in the welding process.",
        "Customer feedback: Delays in shipment were unacceptable.",
        "Incident report: Equipment breakdown caused a production halt.",
        "Maintenance required for the cooling system.",
        "Quality control check passed with no issues found."
    ],
    'category': [
        "Maintenance", "Quality Control", "Customer Feedback", "Incident Report",
        "Maintenance", "Quality Control", "Customer Feedback", "Incident Report",
        "Maintenance", "Quality Control"
    ]
}

df = pd.DataFrame(data)
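
Before splitting the data, it can help to check how many documents each category has, because very small classes limit how the data can be split and evaluated. The sketch below is illustrative only: the `categories` series is a stand-in that copies the `category` column of the example dataset above, and `class_counts` is a name introduced here.

```python
import pandas as pd

# Stand-in for df['category'] from the example dataset above
categories = pd.Series([
    "Maintenance", "Quality Control", "Customer Feedback", "Incident Report",
    "Maintenance", "Quality Control", "Customer Feedback", "Incident Report",
    "Maintenance", "Quality Control",
], name="category")

# Count how many documents fall into each category
class_counts = categories.value_counts()
print(class_counts)
```

With only two or three documents per category, any held-out test set will necessarily miss some categories.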


**Data Preprocessing**

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer

# Hold out 20% of the documents (2 of 10) for testing
X_train, X_test, y_train, y_test = train_test_split(df['document'], df['category'], test_size=0.2, random_state=42)

# Vectorize the text with TF-IDF: fit the vocabulary on the training split only,
# then reuse it to transform the test split
vectorizer = TfidfVectorizer(stop_words='english')
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)


In [None]:
X_train

5    Quality control: Issues found in the welding p...
0      Routine maintenance completed on assembly line.
7    Incident report: Equipment breakdown caused a ...
2    Customer feedback: The new model has improved ...
9    Quality control check passed with no issues fo...
4      Scheduled maintenance for conveyor belt system.
3    Incident report: Minor injury due to equipment...
6    Customer feedback: Delays in shipment were una...
Name: document, dtype: object

In [None]:
X_train_tfidf

<8x34 sparse matrix of type '<class 'numpy.float64'>'
	with 43 stored elements in Compressed Sparse Row format>

**Model Training**

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report

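# Bundle TF-IDF and the classifier in one pipeline so raw text can be passed in directly.
# The pipeline fits its own vectorizer, independent of the one created during preprocessing.
model = Pipeline([
    ('vectorizer', TfidfVectorizer(stop_words='english')),
    ('classifier', LogisticRegression(random_state=42))
])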

model.fit(X_train, y_train)
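
Once a pipeline like this is fitted, it accepts raw strings, so classifying a new document takes a single call. The following is a minimal sketch, not the notebook's own cell: it refits an equivalent pipeline on all ten example documents so that it runs on its own, and `new_doc` is a made-up sentence used purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

documents = [
    "Routine maintenance completed on assembly line.",
    "Quality control check identified defective units.",
    "Customer feedback: The new model has improved performance.",
    "Incident report: Minor injury due to equipment malfunction.",
    "Scheduled maintenance for conveyor belt system.",
    "Quality control: Issues found in the welding process.",
    "Customer feedback: Delays in shipment were unacceptable.",
    "Incident report: Equipment breakdown caused a production halt.",
    "Maintenance required for the cooling system.",
    "Quality control check passed with no issues found.",
]
categories = [
    "Maintenance", "Quality Control", "Customer Feedback", "Incident Report",
    "Maintenance", "Quality Control", "Customer Feedback", "Incident Report",
    "Maintenance", "Quality Control",
]

pipeline = Pipeline([
    ("vectorizer", TfidfVectorizer(stop_words="english")),
    ("classifier", LogisticRegression(random_state=42)),
])
pipeline.fit(documents, categories)

# Hypothetical unseen document
new_doc = "Conveyor belt maintenance scheduled for next week."

# predict() takes an iterable of raw texts and returns one label per text
predicted = pipeline.predict([new_doc])[0]

# predict_proba() returns one probability per class, ordered as in classes_
probabilities = dict(zip(pipeline.classes_, pipeline.predict_proba([new_doc])[0]))

print(predicted)
for label, prob in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{label}: {prob:.3f}")
```

Looking at the probabilities as well as the top label gives a sense of how confident the model is, which matters with so little training data.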


**Evaluation**

In [None]:
# Predict the categories of the test set
y_pred = model.predict(X_test)

# Print the classification report
print(classification_report(y_test, y_pred))


                 precision    recall  f1-score   support

    Maintenance       1.00      1.00      1.00         1
Quality Control       1.00      1.00      1.00         1

       accuracy                           1.00         2
      macro avg       1.00      1.00      1.00         2
   weighted avg       1.00      1.00      1.00         2
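
The report above is based on only two test documents and covers two of the four categories, so the perfect scores say little about how the model would perform in general. One possible way to get a slightly broader picture, sketched here under the assumption that stratified cross-validation suits the data, is to score the pipeline on several splits. The smallest categories have two documents each, so at most two stratified folds are possible. The pipeline is rebuilt so the block runs on its own.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

documents = [
    "Routine maintenance completed on assembly line.",
    "Quality control check identified defective units.",
    "Customer feedback: The new model has improved performance.",
    "Incident report: Minor injury due to equipment malfunction.",
    "Scheduled maintenance for conveyor belt system.",
    "Quality control: Issues found in the welding process.",
    "Customer feedback: Delays in shipment were unacceptable.",
    "Incident report: Equipment breakdown caused a production halt.",
    "Maintenance required for the cooling system.",
    "Quality control check passed with no issues found.",
]
categories = [
    "Maintenance", "Quality Control", "Customer Feedback", "Incident Report",
    "Maintenance", "Quality Control", "Customer Feedback", "Incident Report",
    "Maintenance", "Quality Control",
]

pipeline = Pipeline([
    ("vectorizer", TfidfVectorizer(stop_words="english")),
    ("classifier", LogisticRegression(random_state=42)),
])

# Two folds, each keeping at least one document of every category
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)

# The pipeline is refitted on each training fold, so TF-IDF never sees the held-out fold
scores = cross_val_score(pipeline, documents, categories, cv=cv)
print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```

Even so, ten documents are far too few for a reliable estimate; the same code becomes more informative as the labelled dataset grows.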



**Deployment**

To deploy the model, we can expose it through a small Flask API:



In [None]:
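from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Persist the trained pipeline (vectorizer + classifier) to disk.
# In a real deployment this step would run once, right after training.
joblib.dump(model, 'text_classifier_model.pkl')

# Load the pipeline back, as a standalone server process would at startup
model = joblib.load('text_classifier_model.pkl')

@app.route('/classify', methods=['POST'])
def classify():
    # Parse the body as JSON regardless of Content-Type; returns None if it is not valid JSON
    data = request.get_json(force=True, silent=True)
    if not isinstance(data, dict) or not isinstance(data.get('document'), str):
        return jsonify({'error': "Request body must be JSON with a string 'document' field"}), 400

    # Predict the category; the pipeline expects an iterable of raw texts
    category = model.predict([data['document']])[0]

    # Cast to a built-in str so the label serializes cleanly
    return jsonify({'category': str(category)})

if __name__ == '__main__':
    # debug=True is intended for local development only
    app.run(debug=True)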


To use the classifier, start the server and send a POST request to the `/classify` endpoint with a JSON body whose `document` field holds the text to classify:

In [None]:
curl -X POST -H "Content-Type: application/json" -d '{"document": "Routine maintenance completed on assembly line."}' http://127.0.0.1:5000/classify
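
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the Flask server above is running locally on port 5000. The helper names `build_classify_request` and `classify_remote` are introduced here for illustration and are not part of the notebook.

```python
import json
import urllib.request


def build_classify_request(document, base_url="http://127.0.0.1:5000"):
    """Build a POST request for the /classify endpoint without sending it."""
    payload = json.dumps({"document": document}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/classify",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_remote(document, base_url="http://127.0.0.1:5000"):
    """Send the document to the running API and return the predicted category."""
    req = build_classify_request(document, base_url)
    with urllib.request.urlopen(req, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))["category"]


# Building the request needs no server, so it can be inspected offline
request_obj = build_classify_request("Routine maintenance completed on assembly line.")
print(request_obj.get_method(), request_obj.full_url)

# With the server running, the prediction could be fetched like this:
# print(classify_remote("Routine maintenance completed on assembly line."))
```

Separating request construction from sending keeps the payload format easy to check without a live server.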
